Finding and Deleting Duplicates in Excel

Sometimes when working with large excel spreadsheets, you run into the problem of duplicates, and it helps to be able to find them and delete them without having to manually go through the spreadsheet. If you have two or more columns of data, and you need to find and delete the duplicates, this is one way to do it. Say you have all your data in columns A and B:

  1. Create a column that concatenates the information in A and B. You can either use the “CONCATENATE” function or use the ampersand (&) by entering the formula A1&B1 to cell C1. Remember, as with all formulae, you have to use the = sign before the formula, thus =A1&B1.
  2. Copy and paste the formula down the column to the end of your data. For example, if your data runs from rows 1 to 120, copy and paste your data down to cell C120. An easy way to do this is, in Windows, select from the cell where your formula is to the last cell, and press Ctrl+D, or Edit -> Fill Down. For a mac, I’m not sure there’s a short-cut, except using Edit -> Fill Down.
  3. The next step is to find and mark the duplicate entries. Go to column D, and in the first cell enter the formula enter the formula =IF(COUNTIF($C$1:C1,C1)>1,”Duplicate”,”Unique”
  4. Copy and paste this formula down the column to the end of your data using the same process as in step 2.
  5. This column now shows you what rows have duplicates and which have uniques. You now want to delete, or do something else to just the duplicates, so you want to sort your spreadsheet by duplicates and uniques. You can’t do this with your spreadsheet as it is now because of the formula, so copy all the data to a new worksheet, but you only want to paste values, not the formulas, so that you can sort them by column C. Use Edit -> Paste Special -> Values only.
  6. Sort your spreadsheet using Column C, so that you all the duplicates are at the end of your spreadsheet, and once you delete them, or do whatever you want to with them, you can then delete column C.

This leaves you with only unique entries in your worksheet. You can extend this method to multiple columns for more complex spreadsheets and it should work fine.

Happy Coding!

From csv to sql – in brief

In my last few posts I explained in a four-part tutorial how to convert a csv file to a sql file. I am going to summarize the steps in this post so that you have them all in one reference post. The tools that I prefer for this process are PHPMyAdmin and HeidiSQL, and of course your csv file. The assumption is that you have Apache, PHP, and MySQL all running on your machine, and you can log in as root admin.

  1. Create a mySQL database and then create a table that corresponds to your csv file. Make sure that the fields correspond, and I recommend that you create an autoincrement unique ID field as your primary field, even if it doesn’t exist in your csv file. Read more on Creating the Database.
  2. Using HeidiSQL, import your csv file into the database table that you have created, making sure to deselect the ID field if it doesn’t exist in your csv file. Read more on Importing the csv file.
  3. Look through your newly populated database table to ensure that the data has all been imported correctly and that you have the correct number of entries. The best tool for this is PHPMyAdmin.
  4. Finally, create the sql dump file using the export to sql command in either PHPMyAdmin or HeidiSQL. Make sure to select the correct exporting options depending on whether your sql dump file will create the database and populate it or will be updating an already existing database. Read more on Creating the sql dump file.

I hope that you have found the series of tutorials helpful, and that you are more comfortable working with all the database tools that I have been talking about.

Happy Coding!

Convert csv to sql – Creating the sql file

This is the last part in a series of tutorials on how to create a sql file from a csv file. First, we determined what was needed for the process, then we created the MySQL database and tables, and finally we imported the csv file into the MySQL database.

In this last step, we are going to generate a sql file from our database that we can then export to another user, or use for any other purpose that we need a sql dump file for. Sql dump files are a common format used in selling or buying databases and database products.

To create the sql file, we can use either PHPMyAdmin or HeidiSQL. Both work equally well, and I will describe the process for each:

Creating a sql file using PHPMyAdmin

  1. Fire up PHPMyAdmin, log in as root, and select the database and table that you’re working with. In our case this would be the database “streetsuffix” and the table “suffix”.
  2. Click on “Export” in the menu, and make sure that SQL is the selected format in the file that opens.
  3. The thing that is important now is to select the correct SQL options. I am assuming that this is the first time you are creating a sql dump file for this database, and so you want the query to create and populate the database on the other end. So you would make sure Structure was selected, and then select the options that you want. If you want the query to delete any existing tables by the same name and create a new table, then select “Add DROP TABLE”. If you want the query to check if the table exists and add it if not, then select “Add IF NOT EXISTS”. When you look through the generated sql file this will make more sense.
  4. The maximal length of the created query can be left at the default value unless you have an extremely large database, in which case you may need to increase it, or if you have size restrictions you may want to reduce it.
  5. The Export type is determined by what your query is going to do. In our case, we want the query to insert. If you were issuing an update to a database then you would want to select update, and so on.
  6. Make sure that you have the right number of rows for your database selected under Dump.
  7. Finally, click Go. PHPMyAdmin will generate the query and open it in a new page.
  8. You can then copy this query and paste it in Notepad++ or your favorite editor and save it as yourfilename.sql, in our case streetsuffix.sql.
  9. You now have a sql dump file!

Creating a sql file using HeidiSQL

  1. Fire up HeidiSQL and log in as root, select the database and the table you want to work with.
  2. On the main menu, select Export -> Export tables as sql
  3. A dialog will open that has two tabs: Source and Destination.
  4. In the Source tab, select the table you want to export, in our case “suffix”.
  5. In the Destination tab, there are two sides: The Options, and the Output.
  6. Under Options, you will need to select what you want your query to do, similar to what we did in PHPMyAdmin. Do you want your query to create the database and table, replace them if they exist, or delete them and start afresh? Do you want your query to insert data, or replace/update data in the existing table? These are things that are dictated by your application and usage of your database. For this tutorial, it’s a fresh install so we want to create database and table and insert data.
  7. On the Output side, select the folder where you want to save the file and what you want to call it. Unlike PHPMyAdmin, HeidiSQL creates and saves the file for you.
  8. You can also use HeidiSQL to output the data to another database 0n your host or on another database on another host.
  9. You now have a sql dump file. You can examine this file using Notepad++ or your favorite editor.

You probably noticed that there are other options in PHPMyAdmin and HeidiSQL that I did not touch on. This is because this is intended to be a newbie/basic tutorial, so I didn’t want to go into too much detail on all the possible options.

This completes our four-part tutorial on converting a csv file to a sql dump file without any coding or scripting knowledge, and without spending a dime. I hope you find this helpful, and if you do, please leave me a comment. If you notice any errors, or want to add something, or know a better way to do it, please let us know also by leaving a comment.

All questions, suggestions, comments, and corrections are welcome!

Convert csv to sql – Importing the csv file

We are going step by step through the process of converting a csv file to a sql file. In the first part, we discussed what tools we need to accomplish this task, as well as where to find a sample csv file to play with. We then covered creating the sql database in the second part.

In this third section, I am going to explain how to import your csv data into the MySQL database that you created, and to make sure that the data is all in the database and looks like you expect it to.

From the last tutorial, you now have in your server a database named “streetsuffix”, and that database contains a table named “suffix” or whatever you decided to name it. Your table has fields corresponding to your csv file, with the right properties set to them as discussed in the last tutorial.

Now we want to import our data from the US Street Suffixes csv file to the streetsuffix database. This, for me, is where HeidiSQL does a beautiful job.

  1. Fire up HeidiSQL and log into your server with root access.
  2. Select the database “streetsuffix” and open the table “suffix”.
  3. On the main menu, click on Import -> Import CSV file.
  4. A dialog opens up that allows you to select the file you want to import from (Filename), and the specify delimiters and all that other good stuff on the left side. On the right side of that dialog box you should the name of the database you’re importing the data into, the table, and all the fields that you created. Check that this data is correct before proceeding.
  5. Recall that last time I mentioned that in some instances your csv file does not have an ID field, but that it was a good idea to create one and set it to auto-increment? Well in this case our csv file does have one, so we can leave the ID field checked. If our file did not have one, we would uncheck the ID field, so that MySQL would populate that field instead of looking at our csv file for an ID column.
  6. Once you’ve made sure that everything on the left side is correct, look at the right side. The first thing is to select the csv file from your computer so that it appears under filename.
  7. Then set your delimiters. In our case, our fields are separated by commas (,) so we need to enter that into the “fields terminated by” field. Click on the check box to activate the field.
  8. The “enclosed by” field is useful when your fields have long sentences or other data that are enclosed in “”, or ”, so if these are present, select this now and enter what the enclosing marks are. You can tell this by looking at your file. I find that it helps to check optionally because some entries might be and others might not. In this case, do select it just for practice.
  9. As you deal with more complex files you will see the use of the other options. The best option is usually to have your csv file open in Notepad++ so that you can see all delimiters and markings clearly as you make your selections.
  10. Once you’re done, hit Import, and let HeidiSQL do it’s thing!!

You now want to look at your database to make sure all the data was imported and looks like you envisioned it. You can do this in either HeidiSQL or PHPMyAdmin.

  1. In HeidiSQL, while you have your table selected, click on the tab right above the fields that’s labeled “Data”, this should allow you to see your data. Sometimes in HeidiSQL you don’t see the actual information, but you can still verify the import by clicking on the database (or on the Database tab to see the total number of entries and confirming that they correspond to your entries.
  2. PHPMyAdmin is actually what I prefer for this verification part. In your browser, navigate to localhost and open up your PHPMyAdmin. Log in with root access and select your database “streetsuffix”. On the left side, select the table “suffix” and on the main part of the browser, select Browse. Your database should open up showing you all the fields and the entries that you just imported. I recommend that you keep PHPMyAdmin open as we are now going to go to the final step – creating the sql dump file!

The major part of the work is done, we now have everything we need for the final part, creating the sql dump file, which I will discuss shortly in the last post of this series.

Remember if you have any problems so far, any comments, suggestions, or corrections, you can leave a comment here. I welcome all comments and will respond to them as fast as I can.

Happy Coding!

Converting csv to sql – Creating the Database

Continuing on from my previous post, we are now going to start the conversion process. The first step is to create the database and the table that we need. You can do this using PHPMyAdmin, but I generally prefer to use HeidiSQL. The process is basically the same in each case, but I find HeidiSQL to have a better interface.

I am working with the database that I referred to in my last post, the US Street Suffixes database.

  1. The first step is to fire up HeidiSQL, and log into your server. You get an interface that has all your databases listed and all that good stuff.
  2. Next, we want to create the database that we will store our data in. Click on Tools -> Create Database. Let’s call this database streetsuffix. HeidiSQL immediately creates and selects the database. You can tell what database is selected by checking the left panel to see which little cylinder is yellow.
  3. The next step is to create the table that will hold our data. Again, Tools -> Create Table. Let’s name our table suffix. You can also name it streetsuffix if you want. Because this is a simple project, we are only creating one table, but you may have more than one in some instances. At the table creation stage in HeidiSQL you can go ahead and create all your fields and set their properties right there. Or you can create the table, add at least one field, and then go back and create, edit, delete, rename fields.
  4. To create the fields, you need to know what you want to name each field, what kind of data it will hold, and what fields you want to use as indexed. This information requires that you know a lot more about databases than I want to cover in this post, and you can learn a lot about this by looking at the Heidi Help file and using google. For now, we are going to go ahead and create our fields for the US Street Suffix file. Open the csv file in Excel so that it’s easier to see the field names and the data.
  • Create a field named ID, set it as Primary, NOT NULL, and auto-increment. Its data type should be INT with a size of 3. These parameters will vary depending on your data, and you’ll learn to play with them as you go. It’s always advisable to have a field like this as the primary field. In our case, the csv file already has an ID field and so we will import it directly into this field. In some cases the csv file does not have an ID column. You would still create one in Heidi SQL with auto-increment turned on, and then deselect it during data import. Don’t worry about this for now, it will make more sense in the next post.
  • Create the other fields in similar fashion, selecting the appropriate data type (mostly text), except that none of the rest should be set as primary, nor should they have auto-increment selected. Remember that the names should not have any spaces or special characters, and you should match the field names in your csv file to your table field names.
  • Once this is done, you’ve created your database and its table. Remember if you forget a field, or mislabel it, or need to change it’s properties, you can.

After creating the table, go back to HeidiSQL and look at the table structure to make sure it matches what you entered. You can also do this using PHPMyAdmin. Check all your fields thoroughly, because once your data is imported you really don’t want to mess with your fields unless you know what you’re doing.

You now have your database ready, and your table in it. Remember that you can do the same thing we have done above using PHPMyAdmin, and you can experiment with both in creating databases and tables until you find the one you prefer.

Next, we are going to import our data from the csv file into the database that we just created.

A little more reading for you:
http://onlinetutorials.net/tutorials/SQL/mysql_for_beginners.shtml
http://www.phpmyadmin.net/documentation/#faqusing
http://www.php-editors.com/articles/sql_phpmyadmin.php

Happy Coding!