Finding and Deleting Duplicates in Excel
Sometimes when working with large excel spreadsheets, you run into the problem of duplicates, and it helps to be able to find them and delete them without having to manually go through the spreadsheet. If you have two or more columns of data, and you need to find and delete the duplicates, this is one way to do it. Say you have all your data in columns A and B:
- Create a column that concatenates the information in A and B. You can either use the “CONCATENATE” function or use the ampersand (&) by entering the formula A1&B1 to cell C1. Remember, as with all formulae, you have to use the = sign before the formula, thus =A1&B1.
- Copy and paste the formula down the column to the end of your data. For example, if your data runs from rows 1 to 120, copy and paste your data down to cell C120. An easy way to do this is, in Windows, select from the cell where your formula is to the last cell, and press Ctrl+D, or Edit -> Fill Down. For a mac, I’m not sure there’s a short-cut, except using Edit -> Fill Down.
- The next step is to find and mark the duplicate entries. Go to column D, and in the first cell enter the formula enter the formula =IF(COUNTIF($C$1:C1,C1)>1,”Duplicate”,”Unique”
- Copy and paste this formula down the column to the end of your data using the same process as in step 2.
- This column now shows you what rows have duplicates and which have uniques. You now want to delete, or do something else to just the duplicates, so you want to sort your spreadsheet by duplicates and uniques. You can’t do this with your spreadsheet as it is now because of the formula, so copy all the data to a new worksheet, but you only want to paste values, not the formulas, so that you can sort them by column C. Use Edit -> Paste Special -> Values only.
- Sort your spreadsheet using Column C, so that you all the duplicates are at the end of your spreadsheet, and once you delete them, or do whatever you want to with them, you can then delete column C.
This leaves you with only unique entries in your worksheet. You can extend this method to multiple columns for more complex spreadsheets and it should work fine.