Removing duplicates is an important part of almost any data analysis, and what counts as a duplicate can differ from one data set to another. Knowing how to remove duplicates in Excel is therefore essential, and it is also an easy process.
In this article, I will discuss how to use Excel to remove duplicate rows based on two columns. I will show several methods so that you can choose the best one for you.
1) Remove duplicate rows based on two columns (Using ‘Remove Duplicates’ feature)
Check out the following data. Two columns are holding some names and their locations.
What I want is a list that contains only unique rows, with all the duplicate rows removed.
Follow these steps.
1) Select a cell in the range, open the Data tab, and click the Remove Duplicates command in the Data Tools group.
2) The Remove Duplicates dialog box appears. Our data has headers, so ‘My data has headers’ is correctly checked. We want to remove duplicates based on both columns, so leaving both columns checked is also correct. Click the OK button.
This is the result we get:
7 duplicate values found and removed; 6 unique values remain. Cool, right?
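If you like to see the logic spelled out, here is a minimal Python sketch of the idea behind this step: keep the first occurrence of each (name, location) pair and drop the later repeats. The sample rows and the function name `remove_duplicate_rows` are hypothetical and are not the data from the worksheet above. The sketch compares values exactly, which is a simplification, and it is only an illustration, not how Excel is implemented internally.

```python
def remove_duplicate_rows(rows):
    """Keep the first occurrence of each row and drop later repeats."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # both columns together form the comparison key
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data: (name, location)
rows = [
    ("Anna", "Boston"),
    ("Ben", "Chicago"),
    ("Anna", "Boston"),   # repeat of the first row
    ("Anna", "Denver"),   # same name, different location: not a duplicate
    ("Ben", "Chicago"),   # repeat of the second row
]

unique_rows = remove_duplicate_rows(rows)
removed = len(rows) - len(unique_rows)
print(f"{removed} duplicate rows removed; {len(unique_rows)} unique rows remain")
```

Note that a row counts as a duplicate only when both columns match, which is why the ("Anna", "Denver") row survives.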
Now, what if you want to keep your old data and get the results (data without duplicate rows) in a new place?
There is a way to do that, too: use Excel’s Advanced Filter feature, which is covered in the next section.
2) Using Advanced Filter feature to remove duplicate rows (in a new location)
We will use the same data again for the Advanced Filter feature.
1) Select a cell in the range => Data tab => Sort & Filter group => click the Advanced command button.
2) The ‘Advanced Filter’ dialog box appears. Notice that our data range $A$1:$B$14 is already entered in the ‘List range’ field.
We want to show only the unique values in a new location, so I check ‘Unique records only’ and choose the ‘Copy to another location’ radio button.
In the ‘Copy to’ field, I enter a cell reference ($D$1) from the same worksheet (you can also enter a cell from another worksheet).
Everything is set, so I just click OK. The 6 unique rows are placed in the new location, starting at cell D1.
3) Removing duplicates based on some specific columns of a range
This is a little bit different.
You have several columns, and you want Excel to find and remove duplicates based on some specific columns only, not on every column in the range.
In the following image, you’re seeing some dummy data. And I want to remove duplicate rows based on columns A, B, and C. Column D will not affect this filtering.
For example, row 5, row 9, and row 13 are duplicates though their Sales column has different values ($5200, $6000, $7000).
Follow these steps to remove these types of duplicates.
1) Select a cell in the range => Data tab => Data Tools group => click the Remove Duplicates command button.
2) The ‘Remove Duplicates’ dialog box appears. All the columns are selected by default, but we want to exclude our ‘Sales’ column from the comparison. So, I just clear the checkbox to the left of ‘Sales’.
3) Setup is done. I just click OK. And this is the result we get. 4 duplicate values found and removed; 9 unique values remain in the range.
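The same idea can be sketched in Python: build the comparison key from the chosen columns only, so the remaining column never affects whether a row counts as a duplicate. This is a rough illustration under assumptions. The column meanings (name, region, product), the sample rows, and the function name `remove_duplicates_by_columns` are hypothetical; only the three Sales figures echo the example above.

```python
def remove_duplicates_by_columns(rows, key_columns):
    """Keep the first row for each distinct combination of the key columns."""
    seen = set()
    kept = []
    for row in rows:
        # Only the selected columns take part in the comparison.
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data: (name, region, product, sales)
sales_rows = [
    ("Anna", "East", "Laptop", 5200),
    ("Ben", "West", "Phone", 3100),
    ("Anna", "East", "Laptop", 6000),   # same first three columns as row 1
    ("Anna", "East", "Tablet", 4500),
    ("Anna", "East", "Laptop", 7000),   # same first three columns as row 1
]

# Compare on columns 0, 1 and 2; column 3 (sales) is ignored.
deduplicated = remove_duplicates_by_columns(sales_rows, key_columns=(0, 1, 2))
for row in deduplicated:
    print(row)
```

Because the first matching row is the one that is kept, the Sales value that survives is the one from the first occurrence. If a different row should win, sort the data accordingly before removing duplicates.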
4) Removing duplicates using Excel formula
This is an interesting way if you like to do things with Excel formulas.
We will use Excel’s COUNTIFS() function. Before working with the main formula, let’s take a quick look at what COUNTIFS() does.
The COUNTIFS() function counts the entries that meet multiple criteria across multiple ranges. If you want to know more about this function, check out this page.
Let’s now remove duplicates using the formula.
We will apply the formula to the same data set that we used in our first example.
Under column C, we add a new header as ‘Find Duplicates’ and use the following formula in cell C2:
=IF(COUNTIFS($A$2:A2, A2, $B$2:B2, B2)>1, "I'm duplicate. Delete me", "Original")
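I press Enter and drag the formula down to all the cells in the column. Because the ranges $A$2:A2 and $B$2:B2 grow as the formula moves down, each row is compared only with the rows above it and itself. The first occurrence of a name and location pair therefore returns a count of 1 and is marked "Original", while every later occurrence returns a count greater than 1 and is marked as a duplicate.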
So, we get only 6 original values and 7 duplicate values.
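To make the running count easier to follow, here is a small Python sketch that mimics what the formula does row by row: it counts how many times the current (name, location) pair has appeared so far, including the current row, and labels the row accordingly. The sample rows and the function name `flag_duplicates` are hypothetical, and the sketch is an approximation of the formula's behavior, not a translation of Excel's internals.

```python
from collections import Counter


def flag_duplicates(rows):
    """Label each row 'Original' on first sight and as a duplicate afterwards."""
    running_count = Counter()
    flags = []
    for row in rows:
        key = tuple(row)
        # Like COUNTIFS over an expanding range: occurrences from the top
        # of the list down to and including the current row.
        running_count[key] += 1
        if running_count[key] > 1:
            flags.append("I'm duplicate. Delete me")
        else:
            flags.append("Original")
    return flags


# Hypothetical sample data: (name, location)
people = [
    ("Anna", "Boston"),
    ("Ben", "Chicago"),
    ("Anna", "Boston"),
    ("Anna", "Denver"),
    ("Ben", "Chicago"),
]

for row, flag in zip(people, flag_duplicates(people)):
    print(row, "->", flag)
```

The first appearance of each pair stays "Original", so deleting the flagged rows leaves exactly one copy of every pair.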
Now you can delete these duplicate rows manually or by converting this range into an Excel table.
I will leave the manual approach to you.
Here is the table-making way.
Just select a cell in the range and convert the range into a table.
Filter this table to show only the duplicate values.
Now delete these rows.
Clear the filter to get back your original values.
Here is the result.
6 original values.
So, these are my ways of deleting duplicate rows based on two columns. I have excluded the VBA part intentionally (to make it easy for all).
If you know of a better technique to remove duplicate rows based on two columns, please let me know in the comment box.
Happy excelling 🙂