How to Use Table.Distinct to Remove Duplicate Data in Power Query (M Language)

In this tutorial, we will show how to use Table.Distinct to remove duplicate data using Power Query M language.

How to Use Table.Distinct to Remove Duplicate Data in Power Query (M Language)
Image by Editor
 

Power Query provides a powerful function called Table.Distinct to easily remove duplicate records from your dataset. In this tutorial, we will show how to use Table.Distinct to remove duplicate data using Power Query M language.

What is Table.Distinct?

Table.Distinct removes duplicate rows from a table, keeping unique records. It can remove duplicates based on all columns or just specific columns.

Syntax:

Table.Distinct(table as table, optional equationCriteria as any) as table

Where:

  • table: The input table to remove duplicates from.
  • comparisonCriteria [Optional]: It specifies which columns to use for comparison.

To show the use of Table.Distnict function, we will use  sales data that contains multiple duplicate data. We loaded the data from Excel to Power Query.

Sales Data:

How to Use Table.Distinct to Remove Duplicate Data in Power Query (M Language)

Basic Use: Remove Complete Duplicates

Let’s assume a sales data set that has some duplicate data. You can use Table.Distinct function to remove the duplicates from an entire dataset.

Power Query M Code:

let
   Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
   UniqueRows = Table.Distinct(Source)
in
   UniqueRows

All identical rows are removed, preserving unique records only.

How to Use Table.Distinct to Remove Duplicate Data in Power Query (M Language)

Remove Duplicates Based on Specific Columns

There are cases when duplicates are defined by one or more specific columns rather than entire rows. Removing duplicates based on the Region column.

Power Query M Code:

let
   Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
   UniqueRows = Table.Distinct(Source, {"Region"})
in
   UniqueRows

Only the first occurrence of each unique Region is retained.

How to Use Table.Distinct to Remove Duplicate Data in Power Query (M Language)

Use Multiple Columns As Criteria

For more granular control, specify multiple columns to determine which rows are considered duplicates. Removing duplicates based on Product and Salesperson column.

let
   Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
   UniqueRows = Table.Distinct(Source, {"Product", "Salesperson"})
in
   UniqueRows

This retains rows unique across both columns.

How to Use Table.Distinct to Remove Duplicate Data in Power Query (M Language)

Use Comparison Criteria (Case Sensitivity)

By default, Table.Distinct is case-sensitive. To handle duplicates in a case-insensitive manner, you can use the Comparer.OrdinalIgnoreCase function. Let’s remove case-insensitive duplicates from the Product column.

let
   Source = SalesData,
   UniqueRows = Table.Distinct(Source, {"Product", Comparer.OrdinalIgnoreCase})
in
   UniqueRows

This considers “Product A”, “product A”, “Product a” as duplicates.

How to Use Table.Distinct to Remove Duplicate Data in Power Query (M Language)

Advanced Techniques:

Create Custom Comparers

Advanced users can specify custom functions as the criteria for Table.Distinct. This allows for more complex logic. Let’s create a table in Power Query to create a custom comparer.

let
Source = #table(
{"OrderID", "CustomerID", "Product", "Quantity"},
{
{1001, "C001", "Keyboard", 5},
{1002, "C-001", "Mouse", 10},
{1003, "C_002", "Monitor", 2}
}
),

// Add a normalized column by removing '-' and '_'
AddNormalizedID = Table.AddColumn(
Source,
"NormalizedCustomerID",
each Text.Remove([CustomerID], {"-", "_"}),
type text

),

// Remove duplicates based on normalized Customer IDs
DistinctCustomers = Table.Distinct(
AddNormalizedID,
{"NormalizedCustomerID"}
),

// Optionally remove the helper column
Result = Table.RemoveColumns(
DistinctCustomers,
{"NormalizedCustomerID"}
)

in
Result

The row with C-001 is removed because it duplicates C001 after normalization.

How to Use Table.Distinct to Remove Duplicate Data in Power Query (M Language)

Use Table.DistinctRows for Legacy Support

In older versions of Power Query, you might see Table.DistinctRows instead.

Table.DistinctRows(YourTable, {"Column1", "Column2"})

This function is similar but has a slightly different syntax where you specify columns directly as the second parameter.

Tips for Working with Table.Distinct

  • Performance: For large datasets, consider using Table.Distinct only on necessary columns to improve performance.
  • Debugging: If your results aren’t as expected, use Table.View to inspect the data before and after applying Table.Distinct.
  • Data Cleaning: Consider cleaning your data (standardizing case, trimming whitespace) before using Table.Distinct for better results.
  • Column Selection: Only include columns that matter for your duplicate identification to improve results.

Conclusion

Table.Distinct is an essential function in the Power Query M language for data cleansing and preparation. By understanding its parameters and options, you can effectively remove duplicate data based on your requirements. Always ensure the columns specified in Table.Distinct accurately represents your intended duplicate criteria. Preview results in Power Query are used to verify that only intended duplicates have been removed.

Get FREE Advanced Excel Exercises with Solutions!

Shamima Sultana
Shamima Sultana

Shamima Sultana, BSc, Computer Science and Engineering, East West University, Bangladesh, has been working with the ExcelDemy project for 3+ years. She has written and reviewed 1000+ articles for ExcelDemy. She has also led several teams with Excel VBA and Content Development works. Currently, she is working as the Project Manager and oversees the day-to-day work, leads the services team, allocates resources to the right area, etc. Her work and learning interests vary from Microsoft Office Suites, and... Read Full Bio

We will be happy to hear your thoughts

Leave a reply

Advanced Excel Exercises with Solutions PDF

 

 

ExcelDemy
Logo