Unleashing Power Query for Data Scientists: Reshape, Merge, Clean at Scale

Data preparation consumes a large share of a data scientist’s time. Because data scientists frequently work with datasets from multiple sources, the data can be messy, wide, inconsistent, or unstructured. Power Query moves most ETL operations into a visual interface, so common preparation tasks need little or no code. This can cut data preparation time from hours to minutes on datasets ranging from thousands to millions of rows.

In this tutorial, we will show how data scientists can use Power Query to reshape, merge, and clean at scale.

Why Power Query for Data Scientists?

  • No-Code ETL: Drag-and-drop operations with visual previews.
  • Efficient Engine: Handles large datasets with ease using lazy evaluation.
  • Seamless Integration: Works inside Excel and Power BI, exportable to other tools.
  • Reproducible Steps: Each action is recorded as a step for easy audit and automation.

Let’s use a sample dataset made up of three tables to demonstrate Power Query.

  • Customer Info: Contains customer details.
  • Monthly Transactions: Purchase amounts per month.
  • Support Tickets: Customer-reported issues.

Load the Data into Power Query

In Excel:

  • Go to the Data tab >> select Get Data >> select From File >> select From Workbook or From Text/CSV.
  • Load each of the three data sets into Power Query.
  • Click Transform Data to open the Power Query Editor.

In Power BI:

  • Go to the Home tab >> select Get Data >> select Excel workbook or Text/CSV.
  • Load each of the datasets.
  • Select Transform Data after loading.
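
Behind the Get Data dialog, Power Query writes M code for the load step. The following is a minimal sketch of what a CSV load can look like in the Advanced Editor. The file path is hypothetical, and the delimiter and encoding are assumptions about the file.

```powerquery
let
    // Hypothetical path: point this at your own Customer Info file
    FilePath = "C:\Data\customer_info.csv",
    // Read the raw file as a table; 65001 is the UTF-8 code page (assumed encoding)
    Source = Csv.Document(
        File.Contents(FilePath),
        [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]
    ),
    // Use the first row as column headers
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    Promoted
```

You can view the code for any query from the Home tab >> Advanced Editor.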

1. Clean at Scale: Data Quality Operations

Power Query offers multiple tools to clean large datasets efficiently.

1.1. Remove Rows

Remove Duplicates:

  • Select one or more columns.
  • Go to the Home tab >> select Remove Rows >> select Remove Duplicates.

Remove Blank Rows:

  • No column selection is needed; this command checks whole rows and removes those where every cell is empty.
  • Go to the Home tab >> select Remove Rows >> select Remove Blank Rows.

Handle Nulls and Errors:

  • Use filters to exclude null values or select Remove Errors.
  • Filter Out Nulls: Click the dropdown on the column header >> Uncheck (null).

Replace Values:

  • Select one or more columns.
  • Go to the Transform tab >> select Replace Values.
    • Value to Find: null
    • Replace With: 0
    • Click OK.
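
The row-level cleaning steps above can be sketched in M as follows. The inline table is illustrative only (it is not the article's sample data), and the column names CustomerID, Month, and Amount are assumed from the Monthly Transactions description.

```powerquery
let
    // Illustrative rows only; not the article's actual sample data
    Source = #table(
        {"CustomerID", "Month", "Amount"},
        {
            {1, "Jan_2023", 250},
            {1, "Jan_2023", 250},   // exact duplicate
            {2, "Jan_2023", null},  // missing amount
            {null, null, null}      // blank row
        }
    ),
    // Remove Blank Rows: keep rows that have at least one non-empty field
    NoBlankRows = Table.SelectRows(
        Source,
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))
    ),
    // Remove Duplicates, comparing only on the selected columns
    NoDuplicates = Table.Distinct(NoBlankRows, {"CustomerID", "Month"}),
    // Remove Errors: drop rows that have an error in the Amount column
    NoErrors = Table.RemoveRowsWithErrors(NoDuplicates, {"Amount"}),
    // Replace Values: null becomes 0 in the Amount column
    Filled = Table.ReplaceValue(NoErrors, null, 0, Replacer.ReplaceValue, {"Amount"})
in
    Filled
```

Replacing null with 0 treats a missing amount as no purchase, so apply that step only when the assumption holds for your data.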

1.2 Fix Data Types

  • Ensure data types are correct: date, decimal, text.
  • You can click the icon next to the column name (shows “ABC”) to change the data type.
  • Alternatively, right-click the column header >> select Change Type >> select the data type.
    • JoinDate, TicketDate: Date.
    • Customer Name, Product, Category, Region, etc: Text.
    • Quantity: Whole Number.
    • Amount: Decimal Number.
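
Changing types through the interface produces a single Table.TransformColumnTypes step. A minimal sketch, assuming the values arrive as text and use US-style number formatting (the sample row is made up for illustration):

```powerquery
let
    // Illustrative row; every value starts out as text, as it would after a CSV load
    Source = #table(
        {"CustomerID", "JoinDate", "Quantity", "Amount"},
        {{"101", "2023-01-15", "3", "249.99"}}
    ),
    // Convert each column to the type listed above; "en-US" is an assumed culture
    Typed = Table.TransformColumnTypes(
        Source,
        {
            {"CustomerID", Int64.Type},   // Whole Number
            {"JoinDate", type date},      // Date
            {"Quantity", Int64.Type},     // Whole Number
            {"Amount", type number}       // Decimal Number
        },
        "en-US"
    )
in
    Typed
```

Passing the culture explicitly makes the conversion independent of the regional settings on the machine that runs the refresh.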

1.3 Trim & Clean Text

  • Go to the Transform tab >> select Format >>  select Trim (remove leading/trailing spaces).
  • Select Clean to remove non-printable characters.
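
In M, Trim and Clean correspond to Text.Trim and Text.Clean applied through Table.TransformColumns. A short sketch with made-up values and assumed column names:

```powerquery
let
    // Illustrative values with stray spaces and an embedded tab character
    Source = #table(
        {"Customer Name", "Region"},
        {{"  Alice Smith ", "North#(tab)East"}}
    ),
    // Trim: remove leading and trailing whitespace
    Trimmed = Table.TransformColumns(
        Source,
        {{"Customer Name", Text.Trim, type text}, {"Region", Text.Trim, type text}}
    ),
    // Clean: remove non-printable control characters
    Cleaned = Table.TransformColumns(
        Trimmed,
        {{"Customer Name", Text.Clean, type text}, {"Region", Text.Clean, type text}}
    )
in
    Cleaned
```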

2. Reshape: Structure Data for Modeling

Unpivoting is useful when you need to transform columns into rows for analysis, such as monthly sales in separate columns.

2.1 Unpivoting Monthly Transactions

The Monthly Transactions dataset is in wide format, with columns Jan_2023, Feb_2023, and Mar_2023. This format is difficult to work with when analyzing trends or building time-based models.

Steps to Unpivot:

  • Select columns Jan_2023, Feb_2023, Mar_2023.
  • Go to the Transform tab >> choose Unpivot Columns.

  • Go to the Transform tab >> select Rename.
    • Attribute → Month.
    • Value → Amount.
  • Confirm correct data types: CustomerID as Whole Number, Month as Text, Amount as Decimal.

Each row now represents a single transaction month per customer. It is ideal for filtering, plotting, or summarizing by time.
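
The unpivot, rename, and type steps can be written in M roughly as follows. The amounts are invented for illustration, and Table.Unpivot is only one way to do it; the interface may generate Table.UnpivotOtherColumns instead, depending on which command you pick.

```powerquery
let
    // Illustrative wide table; the amounts are made up
    Source = #table(
        {"CustomerID", "Jan_2023", "Feb_2023", "Mar_2023"},
        {{1, 120.5, 80, 310}, {2, 45, null, 600}}
    ),
    // Turn the three month columns into Attribute/Value pairs
    Unpivoted = Table.Unpivot(Source, {"Jan_2023", "Feb_2023", "Mar_2023"}, "Attribute", "Value"),
    // Give the generated columns meaningful names
    Renamed = Table.RenameColumns(Unpivoted, {{"Attribute", "Month"}, {"Value", "Amount"}}),
    // Set the data types listed above
    Typed = Table.TransformColumnTypes(
        Renamed,
        {{"CustomerID", Int64.Type}, {"Month", type text}, {"Amount", type number}}
    )
in
    Typed
```

Unpivoting skips cells that contain null, so a customer with no value for a month gets no row for that month.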

2.2 Pivoting Support Tickets

The Support Tickets dataset contains CustomerID, Issue, and TicketID, with multiple rows per customer with different issue types.

To make each issue a column:

Steps to Pivot:

  • Select the Issue column.
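  • Go to the Transform tab >> choose Pivot Column.
    • Use TicketID as the Values Column.
    • Under Advanced options, set Aggregate Value Function to Count (All) so that each cell counts tickets.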
    • Click OK.

  • Select the new issue columns (e.g., Login Error, Billing Issue).
  • Go to the Transform tab >> select Replace Values.
    • Value to Find: null
    • Replace With: 0
    • Click OK.

You now have a wide format where each column (e.g., Login Error, Billing Issue) shows how often each customer reported that issue.
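
A sketch of the same pivot in M, assuming a count aggregation so that each cell holds the number of tickets. The rows and ticket numbers are invented; the issue names come from the example above.

```powerquery
let
    // Illustrative tickets; IDs are made up
    Source = #table(
        {"CustomerID", "Issue", "TicketID"},
        {
            {1, "Login Error", 1001},
            {1, "Login Error", 1002},
            {1, "Billing Issue", 1003},
            {2, "Billing Issue", 1004}
        }
    ),
    // One column per distinct issue; List.Count counts the tickets in each cell
    Pivoted = Table.Pivot(Source, List.Distinct(Source[Issue]), "Issue", "TicketID", List.Count),
    // Mirror the Replace Values step: any remaining null becomes 0
    NoNulls = Table.ReplaceValue(
        Pivoted, null, 0, Replacer.ReplaceValue, {"Login Error", "Billing Issue"}
    )
in
    NoNulls
```

Using List.Distinct on the Issue column means new issue types become new columns on refresh, whereas the hard-coded list in the last step would need updating.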

3. Merge: Combine Data Sources

Once each dataset is cleaned and reshaped, merge them into a unified table for analysis.

Merge Customer Data with Transactions:

  • Select Customers query.
  • Go to the Home tab >> select Merge Queries.
  • Top table: Customers, Bottom table: Transactions.
  • Select CustomerID in both tables (highlighted in green).
  • Join Kind: Left Outer (keeps every row from Customers and adds matching rows from Transactions).
  • Click OK.

  • Expand merged columns to include Month, Amount.
    • Click the Expand icon >> select Month, Amount.
    • Uncheck “Use original column name as prefix”.
    • Click OK.

  • The Month and Amount columns from the Transactions query are now part of the Customers table.
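
The merge and expand steps map to Table.NestedJoin and Table.ExpandTableColumn. A minimal sketch with invented customers and amounts:

```powerquery
let
    // Illustrative tables; names and amounts are made up
    Customers = #table(
        {"CustomerID", "Customer Name"},
        {{1, "Alice"}, {2, "Bob"}, {3, "Chen"}}
    ),
    Transactions = #table(
        {"CustomerID", "Month", "Amount"},
        {{1, "Jan_2023", 120}, {1, "Feb_2023", 80}, {2, "Jan_2023", 45}}
    ),
    // Left outer join on CustomerID; matches land in a nested table column
    Merged = Table.NestedJoin(
        Customers, {"CustomerID"},
        Transactions, {"CustomerID"},
        "Transactions", JoinKind.LeftOuter
    ),
    // Expand only the columns needed, without a prefix
    Expanded = Table.ExpandTableColumn(Merged, "Transactions", {"Month", "Amount"})
in
    Expanded
```

Because the join is left outer, customer 3 stays in the result with null in Month and Amount, and customer 1 appears once per matching transaction.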

Merge Result with the Pivoted Support Tickets:

  • Go to the Home tab >> select Merge Queries.
  • Top table: Customers, Bottom table: Support Tickets.
  • Select CustomerID in both tables.
  • Join Kind: Left Outer (keeps every row from the Customers table).
  • Click OK.

  • Click the expand button on the Support Tickets column.
    • Select the pivoted issue columns you want to add.
    • Uncheck “Use original column name as prefix”.
    • Click OK.

A Full-Featured Table with:

  • Demographics (from Customer Info).
  • Monthly purchase behavior (from Transactions).
  • Support load by issue type (from Support Tickets).

4. Transform and Enrich: Advanced Data Operations

Use Power Query to derive insightful features for analysis or modeling. Using the merged customer and transaction data, you can create advanced analytics columns.

Purchase Frequency Score:

  • Select the Customers query table.
  • Go to the Home tab >> select Group By.
    • Group by: CustomerID
    • New column: OrderCount, Operation: Count Rows.
    • Click OK.
  • This shows purchase frequency per customer.

Total Spend by Customer:

  • Select the Transactions query table.
  • Go to the Home tab >> select Group By.
    • Group by: CustomerID
    • New column: TotalSpend, Operation: Sum, Column: Amount.
    • Click OK.
  • This shows total spend per customer.
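
Both Group By operations can be combined into one Table.Group step. This is a sketch on made-up rows, not the exact code the dialog generates:

```powerquery
let
    // Illustrative long-format transactions; amounts are made up
    Source = #table(
        {"CustomerID", "Month", "Amount"},
        {
            {1, "Jan_2023", 700},
            {1, "Feb_2023", 450},
            {2, "Jan_2023", 45}
        }
    ),
    // One output row per customer with a row count and a sum
    Grouped = Table.Group(
        Source,
        {"CustomerID"},
        {
            {"OrderCount", each Table.RowCount(_), Int64.Type},
            {"TotalSpend", each List.Sum([Amount]), type nullable number}
        }
    )
in
    Grouped
```

In the dialog, the equivalent is to choose Advanced and add both aggregations to the same Group By.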

Advanced Conditional Logic:

Flag customers with total spend >= $1000.

  • Go to the Add Column tab >> select Conditional Column.
    • New column name: HighSpender.
    • Column Name: TotalSpend, Operator: is greater than or equal to, Value: 1000, Output: Yes.
    • Else: No.
    • Click OK.

  • The HighSpender column shows Yes where TotalSpend is greater than or equal to 1000, and No otherwise.
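
The Conditional Column dialog produces a Table.AddColumn step with an if expression. A minimal sketch with invented totals:

```powerquery
let
    // Illustrative per-customer totals; values are made up
    Source = #table({"CustomerID", "TotalSpend"}, {{1, 1250}, {2, 400}}),
    // Flag customers whose total spend reaches the 1000 threshold
    Flagged = Table.AddColumn(
        Source,
        "HighSpender",
        each if [TotalSpend] >= 1000 then "Yes" else "No",
        type text
    )
in
    Flagged
```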

Add Custom Column:

Let’s calculate the engagement score.

  • Go to the Add Column tab >> select Custom Column.
    • New column name: EngagementScore.
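    • In the Custom column formula, insert the following M code:
let
  // 1 point if the row belongs to the most recent month in the sample data
  recent = if [Month] = "Mar_2023" then 1 else 0,
  // 1 point if the monthly amount is at least 500; a null amount scores 0
  highSpend = if [Amount] <> null and [Amount] >= 500 then 1 else 0
in
  recent + highSpend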
    • Click OK.

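Create Time Intelligence:

  • Select a date column, such as JoinDate.
  • Go to the Transform tab >> select Date >> choose the part to extract, such as Year, Month, or Quarter.
  • To keep the original column, use the Date menu on the Add Column tab instead.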
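
Date parts can also be added as new columns in M. The sketch below assumes a JoinDate column of type date; the dates and the new column names are illustrative.

```powerquery
let
    // Illustrative join dates; values are made up
    Source = #table(
        {"CustomerID", "JoinDate"},
        {{1, #date(2022, 11, 3)}, {2, #date(2023, 2, 17)}}
    ),
    // Add calendar attributes derived from JoinDate
    WithYear = Table.AddColumn(Source, "JoinYear", each Date.Year([JoinDate]), Int64.Type),
    WithQuarter = Table.AddColumn(WithYear, "JoinQuarter", each Date.QuarterOfYear([JoinDate]), Int64.Type),
    WithMonthName = Table.AddColumn(WithQuarter, "JoinMonthName", each Date.MonthName([JoinDate]), type text)
in
    WithMonthName
```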

5. Scale: Performance Optimization

You can implement performance strategies for production-scale datasets.

5.1 Query Folding Optimization

Check Folding Status:

  • Right-click any transformation step.
  • Find the View Native Query option.
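  • If the option is enabled, the step is “folded”, meaning it is translated into a native query and run by the data source.
  • If it is greyed out, that step probably does not fold. File sources such as Excel workbooks and CSV files do not support folding.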

Maintain Folding:

  • Keep filtering and basic transformations early.
  • Avoid complex custom columns until necessary.
  • Use built-in functions over custom formulas.
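
As a sketch of fold-friendly step ordering, the query below reads from a relational source, filters and selects columns first, and adds custom logic last. The server, database, schema, and table names are all hypothetical, and whether a given step folds depends on the connector.

```powerquery
let
    // Hypothetical server and database names
    Source = Sql.Database("my-server", "SalesDb"),
    // Hypothetical schema and table
    Transactions = Source{[Schema = "dbo", Item = "Transactions"]}[Data],
    // Row filters and column selection are typically foldable
    Filtered = Table.SelectRows(Transactions, each [Amount] >= 500),
    Selected = Table.SelectColumns(Filtered, {"CustomerID", "Month", "Amount"}),
    // Custom logic goes last, since it may stop folding for the steps after it
    Scored = Table.AddColumn(Selected, "HighSpend", each if [Amount] >= 500 then 1 else 0, Int64.Type)
in
    Scored
```

Right-click each step and check View Native Query to confirm what the source actually receives.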

5.2 Memory Management

Filter Early Strategy:

  • Add a date filter immediately after loading.
  • Go to the Home tab >> select Keep Rows >> select Keep Top Rows (for testing).
  • Or filter by date range: last 90 days for analysis.

Remove Unnecessary Columns:

  • Right-click unused columns >> select Remove.
  • Reduces memory footprint significantly.
  • Improves refresh performance.
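
The filter-early and column-pruning ideas can be sketched as follows. The Notes column, the 90-day window, and the 1000-row cap are illustrative assumptions.

```powerquery
let
    // Illustrative tickets; the Notes column is hypothetical
    Source = #table(
        {"TicketID", "TicketDate", "Issue", "Notes"},
        {{1001, #date(2023, 3, 1), "Login Error", "free text"}}
    ),
    // Keep only the last 90 days, relative to the refresh date
    Cutoff = Date.AddDays(Date.From(DateTime.LocalNow()), -90),
    Recent = Table.SelectRows(Source, each [TicketDate] >= Cutoff),
    // Keep Top Rows: cap the row count while testing
    Sample = Table.FirstN(Recent, 1000),
    // Drop columns that the analysis does not use
    Slim = Table.RemoveColumns(Sample, {"Notes"})
in
    Slim
```

Remove the Table.FirstN step before loading the full dataset.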

6. Automate and Refresh

Power Query keeps a full record of each step. That means you can refresh the workflow without redoing transformations.

In Excel:

  • Go to the Data tab >> select Refresh All.

In Power BI:

  • Click Home >> select Refresh.
  • Or schedule a refresh in the Power BI service.

This makes your ETL pipeline repeatable and scalable across time and projects.

Conclusion

Power Query is not just a transformation tool; it’s a powerful, scalable no-code ETL engine for data scientists and analysts. Whether you’re reshaping monthly reports, merging support logs, or cleaning customer records, Power Query offers features that cover both basic and advanced analysis. By using the techniques explained in this article, you can streamline your entire data preparation workflow, making it faster, more reliable, and accessible to teams with or without coding backgrounds.

Shamima Sultana
Shamima Sultana, BSc, Computer Science and Engineering, East West University, Bangladesh, has been working with the ExcelDemy project for 3+ years. She has written and reviewed 1000+ articles for ExcelDemy. She has also led several teams with Excel VBA and Content Development works. Currently, she is working as the Project Manager and oversees the day-to-day work, leads the services team, allocates resources to the right area, etc. Her work and learning interests vary from Microsoft Office Suites, and...
