Data Cleaning in Excel defines the process of identifying errors and taking proper steps to solve those errors. It also includes the identification of unorganized data and organizing them.
In this free Excel tutorial, we will explain the definition, importance, and several simple ways of data cleaning in Excel.
Here, I have a dataset with employee info in the ‘Employee Name’ and ‘Address’ columns. The values in the ‘Address’ column are a compilation of street names, postal codes, and state names. As part of data cleaning in Excel, we have distributed cell contents to adjacent cells with the Text to Columns option.
In this blog post, you will learn how to perform data cleaning with-
- checking the spelling
- highlighting, hiding, and removing the duplicates
- changing text cases
- replacing texts
- removing spaces, non-printable characters, and formats
- fixing numbers, times, and dates
- sorting and filtering data etc.
We will also discuss the importance and challenges regarding data cleaning in Excel.
⏷What Is Data Cleaning in Excel?
⏷Importance of Data Cleaning
⏷Common Issues with Data That Need Data Cleaning
⏷Basics of Data Cleaning in Excel
⏷How to Clean Data in Excel?
⏵1. Checking the Spellings
⏵2. Highlighting the Duplicates
⏵3. Removing the Duplicates
⏵4. Replacing Text
⏵5. Changing Text Case
⏵6. Removing Spaces
⏵7. Removing Non-Printable Characters
⏵8. Fixing Numbers
⏵9. Fixing Times
⏵10. Fixing Dates
⏵11. Merging Columns
⏵12. Distributing Cell Contents to Adjacent Columns
⏵13. Switching Rows and Columns
⏵14. Reconciling Table Data
⏵15. Deleting All Formatting
⏵16. Highlighting Errors
⏵17. Sorting Data
⏵18. Removing Rows with Blank Cells
⏵19. Adjusting Rows and Columns
⏵20. Filling Blank Cells
⏵21. Formatting Data as Table
⏵22. Cleaning Data with Power Query
⏵23. Cleaning and Filtering Data with Pivot Table
⏵24. Changing Number Format into Percentage
⏵25. Finding Maximum and Minimum Values
⏵26. Filtering Data with “If Text Contains…” Feature
⏵27. Checking Data Types
⏵28. Adding Text at the Beginning
⏵29. Adding Text at the End
⏵30. Inserting Comma Before Nth Character from Right
⏵31. Removing Text Starting from a Particular Character
⏵32. Hiding Duplicate Rows
⏵33. Using Add-ins
⏷Challenges of Data Cleaning
What Is Data Cleaning in Excel?
Data Cleaning in Excel is a combination of processes that includes identifying, highlighting, and taking action against all kinds of errors to have organized data. There are many built-in Excel functions and features that are directly or indirectly used to ease the data-cleaning process in Excel.
Why Is It Important to Clean Data in Excel?
As we use data for many important decisions, it is a must to keep the data clean as far as possible. Some of the reasons are listed below:
Accuracy: Ensures better accuracy while data is analyzed.
Reliability: Instinctively enhances reliability.
Efficiency: Reduces time spent on it as all the data are already sorted and ready to use.
Better Decision-making: Less error-prone and helps to predict better decisions.
What Are the Common Issues with Data That Need Data Cleaning?
Data cleaning in Excel is a very vast section. Some of the most common issues with data that need cleaning are listed below:
- Spelling mistakes
- Duplicate data
- Uncategorized text cases
- Unnecessary spaces, non-printable characters, and formats
- Unorganized formatting of numbers, times, and dates
- Missing text in a specific position, etc.
What Are the Basics of Data Cleaning in Excel?
The basics of data cleaning in Excel are described briefly in the following section:
- First of all, import the raw data from an external data source.
- Keep a copy of the imported data in a separate workbook.
- Organize the data in tabular form with the right data in the right place.
- Now, make the data cleaning according to your needs.
How to Clean Data in Excel?
There are many functions in Excel that are being widely used for data cleaning. We can change text cases, merge columns, reconcile table data, check data type, insert or remove text from a specific position, etc. with Excel functions as part of the data cleaning process. Again, there are many built-in features in Excel like Spelling for correcting misspelled words; Conditional Formatting for highlighting duplicates or cells matching a certain criterion; Number Format for fixing numbers, dates, and times; Sort & Filter for sorting data; Find & Select for finding or replacing texts; Text to Columns for distributing cell contents and so on.
1. Checking the Spellings
Correcting spelling mistakes can be considered one of the basic data-cleaning techniques in Excel.
- Go to the Spelling option from the Review tab.
- A wizard named Spelling: English (United States) will appear with necessary suggestions of the misspelt words.
- Pick the correct spelling from the available spelling suggestions and click on Change All to make corrections in the entire worksheet.
2. Highlighting the Duplicates
We often use highlighting procedures to make the duplicates easily visible.
- Go to Conditional Formatting from the Home tab.
- Select the New Rule… option.
- Pick the Use a formula to determine which cells to format option from the New Formatting Rule wizard.
- Insert the following formula in the Format values where this formula is true section.
=COUNTIF($C$8:$C$19,$C8)>1
- Next, click on the Format option to define the matched values format.
- Pick a background color from the Fill tab.
- You can customize the font style of the matched values. I have set the font color white and style italic for the matched cells.
- Click OK to finish the formatting.
- Click OK again to apply the conditional formatting.
- Now, we have the duplicate values highlighted according to the defined formatting.
3. Removing the Duplicates
After finding & highlighting duplicates following the previous method, now we can remove the duplicates as a data-cleaning process.
- Select the entire data and go to the Data tab.
- Click on Remove Duplicates.
- Pick a column and click on OK to find the duplicates and delete the entire row.
- Thus, we have a dataset with no duplicates along the defined column.
4. Replacing Text
Replacing a specific text with a new one is also an important part of the data-cleaning procedure.
- Go to the Find & Select command from the Home tab.
- Pick Replace… from the available options.
- Now, input the text to be replaced (i.e. In Progress) in the Find what section and the text that will be inserted (i.e. Completed) in the Replace with section.
- Click on Replace All to make the execution.
- Thus, we can simply replace texts.
5. Changing Text Case
We can change the names in proper cases as a part of the data cleaning procedure.
- Apply the following formula with the PROPER function in cell H8 to have the name of cell C8 in the proper case.
=PROPER(C8)
- Use Fill Handle to autofill the formula.
6. Removing Spaces
There might have unnecessary spaces between texts in a dataset. It is better to remove them to have a clean dataset.
- Apply the following formula with the TRIM, CLEAN, and SUBSTITUTE functions to remove spaces between texts as well as leading spaces at the beginning.
=TRIM(CLEAN((SUBSTITUTE(C6,CHAR(160)," "))))
7. Removing Non-Printable Characters
Sometimes, the non-printable characters make the dataset difficult to understand. We can remove those for a better understanding of the dataset. Look at Column C, under the Name header, you’ll find some unwanted characters before or between texts that are not printable. So you’ll need to remove those non-printable characters by following the steps defined below.
- Just apply the following formula with the CLEAN function to have the non-printable characters removed.
=CLEAN(C8)
8. Fixing Numbers
Not having the numbers in number format causes errors in calculation. We need to fix it to have the proper results. In the Stock Quantity column, all the numbers are in text format that’s why you’re seeing the green triangles icon on the left-top portion of each cell in that column.
- Use the following formula with the VALUE function to have the numbers in the number format.
=VALUE(F6)
9. Fixing Times
Times in general format show numerical value. We need to change the number format to have them in the time format. In Column F, the times are shown in decimal format, like- 10.25 should have appeared 10:15:00 or 10:15 A.M., right?
- Apply the following formula to have the time in the general mode of the 24-hour time format.
=F6/24
- Now, select all the time in general mode and select Time from the Number Format option.
- Now, we have the time in a proper format.
10. Fixing Dates
Excel stores dates as sequential serial numbers. So if they are not assigned a Date format, they might appear with those vague 5 or 6-digit numbers under the General number format. We need to convert them into date format.
- Select the cells containing the dates represented as numbers.
- Choose Short Date or Long Date from Number Format to have the numerical values in proper date format.
- Thus we can fix the date issue.
11. Merging Columns
We can merge multiple columns into one and make the dataset more presentable. Under the Address header in column F, we’ll show the proper address format by merging street name, postal code & state name from the left 3 columns.
- Apply the following formula with the CONCATENATE function to merge the columns and separate the segments with dashes.
=CONCATENATE(C6,"-",D6,"-",E6)
12. Distributing Cell Contents to Adjacent Columns
We can distribute a cell value into multiple cells to simplify a dataset.
- We have a dataset where the addresses are the combination of the street name, postal code number, and state name. Those are separated with a dash (–) between them.
- To separate the street name, postal code number, and state name into adjacent cells separately, copy the entire column values to the Street Name column.
- Now, select all the values in the Stree Name column and click on the Text to Columns option from the Data tab.
- From Convert Text to Columns Wizard, choose the Delimited – Characters such as commas or tabs separate earache field option as our data is combined with the dash sign.
- Next, click on Next> to march forward.
- Now, define the delimiter based on what the cell value will be separated.
- Click on the Next button.
- Now, click on Finish to end the process.
- Finally, we have the distributed cell values in the adjacent cells.
13. Switching Rows and Columns
Sometimes, we need to change the row-column orientation for a better presentation of our dataset. In the following dataset, we’ll transpose all columns into rows.
- Copy the entire range first where we want to switch rows and columns.
- Then, select a cell to paste the switched rows and columns.
- Go to Paste from the Home tab.
- Pick the Transpose (T) option to make the switch.
- Thus, we can have the switched rows and columns.
14. Reconciling Table Data
We can define the similarities or differences between two data tables by reconciling them. In the following dataset, we have two tables that represent sales amounts in June and July. We will reconcile them to know the status of whether the sales amount has changed or not.
- Apply the following formula to reconcile the data of two tables and have the status “Changed” or “Unchanged”.
=IF(F6=C6,"Unchanged","Changed")
15. Deleting All the Formatting
To have the data in raw format, we need to delete the formatting. We will remove all the formatting from the following dataset by using the Clear Format command.
- Select the entire data and go to Home.
- Next, click on Clear Formats from the Clear option to remove the formatting.
- Now, we have just raw data without any formatting.
16. Highlighting Errors
To make the errors more visible, we can highlight them. In the Per Hour Salary column, some cells provide errors instead of any proper values. We will use conditional formatting to highlight them.
- Select the entire area where you want to highlight errors.
- Then, click on Conditional Formatting from the Home tab and select New Rules…
- Now, choose the Use a formula to determine which cells to format option from the New Formatting Rule wizard.
- Write the following formula in the Format values where this formula is true section.
=ISERROR(F6)
- Next, click on the Format option to define the matched values format.
- Pick a background color from the Fill tab and click on OK.
- Click OK again to apply the conditional formatting.
- The cells containing errors will be highlighted.
17. Sorting Data
Sorted data are easily understandable. In the following dataset, we will sort the data in descending order based on the total marks
- Select the numbers based on what you want to do in the sorting of the rows.
- Go to the Home tab and click on Sort Largest to Smallest from Sort & Filter to have the sorting in descending order.
- Choose the Expand the selection option and click on Sort to make the sorting.
- Thus, we have the sorting in the descending order.
18. Removing Rows with Blank Cells
Rows with blank cells might have no importance in the dataset. So, it’s better to delete the entire rows with blank cells for data cleaning in Excel. In the Working Hour column, we can see that some of the cells are empty. We will remove the entire rows with the zero working hour criterion.
- Select the entire dataset first and go to Home.
- Pick the Go To Special… option from the Find & Select feature.
- Select the Blank option and click on OK from the Go To Special wizard.
- We will have all the blank cells selected within the selected range. Now, go to the Home tab again.
- Click on the Delete Sheet Rows option from the Delete feature.
- Finally, we have a clean dataset without rows having blank cells.
19. Adjusting Rows and Columns
We can do the adjustment of rows and columns to make the dataset well decorated. In our dataset, the row heights & column widths are not identical in all cases which looks too messy. We can use the AutoFit option to adjust the row height & column width easily.
- Select the entire dataset and go to Home.
- Go to the Format feature and pick the AutoFit Row Height option for row adjustment.
- Select the AutoFit Column Width option from the Format feature for column adjustment.
- Thus, we have a more organized dataset with adjusted rows and columns.
20. Filling Blank Cells
Blank cells make a dataset unfulfilled. We can insert zeros in those cells to have a better representation. There are some blank cells in the following dataset which represent the students’ absence from the exam. We will use the Go To Special.. command to fill those cells with zeros.
- Select the entire range first.
- Go to the Home tab and select Find & Select from the ribbon.
- Pick Go To Select… from the available options.
- Now, select Blanks and click on OK.
- The blank cells within the selected range will be selected.
- Now, insert zero and press Ctrl + Enter.
- Thus, we will have the blank cells filled with zeros.
21. Formatting Data as Table
We often format data as a table to make it easily understandable and ensure better visualization. We have a dataset with no proper formatting as a table. We will use a built-in template to format those data as a table.
- Select the entire range of data.
- Go to Home and select Format as Table from the ribbon.
- A dialog box named Create Table will appear to make the confirmation of your table range. You can change the table range that was previously defined by selection.
- Check the My table has headers box if you have headers.
- Click on OK to finish the process.
- Thus, we will have data as a table.
22. Cleaning Data with Power Query
Power Query is an efficient feature in Excel for data cleaning. We have a dataset where names are written in capital letters. There are also some empty cells in the Per Hour Salary column. We will use Power Query to change the names into the proper case and delete those rows with empty cells.
- First of all, define a range to apply the Power Query feature.
- Go to Home and select From Table/Range from the ribbon.
- Confirm the table range and check the My table has headers box if you have headers.
- Click on OK.
- The table will appear in the Power Query Editor.
- To change the case of the names into the proper format, select the Full Name column and go to the Transform tab.
- Now, pick the Capitalize Each Word option from Format.
- We will have the names in the proper case.
- To remove the entire rows based on the blank cells in the Per Hour Salary column, click on the filter button and pick the Remove Empty option.
- Click OK.
- Thus, we will have the filtered table with no rows with blank cells.
- Now, go to Home and click Close & Load from Close & Load.
- We will have the filtered dataset in a new worksheet.
23. Cleaning and Filtering Data with Pivot Table
To summarize a big dataset, Pivot Table is a very handy tool. Based on the following dataset, we will sort data that contains a specific text. We will also use date as a filter to organize data.
- In order to create a pivot table, select the range of cells and go to the Insert tab.
- Click on Pivot Table from the ribbon and choose From Table/Range.
- Define the pivot table location from the PivotTable from table or range It could be in the current worksheet or an entirely new worksheet.
- Define rows and values. I have defined the products as rows and prices as values.
- To summarize the dataset with rows containing certain text, click on the filter button on the column header.
- Go to Label Filters and choose the Contains… option.
- Insert the specific word (i.e. Asus) based on what you want to filter the table and click on OK.
- The product with that specific text will be sorted.
- To add Order Date as a parameter to filter the dataset, we can drag Order Date to the Filter section in PivotTable Fields.
- Now, pick a date as a filtering parameter.
- We will have our desired pivot table.
24. Changing Number Format into Percentage
Sometimes we might face difficulties in making the conversion from number to percentage format. We will convert the total number into percentages from the following dataset.
- Divide the obtained value with the total value. Here, I have applied the following formula to have the total obtained marks in decimal format.
=SUM(D8:F8)/$G$5
- Now, select all the decimal numbers and change the number format to Percentage from the Number Format option under the Home tab.
- We have the numbers into percentages in a simple way.
25. Finding Maximum and Minimum Values
We often need to find the maximum and minimum values to find the limit of the dataset. According to the total sales amount listed in the following dataset, we will find the maximum and minimum amount.
- Apply the following formula with the MAX function to find out the maximum value.
=MAX(G7:G21)
- You can use the following formula with the MIN function to find the minimum value from the defined range.
=MIN(G7:G21)
26. Filtering Data with “If Text Contains…” Feature
We can easily filter a table based on the presence of certain texts. The following dataset is categorized into electrical and mechanical. We will filter data that belong to the mechanical category.
- Select a certain range of cells and click on Table from the Insert tab.
- Confirm the table range by clicking on OK.
- Click on the filter button from the column header.
- Select the Contains… option from Text Filters.
- Now, insert the specific text (i.e. Mechanical) on the contains category based on what you want to filter the dataset.
- Click on OK to finish the process.
- Finally, we have the filtered table that contains a specific text.
27. Checking Data Types
We can know the type of data with the TYPE function in Excel. It will return an integer number based on the type of data according to the following table.
Category | Return value |
Number | 1 |
Text | 2 |
Logical Value | 16 |
Error | 4 |
Array | 64 |
- Apply the following formula with the TYPE function to check the data type.
=TYPE(F6)
28. Adding Text at the Beginning
We can use the CONCATENATE function in order to add text at the beginning. Here, column B under the Event header, needs to be represented more clearly with additional “COP- “ text at the beginning.
- Use the following formula with the CONCATENATE function to add text at the beginning.
=CONCATENATE(“COP- “,B6)
- Now, copy those updated values and paste as values in the desired location.
29. Adding Text at the End
We might need to add extra text at the end of a cell value to complete the statement or a text structure. In column E, under the Address header, we’ll add the country name USA for all state names at the end.
- Apply the following formula with the CONCAT function to add text at the end of a cell value.
=CONCAT(E6,"-USA")
30. Inserting Comma Before Nth Character from Right
While inserting data, we might miss some characters within a value. Then, we need to add that missing texts or characters at the specific position of that data. In column E under the Address header, we will try to insert a comma and a space.
- To insert a comma before the Nth character from the right, apply the following formula with the LEFT, LEN, and RIGHT
=LEFT(E6, LEN(E6) - 3)& ", " &RIGHT(E6, 3)
31. Removing Text Starting from a Particular Character
We might need to remove a specific portion of text starting from a specific position to make the data more presentable. In column E under the Address header, we will remove the entire part after the comma portion including that comma.
- Apply the following formula with the LEFT and SEARCH functions to remove text starting from a particular character.
=LEFT(E6,SEARCH(",",E6)-1)
32. Hiding Duplicate Rows
Instead of deleting the duplicate rows, we can hide them and have them back if needed. In the following marksheet, there is duplicate student information. We will hide the rows with duplicate names.
- Select the range of data and go to the Data tab.
- Click on the Advanced options from the ribbon.
- From the Advanced Filter wizard, define the data range again in the Criteria range section and check the Unique records only box.
- Click on OK to finish the process.
- Now, we have the duplicate data hidden.
33. Using Add-ins
There are many websites that provide free add-ins that lessen our workloads to execute data cleaning in Excel. Some of the websites are listed below:
What Are the Challenges of Data Cleaning in Excel?
Data cleaning has been introduced a long time ago and going pretty well for decades. However, there are some challenges that we might face while cleaning the data in Excel.
- There is a limitation in data size in Excel that can lead to difficulties while cleaning data.
- As data cleaning in Excel is a manual process, it is tedious, time-consuming, and human error-prone.
- Missing data can lead to a wrong prediction while cleaning data.
- Dealing with outliers is a very difficult and error-prone task.
- Excel can not directly process unstructured data like images, audio, text documents, etc.
Download Practice Workbook
You can download the practice workbook from here and try ypurself.
In this article, we have learned different data cleaning procedures in Excel like checking the spellings, highlighting, hiding, and removing the duplicates, changing text cases, replacing texts, removing spaces, non-printable characters, and formats, fixing numbers, times, and dates, sorting and filtering data, etc. We have also discussed the basic procedures, importance, and challenges regarding data cleaning in Excel.
Data Cleaning in Excel: Knowledge Hub
- How to Remove Numbers from a Cell in Excel
- How to Remove Outliers in Excel
- How to Remove Dotted Lines in Excel
- How to Remove Compatibility Mode in Excel
- How to Remove Value in Excel
- How to Remove 0 from Excel
- How to Remove Panes in Excel
- How to Remove Drop Down Arrow in Excel
- How to Remove HTML Tags from Text in Excel
- How to Remove Partial Data from Multiple Cells in Excel
- Using Excel to Clean and Prepare Data for Analysis
- How to Clean Up Raw Data in Excel
- How to Clean Survey Data in Excel
- 19 Practical Data Cleaning Techniques in Excel
- How to Do Automated Data Cleaning in Excel
- How to Use Macro to Clean Up Data in Excel
<< Go Back To Learn Excel
Get FREE Advanced Excel Exercises with Solutions!
Dear Naimul Hasan Arif and Nehad Ulfat,
I hope this message finds you well. My name is AKhtar Khilji, and I’m reaching out to express my sincere appreciation for the outstanding training blog you have crafted to assist users in addressing data cleaning challenges.
Currently, I am conducting a training program on Excel through a nonprofit organization, focusing on empowering the youth of marginalized communities. I was deeply impressed by the quality of your training material and would like to request your permission to utilize the exercises from your blog. Of course, I will ensure proper attribution by mentioning “Excel Dummy” along with the web address. Furthermore, I intend to share the reading material with the trainees to enhance their learning experience.
Thank you so much for considering my request. Your generosity will undoubtedly contribute to the success of this educational initiative.
Best regards,
AKhtar Khilji from Pakistan
Dear AKhtar Khilji,
I hope this message finds you well too. Thank you very much for your kind words and your interest in using the training material from my blog for your nonprofit Excel training program. I’m thrilled to hear that you found the content helpful and that it will be used to empower the youth of marginalized communities in Pakistan.
I’m more than happy to grant you permission to utilize the exercises from the blog for your educational initiative. Please feel free to mention “ExcelDemy” along with the web address to provide proper attribution. It’s wonderful to know that the material will be shared with the trainees to enhance their learning experience.
If you have any specific requests or need further assistance with anything related to the training material, please don’t hesitate to reach out. I’m here to support your efforts in any way I can.
Wishing you great success with your Excel training program and the empowerment of the youth in marginalized communities.
Best regards,
ExcelDemy