
Image by Editor
Data cleaning is the first step for data analysis and any data-related tasks. Clean data ensures accuracy, consistency, and usability, which helps to perform calculations and analyses. Excel offers powerful functions, features, and tools for cleaning and transforming messy data. In this tutorial, we’ll explore five useful data cleaning functions in Excel.
To demonstrate these functions, let’s use a messy dataset that contains several issues. Such as inconsistent name formatting (spacing, capitalization), different formats, invisible characters, etc.
Let’s clean it up using the five functions.
1. TRIM Function Removes Unwanted Spaces
The TRIM function removes all spaces from text except for single spaces between words.
Syntax:
=TRIM(text)
To apply the TRIM function;
- Select cell G2 and insert the following formula.
Formula:
=TRIM(A3)
This formula returns john SMITH. It removes the unwanted extra spaces.
2. PROPER, UPPER, and LOWER Functions Fixes Text Case
These functions help to standardize the case of text data. Commonly we use PROPER case.
PROPER() – Capitalizes the first letter of each word.
- Select a cell and insert the following formula.
Formula:
=PROPER(TRIM(A3))
This formula returns John Smith.
UPPER() – Converts text to all uppercase.
- Select a cell and insert the following formula.
Formula:
=UPPER(TRIM(A3))
This formula returns JOHN SMITH. All text is in uppercase.
LOWER() – Converts text to all lowercase.
- Select a cell and insert the following formula.
Formula:
=LOWER(TRIM(A3))
This formula returns john smith. All text is in lowercase.
3. TEXT Function Format Numbers as Text
TEXT converts a number to text in a specified format.
Syntax:
=TEXT(value, format_text)
To apply the TEXT function;
- Select a cell and insert the following formula.
Formula:
=TEXT(E2, "mm/dd/yyyy")
This formula returns “01/05/2023”, standardizing the Purchase Date format.
- Select a cell and insert the following formula.
Formula:
=TEXT(F2, "$#,##0.00")
This formula returns $1,250.50, formatting the Amount column.
4. CLEAN Function Removes Non-Printable Characters
The CLEAN function removes all non-printable characters from text.
Syntax:
=CLEAN(text)
To apply the CLEAN function;
- Select a cell and insert the following formula.
Formula:
=CLEAN(G2)
If G2 contains “Customer asked for follow-up” with an invisible character. This formula returns the Customer asked for follow-up without the invisible character.
When to use it:
- When importing data from web sources.
- When copying data from PDFs or other documents.
- When dealing with data from different systems that may insert control characters.
5. SUBSTITUTE Function Replaces Specific Text
SUBSTITUTE replaces specific text within a string.
Syntax:
=SUBSTITUTE(text, old_text, new_text, [instance_num])
To apply the SUBSTITUTE function;
- Select a cell and insert the following formula.
Formula:
=SUBSTITUTE(D4, ".", "-")
This formula removes dots with dashes to format the phone number.
Bonus Tip: Combining Functions
You can combine these functions to clean powerful data.
To standardize phone numbers, you can use:
=TEXT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B7, "-", ""), "(", ""), ")", ""), ".", ""), " ", ""), "000-000-0000")
Explanation:
- SUBSTITUTE(B2, “-“, “”): Removes hyphens.
- SUBSTITUTE(…, “(“, “”) and SUBSTITUTE(…, “)”, “”): Removes parentheses (( and )).
- SUBSTITUTE(…, “.”, “”): Removes dots (.).
- SUBSTITUTE(…, ” “, “”): Removes spaces (” “).
- 000-000-0000: Formats the number using the TEXT function.
Cleaned Dataset
After applying these functions, our dataset looks like this:
Conclusion
By using these five essential functions you can efficiently clean and standardize your data in Excel. These functions will transform the messy, inconsistent data into clean data to perform further calculations and analysis. Mastering these functions will enhance your data management skills and improve data accuracy for analysis.
Get FREE Advanced Excel Exercises with Solutions!
thx
Hello Lazo,
You are most welcome. Keep exploring Excel with ExcelDemy!
Regards
ExcelDemy