While dealing with datasets in Excel, it is often required to find outliers from the dataset. In this article, we are going to learn 5 effective methods on how to find the outliers in Microsoft Excel.
Download Practice Workbook
What Are the Outliers?
While dealing with datasets, by the word “Outliers” we mean data that are way higher or lower than the other data in the dataset. Outliers can quickly derail your analysis if they are not taken into account. In a word, finding outliers is therefore critical for any analysis.
As an illustration, think of a school bus full of 10 kids aged 8 years. But the bus driver is a very old man, who is 65 years old. Now, if we want to calculate the average age of the people on that bus, we get around 13 years, which is misleading information.
5 Ways to Find Outliers in Excel
In this article, we are going to learn 5 easy ways of finding Outliers in Excel. These methods are quick, precise, easy to use, and suitable for handling both small and large size of data sets.
1. Using Data Sorting Option to Find Outliers
Simply sorting the data and manually going through some of the top and bottom results is an easy approach to finding outliers in Excel.
Let me explain it to you with an example.
In the table below, there is data of some Swimmers about their swimming Time in seconds. Now let’s sort them and try to find the outliers.
Step 1: Select Header
First, select the header of the column that you want to sort. In this case, it is the Time (s) column (C4 cell).
Step 2: Select Home Tab
Next, click on the Home tab. It’s on the top-left side of your screen.
Step 3: Go to Sort & Filter Option
Then, click on the Sort & Filter. You can find it in the editing group at the top-right side of your screen.
Step 4: Choose Custom Sort Option
After that, click on Custom Sort from the drop-down.
Step 5: Edit Sort Dialogue Box
In the Sort dialogue box, from the Sort by drop-down select Time (s).
Now, from the Order drop-down, select Largest to Smallest.
Finally, click OK.
This should sort your data from the highest value to the lowest value. Now you can scan for the outliers at the top and the bottom of the column.
Here, you can see that the first figure is substantially larger than the remaining ones and the last two figures are substantially lower. So, we can identify them as Outliers.
Note: This method works well if you are dealing with a small data set. It’s not a scientific method but it can get the job done.
Read More: How to Find Outliers in Regression Analysis in Excel (3 Easy Ways)
2. Applying QUARTILE Function in Excel
This method is better than the previous one. In statistics, quartile means one-fourth of a data set. For example, if you have 40 data points in a data set, then the bottom 10 values will be your 1st quartile (Q1), the next 10 data points will be the second quartile (Q2), and so on. To find outliers we need to calculate the 1st quartile, 3rd quartile, and the Inter-Quartile Range. Let’s proceed step by step.
Step 1: Calculate 1st and 3rd Quartile
Firstly, we will use the QUARTILE.INC function to calculate the quartiles. It needs two pieces of information. The range of data you want to select and which quartile of data you want.
To calculate Q1 we can use the following formula in cell G4.
=QUARTILE.INC(C5:C14,1)
Then press ENTER and here is our Q1.
Similarly, we can find the 3rd quartile value by just changing the number from 1 to 3 in cell G5.
To calculate Q3 we can use the following formula in cell G5.
=QUARTILE.INC(C5:C14,3)
Step 2: Calculate Inter-Quartile Range
Now we will find the Inter-Quartile Range (IQR) by simply subtracting the 1st quartile from the 3rd quartile. So, in cell G6 we can use the following formula.
=G5-G4
Press ENTER and we will get our IQR.
Step 3: Calculate Upper Limit and Lower Limit
For calculating the upper limit, we need to add the 3rd quartile value with 1.5 times the IQR. We can use the following formula in cell G7.
=G5+(1.5*G6)
Hit ENTER, and you will get the Upper Limit.
To calculate the Lower Limit we need to subtract 1.5 times IQR from the 1st Quartile. The formula for Lower Limit is given below.
=G4-(1.5*G6)
Now press ENTER, and you will get the Lower Limit.
Step 4: Identify the Outliers
Now we have all our necessary data to find the outliers. We will use the OR function here. In the OR function, we need to give some logical conditions. If these conditions are satisfied, it will return TRUE and if the conditions are not satisfied, it returns FALSE.
Now, any value that is greater than our Upper Limit or less than our Lower Limit, is an outlier. In cell D5 we can use the following formula.
=OR(C5>$G$7,C5<$G$8)
Now hit ENTER, and we will get the result for the first data point.
Note: We need to use absolute cell reference while selecting the cells of Upper Limit and Lower Limit. Because if we don’t then the cell reference will change when we drag the formula to find all the outliers. Just hit F4 after selecting the cell. This will make it an absolute cell reference.
Now just drag the Fill Handle up to the end of the data set or just double click it.
The cells in which TRUE is displayed are our Outliers.
Here, the times of Jonathan and Peter are lower than our Lower Limit and the time of Andy is higher than our Upper Limit. So, these three values are the Outliers.
Read More: How to Calculate Outliers in Excel (5 Easy Ways)
3. Utilizing LARGE and SMALL Functions in Excel to Find Outliers
This comes in handy when we deal with a large data set. By utilizing the LARGE and SMALL functions in Excel, we can extract the highest 3 data or the lowest 3 data. We can extract more data if we want. After that, we can look into those data and see if there are any outliers.
The LARGE function needs two pieces of information. The range of data from where to extract and how much data you want to be extracted at a time.
Step 1: Extract Top 3 Values
We can use the following formula in cell E5. This will bring the top 3 values.
=LARGE($C$5:$C$14,ROW($1:3))
Press ENTER, and we will get our top 3 values.
Note: If we wanted to bring the top 5 values, then ROW($1:3) would be changed to ROW($1:5).
Step 2: Extract Smallest 3 Values
We can use the following formula in cell F5. This will bring the bottom 3 values.
=SMALL($C$5:$C$14,ROW($1:3))
Press ENTER and we will get our bottom 3 values.
Step 3: Identify the Outliers
Now, we can identify the Outliers by looking at our top and bottom 3 values.
4. Applying Mean and Standard Deviation to Find Outliers
Another effective way to find the Outliers in Excel is by applying the concept of Mean and Standard Deviation.
Mean is simply the average of a data set. On the other hand, Standard Deviation is a measure of how much the points in a distribution deviate from the distribution’s mean value. Generally, a rule of thumb is that if a value is 2 to 3 times the standard deviation of the distribution, then it is considered an Outlier.
Let’s proceed one step at a time.
Step 1: Calculate the Mean
To calculate the Mean we need to use the AVERAGE function. This function needs only one piece of information, that is the range of the data. We can use the following formula at cell G4 to determine the mean of our data set.
=AVERAGE(C5:C14)
Now press ENTER, and we will have the mean of our data.
Step 2: Calculate Standard Deviation
In Excel to calculate standard deviation, we can use the STDEV.S function. It only requires the range of data set of which we want to calculate the standard deviation. We can use the following formula at cell G5 to determine the standard deviation of the distribution.
=STDEV.S(C5:C14)
Now press ENTER.
Step 3: Evaluate the Upper Limit
The formula to evaluate the Upper Limit is Mean+(2*Standard Deviation). We can use the following formula at cell G6 to determine the Upper Limit.
=G4+(2*G5)
Press ENTER and here is our Upper Limit.
Step 4: Evaluate the Lower Limit
The formula to evaluate the Lower Limit is Mean-(2*Standard Deviation). We can use the following formula at cell G7 to determine the Lower Limit.
=G4-(2*G5)
Hit ENTER and here is our Lower Limit.
Step 5: Identify the Outliers
Now, we will again use the OR function. So, the logic that we will use here is any value that is higher than our Upper Limit or lower than our Lower Limit is an Outlier. We can use the following formula in cell D5.
=OR(C5>$G$6,C5<$G$7)
Note: We need to use absolute cell reference while selecting the cells of Upper Limit and Lower Limit. Because if we don’t then the cell reference will change when we drag the formula to find all the outliers. Just hit F4 after selecting the cell. This will make it an absolute cell reference.
Finally, to obtain the rest of the values drag the Fill Handle or double-click it.
As a result, the cells that are displaying TRUE are our Outliers.
Read More: How to Find Outliers with Standard Deviation in Excel (with Quick Steps)
5. Using Z-Score to Find Outliers
Another effective way to find the outliers in Excel is by using the Z-Score. The z value is also known as the Standardized Value. We will again need the Mean and the Standard Deviation of the data set. The formula to calculate the Z-score of data is given below.
Z=( X- Mean ) / Starndard Deviation
We can also use the STANDARDIZE function in Excel to calculate the Z-score of data directly.
Step 1: Calculate Mean
Initially, we can use the following formula in cell H4 for calculating the Mean of our data set.
=AVERAGE(C5:C14)
After that, press ENTER we will have our Mean.
Step 2: Calculate Standard Deviation
Next, we can use the following formula in cell H5 for calculating the Standard Deviation of our data set.
=STDEV.S(C5:C14)
Then hit ENTER.
Step 3: Evaluate Z-Score
The STANDARDIZE function needs 3 pieces of information to execute. 1st information is the data of which we are determining the Z-score, 2nd information is the Mean and the 3rd information is the Standard Deviation.
So, we will use the following formula in cell D5 to calculate the Z-score.
=STANDARDIZE(C5,$H$4,$H$5)
Note: We need to use absolute cell reference while selecting the cells of Mean and Standard Deviation. Because if we don’t then the cell reference will change when we drag the formula to find all the outliers. Just hit F4 after selecting the cell. This will make it an absolute cell reference.
Dragging the Fill Handle to the end of the data set will you the rest of the Z-scores.
Step 4: Identify Outliers
Again, we will use the OR function. Theoretically, for a value to be an Outlier, its Z-Score should be greater than +3 or less than -3. But in practice, the z-values that are close to 3 are pretty far from the mean and we can consider them as Outliers.
So, the logic here is any z-score that is greater than +2.5 or less than -2.5 will be an Outlier.
We can use the following formula in cell E5.
=OR(D5>2.5,D5<-2.5)
Eventually, after dragging the Fill Handle, you’ll get the following output.
In the above screenshot, Our Outlier is the cell in column E with the value TRUE.
Read More: How to Find Outliers Using Z Score in Excel (with Quick Steps)
Conclusion
Thank you for taking the time to read this article. I’m hoping that the processes described above will be able to handle any difficulty with finding outliers in an Excel data set. Please feel free to leave a comment if you have any questions or any recommendations for improving the article.