In this article, we demonstrate the way of making correlations between variables in a dataset and make a heatmap out of it. This process generally involves heaps of data collected from different places and analyzing them to find any relation that can come in handy in later use. Analysis of these huge data also created also creates a huge dataset containing results. Imbibing these results is quite difficult in a short period of time. A heat map of these results gives a solid overview of these. How to make the correlation between two datasets and how to create a heatmap in Excel are discussed here with adequate examples.
Download Practice Workbook
Download this practice workbook below.
Step-by-Step Procedure to Make Correlation Heatmap in Excel
We are going to use the below dataset to calculate the correlation between them and create a heat map of the correlation coefficients. This dataset primarily contains the marks of school students in Exams in various subjects. This type of Dataset will help us to correlate the marks. That means it will evaluate whether students perform uniformly or not in all subjects. Then we will make a dynamic or static heatmap of this correlated dataset for better understanding.
Step 1: Enable Data Analysis Tool
By default, there is no direct Data Analysis option in Excel, We need to enable that from the options.
- At first, we need to enable the Data Analysis command in the Data tab, because by default they are not added to Excel.
- Click on File in the worksheet to go to the Options.
- Click on the Options.
- A new window will open, from the window go to Add-ins, then select Excel Add-in from Manage drop-down menu.
- Click on Go.
- A small new window will open.
- Tick on the Analysis ToolPak box and click OK.
After this Data Analysis command will be available in the Data tab.
Step 2: Use Data Analysis Tool to Create a Correlation Matrix
As we enable the Data Analysis command, we can use it to calculate correlation directly.
- Now if you head to the Data tab, then you will notice that the Data Analysis command is available now.
- Click on the Data Analysis command
- Then one small window will open, and from that window select Correlation..
- Click OK after this.
- A small new window will open.
- From that window, select the input range of where your dataset is.
- Then click on the Columns in Grouped By option if you want to group the data by columns.
- Tick on Labels in the first row if your data has table headers or labels.
- Choose where you want to place your output in Output Options. Here, we want to place these output results in the new worksheet, choose that option, and input that sheet name.
- Click OK after this.
- Next, you will see a new worksheet opened and there is our dataset with their correlation value with each of the cell values present.
Now we make the correlation between the variables in Excel and now we apply conditional formatting to it to make a heatmap.
Step 3: Conditional Formatting to Make Correlation Heatmap
We are going to apply conditional formatting to the correlated values to visualize them according to their cell values.
- To apply conditional formatting, click on Conditional Formatting from the Home tab in the Styles group.
- A drop-down menu will appear, from that menu. Click on New Rule.
- After clicking the new rule, a new window will open, in that window select Format all cells based on their values in Select a Rule Type.
- Now change Format Style from 2 to 3.
- Then in the minimum drop-down window. Select Number.
- Then enter in Values -1.
- Choose the color Red.
- Then in the minimum drop-down window. Select Number.
- Then enter in Values.
- Choose the color, White.
- Then in the minimum drop-down window. Select Number.
- Then enter in Values.
- Choose the color Blue.
- Click OK after this.
- After clicking OK you will notice that the correlated dataset is now arranged as a color-coded heatmap.
Step 4: Interpretation of the Output
As stated before, the correlation coefficient indicates how the variables perform with each other. And here we created a heatmap to get an overview of how those coefficients are distributed and their intensity.
For example, the more positive the value towards +1 indicates means better the relation between variables. This means if one of those values increases, another value also increases. And vice versa. We color-coded this as blue, so the bluer the cell. The better symmetrical relation between them.
On the other hand, the more negative value toward -1, the worse relation between them, meaning if one goes up, another variable is likely to go down. We coded this as red. The worse the correlation value, the redder the cell color. Cell E4 and E9 exhibit this type of negative relation
And that’s how you can make a correlation between variables in a dataset and make a heatmap out of it in Excel.
Read More: How to Create a Zip Code Heat Map in Excel (with Useful Steps)
How to Create a Dynamic Correlation Heat Map in Excel
In this method, we will use a dataset in which correlation was done between the variables, and then we make a dynamic heatmap in Excel with a check box. That can show or hide the conditional formatting with just a press on the box.
Step 1: Create a Correlation Dataset
First, we need to have the dataset that will be modified to have the dynamic heat map. Using the correlated dataset created in the previous method is a good way to start.
Read More: How to Create a Risk Heat Map in Excel (3 Easy Methods)
Step 2: Add Checkbox in the Worksheet
We can add a check box using the Developer tab.
- Now from the Developer tab, click on the Insert command.
- From the drop-down menu, click on the Check Box icon.
- Then you will notice that there is a Check Box spawned in the worksheet.
- You can edit and resize the box as you desire.
- Select the box and right-click on the mouse. There will be a context menu. Then click on the Format Control option.
- After clicking on Format Control, a new window will appear. From that window, enter the location of the cell which will be liked with the box.
- Click OK after this.
- After this, you will notice that the location we just choose, now has the text “FALSE”.
- Clicking the check box will toggle the value of cell L4 between TRUE and FALSE.
Step 3: Apply Conditional Formatting
In order to apply conditional formatting properly, we need to use the following functions starting with the IF, MIN, and MAX functions.
- Now we have the check box with a link attached to it.
- Next, we will select the dataset and click on Conditional Formatting in the Style group from the Home
- Then click on the New Rule.
- After clicking the new rule, a new window will open, in that window select Format all cells based on their values in Select a Rule Type.
- Now change Format Style from 2 to 3.
- Then in the minimum drop-down window. Select Formula.
- Then enter the following formula in Value (Minimum) field:
=IF($L$4=TRUE,MIN($C$5:$I$11),FALSE)
- Choose color Red
- Then in the minimum drop-down window. Select Number.
- Then enter in Values 0.
- Choose the color, White.
- Then in the minimum drop-down window. Select Formula.
- Then enter the following formula in Value (Maximum) field:
=IF($L$4=TRUE,MAX($C$5:$I$61),FALSE)
- Choose the color Blue.
- Click OK after this.
- In the following window, we will find a Conditional Formatting Rules Manager, complete with an overview of all rules.
- Click OK after that.
Breakdown of the Formulas
- MIN($C$5:$I$11): This formula will return the minimum value in the range of cells $C$5:$I$11.
- IF($L$4=TRUE, MIN($C$5:$I$11), FALSE): This formula means that the If the value of L4 is true. Then the minimum value of the range of cell $C$5:$I$11 will be the input value. Otherwise, there will be no value available to input. Hence there will be no formatting.
- MAX($C$5:$I$61): This formula will return the minimum value in the range of cells $C$5:$I$11
- IF($L$4=TRUE, MAX($C$5:$I$61), FALSE): This formula means that the If the value of L4 is true. Then the maximum value of the range of cell $C$5:$I$11 will be the input value. Otherwise, there will be no value available to input. Hence there will be no formatting.
Step 4: Interpreting the Output
- After we click OK, we will see that the dataset now has Conditional Formatting which is dynamic. When the cell value of cell L4 is TRUE. conditional formatting with the heat map will be visible.
- Clicking the check box will switch the value in L4 back which, in turn, will hide the heat map from the conditional formatting. Which is shown in the image below.
That is how you can make a dynamic heatmap with a correlation dataset in Excel.
Potential Problems with Correlation in Excel
There is only a linear relationship between two variables in the Pearson Product Moment Correlation. In other words, your variables may be strongly related in another way, curvilinearly, and still have a correlation coefficient close to or equal to zero or close to zero.
Pearson correlation cannot differentiate dependent from independent variables. for example, if we have a correlation value-0.413(indicating a negative correlation between variables), and shuffle the variable, we still gonna have the same correlation value. which can lead to a faulty and misguiding conclusion that can make no sense. That while doing the correlation analysis, independent and dependent variables must be noted carefully.
Correlation can be susceptible to outliers, having a single outlier can give totally misleading insight, which can convolute the overview relationship between the two variables. This type of problem arises if the Pearson correlation coefficient is used. In this case, it’s better to rank those two variable values and perform the Spearman Correlation Coefficient analysis.
Conclusion
To sum it up, the question “how to make a correlation heatmap in Excel” is answered here by creating the correlation between the variables in the dataset and then making the heat map out of it. How we can create a dynamic heatmap with checkboxes is also elaborated. Furthermore, the potential difficulties usually we face with correlations and how we can alleviate them are also presented.
For this problem, a workbook is attached where you can practice these methods.
Feel free to ask any questions or feedback through the comment section. Any suggestion for the betterment of the Exceldemy community will be highly appreciable