How to Use the Data Analysis Toolpak in Excel

Steps to Enable Data Analysis Toolpak in Excel

Steps:

In File, go to Options.

Select Add-ins.
In Manage, select Excel Add-ins.
Click Go.

In the new window, check Analysis ToolPak.
Click OK.

In the Data tab, select Analyze and choose Data Analysis.

Feature 1 – Anova Analysis

ANOVA is a statistical method used to analyze variance observed within a dataset by dividing it into two sections: 1) Systematic factors and 2) Random factors.

Anova can determine which factors have a significant effect on a given set of data and supports additional analysis on methodological factors and on regression analysis.

The Formula of Anova is:

F=MSE / MST

Here:

F = Anova coefficient

MST = Mean sum of squares due to treatment

MSE = Mean sum of squares due to error

Anova is of two types: single factor and two factors, according to variance analysis.

In two factors, there are multiple dependent variables and in one factor, there will be one dependent variable.
Single-factor Anova calculates the effect of a single factor on a single variable.
Single-factor Anova identifies the differences that are statistically significant between the average means of numerous variables.

1.1 Single-Factor Anova Analysis

The sample dataset showcases a group of factors.

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select Anova: Single Factor.
Click OK.

In the Anova: Single Factor window, enter data in Input Range.
Check Labels in First Row.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Labels in the first row (if you select the input data range with the label).
Click OK.

This is the output..

Interpretation of the Result:

In the Summary table, you will find each group’s average and variances. The average level is 60.4 for Group A but the variance is 315.15 which is lower than other groups: elements in that group are less valuable.
Anova results are not very significant as you are calculating only the variances.
P-values interpret the relation between the columns: the values are greater than 0.05, which is not statistically significant.

1.2 Anova: Two Factor with Replication

The dataset showcases data on different exam scores on two shifts in that school.

You want analyze data to find a relation between the shifts and students’ marks.

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select Anova: Two-Factor With Replication.
Click OK.

In the new window, enter data in Input Range.
Enter 4 in Rows per sample as you have 4 rows per shift.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply (here) or in a new workbook by selecting New Workbook.
Click OK.

A new worksheet was created.
The two-way Anova result is displayed.

Interpretation of the Result:

The average score on the Morning shift in Math score is 65.5 but on the Day shift is 83.75
In the Chemistry exam, the average score on the Morning shift is 87.25, but on the Day shift is 77.25.
Variance is very high at 91 on the morning shift in the Math exam.
You will get a complete overview in the summary.

You can summarize the interactions and individual effects.

The P-value of Columns is 0.037 which is statistically significant: there is an effect of shifts on the performance of the students in the exam. But the value is close to the alpha value of 0.05 (the effect is less significant).
The P-value of interactions is 0.000967 which is much less than the alpha value. It is very significant statistically: the effect of the shift on both exams is very high.

1.3 Anova: Two Factor without Replication

You have data on different exam scores on two shifts in that school. T

To find a relation between shifts and students’ marks:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select “Anova: Two-Factor Without Replication”.
Click OK.

In the new window, enter data in Input Range.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply (here) or in a new workbook by selecting New Workbook.
Check Labels.
Click OK.

A new worksheet was created.
The two-way Anova result is displayed.

Interpretation of the Result:

The average score on the Morning shift in Math score is 65.5 but on the Day shift is 83.75
In the Chemistry exam, the average score on the Morning shift is 87 but on the Day shift is 77.25.
The P-value of Columns is 0.24, which is statistically significant: there is an effect of shifts on the performance of the students. The value is close to the alpha value of 0.05, so the effect is less significant.

Read More: How to Use Analyze Data in Excel

Feature 2. Correlation Analysis

In statistics, correlation or correlation coefficient is the parameter to show coherence between two variables in response to the continuous fluctuating quantity of another. Its value ranges from -1 to +1. It has three states of defining variable relations:

-1 indicates a negative correlation, which means the variables change in the opposite direction.
+1 indicates a positive correlation, which means the variables change in the same direction.
0 indicates no correlation, which means no apparent movements in any direction of a variable upon changing other variables’ values.

The dataset showcases two stock prices in different periods of time.

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select Correlation.
Click OK.

In the new window, enter data in Input Range.
Check Columns in Grouped By.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Labels in the first row.
Click OK.

This is the correlation result.

There is a positive correlation, which means the variables change in the same direction.

Feature 3 – Covariance Analysis

The covariance of two variables is a measure of how one of them influences the other. It’s a necessary evaluation of the deviation between two variables (they do not have to be dependent on each other).

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select Covariance.
Press Enter.
Click OK.

In the new window, enter data in Input Range.
Check Columns in Grouped By.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Labels in the first row.
Click OK.

This is the covariance result.

Explanation of the Result:

The highlighted section in the following image indicates the variance for each subject.

The variance of Math is 154.9375 and the variance of Physics is 76.484375. The variance of Chemistry is 154.9375.

The highlighted section in the following image indicates the values of variance between the subjects: Math and Physics, Maths and Chemistry, and Physics and History have a variance value of respectively 60.9375, 3.65625, and 8.8125. When the covariance is positive, it indicates that the variables are proportionate: when one increases, the other tends to rise.

Feature 4 – Descriptive Statistics Analysis

To do a descriptive statistics analysis.:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select Descriptive Statistics.
Click OK.

In the new window, enter data in Input Range.
Check Columns in Grouped By.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Summary statistics.
Click OK.

The covariance result will be displayed.

The result gives the characteristics of two variables, mean, median, standard deviation, and maximum and minimum value of the dataset, which are respectively 77.25, 80, 19.088, 45, and 100.

Feature 5 – Exponential Smoothing Analysis

The dataset contains items sold in different weeks by a manufacturing company. To get an exponential smoothing analysis:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select Exponential Smoothing.
Click OK.

In the new window, enter data in Input Range.
Enter 0.9 in Damping factor. Here, 1-alpha.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Labels.
Check Chart Output.
Click OK.

The exponential smoothing result and the chart for 0.1 alpha are displayed.

Excel can not provide data for the first value in this method. If you use a large damping factor, you will get more smooth peaks and valleys.

Feature 6 – F-Test Two-Sample for Variances Analysis

To do an F-test for two sample variances:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select F-Test Two-Sample for Variances.
Click OK.

In the new window, enter data in Input Range.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Labels.
Click OK.

The F-test for two sample variances is displayed.

Explanation of the Result:

The mean value of variable 2 is greater than variable 1.
If F is greater than F Critical one-tail, it doesn’t follow the null hypothesis. In the above data, the value of F is .8016 and the value of F Critical one-tail is 0.3145 which indicates F is greater than F critical one-tail: the variances between the two variables don’t match.

Feature 7 – Moving Average Analysis

The moving average means the period of time of the average is the same, but it keeps moving when new data is added. A moving average smooths irregularities (peaks and valleys) from data to easily recognize trends. The larger the interval period is, the more smoothing occurs, as more data points are included in each calculated average.

The dataset contains items sold in different weeks by a manufacturing company. To do a moving average analysis:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select Moving Average.
Click OK.

In the Moving Average box:

Enter data in Input Range.
Enter the number of intervals in Interval.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
To see the trendline with a chart, check Chart Output.
Check Labels in the first row.
Click OK.

The moving average and the Excel trendline will be displayed, showing both the original data and the moving average value with smooth fluctuations.

Feature 8 – Random Number Generation

To generate random numbers with different criteria:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis, select Descriptive Statistics.
Click OK.

In the Random Number Generation box:

Enter the Number of Variables (the number of columns of random numbers that you want in your worksheet. Here, 2).
Enter the Number of Random Numbers (the number of rows that you want in your worksheet. Here, 7).
Select Uniform in Distribution.
In parameters, indicate the boundaries of your distribution. Here 40 to 60.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Click OK.

Random numbers will be generated.

Feature 9 – Rank and Percentile Analysis

The dataset showcases students’ ID and their Math exam scores. To calculate the rank and percentile of each student’s math exam score:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select Rank and Percentile.
Click OK.

In the Rank and Percentile box:

Enter data in Input Range.
Check Columns in Grouped By section.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Labels in the first row.
Click OK.

The rank and percentile for each student’s exam score will be displayed.

Feature 10 – Regression Analysis

Regression analysis helps to predict values depending on two or more variables.

The dataset contains the player name, the number of matches played, and the number of goals each player scored.

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select Regression.
Click OK.

In the Regression box,

Enter data in Input Range: the data ranges in Input X Range and Input Y Range.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Labels.
Check Residuals.
Click OK.

The result of the regression analysis is displayed.

Explanation of the Regression Analysis Result:

Regression Statistics:

Regression statistics is an array of various parameters that describe how well the measured linear regression is.

Multiple R is a correlation coefficient parameter that indicates the correlation between variables. Its value ranges from -1 to +1. The bigger the value, the stronger the correlative relationships are.
R Square symbolizes the coefficient of determination. It indicates the scale by how well the data model fits the regression analysis.
The adjusted R square is used in multiple variables in regression analysis.
Standard Error is another parameter that shows a healthy fit of any regression analysis. The smaller the standard error the more accurate the linear regression equation. It shows the average distance of data points from the linear equation.

Anova:

It analyses the variance of the data model.

Here, df represents the degree of freedom.
SS( sum of squares) symbolizes the good-to-fit parameter.
MS means the Mean Square.
F refers to the Null Hypothesis. It tests the overall significance of the regression model
Significance of F means the P-value of F.

Co-efficient Outcome:

It helps to determine Y values.

Residual Output:

It compares the estimated value with the calculated value.

Feature 11 – t-Test Analysis

There are three types of T-Test:

Paired two samples for means
Two samples assuming equal variances
Two- samples using unequal variances

11.1 t-Test: Paired Two Sample for Means

The dataset contains Students’ IDs and each student’s Math and Physics scores.

To do a t-Test: Paired Two Sample for Means:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select t-Test: Paired Two Samples for Means.
Click OK.

In the t-Test: Paired Two Sample for Means box:

Enter data in Input: the data ranges in Variable 1 Range and Variable 2 Range.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Labels.
Click OK.

The result of the t-Test: Paired Two Sample for Means will be displayed.

Explanation of the Result:

The Mean Value of the Math Score is greater than the mean value of the Physics Score.
The variance of the Math Score is also greater than the variance of the Physics Score.
If t Stat is greater than t Critical two-tail, you can’t eliminate null hypothesis. In the above calculation, t Stat and t Critical two-tail value are respectively 1.603 and 2.36464. That means 1.603< 2.36464: it doesn’t match the null hypothesis. The variances between the two variables don’t match.

11.2 t-Test Two-Sample Assuming Equal Variances

To do a t-Test: Two-Sample Equal Variances:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select t-Test: Two-Sample Equal Variances.
Click OK.

In the t-Test: Paired Two Sample Equal Variances box,

Enter data in Input: the data ranges in Variable 1 Range and Variable 2 Range.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Labels.
Click OK.

The t-Test: Two-Sample Equal Variances will be displayed.

Explanation of the Result:

The Mean Value of the Math Score is greater than the mean value of the Physics Score.
The variance of the Math Score is also greater than the variance of the Physics Score.
If t Stat is greater than t Critical two-tail, you can’t eliminate the null hypothesis. In the above calculation, t Stat is and t Critical two-tail values are respectively 1.48 and 2.144. 1.48< 2.144: it doesn’t match the null hypothesis. The variances between the two variables don’t match.

11.3 t-Test: Two-Sample Assuming Unequal Variances

To do a t-Test: Two-Sample Unequal Variances:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select t-Test: Two-Sample Unequal Variances.
Click OK.

In the t-Test: Paired Two Sample Unequal Variances box,

Enter data in Input: the data ranges in Variable 1 Range and Variable 2 Range.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Check Labels.
Click OK.

The t-Test: Two-Sample Unequal Variances will be displayed.

Explanation of the Result:

The Mean Value of the Math Score is greater than the mean value of the Physics Score.
The variance of the Math Score is also greater than the variance of the Physics Score.
If t Stat is greater than t Critical two-tail, you can’t eliminate null hypothesis. In the above calculation, t Stat and t Critical two-tail values are respectively 79 and 2.131. 1.79< 2.131: it doesn’t match the null hypothesis. The variances between the two variables don’t match.

Feature 12 – z-Test: Two Sample for Means

Use the VAR.P function to calculate the variance of both variables.

Steps:

To calculate the variance of the Math score, use the following formula in D14:

=VAR.P(C5:C12)

Press Enter.
You will see the variance of Match Score.

To calculate the variance of the Physics score, use the following formula in D15:

=VAR.P(D5:D12)

Press Enter.
You will see the variance of Physics Score.

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select z-Test: Two-Sample Means.
Click OK.

In the z-Test: Two-Sample Means box:

Enter data in Input: the data ranges in Variable 1 Range and Variable 2 Range.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Enter the value variance of Math and Physics Score in Variable 1 Variance (known) and Variable 2 Variance (known)
Check Labels.
Click OK.

The z-Test: Two-Sample Means will be displayed.

Explanation of the Result:

The Mean Value of the Math Score is greater than the mean value of the Physics Score.
The variance of the Math Score is also greater than the variance of the Physics Score.
If Z is less than Z critical two-tail, you can’t eliminate the null hypothesis. In the above calculation, z and z Critical two-tail values are 52 and 1.95. 1.52 < 1.95: matches the null hypothesis. The variances between the two variables match.

Feature 13. Sampling Analysis

To do a sampling analysis:

Steps:

Go to the Data tab.
Select Data Analysis.

In the Data Analysis window, select Sampling.
Click OK.

In the Sampling box,

Enter data in the Input Range.
In the Output Range box, enter the data range. You can also see the output in a new worksheet by selecting New Worksheet Ply or in a new workbook by selecting New Workbook.
Enter data in Number of Samples.
Check Labels.
Click OK.