Regression analysis is in almost every type of statistical software like SPSS, R, and not to mention Excel. Regression can give us the big picture about relationships between variables. Linear regression can be done pretty quickly in Excel using the Data Analysis tool. This article will show how you can interpret regression results in Excel.
What Is Regression?
Regression analysis is often used in data analysis to determine the associations among multiple variables. Regression analysis allows you to choose what happens to the dependent variable if one of the independent variables alters. It also lets you figure out mathematically which independent variables have an influence.
Simple linear regression is distinct from multiple linear regression in statistics. Using a linear function, simple linear regression analyses the association between the variables and one independent variable. Multiple linear regression is when two or more explanatory factors are used to determine the variables. Using nonlinear regression instead of the dependent variable is described as a nonlinear function since the data relationships are not linear. This article will concentrate on multiple linear regression on a data set to demonstrate how you can interpret regression results in Excel.
How to Do Regression in Excel
For regression purposes, we will use the below dataset for analysis purposes. Here the independent variable will be the Price column and Sold column. The independent column will be the Demand column.
- We need to go to the Data tab and click on the Data Analysis to do regression.
- There will be a new window; select the dependent variable and independent variable data range.
- Then tick the Labels box and Confidence box.
- Then click on the output cell range box to select the output cell address
- Next, tick on the Residual to calculate the residuals.
- After that, tick the Residual plots and Line Fit Plots boxes
- Click OK after this.
- After clicking OK, the primary output parameters of the analysis will be at the specified cells.
- Then you will also get some parameters such as Significance value etc in the ANOVA (Analysis of Variance) table.
- Here, df denotes the degree of freedom related to the source of variance.
- SS denotes the sum of squares. Your model will reflect the data better if the Residual SS is smaller than the Total SS.
- MS means square.
- F denotes the F-test for the null hypothesis.
- Significance F denotes the P-value of F.
- Then you will also get the variable’s coefficients, significance value, etc in a table.
- Then you will get a final table below the coefficient table which contains the residual value for each entry.
- Next, you will get the Demand vs Price regression chart, with a trendline.
- After this, you get the Demand vs Sold regression chart with a trendline.
- There is another chart showing the distribution of residuals of each entry from the Sold variable.
- There is another chart showing the distribution of residuals of each entry from the Price variable.
Next, we will show you how you can interpret these regression results in Excel.
Read More: How to Do Logistic Regression in Excel
How to Interpret Regression Results in Excel: Detailed Analysis
The next thing you need to do after doing the regression analysis and interpret them. The outcomes are described and elaborated on below.
Multiple R-Squared Regression Value Analysis
The R-squared number indicates how closely the dataset’s elements are related and how well the regression line matches the data. We are going to use the multiple linear regression analysis, in which we are going to determine the impact of two or more variables on the main factor. This refers to how the dependent variable changes as one of the independent variables changes. The range of this coefficient is from -1 to 1. Here,
- 1 means a close positive relationship
- 0 means there are no relationships among variables. In other words, the data points are random.
- -1 means inverse or negative relationship among variables.
In the output results shown above, the multiple R-value of the given data sets is o.7578(approx), which indicates strong relations between the variables.
R squared value explains how the response of dependent variables varies to the independent variable. In our case, the value is 0.574(approx), which can be interpreted as a reasonably okay relationship between the variables.
This is merely an alternate version of the R squared value. This simply shuffles the predictor variables while forecasting the response variable. It calculates as
R^2 = 1 – [(1-R^2)*(n-1)/(n-k-1)]
Here, R^2: The R^2 value we got from the dataset.
n: the number of observations.
K: the number of predictor variables.
The significance of this value arises while doing regression analysis between two predictor variables. If there is more than one predictor variable in the dataset, then the R squared value will be inflated, which is highly undesirable. The adjusted R squared value adjusts this inflation and gives an accurate picture of the variables.
Another goodness-of-fit metric that indicates the accuracy of your regression analysis; the lower the value, the more sure you can be in your regression analysis.
Standard Error is an empirical metric representing the average distance that the points deviate from the trendline. In contrast, R2 represents the proportion of dependent variable variation. In this case, the value of Standard Error is 288.9 (approx), which denotes that our data points, on average, drop 288.9 from the trendline.
Indicate the number of observations or entries.
Determine Significant Variable
The Significance value indicates the trustworthiness (statistically sound) of our analysis. In other words, it denotes the probability of our dataset being wrong. This value should be below 5%. But in this case, our significance value is 0.00117, which translates to 0.1%, which is well below the 5%. So our analysis is ok. Otherwise, we may have to choose different variables for our analysis.
P-value in Regression Analysis
Closely linked to a significant value, the P-value denotes the probability of the coefficient value being wrong. P-value denotes the association of the null hypothesis with the variables.
If your p-value < the Significance number, there is enough evidence to reject the null value hypothesis. This means there is a non-zero correlation between the variables.
But if the p-value > Significance value, there will be insufficient evidence to dismiss the null hypothesis. That denotes that there could be no correlation between the variables.
In or case, the P-value of variable Price =0.000948 < 0.00117 (significance value),
So there is no null hypothesis going on here, and there is enough evidence to declare a correlation between variables.
On the other hand, for the variable Sold, the (P-value) 0.0038515 < 0.0011723 (Significance value)
So there could be a null hypothesis going on here, and there is not enough evidence to declare a non-zero correlation between variables.
In most cases, this P-value determines whether a variable will be in the dataset or not. For instance, we should remove the Sold variable to preserve the robustness of the dataset.
As we determine the linear regression analysis in Excel, the trend line should also be linear. The general form is:
Here, Y is the dependent variable.
And X is the independent variable here, meaning that we will determine the effect of the change of variable x on variable Y.
C will just be the value of the Y-axis intersection of the line.
In this case, the value of the C intercept is equal to 9502.109853
And the value of m for the two variables is -809.265 and 0.424818.
So, we have the final equation for the two separate variables.
The first one is:
And the equation for the second variable is:
The coefficients we got are m1=-809.2655 and m2=04248. And interceptor, C= 9502.12.
- First, the interceptor value indicates that the demand will be 9502 when the price is zero.
- And the values of m denote the rate at which demand changes per unit of price change. The price coefficient value is -809.265, indicating that a per unit increase in price will drop the demand by roughly 809 units.
- For the second variable, Sold, the m value is 0.424. This denotes that the change per unit sold item will result in a 0424-time unit increase of the product.
The Residual difference between the original and the calculated entry from the regression line is the difference. Residuals indicate how distant the actual value is from the line. For example, the computed entry from the regression analysis for the first entry is 9497. And the first original value is 9500. So the residual is around 2.109.
T-statics value is the division of coefficient by the standard value. The higher the value is, the better reliability of the coefficient indicates.
There is another significance of this value, which is required to calculate the P-value in linear regression.
The 95% Confidence Interval
Here the confidence of the variable we set as 95 at the beginning. It can change, though.
- Here, the coefficient value of the lower 95% is calculated as 8496.84 means the upper 95% is calculated as 10507.37,
- This means that while our main coefficient is about, 9502.1.there is a high chance that the value could be below 8496 for 95% of cases and a 5% chance of it being over 10507.37
Things to Remember
✎ The regression analysis method solely assesses the relationship among variables under examination. It doesn’t establish causation. In other ways, only the aspect of correlation considers. When some act causes something, it’s become causation. When an alteration of one variable creates changes, it can be considered causation.
✎ Regression analysis hampers heavily by outliers. All kinds of outliers must be removed before analysis is done. To analyze and interpret regression analysis results in Excel, you need to consider these points.
Download Practice Workbook
Download this practice workbook below.
To sum it up, the question “how to interpret regression results in Excel” is answered by elaborately analyzing and later interpreting it. The analysis is done through the Data Analysis tool in the Data tab.
For this problem, a workbook is available for download where you can practice regression analysis and interpret it.
Feel free to ask any questions or feedback through the comment section.