Sometimes the simple linear regression model does not adequately describe the relationship between two variables. To use regression analysis effectively, we must be able to distinguish the cases where the model fits the data well from the cases where it does not.
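The sum of squares total (SST) is the sum of the sum of squares regression (SSR) and the sum of squares error (SSE). The SST measures the total variation of the dependent variable around its mean. The SSR is the sum of the squared differences between the value of the dependent variable predicted by the estimated regression line and the mean of the dependent variable; it is the explained variation. The SSE is the sum of the squared differences between the observed and the predicted values of the dependent variable; it is the unexplained variation. Hence, SST = SSR + SSE. Let us use an example to explain this.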
$${\text{Sum of Squares Total (SST)}}= \sum_{i=1}^n(Y_i-\bar{Y})^2 $$
$${\text{Sum of Squares Regression (SSR) }}= \sum_{i=1}^n(\hat{Y}_i-\bar{Y})^2 $$
$${\text{Sum of Squares Error (SSE) }}= \sum_{i=1}^n(Y_i-\hat{Y}_i)^2 $$
Exhibit 1: Breakdown of Sum of Squares Total for ROA Model.
$$\small{\begin{array}{l|l|l|l|l|l|l}\textbf{Company}&{\textbf{ROA}\\ (\textbf{Y}_{\textbf{i}})}&{\textbf{CAPEX}\\ (\textbf{X}_{\textbf{i}})}&{\textbf{Predicted}\\ \textbf{ROA} (\widehat{\textbf{Y}}_{\textbf{i}})}&{\textbf{Variation}\\ \textbf{to be}\\ \textbf{Explained}\\(\textbf{Y}_{\textbf{i}}-\bar{\textbf{Y}})^{2}}&{\textbf{Variation}\\ \textbf{Unexplained}\\ (\textbf{Y}_{\textbf{i}}-\widehat{\textbf{Y}}_{\textbf{i}})^{2}}&{\textbf{Variation}\\ \textbf{Explained}\\ (\widehat{\textbf{Y}}_{\textbf{i}}-\bar{\textbf{Y}})^{2}}\\ \hline\text{A} & 15 & 5 & 10.132 & 39.0625 & 23.698& 1.909 \\ \hline \text{B} & 6 & 0.7 & 6.103 & 7.5625 & 0.0107 & 7.005 \\ \hline \text{C} & 10 & 8 & 12.942 & 1.5625 & 8.658 & 17.58\\ \hline\text{D} & 4.0 & 0.4 & 5.822 & 22.5625 & 3.321 & 8.57\\ \hline \textbf{Total} & & & & 70.75 & 35.687 &35.064\\ \hline\text{Mean} & 8.75 & & & & &\\ \end{array}}$$
From Exhibit 1 above, we see that
Sum of squares error= 35.687
Sum of squares regression= 35.064
Sum of squares total = 35.687+35.064= 70.75
These sums of squares are important inputs when we measure the fit of the regression line.
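As a rough check on Exhibit 1, the decomposition can be reproduced in a few lines of Python. This is a minimal sketch that assumes the predicted values come from an ordinary least squares fit of ROA on CAPEX; the helper names `fit_simple_ols` and `sums_of_squares` are illustrative and are not part of the reading.

```python
# ROA (Y) and CAPEX (X) for companies A-D, taken from Exhibit 1.
roa = [15.0, 6.0, 10.0, 4.0]
capex = [5.0, 0.7, 8.0, 0.4]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for a simple linear regression of y on x."""
    intercept, slope = fit_simple_ols(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [intercept + slope * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares(capex, roa)
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
```

The three totals should agree with Exhibit 1 up to rounding, and SSR + SSE should reproduce SST.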
The standard error of the regression, the F-statistic, and the coefficient of determination are all measures used to evaluate how well the regression model fits the data (goodness of fit). The coefficient of determination, or \(R^2\), measures the proportion of the total variability of the dependent variable that is explained by the independent variable. \(R^2\) is calculated using the formula:
$${\text{Coefficient of Determination}}=\frac{\text{Sum of Squares Regression}}{\text{Sum of Squares Total}}$$
$${\text{Coefficient of Determination}} =\frac{{\sum_{i=1}^n(\hat{Y}_i-\bar{Y})^2}}{{\sum_{i=1}^n(Y_i-\bar{Y})^2}}$$
The coefficient of determination ranges from 0% to 100%. From Exhibit 1 on the ROA regression model, \(R^2\) = 35.064 ÷ 70.75 = 0.4956 = 49.56%, which means that CAPEX explains 49.56% of the variation in ROA. The coefficient of determination is descriptive; it is not a statistical test. To assess the statistical significance of a regression model, we use the F-distributed test statistic, which compares two variances. In simple linear regression, the F-test evaluates the null hypothesis that the slope coefficient is equal to zero against the alternative hypothesis that it is not equal to zero. With more than one independent variable, the null hypothesis is that all slopes are equal to zero, against the alternative that at least one slope is not equal to zero.
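The \(R^2\) calculation for the ROA model can be sketched as follows. The function name `coefficient_of_determination` is an illustrative choice, and the inputs are the rounded totals from Exhibit 1, so the result is only as precise as those totals.

```python
def coefficient_of_determination(ssr, sse):
    """Return R^2 = SSR / SST, where SST = SSR + SSE."""
    sst = ssr + sse
    if sst <= 0:
        raise ValueError("SST must be positive to compute R^2")
    return ssr / sst


# Rounded totals from Exhibit 1 (ROA regressed on CAPEX).
r_squared = coefficient_of_determination(ssr=35.064, sse=35.687)
print(f"R^2 = {r_squared:.4f} ({r_squared:.2%} of the variation in ROA)")
```

Because SST = SSR + SSE, the same value can be obtained as 1 − SSE/SST.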
The F-distributed test statistic is formed from the sum of squares regression and the sum of squares error, each adjusted for its degrees of freedom. The sum of squares regression is divided by the number of independent variables, k, to arrive at the mean square regression (MSR). In simple linear regression, there is one independent variable, so k = 1.
$${\text{MSR}}=\frac{\text{Sum of Squares Regression}}{\text{k}}$$
$${\text{MSR}}=\frac{{\sum_{i=1}^n(\hat{Y_i}-\bar{Y})^2}}{{1}}$$
Next, we divide the sum of squares error by its degrees of freedom to calculate the mean square error (MSE). In simple linear regression, the degrees of freedom \(n-k-1\) become \(n-2\).
$${\text{MSE}}=\frac{\text{Sum of Squares Error}}{n-k-1}$$
$${\text{MSE}}=\frac{{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}}{{n-2}}$$
Therefore, the F-distributed test statistic is:
$${\text{F}}=\frac{\text{MSR}}{\text{MSE}}$$
The F-test in regression analysis is one-sided. The rejection region is in the right tail because we want to determine whether the variation in the numerator (the explained variation in Y) is larger than the variation in the denominator (the unexplained variation in Y).
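The MSR, MSE, and F calculations can be sketched in Python as below. The function name `f_statistic` is illustrative, and the inputs are the rounded Exhibit 1 totals with n = 4 companies and k = 1 independent variable. The sketch stops at the test statistic; comparing it with a critical value from the F-distribution with k and n − k − 1 degrees of freedom is left out.

```python
def f_statistic(ssr, sse, n, k=1):
    """Return F = MSR / MSE for a regression with k independent variables."""
    df_error = n - k - 1
    if k < 1 or df_error < 1:
        raise ValueError("need k >= 1 and n - k - 1 >= 1")
    msr = ssr / k          # mean square regression
    mse = sse / df_error   # mean square error
    return msr / mse


# Rounded totals from Exhibit 1: four companies, one independent variable (CAPEX).
f_stat = f_statistic(ssr=35.064, sse=35.687, n=4, k=1)
print(f"F = {f_stat:.3f} with (1, 2) degrees of freedom")
```

A large F relative to the critical value would lead to rejecting the null hypothesis that the slope is zero; with only four observations, the error term has just two degrees of freedom.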
Question
James, an analyst at QPC LTD, has estimated a model that regresses a company's return on equity (ROE) against its growth opportunities (GO), measured as its three-year compounded annual growth rate in sales over the past 15 years. He estimated the sum of squares error and the sum of squares regression as follows:
Sum of Squares Error= 48.99
Sum of Squares Regression= 192.3
The coefficient of determination is closest to:
- A. 241.29
- B. 0.797
- C. 0.8927
Solution
The correct answer is B.
$${\text{The Coefficient of Determination}} = \frac{\text{Sum of Squares Regression}}{\text{Sum of Squares Total}}$$
First, we calculate the sum of squares total by adding sum of squares regression to sum of squares error. 192.3+48.99= 241.29
\(R^2\) = 192.3 ÷ 241.29 = 0.797, or 79.7%
A is incorrect. 241.29 is the sum of squares total.
C is incorrect. 0.8927 is R, the square root of the coefficient of determination.
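The arithmetic behind the three answer choices can be checked with a short Python sketch; the variable names are illustrative.

```python
import math

sse = 48.99  # sum of squares error given in the question
ssr = 192.3  # sum of squares regression given in the question

sst = ssr + sse           # sum of squares total
r_squared = ssr / sst     # coefficient of determination
r = math.sqrt(r_squared)  # square root of the coefficient of determination

print(f"SST = {sst:.2f}, R^2 = {r_squared:.4f}, R = {r:.4f}")
```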
Reading 0: Introduction to Linear Regression
LOS 0 (d) Calculate and interpret the coefficient of determination and the F-statistic in a simple linear regression