# Analysis of Variance

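Sometimes the simple linear regression model does not adequately describe the relationship between two variables. To use regression analysis effectively, we must be able to tell whether the model fits the data well or poorly. Analysis of variance (ANOVA) does this by breaking down the total variation in the dependent variable into the part the regression explains and the part it leaves unexplained.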

### Breaking Down the Sum of Squares Total into Its Components

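The sum of squares total (SST) measures the total variation of the dependent variable around its mean. It is the sum of two components, the sum of squares regression (SSR) and the sum of squares error (SSE). SSR is the sum of the squared differences between the values of the dependent variable predicted by the estimated regression line and the mean of the dependent variable. This is the variation explained by the regression. SSE is the sum of the squared differences between the observed and the predicted values of the dependent variable. This is the unexplained variation. Hence, SST = SSR + SSE. Let us use an example to explain this.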

$$\text{Sum of Squares Total (SST)} = \sum_{i=1}^n(Y_i-\bar{Y})^2$$

$$\text{Sum of Squares Regression (SSR)} = \sum_{i=1}^n(\hat{Y}_i-\bar{Y})^2$$

$$\text{Sum of Squares Error (SSE)} = \sum_{i=1}^n(Y_i-\hat{Y}_i)^2$$
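The identity SST = SSR + SSE holds for a least-squares line fitted with an intercept. The short derivation below is a sketch of why. It uses the notation above, plus three symbols introduced here only for this derivation: the estimated slope and intercept, written as b-hat, and the residuals, written as e. First, expand the total deviation into an explained part and an unexplained part:

$$
\begin{aligned}
\sum_{i=1}^n (Y_i-\bar{Y})^2 &= \sum_{i=1}^n \left[(\hat{Y}_i-\bar{Y}) + (Y_i-\hat{Y}_i)\right]^2 \\
&= \underbrace{\sum_{i=1}^n (\hat{Y}_i-\bar{Y})^2}_{\text{SSR}} + \underbrace{\sum_{i=1}^n (Y_i-\hat{Y}_i)^2}_{\text{SSE}} + 2\sum_{i=1}^n (\hat{Y}_i-\bar{Y})(Y_i-\hat{Y}_i)
\end{aligned}
$$

The least-squares line passes through the point of means. Its residuals sum to zero and are orthogonal to the independent variable:

$$e_i = Y_i-\hat{Y}_i, \qquad \hat{Y}_i-\bar{Y} = \hat{b}_1(X_i-\bar{X}), \qquad \sum_{i=1}^n e_i = 0, \qquad \sum_{i=1}^n X_i e_i = 0$$

As a result, the cross term vanishes and SST = SSR + SSE:

$$2\sum_{i=1}^n (\hat{Y}_i-\bar{Y})e_i = 2\hat{b}_1\left(\sum_{i=1}^n X_i e_i - \bar{X}\sum_{i=1}^n e_i\right) = 0$$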

Exhibit 1: Breakdown of Sum of Squares Total for ROA Model.

$$
\small{\begin{array}{l|c|c|c|c|c|c}
\textbf{Company} & \textbf{ROA} & \textbf{CAPEX} & \textbf{Predicted ROA} & \textbf{Variation to Be Explained} & \textbf{Variation Unexplained} & \textbf{Variation Explained} \\
 & (Y_i) & (X_i) & (\hat{Y}_i) & (Y_i-\bar{Y})^2 & (Y_i-\hat{Y}_i)^2 & (\hat{Y}_i-\bar{Y})^2 \\ \hline
\text{A} & 15 & 5 & 10.132 & 39.0625 & 23.698 & 1.909 \\ \hline
\text{B} & 6 & 0.7 & 6.103 & 7.5625 & 0.0107 & 7.005 \\ \hline
\text{C} & 10 & 8 & 12.942 & 1.5625 & 8.658 & 17.58 \\ \hline
\text{D} & 4 & 0.4 & 5.822 & 22.5625 & 3.321 & 8.57 \\ \hline
\textbf{Total} & & & & 70.75 & 35.687 & 35.064 \\ \hline
\text{Mean} & 8.75 & & & & & \\
\end{array}}
$$

From Exhibit 1, we see that:

Sum of squares error = 35.687

Sum of squares regression = 35.064

Sum of squares total = 35.687 + 35.064 ≈ 70.75

These sums of squares are important inputs for measuring how well the regression line fits the data.

### Measures of Goodness of Fit

The standard error of the regression, the F-statistic, and the coefficient of determination are all measures used to evaluate how well the regression model fits the data (goodness of fit).

The coefficient of determination, or R², measures the proportion of the total variation in the dependent variable that is explained by the independent variable. It is calculated as:

$$\text{Coefficient of Determination} = \frac{\text{Sum of Squares Regression}}{\text{Sum of Squares Total}}$$

$$R^2 = \frac{\sum_{i=1}^n(\hat{Y}_i-\bar{Y})^2}{\sum_{i=1}^n(Y_i-\bar{Y})^2}$$

The coefficient of determination ranges from 0% to 100%. For the ROA regression model in Exhibit 1, R² = 35.064 ÷ 70.75 = 0.4956 = 49.56%, which means that CAPEX explains 49.56% of the variation in ROA.
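To check Exhibit 1, the Python sketch below refits a least-squares line of ROA on CAPEX for the four companies and recomputes SST, SSR, SSE, and R². It assumes that the predicted values in Exhibit 1 come from an ordinary least-squares fit with an intercept. The text does not say this explicitly, but the table's totals are consistent with it. The helper names `ols_fit` and `anova_components` are hypothetical and do not come from any library. The sketch also forms MSR = SSR/k, MSE = SSE/(n - k - 1), and F = MSR/MSE with k = 1, which are the quantities defined in the paragraphs that follow.

```python
# A sketch that reproduces Exhibit 1: fit ROA on CAPEX by ordinary least squares
# and split the total variation in ROA into explained and unexplained parts.
# Assumption: Exhibit 1's predicted ROA values come from an OLS fit with an intercept.

roa = [15.0, 6.0, 10.0, 4.0]   # Y_i for companies A, B, C, D
capex = [5.0, 0.7, 8.0, 0.4]   # X_i for companies A, B, C, D


def ols_fit(x, y):
    """Hypothetical helper: return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def anova_components(x, y):
    """Hypothetical helper: return (SST, SSR, SSE) for a regression of y on x."""
    b0, b1 = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]                     # predicted values
    sst = sum((yi - y_bar) ** 2 for yi in y)               # variation to be explained
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # variation explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # variation unexplained
    return sst, ssr, sse


sst, ssr, sse = anova_components(capex, roa)
r_squared = ssr / sst            # coefficient of determination

n, k = len(roa), 1               # four observations, one independent variable
msr = ssr / k                    # mean square regression
mse = sse / (n - k - 1)          # mean square error, n - 2 degrees of freedom
f_stat = msr / mse               # F-statistic

print(f"SST = {sst:.3f}  SSR = {ssr:.3f}  SSE = {sse:.3f}")
print(f"R^2 = {r_squared:.4f}  F = {f_stat:.3f}")
```

Run as-is, it reports SSR ≈ 35.06, SSE ≈ 35.69, and R² ≈ 0.4956. These agree with Exhibit 1 up to the rounding of the predicted values. With only n = 4 observations, F ≈ 1.96 on 1 and 2 degrees of freedom.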
The coefficient of determination is not a statistical test. It is a descriptive measure. To test the statistical significance of a regression model, we use the F-distributed test statistic, which compares two variances. In simple linear regression, the F-test evaluates the null hypothesis that the slope coefficient equals zero (H₀: b₁ = 0) against the alternative hypothesis that it does not (Hₐ: b₁ ≠ 0). In multiple regression, the null hypothesis is that all slope coefficients equal zero, against the alternative that at least one of them does not.

The F-statistic is formed from the sum of squares regression and the sum of squares error, each adjusted for its degrees of freedom. The sum of squares regression is divided by the number of independent variables, k, to arrive at the mean square regression (MSR). In simple linear regression, k = 1:

$$\text{MSR} = \frac{\text{Sum of Squares Regression}}{k}$$

$$\text{MSR} = \frac{\sum_{i=1}^n(\hat{Y}_i-\bar{Y})^2}{1}$$

Next, we divide the sum of squares error by its degrees of freedom, n - k - 1, to calculate the mean square error (MSE). In simple linear regression, n - k - 1 becomes n - 2:

$$\text{MSE} = \frac{\text{Sum of Squares Error}}{n-k-1}$$

$$\text{MSE} = \frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{n-2}$$

Therefore, the F-distributed test statistic is:

$$F = \frac{\text{MSR}}{\text{MSE}}$$

In simple linear regression, this statistic has 1 and n - 2 degrees of freedom. The F-test in regression analysis is one-tailed. The rejection region lies in the right tail because we want to determine whether the variation in the numerator (explained variation in Y) is larger than the variation in the denominator (unexplained variation in Y).

## Question

James, an analyst at QPC LTD, has estimated a model that regresses return on equity (ROE) against growth opportunities (GO), defined as the three-year compound annual growth rate in sales, over the past 15 years.
He estimated the sum of squares error and the sum of squares regression as follows:

Sum of squares error = 48.99

Sum of squares regression = 192.3

The coefficient of determination is closest to:

A. 241.29

B. 0.797

C. 0.8927

### Solution

The correct answer is B.

$$\text{Coefficient of Determination} = \frac{\text{Sum of Squares Regression}}{\text{Sum of Squares Total}}$$

First, calculate the sum of squares total by adding the sum of squares regression to the sum of squares error: 192.3 + 48.99 = 241.29.

Then, R² = 192.3 ÷ 241.29 = 0.797, or 79.7%.

A is incorrect. 241.29 is the sum of squares total, not the coefficient of determination.

C is incorrect. 0.8927 is R, the square root of the coefficient of determination.
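The arithmetic in the solution can be reproduced with a few lines of Python. This is only a sketch that uses the two sums of squares given in the question. The variable names are illustrative.

```python
import math

# Sums of squares given in the question (ROE regressed on GO)
sse = 48.99   # sum of squares error
ssr = 192.3   # sum of squares regression

sst = ssr + sse            # total variation: SST = SSR + SSE
r_squared = ssr / sst      # coefficient of determination (answer B)
r = math.sqrt(r_squared)   # square root of R^2 (the value in distractor C)

print(f"SST = {sst:.2f}")
print(f"R^2 = {r_squared:.3f}")
print(f"sqrt(R^2) = {r:.4f}")
```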

Reading 0: Introduction to Linear Regression

LOS 0 (d) Calculate and interpret the coefficient of determination and the F-statistic in a simple linear regression
