Analysis of Variance


Sometimes the simple linear regression model does not describe the relationship between two variables well. To use regression analysis effectively, we must be able to distinguish the cases where the model fits the data from the cases where it does not.

Breaking down the sum of squares total into its components.

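The sum of squares total (SST) is the sum of the sum of squares regression (SSR) and the sum of squares error (SSE). The sum of squares regression is the sum of the squared differences between the value of the dependent variable predicted by the estimated regression line, \(\hat{Y_i}\), and the mean of the dependent variable, \(\bar{Y}\). The sum of squares error is the sum of the squared differences between the observed value, \(Y_i\), and the predicted value, \(\hat{Y_i}\). Hence, SST = SSR + SSE. Let us use an example to explain this.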

$${\text{Sum  of  Squares  Total (SST)}}= \sum_{i=1}^n(Y_i-\bar{Y})^2 $$

$${\text{Sum  of  Squares  Regression (SSR) }}= \sum_{i=1}^n(\hat{Y_i}-\bar{Y})^2 $$

$${\text{Sum  of  Squares  Error (SSE) }}= \sum_{i=1}^n(Y_i-\hat{Y_i})^2 $$

Exhibit 1: Breakdown of Sum of Squares Total for ROA Model.

$$\small{\begin{array}{l|l|l|l|l|l|l}\textbf{Company}&{\textbf{ROA}\\ (\textbf{Y}_{\textbf{i}})}&{\textbf{CAPEX}\\ (\textbf{X}_{\textbf{i}})}&{\textbf{Predicted}\\ \textbf{ROA} (\widehat{\textbf{Y}_{\textbf{i}}})}&{\textbf{Variation}\\ \textbf{to be}\\ \textbf{Explained}\\(\textbf{Y}_{\textbf{i}}-\bar{\textbf{Y}})^{2}}&{\textbf{Variation}\\ \textbf{Unexplained}\\ (\textbf{Y}_{\textbf{i}}-\widehat{\textbf{Y}_{\textbf{i}}})^{2}}&{\textbf{Variation}\\ \textbf{Explained}\\ (\widehat{\textbf{Y}_{\textbf{i}}}-\bar{\textbf{Y}})^{2}}\\ \hline\text{A} & 15 & 5 & 10.132 & 39.0625 & 23.698 & 1.909 \\ \hline \text{B} & 6 & 0.7 & 6.103 & 7.5625 & 0.0107 & 7.005 \\ \hline \text{C} & 10 & 8 & 12.942 & 1.5625 & 8.658 & 17.58\\ \hline\text{D} & 4 & 0.4 & 5.822 & 22.5625 & 3.321 & 8.57\\ \hline \textbf{Total} & & & & 70.75 & 35.687 & 35.064\\ \hline\textbf{Mean} & 8.75 & & & & &\\ \end{array}}$$

From Exhibit 1 above, we see that:

Sum of squares error = 35.687

Sum of squares regression = 35.064

Sum of squares total = 35.687 + 35.064 = 70.75

These sums of squares will be important inputs when we measure the fit of the regression line.
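As a rough check on Exhibit 1, the sketch below refits the regression from the ROA and CAPEX columns and recomputes the three sums of squares. The helper name `decompose_sum_of_squares` is hypothetical, and the fit uses the textbook least-squares formulas rather than any particular library. Because the exhibit rounds the predicted values, the results should agree with it only to about two decimal places.

```python
# ROA (Y) and CAPEX (X) for companies A to D, copied from Exhibit 1.
roa = [15.0, 6.0, 10.0, 4.0]
capex = [5.0, 0.7, 8.0, 0.4]


def decompose_sum_of_squares(x, y):
    """Fit y = b0 + b1 * x by least squares and return (SST, SSR, SSE).

    Hypothetical helper written for illustration only.
    """
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Slope and intercept of the least-squares line.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # Predicted values from the fitted line.
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)                # variation to be explained
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # variation explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # variation unexplained
    return sst, ssr, sse


sst, ssr, sse = decompose_sum_of_squares(capex, roa)
print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}")
print("SST equals SSR + SSE:", abs(sst - (ssr + sse)) < 1e-9)
```

Computing the predictions inside the function, instead of typing in the rounded values from the exhibit, keeps the identity SST = SSR + SSE exact up to floating-point error.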

Measures of Goodness of Fit

The standard error of the regression, the F-statistic for the test of fit, and the coefficient of determination are all measures used to evaluate how well the regression model fits the data (goodness of fit). The coefficient of determination, or \(R^2\), measures the proportion of the total variability of the dependent variable explained by the independent variable. \(R^2\) is calculated using the formula:

 $${\text{Coefficient  of  Determination}}=\frac{\text{Sum of Squares Regression}}{\text{Sum of Squares Total}}$$

$${\text{Coefficient of Determination}} =\frac{{\sum_{i=1}^n(\hat{Y_i}-\bar{Y})^2}}{{\sum_{i=1}^n(Y_i-\bar{Y})^2}}$$

The coefficient of determination ranges from 0% to 100%. From Exhibit 1 on the ROA regression model, \(R^2\) is 35.064 ÷ 70.75 = 0.4956, or 49.56%, which means that CAPEX explains 49.56% of the variation in ROA. The coefficient of determination is descriptive; it is not a statistical test. To assess the statistical significance of a regression model, we use the F-distributed test statistic, which compares two variances. In regression analysis, the F-distributed test statistic tests the null hypothesis that all slopes in the regression are equal to zero against the alternative hypothesis that at least one slope is not equal to zero. In simple linear regression there is only one slope, so the test reduces to whether that slope is equal to zero.
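The \(R^2\) arithmetic for the ROA model can be reproduced in a few lines. This is a minimal sketch: the function name `coefficient_of_determination` is an assumption made for illustration, and the inputs are the rounded totals from Exhibit 1.

```python
def coefficient_of_determination(ssr, sst):
    """Return R^2 = SSR / SST (hypothetical helper for illustration)."""
    if sst <= 0:
        raise ValueError("SST must be positive")
    return ssr / sst


# Totals from Exhibit 1: SSR = 35.064, SST = 70.75.
r_squared = coefficient_of_determination(35.064, 70.75)
print(f"R^2 = {r_squared:.4f} ({r_squared:.2%})")
```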

The F-distributed test statistic is formed from the sum of squares regression and the sum of squares error, each adjusted for its degrees of freedom. The sum of squares regression is divided by the number of independent variables, k, to arrive at the mean square regression (MSR). In simple linear regression, k is equal to 1.

$${\text{MSR}}=\frac{\text{Sum of Squares Regression}}{\text{k}}$$


Next, we divide the sum of squares error by its degrees of freedom to calculate the mean square error (MSE). In simple linear regression, the degrees of freedom \(n-k-1\) become \(n-2\).

$${\text{MSE}}=\frac{\text{Sum of Squares Error}}{n-k-1}$$


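Therefore, the F-distributed test statistic is:

$$F=\frac{\text{MSR}}{\text{MSE}}=\frac{\frac{\text{Sum of Squares Regression}}{k}}{\frac{\text{Sum of Squares Error}}{n-k-1}}$$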

The F-test in regression analysis is one-sided. The rejection region lies in the right tail because we want to determine whether the variation in the numerator (the variation in Y explained) is larger than the variation in the denominator (the variation in Y unexplained).
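To make the degrees-of-freedom adjustment concrete, the sketch below computes MSR, MSE, and their ratio for the ROA model in Exhibit 1, where n = 4 and k = 1. The function name `f_statistic` is hypothetical. The result would still need to be compared with a critical value from the F-distribution with k and n-k-1 degrees of freedom, which this sketch does not look up.

```python
def f_statistic(ssr, sse, n, k=1):
    """Return (MSR, MSE, F) for a regression with k independent variables.

    Hypothetical helper: MSR = SSR / k, MSE = SSE / (n - k - 1), F = MSR / MSE.
    """
    df_error = n - k - 1
    if k < 1 or df_error < 1:
        raise ValueError("need k >= 1 and n - k - 1 >= 1")
    msr = ssr / k
    mse = sse / df_error
    return msr, mse, msr / mse


# Exhibit 1 totals: SSR = 35.064, SSE = 35.687, four companies, one regressor.
msr, mse, f_value = f_statistic(ssr=35.064, sse=35.687, n=4, k=1)
print(f"MSR={msr:.3f}  MSE={mse:.4f}  F={f_value:.4f}")
```

With only four observations, the error term has just two degrees of freedom, so MSE is half of SSE and the F-statistic is much smaller than the raw SSR-to-SSE ratio might suggest.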


James, an analyst at QPC LTD, has estimated a model that regresses a company's return on equity (ROE) against its growth opportunities (GO), measured as its three-year compounded annual growth rate in sales over the past 15 years. He estimated the sum of squares error and the sum of squares regression as follows:

Sum of Squares Error= 48.99

Sum of Squares Regression= 192.3

The Coefficient of Determination is closest to:

  A. 241.29
  B. 0.797
  C. 0.8927


The correct answer is B.

$${\text{The Coefficient of Determination}} = \frac{\text{Sum of Squares Regression}}{\text{Sum of Squares Total}}$$

First, we calculate the sum of squares total by adding the sum of squares regression to the sum of squares error: 192.3 + 48.99 = 241.29.

$$R^2=\frac{192.3}{241.29}=0.797 \text{ or } 79.7\%$$

A is incorrect. 241.29 is the sum of squares total.

C is incorrect. 0.8927 is R, which is the square root of the coefficient of determination.

Reading 0: Introduction to Linear Regression

LOS 0 (d) Calculate and interpret the coefficient of determination and the F-statistic in a simple linear regression
