ANOVA Table and Measures of Goodness of Fit

R-squared \(\bf{(R^2)}\) measures how well an estimated regression fits the data. It is also known as the coefficient of determination and can be formulated as:

$$ R^2=\frac{\text{Sum of regression squares}}{\text{Sum of squares total}}=\frac{\sum_{i=1}^{n}{(\widehat{Y_i}-\bar{Y})^2}}{\sum_{i=1}^{n}{(Y_i-\bar{Y})^2}} $$

Where:

\(n\) = Number of observations.

\(Y_i\) = Dependent variable observations.

\(\widehat{Y_i}\) = Predicted value of the dependent variable, given the independent variable(s).

\(\bar{Y}\)= Dependent variable mean.
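The formula can be sketched in a few lines of Python. This is a minimal illustration, not a full regression routine: the data are hypothetical, the function name `r_squared` is an assumption, and the predicted values come from the least-squares line \(2.2 + 0.6x\) fitted to that data by hand.

```python
def r_squared(y, y_hat):
    """R^2 = sum of regression squares / sum of squares total."""
    y_bar = sum(y) / len(y)
    rss = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)      # total variation
    return rss / sst


# Hypothetical observations; y_hat uses the least-squares line 2.2 + 0.6 * x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]

r2 = r_squared(y, y_hat)
print(round(r2, 4))
```

Here the explained variation is 3.6 and the total variation is 6, so \(R^2\) is about 0.6.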

As independent variables are added to a regression, \(R^2\) will either increase or remain constant. For this reason, \(R^2\) alone is not a reliable measure of the goodness of fit of a multiple regression model: it never decreases when independent variables are added, even if they have little explanatory power.

Limitations of \(R^2\)

  • It is impossible to determine the statistical significance of the coefficients from \(R^2\).
  • A bias in the predicted coefficients or estimates cannot be determined with \(R^2\).
  • \(R^2\) does not tell us whether the model's fit is good. A high \(R^2\) can be the result of overfitting or biases in the model rather than a genuinely good fit, and a low \(R^2\) does not by itself make a model bad.

An overfitted regression model is one with too many independent variables relative to the number of observations in the sample. Overfitting may produce coefficients that do not reflect the true relationship between the independent and dependent variables.

Multiple regression software packages usually produce an adjusted \(\bf{R^2} (\bar{R}^2)\) as an alternative measure of goodness of fit. Using adjusted \(R^2\) in regression is beneficial since it does not automatically increase when more independent variables are included, given that it adjusts for degrees of freedom.

$$ \bar{R^2}=1-\left[\cfrac{\frac{\text{Sum of squares error} }{n-k-1}}{\frac{\text{Sum of squares total}}{n-1}}\right] $$

Therefore, the relationship between \(\bar{R^2}\) and \(R^2\) can mathematically be derived as follows:

$$ \bar{R^2}=1-\left[\left(\frac{n-1}{n-k-1}\right)\ \left(1-R^2\right)\right] $$
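The step between the two expressions for \(\bar{R^2}\) can be written out as follows. This is a short sketch that assumes the regression includes an intercept, so that \(SST = RSS + SSE\) and therefore \(\frac{SSE}{SST} = 1 - R^2\):

$$ \begin{align*} \bar{R^2} & =1-\frac{SSE/(n-k-1)}{SST/(n-1)} \\ & =1-\left(\frac{n-1}{n-k-1}\right)\frac{SSE}{SST} \\ & =1-\left(\frac{n-1}{n-k-1}\right)\left(1-R^2\right) \end{align*} $$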

Note that:

  • If \(k \geq 1\) and \(R^2 < 1\), then \(R^2 > \text{adjusted } R^2\). As a result, adjusted \(R^2\) can be negative, whereas \(R^2\) is zero at minimum.
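A small sketch of the adjustment shows how a weak fit can produce a negative adjusted \(R^2\). The function name `adjusted_r_squared` and the input values are hypothetical, chosen only for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


# Hypothetical inputs: 10 observations, 2 independent variables
weak_fit = adjusted_r_squared(0.05, n=10, k=2)    # low R^2
strong_fit = adjusted_r_squared(0.90, n=10, k=2)  # high R^2

print(round(weak_fit, 4), round(strong_fit, 4))
```

With \(R^2 = 0.05\) the adjusted value is roughly \(-0.22\), while with \(R^2 = 0.90\) it is roughly 0.87; in both cases it is below the unadjusted \(R^2\).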

When including a new variable in the regression, the following should be taken into consideration:

  • \(\bar{R^2}\) increases when the absolute value of the new coefficient's t-statistic is greater than 1.0, that is, \(\left|t\right| > 1.0\).
  • \(\bar{R^2}\) decreases when the absolute value of the new coefficient's t-statistic is less than 1.0, that is, \(\left|t\right| < 1.0\).
  • At typical significance levels, 5% and 1%, a t-statistic with an absolute value of 1.0 does not indicate that the coefficient of the independent variable is different from zero. Therefore, an increase in adjusted \(R^2\) does not by itself show that the added variable is statistically significant.

ANOVA Table

One of the outputs of multiple regression is the ANOVA table. The following shows the general structure of an ANOVA table.

$$ \begin{array}{c|c|c|c} \textbf{ANOVA} & \textbf{Df (degrees} & \textbf{SS (Sum of squares)} & \textbf{MSS (Mean sum} \\ & \textbf{of freedom)} & & \textbf{of squares)}\\ \hline \text{Regression} & k & \text{RSS} & MSR \\ & & \text{(Explained variation)} & \\ \hline \text{Residual} & n-(k+1) & \text{SSE} & MSE \\ & & \text{(Unexplained variation)} & \\ \hline \text{Total} & n-1 & \text{SST} & \\ & & \text{(Total variation) } & \end{array} $$

We can use the information in an ANOVA table to determine \(R^2\), the F-statistic, and the standard error of estimate (SEE) as expressed below:

$$ R^2=\frac{RSS}{SST} $$

$$ F=\frac{MSR}{MSE} $$

$$ SEE=\sqrt{MSE} $$

Where:

$$ \begin{align*} MSR & =\frac{RSS}{k} \\ MSE & =\frac{SSE}{n-k-1} \end{align*} $$
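These relationships can be sketched as a small helper that derives the remaining ANOVA quantities from the two sums of squares. The function name `anova_summary` and the input values are hypothetical; the sketch assumes \(SST = RSS + SSE\), as in the table above.

```python
import math


def anova_summary(rss, sse, n, k):
    """Derive ANOVA-table statistics from the regression and residual sums of squares."""
    sst = rss + sse              # total variation
    msr = rss / k                # mean square regression
    mse = sse / (n - k - 1)      # mean square error
    return {
        "SST": sst,
        "MSR": msr,
        "MSE": mse,
        "R2": rss / sst,
        "F": msr / mse,
        "SEE": math.sqrt(mse),
    }


# Hypothetical inputs: 25 observations, 4 independent variables
summary = anova_summary(rss=80.0, sse=20.0, n=25, k=4)
for name, value in summary.items():
    print(name, round(value, 4))
```

For these inputs, \(MSR = 20\), \(MSE = 1\), \(R^2 = 0.8\), \(F = 20\), and \(SEE = 1\).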

Example: Interpreting Regression Output

Consider the following regression results generated from multiple regression analysis of the price of the US Dollar index on the inflation rate and real interest rate.

$$ \begin{array}{cccc} \text{ANOVA} & & & \\ \hline & \text{df} & \text{SS} & \text{Significance F} \\ \hline \text{Regression} & 2 & 432.2520 & 0.0179 \\ \text{Residual} & 7 & 200.6349 & \\ \text{Total} & 9 & 632.8869 & \\ \hline \\ & \text{Coefficients} & \text{Standard Error} & \\ \hline \text{Intercept} & 81 & 7.9659 & \\ \text{Inflation rates} & -276 & 233.0748 & \\ \text{Real interest Rates} & 902 & 279.6949 & \\ \hline \end{array} $$

Given the above information, the regression equation can be expressed as:

$$ P=81-276INF+902IR $$

Where:

\(P\) = Price of USDX.

\(INF\) = Inflation rate.

\(IR\) = Real interest rate.

\(R^2\) and adjusted \(R^2\) can also be calculated as follows:

$$ \begin{align*} R^2 & =\frac{RSS}{SST}=\frac{432.2520}{632.8869}=0.6830=68.30\% \\ \\ \text{Adjusted } R^2 & =1-\left(\frac{n-1}{n-k-1}\right)\left(1-R^2\right)=1-\frac{10-1}{10-2-1}\left(1-0.6830\right) \\   & =0.5924 = 59.24\% \end{align*} $$
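The same figures, along with the F-statistic and SEE, can be checked with a short script. This is a sketch using only the numbers in the regression output above; the t-statistics are an addition that assumes the usual null hypothesis of a zero coefficient, so each is simply the coefficient divided by its standard error.

```python
import math

# Values taken from the regression output above
n, k = 10, 2
rss, sse, sst = 432.2520, 200.6349, 632.8869

r2 = rss / sst
adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)

msr = rss / k
mse = sse / (n - k - 1)
f_stat = msr / mse
see = math.sqrt(mse)

# Assumption: t-statistic for H0 "coefficient = 0" is coefficient / standard error
t_inf = -276 / 233.0748
t_ir = 902 / 279.6949

print(f"R2={r2:.4f} adj R2={adj_r2:.4f} F={f_stat:.4f} SEE={see:.4f}")
print(f"t(INF)={t_inf:.4f} t(IR)={t_ir:.4f}")
```

The script reproduces \(R^2 \approx 0.6830\) and adjusted \(R^2 \approx 0.5924\), and gives an F-statistic of about 7.54 and an SEE of about 5.35.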

It’s important to note the following:

  • Unlike \(R^2\) in simple regression, adjusted \(R^2\) in multiple regression has no straightforward interpretation as the proportion of the variance of the dependent variable that is explained by the independent variables.
  • Adjusted \(R^2\) does not indicate whether the regression's coefficient estimates and predictions are biased. Residual plots and other statistics are required to determine whether or not the predictions are accurate.
  • To assess the significance of the model’s fit, we use the F-Statistic and other goodness-of-fit metrics from the ANOVA rather than \(R^2\) and adjusted \(R^2\).

Question

Which of the following is most appropriate for adjusted \(R^2\)?

  A. It is always positive.
  B. It may or may not increase when one adds an independent variable.
  C. It is non-decreasing in the number of independent variables.

Solution

The correct answer is B.

The value of the adjusted \(R^2\) increases only when the added independent variables improve the fit of the regression model. Moreover, it decreases when the added variables do not improve the model fit sufficiently.

A is incorrect: The adjusted \(R^2\) can be negative if \(R^2\) is low enough. However, multiple \(R^2\) is never negative.

C is incorrect: The adjusted \(R^2\) can decrease when the added variables do not improve the model fit by a good enough amount. However, multiple \(R^2\) is non-decreasing in the number of independent variables. For this reason, it is less reliable as a measure of goodness of fit in regression with more than one independent variable than in a one-independent variable regression.
