Coefficient of Determination and F-statistic

Coefficient of Determination and F-statistic

Sum of Squares Total (SST) and Its Components

Sum of Squares Total (total variation) is a measure of the total variation of the dependent variable. It is the sum of the squared differences of the actual y-value and mean of y-observations.

$$\text{SST} = \sum_{i=1}^{n}{(Y_i-\bar{Y})^2}$$

The Sum of Squares Total contains two parts:

  1. Sum of Square Regression (SSR).
  2. Sum of Squares Error (SSE).

Sum of Squares Regression (SSR)

The sum of squares regression is the measure of the explained variation in the dependent variable. It is given by the sum of the squared differences of the predicted y-value (\(\widehat{Y}\)), and mean of y-observations, (\(\bar{Y}\)):

$$\text{SSR} = \sum_{i=1}^{n}{(\widehat{Y}_i-\bar{Y})^2}$$

Sum of Squared Errors (SSE)

The sum of squared errors is also called the residual sum of squares. It is defined as the variation of the dependent variable unexplained by the independent variable. SSE is given by the sum of the squared differences of the actual y-value (\(Y_i\)), and the predicted y-values, (\(\widehat{Y}_i\)).

$$\text{SSE} = \sum_{i=1}^{n}{(Y_i-\widehat{Y}_i)^2}$$

Therefore, the sum of squares total is given by:

$$\begin{align}\text{Sum of Squares Total} &= \text{Explained Variation + Unexplained Variation}\\&= \text{SSR+ SSE}\end{align}$$

The components of the total variation are shown in the following figure.


Coefficient of Determination

The coefficient of determination (\(R^2\)) measures the proportion of the total variability of the dependent variable explained by the independent variable. It is calculated using the formula below:

$$R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}}=\frac{\text{Sum of Squares Regression (SSR)}}{\text{Sum of Squares Total (SST)}}$$

Intuitively we can think of the above formula as:

$$\begin{align}R^2&=\frac{\text{Total Variation-Unexplained Variation}}{\text{Total Variation}}\\ &=\frac{\text{Sum of Squares Total (SST)-Sum of Squared Errors (SSE)}}{\text{Sum of Squares Total}}\end{align}$$

Simplifying the above formula gives:

$$R^2 =1-\frac{\text{Sum of Squared Errors (SSE)}}{\text{Sum of Squares Total (SST)}}$$

Features of Coefficient of Determination (\(R^2\))

\(R^2\) lies between 0 and 1. A high \(R^2\) explains variability better than a low \(R^2\). If \(R^2= 0.01\), only 1% of the total variability can be explained. On the other hand, if \(R^2= 0.90\), over 90% of the total variability can be explained. In a nutshell, the higher the \(R^2\), the higher the explanatory power of the model.

For models with one independent variable, \(R^2\) is calculated by squaring the correlation coefficient between the dependent and the independent variables:



\(Cov\ (X,Y)\) = covariance between two variables, X and Y.

\(\sigma_X\) = standard deviation of X.

\(\sigma_{Y}\) = standard deviation of Y.

Example: Calculating Coefficient of Determination (\(R^2\))

An analyst determines that \(\sum_{i\ =\ 1}^{6}{\left(Y_i-\bar{Y}\right)^2=\ }0.0013844\) and \( \sum_{i\ =\ 1}^{6}\left(Y_i-\widehat{Y}\right)^2= 0.0003206\) from the regression analysis of inflation on unemployment. The coefficient of determination (\(R^2\)) is closest to:


$$\begin{align}R^2&=\frac{\text{Sum of Squares Total (SST)-Sum of Squarred Errors (SSE) }}{\text{Sum of Squares Total (SST)}}\\ &=\frac{\left(\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}-\sum_{i=1}^{n}\left(Y_{i}-\widehat{Y}\right)^{2}\right)}{\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}}\\ &=\frac{0.0013844-0.0003206}{0.0013844}=0.7684=76.84\%\end{align}$$

F-statistic in Simple Regression Model

Note that the coefficient of variation discussed above is just a descriptive value. To check the statistical significance of a regression model, we use the F-test. The F-test requires us to calculate the F-statistic.

The F-test confirms whether the slope (denoted by \(b_i\)) in a regression model is equal to zero. In a typical simple linear regression hypothesis, the null hypothesis is formulated as: \(H_0:b_1=0\) against the alternative hypothesis, \(H_1:b_1\neq 0\). The null hypothesis is rejected if the confidence interval at the desired significance level excludes zero.

The Sum of Squares Regression (SSR) and Sum of Squares Error (SSE) are employed to calculate the F-statistic. In the calculation, both the Sum of Squares Regression (SSR) and Sum of Squares Error (SSE) are adjusted for the degrees of freedom.

The Sum of Squares Regression is divided by the number of independent variables, \(k\), to get the Mean Square Regression (MSR). That is:

$$MSR=\frac{SSR}{k}\ =\ \frac{\sum_{i\ =\ 1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2}{k}$$

Since we only have \(k =1\), in a simple linear regression model, the above formula changes to:

$$MSR=\frac{SSR}{1}=\frac{\sum_{i\ =\ 1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2}{1}=\sum_{i\ =\ 1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2$$

Therefore, in Simple Linear Regression Model, MSR = SSR.

Also, the Sum of Squares Error (SSE) is divided by degrees of freedom given by \(n-k-1\) (this translates to \(n-2\) for simple linear regression) to arrive at Mean Square Error (MSE). That is,

$$MSE=\frac{\text{Sum of Squares Error (SSE)}}{n-k-1}=\frac{\sum_{i=1}^{n}\left(Y_i-\widehat{Y}\right)^2}{n-k-1}$$

For a simple linear regression model,

$$MSE\ =\frac{\text{Sum of Squares Error(SSE)}}{n-2}\ \ =\ \ \frac{\sum_{i\ \ =\ \ 1\ }^{n}\left(Y_i-\widehat{Y}\right)^2}{n-2}$$

Finally, to calculate the F-statistic for the linear regression, we find the ratio of MSR to MSE. That is,

$$ F-statistic\ =\ \frac{MSR}{MSE}\ =\ \frac{\frac{SSR}{k}}{\frac{SSE}{n-k-1}}\ =\ \frac{\frac{\sum_{i=1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2}{k}}{\frac{\sum_{i\ =\ 1\ }^{n}\left(Y_i-\widehat{Y}\right)^2}{n-k-1}} $$

For simple linear regression, this translates to:

$$F-statistic=\frac{MSR}{MSE}=\frac{\frac{SSR}{k}}{\frac{SSE}{n-k-1}}\ =\ \frac{\frac{\sum_{i=1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2}{1}}{\frac{\sum_{i\ =\ 1\ }^{n}\left(Y_i-\widehat{Y}\right)^2}{n-2}}\ =\ \frac{\sum_{i\ =\ 1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2}{\frac{\sum_{i\ =\ 1}^{n}\left(Y_i-\widehat{Y}\right)^2}{n-2}}$$

The F-statistic in simple linear regression is F-distributed with \(1\) and \(n-2\) degrees of freedom. That is,

$$\frac{MSR}{MSE}\sim F_{1,n-2}$$

Interpretation of F-test Statistic

A large F-statistic value proves that the regression model is effective in its explanation of the variation in the dependent variable and vice versa. On the contrary, an F-statistic of 0 indicates that the independent variable does not explain the variation in the dependent variable.

We reject the null hypothesis if the calculated value of the F-statistic is greater than the critical F-value.

It is worth mentioning that F-statistics are not commonly used in regressions with one independent variable. This is because F-statistic is equal to the square of the t-statistic for the slope coefficient, which implies the same thing as the t-test.

Shop CFA® Exam Prep

Offered by AnalystPrep

Featured Shop FRM® Exam Prep Learn with Us

    Subscribe to our newsletter and keep up with the latest and greatest tips for success
    Shop Actuarial Exams Prep Shop Graduate Admission Exam Prep

    Sergio Torrico
    Sergio Torrico
    Excelente para el FRM 2 Escribo esta revisión en español para los hispanohablantes, soy de Bolivia, y utilicé AnalystPrep para dudas y consultas sobre mi preparación para el FRM nivel 2 (lo tomé una sola vez y aprobé muy bien), siempre tuve un soporte claro, directo y rápido, el material sale rápido cuando hay cambios en el temario de GARP, y los ejercicios y exámenes son muy útiles para practicar.
    So helpful. I have been using the videos to prepare for the CFA Level II exam. The videos signpost the reading contents, explain the concepts and provide additional context for specific concepts. The fun light-hearted analogies are also a welcome break to some very dry content. I usually watch the videos before going into more in-depth reading and they are a good way to avoid being overwhelmed by the sheer volume of content when you look at the readings.
    Kriti Dhawan
    Kriti Dhawan
    A great curriculum provider. James sir explains the concept so well that rather than memorising it, you tend to intuitively understand and absorb them. Thank you ! Grateful I saw this at the right time for my CFA prep.
    nikhil kumar
    nikhil kumar
    Very well explained and gives a great insight about topics in a very short time. Glad to have found Professor Forjan's lectures.
    Great support throughout the course by the team, did not feel neglected
    Benjamin anonymous
    Benjamin anonymous
    I loved using AnalystPrep for FRM. QBank is huge, videos are great. Would recommend to a friend
    Daniel Glyn
    Daniel Glyn
    I have finished my FRM1 thanks to AnalystPrep. And now using AnalystPrep for my FRM2 preparation. Professor Forjan is brilliant. He gives such good explanations and analogies. And more than anything makes learning fun. A big thank you to Analystprep and Professor Forjan. 5 stars all the way!
    michael walshe
    michael walshe
    Professor James' videos are excellent for understanding the underlying theories behind financial engineering / financial analysis. The AnalystPrep videos were better than any of the others that I searched through on YouTube for providing a clear explanation of some concepts, such as Portfolio theory, CAPM, and Arbitrage Pricing theory. Watching these cleared up many of the unclarities I had in my head. Highly recommended.