Coefficient of Determination and F-statistic

29 Oct 2021

Sum of Squares Total (SST) and Its Components

Sum of Squares Total (SST), also called the total variation, measures the total variation of the dependent variable. It is the sum of the squared differences between each observed y-value and the mean of the y-observations.

$$\text{SST} = \sum_{i=1}^{n}{(Y_i-\bar{Y})^2}$$

The Sum of Squares Total contains two parts:

Sum of Squares Regression (SSR).
Sum of Squares Error (SSE).

Sum of Squares Regression (SSR)

The sum of squares regression measures the explained variation in the dependent variable. It is the sum of the squared differences between each predicted y-value (\(\widehat{Y}_i\)) and the mean of the y-observations (\(\bar{Y}\)):

$$\text{SSR} = \sum_{i=1}^{n}{(\widehat{Y}_i-\bar{Y})^2}$$

Sum of Squared Errors (SSE)

The sum of squared errors is also called the residual sum of squares. It is the variation of the dependent variable that is left unexplained by the independent variable. SSE is the sum of the squared differences between each actual y-value (\(Y_i\)) and the corresponding predicted y-value (\(\widehat{Y}_i\)):

$$\text{SSE} = \sum_{i=1}^{n}{(Y_i-\widehat{Y}_i)^2}$$

Therefore, the sum of squares total is given by:

$$\begin{align}\text{Sum of Squares Total} &= \text{Explained Variation + Unexplained Variation}\\&= \text{SSR+ SSE}\end{align}$$
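The decomposition can be checked numerically. The Python sketch below fits a least-squares line to made-up data and compares SSR + SSE with SST. The x and y values and the helper names fit_simple_ols and sums_of_squares are assumptions introduced for illustration only.

```python
# Hypothetical observations, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for a simple linear regression of y on x."""
    b0, b1 = fit_simple_ols(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]                       # predicted values
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained variation
    return sst, ssr, sse

sst, ssr, sse = sums_of_squares(x, y)
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```

The two printed totals should agree up to floating-point rounding. The identity relies on the line being fitted by least squares with an intercept. For an arbitrary line, SSR + SSE need not equal SST.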

The components of the total variation are shown in the following figure.

Coefficient of Determination

The coefficient of determination (\(R^2\)) measures the proportion of the total variability of the dependent variable explained by the independent variable. It is calculated using the formula below:

$$R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}}=\frac{\text{Sum of Squares Regression (SSR)}}{\text{Sum of Squares Total (SST)}}$$

Intuitively we can think of the above formula as:

$$\begin{align}R^2&=\frac{\text{Total Variation-Unexplained Variation}}{\text{Total Variation}}\\ &=\frac{\text{Sum of Squares Total (SST)-Sum of Squared Errors (SSE)}}{\text{Sum of Squares Total}}\end{align}$$

Simplifying the above formula gives:

$$R^2 =1-\frac{\text{Sum of Squared Errors (SSE)}}{\text{Sum of Squares Total (SST)}}$$

Features of Coefficient of Determination (\(R^2\))

\(R^2\) lies between 0 and 1. A model with a high \(R^2\) explains more of the variability of the dependent variable than a model with a low \(R^2\). If \(R^2= 0.01\), only 1% of the total variability is explained. On the other hand, if \(R^2= 0.90\), 90% of the total variability is explained. In a nutshell, the higher the \(R^2\), the higher the explanatory power of the model.

For models with one independent variable, \(R^2\) is calculated by squaring the correlation coefficient between the dependent and the independent variables:

$$R^2=\left(\frac{Cov\left(X,Y\right)}{\sigma_X\sigma_Y}\right)^2$$

Where:

\(Cov\ (X,Y)\) = covariance between two variables, X and Y.

\(\sigma_X\) = standard deviation of X.

\(\sigma_{Y}\) = standard deviation of Y.
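As a rough check of this equivalence, the Python sketch below computes \(R^2\) twice for the same made-up data: once as 1 - SSE/SST from a fitted least-squares line, and once as the squared sample correlation. The data and the function names are assumptions for illustration, and sample (n - 1) versions of the covariance and standard deviations are used.

```python
import math

# Hypothetical observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

def mean(values):
    return sum(values) / len(values)

def r_squared_from_sums(x, y):
    """R^2 = 1 - SSE/SST for the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

def r_squared_from_correlation(x, y):
    """R^2 = (Cov(X, Y) / (sigma_X * sigma_Y))^2 using sample statistics."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    sd_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return (cov_xy / (sd_x * sd_y)) ** 2

r2_sums = r_squared_from_sums(x, y)
r2_corr = r_squared_from_correlation(x, y)
print(f"R^2 from sums of squares: {r2_sums:.6f}")
print(f"R^2 from correlation:     {r2_corr:.6f}")
```

The two values should match up to rounding. The shortcut applies only to models with a single independent variable.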

Example: Calculating Coefficient of Determination (\(R^2\))

An analyst determines that \(\sum_{i=1}^{6}\left(Y_i-\bar{Y}\right)^2=0.0013844\) and \(\sum_{i=1}^{6}\left(Y_i-\widehat{Y}_i\right)^2=0.0003206\) from the regression analysis of inflation on unemployment. The coefficient of determination (\(R^2\)) is closest to:

Solution

$$\begin{align}R^2&=\frac{\text{Sum of Squares Total (SST)-Sum of Squared Errors (SSE)}}{\text{Sum of Squares Total (SST)}}\\ &=\frac{\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}-\sum_{i=1}^{n}\left(Y_{i}-\widehat{Y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}}\\ &=\frac{0.0013844-0.0003206}{0.0013844}=0.7684=76.84\%\end{align}$$
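The same arithmetic can be reproduced in a few lines of Python. This is only a sketch of the calculation above. The variable names are chosen for illustration and the two inputs are the sums given in the example.

```python
# Inputs taken from the example: total and unexplained variation.
sst = 0.0013844   # sum of squared deviations of Y from its mean
sse = 0.0003206   # sum of squared residuals

ssr = sst - sse            # explained variation
r_squared = ssr / sst      # equivalently 1 - sse / sst

print(f"SSR = {ssr:.7f}")
print(f"R^2 = {r_squared:.4f} ({r_squared:.2%})")
```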

F-statistic in Simple Regression Model

Note that the coefficient of determination discussed above is just a descriptive value. To check the statistical significance of a regression model, we use the F-test. The F-test requires us to calculate the F-statistic.

The F-test tests whether the slope (denoted by \(b_1\)) in a regression model is equal to zero. In a typical simple linear regression hypothesis test, the null hypothesis is formulated as \(H_0:b_1=0\) against the alternative hypothesis \(H_1:b_1\neq 0\). The null hypothesis is rejected if the calculated F-statistic exceeds the critical F-value at the desired significance level.

The F-statistic is calculated from the Sum of Squares Regression (SSR) and the Sum of Squares Error (SSE), each adjusted for its degrees of freedom.

The Sum of Squares Regression is divided by the number of independent variables, \(k\), to get the Mean Square Regression (MSR). That is:

$$MSR=\frac{SSR}{k}\ =\ \frac{\sum_{i\ =\ 1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2}{k}$$

Since \(k = 1\) in a simple linear regression model, the above formula becomes:

$$MSR=\frac{SSR}{1}=\frac{\sum_{i\ =\ 1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2}{1}=\sum_{i\ =\ 1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2$$

Therefore, in a simple linear regression model, MSR = SSR.

Also, the Sum of Squares Error (SSE) is divided by degrees of freedom given by \(n-k-1\) (this translates to \(n-2\) for simple linear regression) to arrive at Mean Square Error (MSE). That is,

$$MSE=\frac{\text{Sum of Squares Error (SSE)}}{n-k-1}=\frac{\sum_{i=1}^{n}\left(Y_i-\widehat{Y}_i\right)^2}{n-k-1}$$

For a simple linear regression model,

$$MSE=\frac{\text{Sum of Squares Error (SSE)}}{n-2}=\frac{\sum_{i=1}^{n}\left(Y_i-\widehat{Y}_i\right)^2}{n-2}$$

Finally, to calculate the F-statistic for the linear regression, we find the ratio of MSR to MSE. That is,

$$\text{F-statistic}=\frac{MSR}{MSE}=\frac{\frac{SSR}{k}}{\frac{SSE}{n-k-1}}=\frac{\frac{\sum_{i=1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2}{k}}{\frac{\sum_{i=1}^{n}\left(Y_i-\widehat{Y}_i\right)^2}{n-k-1}}$$

For simple linear regression, this translates to:

$$\text{F-statistic}=\frac{MSR}{MSE}=\frac{\frac{SSR}{1}}{\frac{SSE}{n-2}}=\frac{\sum_{i=1}^{n}\left(\widehat{Y}_i-\bar{Y}\right)^2}{\frac{\sum_{i=1}^{n}\left(Y_i-\widehat{Y}_i\right)^2}{n-2}}$$

The F-statistic in simple linear regression is F-distributed with \(1\) and \(n-2\) degrees of freedom. That is,

$$\frac{MSR}{MSE}\sim F_{1,n-2}$$
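To make the formula concrete, the Python sketch below applies it to the figures from the \(R^2\) example earlier in this article (SST = 0.0013844, SSE = 0.0003206, and n = 6 observations, as indicated by the summation limits). The function name f_statistic is an assumption chosen for illustration.

```python
def f_statistic(ssr, sse, n, k=1):
    """Return MSR / MSE for a regression with k independent variables."""
    msr = ssr / k              # mean square regression
    mse = sse / (n - k - 1)    # mean square error
    return msr / mse

# Figures from the earlier R^2 example.
sst = 0.0013844
sse = 0.0003206
n = 6                          # number of observations
ssr = sst - sse                # explained variation

f_stat = f_statistic(ssr, sse, n)
print(f"MSR = {ssr / 1:.7f}")
print(f"MSE = {sse / (n - 2):.8f}")
print(f"F   = {f_stat:.2f}")
```

The result is compared with the critical value of the F-distribution with 1 and 4 degrees of freedom. At a 5% significance level that critical value is approximately 7.71, so under that assumed significance level the null hypothesis of a zero slope would be rejected.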

Interpretation of F-test Statistic

A large F-statistic suggests that the regression model explains much of the variation in the dependent variable, while a small F-statistic suggests that it explains little. At the extreme, an F-statistic of 0 indicates that the independent variable explains none of the variation in the dependent variable.

We reject the null hypothesis if the calculated value of the F-statistic is greater than the critical F-value.

It is worth mentioning that F-statistics are not commonly used in regressions with one independent variable. This is because the F-statistic is equal to the square of the t-statistic for the slope coefficient, so the F-test leads to the same conclusion as the t-test.
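The relationship between the two statistics can be checked numerically. The Python sketch below uses made-up data and the standard formula for the standard error of the slope, \(\sqrt{MSE/\sum(X_i-\bar{X})^2}\), which is not derived in this article and is assumed here. All variable names are illustrative.

```python
import math

# Hypothetical observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 3.1, 6.4, 7.2, 10.8, 11.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Sums of squares from the fitted values.
y_hat = [b0 + b1 * xi for xi in x]
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# F-statistic: MSR = SSR because k = 1, and MSE = SSE / (n - 2).
mse = sse / (n - 2)
f_stat = ssr / mse

# t-statistic for H0: b1 = 0, using the assumed standard-error formula.
se_b1 = math.sqrt(mse / s_xx)
t_stat = b1 / se_b1

print(f"F   = {f_stat:.4f}")
print(f"t^2 = {t_stat ** 2:.4f}")
```

The two printed values should coincide up to rounding, which is why the F-test adds no information beyond the slope t-test when there is only one independent variable.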
