**R-squared** \(\bf{(R^2)}\) measures how well an estimated regression fits the data. It is also known as the **coefficient of determination** and can be formulated as:

$$ R^2=\frac{\text{Sum of regression squares}}{\text{Sum of squares total}}=\frac{\sum_{i=1}^{n}{(\widehat{Y_i}-\bar{Y})}^2}{\sum_{i=1}^{n}{(Y_i-\bar{Y})}^2} $$

Where:

\(n\) = Number of observations.

\(Y_i\) = Dependent variable observations.

\(\widehat{Y_i}\) = Predicted value of the dependent variable given the independent variables.

\(\bar{Y}\)= Dependent variable mean.
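
The formula can be sketched in a few lines of Python. The observed and predicted values below are hypothetical, chosen only to exercise the sums of squares; note that the decomposition \(R^2 = \text{RSS}/\text{SST}\) holds exactly only for a least-squares fit with an intercept.

```python
# Numeric sketch of the R^2 formula above. The observed values (y) and
# model-predicted values (y_hat) are hypothetical, for illustration only.

y = [4.0, 5.1, 6.2, 6.8, 8.1]        # Y_i: observed dependent variable
y_hat = [4.1, 5.0, 6.0, 7.0, 7.9]    # predicted values (hypothetical)

y_bar = sum(y) / len(y)                        # dependent-variable mean
rss = sum((yh - y_bar) ** 2 for yh in y_hat)   # sum of regression squares
sst = sum((yi - y_bar) ** 2 for yi in y)       # sum of squares total

r_squared = rss / sst
print(round(r_squared, 4))
```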

As independent variables are added to a model, \(R^2\) will either increase or remain constant; it never decreases. Consequently, \(R^2\) on its own is a poor gauge of a model's goodness of fit. Its other limitations include the following:

- It is impossible to determine the statistical significance of the coefficients from \(R^2\).
- \(R^2\) cannot reveal whether the estimated coefficients are biased.
- A high \(R^2\) does not guarantee a good model: it may be the product of overfitting or biases in the model.

An **overfitted regression model** is one with too many independent variables relative to the number of observations in the sample. Overfitting may produce coefficients that do not reflect the true relationship between the independent and dependent variables.

Multiple regression software packages usually report an **adjusted** \(\bf{R^2} (\bar{R}^2)\) as an alternative measure of goodness of fit. Adjusted \(R^2\) is useful because it does not automatically increase when more independent variables are included; it adjusts for degrees of freedom.

$$ \bar{R^2}=1-\left[\cfrac{\frac{\text{Sum of squares error} }{n-k-1}}{\frac{\text{Sum of squares total}}{n-1}}\right] $$

Therefore, the relationship between \(\bar{R^2}\) and \(R^2\) can mathematically be derived as follows:

$$ \bar{R^2}=1-\left[\left(\frac{n-1}{n-k-1}\right)\ \left(1-R^2\right)\right] $$

Note that:

- If \(k \geq 1\), then \(R^2 > \text{adjusted } R^2\). Consequently, adjusted \(R^2\) can be negative, whereas \(R^2\) is at minimum zero.
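
This relationship can be checked numerically. A minimal sketch, with hypothetical values of \(R^2\), \(n\), and \(k\):

```python
# Sketch of the adjusted-R^2 relationship above; the R^2, n, and k
# values passed in are hypothetical.

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# With k >= 1, adjusted R^2 always sits below R^2:
print(adjusted_r_squared(0.40, n=20, k=3))

# It can even turn negative when R^2 is low and k is large:
print(adjusted_r_squared(0.10, n=12, k=5))
```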

When including a new variable in the regression, the following should be taken into consideration:

- \(\bar{R^2}\) increases when the absolute value of the new coefficient's t-statistic is greater than 1.0.
- \(\bar{R^2}\) decreases when the absolute value of the new coefficient's t-statistic is less than 1.0.
- At typical significance levels (5% and 1%), a t-statistic with an absolute value of 1.0 does not indicate that the coefficient is significantly different from zero. An increase in adjusted \(R^2\), therefore, does not by itself establish that the added variable is statistically significant.

One of the outputs of multiple regression is the ANOVA (analysis of variance) table. The following shows the general structure of an ANOVA table.

$$ \begin{array}{c|c|c|c} \textbf{ANOVA} & \textbf{Df (degrees} & \textbf{SS (Sum of squares)} & \textbf{MSS (Mean sum} \\ & \textbf{of freedom)} & & \textbf{of squares)}\\ \hline \text{Regression} & k & \text{RSS} & MSR \\ & & \text{(Explained variation)} & \\ \hline \text{Residual} & n-(k+1) & \text{SSE} & MSE \\ & & \text{(Unexplained variation)} & \\ \hline \text{Total} & n-1 & \text{SST} & \\ & & \text{(Total variation) } & \end{array} $$

We can use the information in an ANOVA table to determine \(R^2\), the F-statistic, and the standard error of estimate (SEE), as expressed below:

$$ R^2=\frac{RSS}{SST} $$

$$ F=\frac{MSR}{MSE} $$

$$ SEE=\sqrt{MSE} $$

Where:

$$ \begin{align*} MSR & =\frac{RSS}{k} \\ MSE & =\frac{SSE}{n-k-1} \end{align*} $$
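
Putting these formulas together, the following sketch derives \(R^2\), \(F\), and SEE from ANOVA quantities; the RSS, SSE, \(n\), and \(k\) values are hypothetical.

```python
import math

# Deriving R^2, the F-statistic, and SEE from ANOVA sums of squares,
# following the formulas above. RSS, SSE, n, and k are hypothetical.

def anova_stats(rss, sse, n, k):
    sst = rss + sse              # SST = RSS + SSE (total variation)
    msr = rss / k                # mean square regression
    mse = sse / (n - k - 1)      # mean square error
    return {"R^2": rss / sst, "F": msr / mse, "SEE": math.sqrt(mse)}

print(anova_stats(rss=90.0, sse=30.0, n=25, k=3))
```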

Consider the following results generated from a multiple regression of the price of the US Dollar Index (USDX) on the inflation rate and the real interest rate.

$$ \begin{array}{cccc} \text{ANOVA} & & & \\ \hline & \text{df} & \text{SS} & \text{Significance F} \\ \hline \text{Regression} & 2 & 432.2520 & 0.0179 \\ \text{Residual} & 7 & 200.6349 & \\ \text{Total} & 9 & 632.8869 & \\ \hline \\ & \text{Coefficients} & \text{Standard error} & \\ \hline \text{Intercept} & 81 & 7.9659 & \\ \text{Inflation rate} & -276 & 233.0748 & \\ \text{Real interest rate} & 902 & 279.6949 & \\ \hline \end{array} $$

Given the above information, the regression equation can be expressed as:

$$ P=81-276INF+902IR $$

Where:

\(P\) = Price of USDX.

\(INF\) = Inflation rate.

\(IR\) = Real interest rate.
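
The fitted equation can then be used to predict the index level. In the sketch below, the rates are assumed to be entered in decimal form (e.g., 3% inflation as 0.03), and the input values are hypothetical.

```python
# Using the fitted equation P = 81 - 276*INF + 902*IR.
# Assumption: rates are expressed as decimals; inputs are hypothetical.

def predict_usdx(inf, ir):
    return 81 - 276 * inf + 902 * ir

# e.g., 3% inflation and a 2% real interest rate:
print(predict_usdx(inf=0.03, ir=0.02))
```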

\(R^2\) and adjusted \(R^2\) can also be calculated as follows:

$$ \begin{align*} R^2 & =\frac{RSS}{SST}=\frac{432.2520}{632.8869}=0.6830=68.30\% \\ \\ \text{Adjusted } R^2 & =1-\left(\frac{n-1}{n-k-1}\right)\left(1-R^2\right)=1-\frac{10-1}{10-2-1}\left(1-0.6830\right) \\ & =0.5924 = 59.24\% \end{align*} $$
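
These figures can be re-derived directly from the ANOVA table in a few lines:

```python
# Re-deriving the R^2 and adjusted R^2 figures above from the ANOVA
# table: RSS = 432.2520, SST = 632.8869, n = 10, k = 2.

rss, sst = 432.2520, 632.8869
n, k = 10, 2

r2 = rss / sst
adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)

print(f"{r2:.4f}")       # 0.6830
print(f"{adj_r2:.4f}")   # 0.5924
```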

It’s important to note the following:

- Unlike in simple regression, adjusted \(R^2\) in multiple regression does not have a straightforward interpretation as the proportion of the variance of the dependent variable explained by the model.
- Adjusted \(R^2\) does not indicate whether the regression coefficients are biased. Residual plots and other diagnostics are required to assess the accuracy of the model's predictions.
- To assess the significance of the model's fit, we use the F-statistic and other goodness-of-fit metrics from the ANOVA output rather than \(R^2\) and adjusted \(R^2\) alone.

## Question

Which of the following is most appropriate for adjusted \(R^2\)?

A. It is always positive.

B. It may or may not increase when one adds an independent variable.

C. It is non-decreasing in the number of independent variables.

## Solution

The correct answer is **B**.

The value of the adjusted \(R^2\) increases only when the added independent variables improve the fit of the regression model. Moreover, it decreases when the added variables do not improve the model fit sufficiently.

A is incorrect: The adjusted \(R^2\) can be negative if \(R^2\) is low enough. Multiple \(R^2\), however, is never negative.

C is incorrect: The adjusted \(R^2\) can decrease when the added variables do not improve the model fit sufficiently. Multiple \(R^2\), however, is non-decreasing in the number of independent variables. For this reason, it is less reliable as a measure of goodness of fit in a regression with more than one independent variable than in a regression with a single independent variable.