Candidate’s objectives:

After completing this reading you should be able to:

- Construct, apply, and interpret hypothesis tests and confidence intervals for a single coefficient in a multiple regression.
- Construct, apply, and interpret joint hypothesis tests and confidence intervals for multiple coefficients in a multiple regression.
- Interpret the \(F\)-statistic.
- Interpret tests of a single restriction involving multiple coefficients.
- Interpret confidence sets for multiple coefficients.
- Identify examples of omitted variable bias in multiple regressions.
- Interpret the \({ R }^{ 2 }\) and adjusted \({ R }^{ 2 }\) in a multiple regression.

## Hypothesis Tests and Confidence Intervals for a Single Coefficient

This section covers the calculation of the standard error, hypothesis testing, and confidence interval construction for a single coefficient in a multiple regression equation.

### Standard Errors for the OLS Estimators

When dealing with a single regressor, the variance of the OLS estimator can be estimated by replacing expectations with sample averages, leading to the estimator \({ \hat { \sigma } }_{ \hat { \beta } }^{ 2 }\). Under the least squares assumptions, the law of large numbers ensures that these sample averages converge to their population counterparts. The standard deviation of the sampling distribution of \({ \hat { \beta } }_{ 1 }\) is estimated by the standard error of \({ \hat { \beta } }_{ 1 }\), \(SE\left( { \hat { \beta } }_{ 1 } \right) \), which is the square root of \({ \hat { \sigma } }_{ \hat { \beta } }^{ 2 }\).

This concept extends to multiple regression. Here, \(SE\left( { \hat { \beta } }_{ j } \right) \) estimates the standard deviation of \({ \hat { \beta } }_{ j }\), the OLS estimator of the \(j\)th regression coefficient. The standard error formula is most simply stated using matrices.

### Hypothesis Tests for a Single Coefficient

Suppose that we are testing the hypothesis that the true coefficient \({ \beta }_{ j }\) on the \(j\)th regressor takes on some specific value \({ \beta }_{ j,0 }\). Let the alternative hypothesis be two-sided. Therefore, the following is the mathematical expression of the two hypotheses:

$$ { H }_{ 0 }:{ \beta }_{ j }={ \beta }_{ j,0 }\quad vs.\quad { H }_{ 1 }:{ \beta }_{ j }\neq { \beta }_{ j,0 } $$

This expression represents the two-sided alternative. The following are the steps to follow while testing the null hypothesis:

- Computing the coefficient’s standard error.
- Computing the \(t\)-statistic, as previously described:
$$ t=\frac { { \hat { \beta } }_{ j }-{ \beta }_{ j,0 } }{ SE\left( { \hat { \beta } }_{ j } \right) } $$

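- Computing the test’s \(p\)-value, as previously described, where \({ t }^{ act }\) is the value of the \(t\)-statistic actually computed and \(\Phi\) is the standard normal cumulative distribution function:
$$ p\text{-value}=2\Phi \left( -|{ t }^{ act }| \right) $$

- Alternatively, comparing the \(t\)-statistic with the critical value corresponding to the desired significance level of the test (for example, \(\pm 1.96\) at the 5% level).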
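
The steps above can be sketched in Python using only the standard library. The coefficient estimate and standard error below are hypothetical values chosen for illustration, and the function names are not from any particular package.

```python
from statistics import NormalDist


def t_statistic(beta_hat, beta_null, se):
    """t = (estimate - hypothesized value) / standard error."""
    return (beta_hat - beta_null) / se


def two_sided_p_value(t_act):
    """Large-sample two-sided p-value: 2 * Phi(-|t_act|)."""
    return 2 * NormalDist().cdf(-abs(t_act))


# Hypothetical estimate and standard error (illustrative only).
beta_hat, se = -1.10, 0.43

t_act = t_statistic(beta_hat, beta_null=0.0, se=se)
p_value = two_sided_p_value(t_act)

# Equivalent decision rule at the 5% level: compare |t| with 1.96.
reject_at_5pct = abs(t_act) > 1.96

print(f"t = {t_act:.3f}, p-value = {p_value:.4f}, reject H0 at 5%: {reject_at_5pct}")
```

The \(p\)-value rule and the critical-value rule always agree: the \(p\)-value is below 0.05 exactly when \(|{ t }^{ act }|\) exceeds 1.96.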

### Confidence Intervals for a Single Coefficient

Suppose that an interval contains the true value of \({ \beta }_{ j }\) with a probability of 95%. This is the 95% two-sided confidence interval for \({ \beta }_{ j }\): it contains the true value of \({ \beta }_{ j }\) in 95% of all possible randomly drawn samples.

Equivalently, the 95% two-sided confidence interval for \({ \beta }_{ j }\) is the set of values that cannot be rejected by a two-sided hypothesis test at the 5% significance level. Therefore, with a large sample size:

$$ 95\%\text{ confidence interval for }{ \beta }_{ j }=\left[ { \hat { \beta } }_{ j }-1.96SE\left( { \hat { \beta } }_{ j } \right) ,{ \hat { \beta } }_{ j }+1.96SE\left( { \hat { \beta } }_{ j } \right) \right] $$
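
As a minimal sketch of this formula, the snippet below builds the interval for a hypothetical estimate and standard error (both values are illustrative assumptions) and checks whether zero lies inside it.

```python
def confidence_interval_95(beta_hat, se, z=1.96):
    """Large-sample 95% two-sided confidence interval for a single coefficient."""
    half_width = z * se
    return beta_hat - half_width, beta_hat + half_width


# Hypothetical estimate and standard error (illustrative only).
beta_hat, se = -1.10, 0.43

lower, upper = confidence_interval_95(beta_hat, se)

# A hypothesized value is rejected at the 5% level exactly when it falls outside the interval.
contains_zero = lower <= 0.0 <= upper

print(f"95% CI: [{lower:.4f}, {upper:.4f}], contains 0: {contains_zero}")
```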

## Tests of Joint Hypotheses

In this section, we consider the formulation of the joint hypotheses on multiple regression coefficients. We will further study the application of an \(F\)-statistic in their testing.

### Hypotheses Testing on Two or More Coefficients

#### Joint Null Hypothesis

In general, a hypothesis that imposes two or more restrictions on the regression coefficients is called a joint hypothesis. Suppose the joint null hypothesis and its alternative are:

$$ { H }_{ 0 }:{ \beta }_{ j }={ \beta }_{ j,0 },\ { \beta }_{ m }={ \beta }_{ m,0 },\ \dots \quad \text{for a total of }q\text{ restrictions,} $$

$$ \text{vs.}\quad { H }_{ 1 }:\ \text{one or more of the }q\text{ restrictions under }{ H }_{ 0 }\text{ does not hold.}\quad \quad \quad \text{(Equation I)} $$

In the above expression, \({ \beta }_{ j },{ \beta }_{ m },\dots \) refer to different regression coefficients, and \({ \beta }_{ j,0 },{ \beta }_{ m,0 },\dots \) are the values these coefficients take under the null hypothesis.

The joint null hypothesis is false if at least one of the equalities under the null hypothesis in equation \(I\) is false. This is the alternative hypothesis.

#### \(F\)-Statistic

The test of the joint hypothesis about regression coefficients calls for the application of the \(F\)-statistic.

**The \(F\)-statistic with \(q = 2\) restrictions**

Let the two restrictions in the joint null hypothesis be \({ \beta }_{ 1 }=0\) and \({ \beta }_{ 2 }=0\). The \(F\)-statistic combines the two \(t\)-statistics, \({ t }_{ 1 }\) and \({ t }_{ 2 }\), as shown below:

$$ F=\frac { 1 }{ 2 } \left( \frac { { t }_{ 1 }^{ 2 }+{ t }_{ 2 }^{ 2 }-2{ \hat { \rho } }_{ { t }_{ 1 },{ t }_{ 2 } }{ t }_{ 1 }{ t }_{ 2 } }{ 1-{ \hat { \rho } }_{ { t }_{ 1 },{ t }_{ 2 } }^{ 2 } } \right) $$

Here, \({ \hat { \rho } }_{ { t }_{ 1 },{ t }_{ 2 } }\) is an estimator of the correlation between the two \(t\)-statistics. If the \(t\)-statistics are uncorrelated, the terms involving \({ \hat { \rho } }_{ { t }_{ 1 },{ t }_{ 2 } }\) and \({ \hat { \rho } }_{ { t }_{ 1 },{ t }_{ 2 } }^{ 2 }\) drop out. As a result:

$$ F=\frac { 1 }{ 2 } \left( { t }_{ 1 }^{ 2 }+{ t }_{ 2 }^{ 2 } \right) $$

In other words, the \(F\)-statistic is the average of the squared \(t\)-statistics. Under the null hypothesis, \({ t }_{ 1 }\) and \({ t }_{ 2 }\) are independent standard normal random variables in large samples, so the distribution of \(F\) is an \({ F }_{ 2,\infty }\).
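
A short sketch of the \(q = 2\) formula follows. The two \(t\)-statistics and the correlation are hypothetical inputs. The \(p\)-value helper relies on a standard fact not derived in this reading: an \({ F }_{ 2,\infty }\) variable is a chi-squared variable with 2 degrees of freedom divided by 2, so \(Pr\left[ { F }_{ 2,\infty }>f \right] ={ e }^{ -f }\).

```python
import math


def f_stat_two_restrictions(t1, t2, rho):
    """F-statistic for q = 2 restrictions, given the two t-statistics and
    the estimated correlation rho between them."""
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho * t1 * t2) / (1 - rho ** 2)


def p_value_f2_inf(f_act):
    """P(F_{2,inf} > f_act); F_{2,inf} = chi2(2)/2 and P(chi2(2) > x) = exp(-x/2)."""
    return math.exp(-f_act)


# Hypothetical t-statistics (illustrative only).
t1, t2 = 2.0, -1.5

f_uncorrelated = f_stat_two_restrictions(t1, t2, rho=0.0)  # average of squared t's
f_correlated = f_stat_two_restrictions(t1, t2, rho=0.5)

print(f_uncorrelated, p_value_f2_inf(f_uncorrelated))
print(f_correlated, p_value_f2_inf(f_correlated))
```

Setting the \(p\)-value to 0.05 gives a critical value of \(-\ln 0.05\approx 3.00\), the 5% critical value of the \({ F }_{ 2,\infty }\) distribution.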

**The \(F\)-statistic with \(q\) restrictions**

In large samples under the null hypothesis, the sampling distribution of the \(F\)-statistic is an \({ F }_{ q,\infty }\) distribution.

### Applying the Statistical Software to Calculate the Heteroskedasticity-Robust \(F\)-Statistic

The \(F\)-statistic can be computed using the general heteroskedasticity-robust formula. Whether the errors are homoskedastic or heteroskedastic, the large-\(n\) distribution of this \(F\)-statistic under the null hypothesis is \({ F }_{ q,\infty }\).

Most statistical software calculates homoskedasticity-only standard errors by default. In such packages, the user must select the “robust” option for the \(F\)-statistic to be computed using heteroskedasticity-robust standard errors.

#### Applying the \(F\)-Statistic to Calculate the \(p\)-Value

The \(p\)-value can be calculated using the large-sample \({ F }_{ q,\infty }\) approximation to the distribution of the \(F\)-statistic. Let \({ F }^{ act }\) denote the value of the \(F\)-statistic actually computed. Because the \(F\)-statistic has a large-sample \({ F }_{ q,\infty }\) distribution under the null hypothesis, the \(p\)-value is:

$$ p\text{-value}=\Pr\left[ { F }_{ q,\infty }>{ F }^{ act } \right] $$
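
The tail probability can be evaluated without statistical tables. The sketch below assumes the standard fact that \({ F }_{ q,\infty }\) equals a chi-squared variable with \(q\) degrees of freedom divided by \(q\), and uses the closed-form chi-squared tail for integer degrees of freedom. The function names are illustrative, not taken from any library.

```python
import math


def chi2_sf(x, q):
    """P(chi-squared with q degrees of freedom > x), for a positive integer q."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if q % 2 == 0:
        # Even q: exp(-x/2) * sum_{i=0}^{q/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, q // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd q: erfc(sqrt(x/2))
    #        + sqrt(2x/pi) * exp(-x/2) * sum_{i=1}^{(q-1)/2} x^(i-1) / (1*3*...*(2i-1))
    term, total = 1.0, 0.0
    for i in range(1, (q - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def f_p_value(f_act, q):
    """p-value = P(F_{q,inf} > f_act), using F_{q,inf} = chi2(q) / q."""
    return chi2_sf(q * f_act, q)


# Hypothetical computed F-statistic with q = 3 restrictions (illustrative only).
print(f"p-value = {f_p_value(4.2, 3):.4f}")
```

In practice a statistics package would supply this tail probability directly; the closed form is shown only to make the calculation self-contained.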

#### The Overall Regression \(F\)-Statistic

To test the joint hypothesis that all the slope coefficients are zero, we apply the overall regression \(F\)-statistic. We have that:

$$ { H }_{ 0 }:{ \beta }_{ 1 }=0,{ \beta }_{ 2 }=0,\dots ,{ \beta }_{ k }=0 $$

$$ \text{versus} $$

$$ { H }_{ 1 }:{ \beta }_{ j }\neq 0\text{ for at least one }j,\quad j=1,\dots ,k $$

Under this null hypothesis, none of the regressors explains any of the variation in \({ Y }_{ i }\), although the intercept may be nonzero. If the null hypothesis holds, the overall regression \(F\)-statistic has an \({ F }_{ k,\infty }\) distribution in large samples.

#### The \(F\)-Statistic When \(q = 1\)

When \(q = 1\), the \(F\)-statistic tests a single restriction. The joint null hypothesis then reduces to the null hypothesis on a single regression coefficient, and the \(F\)-statistic is the square of the \(t\)-statistic.
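
The equivalence of the two tests when \(q = 1\) can be checked numerically. In the sketch below the \(t\)-statistic is a hypothetical value; the \(F\)-based \(p\)-value uses the fact that \({ F }_{ 1,\infty }\) is the square of a standard normal variable.

```python
import math
from statistics import NormalDist

# Hypothetical t-statistic for a single restriction (illustrative only).
t_act = -2.2

# With q = 1, the F-statistic is the square of the t-statistic.
f_act = t_act ** 2

# Two-sided p-value from the t-statistic: 2 * Phi(-|t|).
p_from_t = 2 * NormalDist().cdf(-abs(t_act))

# p-value from the F-statistic: P(Z^2 > f) = P(|Z| > sqrt(f)) = erfc(sqrt(f / 2)).
p_from_f = math.erfc(math.sqrt(f_act / 2))

print(f"F = {f_act:.2f}, p (t-test) = {p_from_t:.6f}, p (F-test) = {p_from_f:.6f}")
```

Both routes give the same \(p\)-value, so the two-sided \(t\)-test and the \(F\)-test with \(q = 1\) always lead to the same decision.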

### The Homoskedasticity-Only \(F\)-Statistic

If the error term is homoskedastic, the \(F\)-statistic can be expressed in terms of the improvement in the fit of the regression, as measured either by the sum of squared residuals or by the regression \({ R }^{ 2 }\). The resulting statistic is called the homoskedasticity-only \(F\)-statistic. It is valid only if the error term is homoskedastic. The heteroskedasticity-robust \(F\)-statistic, on the other hand, is valid whether the error term is homoskedastic or heteroskedastic.

The homoskedasticity-only \(F\)-statistic can be calculated from the sums of squared residuals of two regressions. In the first, the restricted regression, the null hypothesis is forced to be true. In the second, the unrestricted regression, the alternative hypothesis is allowed to be true.

The following is the formula for the homoskedasticity-only \(F\)-statistic:

$$ F=\frac { { \left( { SSR }_{ restricted }-{ SSR }_{ unrestricted } \right) }/{ q } }{ { { SSR }_{ unrestricted } }/{ \left( n-{ k }_{ unrestricted }-1 \right) } } $$

The sum of squared residuals from the restricted regression is denoted as \({ SSR }_{ restricted }\), and the sum of squared residuals from the unrestricted regression is denoted as \({ SSR }_{ unrestricted }\). There are \(q\) restrictions under the null hypothesis. The number of regressors in the unrestricted regression is denoted as \({ k }_{ unrestricted }\).
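
The arithmetic of the sum-of-squared-residuals formula can be sketched as follows. All inputs are hypothetical numbers chosen so the result is easy to verify by hand.

```python
def f_homoskedasticity_only(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic from the SSRs of the two regressions."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator


# Hypothetical inputs: q = 2 restrictions, n = 103 observations,
# 2 regressors in the unrestricted regression.
f_act = f_homoskedasticity_only(
    ssr_restricted=120.0,
    ssr_unrestricted=100.0,
    q=2,
    n=103,
    k_unrestricted=2,
)

# (120 - 100) / 2 = 10 in the numerator; 100 / (103 - 2 - 1) = 1 in the denominator.
print(f_act)
```

Imposing restrictions can never lower the sum of squared residuals, so the numerator, and therefore the statistic, is never negative.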

The following is the alternative formula for the homoskedasticity-only \(F\)-statistic, based on the \({ R }^{ 2 }\) of the two regressions:

$$ F=\frac { { \left( { R }_{ unrestricted }^{ 2 }-{ R }_{ restricted }^{ 2 } \right) }/{ q } }{ { \left( 1-{ R }_{ unrestricted }^{ 2 } \right) }/{ \left( n-{ k }_{ unrestricted }-1 \right) } } $$
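
The \({ R }^{ 2 }\) version gives the same number as the sum-of-squared-residuals version when both regressions share the same dependent variable, because \({ R }^{ 2 }=1-SSR/TSS\). Note that the numerator is the unrestricted \({ R }^{ 2 }\) minus the restricted \({ R }^{ 2 }\), which cannot be negative because the unrestricted regression fits at least as well. The sketch below checks the equivalence with hypothetical values; the total sum of squares `tss` is an assumed input used only to link the two forms.

```python
def f_from_ssr(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic from sums of squared residuals."""
    return ((ssr_restricted - ssr_unrestricted) / q) / (
        ssr_unrestricted / (n - k_unrestricted - 1)
    )


def f_from_r2(r2_restricted, r2_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic from the R-squared of the two regressions."""
    return ((r2_unrestricted - r2_restricted) / q) / (
        (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    )


# Hypothetical inputs (illustrative only).
tss = 200.0                         # total sum of squares, same in both regressions
ssr_r, ssr_u = 120.0, 100.0
q, n, k_u = 2, 103, 2

r2_r = 1 - ssr_r / tss              # restricted R-squared = 0.4
r2_u = 1 - ssr_u / tss              # unrestricted R-squared = 0.5

f_ssr = f_from_ssr(ssr_r, ssr_u, q, n, k_u)
f_r2 = f_from_r2(r2_r, r2_u, q, n, k_u)

print(f_ssr, f_r2)
```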

#### Using the Homoskedasticity-Only \(F\)-Statistic When \(n\) Is Small

Suppose the errors are homoskedastic and i.i.d. normally distributed. Then, under the null hypothesis, the homoskedasticity-only \(F\)-statistic has an \({ F }_{ q,n-{ k }_{ unrestricted }-1 }\) distribution. The critical values for this distribution depend on both \(q\) and \(n-{ k }_{ unrestricted }-1\). As \(n\) increases, the \({ F }_{ q,n-{ k }_{ unrestricted }-1 }\) distribution converges to the \({ F }_{ q,\infty }\) distribution.

## Testing Single Restrictions Involving Multiple Coefficients

Economic theory may suggest a single restriction that involves two or more coefficients.

Suppose a theory suggests a null hypothesis of the form \({ \beta }_{ 1 }={ \beta }_{ 2 }\), meaning that the first and second regressors have the same effect. The task is then to test this null hypothesis against the alternative that the two coefficients differ:

$$ { H }_{ 0 }:{ \beta }_{ 1 }={ \beta }_{ 2 }\quad \text{vs.}\quad { H }_{ 1 }:{ \beta }_{ 1 }\neq { \beta }_{ 2 }\quad \quad \quad \text{(Equation II)} $$

Since we have a single restriction, \(q = 1\). However, the restriction involves multiple coefficients, \({ \beta }_{ 1 }\) and \({ \beta }_{ 2 }\).

### Approach number 1: Direct Testing of the Hypothesis

Some statistical packages have a specialized command designed to test restrictions like the one in equation \(II\). The result is an \(F\)-statistic that, because \(q = 1\), has an \({ F }_{ 1,\infty }\) distribution under the null hypothesis.

### Approach number 2: Transforming the Regression

Suppose the regression contains only two regressors, \({ X }_{ 1i }\) and \({ X }_{ 2i }\). The population regression then has the following form:

$$ { Y }_{ i }={ \beta }_{ 0 }+{ \beta }_{ 1 }{ X }_{ 1i }+{ \beta }_{ 2 }{ X }_{ 2i }+{ u }_{ i }\quad \quad \quad \quad \text{(Equation III)} $$

Subtracting and adding \({ \beta }_{ 2 }{ X }_{ 1i }\) gives:

$$ { \beta }_{ 1 }{ X }_{ 1i }+{ \beta }_{ 2 }{ X }_{ 2i }={ \beta }_{ 1 }{ X }_{ 1i }-{ \beta }_{ 2 }{ X }_{ 1i }+{ \beta }_{ 2 }{ X }_{ 1i }+{ \beta }_{ 2 }{ X }_{ 2i }=\left( { \beta }_{ 1 }-{ \beta }_{ 2 } \right) { X }_{ 1i }+{ \beta }_{ 2 }\left( { X }_{ 1i }+{ X }_{ 2i } \right) ={ \gamma }_{ 1 }{ X }_{ 1i }+{ \beta }_{ 2 }{ W }_{ i } $$

where:

$$ { \gamma }_{ 1 }={ \beta }_{ 1 }-{ \beta }_{ 2 } $$

$$ { W }_{ i }= { X }_{ 1i }+{ X }_{ 2i } $$

Therefore, equation \(III\) can be expressed as:

$$ { Y }_{ i }={ \beta }_{ 0 }+{ \gamma }_{ 1 }{ X }_{ 1i }+{ \beta }_{ 2 }{ W }_{ i }+{ u }_{ i } $$

Since \({ \gamma }_{ 1 }={ \beta }_{ 1 }-{ \beta }_{ 2 }\) is the coefficient on \({ X }_{ 1i }\) in this regression, the null hypothesis in equation \(II\) is equivalent to \({ \gamma }_{ 1 }=0\) and the alternative to \({ \gamma }_{ 1 }\neq 0\). Rewriting equation \(III\) in this transformed form has therefore turned a restriction on two regression coefficients into a restriction on a single regression coefficient.

Because the restriction now involves a single coefficient, \({ \gamma }_{ 1 }\), the null hypothesis in equation \(II\) can be tested using the \(t\)-statistic method.

In practice, we first construct the new regressor \({ W }_{ i }\) as the sum of the two original regressors. We then estimate the regression of \({ Y }_{ i }\) on \({ X }_{ 1i }\) and \({ W }_{ i }\). Finally, a 95% confidence interval for the difference in coefficients \({ \beta }_{ 1 }-{ \beta }_{ 2 }\) is:

$$ { \hat { \gamma } }_{ 1 }\pm 1.96SE\left( { \hat { \gamma } }_{ 1 } \right) $$
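
The transformation can be illustrated with a small pure-Python OLS fit. This is a minimal sketch on made-up data: the helper functions are hypothetical, and only point estimates are computed. The standard error of \({ \hat { \gamma } }_{ 1 }\) needed for the interval above would come from regression software.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    n = len(y)
    design = [[1.0] * n] + [[float(v) for v in col] for col in columns]
    k = len(design)
    xtx = [[sum(design[i][t] * design[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(design[i][t] * y[t] for t in range(n)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data (illustrative only).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8]

# Original regression: Y on X1 and X2.
b0_hat, b1_hat, b2_hat = ols([x1, x2], y)

# Transformed regression: Y on X1 and W = X1 + X2.
w = [a + b for a, b in zip(x1, x2)]
g0_hat, gamma1_hat, b2_from_transformed = ols([x1, w], y)

print(gamma1_hat, b1_hat - b2_hat)
```

The coefficient on \({ X }_{ 1i }\) in the transformed regression equals the difference of the two original coefficient estimates, and the coefficient on \({ W }_{ i }\) equals the original estimate of \({ \beta }_{ 2 }\), up to floating-point rounding.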

This method can be extended to other restrictions on the regression coefficients, including the case of \(q > 1\) restrictions.

## Confidence Sets for Multiple Coefficients

In this section, we consider the construction of a confidence set for two or more regression coefficients. A 95% confidence set for two or more coefficients is a set that contains the true population values of these coefficients in 95% of randomly drawn samples.

Let us assume that we want to construct a confidence set for coefficients \({ \beta }_{ 1 }\) and \({ \beta }_{ 2 }\). The joint null hypothesis that \({ \beta }_{ 1 }={ \beta }_{ 1,0 }\) and \({ \beta }_{ 2 }={ \beta }_{ 2,0 }\) can be tested by applying the \(F\)-statistic.

Suppose every possible pair of values \(\left( { \beta }_{ 1,0 },{ \beta }_{ 2,0 } \right) \) is tested at the 5% significance level. For each pair, we construct the \(F\)-statistic and reject the pair if the \(F\)-statistic exceeds 3.00, the 5% critical value of the \({ F }_{ 2,\infty }\) distribution. In 95% of all samples, the true population values of \({ \beta }_{ 1 }\) and \({ \beta }_{ 2 }\) will not be rejected. Therefore, the 95% confidence set for \({ \beta }_{ 1 }\) and \({ \beta }_{ 2 }\) consists of the pairs of values not rejected at the 5% level by this \(F\)-statistic.
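
The membership test for a candidate pair can be sketched as below. The estimates, standard errors, and correlation between the two \(t\)-statistics are hypothetical, and the \(F\)-statistic uses the \(q = 2\) formula given earlier with each \(t\)-statistic computed against the candidate value.

```python
def f_stat_for_pair(b1_hat, b2_hat, se1, se2, rho, b1_null, b2_null):
    """F-statistic (q = 2) for the joint null beta1 = b1_null and beta2 = b2_null."""
    t1 = (b1_hat - b1_null) / se1
    t2 = (b2_hat - b2_null) / se2
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho * t1 * t2) / (1 - rho ** 2)


def in_confidence_set(f_act, critical_value=3.00):
    """A pair is in the 95% confidence set if it is not rejected at the 5% level."""
    return f_act <= critical_value


# Hypothetical estimates, standard errors, and t-statistic correlation (illustrative only).
estimates = dict(b1_hat=1.0, b2_hat=2.0, se1=0.5, se2=0.5, rho=0.0)

for pair in [(1.0, 2.0), (0.5, 1.5), (0.0, 0.0)]:
    f_act = f_stat_for_pair(b1_null=pair[0], b2_null=pair[1], **estimates)
    print(pair, round(f_act, 3), in_confidence_set(f_act))
```

Repeating this check over a fine grid of candidate pairs would trace out the confidence set, which for two coefficients is an ellipse centered at the point estimates.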

### Model Specification for Multiple Regression

Since no single rule applies in all situations, deciding which variables to include in a multiple regression can be very challenging. It is therefore advisable to use your knowledge of the empirical problem and to focus on obtaining unbiased estimates of the causal effects of interest.

## Omitted Variable Bias in Multiple Regression

Omitted variable bias is the bias in the OLS estimator that arises when at least one included regressor is correlated with an omitted variable. Both of the following conditions must be satisfied for omitted variable bias to occur:

- There must be a correlation between at least one of the included regressors and the omitted variable.
- The omitted variable must be a determinant of the dependent variable \(Y\).

### Model Specification in Theory and in Practice

In theory, when data on the omitted variable are available, the solution to omitted variable bias is to include that variable in the regression. In practice, however, deciding whether to include a particular variable can be difficult and requires judgment.

Our approach to the challenge of potential omitted variable bias is twofold. First, we choose a base specification, that is, a core set of regressors selected using a combination of expert judgment, economic theory, and knowledge of how the data were collected.

Second, we develop a list of candidate alternative specifications, which are alternative sets of regressors. If the estimates of the coefficients of interest are numerically similar across the alternative specifications, this is evidence that the estimates from our base specification are reliable. If the estimates change substantially across specifications, this is often evidence of omitted variable bias in the original specification.

## Practical Interpretation of the \({ R }^{ 2 }\) and the adjusted \({ R }^{ 2 }\), \({ \bar { R } }^{ 2 }\)

An \({ R }^{ 2 }\) or an adjusted \({ R }^{ 2 }\) near 1 means that the regressors predict the values of the dependent variable in the sample well. An \({ R }^{ 2 }\) or an adjusted \({ R }^{ 2 }\) near 0 means that they do not.

The following are pitfalls to guard against when using the \({ R }^{ 2 }\) or the \({ \bar { R } }^{ 2 }\):

- An increase in the \({ R }^{ 2 }\) or the \({ \bar { R } }^{ 2 }\) does not necessarily mean that an added variable is statistically significant.
- A high \({ R }^{ 2 }\) or \({ \bar { R } }^{ 2 }\) does not mean that the regressors are a true cause of the dependent variable.
- A high \({ R }^{ 2 }\) or \({ \bar { R } }^{ 2 }\) does not mean that there is no omitted variable bias.
- A high \({ R }^{ 2 }\) or \({ \bar { R } }^{ 2 }\) does not necessarily mean that we have the most appropriate set of regressors.
- A low \({ R }^{ 2 }\) or \({ \bar { R } }^{ 2 }\) does not necessarily mean that we have an inappropriate set of regressors.
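
The first point can be illustrated numerically. The sketch below assumes the standard definitions \({ R }^{ 2 }=1-SSR/TSS\) and \({ \bar { R } }^{ 2 }=1-\frac { n-1 }{ n-k-1 } \frac { SSR }{ TSS } \), which this reading does not state explicitly, and uses hypothetical sums of squares.

```python
def r_squared(ssr, tss):
    """R-squared: share of the sample variation in Y explained by the regressors."""
    return 1 - ssr / tss


def adjusted_r_squared(ssr, tss, n, k):
    """Adjusted R-squared, which penalizes each additional regressor."""
    return 1 - (n - 1) / (n - k - 1) * ssr / tss


# Hypothetical sums of squares with n = 50 observations (illustrative only).
n, tss = 50, 100.0

# Base regression with k = 2 regressors.
r2_base = r_squared(ssr=40.0, tss=tss)
adj_base = adjusted_r_squared(ssr=40.0, tss=tss, n=n, k=2)

# Adding a third regressor that barely lowers the sum of squared residuals.
r2_added = r_squared(ssr=39.8, tss=tss)
adj_added = adjusted_r_squared(ssr=39.8, tss=tss, n=n, k=3)

print(f"R2:          {r2_base:.4f} -> {r2_added:.4f}")
print(f"adjusted R2: {adj_base:.4f} -> {adj_added:.4f}")
```

Here the \({ R }^{ 2 }\) rises slightly while the adjusted \({ R }^{ 2 }\) falls, showing that a higher \({ R }^{ 2 }\) alone says little about whether the added variable belongs in the regression.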