Candidate’s objectives:

After completing this reading you should be able to:

- Define and interpret omitted variable bias, and describe the methods for addressing this bias.
- Distinguish between single and multiple regression.
- Interpret the slope coefficient in a multiple regression.
- Describe homoskedasticity and heteroskedasticity in a multiple regression.
- Describe the OLS estimator in a multiple regression.
- Calculate and interpret measures of fit in multiple regression.
- Explain the assumptions of the multiple linear regression model.
- Explain the concepts of imperfect and perfect multicollinearity and their implications.

## Omitted Variable Bias

Consider the regression of test scores on the student-teacher ratio. That analysis omitted several factors that plausibly affect test scores, such as the quality of education (determined by teacher quality and computer usage) and student characteristics like family background. As a result, the OLS estimator of the slope in the regression of test scores on the student-teacher ratio could be biased.

Omitted variable bias occurs when a regressor is correlated with a variable that has been omitted from the analysis and that partly determines the dependent variable. For omitted variable bias to occur, the following two conditions must both be satisfied:

- The omitted variable must be correlated with the included regressor.
- The omitted variable must be a determinant of the dependent variable.

### Omitted Variable Bias and the First Least Squares Assumption

When there is omitted variable bias, the first least squares assumption, \(E\left( { u }_{ i }|{ X }_{ i } \right) =0\), does not hold. In the linear regression model with a single regressor, the error term \({ u }_{ i }\) represents all factors other than \({ X }_{ i }\) that determine \({ Y }_{ i }\). If one of these other factors is correlated with \({ X }_{ i }\), then the error term is correlated with \({ X }_{ i }\).

Because \({ u }_{ i }\) and \({ X }_{ i }\) are correlated, the conditional mean of \({ u }_{ i }\) given \({ X }_{ i }\) is nonzero. This correlation therefore violates the first least squares assumption, and the consequence is serious: the OLS estimator is biased, the bias does not vanish no matter how large the sample is, and the estimator is inconsistent.

### Omitted Variable Bias Formula

Suppose that the correlation between \({ u }_{ i }\) and \({ X }_{ i }\) is denoted \(corr\left( { X }_{ i },{ u }_{ i } \right) ={ \rho }_{ { Xu } }\), and suppose that the second and third least squares assumptions hold but the first does not. Then the OLS estimator has the following limit:

$$ { \hat { \beta } }_{ 1 }\xrightarrow [ p ]{ } { \beta }_{ 1 }+{ \rho }_{ { Xu } }\frac { { \sigma }_{ u } }{ { \sigma }_{ x } } $$

This means that, as the sample size increases, \({ \hat { \beta } }_{ 1 }\) is close to \({ \beta }_{ 1 }+{ \rho }_{ { Xu } }\left( \frac { { \sigma }_{ u } }{ { \sigma }_{ x } } \right) \) with increasingly high probability.

The expression above summarizes several ideas about omitted variable bias:

- Omitted variable bias is a problem whether the sample is large or small. With omitted variable bias, \({ \hat { \beta } }_{ 1 }\) is an inconsistent estimator of \({ \beta }_{ 1 }\). The term \({ \rho }_{ { Xu } }\left( \frac { { \sigma }_{ u } }{ { \sigma }_{ x } } \right) \) is the bias in \({ \hat { \beta } }_{ 1 }\) that persists even in large samples.
- In practice, the size of the bias depends on the correlation \({ \rho }_{ { Xu } }\) between the regressor and the error term: the larger the absolute value of \({ \rho }_{ { Xu } }\), the larger the bias.
- The direction of the bias in \({ \hat { \beta } }_{ 1 }\) depends on whether \(X\) and \(u\) are positively or negatively correlated.
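The bias formula can be checked numerically. The following is a minimal simulation sketch, not part of the original reading: the data-generating process, the coefficient values, and the variable names (`w` for the omitted variable, `slope_short` for the slope from the regression that leaves it out) are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 200_000

# Hypothetical population: Y = beta0 + beta1*X + beta2*W + e, with W omitted.
beta0, beta1, beta2 = 1.0, 2.0, 3.0
x = rng.normal(size=n)
w = 0.5 * x + rng.normal(size=n)   # W is correlated with X (condition 1)
e = rng.normal(size=n)
y = beta0 + beta1 * x + beta2 * w + e   # W helps determine Y (condition 2)

# In the regression of Y on X alone, the error term is u = beta2*W + e.
u = beta2 * w + e

# OLS slope in the single-regressor model: sample cov(X, Y) / sample var(X).
slope_short = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Sample analogue of the limit beta1 + rho_Xu * sigma_u / sigma_X.
rho_xu = np.corrcoef(x, u)[0, 1]
limit = beta1 + rho_xu * u.std(ddof=1) / x.std(ddof=1)

print(f"true beta1      = {beta1}")
print(f"OLS slope       = {slope_short:.3f}")
print(f"formula's limit = {limit:.3f}")
```

In this design the slope settles near 3.5 rather than the true value of 2, and drawing more observations does not close the gap. Because \(X\) and \(u\) are positively correlated here, the bias is upward.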

## The Multiple Regression Model

The multiple regression model extends the single variable regression model studied earlier, which is the special case with only one regressor. The multiple regression model lets us estimate the effect on \({ Y }_{ i }\) of changing one variable \(\left( { X }_{ 1i } \right) \) while the other regressors \(\left( { X }_{ 2i },{ X }_{ 3i },\dots ,{ X }_{ ki } \right) \) are held constant.

### The Population Regression Line

Consider a situation where we are provided with two independent variables \({ X }_{ 1i }\) and \({ X }_{ 2i }\). The following linear function gives the average relationship between \({ X }_{ 1i }\) and \({ X }_{ 2i }\), and the dependent variable \(Y\):

$$ E\left( { Y }_{ i }|{ X }_{ 1i }={ x }_{ 1 },{ X }_{ 2i }={ x }_{ 2 } \right) ={ \beta }_{ 0 }+{ \beta }_{ 1 }{ x }_{ 1 }+{ \beta }_{ 2 }{ x }_{ 2 }\quad \quad \quad \quad \text{(equation I)} $$

The conditional expectation of \({ Y }_{ i }\) given that \({ X }_{ 1i }={ x }_{ 1 }\) and \({ X }_{ 2i }={ x }_{ 2 }\) is denoted as

$$ E\left( { Y }_{ i }|{ X }_{ 1i }={ x }_{ 1 },{ X }_{ 2i }={ x }_{ 2 } \right). $$

In the multiple regression model, equation \(I\) above is the population regression line, or population regression function. In this equation, \({ \beta }_{ 0 }\) is the intercept, \({ \beta }_{ 1 }\) is the slope coefficient of \({ X }_{ 1i }\) (the coefficient on \({ X }_{ 1i }\)), and \({ \beta }_{ 2 }\) is the slope coefficient of \({ X }_{ 2i }\) (the coefficient on \({ X }_{ 2i }\)).

Dropping the subscript \(i\) and writing \(Y\) for the conditional expectation, equation \(I\) can be written as:

$$ Y={ \beta }_{ 0 }+{ \beta }_{ 1 }{ X }_{ 1 }+{ \beta }_{ 2 }{ X }_{ 2 } $$

Let us change \({ X }_{ 1 }\) by an amount \(\Delta { X }_{ 1 }\) while \({ X }_{ 2 }\) remains constant. This will cause \(Y\) to change by \(\Delta { Y }\).

Therefore:

$$ Y+\Delta Y={ \beta }_{ 0 }+{ \beta }_{ 1 }\left( { X }_{ 1 }+\Delta { X }_{ 1 } \right) +{ \beta }_{ 2 }{ X }_{ 2 }\quad \quad \quad \text{(equation II)} $$

If we subtract equation \(I\) from equation \(II\), the result is:

$$ \Delta Y={ \beta }_{ 1 }\Delta { X }_{ 1 } $$

$$ \Rightarrow { \beta }_{ 1 }=\frac { \Delta Y }{ \Delta { X }_{ 1 } } ,\quad \text{holding } { X }_{ 2 } \text{ constant} $$

\({ \beta }_{ 1 }\) is therefore the expected change in \(Y\) resulting from a one-unit change in \({ X }_{ 1 }\), holding \({ X }_{ 2 }\) constant. \({ \beta }_{ 1 }\) is also described as the partial effect of \({ X }_{ 1 }\) on \(Y\), holding \({ X }_{ 2 }\) constant.
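As a worked illustration with purely hypothetical numbers (assumed here for the arithmetic, not estimates from the reading), suppose \({ \beta }_{ 1 }=-1.1\) and \({ X }_{ 1 }\) increases by 2 units while \({ X }_{ 2 }\) is held constant. The expected change in \(Y\) is then:

$$ \Delta Y={ \beta }_{ 1 }\Delta { X }_{ 1 }=\left( -1.1 \right) \left( 2 \right) =-2.2 $$

The value of \({ \beta }_{ 2 }\) plays no role in this calculation because \({ X }_{ 2 }\) does not change.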

### The Population Multiple Regression Model

The population multiple regression model, with only two regressors, can be written as:

$$ { Y }_{ i }={ \beta }_{ 0 }+{ \beta }_{ 1 }{ X }_{ 1i }+{ \beta }_{ 2 }{ X }_{ 2i }+{ u }_{ i },\quad \quad i=1,\dots ,n $$

The \(i\)th of the \(n\) observations is denoted by the subscript \(i\). \({ u }_{ i }\) is the error term.

The intercept \({ \beta }_{ 0 }\) can be treated as the coefficient on a regressor \({ X }_{ 0i }\) that always equals 1.

The model can therefore be written alternatively as:

$$ { Y }_{ i }={ \beta }_{ 0 }{ X }_{ 0i }+{ \beta }_{ 1 }{ X }_{ 1i }+{ \beta }_{ 2 }{ X }_{ 2i }+{ u }_{ i }, $$

Where

\({ X }_{ 0i }=1\) and

\(i=1,\dots ,n\)

Since \({ X }_{ 0i }\) equals 1 for all observations, it is referred to as the constant regressor. The intercept \({ \beta }_{ 0 }\) is accordingly also called the constant term in the regression.

The error term \({ u }_{ i }\) is said to be homoskedastic if the variance of the conditional distribution of \({ u }_{ i }\) given \({ X }_{ 1i },{ X }_{ 2i },\dots ,{ X }_{ ki }\), \(var\left( { u }_{ i }|{ X }_{ 1i },{ X }_{ 2i },\dots ,{ X }_{ ki } \right) \), is constant for \(i=1,\dots ,n\) and therefore does not depend on the values of \({ X }_{ 1i },{ X }_{ 2i },\dots ,{ X }_{ ki }\). Otherwise, the error term is said to be heteroskedastic.
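The distinction can be made concrete with simulated errors. The sketch below is illustrative only: the error distributions, the single regressor `x`, and the helper `variance_ratio` are assumptions introduced here. It compares the spread of the error term for low and high values of the regressor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(1.0, 5.0, size=n)

# Homoskedastic errors: var(u | X) = 4 whatever the value of X.
u_homo = rng.normal(scale=2.0, size=n)

# Heteroskedastic errors: var(u | X) = X^2, so the spread grows with X.
u_hetero = rng.normal(size=n) * x

low, high = x < 2.0, x > 4.0   # two slices of the regressor

def variance_ratio(u):
    """Variance of u where X is high divided by variance of u where X is low."""
    return u[high].var() / u[low].var()

ratio_homo = variance_ratio(u_homo)       # close to 1
ratio_hetero = variance_ratio(u_hetero)   # well above 1

print(f"homoskedastic ratio:   {ratio_homo:.2f}")
print(f"heteroskedastic ratio: {ratio_hetero:.2f}")
```

A ratio near 1 is what a constant conditional variance looks like. A ratio far from 1 indicates that the variance of the error term depends on the regressor.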

## The OLS Estimator in Multiple Regression

This section describes how the coefficients of the multiple regression model are estimated using OLS.

### The OLS Estimator

In the multiple regression model, the coefficients \({ \beta }_{ 0 },{ \beta }_{ 1 },\dots,{ \beta }_{ k }\) can be estimated by applying the OLS method. Suppose the estimators of \({ \beta }_{ 0 },{ \beta }_{ 1 },\dots,{ \beta }_{ k }\) are \({ b }_{ 0 },{ b }_{ 1 },\dots,{ b }_{ k }\), then, using these estimators, the predicted value of \(Y\) is:

$$ { b }_{ 0 }+{ b }_{ 1 }{ X }_{ 1i }+{ b }_{ 2 }{ X }_{ 2i }+\cdots +{ b }_{ k }{ X }_{ ki } $$

The mistake made in predicting \({ Y }_{ i }\) is:

$$ { Y }_{ i }-\left( { b }_{ 0 }+{ b }_{ 1 }{ X }_{ 1i }+{ b }_{ 2 }{ X }_{ 2i }+\cdots +{ b }_{ k }{ X }_{ ki } \right) ={ Y }_{ i }-{ b }_{ 0 }-{ b }_{ 1 }{ X }_{ 1i }-{ b }_{ 2 }{ X }_{ 2i }-\cdots -{ b }_{ k }{ X }_{ ki } $$

Summing the squared prediction mistakes over all \(n\) observations gives:

$$ \sum _{ i=1 }^{ n }{ { \left( { Y }_{ i }-{ b }_{ 0 }-{ b }_{ 1 }{ X }_{ 1i }-{ b }_{ 2 }{ X }_{ 2i }-\cdots -{ b }_{ k }{ X }_{ ki } \right) }^{ 2 } } $$

The estimators of the coefficients \({ \beta }_{ 0 },{ \beta }_{ 1 },\dots,{ \beta }_{ k }\) that minimize this sum of squared mistakes are referred to as the Ordinary Least Squares (OLS) estimators of \({ \beta }_{ 0 },{ \beta }_{ 1 },\dots,{ \beta }_{ k }\). We will denote the OLS estimators as \(\hat { \beta } _{ 0 },{ \hat { \beta } }_{ 1 },\dots ,{ \hat { \beta } }_{ k }\).

We apply the OLS estimators in the construction of the OLS regression line:

$$ \hat { \beta } _{ 0 }+{ \hat { \beta } }_{ 1 }{ X }_{ 1 }+\dots+ { \hat { \beta } }_{ k }{ X }_{ k } $$

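The predicted value of \({ Y }_{ i }\) based on the OLS regression line is \({ \hat { Y } }_{ i }=\hat { \beta } _{ 0 }+{ \hat { \beta } }_{ 1 }{ X }_{ 1i }+\dots +{ \hat { \beta } }_{ k }{ X }_{ ki }\), and the OLS residual for the \(i\)th observation is the difference between \({ Y }_{ i }\) and its predicted value: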

$$ { \hat { u } }_{ i }={ Y }_{ i }-{ \hat { Y } }_{ i } $$

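In principle, we could calculate the OLS estimators by trial and error, trying different values of \({ b }_{ 0 },\dots ,{ b }_{ k }\) until the sum of squared mistakes is minimized. In practice, the minimization is solved with formulas derived using calculus, and the computation is carried out by statistical software.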
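The minimization can be sketched in a few lines of NumPy. This is an illustrative example on simulated data: the true coefficients, the sample size, and the helper `sum_sq` are assumptions introduced here. `numpy.linalg.lstsq` solves the least squares problem directly, so no trial and error is involved.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + u   # hypothetical true coefficients (1, 2, -3)

# Design matrix whose first column is the constant regressor X0 = 1.
X = np.column_stack([np.ones(n), x1, x2])

# lstsq returns the b that minimizes sum_i (Y_i - b0 - b1*X1i - b2*X2i)^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat   # predicted values
u_hat = y - y_hat      # OLS residuals

def sum_sq(b):
    """Sum of squared prediction mistakes for a candidate coefficient vector b."""
    return float(np.sum((y - X @ b) ** 2))

print("OLS estimates:", np.round(beta_hat, 3))
print("SSR at the OLS estimates:", round(sum_sq(beta_hat), 3))
print("SSR at a nearby candidate:", round(sum_sq(beta_hat + 0.01), 3))
```

Any candidate vector other than `beta_hat` gives a larger sum of squared mistakes, which is what defines the OLS estimator.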

## Measures of Fit in Multiple Regression

In multiple regression, the following are the most popular summary statistics:

- The standard error of the regression (SER)
- The regression \({ R }^{ 2 }\)
- The adjusted \({ R }^{ 2 }\), or simply \({ \bar { R } }^{ 2 }\).

### The Standard Error of the Regression (SER)

The SER is an estimate of the standard deviation of the error term \({ u }_{ i }\). In other words, the spread of the distribution of \(Y\) around the regression line is measured by SER. The following is an equation of the SER in multiple regression:

$$ SER={ s }_{ \hat { u } } ,$$

Where:

$$ { s }_{ \hat { u } }^{ 2 }=\frac { 1 }{ n-k-1 } \sum _{ i=1 }^{ n }{ { \hat { u } }_{ i }^{ 2 } } =\frac { SSR }{ n-k-1 } $$

The sum of squared residuals (SSR) is computed as follows:

$$ SSR=\sum _{ i=1 }^{ n }{ { \hat { u } }_{ i }^{ 2 } } $$

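Dividing by \(n-k-1\) rather than \(n\) is a degrees-of-freedom adjustment that corrects for the downward bias introduced by estimating \(k+1\) coefficients. The adjustment becomes negligible as \(n\) gets larger.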
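The SER formula translates directly into code. The sketch below uses simulated data, and the coefficient values and the error standard deviation `sigma_u` are assumptions. It also computes the version without the degrees-of-freedom adjustment to show how little the two differ when \(n\) is large.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 2_000, 2          # n observations, k regressors besides the constant
sigma_u = 1.5            # hypothetical true standard deviation of the error

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(scale=sigma_u, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat                 # OLS residuals

ssr = float(np.sum(u_hat ** 2))          # sum of squared residuals
ser = np.sqrt(ssr / (n - k - 1))         # with the degrees-of-freedom adjustment
ser_no_adjustment = np.sqrt(ssr / n)     # dividing by n instead

print(f"SER               = {ser:.4f}")
print(f"without adjusting = {ser_no_adjustment:.4f}")
```

With 2,000 observations and only three estimated coefficients, the two values differ by a small fraction of a percent.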

### The Regression \({ R }^{ 2 }\)

The regression \({ R }^{ 2 }\) is the fraction of the sample variance of \({ Y }_{ i }\) that is explained (or predicted) by the regressors. Equivalently, it is 1 minus the fraction of the variance of \({ Y }_{ i }\) that is not explained by the regressors.

Therefore:

$$ { R }^{ 2 }=\frac { ESS }{ TSS } =1-\frac { SSR }{ TSS } $$

Here, the explained sum of squares (ESS) is computed in the following manner:

$$ ESS=\sum _{ i=1 }^{ n }{ { \left( { \hat { Y } }_{ i }-\bar { Y } \right) }^{ 2 } } $$

The total sum of squares (TSS) is given as:

$$ TSS=\sum _{ i=1 }^{ n }{ { \left( { Y }_{ i }-\bar { Y } \right) }^{ 2 } } $$

In multiple regression, \({ R }^{ 2 }\) increases whenever a regressor is added, unless the estimated coefficient on the added regressor is exactly 0. An increase in \({ R }^{ 2 }\) therefore does not by itself mean that the added variable improves the fit of the model.
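The following sketch illustrates this property on simulated data. The helper `r_squared`, the coefficient values, and the irrelevant regressor `noise` are assumptions made for illustration. The added regressor has no relationship with \(Y\) by construction, yet \({ R }^{ 2 }\) does not fall.

```python
import numpy as np

def r_squared(X, y):
    """R^2 = 1 - SSR/TSS for an OLS regression of y on the columns of X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ beta_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - ssr / tss

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
noise = rng.normal(size=n)                 # irrelevant regressor, unrelated to Y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)    # hypothetical true model uses x1 only

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise])

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)

print(f"R^2 with x1 only:       {r2_small:.5f}")
print(f"R^2 with x1 and noise:  {r2_big:.5f}")
```

The larger model can always reproduce the smaller model's fit by setting the extra coefficient to 0, so its SSR cannot be higher.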

### The Adjusted \({ R }^{ 2 }\)

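Because \({ R }^{ 2 }\) increases whenever a regressor is added, it overstates how well the regression fits. The adjusted \({ R }^{ 2 }\), or \({ \bar { R } }^{ 2 }\), corrects for this by deflating \({ R }^{ 2 }\) by a factor that depends on the number of regressors. It is a modified version of \({ R }^{ 2 }\) that does not necessarily increase when a new regressor is added.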

\({ \bar { R } }^{ 2 }\) is determined in the following manner:

$$ { \bar { R } }^{ 2 }=1-\frac { n-1 }{ n-k-1 } \frac { SSR }{ TSS } =1-\frac { { s }_{ \hat { u } }^{ 2 } }{ { s }_{ Y }^{ 2 } } $$

It is important to note the following concerning \({ \bar { R } }^{ 2 }\):

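- \(\frac { n-1 }{ n-k-1 } >1\), therefore \({ \bar { R } }^{ 2 }<{ R }^{ 2 }\), always.
- Adding a regressor has two opposing effects on \({ \bar { R } }^{ 2 }\):
    - It causes the SSR to fall, which increases \({ \bar { R } }^{ 2 }\).
    - It increases the factor \(\frac { n-1 }{ n-k-1 } \), which decreases \({ \bar { R } }^{ 2 }\).

  Whether \({ \bar { R } }^{ 2 }\) rises or falls depends on which of the two effects is stronger.
- \({ \bar { R } }^{ 2 }\) can take a negative value. This happens when the regressors, taken together, reduce the SSR by too little to offset the factor \(\frac { n-1 }{ n-k-1 } \).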
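The formula for \({ \bar { R } }^{ 2 }\) is easy to evaluate by hand or in code. In the sketch below, the function name `adjusted_r2` and the input values are hypothetical, chosen only to illustrate the arithmetic. It uses the fact that \(SSR/TSS=1-{ R }^{ 2 }\).

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n - 1)/(n - k - 1) * SSR/TSS, with SSR/TSS = 1 - R^2."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

# Hypothetical inputs: the same R^2 is penalized more heavily as k grows.
two_regressors = adjusted_r2(r2=0.50, n=100, k=2)
three_regressors = adjusted_r2(r2=0.50, n=100, k=3)

# A weak fit in a small sample with several regressors gives a negative value:
# 1 - (9/6) * 0.9 = 1 - 1.35 = -0.35
negative_case = adjusted_r2(r2=0.10, n=10, k=3)

print(two_regressors, three_regressors, negative_case)
```

If an added regressor left \({ R }^{ 2 }\) unchanged at 0.50, \({ \bar { R } }^{ 2 }\) would fall, because only the penalty factor would move.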

## The Least Squares Assumptions in Multiple Regressions

The following are the least squares assumptions in the multiple regression model:

- The conditional distribution of \({ u }_{ i }\) given \({ X }_{ 1i },{ X }_{ 2i },\dots ,{ X }_{ ki }\) has a mean 0.
- \(\left( { X }_{ 1i },{ X }_{ 2i },\dots ,{ X }_{ ki },{ Y }_{ i } \right) ,i=1,2,\dots ,n\) are \(i.i.d.\)
- Large outliers are unlikely
- No perfect multicollinearity

We have already discussed the first three least squares assumptions in the chapter on Linear Regression with One Regressor. In this section, we focus on the fourth assumption: that there is no perfect multicollinearity.

### Assumption Number IV: No Perfect Multicollinearity

This assumption is new to the multiple regression model. It rules out perfect multicollinearity, a condition that makes it impossible to compute the OLS estimator. If one of the regressors is a perfect linear function of the other regressors, the regressors are said to display perfect multicollinearity, or to be perfectly multicollinear.
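One way to see why the OLS estimator cannot be computed is to check the rank of the matrix of regressors. The sketch below uses simulated regressors, and the variable names and the particular linear relationship are assumptions made for illustration. It builds one design matrix with a perfectly multicollinear column and one without.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x1 = rng.normal(size=n)
x2 = 3.0 * x1 + 2.0        # exact linear function of x1 and the constant regressor
x3 = rng.normal(size=n)    # unrelated regressor for comparison

X_bad = np.column_stack([np.ones(n), x1, x2])   # perfectly multicollinear
X_ok = np.column_stack([np.ones(n), x1, x3])

rank_bad = np.linalg.matrix_rank(X_bad)   # 2: one column is redundant
rank_ok = np.linalg.matrix_rank(X_ok)     # 3: full column rank

# Two different coefficient vectors give identical predicted values, so the
# sum of squared mistakes has no unique minimizer.
fit_a = X_bad @ np.array([2.0, 3.0, 0.0])   # 2 + 3*x1
fit_b = X_bad @ np.array([0.0, 0.0, 1.0])   # x2, which equals 2 + 3*x1
same_fit = np.allclose(fit_a, fit_b)

print(rank_bad, rank_ok, same_fit)
```

Because distinct coefficient vectors produce the same predictions, the data cannot distinguish between them, and no unique set of OLS estimates exists. Dropping or redefining one of the offending regressors restores full rank.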

## The Distribution of the OLS Estimators in Multiple Regression

Because the data differ from one sample to the next, different samples produce different values of the OLS estimators. This variation creates uncertainty about the OLS estimators of the population regression coefficients \({ \beta }_{ 0 },{ \beta }_{ 1 },\dots ,{ \beta }_{ k }\). The sampling distribution of the OLS estimators summarizes this variation.

We know that, under the least squares assumptions, the OLS estimators \({ \hat { \beta } }_{ 0 }\) and \({ \hat { \beta } }_{ 1 }\) are unbiased and consistent estimators of the unknown coefficients \({ \beta }_{ 0 }\) and \({ \beta }_{ 1 }\) in the linear regression model with a single regressor.

Moreover, in large samples the sampling distribution of \({ \hat { \beta } }_{ 0 }\) and \({ \hat { \beta } }_{ 1 }\) is well approximated by a bivariate normal distribution. These results carry over to multiple regression: under the four least squares assumptions, the OLS estimators \({ \hat { \beta } }_{ 0 },{ \hat { \beta } }_{ 1 },\dots ,{ \hat { \beta } }_{ k }\) are unbiased and consistent estimators of \({ \beta }_{ 0 },{ \beta }_{ 1 },\dots ,{ \beta }_{ k }\). Because the OLS estimators are averages of the randomly sampled data, the central limit theorem implies that their sampling distribution is approximately jointly normal when the sample size is sufficiently large.
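A small Monte Carlo experiment can illustrate the sampling distribution. Everything in this sketch is an assumption made for illustration: the true coefficients in `beta`, the sample size, the number of replications, and the deliberately non-normal (uniform) errors. It repeatedly draws a fresh sample, re-estimates the coefficients, and then examines how the estimates are distributed.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 200, 2_000
beta = np.array([1.0, 2.0, -1.0])   # hypothetical true coefficients

estimates = np.empty((reps, beta.size))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    u = rng.uniform(-1.0, 1.0, size=n)   # errors are not normally distributed
    y = X @ beta + u
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

mean_est = estimates.mean(axis=0)   # should be close to beta (unbiasedness)
sd_est = estimates.std(axis=0)      # spread of the sampling distribution

# If the slope estimates are roughly normal, about 95% of them should fall
# within 1.96 standard deviations of their mean.
z = (estimates[:, 1] - mean_est[1]) / sd_est[1]
share_within = np.mean(np.abs(z) < 1.96)

print("mean of estimates:", np.round(mean_est, 3))
print("share within 1.96 sd:", round(float(share_within), 3))
```

Even though the errors are uniform rather than normal, the estimates center on the true coefficients and their spread behaves approximately as a normal distribution would predict.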

## Multicollinearity

As discussed above, perfect multicollinearity arises when one of the regressors is a perfect linear combination of the other regressors. Imperfect multicollinearity, by contrast, arises when one of the regressors is very highly, but not perfectly, correlated with the other regressors.

Unlike perfect multicollinearity, imperfect multicollinearity does not prevent estimation of the regression, nor does it imply a logical problem with the choice of regressors. It does mean, however, that one or more of the regression coefficients may be estimated imprecisely.
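The loss of precision can be illustrated by simulation. In the sketch below, the helper `slope_sd`, the coefficient values, and the chosen correlations are assumptions made for illustration. It compares the sampling variability of the OLS estimate of \({ \beta }_{ 1 }\) when the two regressors are uncorrelated with the case in which they are very highly correlated.

```python
import numpy as np

def slope_sd(rho, n=200, reps=1_000, seed=7):
    """Monte Carlo standard deviation of the OLS estimate of beta_1
    when the population correlation between X1 and X2 is rho."""
    rng = np.random.default_rng(seed)
    draws = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        # x2 has unit variance and correlation rho with x1.
        x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
        y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        draws[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return draws.std()

sd_low = slope_sd(rho=0.0)     # uncorrelated regressors
sd_high = slope_sd(rho=0.99)   # imperfect but severe multicollinearity

print(f"sd of beta_1 estimate, rho = 0.00: {sd_low:.3f}")
print(f"sd of beta_1 estimate, rho = 0.99: {sd_high:.3f}")
```

The regression can be estimated in both cases, but with highly correlated regressors the estimate of \({ \beta }_{ 1 }\) varies far more from sample to sample. The data contain little independent variation in \({ X }_{ 1 }\) with which to separate its effect from that of \({ X }_{ 2 }\).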