After completing this reading you should be able to:
- Define covariance stationary, autocovariance function, autocorrelation function, partial autocorrelation function, and autoregression.
- Describe the requirements for a series to be covariance stationary.
- Explain the implications of working with models that are not covariance stationary.
- Define white noise, and describe independent white noise and normal (Gaussian) white noise.
- Explain the characteristics of the dynamic structure of white noise.
- Explain how a lag operator works.
- Describe Wold’s theorem.
- Define a general linear process.
- Relate rational distributed lags to Wold’s theorem.
- Calculate the sample mean and sample autocorrelation, and describe the Box-Pierce Q-statistic and the Ljung-Box Q-statistic.
- Describe sample partial autocorrelation.
Covariance Stationary Time Series
The ordered set \(\left\{ \dots ,{ y }_{ -2 },{ y }_{ -1 },{ y }_{ 0 },{ y }_{ 1 },{ y }_{ 2 },\dots \right\} \) is called a realization of a time series. In theory, it starts in the infinite past and continues into the infinite future. In practice, however, only a finite subset of the realization is available, and this subset is called a sample path.
A series is said to be covariance stationary if both its mean and its covariance structure are stable over time. This implies that:
At time \(t\) the mean is:
$$ E\left( { y }_{ t } \right) ={ \mu }_{ t } $$
Covariance stationarity requires this mean to be stable over time, that is:
$$ E\left( { y }_{ t } \right) ={ \mu },\quad \quad \quad \forall t $$
It can be quite challenging to quantify the stability of a covariance structure. We will, therefore, use the autocovariance function. The covariance between \({ y }_{ t }\) and \({ y }_{ t-\tau }\) is the autocovariance at displacement \(\tau\). That is:
$$ \gamma \left( t,\tau \right) =cov\left( { y }_{ t },{ y }_{ t-\tau } \right) =E\left( { y }_{ t }-\mu \right) \left( { y }_{ t-\tau }-\mu \right) $$
Covariance stationarity also requires the covariance structure to be stable, which means that the autocovariance depends only on the displacement \(\tau\) and not on time \(t\).
Therefore:
$$ \gamma \left( t,\tau \right) =\gamma \left( \tau \right) ,\quad \quad \quad \forall t $$
For a covariance stationary series, the autocovariance function summarizes the cyclical dynamics of the series. Autocovariances are graphed and examined as functions of \(\tau\). The autocovariance function is symmetric:
$$ \gamma \left( \tau \right) =\gamma \left( -\tau \right) ,\quad \quad \quad \forall \tau $$
Symmetry follows because the autocovariance of a covariance stationary series depends only on the displacement: \(\gamma \left( -\tau \right) =cov\left( { y }_{ t },{ y }_{ t+\tau } \right) =cov\left( { y }_{ t+\tau },{ y }_{ t } \right) =\gamma \left( \tau \right) \).
Note that:
$$ \gamma \left( 0 \right) =cov\left( { y }_{ t },{ y }_{ t } \right) =var\left( { y }_{ t } \right) $$
This points to another requirement of covariance stationarity: the variance of the series must be finite. Likewise, besides being stable, the mean must also be finite.
The correlation between two variables \(x\) and \(y\) can be defined as:
$$ corr\left( x,y \right) =\frac { cov\left( x,y \right) }{ { \sigma }_{ x }{ \sigma }_{ y } } $$
That is, the covariance of \(x\) and \(y\) is normalized (standardized) by the product of their standard deviations, so the correlation does not depend on the units in which \(x\) and \(y\) are measured. Because it is easier to interpret than covariance, correlation is used more widely. For the same reason, the autocorrelation function, \(\rho \left( \tau \right) \), is used more often than the autocovariance function, \(\gamma \left( \tau \right) \).
Where:
$$ \rho \left( \tau \right) =\frac { \gamma \left( \tau \right) }{ \gamma \left( 0 \right) } $$
And \(\tau =0,1,2,\dots \)
Note that \(\gamma \left( 0 \right)\) is the variance of \({ y }_{ t }\). By covariance stationarity, \(\gamma \left( 0 \right)\) is also the variance of \(y\) at any other date, in particular the variance of \({ y }_{ t-\tau }\).
Therefore:
$$ \rho \left( \tau \right) =\frac { cov\left( { y }_{ t },{ y }_{ t-\tau } \right) }{ \sqrt { var\left( { y }_{ t } \right) } \sqrt { var\left( { y }_{ t-\tau } \right) } } $$
$$ =\frac { \gamma \left( \tau \right) }{ \sqrt { \gamma \left( 0 \right) } \sqrt { \gamma \left( 0 \right) } } =\frac { \gamma \left( \tau \right) }{ \gamma \left( 0 \right) } $$
Note that:
$$ \rho \left( 0 \right) =\frac { { \gamma \left( 0 \right) } }{ { \gamma \left( 0 \right) } } =1 $$
The partial autocorrelation function, denoted \(p\left( \tau \right) \), is the coefficient of \({ y }_{ t-\tau }\) in a population linear regression of \({ y }_{ t }\) on \({ y }_{ t-1 },\dots ,{ y }_{ t-\tau }\). Such a regression is called an autoregression because the variable is regressed on its own lagged values.
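Because the partial autocorrelations are coefficients of a population autoregression, they can be computed from the autocorrelation function alone. The sketch below assumes the autocorrelations \(\rho(0),\dots,\rho(k)\) are known and uses the Durbin-Levinson recursion, a standard way of solving the autoregression's normal equations recursively that this reading does not itself cover; the function name `pacf_from_acf` is hypothetical.

```python
def pacf_from_acf(rho, max_lag):
    """Population PACF p(0..max_lag) from autocorrelations rho[0..max_lag] (rho[0] == 1).

    Uses the Durbin-Levinson recursion: after step k, phi[j - 1] is the coefficient
    of y_{t-j} in the population autoregression of y_t on y_{t-1}, ..., y_{t-k}.
    """
    pacf = [1.0]   # p(0) = 1 by convention
    phi = []       # coefficients of the order-(k-1) autoregression
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den  # coefficient on y_{t-k}, i.e. the partial autocorrelation p(k)
        # Update the lower-order coefficients (the comprehension still sees the old phi).
        phi = [phi[j - 1] - phi_kk * phi[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# White noise: rho(tau) = 0 for tau >= 1.
print(pacf_from_acf([1.0, 0.0, 0.0, 0.0], 3))
# An autocorrelation function that decays as 0.5**tau.
print(pacf_from_acf([0.5 ** k for k in range(5)], 4))
```

The first call returns zeros beyond displacement zero. For the geometrically decaying autocorrelations, the second call returns \(p(1)=0.5\) and zeros thereafter: once \({ y }_{ t-1 }\) is in the regression, further lags add no explanatory power.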
White Noise
Assume that:
$$ { y }_{ t }={ \epsilon }_{ t } $$
$$ { \epsilon }_{ t }\sim \left( 0,{ \sigma }^{ 2 } \right) ,\quad \quad \quad { \sigma }^{ 2 }<\infty $$
where \({ \epsilon }_{ t }\) is the shock and is uncorrelated over time. Therefore, \({ \epsilon }_{ t }\) and \({ y }_{ t }\) are said to be serially uncorrelated.
A serially uncorrelated process with zero mean and constant variance is called zero-mean white noise (or simply white noise) and is written as:
$$ { \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right) $$
And:
$$ { y }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right) $$
Both \({ \epsilon }_{ t }\) and \({ y }_{ t }\) are serially uncorrelated, but not necessarily serially independent. If \(y\) is not only serially uncorrelated but also serially independent, it is called independent white noise, written as:
$$ { y }_{ t }\overset { iid }{ \sim } \left( 0,{ \sigma }^{ 2 } \right) $$
This is read as “\(y\) is independently and identically distributed with mean zero and constant variance \({ \sigma }^{ 2 }\).” If \(y\) is serially uncorrelated and also normally distributed, then it is serially independent as well, because uncorrelated jointly normal random variables are independent. In this case, \(y\) is called normal white noise or Gaussian white noise, written as:
$$ { y }_{ t }\overset { iid }{ \sim } N\left( 0,{ \sigma }^{ 2 } \right) $$
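The gap between "uncorrelated" and "independent" can be made concrete with a simulation. This is a hypothetical illustration, not part of the reading: with \({ z }_{ t }\) Gaussian white noise, \({ \epsilon }_{ t }={ z }_{ t }{ z }_{ t-1 }\) has mean 0, variance 1 and zero autocorrelation at every nonzero displacement, so it is white noise. Yet adjacent values share \({ z }_{ t-1 }\), so they are not independent: the squares \({ \epsilon }_{ t }^{ 2 }\) have a lag-1 autocorrelation of 0.25. The seed, sample size and helper name `lag1_corr` are arbitrary choices.

```python
import random

def lag1_corr(x):
    """Sample lag-1 autocorrelation, using the analog-principle estimator."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den

rng = random.Random(42)
T = 100_000
z = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]

# Hypothetical process: eps_t = z_t * z_{t-1} is serially uncorrelated,
# but consecutive values share z_{t-1}, so they are dependent.
eps = [z[t] * z[t - 1] for t in range(1, T + 1)]
eps_sq = [e * e for e in eps]

r_eps = lag1_corr(eps)    # theoretical value 0: serially uncorrelated
r_sq = lag1_corr(eps_sq)  # theoretical value 0.25: not serially independent
print(round(r_eps, 3), round(r_sq, 3))
```

The first estimate should be close to 0 and the second close to 0.25, so this \({ \epsilon }_{ t }\) is white noise but not independent white noise.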
To characterize the dynamic stochastic structure of \({ y }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right) \), note first that its unconditional mean and variance are:
$$ E\left( { y }_{ t } \right) =0 $$
And:
$$ var\left( { y }_{ t } \right) ={ \sigma }^{ 2 } $$
Both moments are constant over time. Moreover, because white noise is uncorrelated over time, all of its autocovariances and autocorrelations beyond displacement zero are zero.
The following is the autocovariance function for a white noise process:
$$ \gamma \left( \tau \right) =\begin{cases} { \sigma }^{ 2 },\quad \tau =0 \\ 0,\quad \quad \tau \ge 1\quad \quad \end{cases} $$
The following is the autocorrelation function for a white noise process:
$$ \rho \left( \tau \right) =\begin{cases} 1,\quad \quad \tau =0 \\ 0,\quad \quad \tau \ge 1\quad \end{cases} $$
Because white noise is serially uncorrelated by construction, all of its partial autocorrelations beyond displacement zero are also zero. The partial autocorrelation function of a white noise process is therefore:
$$ p\left( \tau \right) =\begin{cases} 1,\quad \quad \tau =0 \\ 0,\quad \quad \tau \ge 1\quad \end{cases} $$
White noise is a basic building block: processes with much richer dynamics are constructed from simple transformations of it. In addition, the 1-step-ahead forecast errors from a good model should be white noise.
Another crucial characterization of dynamics, with important implications for forecasting, is the mean and variance of a process conditional on its past.
To compare the conditional and unconditional means and variances, consider independent white noise, \({ y }_{ t }\overset { iid }{ \sim } \left( 0,{ \sigma }^{ 2 } \right) \). Its unconditional mean and variance are zero and \({ \sigma }^{ 2 }\), respectively. Now consider the information set:
$$ { \Omega }_{ t-1 }=\left\{ { y }_{ t-1 },{ y }_{ t-2 },\dots \right\} $$
Or:
$$ { \Omega }_{ t-1 }=\left\{ { \epsilon }_{ t-1 },{ \epsilon }_{ t-2 },\dots \right\} $$
The conditional mean and variance do not necessarily have to be constant. The conditional mean for the independent white noise process is:
$$ E\left( { y }_{ t }|{ \Omega }_{ t-1 } \right) =0 $$
The conditional variance is:
$$ var\left( { y }_{ t }|{ \Omega }_{ t-1 } \right) =E\left( { \left( { y }_{ t }-E\left( { y }_{ t }|{ \Omega }_{ t-1 } \right) \right) }^{ 2 }|{ \Omega }_{ t-1 } \right) ={ \sigma }^{ 2 } $$
Independent white noise series have identical conditional and unconditional means and variances.
The Lag Operator
Let \(L\) denote the lag operator. This operator lags a series, as suggested by its name.
$$ L{ y }_{ t }={ y }_{ t-1 } $$
Furthermore:
$$ { L }^{ 2 }{ y }_{ t }=L\left( { L }{ y }_{ t } \right) =L\left( { y }_{ t-1 } \right) ={ y }_{ t-2 } $$
In practice, we usually operate on a series with a polynomial in the lag operator rather than with the lag operator itself. A lag operator polynomial of degree \(m\) is a linear combination of powers of \(L\) up to the \(m\)th power.
That is:
$$ B\left( L \right) ={ b }_{ 0 }+{ b }_{ 1 }L+{ b }_{ 2 }{ L }^{ 2 }+\cdots +{ b }_{ m }{ L }^{ m } $$
Consider the following \(m\)th-order lag operator polynomial \({ L }^{ m }\), where:
$$ { { L }^{ m } }{ y }_{ t }={ y }_{ t-m } $$
This is a simple example of the operation on a series by a lag operator polynomial.
Let \(\Delta\) be the first-difference operator, then:
$$ \Delta { y }_{ t }=\left( 1-L \right) { y }_{ t }={ y }_{ t }-{ y }_{ t-1 } $$
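Applying a finite lag operator polynomial to an observed sample is a weighted sum of current and lagged values. A minimal sketch, assuming the series is stored as a Python list in time order; the function name `apply_lag_polynomial` is hypothetical:

```python
def apply_lag_polynomial(b, y):
    """Apply B(L) = b[0] + b[1] L + ... + b[m] L^m to the sample y.

    Returns B(L) y_t for t = m, ..., T-1 (0-based). The first m dates are dropped
    because y_{t-i} is not observed before the start of the sample.
    """
    m = len(b) - 1
    return [sum(b[i] * y[t - i] for i in range(m + 1)) for t in range(m, len(y))]

y = [1.0, 4.0, 9.0, 16.0, 25.0]
print(apply_lag_polynomial([1.0, -1.0], y))      # first difference (1 - L) y_t
print(apply_lag_polynomial([0.0, 0.0, 1.0], y))  # L^2 y_t = y_{t-2}
```

The first call gives the first differences `[3.0, 5.0, 7.0, 9.0]`; the second shifts the series back two periods.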
The infinite-order lag polynomial operator is written as:
$$ B\left( L \right) ={ b }_{ 0 }+{ b }_{ 1 }L+{ b }_{ 2 }{ L }^{ 2 }+\cdots =\sum _{ i=0 }^{ \infty }{ { b }_{ i }{ L }^{ i } } $$
The following equation denotes the infinite distributed lag of current and past shocks:
$$ B\left( L \right) { \epsilon }_{ t }={ b }_{ 0 }{ \epsilon }_{ t }+{ b }_{ 1 }{ \epsilon }_{ t-1 }+{ b }_{ 2 }{ \epsilon }_{ t-2 }+\cdots =\sum _{ i=0 }^{ \infty }{ { b }_{ i }{ \epsilon }_{ t-i } } $$
Wold’s Theorem, the General Linear Process, and Rational Distributed Lags
Wold’s representation theorem will help us determine the appropriate model for a covariance stationary residual.
Wold’s Theorem
Suppose that \(\left\{ { y }_{ t } \right\} \) is any zero-mean covariance stationary process. Then it can be written as:
$$ { y }_{ t }=B\left( L \right) { \epsilon }_{ t }=\sum _{ i=0 }^{ \infty }{ { b }_{ i }{ \epsilon }_{ t-i } } $$
Where:
$$ { \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right) $$
Note that \({ b }_{ 0 }=1\) and \({ \Sigma }_{ i=0 }^{ \infty }{ b }_{ i }^{ 2 }<\infty \).
Wold’s representation is therefore the correct model for any covariance stationary series. The \({ \epsilon }_{ t }\)’s are called innovations because \({ \epsilon }_{ t }\) corresponds to the 1-step-ahead forecast error that would be incurred if a particularly good forecast were used.
The General Linear Process
Wold’s theorem implies that, when formulating forecasting models for covariance stationary time series, we need only consider models of the form:
$$ { y }_{ t }=B\left( L \right) { \epsilon }_{ t }=\sum _{ i=0 }^{ \infty }{ { b }_{ i }{ \epsilon }_{ t-i } } $$
$$ { \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right) $$
Where the \({ b }_{ i }\) are coefficients with \({ b }_{ 0 }=1\) and \({ \Sigma }_{ i=0 }^{ \infty }{ b }_{ i }^{ 2 }<\infty \). This is referred to as the general linear process.
Taking means and variances, the following unconditional moments are obtained:
$$ E\left( { y }_{ t } \right) =E\left( \sum _{ i=0 }^{ \infty }{ { b }_{ i }{ \epsilon }_{ t-i } } \right) =\sum _{ i=0 }^{ \infty }{ { b }_{ i }E\left( { \epsilon }_{ t-i } \right) } =\sum _{ i=0 }^{ \infty }{ { b }_{ i }\times 0 } =0 $$
And:
$$ var\left( { y }_{ t } \right) =var\left( \sum _{ i=0 }^{ \infty }{ { b }_{ i }{ \epsilon }_{ t-i } } \right) =\sum _{ i=0 }^{ \infty }{ { b }_{ i }^{ 2 }var\left( { \epsilon }_{ t-i } \right) } =\sum _{ i=0 }^{ \infty }{ { b }_{ i }^{ 2 }{ \sigma }^{ 2 } } ={ \sigma }^{ 2 }\sum _{ i=0 }^{ \infty }{ { b }_{ i }^{ 2 } } $$
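The variance formula can be checked numerically for a specific choice of coefficients. The sketch below assumes hypothetical values \({ b }_{ i }={ 0.5 }^{ i }\) (so \({ b }_{ 0 }=1\) and the coefficients are square summable) and \({ \sigma }^{ 2 }=2\), truncates the infinite sum after 60 terms, and compares it with the closed form of the geometric series, \({ \sigma }^{ 2 }/\left( 1-0.25 \right) \).

```python
import math

def general_linear_variance(b, sigma2):
    """var(y_t) = sigma^2 * sum_i b_i^2 for y_t = sum_i b_i eps_{t-i}, eps ~ WN(0, sigma^2)."""
    return sigma2 * math.fsum(bi * bi for bi in b)

sigma2 = 2.0                       # hypothetical innovation variance
b = [0.5 ** i for i in range(60)]  # hypothetical b_i = 0.5**i, truncated after 60 terms
approx = general_linear_variance(b, sigma2)
exact = sigma2 / (1.0 - 0.25)      # sum_i 0.25**i = 1 / (1 - 0.25)
print(approx, exact)
```

Both numbers equal \(8/3\) to machine precision; the truncation error is of order \({ 0.25 }^{ 60 }\). If the \({ b }_{ i }\) were not square summable, the sum would diverge and the variance would not be finite.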
Consider the information set:
$$ { \Omega }_{ t-1 }=\left\{ { \epsilon }_{ t-1 },{ \epsilon }_{ t-2 },\dots \right\} $$
The conditional mean is:
$$ E\left( { y }_{ t }|{ \Omega }_{ t-1 } \right) =E\left( { \epsilon }_{ t }|{ \Omega }_{ t-1 } \right) +{ b }_{ 1 }E\left( { \epsilon }_{ t-1 }|{ \Omega }_{ t-1 } \right) +{ b }_{ 2 }E\left( { \epsilon }_{ t-2 }|{ \Omega }_{ t-1 } \right) +\cdots $$
$$ =0+{ b }_{ 1 }{ \epsilon }_{ t-1 }+{ b }_{ 2 }{ \epsilon }_{ t-2 }+\cdots =\sum _{ i=1 }^{ \infty }{ { b }_{ i }{ \epsilon }_{ t-i } } $$
The conditional variance is:
$$ var\left( { y }_{ t }|{ \Omega }_{ t-1 } \right) =E\left( { \left( { y }_{ t }-E\left( { y }_{ t }|{ \Omega }_{ t-1 } \right) \right) }^{ 2 }|{ \Omega }_{ t-1 } \right) =E\left( { \epsilon }_{ t }^{ 2 }|{ \Omega }_{ t-1 } \right) =E\left( { \epsilon }_{ t }^{ 2 } \right) ={ \sigma }^{ 2 } $$
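The conditional mean is the 1-step-ahead forecast, and the forecast error is exactly the current shock \({ \epsilon }_{ t }\) because \({ b }_{ 0 }=1\). A minimal sketch with hypothetical coefficients and shock values, truncating the infinite sum after two lags; the function name `one_step_forecast` is also hypothetical.

```python
def one_step_forecast(b, past_shocks):
    """Conditional mean E(y_t | Omega_{t-1}) = sum_{i>=1} b_i * eps_{t-i}.

    past_shocks[0] is eps_{t-1}, past_shocks[1] is eps_{t-2}, and so on.
    The infinite sum is truncated after len(b) - 1 lags (an approximation).
    """
    return sum(b[i] * past_shocks[i - 1] for i in range(1, len(b)))

b = [1.0, 0.6, 0.3]  # hypothetical coefficients, with b_0 = 1
past = [0.5, 1.0]    # hypothetical eps_{t-1}, eps_{t-2}
eps_t = 0.8          # hypothetical shock realized at date t

forecast = one_step_forecast(b, past)
y_t = b[0] * eps_t + forecast  # y_t = eps_t + sum_{i>=1} b_i eps_{t-i}
error = y_t - forecast         # equals eps_t, whose variance is sigma^2
print(forecast, error)
```

Here the forecast is \(0.6\times 0.5+0.3\times 1.0=0.6\) and the forecast error equals \({ \epsilon }_{ t }=0.8\), illustrating why the conditional variance is \({ \sigma }^{ 2 }\).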
Rational Distributed Lags
Infinite polynomials in the lag operator generally contain infinitely many free parameters, which hinders their practical application. However, an infinite polynomial need not have infinitely many free parameters: it can be expressed as a ratio of two finite-order polynomials. Such polynomials are called rational polynomials, and the distributed lags built from them are called rational distributed lags.
Let:
$$ B\left( L \right) =\frac { \Theta \left( L \right) }{ \Phi \left( L \right) } $$
The degree of the numerator polynomial is \(q\):
$$ \Theta \left( L \right) =\sum _{ i=0 }^{ q }{ { \theta }_{ i }{ L }^{ i } } $$
And the degree of the denominator polynomial is \(p\):
$$ \Phi \left( L \right) =\sum _{ i=0 }^{ p }{ { \varphi }_{ i }{ L }^{ i } } $$
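Although \(B\left( L \right)\) is in general an infinite polynomial, it does not contain infinitely many free parameters: it has only \(p + q\) of them (with the leading coefficients normalized, e.g., \({ \theta }_{ 0 }={ \varphi }_{ 0 }=1\)).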
Suppose that \(B\left( L \right)\) is approximately rational, that is:
$$ B\left( L \right) \approx \frac { \Theta \left( L \right) }{ \Phi \left( L \right) } $$
Then an approximation to the Wold representation can be obtained by using a rational distributed lag.
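The infinite distributed-lag coefficients implied by a rational lag can be recovered by matching powers of \(L\) in \(\Phi(L)B(L)=\Theta(L)\), which gives \({ b }_{ j }=\left( { \theta }_{ j }-\sum _{ k=1 }^{ \min(j,p) }{ { \varphi }_{ k }{ b }_{ j-k } } \right) /{ \varphi }_{ 0 }\). A minimal sketch; the function name and the example polynomials \(\Theta(L)=1+0.4L\) and \(\Phi(L)=1-0.5L\) are hypothetical.

```python
def rational_lag_coefficients(theta, phi, n):
    """First n coefficients b_0, ..., b_{n-1} of B(L) = Theta(L) / Phi(L).

    theta = [theta_0, ..., theta_q] and phi = [phi_0, ..., phi_p], with phi_0 != 0.
    Matching powers of L in Phi(L) B(L) = Theta(L) gives
    b_j = (theta_j - sum_{k=1}^{min(j, p)} phi_k b_{j-k}) / phi_0.
    """
    b = []
    for j in range(n):
        theta_j = theta[j] if j < len(theta) else 0.0  # theta_j = 0 beyond degree q
        acc = sum(phi[k] * b[j - k] for k in range(1, min(j, len(phi) - 1) + 1))
        b.append((theta_j - acc) / phi[0])
    return b

# Hypothetical example with q = 1, p = 1: only 2 free parameters (0.4 and -0.5)
# generate an infinite sequence of distributed-lag coefficients.
print(rational_lag_coefficients([1.0, 0.4], [1.0, -0.5], 6))
```

This prints (up to rounding) `[1.0, 0.9, 0.45, 0.225, 0.1125, 0.05625]`. The expansion is square summable, as Wold’s theorem requires, only when the roots of \(\Phi(L)\) lie outside the unit circle; the example's root, \(L=2\), satisfies this.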
Estimation and Inference for the Mean, Autocorrelation, and Partial Autocorrelation Functions
Sample Mean:
A covariance stationary series has mean:
$$ \mu =E{ y }_{ t } $$
The analog principle develops estimators by replacing expectations with sample averages. Therefore, for a sample of size \(T\), the population mean is estimated by the sample mean:
$$ \bar { y } =\frac { 1 }{ T } \sum _{ t=1 }^{ T }{ { y }_{ t } } $$
This estimate is necessary when estimating the autocorrelation function.
Sample Autocorrelations
For a covariance stationary series \(y\), the autocorrelation at displacement \(\tau\) is:
$$ \rho \left( \tau \right) =\frac { E\left( \left( { y }_{ t }-\mu \right) \left( { y }_{ t-\tau }-\mu \right) \right) }{ E\left( { \left( { y }_{ t }-\mu \right) }^{ 2 } \right) } $$
Applying the analog principle yields the natural estimator:
$$ \hat { \rho } \left( \tau \right) =\frac { \frac { 1 }{ T } { \Sigma }_{ t=\tau +1 }^{ T }\left( \left( { y }_{ t }-\bar { y } \right) \left( { y }_{ t-\tau }-\bar { y } \right) \right) }{ \frac { 1 }{ T } { \Sigma }_{ t=1 }^{ T }{ \left( { y }_{ t }-\bar { y } \right) }^{ 2 } } =\frac { { \Sigma }_{ t=\tau +1 }^{ T }\left( \left( { y }_{ t }-\bar { y } \right) \left( { y }_{ t-\tau }-\bar { y } \right) \right) }{ { \Sigma }_{ t=1 }^{ T }{ \left( { y }_{ t }-\bar { y } \right) }^{ 2 } } $$
This estimator is referred to as the sample autocorrelation function or correlogram.
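The estimator above translates directly into code; note that the \(1/T\) factors cancel, as in the second form of the formula. A minimal sketch using only the standard library; the function name `sample_acf` is hypothetical.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag), centered at the sample mean."""
    T = len(y)
    y_bar = sum(y) / T
    dev = [v - y_bar for v in y]
    denom = sum(d * d for d in dev)  # sum_{t=1}^{T} (y_t - y_bar)^2
    # 0-based t = tau, ..., T-1 corresponds to t = tau+1, ..., T in the formula.
    return [sum(dev[t] * dev[t - tau] for t in range(tau, T)) / denom
            for tau in range(max_lag + 1)]

print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2))
```

For the sample 1, 2, 3, 4, 5 the deviations from the mean are \(-2,-1,0,1,2\), so \(\hat{\rho}(1)=4/10=0.4\) and \(\hat{\rho}(2)=-1/10=-0.1\), with \(\hat{\rho}(0)=1\) by construction.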
If a series is white noise, then in large samples its sample autocorrelations are approximately distributed as:
$$ \hat { \rho } \left( \tau \right) \sim N\left( 0,\frac { 1 }{ T } \right) $$
Since their mean is zero, the sample autocorrelations of white noise are approximately unbiased estimators of the true autocorrelations, which are all zero beyond displacement zero.
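Because each \(\hat{\rho}(\tau)\) is approximately \(N(0,1/T)\) under the white-noise null, an approximate 95% band is \(\pm 1.96/\sqrt{T}\). The sketch below applies that band, one displacement at a time, to the three sample autocorrelations with \(T=300\) used in the questions at the end of this reading; the variable names are illustrative only.

```python
import math

T = 300
rho_hat = {1: 0.25, 2: -0.10, 3: -0.05}  # sample autocorrelations by displacement
se = 1.0 / math.sqrt(T)                  # standard error under the white-noise null
band = 1.96 * se                         # approximate 95% band: +/- 1.96 / sqrt(T)
outside = [lag for lag, r in rho_hat.items() if abs(r) > band]
print(round(band, 4), outside)
```

The band is about \(\pm 0.113\), so only the displacement-1 autocorrelation falls outside it. This checks each displacement separately; the \(Q\)-statistics test them jointly.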
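Two further facts lead to a joint test. First, because \(\sqrt { T } \hat { \rho } \left( \tau \right) \) is approximately \(N\left( 0,1 \right) \), \(T{ \hat { \rho } }^{ 2 }\left( \tau \right) \) is approximately \({ \chi }_{ 1 }^{ 2 }\). Second, the sample autocorrelations at different displacements are approximately independent of each other, and a sum of independent \({ \chi }^{ 2 }\) variables is itself \({ \chi }^{ 2 }\), with degrees of freedom equal to the sum of the individual degrees of freedom.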
The following is the equation for the Box-Pierce \(Q\)-Statistic:
$$ { Q }_{ BP }=T\sum _{ \tau =1 }^{ m }{ { \hat { \rho } }^{ 2 } } \left( \tau \right) $$
Under the null hypothesis that \(y\) is white noise, \({ Q }_{ BP }\) is approximately distributed as a \({ \chi }_{ m }^{ 2 }\) random variable.
The Ljung-Box \(Q\)-statistic is a slight modification of the Box-Pierce \(Q\)-statistic that follows the \({ \chi }^{ 2 }\) distribution more closely in small samples:
$$ { Q }_{ LB }=T\left( T+2 \right) \sum _{ \tau =1 }^{ m }{ \frac { { \hat { \rho } }^{ 2 }\left( \tau \right) }{ T-\tau } } $$
Under the null hypothesis that \(y\) is white noise, \({ Q }_{ LB }\) is also approximately distributed as a \({ \chi }_{ m }^{ 2 }\) random variable.
The Ljung-Box \(Q\)-statistic is thus the same as the Box-Pierce \(Q\)-statistic, except that the sum of squared autocorrelations is replaced by a weighted sum. Writing \({ Q }_{ LB }=T{ \Sigma }_{ \tau =1 }^{ m }\left( \frac { T+2 }{ T-\tau } \right) { \hat { \rho } }^{ 2 }\left( \tau \right) \), the weights are:
$$ \frac { T+2 }{ T-\tau } $$
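Both statistics are simple functions of the sample autocorrelations. A minimal sketch, assuming the sample autocorrelations at displacements \(1,\dots,m\) have already been computed; the function names `box_pierce` and `ljung_box` are hypothetical, and the inputs are the values from the questions at the end of this reading.

```python
def box_pierce(rho_hat, T):
    """Q_BP = T * sum of squared sample autocorrelations at displacements 1..m."""
    return T * sum(r * r for r in rho_hat)

def ljung_box(rho_hat, T):
    """Q_LB = T (T + 2) * sum over tau of rho_hat(tau)^2 / (T - tau)."""
    return T * (T + 2) * sum(r * r / (T - tau) for tau, r in enumerate(rho_hat, start=1))

rho_hat = [0.25, -0.10, -0.05]  # displacements 1, 2, 3
T = 300
print(round(box_pierce(rho_hat, T), 2), round(ljung_box(rho_hat, T), 2))
```

This prints `22.5 22.74`. With \(m=3\), both values far exceed the 5% critical value of the \({ \chi }_{ 3 }^{ 2 }\) distribution (about 7.81), so the white-noise null would be rejected at that level.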
Sample Partial Autocorrelations
The sample partial autocorrelations come from the sample counterpart of the population autoregression: a linear regression fitted to a sample of size \(T\).
Assume that the fitted regression is:
$$ { \hat { y } }_{ t }=\hat { c } +{ \hat { \beta } }_{ 1 }{ y }_{ t-1 }+\cdots +{ \hat { \beta } }_{ \tau }{ y }_{ t-\tau } $$
The sample partial autocorrelation at displacement \(\tau\) is then the estimated coefficient on \({ y }_{ t-\tau }\):
$$ \hat { p } \left( \tau \right) ={ \hat { \beta } }_{ \tau } $$
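The regression definition translates directly into code. The sketch below fits the autoregression by ordinary least squares, solving the normal equations with a small Gaussian-elimination routine so that it needs only the standard library; the helper names `solve_linear` and `sample_pacf` are hypothetical, and in practice a numerical library would be used instead.

```python
import random

def solve_linear(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sample_pacf(y, tau):
    """p_hat(tau): OLS coefficient on y_{t-tau} in y_t = c + b_1 y_{t-1} + ... + b_tau y_{t-tau}."""
    rows = [[1.0] + [y[t - i] for i in range(1, tau + 1)] for t in range(tau, len(y))]
    targets = [y[t] for t in range(tau, len(y))]
    k = tau + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(k)]
    beta = solve_linear(xtx, xty)  # [c_hat, beta_1_hat, ..., beta_tau_hat]
    return beta[-1]

# Usage on simulated Gaussian white noise: the estimates should be close to 0.
rng = random.Random(0)
wn = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print([round(sample_pacf(wn, tau), 3) for tau in (1, 2, 3)])
```

Each displacement \(\tau\) requires its own regression with \(\tau\) lags, which is why the sample partial autocorrelation at displacement 2 is generally not the lag-2 coefficient from the displacement-3 regression.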
Question 1
The following sample autocorrelation estimates are obtained using 300 data points:
| Lag | 1 | 2 | 3 |
|---|---|---|---|
| Coefficient | 0.25 | -0.1 | -0.05 |
Compute the value of the Box-Pierce Q-statistic.
- A. 22.5
- B. 22.74
- C. 30
- D. 30.1
The correct answer is A.
$$ { Q }_{ BP }=T\sum _{ \tau =1 }^{ m }{ { \hat { \rho } }^{ 2 } } \left( \tau \right) $$
$$ = 300({0.25}^{ 2 }+ {(-0.1)}^{2}+ {(-0.05)}^{2}) = 22.5 $$
Question 2
The following sample autocorrelation estimates are obtained using 300 data points:
| Lag | 1 | 2 | 3 |
|---|---|---|---|
| Coefficient | 0.25 | -0.1 | -0.05 |
Compute the value of the Ljung-Box Q-statistic.
- A. 30.1
- B. 30
- C. 22.5
- D. 22.74
The correct answer is D.
$$ { Q }_{ LB }=T\left( T+2 \right) \sum _{ \tau =1 }^{ m }{ \frac { { \hat { \rho } }^{ 2 }\left( \tau \right) }{ T-\tau } } $$
$$ = 300(302)\left( \frac { { 0.25 }^{ 2 } }{ 299 } +\frac { { (-0.1) }^{ 2 } }{ 298 } +\frac { { (-0.05) }^{ 2 } }{ 297 } \right) = 22.74 $$
Note: When the sample size is large, the Box-Pierce and Ljung-Box statistics are close to each other, so the two tests typically lead to the same conclusion.