 ### Modeling Cycles: MA, AR, and ARMA Models

After completing this reading you should be able to:

• Describe the properties of the first-order moving average (MA(1)) process, and distinguish between autoregressive representation and moving average representation.
• Describe the properties of a general finite-order moving average process of order $$q$$ (MA($$q$$)).
• Describe the properties of the first-order autoregressive (AR(1)) process, and define and explain the Yule-Walker equation.
• Describe the properties of a general $$p$$th order autoregressive (AR($$p$$)) process.
• Define and describe the properties of the autoregressive moving average (ARMA) process.

## Moving Average (MA) Models

A finite-order moving average process can be viewed as an approximation to the Wold representation, which is itself a moving average process of infinite order. In a moving average model, all variation in the time series is driven by shocks of various sorts.

### The First-Order Moving Average (MA(1)) Process

The process is defined as:

$${ y }_{ t }={ \epsilon }_{ t }+\theta { \epsilon }_{ t-1 }=\left( 1+\theta L \right) { \epsilon }_{ t }$$

$${ \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right)$$

In the general MA process, and particularly the MA(1) process, a function of current and lagged unobservable shocks expresses the current value of the observed series. This is an important feature that generally defines the MA process.

The following is the equation for the unconditional mean:

$$E\left( { y }_{ t } \right) =E\left( { \epsilon }_{ t } \right) +\theta E\left( { \epsilon }_{ t-1 } \right) =0$$

And the unconditional variance is:

$$var\left( { y }_{ t } \right) =var\left( { \epsilon }_{ t } \right) +{ \theta }^{ 2 }var\left( { \epsilon }_{ t-1 } \right) ={ \sigma }^{ 2 }+{ \theta }^{ 2 }{ \sigma }^{ 2 }={ \sigma }^{ 2 }\left( 1+{ \theta }^{ 2 } \right)$$

An increase in the absolute value of $$\theta$$ causes the unconditional variance to increase, given that the value of $$\sigma$$ is constant.

Consider the following conditional information set:

$${ \Omega }_{ t-1 }=\left\{ { \epsilon }_{ t-1 },{ \epsilon }_{ t-2 },\dots \right\}$$

Conditional on this information set, the mean is:

$$E\left( { y }_{ t }|{ \Omega }_{ t-1 } \right) =E\left( { \epsilon }_{ t }+\theta { \epsilon }_{ t-1 }|{ \Omega }_{ t-1 } \right) =E\left( { \epsilon }_{ t }|{ \Omega }_{ t-1 } \right) +\theta E\left( { \epsilon }_{ t-1 }|{ \Omega }_{ t-1 } \right) =\theta { \epsilon }_{ t-1 }$$

And a conditional variance of:

$$var\left( { y }_{ t }|{ \Omega }_{ t-1 } \right) =E\left( { \left( { y }_{ t }-E\left( { y }_{ t }|{ \Omega }_{ t-1 } \right) \right) }^{ 2 }|{ \Omega }_{ t-1 } \right)$$

$$=E\left( { \epsilon }_{ t }^{ 2 }|{ \Omega }_{ t-1 } \right) =E\left( { \epsilon }_{ t }^{ 2 } \right) ={ \sigma }^{ 2 }$$

The conditional mean depends only on the first lag of the shock; more distant shocks do not affect it at all.

The next step is to calculate the autocorrelation of the MA(1) process. We start by calculating the autocovariance function. That is:

$$\gamma \left( \tau \right) =E\left( { y }_{ t }{ y }_{ t-\tau } \right) =E\left( \left( { \epsilon }_{ t }+\theta { \epsilon }_{ t-1 } \right) \left( { \epsilon }_{ t-\tau }+\theta { \epsilon }_{ t-\tau -1 } \right) \right)$$

$$=\begin{cases} { \sigma }^{ 2 }\left( 1+{ \theta }^{ 2 } \right) , & \tau =0 \\ \theta { \sigma }^{ 2 }, & \tau =1 \\ 0, & Otherwise \end{cases}$$

Therefore, the autocorrelation function is defined as:

$$\rho \left( \tau \right) =\frac { \gamma \left( \tau \right) }{ \gamma \left( 0 \right) } =\begin{cases} 1, & \tau =0 \\ \frac { \theta }{ 1+{ \theta }^{ 2 } } , & \tau =1 \\ 0, & Otherwise \end{cases}$$

This function happens to be the autocovariance function scaled by the variance.

The sharp cutoff in the autocorrelation function is a crucial feature in this case. Regardless of the values of the MA parameters, the requirements for covariance stationarity are always met for any MA(1) process.
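The cutoff can be checked by simulation. The following is a minimal sketch, assuming Gaussian shocks and the illustrative values $$\theta = 0.5$$ and $$\sigma = 1$$ (neither is taken from the reading); the helper names `simulate_ma1` and `sample_autocorr` are hypothetical.

```python
import random

def simulate_ma1(theta, sigma, n, seed=0):
    """Simulate n observations of y_t = eps_t + theta * eps_{t-1}."""
    rng = random.Random(seed)
    # Assumption: Gaussian shocks; white noise only requires zero mean,
    # constant variance, and no serial correlation.
    eps = [rng.gauss(0.0, sigma) for _ in range(n + 1)]
    return [eps[t] + theta * eps[t - 1] for t in range(1, n + 1)]

def sample_autocorr(y, lag):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    cov = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, n)) / n
    return cov / var

theta = 0.5                                   # illustrative value
y = simulate_ma1(theta, sigma=1.0, n=50_000)
theoretical_rho1 = theta / (1 + theta ** 2)   # theta / (1 + theta^2)
print("theoretical rho(1):", theoretical_rho1)
for lag in (1, 2, 3):
    print("sample rho at lag", lag, "=", round(sample_autocorr(y, lag), 3))
```

The lag-1 sample autocorrelation should sit close to $$\theta /(1+{ \theta }^{ 2 })$$, while the values at lags 2 and 3 should be close to zero, up to sampling noise.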

The MA(1) process is considered invertible if:

$$|\theta |<1$$

Therefore, the MA(1) process can be inverted and the current value of the series expressed in terms of a current shock and the lagged values of the series, instead of a current and a lagged shock. This is referred to as the autoregressive representation, and it can be calculated as follows:

The process has been defined as:

$${ y }_{ t }={ \epsilon }_{ t }+\theta { \epsilon }_{ t-1 }$$

$${ \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right)$$

We then solve for the innovation:

$${ \epsilon }_{ t }={ y }_{ t }-\theta { \epsilon }_{ t-1 }$$

The expressions for the innovations at various dates are given as follows after lagging by successively more periods:

$${ \epsilon }_{ t-1 }={ y }_{ t-1 }-\theta { \epsilon }_{ t-2 }$$

$${ \epsilon }_{ t-2 }={ y }_{ t-2 }-\theta { \epsilon }_{ t-3 }$$

$${ \epsilon }_{ t-3 }={ y }_{ t-3 }-\theta { \epsilon }_{ t-4 }$$

After backward substitution in the MA(1) process we have:

$${ y }_{ t }={ \epsilon }_{ t }+\theta { y }_{ t-1 }-{ \theta }^{ 2 }{ y }_{ t-2 }+{ \theta }^{ 3 }{ y }_{ t-3 }-\cdots$$

The infinite autoregressive representation can be expressed as follows using the lag operator notation:

$$\frac { 1 }{ 1+\theta L } { y }_{ t }={ \epsilon }_{ t }$$

Since $$\theta$$ is raised to progressively higher powers in the back substitution, a convergent autoregressive representation exists only if $$|\theta |<1$$.

The only root of the MA(1) lag operator polynomial is the solution to:

$$1+{\theta L}=0$$

Which is:

$$L=-\frac { 1 }{ \theta }$$

The inverse of this root is $$-\theta$$, so if $$|\theta |<1$$ the inverse of the root is less than 1 in absolute value. This is the invertibility condition stated in terms of the root.
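To see why $$|\theta |<1$$ matters, the innovation can be rebuilt from current and lagged values of the series by truncating the autoregressive representation after $$m$$ lags. The sketch below assumes Gaussian shocks and an illustrative $$\theta = 0.6$$; the function name `truncated_ar_innovation` is hypothetical.

```python
import random

def truncated_ar_innovation(y, t, theta, m):
    """Approximate eps_t by sum_{j=0}^{m} (-theta)^j * y_{t-j}."""
    return sum((-theta) ** j * y[t - j] for j in range(m + 1))

rng = random.Random(1)
theta = 0.6                                   # illustrative, |theta| < 1
eps = [rng.gauss(0.0, 1.0) for _ in range(201)]
# y[0] is a placeholder so that y[t] lines up with eps[t] for t >= 1.
y = [0.0] + [eps[t] + theta * eps[t - 1] for t in range(1, 201)]

t = 200
errors = {m: abs(truncated_ar_innovation(y, t, theta, m) - eps[t])
          for m in (1, 5, 20)}
for m, err in errors.items():
    print("lags used:", m, "absolute error:", err)
```

The truncation error equals $${ \theta }^{ m+1 }$$ times a lagged shock in absolute value, so it shrinks geometrically when $$|\theta |<1$$ and would grow without bound otherwise.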

The next step is to evaluate the partial autocorrelation function for the MA(1) process. This function will have a gradual decay to zero. In a sequence of progressive higher-order autoregressive approximations, the coefficients on the last included lag are the partial autocorrelations.

We have shown that for the general autoregressive representation:

$${ y }_{ t }={ \epsilon }_{ t }+\theta { y }_{ t-1 }-{ \theta }^{ 2 }{ y }_{ t-2 }+{ \theta }^{ 3 }{ y }_{ t-3 }-\cdots$$

When $$\theta=0.5$$ we have that:

$${ y }_{ t }={ \epsilon }_{ t }+0.5{ y }_{ t-1 }-{ 0.5 }^{ 2 }{ y }_{ t-2 }+{ 0.5 }^{ 3 }{ y }_{ t-3 }-\cdots$$

A similar damped oscillation is observed for the partial autocorrelations.
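The damped oscillation can be reproduced numerically. The sketch below uses the Durbin-Levinson recursion, a standard way of turning autocorrelations into partial autocorrelations that is not described in this reading; the function name `pacf_from_acf` is hypothetical, and $$\theta = 0.5$$ matches the example above.

```python
def pacf_from_acf(rho, max_lag):
    """Durbin-Levinson recursion: partial autocorrelations for lags 1..max_lag.

    rho[k] is the autocorrelation at lag k, with rho[0] == 1.
    """
    pacf = []
    phi_prev = []   # coefficients of the AR(k-1) approximation
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
        # Update the AR(k) coefficients; the last one is the partial autocorrelation.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

theta = 0.5
max_lag = 5
# MA(1) autocorrelations: theta / (1 + theta^2) at lag 1, zero beyond.
rho = [1.0, theta / (1 + theta ** 2)] + [0.0] * (max_lag - 1)
pacf = pacf_from_acf(rho, max_lag)
print([round(p, 4) for p in pacf])
```

With a positive $$\theta$$, the partial autocorrelations alternate in sign and shrink in magnitude, in contrast to the autocorrelations, which cut off after lag 1.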

## The Finite-Order Moving Average Process of Order $$q$$, MA($$q$$)

For the MA($$q$$) process, we have that:

$${ y }_{ t }={ \epsilon }_{ t }+{ \theta }_{ 1 }{ \epsilon }_{ t-1 }+\cdots +{ \theta }_{ q }{ \epsilon }_{ t-q }=\Theta \left( L \right) { \epsilon }_{ t }$$

$${ \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right)$$

The $$q$$th-order lag operator polynomial is $$\Theta \left( L \right)$$.

Where:

$$\Theta \left( L \right) =1+{ \theta }_{ 1 }L+\cdots +{ \theta }_{ q }{ L }^{ q }$$

The MA($$q$$) process is a generalized representation of the MA(1) process. This means that the MA(1) process is a special case of the MA($$q$$) process, with $$q$$ being equal to 1.

Therefore, the MA($$q$$) and the MA(1) processes have properties that are similar in all aspects. When $$q>1$$, the MA($$q$$) lag operator polynomial has $$q$$ roots, and there are chances of ending up with complex roots.

The MA($$q$$) process is invertible if all the roots have inverses that are inside the unit circle. In that case we have the convergent autoregressive representation:

$$\frac { 1 }{ \Theta \left( L \right) } { y }_{ t }={ \epsilon }_{ t }$$

Depending on the information set, the MA($$q$$) process’ conditional mean changes accordingly. However, the unconditional moments are fixed. The $$q$$ lags of the innovation in the MA($$q$$) process are the determining factors for the conditional mean. This makes the MA($$q$$) process have a potentially longer memory, which is clearly observed in its autocorrelation function where all the autocorrelations are zero, beyond the $$q$$th displacement.

The defining property of the moving average process is this autocorrelation cutoff.
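The cutoff can be made concrete by computing the theoretical autocovariances of an MA($$q$$) process. The sketch below relies on the standard result $$\gamma \left( \tau \right) ={ \sigma }^{ 2 }\sum _{ i=0 }^{ q-\tau }{ { \theta }_{ i }{ \theta }_{ i+\tau } }$$ with $${ \theta }_{ 0 }=1$$, which is not derived in this reading; the function name and the MA(2) coefficients are illustrative assumptions.

```python
def ma_autocovariance(thetas, sigma2, max_lag):
    """Autocovariances gamma(0..max_lag) of y_t = eps_t + sum_i thetas[i-1] * eps_{t-i}."""
    coeffs = [1.0] + list(thetas)   # theta_0 = 1 multiplies the current shock
    q = len(thetas)
    gammas = []
    for tau in range(max_lag + 1):
        if tau > q:
            # No shock is shared between y_t and y_{t-tau} beyond lag q.
            gammas.append(0.0)
        else:
            gammas.append(sigma2 * sum(coeffs[i] * coeffs[i + tau]
                                       for i in range(q - tau + 1)))
    return gammas

# Illustrative MA(2) with theta_1 = 0.5 and theta_2 = 0.3.
gammas = ma_autocovariance([0.5, 0.3], sigma2=1.0, max_lag=4)
rhos = [g / gammas[0] for g in gammas]
print([round(r, 4) for r in rhos])
```

For this MA(2) example the autocorrelations at lags 3 and 4 are exactly zero, and setting `thetas=[theta]` recovers the MA(1) results derived earlier.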

According to Wold’s representation:

$${ y }_{ t }=B\left( L \right) { \epsilon }_{ t }$$

Here the order of $$B\left( L \right)$$ is infinite. When an MA(1) model is fit, the infinite-order polynomial $$B\left( L \right)$$ is approximated by the first-order polynomial $$1+{ \theta }L$$.

Even better approximations to the Wold representation can be provided by the MA($$q$$) processes. The infinite moving average is approximated by the MA($$q$$) process, with a moving average of finite order,

$${ y }_{ t }=\Theta \left( L \right) { \epsilon }_{ t }.$$

## Autoregressive (AR) Models

The autoregressive process is another approximation to the Wold representation. It is a simple stochastic difference equation, and stochastic difference equations are the natural vehicle for discrete-time stochastic dynamic modeling.

### The AR(1) process

The first-order autoregressive process, AR(1) for short, is:

$${ y }_{ t }={ \epsilon }_{ t }+\varphi { y }_{ t-1 }$$

$${ \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right)$$

It can also be expressed in the lag operator form as follows:

$$\left( 1-\varphi L \right) { y }_{ t }={ \epsilon }_{ t }$$

It is also important to note that a finite-order moving average process is always covariance stationary, whereas invertibility requires certain conditions to be met. For an autoregressive process the situation is reversed: invertibility always holds, but covariance stationarity requires certain conditions to be satisfied.

For the AR(1) process:

$${ y }_{ t }=\varphi { y }_{ t-1 }+{ \epsilon }_{ t }$$

Backward substitution for the lagged $$y$$’s on the right-hand side gives:

$${ y }_{ t }={ \epsilon }_{ t }+\varphi { \epsilon }_{ t-1 }+{ \varphi }^{ 2 }{ \epsilon }_{ t-2 }+\cdots$$

And this can be expressed in the following manner in the lag operator form:

$${ y }_{ t }=\frac { 1 }{ 1-\varphi L } { \epsilon }_{ t }$$

This moving average representation for $$y$$ converges only if $$|\varphi |<1$$. Therefore, in the AR(1) process, the condition for covariance stationarity is $$|\varphi |<1$$.

The unconditional mean can be calculated as:

$$E\left( { y }_{ t } \right) =E\left( { \epsilon }_{ t }+\varphi { \epsilon }_{ t-1 }+{ \varphi }^{ 2 }{ \epsilon }_{ t-2 }+\cdots \right)$$

$$=E\left( { \epsilon }_{ t } \right) +\varphi E\left( { \epsilon }_{ t-1 } \right) +{ \varphi }^{ 2 }E\left( { \epsilon }_{ t-2 } \right) +\cdots$$

$$=0$$

And the unconditional variance is calculated as:

$$var\left( { y }_{ t } \right) =var\left( { \epsilon }_{ t }+\varphi { \epsilon }_{ t-1 }+{ \varphi }^{ 2 }{ \epsilon }_{ t-2 }+\cdots \right)$$

$$={ \sigma }^{ 2 }+{ \varphi }^{ 2 }{ \sigma }^{ 2 }+{ \varphi }^{ 4 }{ \sigma }^{ 2 }+\cdots$$

$$={ \sigma }^{ 2 }\sum _{ i=0 }^{ \infty }{ { \varphi }^{ 2i } }$$

$$= \frac { { \sigma }^{ 2 } }{ 1-{ \varphi }^{ 2 } }$$
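The unconditional moments can be checked against a simulated path. This is a minimal sketch assuming Gaussian shocks, illustrative values $$\varphi = 0.5$$ and $$\sigma = 1$$, and a burn-in period to reduce the effect of the arbitrary starting value; `simulate_ar1` is a hypothetical helper.

```python
import random

def simulate_ar1(phi, sigma, n, burn_in=500, seed=0):
    """Simulate y_t = phi * y_{t-1} + eps_t, discarding an initial burn-in."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for t in range(n + burn_in):
        y = phi * y + rng.gauss(0.0, sigma)   # assumption: Gaussian shocks
        if t >= burn_in:
            path.append(y)
    return path

phi, sigma = 0.5, 1.0                          # illustrative values
path = simulate_ar1(phi, sigma, n=50_000)
mean = sum(path) / len(path)
sample_var = sum((v - mean) ** 2 for v in path) / len(path)
theoretical_var = sigma ** 2 / (1 - phi ** 2)  # sigma^2 / (1 - phi^2)
print("sample mean:", round(mean, 3))
print("sample variance:", round(sample_var, 3),
      "theoretical variance:", round(theoretical_var, 3))
```

The sample mean should be near zero and the sample variance near $${ \sigma }^{ 2 }/(1-{ \varphi }^{ 2 })$$; as $$|\varphi |$$ approaches 1 the theoretical variance grows without bound.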

The conditional moments are:

$$E\left( { y }_{ t }|{ y }_{ t-1 } \right) =E\left( \varphi { y }_{ t-1 }+{ \epsilon }_{ t }|{ y }_{ t-1 } \right)$$

$$=\varphi E\left( { y }_{ t-1 }|{ y }_{ t-1 } \right) +E\left( { \epsilon }_{ t }|{ y }_{ t-1 } \right)$$

$$=\varphi { y }_{ t-1 }+0$$

$$=\varphi { y }_{ t-1 }$$

And:

$$var\left( { y }_{ t }|{ y }_{ t-1 } \right) =var\left( \varphi { y }_{ t-1 }+{ \epsilon }_{ t }|{ y }_{ t-1 } \right)$$

$$={ \varphi }^{ 2 }var\left( { y }_{ t-1 }|{ y }_{ t-1 } \right) +var\left( { \epsilon }_{ t }|{ y }_{ t-1 } \right)$$

$$=0+{ \sigma }^{ 2 }$$

$$={ \sigma }^{ 2 }$$

For the autocovariances we have:

$${ y }_{ t }=\varphi { y }_{ t-1 }+{ \epsilon }_{ t }$$

Both sides of the equation are multiplied by $${ y }_{ t-\tau }$$, such that:

$${ y }_{ t }\times { y }_{ t-\tau }=\varphi \times { y }_{ t-\tau }\times { y }_{ t-1 }+{ y }_{ t-\tau } \times { \epsilon }_{ t }$$

For $$\tau \ge 1$$, taking expectations of both sides gives:

$$\gamma \left( \tau \right) =\varphi \gamma \left( \tau -1 \right)$$

This is the Yule-Walker equation. It is recursive: given $$\gamma \left( \tau -1 \right)$$ for any $$\tau \ge 1$$, it yields $$\gamma \left( \tau \right)$$. Starting from $$\gamma \left( 0 \right)$$, the variance of the process, we obtain:

$$\gamma \left( 0 \right) =\frac { { \sigma }^{ 2 } }{ 1-{ \varphi }^{ 2 } }$$

$$\gamma \left( 1 \right) =\varphi \frac { { \sigma }^{ 2 } }{ 1-{ \varphi }^{ 2 } }$$

$$\gamma \left( 2 \right) ={ \varphi }^{ 2 }\frac { { \sigma }^{ 2 } }{ 1-{ \varphi }^{ 2 } }$$

This can be generally expressed as:

$$\gamma \left( \tau \right) ={ \varphi }^{ \tau }\frac { { \sigma }^{ 2 } }{ 1-{ \varphi }^{ 2 } } ,\quad \quad \quad \tau =0,1,2,\dots$$

We then divide through by $$\gamma \left( 0 \right)$$ and we obtain the autocorrelations,

$$\rho \left( \tau \right) ={ \varphi }^{ \tau },\quad \quad \quad \quad \tau =0,1,2,\dots$$

A positive $$\varphi$$ implies a one-sided autocorrelation decay, while a negative $$\varphi$$ implies a decay with back-and-forth oscillations.
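The recursion is easy to trace in code. The sketch below starts from $$\gamma \left( 0 \right)$$ and applies the Yule-Walker equation repeatedly; the function name and the values $$\varphi = \pm 0.8$$ are illustrative assumptions.

```python
def ar1_autocovariances(phi, sigma2, max_lag):
    """Build gamma(0..max_lag) from gamma(0) using gamma(tau) = phi * gamma(tau - 1)."""
    gammas = [sigma2 / (1 - phi ** 2)]        # gamma(0), the unconditional variance
    for _ in range(max_lag):
        gammas.append(phi * gammas[-1])       # Yule-Walker step
    return gammas

for phi in (0.8, -0.8):                       # illustrative values
    gammas = ar1_autocovariances(phi, sigma2=1.0, max_lag=4)
    rhos = [g / gammas[0] for g in gammas]
    print("phi =", phi, "autocorrelations:", [round(r, 3) for r in rhos])
```

For the positive coefficient the autocorrelations decline smoothly toward zero, and for the negative coefficient they alternate in sign while shrinking at the same rate.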

The AR(1) process has a partial autocorrelation function that cuts off abruptly.

Denoting the partial autocorrelation at displacement $$\tau$$ by $$p\left( \tau \right)$$:

$$p\left( \tau \right) =\begin{cases} \varphi , & \tau =1 \\ 0, & \tau >1 \end{cases}$$

When the true process is an AR(1), the first partial autocorrelation equals the autoregressive coefficient, and the coefficients on all longer lags are zero.

## AR($$p$$) Process

The following is the equation of a general $$p$$th order autoregressive process, AR($$p$$):

$${ y }_{ t }={ \varphi }_{ 1 }{ y }_{ t-1 }+{ \varphi }_{ 2 }{ y }_{ t-2 }+\cdots +{ \varphi }_{ p }{ y }_{ t-p }+{ \epsilon }_{ t }$$

$${ \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right)$$

This can also be expressed in lag operator form:

$$\Phi \left( L \right) { y }_{ t }=\left( 1-{ \varphi }_{ 1 }L-{ \varphi }_{ 2 }{ L }^{ 2 }-\cdots -{ \varphi }_{ p }{ L }^{ p } \right) { y }_{ t }={ \epsilon }_{ t }$$

Covariance stationarity in the AR($$p$$) process occurs iff all the roots of the autoregressive lag operator polynomial $$\Phi \left( L \right)$$ have inverses that fall inside the unit circle. Here, the process can be written in the form of a convergent infinite moving average:

$${ y }_{ t }=\frac { 1 }{ \Phi \left( L \right) } { \epsilon }_{ t }$$

At displacement $$p$$, the cutoff for the AR($$p$$) partial autocorrelation is sharp.
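The covariance stationarity condition can be checked directly for an AR(2) process. The sketch below uses the fact that the inverses of the roots of $$\Phi \left( L \right) =1-{ \varphi }_{ 1 }L-{ \varphi }_{ 2 }{ L }^{ 2 }$$ are the roots of $${ z }^{ 2 }-{ \varphi }_{ 1 }z-{ \varphi }_{ 2 }=0$$ (substitute $$z=1/L$$); the function names and coefficient values are illustrative assumptions.

```python
import cmath

def ar2_inverse_roots(phi1, phi2):
    """Inverse roots of Phi(L) = 1 - phi1*L - phi2*L^2.

    These are the roots of z^2 - phi1*z - phi2 = 0, obtained by setting z = 1/L.
    """
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)   # complex sqrt handles negative discriminants
    return (phi1 + disc) / 2, (phi1 - disc) / 2

def is_covariance_stationary(phi1, phi2):
    """True when both inverse roots lie strictly inside the unit circle."""
    return all(abs(z) < 1 for z in ar2_inverse_roots(phi1, phi2))

print(is_covariance_stationary(0.5, 0.3))    # real inverse roots
print(is_covariance_stationary(1.0, -0.5))   # complex inverse roots
print(is_covariance_stationary(0.7, 0.4))    # phi1 + phi2 > 1
```

The second case has complex inverse roots with modulus below one, so the process is still covariance stationary; the third case has an inverse root outside the unit circle and is not.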

## Autoregressive Moving Average (ARMA) Models

Autoregressive and moving average models are often combined in order to obtain a better approximation to the Wold representation. The result is the autoregressive moving average, ARMA($$p,q$$), process. The ARMA(1,1) is the simplest ARMA process that is neither a pure autoregression nor a pure moving average. That is:

$${ y }_{ t }=\varphi { y }_{ t-1 }+{ \epsilon }_{ t }+\theta { \epsilon }_{ t-1 }$$

$${ \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right)$$

Or in lag operator form:

$$\left( 1-\varphi L \right) { y }_{ t }=\left( 1+\theta L \right) { \epsilon }_{ t }$$

Stationarity requires $$|\varphi |<1$$, and invertibility requires $$|\theta |<1$$. Under covariance stationarity, the moving average representation is:

$${ y }_{ t }=\frac { \left( 1+\theta L \right) }{ \left( 1-\varphi L \right) } { \epsilon }_{ t }$$

This happens to be an infinite distributed lag of current and past innovations. The following is the infinite autoregressive representation with the invertibility condition satisfied:

$$\frac { \left( 1-\varphi L \right) }{ \left( 1+\theta L \right) } { y }_{ t }={ \epsilon }_{ t }$$
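The weights in the infinite moving average representation can be computed by matching powers of $$L$$ in $$\left( 1-\varphi L \right) \psi \left( L \right) =1+\theta L$$, where $$\psi \left( L \right)$$ is a label introduced here for the ratio $$\left( 1+\theta L \right) /\left( 1-\varphi L \right)$$. A minimal sketch with illustrative parameter values and a hypothetical function name:

```python
def arma11_ma_weights(phi, theta, n_weights):
    """First n_weights coefficients psi_j in y_t = sum_j psi_j * eps_{t-j}."""
    psi = [1.0]                               # psi_0 = 1
    for j in range(1, n_weights):
        # Matching powers of L in (1 - phi L) * psi(L) = 1 + theta L:
        # psi_1 = phi * psi_0 + theta, and psi_j = phi * psi_{j-1} for j >= 2.
        psi.append(phi * psi[-1] + (theta if j == 1 else 0.0))
    return psi

phi, theta = 0.5, 0.4                         # illustrative values
psi = arma11_ma_weights(phi, theta, n_weights=6)
print([round(w, 4) for w in psi])
```

After the first lag the weights decay geometrically at rate $$\varphi$$, which is why the representation converges only when $$|\varphi |<1$$.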

The ARMA(1,1) is a special case of the ARMA($$p,q$$) process, such that:

$${ y }_{ t }={ { \varphi }_{ 1 }y }_{ t-1 }+{ { \varphi }_{ 2 }y }_{ t-2 }+\cdots +{ { \varphi }_{ p }y }_{ t-p }+{ \epsilon }_{ t }+{ { \theta }_{ 1 }\epsilon }_{ t-1 }+{ { \theta }_{ 2 }\epsilon }_{ t-2 }+\cdots +{ { \theta }_{ q }\epsilon }_{ t-q }$$

$${ \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right)$$

Or:

$$\Phi \left( L \right) { y }_{ t }=\Theta \left( L \right) { \epsilon }_{ t }$$

Where:

$$\Phi \left( L \right) =1-{ \varphi }_{ 1 }L-{ \varphi }_{ 2 }{ L }^{ 2 }-\cdots -{ \varphi }_{ p }{ L }^{ p }$$

And:

$$\Theta \left( L \right) =1+{ \theta }_{ 1 }L+{ \theta }_{ 2 }{ L }^{ 2 }+\cdots +{ \theta }_{ q }{ L }^{ q }$$

The process is covariance stationary, with a convergent infinite moving average representation, if all the inverses of the roots of $$\Phi \left( L \right)$$ fall inside the unit circle.

That is:

$${ y }_{ t }=\frac { \Theta \left( L \right) }{ \Phi \left( L \right) } { \epsilon }_{ t }$$

The process is considered invertible with a convergent infinite autoregressive representation, in the event that all the inverses of the roots of $$\Theta \left( L \right)$$ fall inside the unit circle.

That is:

$$\frac { \Phi \left( L \right) }{ \Theta \left( L \right) } { y }_{ t }={ \epsilon }_{ t }$$

## Application: Specifying and Estimating Models for Employment Forecasting

Moving average models are nonlinear in their parameters, which affects how they are estimated. Consider an invertible MA(1) model with a nonzero mean:

$${ y }_{ t }=\mu +{ \epsilon }_{ t }+\theta { \epsilon }_{ t-1 }$$

If we undertake an $$m$$ times backward substitution, then the following autoregression approximation will be obtained:

$${ y }_{ t }\approx \frac { \mu }{ 1+\theta } +\theta { y }_{ t-1 }-{ \theta }^{ 2 }{ y }_{ t-2 }+\cdots +{ \left( -1 \right) }^{ m-1 }{ \theta }^{ m }{ y }_{ t-m }+{ \epsilon }_{ t }$$

This implies that an invertible moving average can be approximated by a finite-order autoregression, with the approximation improving as $$m$$ increases. The residuals can therefore be expressed approximately in terms of the observed data, and the parameters that minimize the sum of squared residuals can be found numerically:

$$\hat { \mu } ,\hat { \theta } =\underset { \mu ,\theta }{ argmin } \sum _{ t=1 }^{ T }{ { \left( { y }_{ t }-\left( \frac { \mu }{ 1+\theta } +\theta { y }_{ t-1 }-{ \theta }^{ 2 }{ y }_{ t-2 }+\cdots +{ \left( -1 \right) }^{ m-1 }{ \theta }^{ m }{ y }_{ t-m } \right) \right) }^{ 2 } }$$

$${ \hat { \sigma } }^{ 2 }=\frac { 1 }{ T } \sum _{ t=1 }^{ T }{ { \left( { y }_{ t }-\left( \frac { \hat { \mu } }{ 1+\hat { \theta } } +\hat { \theta } { y }_{ t-1 }-{ \hat { \theta } }^{ 2 }{ y }_{ t-2 }+\cdots +{ \left( -1 \right) }^{ m-1 }{ \hat { \theta } }^{ m }{ y }_{ t-m } \right) \right) }^{ 2 } }$$

Because the parameters of the autoregressive approximation are restricted (every coefficient is a function of $$\theta$$), the parameter estimates must be found with numerical optimization methods.
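The following sketch illustrates the idea on simulated data. It assumes Gaussian shocks, true values $$\mu = 1$$ and $$\theta = 0.5$$, a truncation of $$m = 8$$ lags, and a crude grid search standing in for a proper numerical optimizer; none of these choices come from the reading, the function name `ssr_ma1` is hypothetical, and the sums run only over observations with $$m$$ lags available.

```python
import random

def ssr_ma1(y, mu, theta, m):
    """Sum of squared residuals from the m-lag autoregressive approximation."""
    # Coefficient on y_{t-j} is (-1)^(j-1) * theta^j, for j = 1..m.
    weights = [(-1) ** (j - 1) * theta ** j for j in range(1, m + 1)]
    intercept = mu / (1 + theta)
    total = 0.0
    for t in range(m, len(y)):                # skip observations lacking m lags
        fitted = intercept + sum(w * y[t - j]
                                 for j, w in enumerate(weights, start=1))
        total += (y[t] - fitted) ** 2
    return total

# Simulated data with assumed true values mu = 1.0, theta = 0.5, sigma = 1.
rng = random.Random(42)
true_mu, true_theta, n = 1.0, 0.5, 1000
eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
y = [true_mu + eps[t] + true_theta * eps[t - 1] for t in range(1, n + 1)]

# Grid: mu in 0.0..2.0 (step 0.1), theta in 0.00..0.85 (step 0.05).
m = 8
grid = [(i / 10, j / 20) for i in range(0, 21) for j in range(0, 18)]
mu_hat, theta_hat = min(grid, key=lambda p: ssr_ma1(y, p[0], p[1], m))
sigma2_hat = ssr_ma1(y, mu_hat, theta_hat, m) / (len(y) - m)
print("mu_hat:", mu_hat, "theta_hat:", theta_hat,
      "sigma2_hat:", round(sigma2_hat, 3))
```

The grid search makes the nonlinearity visible: $$\theta$$ enters every lag coefficient as a different power, so the minimizer cannot be obtained from a single linear regression.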

Autoregressions are an alternative that is easier to estimate: ordinary least-squares regression can be applied directly.

Assuming that we have the following AR(1) model:

$$\left( { y }_{ t }-\mu \right) =\varphi \left( { y }_{ t-1 }-\mu \right) +{ \epsilon }_{ t }$$

$${ \epsilon }_{ t }\sim WN\left( 0,{ \sigma }^{ 2 } \right)$$

This can also be written as:

$$\left( { y }_{ t } \right) =c+\varphi { y }_{ t-1 }+{ \epsilon }_{ t }$$

And:

$$c=\mu \left( 1-\varphi \right)$$

Then the least squares estimators will be:

$$\hat { c } ,\hat { \varphi } =\underset { c,\varphi }{ argmin } \sum _{ t=1 }^{ T }{ { \left( { y }_{ t }-c-\varphi { y }_{ t-1 } \right) }^{ 2 } }$$

And:

$${ \hat { \sigma } }^{ 2 }=\frac { 1 }{ T } \sum _{ t=1 }^{ T }{ { \left( { y }_{ t }-\hat { c } -\hat { \varphi } { y }_{ t-1 } \right) }^{ 2 } }$$

$$\mu$$ has an implied estimate of:

$$\hat { \mu } =\frac { \hat { c } }{ 1-\hat { \varphi } }$$
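The least-squares steps can be sketched on simulated data. The example assumes Gaussian shocks and true values $$\mu = 2$$ and $$\varphi = 0.6$$, which are illustrative only; `ols_ar1` is a hypothetical helper that applies the usual simple-regression formulas.

```python
import random

def ols_ar1(y):
    """Regress y_t on a constant and y_{t-1}; return (c_hat, phi_hat, sigma2_hat)."""
    x, z = y[:-1], y[1:]                      # regressor y_{t-1}, regressand y_t
    n = len(z)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxz = sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z))
    phi_hat = sxz / sxx                       # slope
    c_hat = z_bar - phi_hat * x_bar           # intercept
    sigma2_hat = sum((b - c_hat - phi_hat * a) ** 2 for a, b in zip(x, z)) / n
    return c_hat, phi_hat, sigma2_hat

# Simulated AR(1) around an assumed mean of 2.0 with phi = 0.6 and sigma = 1.
rng = random.Random(7)
true_mu, true_phi = 2.0, 0.6
y = [true_mu]
for _ in range(5000):
    y.append(true_mu + true_phi * (y[-1] - true_mu) + rng.gauss(0.0, 1.0))

c_hat, phi_hat, sigma2_hat = ols_ar1(y)
mu_hat = c_hat / (1 - phi_hat)                # implied estimate of the mean
print("c_hat:", round(c_hat, 3), "phi_hat:", round(phi_hat, 3),
      "mu_hat:", round(mu_hat, 3), "sigma2_hat:", round(sigma2_hat, 3))
```

Unlike the moving average case, no numerical search is needed here because the model is linear in $$c$$ and $$\varphi$$.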

## Question

Assume the shock in a time series is approximated by Gaussian white noise. Yesterday’s realization, y(t-1), was 0.015 and the lagged shock was -0.160. Today’s shock is 0.170.

If the weight parameter theta, θ, is equal to 0.70, determine today’s realization under a first-order moving average, MA(1), process.

1. -4.205
2. 4.545
3. 0.058
4. 0.282

The correct answer is C (the third option, 0.058).

Today’s shock = $${ \epsilon }_{ t }$$; yesterday’s shock = $${ \epsilon }_{ t-1 }$$; today’s realization = $${ y }_{ t }$$; yesterday’s realization = $${ y }_{ t-1 }$$.

The MA(1) is given by:

$${ y }_{ t }={ \epsilon }_{ t }+\theta { \epsilon }_{ t-1 }$$

$${ y }_{ t }= 0.170 + 0.7(-0.160) = 0.058$$