After completing this reading you should be able to:

- Explain how asset return distributions tend to deviate from the normal distribution.
- Explain reasons for fat tails in a return distribution and describe their implications.
- Distinguish between conditional and unconditional distributions.
- Describe the implications of regime switching on quantifying volatility.
- Evaluate the various approaches for estimating VaR.
- Compare and contrast different parametric and non-parametric approaches for estimating conditional volatility.
- Calculate conditional volatility using parametric and non-parametric approaches.
- Explain the process of return aggregation in the context of volatility forecasting methods.
- Evaluate implied volatility as a predictor of future volatility and its shortcomings.
- Explain long horizon volatility/VaR and the process of mean reversion according to an AR(1) model.
- Calculate conditional volatility with and without mean reversion.
- Describe the impact of mean reversion on long-horizon conditional volatility estimation.

## Non-Normality of Asset Return Distributions

In the last three decades, the world has witnessed a number of severe financial crises. These include:

- The stock market crash of 2007-2008
- The U.S. Savings & Loan crisis of 1989-1991
- The Russian default crisis and LTCM Hedge Fund crisis in 1998
- The bursting of the U.S. tech bubble in 2000-2001
- The East Asian financial crisis of 1997

Like many of the financial crises before it, the 2007/2008 financial crisis brought to the fore the divergence between the normal distribution and asset return distributions. For a long time, researchers and market analysts have tried to fit parametric models to asset return data. Parametric models are models founded on a set of distributional assumptions.

Due to the wide applicability of the normal distribution and the occurrence of normality in a broad range of phenomena, analysts have tried to fit asset returns to the normal distribution. And while this approach has had some success, it has proved unreliable and grossly inaccurate, particularly in light of the continued and recurrent nature of the above-mentioned financial crises.

### Exactly How Do Asset Return Distributions Differ from the Normal Distribution?

There are several reasons why the normality framework has been adopted for most risk estimation attempts. For starters, the normal distribution is relatively easy to implement: all we need are two parameters, the mean, \(\mu \), and the standard deviation, \(\sigma \), of returns. With these two, it is possible to fully characterize the distribution and even come up with value at risk measures at specified levels of confidence. In the real world, however, asset returns are not normally distributed, and this can be observed empirically. Asset return distributions exhibit several attributes of non-normality:

**Fat Tails:** A defining characteristic of the normal distribution is that most observations are concentrated around the center (mean), with very few points in the tails. The distribution is symmetrical (has equal and opposite halves), with the density of outliers diminishing smoothly, resulting in narrow, thin tails.

**Exhibit 1.1 – The Normal Distribution**

While many asset return distributions share the mean and variance of a fitted normal distribution, it's at the tails where the differences set in. Asset returns have a fat-tailed distribution, meaning that there are **more observations** (more probability weight) in the tails than the normal distribution would imply. Exhibit 1.2 illustrates this phenomenon:

**Exhibit 1.2 – An Illustration of Non-normality of Asset Returns**

The blue line shows the distribution of monthly dollar-hedged international equity returns (empirical data), while the orange line depicts the normal distribution. The blue line (representing equity returns) is not just more peaked but also has a higher density (more observations) at the extreme left than the normal distribution.

**Skewness:** As seen in Exhibit 1.2, the equity return distribution leans further to the right than the normal distribution. This phenomenon is known as negative skewness. A direct consequence of negative skewness is that the left tail of the equity return distribution is longer than the left tail of the normal distribution, indicating a greater magnitude of extreme negative events.

**Instability of Parameter Values:** Unlike the normal distribution, asset return distributions have unstable parameters (mean and standard deviation). This instability arises from continually changing market conditions and is most evident in volatility, where asset returns turn out to be far more volatile than predictions based on the normal distribution would suggest.

**Asset Returns Are Not Always Independent and Identically Distributed (i.i.d.):** A host of traditional asset allocation models are built on the premise of normality, under which asset returns are independent and identically distributed. In reality, however, this rarely holds true. Empirical evidence has revealed, for example, that one month's return may be influenced by the return from the previous month, so the returns could be described as dependent. In such a scenario, it is important to take that influence into account when coming up with projections of future returns.

**Correlation at the Extremes:** Traditional asset allocation models assume a linear relationship between asset classes. In particular, the models assume that the relationship between variables at the extremes is similar to their relationship at less extreme values. Linear correlation inherently assumes that the joint distribution of asset returns is multivariate normal. In reality, however, correlations at the extremes are quite different than under normal conditions: the assets are simply not linearly correlated. As such, using the normal distribution to model extreme events underestimates the probability of joint negative returns.

## Fat Tails and Their Implications

For purposes of this reading, it’s important to keep in mind that the term ‘fat tails’ is relative: it refers to the tails of one distribution relative to the normal distribution. If a distribution has fatter tails relative to the normal distribution, it has a similar mean and variance, but probabilities at the extremes are significantly different.

When modeling asset returns, analysts focus on extreme events – those that have a low probability of occurrence but which result in catastrophic losses when they occur. There’s no point concentrating on non-extreme events that would reasonably not be expected to cause serious losses.

If we were to assume that the normal distribution holds for asset returns, we would not expect even a single daily move of 4 standard deviations or more per year. The normal distribution predicts that about 99.7% of outcomes lie within 3 standard deviations of the mean.

In reality, every financial market experiences one or more daily price moves of 4 standard deviations or more every year. In fact, there’s at least one market that experiences a daily price move of 10 or more standard deviations per year. What then does this suggest?

The overriding argument is that the true distribution of asset returns has fatter tails.

### Implications of Fatter Tails

There's a higher probability of extreme events than the normal distribution would suggest. This means that reliance on normality results in inaccurately low estimates of VaR. As a result, firms tend to be unprepared for such tumultuous events and are heavily affected, sometimes leading to insolvency and/or large-scale ripple effects that reverberate through and destabilize the entire financial system.

## Conditional vs. Unconditional Distribution

As we have seen, the phenomenon of “fat tails” is a result of what we call non-stationary parameters. This simply means that the standard deviation and/or the mean are not constant but rather keep changing over time. When this happens, the return distribution is said to be **conditional**.

An **unconditional distribution** is one in which the mean and standard deviation of asset returns are the same on any given day, regardless of market and economic conditions. In other words, even if there exists information suggesting that the parameters have changed, we ignore that information and assume that the same distribution holds on every day.

Suppose we have a full sample of return data, collected over a period of time, which we subdivide into two subsets based on market environment, with each subset normally distributed with its own **conditional** mean and variance. Drawing "secondary samples" from the full sample at different points in time would generate fat tails in the unconditional distribution even if the conditional distributions have similar means but different volatilities. When markets are efficient (all available information is reflected in the prevailing market price), it is highly unlikely that conditional means would change by margins big enough to make a significant difference over time; it is the time variation in volatility that produces the fat tails.

The volatility of returns is widely known to increase or even decrease over time depending on a host of events in an economy. For example, prior to U.S. presidential elections, there normally exists significant market uncertainty in light of any potential fiscal/monetary policy changes. In those periods, the volatility of returns generally spikes to reflect the uncertainty. Similarly, when the public is anticipating interest rate changes from the Federal Reserve, volatility increases in the days leading to the announcement. The direction stock prices take in part depends on the specific moves most anticipated by a majority of investors. Fatter tails could be linked to such events.

## The Implications of Regime Switching on Quantifying Volatility

In an attempt to better model asset returns, analysts may subdivide a specified time period into regimes, where each regime has a clearly noticeable set of parameters that are markedly different from those of other regimes. For example, volatility could rise sharply during a market crash only to stabilize when conditions improve. This is the idea behind the **regime-switching volatility model**.

For illustration, let’s use real interest rates of a developed nation between 1990 and 2015:

From a plot of the raw interest rate series, it would be difficult to identify different states of the economy. Now, suppose we make use of an econometric model to try and identify the different economic states.

From the econometric model, we can identify three distinct states of the economy. These are precisely what we would call regimes. As we switch from one regime to another, at least one parameter has to change. So, how exactly does the regime-switching model help to better measure volatility?

Within each regime, the conditional distribution of returns is normal, with a constant mean and either low or high volatility. The regime-switching model captures this conditional normality and, in so doing, helps resolve the fat-tails problem.

## Various Approaches for Estimating VaR

Approaches for estimating value at risk can be broadly categorized as either historical-based approaches or the implied-volatility-based approach.

### Historical-based approaches

Historical-based approaches can be further subdivided into three categories: parametric, non-parametric, and hybrid.

- **The parametric approach:** The parametric approach is characterized by specific distributional assumptions on conditional asset returns. In most cases, it is assumed that returns are normally or lognormally distributed with time-varying volatility.
- **The nonparametric approach:** The nonparametric approach differs from the parametric approach in that it makes no assumptions regarding the distribution of asset returns; it uses historical data directly. A good example of a nonparametric model is historical simulation.
- **The hybrid approach:** A hybrid approach borrows from both parametric and nonparametric models, producing a blended approach that still makes use of historical data.

### Implied-volatility-based approach

Whereas historical approaches use historical data to gauge the volatility of an asset, the implied volatility approach is based on current market data, i.e., it is forward-looking, which is its greatest advantage. It is informed by the idea that historical events are not always accurate predictors of future events.

That said, it's important to note that, just like historical approaches, this method doesn't guarantee results. Implied volatility is a product of probabilities and other factors, such as current investor sentiment, so there can be no assurance that market prices will follow the predicted pattern. Another disadvantage is that the approach is model-dependent.

## Parametric Approaches for VaR

We look at two models:

- The RiskMetric\({ s }^{ ® }\) model (EWMA)
- GARCH model

Both are exponential smoothing methods. The RiskMetric\({ s }^{ ® }\) model is an exponentially weighted moving average model that uses a decay factor, \(\lambda \), of 0.94 for daily data and 0.97 for monthly data. Both methods place more weight on recent data, with the weights declining as you go back in time (as returns become older).

Using the RiskMetric\({ s }^{ ® }\) model, the weight assigned to the return observed \(t\) periods ago is given by the formula:

$$ Weight=\left( 1-\lambda \right) { \lambda }^{ t } $$

The forecast is a weighted average of the previous forecast, with weight \(\lambda \), and the latest squared innovation, with weight \(\left( 1-\lambda \right)\).

$$ { \sigma }_{ t }^{ 2 }=\lambda { \sigma }_{ t-1 }^{ 2 }+\left( 1-\lambda \right) { r }_{ t-1,t }^{ 2 } $$

Using RiskMetric\({ s }^{ ® }\) model, the conditional variance is estimated as:

$$ { \sigma }_{ t }^{ 2 }=\left( 1-\lambda \right) \left( { \lambda }^{ 0 }{ r }_{ t-1,t }^{ 2 }+{ \lambda }^{ 1 }{ r }_{ t-2,t-1 }^{ 2 }+{ \lambda }^{ 2 }{ r }_{ t-3,t-2 }^{ 2 }+\cdots +{ \lambda }^{ N }{ r }_{ t-N-1,t-N }^{ 2 } \right) $$
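As a minimal sketch, the truncated EWMA sum above can be computed directly. The function name, the sample returns, and the normalization step (which corrects for the fact that a finite sample's weights do not sum exactly to one) are assumptions for illustration, not part of the official RiskMetric\({ s }^{ ® }\) specification.

```python
def ewma_variance(returns, lam=0.94):
    """EWMA variance: sigma_t^2 = (1 - lam) * sum_i lam^i * r_{t-1-i}^2."""
    var = 0.0
    weight_sum = 0.0
    # Walk backwards: the most recent return gets weight (1 - lam) * lam^0.
    for i, r in enumerate(reversed(returns)):
        w = (1 - lam) * lam ** i
        var += w * r ** 2
        weight_sum += w
    # Normalize so the truncated weights sum to one (finite-sample correction).
    return var / weight_sum

daily_returns = [0.01, -0.02, 0.015, -0.005, 0.02]  # illustrative data
print(ewma_variance(daily_returns))
```

With the normalization in place, a constant return series recovers exactly that return squared as its variance, which is a convenient sanity check on the weights.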

The GARCH model assumes that the conditional variance depends on the previous conditional variance and the latest innovation. The conditional variance, \({ \sigma }_{ t }^{ 2 }\), is derived using information up to time \({ t-1 }\), with \({ r }_{ t-1,t }\) as the previous day’s return. Using the simplest model, the GARCH(1,1):

$$ { \sigma }_{ t }^{ 2 }={ a }+{ b }{ r }_{ t-1,t }^{ 2 }+c { \sigma }_{ t-1 }^{ 2 } $$

Note that RiskMetrics is a restricted case of GARCH. Consider the following two constraints on the parameters of the GARCH model:

$$ a = 0; b + c = 1 $$

We can then rewrite the GARCH model as:

$$ { \sigma }_{ t }^{ 2 }={ 0 }+{ (1-c) }{ r }_{ t-1,t }^{ 2 }+c { \sigma }_{ t-1 }^{ 2 } $$

Which is identical to the RiskMetrics model except that we substitute the letter \(c\) by the symbol \(\lambda\) :

$$ { \sigma }_{ t }^{ 2 }=\lambda { \sigma }_{ t-1 }^{ 2 }+\left( 1-\lambda \right) { r }_{ t-1,t }^{ 2 } $$
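The GARCH(1,1) recursion can be sketched in a few lines. The parameter values \(a\), \(b\), and \(c\) below are illustrative assumptions (chosen so that \(b + c < 1\), a common stationarity condition), not fitted estimates.

```python
def garch_update(prev_var, prev_return, a=0.000002, b=0.1, c=0.85):
    """One-step GARCH(1,1) update: sigma_t^2 = a + b*r_{t-1}^2 + c*sigma_{t-1}^2."""
    return a + b * prev_return ** 2 + c * prev_var

# Propagate the conditional variance through a short, assumed return history.
var = 0.0001  # starting conditional variance (1% daily volatility, assumed)
for r in [0.012, -0.008, 0.02]:
    var = garch_update(var, r)
print(var ** 0.5)  # conditional volatility forecast
```

Setting `a=0` and `c = 1 - b` in this function reproduces the RiskMetric\({ s }^{ ® }\) (EWMA) update, mirroring the restriction shown above.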

A third parametric model, the **historical standard deviation method**, differs from the RiskMetric\({ s }^{ ® }\) model and the GARCH model in that it places **equal weight** on all historical observations.

## Nonparametric vs. Parametric VaR methods

The advantages of nonparametric models over parametric models are:

- They do not require any assumptions regarding the distribution of returns to estimate VaR
- Multivariate density estimation allows weights to vary so as to reflect the relevance of the data in light of current market conditions, regardless of its timing
- Problems posed by fat tails, skewness, and other deviations from the assumed distribution are avoided
- Multivariate density estimation exhibits a lot of flexibility in introducing dependence on economic variables

The disadvantages of nonparametric approaches compared to parametric models are:

- Subdividing the full sample data into different market regimes reduces the amount of data available for historical simulations
- Their use of data is less efficient compared to parametric models
- Multivariate density estimation requires an amount of data that grows with the number of conditioning variables incorporated in the model
- Multivariate density estimation may lead to data snooping or overfitting when settling on the weighting scheme and the number of observations required to produce reliable volatility estimates

## Nonparametric Approaches for VaR

**Historical Simulation Method:** Under this method, all returns are weighted equally based on \(k\), the number of observations used \(\left( Weight={ 1 }/{ k } \right) \). For example, if we make return observations over a 100-day period, each return would have a weight equal to \(0.01\left( ={ 1 }/{ 100 } \right) \).

**Hybrid Approach:** The hybrid approach uses historical simulation to come up with estimates of return, but the weights attached to each return decline exponentially. The steps required for its implementation are:

*Step 1:* Assign exponentially declining weights to the most recent \(k\) realized returns, as represented in the sequence below:

$$ \left[ \frac { 1-\lambda }{ 1-{ \lambda }^{ k } } \right] { \lambda }^{ 0 },\left[ \frac { 1-\lambda }{ 1-{ \lambda }^{ k } } \right] { \lambda }^{ 1 },\left[ \frac { 1-\lambda }{ 1-{ \lambda }^{ k } } \right] { \lambda }^{ 2 },\cdots ,\left[ \frac { 1-\lambda }{ 1-{ \lambda }^{ k } } \right] { \lambda }^{ k-1 } $$

In other words, the weight of a return registered \(n\) periods ago is \(\frac { 1-\lambda }{ 1-{ \lambda }^{ k } } \times { \lambda }^{ n-1 }\).

*Step 2:* Order the returns from lowest to highest.

*Step 3:* Determine the VaR for the portfolio by starting with the lowest return and accumulating the weights until \(x\%\), the VaR significance level, is reached.
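The three steps can be sketched in a few lines. The function name, the decay factor \(\lambda = 0.98\), the 5% level, and the sample returns are illustrative assumptions.

```python
def hybrid_var(returns, lam=0.98, x=0.05):
    """VaR via exponentially weighted historical simulation (hybrid approach)."""
    k = len(returns)
    # Step 1: weight the return observed i periods ago by (1-lam)/(1-lam^k) * lam^i.
    weights = [(1 - lam) / (1 - lam ** k) * lam ** i for i in range(k)]
    paired = list(zip(reversed(returns), weights))  # most recent return first
    # Step 2: order the returns from worst to best.
    paired.sort(key=lambda p: p[0])
    # Step 3: accumulate weights from the lowest return until x is reached.
    cum = 0.0
    for r, w in paired:
        cum += w
        if cum >= x:
            return -r  # VaR reported as a positive loss
    return -paired[-1][0]

returns = [0.01, -0.03, 0.005, -0.01, 0.02, -0.002]  # oldest first, assumed data
print(hybrid_var(returns))
```

With so few observations the worst return alone exceeds the 5% weight, so the sketch simply returns that loss; in practice the method is applied to much longer return histories.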

## The Process of Return Aggregation in the Context of Volatility Forecasting Methods

How do we compute the VaR for a portfolio that consists of several positions? There are three main approaches.

### The Variance-Covariance Method

The variance-covariance approach presents an extension to the parametric approach. When a portfolio is comprised of more than one position, we can estimate a single VaR measurement by assuming that asset returns are all **normally distributed**.

To calculate variance-covariance VaR, we need to identify some key data:

- the weighting of the assets within the portfolio;
- the standard deviation (or volatility) of each asset’s return; and
- the correlation between the assets within the portfolio

To compute portfolio variance, multiply the transpose of the vector of asset weights by the covariance matrix of asset returns, and then by the weight vector \(\left( { \sigma }_{ p }^{ 2 }={ w }^{ T }\Sigma w \right)\). Portfolio volatility is the square root of this quantity.

The biggest setback of this approach is that correlations tend to increase during stressful market events, as happened during the 2007/08 financial crisis. When that happens, portfolio VaR may underestimate the true VaR.

### Extension of the Historical Simulation Approach

The second approach is to extend the historical simulation approach to the portfolio: we aggregate each period's historical returns, weighting them according to the relative size of each position, with the weights determined by the market value of the positions today. In other words, we disregard the original asset weightings in favor of today's.

Since this approach does not require us to estimate any parameters, it is free from estimation errors, e.g., errors in correlation estimates even in the face of increased market turbulence.

### A Combination of the Historical Simulation Approach and the Variance-Covariance Approach

Under this method, we aggregate the simulated returns and then apply a parametric distributional assumption to the aggregated portfolio. Crucially, this sidesteps the estimation of \(N\) volatilities and \(N(N-1)/2\) correlations, and the estimation errors they bring, because only the distribution of the aggregated portfolio return needs to be estimated. The method is anchored on the strong law of large numbers: even if the individual positions are not normally distributed, the aggregated portfolio return will be approximately normally distributed when the portfolio contains many positions.

## How Implied Volatility Can Be Used to Predict Future Volatility and Its Shortcomings

To price assets correctly or establish their risk profile, we need to estimate the asset’s expected (future) volatility. To do so, we could work with one of two indicators – historical volatility or implied volatility.

**Historical volatility** refers to the realized (observed) volatility over a given "look-back" period: the volatility deduced from past movements in the price of an asset. Historical volatility is not a perfect predictor of future volatility, since events experienced in the past may not recur in the future.

**Implied volatility** is often interpreted as the market's expectation of an asset's future volatility: the volatility reflected in the current market prices of derivatives on the asset. In the world of stocks, for example, the implied volatility of a stock is the expected future volatility implied by the prices of the stock's options, and it can be deduced from an option pricing model (such as the BSM model). As such, implied volatility is not directly observable: it must be backed out of current market data on derivatives.
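As a hedged sketch of "backing out" implied volatility, the snippet below prices a European call with the Black-Scholes formula and then recovers the volatility by bisection. All inputs (spot, strike, rate, maturity) are illustrative assumptions; bisection works here because the call price increases monotonically in volatility.

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call option."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal CDF
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def implied_vol(price, S, K, T, r, lo=1e-4, hi=5.0, tol=1e-8):
    """Bisection: find sigma such that bs_call(...) matches the market price."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round-trip check: price an option at sigma = 0.20, then recover that sigma.
p = bs_call(100, 100, 1.0, 0.05, 0.20)
print(implied_vol(p, 100, 100, 1.0, 0.05))
```

The round trip recovers the input volatility, which illustrates the model dependence noted below: the "implied" number is only meaningful relative to the pricing model used to extract it.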

There are two main advantages of implied volatility over historical volatility:

- It is a forward-looking, predictive measure that reflects the market’s consensus.
- It is not restrained by historical distribution patterns.

On the other hand, implied volatility has its own shortcomings:

- It is model-dependent.
- Options on the same underlying asset may trade at different implied volatilities. For example, deep out-of-the-money and deep in-the-money options trade at higher implied volatilities than at-the-money options.
- It has limited availability because it can only be deduced where current market prices of derivatives exist.
- It assumes volatility will remain constant over the life of the option, whereas volatility actually changes over time.

## Long Horizon Volatility and VaR

In many current applications, risk management calls for volatility and VaR forecasts for horizons longer than a day or a month. To do so, managers usually apply the square root rule.

The square root rule states that **variance scales directly with time, so volatility scales with the square root of time**. It follows that the J-period return volatility is equal to the square root of J times the single-period return volatility, i.e.,

$$ \text{J-period Volatility} = \text{One-period Volatility} \times \sqrt{J} $$

$$ \sigma(r_{t,t+J}) = \sigma(r_{t,t+1})\sqrt{J} $$

We can extend this rule for VaR so that:

$$ \text{J-period VaR} = \text{One-period VaR} \times \sqrt{J} $$

Note however that we can only apply this rule under restrictive i.i.d. assumptions. Specifically, we assume that:

- The distribution is the same at each period (i.e., there is no predictable time variation in expected return nor in risk).
- Returns are uncorrelated across periods (no serial correlation).
- The distribution is the same for one- or T-period, or is stable under addition, such as the normal distribution.

#### Example of VaR over Multiple Periods using the Square Root Rule

Suppose that the 1-period VaR is $100. Then:

- The 2-period VaR is \($100\times \sqrt{2} = $141.42\)
- The 5-period VaR is \($100\times \sqrt{5} = $223.61\)
- The 10-period VaR is \($100\times \sqrt{10} = $316.23\)
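The scaling above is a one-liner; the $100 one-period VaR comes from the example, and the horizons are illustrative.

```python
def scale_var(one_period_var, j):
    """Scale a one-period VaR to a J-period horizon (i.i.d. assumptions)."""
    return one_period_var * j ** 0.5

# Reproduce the multi-period VaR figures from the example.
for j in (2, 5, 10):
    print(j, round(scale_var(100, j), 2))
```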

### What if Returns Are Not Independent (i.e., They Are Correlated)?

As noted earlier, volatility is not constant, and this holds for most assets in financial markets. Volatility is stochastic and, in fact, autoregressive. An autoregressive process is a stationary process with a long-run mean (LRM) to which the series tends to revert. For instance, when interest rates rise above their LRM, they are expected to fall, and vice versa. This tendency of a process to revert to a long-term average is referred to as **mean reversion**.

What this means, therefore, is that the square root of time rule is an **inaccurate and biased** means of scaling volatility over long horizons. The direction of the bias (upward or downward) depends on today's volatility relative to the LRM of volatility.

- When today’s volatility is above its long-run mean then we can expect a decline in volatility over the longer horizon. In these circumstances, extrapolating long horizon volatility using today’s volatility will overstate the true expected long horizon volatility.
- When today’s volatility is below its long-run mean then we can expect an increase in volatility over the longer horizon. In these circumstances, extrapolating long horizon volatility using today’s volatility will understate the true expected long horizon volatility.

### Mean Reversion According to an AR(1) Model

The simplest form of mean reversion can be illustrated by a first-order autoregressive process [AR(1)], where the current value is based on the immediately preceding value.

$$ X_{ t+1 } = a + bX_t + e_{ t+1 } $$

The expected value of \(X_{t+1}\), conditional on period \(t\) information, is:

$$ E[X_{ t+1 }] = a + bX_t $$

We can restate this as:

$$ E[X_{ t+1 }] = (1-b) \times \frac{ a }{ 1-b } + bX_t $$

The long-run mean of this model is \({ a }/{ \left( 1-b \right) }\). The parameter of utmost interest in this long-run mean equation is \(b\), often termed the "speed of reversion" parameter. There are two scenarios:

- If \(b = 1\), the process is a random walk: a nonstationary process with an undefined long-run mean. This implies that the next period's expected value is equal to today's value.
- If \(b < 1\), the process is mean reverting: the time series tends toward its long-run mean. This implies that when \(X_t\) is above the LRM, it is expected to decline, and when it is below the long-run mean, it is expected to increase in value.
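Mean reversion toward \(a/(1-b)\) can be sketched numerically by iterating the expected-value recursion with the shocks set to zero. The parameter values below are assumptions chosen so the long-run mean equals 5.

```python
def ar1_expected_path(x0, a=1.0, b=0.8, steps=20):
    """Expected path of an AR(1) process X_{t+1} = a + b*X_t (shocks set to zero)."""
    path = [x0]
    for _ in range(steps):
        path.append(a + b * path[-1])
    return path

lrm = 1.0 / (1 - 0.8)           # long-run mean a/(1-b) = 5
path = ar1_expected_path(10.0)  # start above the LRM
print(path[-1])                 # has decayed most of the way toward 5
```

Each step shrinks the gap to the long-run mean by the factor \(b\), so a smaller \(b\) means faster reversion; with \(b = 1\) the gap never closes, which is the random-walk case.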

### Conditional Volatility With and Without Mean Reversion

Let’s consider our time series model above:

$$ X_{ t+1 } = a + bX_t + e_{ t+1 } $$

**Without mean reversion (b = 1):**

- The single period volatility is \(\sigma\)
- The two-period volatility follows the square root rule and is \(\sigma \times \sqrt{2}\)

**With mean reversion (b < 1):**

- The single period volatility is \(\sigma\)
- The two period volatility is \(\sqrt{1+b^2} \times \sigma\) (which is less than \(\sigma \times \sqrt{2}\))
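The \(\sqrt{1+b^2}\) factor follows from iterating the AR(1) recursion two steps ahead:

$$ X_{t+2} = a + bX_{t+1} + e_{t+2} = a\left(1+b\right) + b^2 X_t + b e_{t+1} + e_{t+2} $$

Conditional on time-\(t\) information, the only random terms are the two shocks, so

$$ \text{Var}_t\left(X_{t+2}\right) = b^2\sigma^2 + \sigma^2 = \left(1+b^2\right)\sigma^2 $$

giving a two-period volatility of \(\sqrt{1+b^2}\,\sigma\). With \(b = 1\), this collapses to the square-root-rule value \(\sigma\sqrt{2}\), and for any \(b < 1\) it is strictly smaller.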

### Impact of Mean Reversion on Long-Horizon Conditional Volatility Estimation

If strong mean reversion exists, then the long horizon risk is smaller than it would be without mean reversion (under square root volatility).

For instance, in a convergence trade where traders make the assumption that the spread between a long position and a short position is mean reverting (i.e., b < 1), then the long horizon risk is smaller than the square root volatility as \(\sqrt{1+b^2} \times \sigma\) is less than \(\sigma \times \sqrt{2}\). Mean reversion, therefore, has the potential to create sharp differences of opinions on the risk assessment of a trade.

## Questions

### Question 1

The current estimate of daily volatility is 2.3 percent. The closing price of an asset yesterday was CAD 46. Today, the asset ended the day at CAD 47.20. Using log returns and the exponentially weighted moving average model with \(\lambda \) = 0.94, determine the updated estimate of volatility.

- A. 2.319%
- B. 0.0537%
- C. 2.317%
- D. 2.315%

The correct answer is **C**.

The updated variance is given by:

$$ { h }_{ t }=\lambda { \left( current\quad volatility \right) }^{ 2 }+\left( 1-\lambda \right) { \left( current\quad return \right) }^{ 2 } $$

Current volatility = 0.023

$$ \text{Current log-return} = \ln 47.2 - \ln 46 = 3.85439 - 3.82864 = 0.02575 $$

$$ { h }_{ t }=0.94{ \left( { 0.023 }^{ 2 } \right) }+\left( 0.06\times { 0.02575 }^{ 2 } \right) =0.000537 $$

$$ Updated\quad estimate\quad of\quad volatility=\sqrt { 0.000537 } =0.02317 $$
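The calculation can be checked in a few lines, using the values given in the question:

```python
import math

# EWMA (RiskMetrics) update with the question's inputs.
lam = 0.94
current_vol = 0.023                    # current daily volatility estimate
log_return = math.log(47.20 / 46.0)    # today's log return
h = lam * current_vol ** 2 + (1 - lam) * log_return ** 2
print(h ** 0.5)  # updated volatility estimate
```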

### Question 2

Until \({ 31 }^{ st }\) August 2017, the South African rand had for years registered very small historical volatility versus the U.S. dollar. On \({ 1 }^{ st }\) September 2017, the South African government opted to abandon the previously enforced currency peg. Using data from the close of business on \({ 1 }^{ st }\) September 2017, which of the following methods of calculating volatility would have resulted in the smallest jump in measured historical volatility?

- A. Exponentially weighted with a daily decay factor of 0.94
- B. 60-day equal weight
- C. 90-day equal weight
- D. 250-day equal weight

The correct answer is **D**.

Once the currency peg has been lifted, the volatility as of the close of business on \({ 1 }^{ st }\quad Sep\) must have been significantly higher than the same parameter for the previous day and all prior observations. However, the exact effect on historical volatility would greatly depend on the weighting methodology applied on daily volatility records, beginning with the most recent observation (on \({ 1 }^{ st }\quad Sep\)).

With the EWMA method, the most recent observation would have a weight of \(1 – 0.94 = 0.06\)

With the 60-day MA, the most recent observation would have a weight of \({ 1 }/{ 60 }=0.0167\)

Similarly, the 90-day MA would result in a weight of \({ 1 }/{ 90 }=0.0111 \) and \({ 1 }/{ 250 }=0.004 \) for the 250-day MA.