In this chapter, we estimate risk measures without making strong assumptions about the relevant distribution, focusing on how to assemble the \(P/L\) data used to estimate risk measures under the historical simulation (HS) approach. Along the way we introduce parametric formulas, such as GARCH volatility-forecasting equations, that convert pure non-parametric methods into semi-parametric ones; these refinements retain the broad HS framework while accounting for ways in which risk over the forecast horizon may differ from risk over the sample period. Non-parametric methods are a natural choice for high-dimensional problems.

**Compiling Historical Simulation Data**

Suppose we have a portfolio of \(n\) assets, and for each asset \(i\) we observe the return in each of \(T\) subperiods of our historical sample period. If \({ R }_{ i,t }\) is the return on asset \(i\) in subperiod \(t\), and \(w_{i}\) is the amount currently invested in asset \(i\), then the historically simulated portfolio \(P/L\) over subperiod \(t\) is:

$$ { P/L }_{ t }=\sum _{ i=1 }^{ n }{ { w }_{ i }{ R }_{ i,t } } $$

This equation is the basis of \(HS\) \(VaR\) and \(ES\). Note that it applies the current portfolio weights to past returns, so the simulated \(P/L_{t}\) will not necessarily equal the \(P/L\) actually earned on the portfolio in subperiod \(t\).
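As a minimal sketch of this equation, the following applies current position sizes to historical asset returns; the weights and returns below are made-up illustrative numbers, not data from the text.

```python
# Historically simulated P/L: apply today's portfolio weights to past returns.
def hs_pl_series(weights, returns):
    """returns[t][i] is the return on asset i in subperiod t;
    weights[i] is the amount currently invested in asset i."""
    return [sum(w * r for w, r in zip(weights, period)) for period in returns]

weights = [100.0, 50.0]                       # illustrative current positions
returns = [[0.01, -0.02], [0.03, 0.01], [-0.02, 0.00]]
pl = hs_pl_series(weights, returns)           # one simulated P/L per subperiod
```

Each element of `pl` is the P/L the current portfolio would have earned over the corresponding historical subperiod.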

**Estimation of Historical Simulation \(VaR\) and \(ES\).**

**Basic historical simulation**

Having obtained historical \(P/L\) data, \(VaR\) can be estimated by plotting \(P/L\) or \(L/P\) on a simple histogram and reading off the loss exceeded with probability equal to one minus the chosen confidence level.
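A small sketch of basic HS, reading the VaR off the sorted loss series as an order statistic (the `ceil` indexing below is one common convention, not the only one) and the ES as the average of the losses beyond it:

```python
import math

def hs_var_es(pl, confidence=0.95):
    """Basic HS: sort the loss/profit series and read the VaR off as an
    order statistic; the ES averages the losses exceeding the VaR."""
    losses = sorted(-x for x in pl)                 # L/P series, ascending
    n = len(losses)
    k = math.ceil(confidence * n - 1e-9)            # 1e-9 guards against float error
    var = losses[k - 1]
    tail = [x for x in losses if x > var]           # losses exceeding the VaR
    es = sum(tail) / len(tail) if tail else var
    return var, es

pl = [1.2, -0.5, 0.3, -2.0, 0.8, -1.1, 0.1, -0.7, 0.4, -1.5]
var90, es90 = hs_var_es(pl, 0.90)
```

With only ten observations the achievable confidence levels are very coarse, which is exactly the limitation the density-estimation refinement below addresses.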

**Bootstrapped Historical Simulation**

This procedure involves resampling from our existing data set with replacement. To apply the bootstrap, we create a large number of new samples by randomly drawing observations from the original sample, returning each observation to the pool after it has been drawn. Each resample yields a new \(VaR\) estimate, and the best estimate is taken as the mean of these. Similarly, resample-based \(ES\) estimates can be produced, each of which is the average of the losses in that resample exceeding its \(VaR\).
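The resampling loop described above can be sketched as follows; the resample count and seed are illustrative assumptions.

```python
import random

def bootstrap_hs(pl, confidence=0.95, n_resamples=1000, seed=42):
    """Resample the P/L series with replacement; the best VaR estimate is the
    mean of the resample VaRs, and each resample ES averages the losses in
    that resample exceeding its own VaR."""
    rng = random.Random(seed)
    n = len(pl)
    k = max(int(confidence * n), 1)
    var_estimates, es_estimates = [], []
    for _ in range(n_resamples):
        losses = sorted(-x for x in rng.choices(pl, k=n))
        var = losses[k - 1]
        tail = [x for x in losses if x > var] or [var]
        var_estimates.append(var)
        es_estimates.append(sum(tail) / len(tail))
    return sum(var_estimates) / n_resamples, sum(es_estimates) / n_resamples
```

The ES estimate is always at least as large as the VaR estimate, since it averages only losses beyond the VaR.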

**Historical Simulation Using Non-Parametric Density Estimation**

Basic HS has the practical drawback of only allowing us to estimate \(VaR\)s at discrete confidence levels determined by the size of the data set, so it does not make the best use of the information we have. In non-parametric density estimation, the data are treated as draws from some unspecified empirical distribution function, and can be represented by kernels as well as by histograms. \(VaR\)s and \(ES\)s can then be estimated at any confidence level, avoiding the constraints imposed by the size of the data set. One simple approach is to draw straight lines connecting the mid-points at the tops of the histogram bars and treat the area under these lines as a pdf, so that \(VaR\)s can be estimated at any confidence level regardless of the data set size. This method is more transparent and easier to check than kernel methods, which do not necessarily produce better estimates in practice.
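A close cousin of the mid-point-connecting idea is to interpolate linearly between the order statistics themselves, which likewise yields a VaR at any confidence level; this variant is sketched below as an assumption, not as the text's exact construction.

```python
def interpolated_var(pl, confidence):
    """Treat the empirical distribution as piecewise linear between order
    statistics, so a VaR can be read off at any confidence level rather
    than only at the discrete levels the sample size allows."""
    losses = sorted(-x for x in pl)
    n = len(losses)
    pos = confidence * (n - 1)        # fractional position among order statistics
    lo = int(pos)
    frac = pos - lo
    if lo + 1 >= n:
        return losses[-1]
    return losses[lo] * (1 - frac) + losses[lo + 1] * frac
```

For example, at the 50% level on four losses 1, 2, 3, 4 this returns 2.5, halfway between the second and third order statistics.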

**Estimating Curves and Surfaces for \(VaR\) and \(ES\)**

The methods discussed so far estimate the \(VaR\) and \(ES\) at a single holding period equal to the frequency at which our data are observed. \(VaR\)s or \(ES\)s for other holding periods can be estimated by constructing a \(HS\) \(P/L\) series whose frequency matches the desired holding period. The only concern is that, as the holding period rises, the number of observations rapidly falls, leading to insufficient data.

**Estimating Confidence intervals for Historical Simulation \(VaR\) and \(ES\)**

Since the aforementioned methods give no indication of the precision of \(VaR\) and \(ES\) estimates, several methods exist to get around this limitation and produce confidence intervals for risk estimates.

**An Order Statistics (OS) Approach to the Estimation of Confidence Intervals for \(HS\) \(VaR\) and \(ES\)**

Applying the theory of order statistics gives a \(VaR\) or \(ES\) estimate together with a complete distribution function from which we can read off a confidence interval. For example, the \(OS\) approach might give estimates of the 5% and 95% points of the 95% \(VaR\) distribution function (90% confidence-interval bounds for the \(VaR\)) of 1.552 and 1.797, telling us we can be 90% sure that the \(VaR\) lies in the range \(\left[ 1.552,1.797 \right]\). The corresponding points of the \(ES\) distribution function can be obtained by mapping from the \(VaR\) to the \(ES\).

**A Bootstrap Approach to the Estimation of Confidence Intervals For \(HS\) \(VaR\) and \(ES\)**

In this approach, a bootstrapped histogram of resample-based \(VaR\) (or \(ES\)) estimates is produced, from which the confidence interval is read. For \(ES\) estimation, we estimate the \(VaR\) for each new data set and then estimate the \(ES\) as the average of losses in excess of that \(VaR\). Repeating the process many times provides a large number of \(ES\) estimates, which we can plot the same way as the \(VaR\) estimates.
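Reading the interval off the distribution of resample VaRs can be sketched as below; here the interval endpoints are simply the empirical percentiles of the bootstrapped VaR estimates (resample count and seed are illustrative assumptions).

```python
import random

def bootstrap_var_interval(pl, confidence=0.95, alpha=0.05,
                           n_resamples=2000, seed=0):
    """Build the distribution of resample-based VaRs, then read off the
    (alpha, 1 - alpha) percentiles as the confidence interval."""
    rng = random.Random(seed)
    n = len(pl)
    k = max(int(confidence * n), 1)
    vars_ = sorted(
        sorted(-x for x in rng.choices(pl, k=n))[k - 1]
        for _ in range(n_resamples)
    )
    lo = vars_[int(alpha * n_resamples)]
    hi = vars_[min(int((1 - alpha) * n_resamples), n_resamples - 1)]
    return lo, hi
```

The same loop with an ES computed per resample gives a bootstrapped interval for the ES.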

**Weighted Historical Simulation**

The \(HS\) \(P/L\) series is constructed in a way that gives every observation the same weight in \(P/L\) provided it is less than \(n\) periods old, and zero weight once it is older than that.

Recall that \({ R }_{ i,t }\) is the return on asset \(i\) in period \(t\), and that we use the past \(n\) observations in implementing \(HS\). An observation \({ R }_{ i,t-j }\) therefore belongs to the current data set if its age \(j\) lies in \(\{ 1,\dots ,n \}\). For the \(HS\) \(P/L\) series, the observation \({ R }_{ i,t-j }\) affects the risk estimates made over the next \(n\) periods; once it is more than \(n\) periods old, it falls out of the data set and thereafter has no effect on \(P/L\).

**Age-Weighted Historical Simulation**

In this approach, we weight the probability of each observation for asset \(i\) so as to discount older observations in favor of newer ones.

If \(w\left( 1 \right) \) is the probability weight given to an observation 1 day old, then \(w\left( 2 \right) \) could be \(\lambda w\left( 1 \right) \), \(w\left( 3 \right) \) could be \({ \lambda }^{ 2 }w\left( 1 \right)\), and in general \(w\left( i \right) \) could be \({ \lambda }^{ i-1 }w\left( 1 \right)\), where \(0<\lambda <1\) reflects the exponential rate of decay in the weight as the observation ages.

\(w\left( 1 \right) ={ \left( 1-\lambda \right) }/{ \left( 1-{ \lambda }^{ n } \right) }\), such that the sum of the weights is 1. The weight given to an observation \(i\) days old is then:

\(w\left( i \right) =\frac { { \lambda }^{ i-1 }\left( 1-\lambda \right) }{ 1-{ \lambda }^{ n } }\). As \(\lambda \rightarrow 1\), this converges to the weight of \({ 1 }/{ n }\) given to every in-sample observation under basic \(HS\).

To implement age weighting, we replace the old equal weights \(\frac { 1 }{ n } \) with the age-dependent weights \(w\left( i \right) \) given above. The age-weighted approach is desirable because, first, it provides a nice generalization of traditional \(HS\), which becomes the special case with zero decay, i.e. \(\lambda \rightarrow 1\). Second, a suitable choice of \(\lambda \) makes the \(VaR\) (or \(ES\)) estimates more responsive to large loss observations. Third, it helps reduce distortions caused by events that are unlikely to recur, and reduces ghost effects. Finally, age weighting can be modified in ways that make risk estimates more efficient and effectively eliminate any ghost effects that remain.
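Age weighting amounts to a weighted quantile: accumulate the weights of the largest losses until the tail probability \(1-c\) is covered. A sketch, with the convention (an assumption) that the first element of the P/L series is the most recent observation:

```python
def age_weighted_var(pl, lam=0.98, confidence=0.95):
    """pl[0] is the most recent observation (age 1). An observation i days
    old gets weight w(i) = lam**(i-1) * (1 - lam) / (1 - lam**n); the VaR
    is the loss at which the cumulative tail weight reaches 1 - confidence."""
    n = len(pl)
    weights = [lam ** i * (1 - lam) / (1 - lam ** n) for i in range(n)]
    pairs = sorted(zip((-x for x in pl), weights), reverse=True)
    cum = 0.0
    for loss, w in pairs:
        cum += w
        if cum >= 1 - confidence:
            return loss
    return pairs[-1][0]
```

Because a recent large loss carries more weight than an old one, the estimate responds faster to new information than basic HS.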

**Volatility-weighted Historical Simulation**

Suppose we are interested in forecasting \(VaR\) for day \(T\). Let \({ r }_{ t,i }\) be the historical return on asset \(i\) on day \(t\), \({ \sigma }_{ t,i }\) be the GARCH forecast of the volatility of the return on asset \(i\) for day \(t\), made at the end of day \(t-1\), and \({ \sigma }_{ T,i }\) be the most recent forecast of the volatility of asset \(i\).

We replace the returns in our data set, \({ r }_{ t,i }\), with volatility-adjusted returns given by:

$$ { r }_{ t,i }^{ \ast }=\left( \frac { { \sigma }_{ T,i } }{ { \sigma }_{ t,i } } \right) { r }_{ t,i } $$

Hence, actual returns in any period \(t\) are scaled up or down depending on whether the current volatility forecast is greater or less than the volatility estimated for period \(t\).
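The rescaling can be sketched as below. As an assumption, a simple EWMA variance recursion stands in for the GARCH volatility forecasts the text describes; the decay factor 0.94 is likewise illustrative.

```python
import math

def ewma_vols(returns, lam=0.94):
    """EWMA variance as a simple stand-in for a GARCH volatility model;
    vols[t] is the forecast for day t made from information up to t-1.
    Seeding the variance with the first squared return is an assumption."""
    var = max(returns[0] ** 2, 1e-12)
    vols = []
    for r in returns:
        vols.append(math.sqrt(var))          # forecast for this day
        var = lam * var + (1 - lam) * r ** 2
    return vols

def volatility_weighted(returns, lam=0.94):
    """r*_t = (sigma_T / sigma_t) * r_t, with sigma_T the latest forecast."""
    vols = ewma_vols(returns, lam)
    sigma_T = vols[-1]
    return [(sigma_T / s) * r for r, s in zip(returns, vols)]
```

The most recent return is left unchanged (its adjustment factor is 1), while returns from calmer or wilder periods are scaled toward current conditions.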

**Correlation-weighted Historical Simulation**

Suppose we have already made a volatility-based adjustment to our \(HS\) returns along Hull–White lines, and we still wish to adjust those returns to reflect changes in correlations. If there are \(m\) positions, the \(1\times m\) vector of historical returns \(R\) for some period \(t\) reflects an \(m\times m\) variance-covariance matrix \(\Sigma \). \(\Sigma \) in turn can be decomposed into \(\sigma C{ \sigma }^{ T }\), where \(\sigma \) is an \(m\times m\) diagonal matrix of volatilities, \({ \sigma }^{ T }\) is its transpose, and \(C\) is the \(m\times m\) matrix of historical correlations.

\(R\) reflects the historical correlation matrix \(C\), and we wish to adjust it to a series \(\hat { R } \) that reflects a current correlation matrix \(\hat { C } \). Let both correlation matrices be positive definite, so that each has an \(m\times m\) matrix square root, \(A\) and \(\hat { A } \) respectively. Therefore:

$$ R=A\varepsilon \quad \quad \left( Equation\quad 1 \right) $$

$$ \hat { R } =\hat { A } \varepsilon \quad \quad \left( Equation\quad 2 \right) $$

Inverting equation 1 gives:

$$ \varepsilon ={ A }^{ -1 }R\quad \quad \left( \varepsilon \quad is\quad the\quad uncorrelated\quad noise\quad process \right) $$

Substituting into equation 2 gives:

$$ \hat { R } =\hat { A } { A }^{ -1 }R $$

This is the correlation-adjusted series we intend to use.
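For the two-position case, the matrix square roots can be taken as Cholesky factors (the same choice the practice question below makes), and \(\hat{A}A^{-1}R\) computed directly. The correlations used here are illustrative.

```python
import math

def chol2(rho):
    """2x2 Cholesky factor of the correlation matrix [[1, rho], [rho, 1]]."""
    return [[1.0, 0.0], [rho, math.sqrt(1 - rho ** 2)]]

def corr_adjust(r, rho_hist, rho_new):
    """R_hat = A_hat A^{-1} R for a 2-element return vector r."""
    a, a_hat = chol2(rho_hist), chol2(rho_new)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    a_inv = [[a[1][1] / det, -a[0][1] / det],
             [-a[1][0] / det, a[0][0] / det]]
    m = [[sum(a_hat[i][k] * a_inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return [m[0][0] * r[0] + m[0][1] * r[1],
            m[1][0] * r[0] + m[1][1] * r[1]]
```

When the historical and target correlations coincide, \(\hat{A}A^{-1}\) is the identity and the returns pass through unchanged, which is a useful sanity check.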

**Filtered Historical Simulation (FHS)**

FHS combines the benefits of HS with the power and flexibility of conditional volatility models. To estimate the \(VaR\) of a single-asset portfolio over a 1-day holding period, the first step is to fit a GARCH model to our portfolio-return data, postulating that portfolio returns follow the model:

$$ { r }_{ t }=\mu +{ \varepsilon }_{ t } $$

$$ { \sigma }_{ t }^{ 2 }=\omega +\alpha { \left( { \varepsilon }_{ t-1 }+\gamma \right) }^{ 2 }+\beta { \sigma }_{ t-1 }^{ 2 } $$

The second step is to use the model to forecast volatility for each day in the sample period and divide the realized returns by these forecasts, giving a set of standardized returns that should be independently and identically distributed. In the third step, we bootstrap from this data set of standardized returns. Finally, each simulated return gives a possible end-of-tomorrow portfolio value and a possible loss, and the \(VaR\) is the loss corresponding to the chosen confidence level.
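The four steps above can be sketched as follows. This is a simplified assumption-laden version: the GARCH(1,1) parameters are taken as already fitted inputs, the mean \(\mu\) and asymmetry term \(\gamma\) of the model above are dropped, and the variance is seeded with the first squared return.

```python
import random

def fhs_var(returns, omega, alpha, beta, confidence=0.95,
            n_sims=5000, seed=1):
    """Filtered HS sketch, single asset, 1-day horizon:
    (1) run the fitted GARCH(1,1) filter over the sample,
    (2) standardize each return by its volatility forecast,
    (3) bootstrap the standardized returns,
    (4) rescale by tomorrow's forecast and read off the VaR."""
    rng = random.Random(seed)
    var = returns[0] ** 2                 # seed variance: an assumption
    std = []
    for r in returns:
        std.append(r / var ** 0.5)        # standardized, approx. i.i.d. return
        var = omega + alpha * r ** 2 + beta * var
    sigma_next = var ** 0.5               # volatility forecast for tomorrow
    losses = sorted(-z * sigma_next for z in rng.choices(std, k=n_sims))
    return losses[int(confidence * n_sims) - 1]
```

In practice the parameters would come from a maximum-likelihood fit (e.g. via an econometrics library) rather than being supplied by hand.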

**Advantages and Disadvantages of Non-Parametric Methods**

**Advantages**

- They are intuitive and conceptually simple.
- They can accommodate fat tails, skewness and other non-normal features that cause problems for parametric approaches.
- They can accommodate any type of position including derivative positions.
- HS works quite well empirically.
- In varying degrees, they are quite easy to implement on a spreadsheet.
- They are free of operational problems.
- They use readily available data.
- Results are easily reported and communicated to senior management.
- Confidence intervals for nonparametric \(VaR\) and \(ES\) are easily produced.
- When combined with add-ons they are capable of refinement and potential improvement.

**Disadvantages**

- If the sample period was unusually quiet, \(VaR\) and \(ES\) estimates will be too low for the risks actually faced.
- If the sample period was unusually volatile, \(VaR\) and \(ES\) estimates will be too high.
- They have difficulty handling shifts that take place during the sample period.
- A single extreme loss in the data set can dominate non-parametric risk estimates.
- They are subject to ghost or shadow effects.
- They are constrained by the largest loss in the historical data set.

**Estimating Risk Measures with Order Statistics**

The theory of order statistics provides an accurate and practical way to estimate the distribution function of a risk measure, and hence to estimate confidence intervals for it.

**Using Order Statistics to Estimate Confidence Intervals for \(VaR\)**

In a sample of \(n\) observations, the \(r\)th order statistic is the \(r\)th lowest (or highest). Suppose observations \({ x }_{ 1 },{ x }_{ 2 },\dots ,{ x }_{ n }\) are drawn from a known distribution \(F\left( x \right) \), with \(r\)th order statistic \({ x }_{ r }\), and let \({ x }_{ 1 }\le { x }_{ 2 }\le \dots \le { x }_{ n }\). The probability that exactly \(j\) of the \(n\) observations do not exceed a fixed value \(x\) obeys the binomial distribution:

$$ Pr\left\{ j\quad observations\le x \right\} =\left( \begin{matrix} n \\ j \end{matrix} \right) { \left\{ F\left( x \right) \right\} }^{ j }{ \left\{ 1-F\left( x \right) \right\} }^{ n-j } $$

Therefore, the probability of at least \(r\) not exceeding \(x\) in the sample is the binomial:

$$ { G }_{ r }\left( x \right) =\sum _{ j=r }^{ n }{ \left( \begin{matrix} n \\ j \end{matrix} \right) } { \left\{ F\left( x \right) \right\} }^{ j }{ \left\{ 1-F\left( x \right) \right\} }^{ n-j } $$

This is the distribution function of the \(r\)th order statistic, from which the quantile (or \(VaR\)) and its associated confidence intervals follow. It can also be used to estimate percentiles of non-normal parametric \(VaR\) by replacing the distribution function \(F\left( x \right) \) with the \(t\)-distribution function, the Gumbel distribution function, and so on. Finally, it can be used to estimate confidence intervals for the empirical distribution function.
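Evaluating \(G_r\) at the target quantile (so that \(F(x)=p\)) lets us pick order statistics whose coverage brackets the quantile; the sketch below uses this to bound, say, a 95% VaR between two order statistics. The bracketing convention is one common choice, stated as an assumption.

```python
from math import comb

def quantile_cdf(n, r, p):
    """G_r evaluated at the p-quantile: the probability that at least r of
    n observations fall at or below it (so F(x) = p in the formula above)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(r, n + 1))

def os_confidence_interval(n, p, alpha=0.05):
    """Indices (lo, hi) such that [x_(lo), x_(hi)] covers the p-quantile with
    probability of roughly 1 - 2*alpha. Assumes n is large enough that both
    index sets are non-empty."""
    lo = max(r for r in range(1, n + 1) if quantile_cdf(n, r, p) >= 1 - alpha)
    hi = min(r for r in range(1, n + 1) if quantile_cdf(n, r, p) <= alpha)
    return lo, hi
```

For example, with \(n=100\) and \(p=0.95\), the interval endpoints are order statistics in the low 90s and high 90s, mirroring how the \([1.552, 1.797]\) band above would be read off an actual sample.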

**The Bootstrap**

The bootstrap is used to assess the accuracy of parameter estimates and the uncertainty in estimation procedures. It replaces mathematical or statistical analysis with simulation-based resampling from a given data set. The bootstrap can also be used to provide alternative point estimates of parameters.

**Limitations of Conventional Sampling approaches.**

Let \(n\) be the size of a sample drawn from a population. We are interested in estimating a particular parameter \(\theta \) using a suitable sample estimator \(\Theta \) (if \(\theta \) is the mean, then \(\Theta \) is the sample mean; if \(\theta \) is the variance, then \(\Theta \) is the sample variance, and so on). Estimating confidence intervals for \(\theta \) by traditional approaches requires statistical theory that delivers closed-form intervals only in special cases; such intervals are of limited applicability and often do not apply in practical situations. These limitations help us appreciate the bootstrap.

**The Bootstrap and its Implementation**

Given an original sample of size \(n\), we draw a new random sample of the same size from it, returning each chosen observation to the pool after it has been drawn. In constructing the resample, some observations are chosen more than once while others are not chosen at all, so the resample generally differs from the original even though every observation comes from it. We then use the resample to compute a resample estimate of the parameter. Repeating the procedure, we obtain a set of \(B\) resample parameter estimates, the bootstrapped sample of parameter estimates, which can be used to estimate a confidence interval for our parameter. This confidence interval is given by:

$$ Confidence\quad interval=\left[ { \Theta }_{ \alpha }^{ B },{ \Theta }_{ 1-\alpha }^{ B } \right] $$

where \({ \Theta }_{ \alpha }^{ B }\) is the \(\alpha \) quantile of the bootstrapped resample estimates \({ \Theta }^{ B }\left( i \right) \).

If the parameter estimators are biased, we use the bias-corrected and accelerated \(\left( B{ C }_{ a } \right)\) approach, replacing the \(\alpha \) and \(1-\alpha \) subscripts in the above equation with \({ \alpha }_{ 1 }\) and \({ \alpha }_{ 2 }\):

$$ { \alpha }_{ 1 }=\Phi \left( { \dot { Z } }^{ o }+\frac { { \dot { Z } }^{ o }+{ Z }_{ \alpha } }{ 1-\hat { a } \left( { \dot { Z } }^{ o }+{ Z }_{ \alpha } \right) } \right) ,\quad { \alpha }_{ 2 }=\Phi \left( { \dot { Z } }^{ o }+\frac { { \dot { Z } }^{ o }+{ Z }_{ 1-\alpha } }{ 1-\hat { a } \left( { \dot { Z } }^{ o }+{ Z }_{ 1-\alpha } \right) } \right) $$

where \(\Phi \) is the standard normal distribution function.

If the parameters \(\hat { a } \) and \({ \dot { Z } }^{ o }\) are zero, then \(B{ C }_{ a }\) coincides with the earlier percentile intervals. The parameter \(\hat { a } \) is the rate of change of the standard error of \(\Theta \) with respect to the parameter \(\theta \). It is a correction for skewness and can be estimated from:

$$ \hat { a } =\frac { \sum _{ i=1 }^{ M }{ { \left( \Theta -{ \Theta }^{ B }\left( i \right) \right) }^{ 3 } } }{ 6{ \left[ \sum _{ i=1 }^{ M }{ { \left( \Theta -{ \Theta }^{ B }{ \left( i \right) } \right) }^{ 2 } } \right] }^{ { 3 }/{ 2 } } } $$

We can also use the bootstrapped sample of parameter estimates to form an alternative point estimator of the parameter that may be superior to \(\Theta \). With \(B\) resample estimates, the bootstrapped point estimator \({ \Theta }^{ B }\) is:

$$ { \Theta }^{ B }=\frac { 1 }{ B } \sum _{ i=1 }^{ B }{ { \Theta }^{ B }\left( i \right) } $$

The bias is the difference between the expectation of the estimator and the quantity being estimated:

$$ bias=E\left[ \Theta \right] -\theta \Rightarrow estimated\quad bias={ \Theta }^{ B }-\Theta $$

The bias can have a relatively large standard error.

**Standard Errors of Bootstrap Estimators**

The estimated standard error for \(\Theta \), \({ \dot { S } }_{ B }\), can be obtained from:

$$ { \dot { S } }_{ B }={ \left( \frac { 1 }{ B } \sum _{ i=1 }^{ B }{ { \left( { \Theta }^{ B }\left( i \right) -{ \Theta }^{ B } \right) }^{ 2 } } \right) }^{ { 1 }/{ 2 } } $$

Where

$$ { \Theta }^{ B }=\left( { 1 }/{ B } \right) \sum _{ i=1 }^{ B }{ { \Theta }^{ B }\left( i \right) } $$

Moreover:

$$ Var\left( { \dot { S } }_{ B } \right) =Var\left[ E\left( { \dot { S } }_{ B } \right) \right] +E\left[ Var\left( { \dot { S } }_{ B } \right) \right] $$

This can be rearranged to:

$$ Var\left( { \dot { S } }_{ B } \right) =Var\left[ { \dot { m } }_{ 2 }^{ { 1 }/{ 2 } } \right] +E\left[ \frac { { \dot { m } }_{ 2 } }{ 4B } \left( \frac { { \dot { m } }_{ 4 } }{ { \dot { m } }_{ 2 }^{ 2 } } -1 \right) \right] $$

where \({ \dot { m } }_{ i }\) is the \(i\)th moment of the bootstrap distribution of the \({ \Theta }^{ B }\left( i \right) \) values.

In this case, the above equation reduces to:

$$ Var\left( { \dot { S } }_{ B } \right) =\frac { { { \dot { m } }_{ 4 } }/{ { \dot { m } }_{ 2 } }-{ \dot { m } }_{ 2 } }{ 4{ n }^{ 2 } } +\frac { { \sigma }^{ 2 } }{ 2nB } +\frac { { \sigma }^{ 2 }\left( \frac { { \dot { m } }_{ 4 } }{ { \dot { m } }_{ 2 }^{ 2 } } -3 \right) }{ 4{ n }^{ 2 }B } $$

For the normal distribution:

$$ Var\left( { \dot { S } }_{ B } \right) =\frac { { \sigma }^{ 2 } }{ 2{ n }^{ 2 } } \left( 1+\frac { n }{ B } \right) $$

To choose \(B\), let \({ \dot { S } }_{ \infty }\) be the value of \({ \dot { S } }_{ B }\) associated with an infinite number of resamples, let \(\tau \) be a target probability close to 1, and choose a bound on the percentage deviation of \({ \dot { S } }_{ B }\) from \({ \dot { S } }_{ \infty }\). We want to choose \(B=B\left( bound,\tau \right) \) such that the probability of \({ \dot { S } }_{ B }\) being within the desired bound is \(\tau \):

$$ Pr\left[ 100\left| \frac { { \dot { S } }_{ B }-{ \dot { S } }_{ \infty } }{ { \dot { S } }_{ \infty } } \right| \le bound \right] =\tau $$

For large \(B\), the required number of resamples is approximately:

$$ B=\frac { 2500\left( k-1 \right) { x }_{ \tau }^{ 2 } }{ { bound }^{ 2 } } $$
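The formula is simple enough to wrap in a helper, shown with illustrative inputs (here \(k\) is the kurtosis and \({ x }_{ \tau }\) the relevant quantile):

```python
def required_resamples(kurtosis, x_tau, bound):
    """B = 2500 (k - 1) x_tau^2 / bound^2, rounded to the nearest integer."""
    return round(2500 * (kurtosis - 1) * x_tau ** 2 / bound ** 2)

# e.g. kurtosis 4, x_tau = 1.96, a 5% bound:
b = required_resamples(4, 1.96, 5.0)
```

Note how sharply \(B\) grows as the bound tightens, since it enters the denominator squared.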

**Time Dependency and the Bootstrap**

There are several ways to modify the bootstrap to allow for dependency:

- We can model the dependence parametrically and then bootstrap from the residuals which should be independent.
- Alternatively, we can use a block approach: divide the sample data into non-overlapping blocks of equal length and randomly resample whole blocks with replacement.
- Finally, we can modify the probabilities with which individual observations are chosen. This can be done by making the probabilities of selection dependent on the time indices of recently selected observations.
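The block approach in the second bullet can be sketched as follows; the block length is an assumption that would in practice be tuned to the persistence of the dependence.

```python
import random

def block_bootstrap(data, block_len, seed=0):
    """Non-overlapping block bootstrap: split the series into equal-length
    blocks, then resample whole blocks with replacement so that dependence
    within each block is preserved."""
    rng = random.Random(seed)
    blocks = [data[i:i + block_len]
              for i in range(0, len(data) - block_len + 1, block_len)]
    resample = []
    while len(resample) < len(data):
        resample.extend(rng.choice(blocks))
    return resample[:len(data)]
```

Each contiguous chunk of the resample is an intact block from the original series, which is what distinguishes this from the plain observation-by-observation bootstrap.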

**Practice Questions**

1) Mrs. Barnwell, a risk manager, has a portfolio with only 2 positions, with a historical correlation between them of 0.5. She wishes to adjust her historical returns \(R\) to reflect a current correlation of 0.8. Which of the following best reflects the 0.8 current correlation?

- \(\left( \begin{matrix} 1 & 0.3464 \\ 0 & 0.6928 \end{matrix} \right) R\)
- \(\begin{pmatrix} 0 & 0.3464 \\ 1 & 0.6928 \end{pmatrix}R\)
- \(\begin{pmatrix} 1 & 0 \\ 0.4536 & 0.6928 \end{pmatrix}R\)
- \(0.96R\)

Correct answer is **c**

Recall that if \({ a }_{ i,j }\) is the \((i,j)\)th element of the 2 x 2 matrix \(A\), then by applying the Cholesky decomposition, \({ a }_{ 11 }=1\), \({ a }_{ 12 }=0\), \({ a }_{ 21 }=\rho \), \({ a }_{ 22 }=\sqrt { 1-{ \rho }^{ 2 } } \). From our data, \(\rho = 0.5\). The matrix \(\overline { A } \) has the same form but with \(\rho = 0.8\).

Therefore:

$$ { A }^{ -1 }=\frac { 1 }{ { a }_{ 11 }{ a }_{ 22 }-{ a }_{ 12 }{ a }_{ 21 } } \begin{pmatrix} { a }_{ 22 } & { -a }_{ 12 } \\ { -a }_{ 21 } & { a }_{ 11 } \end{pmatrix} $$

Substituting

$$ \hat { R } =\overline { A } { A }^{ -1 }R $$

We get

$$ \begin{pmatrix} 1 & 0 \\ 0.8 & \sqrt { 1-{ 0.8 }^{ 2 } } \end{pmatrix}\frac { 1 }{ \sqrt { 1-{ 0.5 }^{ 2 } } } \begin{pmatrix} \sqrt { 1-{ 0.5 }^{ 2 } } & 0 \\ -0.5 & 1 \end{pmatrix}R $$

$$ =\begin{pmatrix} 1 & 0 \\ 0.4536 & 0.6928 \end{pmatrix}R $$

2) The mean return from a data set has been pre-calculated as 0.04, and the standard deviation as 0.32. With 90% confidence, what will be our maximum percentage loss? Assume that from our data set \(Z = -1.28\) and \(N\left( Z \right) = 0.10\), since you are to locate the value at the 10th percentile.

- 36.96%
- 11.27%
- 11.32%
- 36.72%

The correct answer is **a**

Recall that

$$ Z=\frac { X-\mu }{ \sigma } $$

From the data we are given that : \(\mu =0.04\), \(\sigma =0.32 \) and \(Z=-1.28\)

Therefore:

$$ -1.28=\frac { X-0.04 }{ 0.32 } \Rightarrow X=-1.28\left( 0.32 \right) +0.04=-0.3696 $$

$$ X=-0.3696 = 36.96\% loss $$

This means that we are 90% confident that the maximum loss will not exceed 36.96%

3) A data set is given such that the kurtosis of its distribution is 8, \({ x }_{ \tau }\) is 1.57, and the chosen bound on the percentage deviation is 30.24. What is the required number of resamples?

- 54
- 30
- 34
- 47

The correct answer is **d**.

Recall that from Standard Errors of Bootstrap Estimators:

$$ Pr\left[ 100\left| \frac { { \dot { S } }_{ B }-{ \dot { S } }_{ \infty } }{ { \dot { S } }_{ \infty } } \right| \le bound \right] =\tau $$

we have:

$$ B=\frac { 2500\left( k-1 \right) { x }_{ \tau }^{ 2 } }{ { bound }^{ 2 } } \quad \quad \quad \left( a \right) $$

We are given that \(k = 8\), \({ x }_{ \tau }=1.57\), and \(bound = 30.24\). Applying these values in equation \(\left( a \right) \) gives:

$$ B=\frac { 2500\left( 8-1 \right) \times { 1.57 }^{ 2 } }{ { 30.24 }^{ 2 } } =47.17 $$

$$ \approx 47 $$