After completing this reading you should be able to:
 Apply the bootstrap historical simulation approach to estimate coherent risk measures.
 Describe historical simulation using nonparametric density estimation.
 Compare and contrast the age-weighted, the volatility-weighted, the correlation-weighted, and the filtered historical simulation approaches.
 Identify advantages and disadvantages of nonparametric estimation methods.
The Bootstrap Historical Simulation Approach to Estimating Coherent Risk Measures
Bootstrapping is a simple but powerful improvement over basic historical simulation for estimating VaR and ES. Crucially, it assumes that the distribution of returns will be the same in the future as it was in the past, which justifies using historical returns to forecast the VaR.
A bootstrap procedure involves resampling from our existing data set with replacement. A sample is drawn from the data set, its VaR recorded, and the data “returned.” This procedure is repeated over and over. The final VaR estimate from the full data set is taken to be the \(\textbf{average}\) of all sample VaRs. In fact, bootstrapped VaR estimates are often more accurate than “raw” sample estimates.
There are three key points to note regarding a basic bootstrap exercise:

We start with a given original sample of size n. We then draw a new random sample of the same size from this original sample, “returning” each chosen observation back in the sampling pool after it has been drawn.

Sampling with replacement implies that some observations get chosen more than once, and others don’t get chosen at all. In other words, a new sample, known as a resample, may contain multiple instances of a given observation or leave out the observation completely, making the resample different from both the original sample and other resamples. From each resample, therefore, we get a different estimate of our parameter of interest.

The resampling process is repeated many times over, resulting in a set of resample parameter estimates. In the end, the average of all the resample parameter estimates gives us the final bootstrap estimate of the parameter. The bootstrapped parameter estimates can also be used to estimate a confidence interval for our parameter of interest.
Equally important, the key tenets of the bootstrap can be extended to the estimation of the expected shortfall. Each drawn sample will have its own ES. To obtain it, the tail region is sliced up into n slices, the VaR at each of the resulting n – 1 quantiles is determined, and the sample's ES is taken to be the average of these tail VaRs. This approximates the average of the losses in excess of the sample's VaR.
As in the case of the VaR, the best estimate of the expected shortfall given the original data set is the average of all of the sample expected shortfalls.
In general, this bootstrapping technique consistently provides more precise estimates of coherent risk measures than historical simulation on raw data alone.
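The resampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names, the simulated loss data, the number of resamples, and the convention that the ES averages the losses at or beyond the VaR observation are all assumptions made for the example.

```python
import random
import statistics

def hs_var_es(losses, confidence=0.95):
    """Plain historical-simulation VaR and ES; losses are positive numbers."""
    ordered = sorted(losses)
    k = min(int(confidence * len(ordered)), len(ordered) - 1)
    var = ordered[k]                       # loss at the chosen quantile
    es = statistics.fmean(ordered[k:])     # mean of losses at or beyond the VaR
    return var, es

def bootstrap_var_es(losses, confidence=0.95, n_resamples=1000, seed=42):
    """Average the VaR and ES over many resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(losses)
    var_draws, es_draws = [], []
    for _ in range(n_resamples):
        resample = rng.choices(losses, k=n)    # same size, with replacement
        var, es = hs_var_es(resample, confidence)
        var_draws.append(var)
        es_draws.append(es)
    # Final estimates are the averages across all resamples.
    return statistics.fmean(var_draws), statistics.fmean(es_draws)

# Hypothetical loss data, for illustration only.
demo_rng = random.Random(7)
demo_losses = [demo_rng.gauss(0.0, 1.0) for _ in range(500)]
boot_var, boot_es = bootstrap_var_es(demo_losses)
print(f"bootstrap VaR: {boot_var:.3f}, bootstrap ES: {boot_es:.3f}")
```

Fixing the seed makes the run reproducible; in practice the number of resamples would be chosen to balance precision against computing time.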
Bootstrapped Confidence Intervals
For a start, we know that thanks to the central limit theorem, the distribution of \(\widehat { \theta }\) often approaches normality as the number of samples gets large. In these circumstances, it would be reasonable to estimate a confidence interval for θ assuming \(\widehat { \theta }\) is approximately normal.
Given that \(\widehat { \theta }\) is our estimate of θ and \(\widehat { \sigma }\) is the estimate of the standard error of \(\widehat { \theta }\), the confidence interval at 95% is: $$ [\widehat { \theta } - 1.96\widehat { \sigma },\ \widehat { \theta } + 1.96\widehat { \sigma }] $$
It is also possible to work out confidence intervals using percentiles of the sample distribution. The upper and lower bounds of the confidence interval are given by the percentile points (or quantiles) of the sample distribution of parameter estimates. For a 95% interval, for example, these are the 2.5th and 97.5th percentiles.
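Both interval types can be computed from the same set of resample estimates. The sketch below assumes a hypothetical 95% VaR estimator (`var_95`) and simulated loss data; the helper names and the simple index-based percentile rule are choices made for illustration.

```python
import random
import statistics

def bootstrap_estimates(sample, estimator, n_resamples=2000, seed=0):
    """Return one parameter estimate per resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [estimator(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def normal_interval(estimates, z=1.96):
    """theta_hat +/- z * sigma_hat, assuming approximate normality."""
    theta_hat = statistics.fmean(estimates)
    sigma_hat = statistics.stdev(estimates)     # bootstrap standard error
    return theta_hat - z * sigma_hat, theta_hat + z * sigma_hat

def percentile_interval(estimates, level=0.95):
    """Read the bounds directly off the sorted resample estimates."""
    ordered = sorted(estimates)
    alpha = (1.0 - level) / 2.0
    lower = ordered[int(alpha * len(ordered))]
    upper = ordered[min(int((1.0 - alpha) * len(ordered)), len(ordered) - 1)]
    return lower, upper

def var_95(losses):
    """Hypothetical estimator: the 95% historical-simulation VaR."""
    ordered = sorted(losses)
    return ordered[min(int(0.95 * len(ordered)), len(ordered) - 1)]

# Hypothetical loss data, for illustration only.
data_rng = random.Random(3)
sample_losses = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
estimates = bootstrap_estimates(sample_losses, var_95)
print(normal_interval(estimates))
print(percentile_interval(estimates))
```

The percentile interval makes no normality assumption, so it can be asymmetric around the point estimate when the resample distribution is skewed.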
Historical Simulation using Nonparametric Density Estimation
Much of the appeal of the traditional historical approach lies in its simplicity. However, there is one major drawback: because the data are discrete, it is impossible to estimate VaRs between data points. For example, if there are 100 historical observations, it would be easy to estimate the VaR at 95% or even 99%. But what about the VaR at, say, 96.5%? It would be impossible to incorporate a confidence level of 96.5%. The point here is that with n observations, the historical simulation method only allows for n different confidence levels. Luckily, nonparametric density estimation offers a potential solution to this problem.
So what happens? We treat our data as drawings that are free from the “shackles” of some specified distribution. The idea is to let the data “speak for themselves” without making any strong assumptions about their distribution. To enable us to estimate VaRs and ESs at any confidence level, we simply draw straight lines connecting the midpoints at the top of each histogram bar (in the original data set’s distribution) and treat the area under the lines as if it were a pdf. By so doing, don’t we lose part of the data? No: by connecting the midpoints, the lower bar “receives” some area from the upper bar, which “loses” or cedes an equal amount of area. In the end, no area is lost; it is only displaced. We still end up with a probability density function. The shaded area in the figure below represents a possible confidence interval that can be utilized regardless of the size of the data set.
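The midpoint-connecting idea can be sketched as follows. The code builds an equal-width histogram, joins the bar midpoints with straight lines, and inverts the area under those lines to obtain a quantile at any confidence level. The number of bins, the half-bin extension beyond the smallest and largest observations (so that the total area stays equal to one), and the function name are assumptions made for this illustration.

```python
import math

def polygon_quantile(data, p, n_bins=10):
    """Quantile p of the density formed by joining histogram-bar midpoints."""
    lo, hi = min(data), max(data)
    h = (hi - lo) / n_bins                       # common bin width
    counts = [0] * n_bins
    for x in data:
        counts[min(int((x - lo) / h), n_bins - 1)] += 1
    n = len(data)
    # Midpoints of the bars, plus a zero-height point half a bin beyond each end.
    xs = [lo - h / 2] + [lo + (j + 0.5) * h for j in range(n_bins)] + [hi + h / 2]
    ds = [0.0] + [c / (n * h) for c in counts] + [0.0]
    remaining = p
    for x_a, d_a, d_b in zip(xs, ds, ds[1:]):
        seg_area = 0.5 * (d_a + d_b) * h         # trapezoid under one line segment
        if remaining > seg_area:
            remaining -= seg_area
            continue
        # Solve d_a*u + slope*u^2/2 = remaining for the offset u within the segment.
        slope = (d_b - d_a) / h
        disc = max(d_a * d_a + 2.0 * slope * remaining, 0.0)
        denom = d_a + math.sqrt(disc)
        return x_a if denom == 0.0 else x_a + 2.0 * remaining / denom
    return xs[-1]

# Hypothetical losses 0, 1, ..., 99: a 96.5% VaR is now available.
losses = list(range(100))
print(polygon_quantile(losses, 0.965))
```

Because the area under the connected lines is continuous, any confidence level between 0 and 1 maps to a loss value, including levels such as 96.5% that fall between observations.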
Weighted Historical Simulation Approaches
Recall that under the historical method of estimating VaR, all of the past n observations are weighted equally, where each observation has a weight of 1/n. In other words, our HS P/L series is constructed in a way that gives any observation n periods old or less the \(\textbf {same}\) weight in our VaR, and \(\textbf {no}\) weight (i.e., a zero weight) to all observations older than that. While simple in construction, this weighting scheme has several flaws.
First, it seems hard to justify giving each observation the same weight without taking into account its age, market volatility at the time it was observed, or the value it takes. For instance, it’s an open secret that gas prices are more volatile in the winter than in the summer, so if the sample period cuts across the two seasons of the year, the resulting VaR estimate will not reflect the true risk facing the firm. As a matter of fact, equal weights will tend to underestimate true risks in the winter and overestimate them in the summer.
Second, equal weights make the resulting risk estimates unresponsive to major events. For instance, we all know that risk increases significantly following a major destabilizing event such as a stock market crash or the start of a trade war involving one or more economies (the US and China would be perfect examples). Unless a very high level of confidence is used, HS VaR estimates would not capture the increased risk following such events. The increase in risk would only be reflected on subsequent dates if the market slide continued.
Third, equal weights suggest that each observation in the sample period is equally likely and independent of all the others. That is untrue because, in practice, periods of high or low volatility tend to be clustered together.
Fourth, an unusually high observation will tend to have a major influence on the VaR until n days have passed and the observation has fallen out of the sample period, at which point the VaR will fall again.
Finally, it would be difficult to justify a sudden shift of weight from 1/n on date n to zero on date n+1. In other words, it would be hard to explain why the observation on date n is important, but that on date n+1 is not.
This learning outcome looks at four improvements to the traditional historical simulation method.
Age-weighted Historical Simulation (Hybrid HS)
Instead of equal weights, we could come up with a weighting structure that discounts the older observations in favor of newer ones.
Let the ratio of consecutive weights be constant at lambda (\(\lambda\)). If w(1) is the probability weight given to an observation that’s 1 day old, then w(2), the probability weight given to an observation 2 days old, could be \(\lambda\)w(1); w(3), the probability weight given to an observation 3 days old, could be \(\lambda^2\)w(1); w(4) could be \(\lambda^3\)w(1), w(5) could be \(\lambda^4\)w(1), and so on. In such a case, lambda would be a term between 0 and 1 and would reflect the exponential rate of decay in the weight as time passes. A \(\lambda\) close to 1 signifies a slow rate of decay, and a \(\lambda\) far away from 1 signifies a high rate of decay.
Under age-weighted historical simulation, therefore, the weight given to an observation i days old is given by:
$$ w(i)=\cfrac {\lambda^{i-1} (1-\lambda)}{(1-\lambda^n ) } $$
w(1) is set such that the sum of the weights is 1.
Example
With \(\lambda = 0.96\) and \(n = 100\), the weight given to an observation 6 days old is:
$$ w(i)=\cfrac {\lambda^{i-1} (1-\lambda)}{(1-\lambda^n ) } \quad \quad \text{e.g. }w(6)=\cfrac {0.96^{6-1} (1-0.96)}{(1-0.96^{100} ) }=3.32\% $$
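The weighting formula is easy to check numerically. The sketch below reproduces the weight for a 6-day-old observation with \(\lambda = 0.96\) and \(n = 100\), and then shows one way the weights could feed into a VaR estimate: losses are ranked from largest to smallest and their weights accumulated until the tail probability is reached. The function names and that ranking rule are assumptions made for this illustration.

```python
def age_weights(n, lam):
    """w(i) = lam**(i-1) * (1-lam) / (1-lam**n), for ages i = 1..n."""
    return [lam ** (i - 1) * (1 - lam) / (1 - lam ** n) for i in range(1, n + 1)]

def age_weighted_var(losses_newest_first, lam, confidence=0.95):
    """VaR from age-weighted losses; losses_newest_first[0] is 1 day old."""
    weights = age_weights(len(losses_newest_first), lam)
    # Rank losses from largest to smallest, carrying each weight along.
    ranked = sorted(zip(losses_newest_first, weights), reverse=True)
    tail_prob = 1 - confidence
    cumulative = 0.0
    for loss, weight in ranked:
        cumulative += weight
        if cumulative >= tail_prob:
            return loss
    return ranked[-1][0]

w = age_weights(100, 0.96)
print(round(w[5], 4))          # w(6) -> 0.0332
print(round(sum(w), 6))        # weights sum to 1.0
```

Index 5 of the list corresponds to an observation 6 days old because the list starts at age 1.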
Advantages of the age-weighted HS method include:
 It generalizes standard historical simulation (HS) because we can regard traditional HS as a special case with zero decay, where \(\lambda\) is essentially equal to 1.
 Choosing lambda appropriately will make VaR/ES estimates more responsive to large loss observations. A suitable choice of lambda will award a large loss event a higher weight than under traditional HS, making the resulting next-day VaR higher than it would otherwise have been.
 It helps to reduce distortions caused by events that are unlikely to recur, and helps to reduce ghost effects. An unusually large loss will have its weight gradually reduced over time until it is “kicked out” of the historical sample.
 Age-weighting can be modified in a way that renders VaR and ES estimates more efficient.
Volatility-weighted Historical Simulation
Instead of weighting individual observations by proximity to the current date, we can also weight data by relative volatility. This idea was originally put forth by Hull and White to incorporate changing volatility in risk estimation. The underlying argument is that if volatility has been on the rise in the recent past, then using historical data will \(\textbf{underestimate}\) the current risk level. Similarly, if current volatility has significantly reduced, then using historical data will \(\textbf{overstate}\) the current risk level.
If \(r_{t,i}\) is the historical return on asset i on day t in our historical sample, \(\sigma_{t,i}\) is the historical GARCH (or EWMA) forecast of the volatility of the return on asset i for day t, and \(\sigma_{T,i}\) is the most recent forecast of the volatility of asset i, then the volatility-adjusted return is:
$$ \text r_{\text t,\text i}^*=\cfrac {\sigma_{\text T,\text i}}{\sigma_{\text t,\text i}} \text r_{\text t,\text i} $$
Actual returns in any period t are therefore scaled up (or down), depending on whether the current forecast of volatility is greater than (or less than) the estimated volatility for period t.
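A minimal sketch of the rescaling for a single asset is shown below. It uses an EWMA volatility forecast rather than GARCH, with an assumed decay factor of 0.94 and the recursion seeded with the full-sample variance; both choices, along with the function names and the return series, are assumptions made for illustration.

```python
import math

def ewma_vol_forecasts(returns, lam=0.94):
    """Return (vols, current_vol).

    vols[t] is the forecast for day t made before day t's return is seen;
    current_vol is the forecast made after the last return (sigma_T).
    """
    mean = sum(returns) / len(returns)
    # Assumption: seed the recursion with the full-sample variance.
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    vols = []
    for r in returns:
        vols.append(math.sqrt(var))
        var = lam * var + (1 - lam) * r * r      # EWMA variance update
    return vols, math.sqrt(var)

def volatility_adjusted_returns(returns, lam=0.94):
    """Apply r* = (sigma_T / sigma_t) * r to every historical return."""
    vols, current_vol = ewma_vol_forecasts(returns, lam)
    return [current_vol / v * r for v, r in zip(vols, returns)]

# Hypothetical series: a calm stretch followed by a volatile one.
history = [0.01, -0.01] * 20 + [0.05, -0.05] * 5
adjusted = volatility_adjusted_returns(history)
print(history[0], round(adjusted[0], 4))
```

Because volatility is higher at the end of this series than at the start, the early returns are scaled up in magnitude, which is the behaviour the formula is designed to produce.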
Advantages of the volatility-weighted approach relative to equal-weighted or age-weighted approaches include:
 The approach explicitly incorporates volatility into the estimation procedure. The equal-weighted HS completely ignores volatility changes. Although the age-weighted approach recognizes volatility, its treatment is rather arbitrary and restrictive.
 The method produces near-term VaR estimates that are likely to be more sensitive to current market conditions.
 Volatility-adjusted returns allow for VaR and ES estimates that can exceed the maximum loss in our historical data set. Under traditional HS, VaR or ES cannot be bigger than the losses in our historical data set.
 Empirical evidence indicates that this approach produces VaR estimates that are superior to the VaR estimates under the age-weighted approach.
Correlation-weighted Historical Simulation
Historical returns can also be adjusted to reflect changes between historical and current correlations. In other words, this method incorporates updated correlations between asset pairs. In essence, the historical correlation (or equivalently variance-covariance) matrix is adjusted to the new information environment by “multiplying” the historic returns by the revised correlation matrix to yield updated correlation-adjusted returns.
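For two positions, one common way to carry out this adjustment uses the Cholesky factors of the historical and the current correlation matrices. The sketch below assumes returns that have already been scaled to unit variance, so that only the correlation changes; the function names and the correlations of 0.3 and 0.9 are hypothetical values chosen for illustration.

```python
import math

def cholesky_2x2(rho):
    """Lower-triangular Cholesky factor of [[1, rho], [rho, 1]]."""
    return [[1.0, 0.0], [rho, math.sqrt(1.0 - rho * rho)]]

def correlation_adjustment(rho_hist, rho_new):
    """Matrix M = A_new * inverse(A_hist); adjusted returns are M applied to R."""
    a = cholesky_2x2(rho_hist)
    a_new = cholesky_2x2(rho_new)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    a_inv = [[a[1][1] / det, -a[0][1] / det],
             [-a[1][0] / det, a[0][0] / det]]
    # Plain 2 x 2 matrix product a_new @ a_inv.
    return [[sum(a_new[i][k] * a_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjust_pair(m, r1, r2):
    """Apply the adjustment matrix to one day's pair of returns."""
    return m[0][0] * r1 + m[0][1] * r2, m[1][0] * r1 + m[1][1] * r2

m = correlation_adjustment(0.3, 0.9)
print([[round(x, 4) for x in row] for row in m])
```

The first position's returns are left unchanged, while the second position's adjusted return becomes a blend of both original returns, which is what raises the correlation between them.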
Filtered Historical Simulation
The filtered historical simulation is undoubtedly the most comprehensive, and hence most complicated, of the nonparametric estimators. The method aims to combine the benefits of historical simulation with the power and flexibility of conditional volatility models (like GARCH or asymmetric GARCH).
Steps involved:
 A conditional volatility model (e.g., GARCH) is fitted to our portfolio-return data.
 Actual returns are translated into standardized returns.
 The conditional volatility model is used to forecast volatility for each of the days in a sample period.
 These volatility forecasts are then divided into the realized returns to produce a set of standardized returns that are iid (independent and identically distributed).
 A bootstrapping exercise is performed on the standardized returns, which are scaled by the current volatility forecast, assuming a 1-day VaR holding period.
 The VaR is computed.
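The steps above can be sketched as follows. To keep the example short and dependency-free, an EWMA volatility filter is used in place of a fitted GARCH model; that substitution, the decay factor of 0.94, the seeding of the recursion, the function names, and the simulated return data are all assumptions made for illustration.

```python
import math
import random

def ewma_vols(returns, lam=0.94):
    """Volatility forecast for each day, plus the forecast for the next day."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)   # assumed seed
    vols = []
    for r in returns:
        vols.append(math.sqrt(var))
        var = lam * var + (1 - lam) * r * r
    return vols, math.sqrt(var)

def filtered_hs_var(returns, confidence=0.95, n_draws=5000, lam=0.94, seed=1):
    """1-day VaR: bootstrap standardized returns, rescale by current volatility."""
    vols, next_vol = ewma_vols(returns, lam)
    # Divide each realized return by its own volatility forecast.
    standardized = [r / v for r, v in zip(returns, vols)]
    rng = random.Random(seed)
    # Resample the standardized returns and scale them to today's volatility.
    simulated = [next_vol * z for z in rng.choices(standardized, k=n_draws)]
    losses = sorted(-x for x in simulated)
    return losses[min(int(confidence * n_draws), n_draws - 1)]

# Hypothetical daily returns, for illustration only.
ret_rng = random.Random(11)
sample_returns = [ret_rng.gauss(0.0, 0.01) for _ in range(500)]
print(round(filtered_hs_var(sample_returns), 4))
```

With a GARCH model, only `ewma_vols` would change: it would return the fitted conditional volatilities and the one-step-ahead forecast instead.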
Advantages and Disadvantages of Non-Parametric Methods
Advantages
 They are intuitive and conceptually simple.
 They can accommodate fat tails, skewness, and other non-normal features that can cause problems for parametric approaches.
 They can accommodate any type of position including derivative positions.
 HS works quite well empirically.
 In varying degrees, they are quite easy to implement on a spreadsheet.
 They are free of operational problems.
 They use readily available data.
 Results provided are easily reported and communicated to senior management.
 Confidence intervals for nonparametric VaR and ES are easily produced.
 When combined with add-ons, they are capable of refinement and potential improvement.
Disadvantages
 For unusually quiet data periods, VaR and ES estimates are too low for actual risks faced.
 For unusually volatile data periods the estimates for VaR or ES produced are too high.
 They have difficulty handling shifts that take place during the sample period.
 An extreme loss in the data set can dominate nonparametric risk estimates.
 They are subject to ghost (or shadow) effects.
 They are constrained by the largest loss in historical data.
Question 1
Assume that Mrs. Barnwell, a risk manager, has a portfolio with only 2 positions, and that the historical correlation between them is 0.5. She wishes to adjust her historical returns \(R\) to reflect a current correlation of 0.8. Which of the following best reflects the 0.8 current correlation?
 \(\begin{pmatrix} 1 & 0.4536 \\ 0 & 0.6928 \end{pmatrix}R\)
 \(\begin{pmatrix} 0 & 0.4536 \\ 1 & 0.6928 \end{pmatrix}R\)
 \(\begin{pmatrix} 1 & 0 \\ 0.4536 & 0.6928 \end{pmatrix}R\)
 \(0.96R\)
The correct answer is c.
Recall that if \({ a }_{ i,j }\) is the \((i,j)\)th element of the 2 x 2 matrix \(A\), then by applying Choleski decomposition, \({ a }_{ 11 }=1\), \({ a }_{ 12 }=0\), \({ a }_{ 21 }=\rho\), \({ a }_{ 22 }=\sqrt { 1-{ \rho }^{ 2 } } \). From our data, \({ \rho } = 0.5\). Matrix \(\overline { A } \) is similar but has \({ \rho } = 0.8\).
Therefore:
$$ { A }^{ -1 }=\frac { 1 }{ { a }_{ 11 }{ a }_{ 22 }-{ a }_{ 12 }{ a }_{ 21 } } \begin{pmatrix} { a }_{ 22 } & -{ a }_{ 12 } \\ -{ a }_{ 21 } & { a }_{ 11 } \end{pmatrix} $$
Substituting into
$$ \hat { R } =\overline { A } { A }^{ -1 }R $$
we get
$$ \hat { R } =\begin{pmatrix} 1 & 0 \\ 0.8 & \sqrt { 1-{ 0.8 }^{ 2 } } \end{pmatrix}\frac { 1 }{ \sqrt { 1-{ 0.5 }^{ 2 } } } \begin{pmatrix} \sqrt { 1-{ 0.5 }^{ 2 } } & 0 \\ -0.5 & 1 \end{pmatrix}R $$
$$ =\begin{pmatrix} 1 & 0 \\ 0.4536 & 0.6928 \end{pmatrix}R $$
Here the lower-left entry is \(0.8 - 0.6 \times 0.5/\sqrt { 0.75 } = 0.4536\) and the lower-right entry is \(0.6/\sqrt { 0.75 } = 0.6928\).
Question 2
The mean return from a dataset has been precalculated as 0.04, and the standard deviation as 0.32. With 90% confidence, what will be our maximum percentage loss? Assume that \(Z = -1.28\) and \(N\left( Z \right) = 0.10\), since you are to locate the value at the 10th percentile.
 36.96%
 11.27%
 11.32%
 36.72%
The correct answer is a.
Recall that
$$ Z=\frac { X-\mu }{ \sigma } $$
From the data we are given that \(\mu =0.04\), \(\sigma =0.32 \) and \(Z=-1.28\).
Therefore:
$$ -1.28=\frac { X-0.04 }{ 0.32 } \Rightarrow X=-1.28\left( 0.32 \right) +0.04=-0.3696 $$
$$ X=-0.3696, \text{ i.e., a } 36.96\% \text{ loss} $$
This means that we are 90% confident that the loss will not exceed 36.96%.
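The arithmetic can be verified in a couple of lines. The sketch below uses the rounded 10th-percentile value of -1.28 from the question and, for comparison, the more precise value returned by Python's standard library; the variable names are chosen for illustration.

```python
from statistics import NormalDist

mu, sigma = 0.04, 0.32

z_rounded = -1.28                       # rounded value used in the question
x = mu + z_rounded * sigma              # rearranged from Z = (X - mu) / sigma
print(round(x, 4))                      # -0.3696

z_exact = NormalDist().inv_cdf(0.10)    # unrounded 10th-percentile z-value
print(round(mu + z_exact * sigma, 4))
```

The unrounded z-value gives a slightly larger loss than the rounded one, which is why answer options in questions of this kind are tied to the stated value of \(Z\).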
Question 3
A dataset has a distribution with a kurtosis of 8, \({ x }_{ \tau }\) is 1.57, and the chosen bound on the percentage deviation is 30.24. What is the required number of resamples?
 54
 30
 34
 47
The correct answer is d.
Recall that from Standard Errors of Bootstrap Estimators:
$$ \Pr\left[ 100\left| \frac { { \dot { S } }_{ B }-{ \dot { S } }_{ \infty } }{ { \dot { S } }_{ B } } \right| \le bound \right] =\tau $$
we have:
$$ B=\frac { 2500\left( k-1 \right) { x }_{ \tau }^{ 2 } }{ { bound }^{ 2 } } \quad \quad \quad \left( a \right) $$
We are given that \(k = 8\), \({ x }_{ \tau }=1.57\), and \(bound = 30.24\). Applying these values in equation \(\left( a \right) \) gives:
$$ B=\frac { 2500\left( 8-1 \right) \times { 1.57 }^{ 2 } }{ { 30.24 }^{ 2 } } =47.17 $$
$$ \approx 47 $$
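The calculation can be reproduced with a one-line function. The function name is hypothetical; the inputs are the kurtosis, \({ x }_{ \tau }\), and bound given in the question.

```python
def required_resamples(kurtosis, x_tau, bound):
    """B = 2500 * (k - 1) * x_tau**2 / bound**2."""
    return 2500 * (kurtosis - 1) * x_tau ** 2 / bound ** 2

b = required_resamples(8, 1.57, 30.24)
print(round(b, 2))   # 47.17
```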