In this chapter, the importance and challenges of extreme values in risk management will be explained. Extreme value theory will be introduced and its application to risk management studied. Furthermore, the generalized extreme value and peaks-over-threshold (POT) approaches will be compared and contrasted, with a detailed description of the POT approach.

Moreover, by the end of the chapter, the learner should be able to evaluate the trade-offs involved in setting the threshold level when the GP distribution is used. Finally, the application of multivariate EVT to risk management will be studied.

# Generalized Extreme Value Theory (EVT)

Let \(X\) be a loss random variable drawn independently and identically from some unknown distribution \(F\left( x \right) =Prob\left( X\le x \right) \). Estimating the extreme risk associated with the distribution of \(X\) is problematic because \(F\left( x \right)\) is unspecified.

Suppose a sample of size \(n\) is drawn from \(F\left( x \right)\) and its maximum is \({ M }_{ n }\). \({ M }_{ n }\) can be considered an extreme value provided \(n\) is large. As \(n\) grows larger, the distribution of the maxima converges to the following generalized extreme value (\(GEV\)) distribution:

$$ { H }_{ \xi ,\mu ,\sigma }\left( x \right) =\begin{cases} exp\left[ { -\left( 1+\xi \frac { x-\mu }{ \sigma } \right) }^{ -\frac { 1 }{ \xi } } \right] \quad if\quad \xi \neq 0 \\ exp\left[ -exp\left( -\frac { x-\mu }{ \sigma } \right) \right] \quad if\quad \xi =0 \end{cases}\quad \quad \quad \quad \quad \left( a \right) $$

Where:

$$ 1+\frac { \xi \left( x-\mu \right) }{ \sigma } >0 $$

\(\mu \) is a measure of the central tendency of \({ M }_{ n }\) and is the location parameter of the limiting distribution; \(\sigma \) evaluates the dispersion of \({ M }_{ n }\) and is the scale parameter of the limiting distribution; and \(\xi \) indicates the tail shape of the limiting distribution and is referred to as the tail index.

Consider the following cases:

- \(\xi >0\) applies when the tail of \(F\left( x \right)\) is heavy and obeys a power law; the GEV is then the Fr\(\acute { e } \)chet distribution;
- \(\xi =0\) applies when \(F\left( x \right)\) has exponential, relatively light tails; the GEV is then the Gumbel distribution; and
- \(\xi <0\) applies when the tail of \(F\left( x \right)\) is lighter still, with a finite upper endpoint; the GEV is then the Weibull distribution.

Setting the left-hand side of equation (\(a\)) to \(p\) enables us to find the quantiles associated with the GEV distribution. Taking \(ln\) of both sides:

$$ ln\left( p \right) =\begin{cases} { -\left( 1+\xi \left( \frac { x-\mu }{ \sigma } \right) \right) }^{ -\frac { 1 }{ \xi } }if\quad \xi \neq 0 \\ -exp\left( -\left( \frac { x-\mu }{ \sigma } \right) \right) \quad if\quad \xi =0\quad \end{cases} $$

Solving for \(x\):

$$ x=\mu -\frac { \sigma }{ \xi } \left[ 1-{ \left( -ln\left( p \right) \right) }^{ -\xi } \right] \quad \quad \quad \quad \quad \quad \left( Fr\acute { e } chet,\xi >0 \right) $$

$$ x=\mu -\sigma ln\left[ -ln\left( p \right) \right] \quad \quad \quad \quad \quad \quad \left( Gumbel,\xi =0 \right) $$
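To make the quantile formulas concrete, the following is a minimal Python sketch (the parameter values \(\mu = 0\), \(\sigma = 1\), \(\xi = 0.3\) are illustrative assumptions, not from the text) that computes a GEV quantile and verifies it by plugging the result back into equation (\(a\)):

```python
import math

def gev_quantile(p, mu, sigma, xi):
    """Invert the GEV cdf H_{xi,mu,sigma} at probability p."""
    if xi != 0.0:
        # Frechet/Weibull case: x = mu - (sigma/xi) * [1 - (-ln p)^(-xi)]
        return mu - (sigma / xi) * (1.0 - (-math.log(p)) ** (-xi))
    # Gumbel case: x = mu - sigma * ln(-ln p)
    return mu - sigma * math.log(-math.log(p))

def gev_cdf(x, mu, sigma, xi):
    """Evaluate the GEV cdf, equation (a)."""
    z = (x - mu) / sigma
    if xi != 0.0:
        return math.exp(-(1.0 + xi * z) ** (-1.0 / xi))
    return math.exp(-math.exp(-z))

# Round-trip check: the p = 0.99 quantile maps back to p = 0.99.
x99 = gev_quantile(0.99, mu=0.0, sigma=1.0, xi=0.3)
print(round(gev_cdf(x99, 0.0, 1.0, 0.3), 6))  # 0.99
```

The same round trip works in the Gumbel branch (\(\xi = 0\)), which is why the two quantile formulas above are stated separately.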

## A Short-Cut EV Method

The short-cut method of approximating VaR or ES via EV theory assumes \(\xi >0\), so that the tail of the extreme loss distribution follows a power law times a slowly varying function:

$$ 1-F\left( x \right) =k\left( x \right) { x }^{ -\frac { 1 }{ \xi } }\quad \quad \quad \quad \quad \left( i \right) $$

where \(k\left( x \right)\) varies slowly with \(x\). If \(k\left( x \right)\) is approximately constant, then:

$$ 1-F\left( x \right) \approx k{ x }^{ -\frac { 1 }{ \xi } }\quad \quad \quad \quad \quad \left( ii \right) $$

If there are two tail probabilities, the in-sample probability \({ p }_{ in-sample }\) and the out-of-sample probability \({ p }_{ out-of-sample }\), then equation (\(ii\)) implies:

$$ { p }_{ in-sample }\approx { kx }_{ in-sample }^{ -\frac { 1 }{ \xi } } $$

$$ { p }_{ out-of-sample }\approx { kx }_{ out-of-sample }^{ -\frac { 1 }{ \xi } } $$

Therefore:

$$ \frac { { p }_{ in-sample } }{ { p }_{ out-of-sample } } \approx { \left( \frac { { x }_{ in-sample } }{ { x }_{ out-of-sample } } \right) }^{ -\frac { 1 }{ \xi } } $$

$$ \Rightarrow { x }_{ out-of-sample }\approx { x }_{ in-sample }{ \left( \frac { { p }_{ in-sample } }{ { p }_{ out-of-sample } } \right) }^{ \xi } $$

Therefore, using a known in-sample quantile \(\left( { x }_{ in-sample } \right) \), a chosen out-of-sample probability \(\left( { p }_{ out-of-sample } \right) \), and an estimate of the in-sample probability \(\left( { p }_{ in-sample } \right) \), the unknown out-of-sample quantile \(\left( { x }_{ out-of-sample } \right) \) can be approximated.

The \( { p }_{ in-sample } \) can easily be proxied by its empirical counterpart \(\left( { t }/{ n } \right) \), where \(n\) is the sample size and \(t\) is the number of observations exceeding \( { x }_{ in-sample } \).

Therefore:

$$ { x }_{ out-of-sample }\approx { x }_{ in-sample }{ \left( \frac { { np }_{ out-of-sample } }{ t } \right) }^{ -\xi } $$

Note that this is easy to compute from the available data.
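The extrapolation above can be sketched in a few lines of Python; the numbers used here (an in-sample 99.5% quantile of 2.5, \(t = 50\) exceedances in \(n = 10{,}000\) observations, \(\xi = 0.3\)) are illustrative assumptions, not values from the text:

```python
def out_of_sample_quantile(x_in, t, n, p_out, xi):
    """Extrapolate a known in-sample quantile x_in (exceeded by t of n
    observations) to the quantile at a smaller tail probability p_out,
    assuming a power-law tail with tail index xi."""
    return x_in * (n * p_out / t) ** (-xi)

# Illustrative numbers (assumed): extrapolate an in-sample quantile of 2.5
# (t = 50 exceedances in n = 10,000) out to p_out = 0.0005.
x_out = out_of_sample_quantile(x_in=2.5, t=50, n=10_000, p_out=0.0005, xi=0.3)
print(round(x_out, 4))  # 4.9882
```

Because \(p_{out-of-sample} < t/n\), the ratio inside the power is below one and the extrapolated quantile lies further out in the tail than the in-sample quantile, as expected.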

## Estimation of EV Parameters

The relevant EV parameters \(\mu \), \(\sigma \), and \(\xi \) must be estimated before EV risk measures can be computed, using one of:

- maximum likelihood (ML) methods;
- regression methods; and
- moment-based or semi-parametric methods.

## ML Estimation Methods

ML methods derive the parameter values most likely to have generated the given data; these estimators are obtained by maximizing the likelihood function. Using ML therefore requires first constructing the likelihood or log-likelihood function.

For the Gumbel case (\(\xi =0\)), the log-likelihood function for a sample of \(m\) observed maxima \({ M }_{ n }^{ i }\) is:

$$ \ell \left( { \mu }_{ n },{ \sigma }_{ n } \right) =-m\, ln\left( { \sigma }_{ n } \right) -\sum _{ i=1 }^{ m }{ exp\left( -\frac { { M }_{ n }^{ i }-{ \mu }_{ n } }{ { \sigma }_{ n } } \right) } -\sum _{ i=1 }^{ m }{ \frac { { M }_{ n }^{ i }-{ \mu }_{ n } }{ { \sigma }_{ n } } } $$

When \(\xi \neq 0\), the log-likelihood function is:

$$ \ell \left( { \mu }_{ n },{ \sigma }_{ n },{ \xi }_{ n } \right) =-m\, ln\left( { \sigma }_{ n } \right) -\left( 1+\frac { 1 }{ { \xi }_{ n } } \right) \sum _{ i=1 }^{ m }{ ln\left[ 1+{ \xi }_{ n }\left( \frac { { M }_{ n }^{ i }-{ \mu }_{ n } }{ { \sigma }_{ n } } \right) \right] } -\sum _{ i=1 }^{ m }{ { \left[ 1+{ \xi }_{ n }\left( \frac { { M }_{ n }^{ i }-{ \mu }_{ n } }{ { \sigma }_{ n } } \right) \right] }^{ -\frac { 1 }{ { \xi }_{ n } } } } $$

The condition that all observations \({ M }_{ n }^{ i }\) fulfill \(1+\frac { \xi \left( { M }_{ n }^{ i }-\mu \right) }{ \sigma } >0\) is necessary for the maximization.
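As a sanity check on the \(\xi \neq 0\) log-likelihood, the sketch below evaluates it directly and compares it against the summed log-density from `scipy.stats.genextreme` (note that scipy's shape parameter follows the opposite sign convention, \(c = -\xi\)); the parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import genextreme

def gev_loglik(M, mu, sigma, xi):
    """GEV log-likelihood for a sample of maxima M (xi != 0 case)."""
    z = 1.0 + xi * (M - mu) / sigma   # must be > 0 for every observation
    m = len(M)
    return (-m * np.log(sigma)
            - (1.0 + 1.0 / xi) * np.sum(np.log(z))
            - np.sum(z ** (-1.0 / xi)))

# Simulated maxima; scipy's shape convention is c = -xi.
mu, sigma, xi = 1.0, 0.5, 0.2
M = genextreme.rvs(c=-xi, loc=mu, scale=sigma, size=200, random_state=0)

ours = gev_loglik(M, mu, sigma, xi)
scipys = genextreme.logpdf(M, c=-xi, loc=mu, scale=sigma).sum()
print(np.isclose(ours, scipys))  # True
```

In practice one would maximize `gev_loglik` numerically (or call `genextreme.fit`) rather than evaluate it at known parameters; the comparison here only verifies that the formula above is the correct objective.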

## Regression Methods

The sample of \({ M }_{ n }^{ i }\) values must first be ordered from lowest to highest. For these order statistics, with \(m\) large:

$$ E\left[ H\left( { M }_{ n }^{ i } \right) \right] =\frac { i }{ 1+m } \Rightarrow H\left( { M }_{ n }^{ i } \right) \approx \frac { i }{ 1+m } $$

Here \(H\left( { M }_{ n }^{ i } \right) \) is the cumulative distribution function of the maxima. For \(\xi \neq 0\) we have that:

$$ \frac { i }{ 1+m } \approx exp\left[ -{ \left( 1+\frac { { \xi }_{ n }\left( { M }_{ n }^{ i }-{ \mu }_{ n } \right) }{ { \sigma }_{ n } } \right) }^{ -\frac { 1 }{ { \xi }_{ n } } } \right] $$

Taking the logs twice of both sides, we get:

$$ log\left[ -log\left( \frac { i }{ 1+m } \right) \right] \approx -\frac { 1 }{ { \xi }_{ n } } log\left[ 1+{ \xi }_{ n }\left( \frac { { M }_{ n }^{ i }-{ \mu }_{ n } }{ { \sigma }_{ n } } \right) \right] $$

Parameter estimates \({ \mu }_{ n }\), \({ \sigma }_{ n }\), and \({ \xi }_{ n }\) can then be recovered in a straightforward way by regressing \(log\left[ -log\left( \frac { i }{ 1+m } \right) \right] \) on the ordered maxima \({ M }_{ n }^{ i }\) using this relationship.
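In the Gumbel case (\(\xi =0\)) the relationship collapses to the linear form \(log\left[ -log\left( i/\left( 1+m \right) \right) \right] \approx -\left( { M }_{ n }^{ i }-\mu \right) /\sigma \), so an ordinary least-squares fit recovers the parameters. The sketch below assumes simulated Gumbel maxima with \(\mu =3\), \(\sigma =2\) (illustrative values, not from the text):

```python
import numpy as np

# Regression method, Gumbel (xi = 0) case:
#   log(-log(i/(m+1))) ~ -(M_i - mu)/sigma  =  -(1/sigma)*M_i + mu/sigma
rng = np.random.default_rng(42)
mu_true, sigma_true, m = 3.0, 2.0, 5_000

M = np.sort(rng.gumbel(loc=mu_true, scale=sigma_true, size=m))  # ordered maxima
i = np.arange(1, m + 1)
y = np.log(-np.log(i / (m + 1)))           # plotting positions

slope, intercept = np.polyfit(M, y, 1)     # fit y = slope*M + intercept
sigma_hat = -1.0 / slope                   # slope = -1/sigma
mu_hat = intercept * sigma_hat             # intercept = mu/sigma

print(round(mu_hat, 1), round(sigma_hat, 1))
```

For \(\xi \neq 0\) the regression is nonlinear in \({ \xi }_{ n }\) and would require a nonlinear least-squares routine instead of `np.polyfit`.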

## Semi-Parametric Estimation Methods

These methods are typically used to approximate the tail index \(\xi\), with the most commonly used being the Hill estimator. Given observations \({ X }_{ 1 },{ X }_{ 2 },\dots ,{ X }_{ n }\), ordered from the highest to the smallest, the Hill estimator \({ \xi }_{ n,k }^{ \left( H \right) }\) is given by:

$$ { \xi }_{ n,k }^{ \left( H \right) }=\frac { 1 }{ k } \sum _{ i=1 }^{ k }{ ln{ X }_{ i } } -ln{ X }_{ k+1 } $$

Here \(k\) is the threshold used in computing the Hill estimator: the estimator averages the logs of the \(k\) most extreme observations, less the log of the \(\left( k+1 \right)\)th observation, the one just outside the tail.

However, choosing the cut-off value \(k\) is the main practical challenge. A common remedy is to compute Hill estimates for a range of \(k\)-values and select from the region where the plot of estimates against \(k\)-values is roughly horizontal. Although informal, this procedure extracts the maximum possible information from the data.
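The Hill estimator and the horizontal-region diagnostic can be sketched as follows; the data-generating setup (classical Pareto with tail exponent 2, i.e. true \(\xi =0.5\)) is an illustrative assumption:

```python
import numpy as np

def hill_estimator(x, k):
    """Hill tail-index estimate from the k largest observations."""
    xs = np.sort(x)[::-1]                            # descending order
    return np.mean(np.log(xs[:k])) - np.log(xs[k])   # (1/k) sum ln X_i - ln X_{k+1}

# Illustrative data: numpy's pareto(a) draws Lomax variates; adding 1 gives
# classical Pareto with survival x^(-a), so the true tail index is xi = 1/a.
rng = np.random.default_rng(1)
x = rng.pareto(2.0, size=100_000) + 1.0   # true xi = 0.5

# A "Hill plot" in miniature: estimates over several k; choose k from the
# region where the estimates flatten out.
for k in (100, 500, 2000):
    print(k, round(hill_estimator(x, k), 2))
```

With genuinely power-law data the estimates are stable across \(k\); with real financial data the flat region must be found by inspection.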

# The Peaks-Over-Threshold Approach: The Generalized Pareto Distribution

## Theory

Suppose that \(X\) is an independent, identically distributed loss random variable with distribution function \(F\left( x \right) \), and let \(u\) be a threshold value of \(X\). The distribution of excess losses over the threshold \(u\) is then defined by:

$$ { F }_{ u }\left( x \right) =Pr\left\{ X-u\le x|X>u \right\} =\frac { F\left( x+u \right) -F\left( u \right) }{ 1-F\left( u \right) } \quad \quad \quad \quad \forall x>0\quad \quad \quad \quad \left( I \right) $$

This is the probability that a loss exceeds the threshold \(u\) by at most \(x\), given that it exceeds the threshold. According to the Gnedenko-Pickands-Balkema-deHaan (GPBdH) theorem, as \(u\) increases, the distribution \({ F }_{ u }\left( x \right)\) converges to the generalized Pareto distribution defined by:

$$ { G }_{ \xi ,\beta }\left( x \right) =\begin{cases} 1-{ \left( 1+\frac { \xi x }{ \beta } \right) }^{ -\frac { 1 }{ \xi } }\quad for\quad \xi \neq 0 \\ 1-exp\left( -\frac { x }{ \beta } \right) \quad for\quad \xi =0 \end{cases} \quad \quad \quad \quad \left( II \right) $$

$$ \forall x\ge 0\quad when\quad \xi \ge 0,\quad and\quad 0\le x\le -\frac { \beta }{ \xi } \quad when\quad \xi <0 $$

The parameters of this distribution are \(\beta\), a positive scale parameter, and \(\xi\), the shape or tail index parameter, which can be positive, zero, or negative but is positive for heavy-tailed loss data.

The deduction from this theorem is that the distribution of excess losses always takes the same form, irrespective of the distribution of the losses themselves. To use the GP distribution, a reasonable threshold \(u\) must be chosen, and the choice involves a trade-off: \(u\) determines \({ N }_{ u }\), the number of observations in excess of the threshold value.

Rearranging the right-hand side of equation (\(I\)), and moving from the distribution of exceedances over the threshold to the parent distribution \(F\left( x \right) \) defined over ordinary losses, gives:

$$ F\left( x \right) =\left( 1-F\left( u \right) \right) { G }_{ \xi ,\beta }\left( x-u \right) +F\left( u \right) \quad \quad \quad \quad \quad \left( III \right) $$

$$ \forall x>u $$

For the above equation to be usable, an estimate of \(F\left( u \right)\), the proportion of observations not exceeding the threshold, is needed. The natural estimator is the observed proportion of below-threshold observations, \({ \left( n-{ N }_{ u } \right) }/{ n }\). Substituting this for \(F\left( u \right)\) and substituting equation \(\left( II \right)\) into equation \(\left( III \right)\) gives:

$$ F\left( x \right) =1-\frac { { N }_{ u } }{ n } { \left[ 1+\xi \left( \frac { x-u }{ \beta } \right) \right] }^{ -\frac { 1 }{ \xi } }\quad \quad \quad \quad \left( IV \right) $$

Inverting the above equation and solving for the \(x\)-value at which \(F\left( x \right) =\alpha \) gives the VaR:

$$ VaR=u+\frac { \beta }{ \xi } \left\{ { \left[ \frac { n }{ { N }_{ u } } \left( 1-\alpha \right) \right] }^{ -\xi }-1 \right\} $$

Where \(\alpha\) is the VaR confidence level.

The ES equals the VaR plus the mean excess loss over the VaR. If \(\xi <1\), the ES is:

$$ ES=\frac { VaR }{ 1-\xi } +\frac { \beta -\xi u }{ 1-\xi } $$
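The two formulas can be sketched directly in Python. The parameter values below (\(\beta =0.699\), \(\xi =0.149\), \(u=3.113\), \({ { N }_{ u } }/{ n }=0.5211\%\), \(\alpha =95.5\%\)) are those used in the worked practice question at the end of the chapter:

```python
def pot_var(u, beta, xi, n_over_Nu, alpha):
    """VaR from the POT/GP tail estimator (equation for VaR above)."""
    return u + (beta / xi) * ((n_over_Nu * (1.0 - alpha)) ** (-xi) - 1.0)

def pot_es(var, u, beta, xi):
    """ES = VaR/(1 - xi) + (beta - xi*u)/(1 - xi), valid for xi < 1."""
    return var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)

# Parameter values from the worked example later in the chapter.
beta, xi, u, Nu_over_n = 0.699, 0.149, 3.113, 0.005211
var = pot_var(u, beta, xi, 1.0 / Nu_over_n, alpha=0.955)
es = pot_es(var, u, beta, xi)
print(round(var, 3), round(es, 3))  # 1.824 2.42
```

Note that ES exceeds VaR, as it must: it adds the mean excess loss beyond the VaR level.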

## Estimation

A reasonable threshold \(u\) must be chosen for the estimates to be obtained, and it determines \({ N }_{ u }\), the number of observations exceeding the threshold. This choice is a weak spot of POT theory and involves trade-offs.

The ML approaches are the most reliable, perhaps, and involve the following log-likelihood maximization:

$$ \ell \left( \xi ,\beta \right) =\begin{cases} -m\, ln\, \beta -\left( 1+\frac { 1 }{ \xi } \right) \sum _{ i=1 }^{ m }{ ln\left( 1+{ \xi { X }_{ i } }/{ \beta } \right) } \quad \quad \xi \neq 0 \\ -m\, ln\, \beta -\left( \frac { 1 }{ \beta } \right) \sum _{ i=1 }^{ m }{ { X }_{ i } } \quad \quad \xi =0 \end{cases} $$

The maximization is subject to the conditions under which \({ G }_{ \xi ,\beta }\left( x \right) \) is defined.
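As a check on the \(\xi \neq 0\) branch of this log-likelihood, the sketch below evaluates it directly and compares it against the summed log-density from `scipy.stats.genpareto` (whose shape parameter \(c\) matches \(\xi\) with the same sign); the parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import genpareto

def gpd_loglik(x, xi, beta):
    """GP log-likelihood for exceedances x over the threshold (xi != 0 case)."""
    m = len(x)
    return -m * np.log(beta) - (1.0 + 1.0 / xi) * np.sum(np.log(1.0 + xi * x / beta))

# Simulated exceedances; scipy's genpareto uses the same (c = xi) convention.
xi, beta = 0.149, 0.699
x = genpareto.rvs(c=xi, scale=beta, size=500, random_state=0)

ours = gpd_loglik(x, xi, beta)
scipys = genpareto.logpdf(x, c=xi, scale=beta).sum()
print(np.isclose(ours, scipys))  # True
```

In practice the parameters would be obtained by maximizing this objective over \(\left( \xi ,\beta \right) \), for example via `genpareto.fit` on the observed exceedances.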

# Refinements to EV Approaches

In the following section, we will have a close look at:

- The conditional EV;
- Dealing with dependent (or non-independent and identically distributed) data; and
- The multivariate EVT.

## Conditional EV

The unconditional extreme value theory (EVT) procedures described above are applied directly to the random variable of interest, \(X\). They are most applicable when forecasting VaR or ES over long time horizons.

Applying EVT to \(X\) conditionally entails distinguishing between \(X\) and the random factors driving it. This conditional or dynamic EVT is most applicable over short time horizons, in situations where \(X\) has a dynamic structure that can be modeled.

Consider a situation where \(X\) is governed by a GARCH process: the GARCH process needs to be accounted for, and EVT applied to the random innovations driving it.

Accounting for this dynamic structure requires an estimation of the GARCH process and EVT applied to its residuals. The following two-step procedure is suggested:

- An appropriate econometric method is used to estimate the GARCH-type process, and its residuals are extracted; and
- The EVT is applied to these residuals, and VaR estimates are then derived while accounting for both the dynamic structure and the residual process.
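The two-step procedure can be sketched as follows. As simplifying assumptions (not from the text), an EWMA volatility filter stands in for a fully estimated GARCH model, the GP shape is fixed at \(\xi =0\) so its ML scale estimate is just the mean excess, and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# Step 0: simulate returns with volatility clustering (GARCH(1,1)-style).
T, omega, a, b = 5_000, 0.05, 0.10, 0.85
r, var = np.empty(T), omega / (1.0 - a - b)
for t in range(T):
    r[t] = np.sqrt(var) * rng.standard_normal()
    var = omega + a * r[t] ** 2 + b * var

# Step 1: filter out the dynamic structure (EWMA variance, lambda = 0.94),
# leaving approximately i.i.d. standardized residuals.
ewma = np.empty(T)
ewma[0] = r.var()
for t in range(1, T):
    ewma[t] = 0.94 * ewma[t - 1] + 0.06 * r[t - 1] ** 2
z = r / np.sqrt(ewma)

# Step 2: apply EVT to the residuals (POT with threshold at the 95th
# percentile, xi = 0 imposed), then scale the residual quantile back up
# by the next-period volatility forecast.
u = np.quantile(z, 0.95)
beta_hat = (z[z > u] - u).mean()             # xi = 0 ML estimate of beta
z_var99 = u + beta_hat * np.log(0.05 / 0.01)  # 99% residual VaR
sigma_next = np.sqrt(0.94 * ewma[-1] + 0.06 * r[-1] ** 2)
print(round(sigma_next * z_var99, 2))         # conditional 99% VaR estimate
```

A production implementation would estimate the GARCH parameters properly (e.g. by ML) and fit \(\xi\) rather than imposing \(\xi =0\); the sketch only illustrates how the dynamic filter and the EVT tail fit are combined.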

## Dealing with Dependent (or non-i.i.d.) Data

Some form of time dependency, that usually takes the form of clustering, is exhibited by most financial returns. This clustering is important for the following reasons:

- Clustering violates a crucial premise on which the earlier results depend, and the statistical implications of clustering are not well understood;
- Data dependence can produce poor estimator performance; and
- Clustering alters the interpretation of results.

Dealing with the dependency in data entails:

- Applying the GEV distribution to block maxima; and
- An estimation of the tail of the conditional distribution rather than the unconditional one.

## Multivariate EVT

Multivariate extreme value theory can be applied to model the tails of multivariate distributions in a theoretically appropriate way. The key issue here is modeling the dependence structure of extreme events.

# Practice Questions

1) Assume that we are given the following parameters based on empirical values from futures clearing firm contracts: \(\beta=0.699\), \(\xi=0.149\), \(u=3.113\), \({ { N }_{ u } }/{ n }=0.5211\%\). Calculate the VaR and the expected shortfall (ES) at the 95.5% confidence level.

- 1.674, 2.453
- 1.824, 2.420
- 1.453, 2.420
- 1.667, 2.554

The correct answer is **B**.

Recall that:

$$ VaR=u+\frac { \beta }{ \xi } \left\{ { \left[ \frac { n }{ { N }_{ u } } \left( 1-\alpha \right) \right] }^{ -\xi }-1 \right\} $$

$$ ES=\frac { VaR }{ 1-\xi } +\frac { \beta -\xi u }{ 1-\xi } $$

Therefore:

$$ VaR=3.113+\frac { 0.699 }{ 0.149 } \left\{ { \left[ \frac { 1 }{ 0.005211 } \left( 1-0.955 \right) \right] }^{ -0.149 }-1 \right\} $$

$$ = 1.824 $$

$$ ES=\frac { 1.824 }{ 1-0.149 } +\frac { 0.699-0.149\times 3.113 }{ 1-0.149 } $$

$$ =2.420 $$