Validating Bank Holding Companies’ Value-at-Risk Models for Market Risk

After completing this reading, you should be able to:

  • Describe some important considerations for a bank in assessing the conceptual soundness of a VaR model during the validation process.
  • Explain how to conduct sensitivity analysis for a VaR model, and describe the potential benefits and challenges of performing such an analysis.
  • Describe the challenges a financial institution could face when calculating confidence intervals for VaR.
  • Discuss the challenges in benchmarking VaR models and various approaches proposed to overcome them.

Key Considerations for Assessing the Conceptual Soundness of a VaR Model

Banking guidance on model validation requires a review of a model’s conceptual soundness. This entails determining whether the model’s assumptions, techniques, and data are appropriate. This is often documented through a narrative provided by the validator.

Building a large-scale VaR model involves numerous modeling decisions. Validation involves reviewing these decisions and assessing their realism. A core question is whether the model is “suitable for the purpose it is developed for.” While VaR has other uses, such as capital calculations, its primary purpose is to help firms manage the risk of their positions.

A crucial consideration is the limitations of relying solely on a profit and loss (P&L)-based approach for VaR when the objective is active risk management of positions. Comparing bank VaR models to a simple model based on a GARCH model of actual P&L shows that the GARCH model can sometimes appear to perform better in terms of pure P&L forecasting. However, this comparison overlooks a vital aspect.

Consider a hypothetical hedge fund that appears to generate good returns with low risk based on profitability numbers. This fund employs a dynamic strategy, for example, selling out-of-the-money puts. Traditional risk measures, based on P&L, fail to capture the dynamic nature of this portfolio.

A proper VaR model should incorporate positional information, effectively creating a “pseudo history” of portfolio value changes whenever positions change. This approach more accurately reflects the risk inherent in strategies like selling out-of-the-money puts and reveals when the portfolio becomes riskier due to changes in positions.

For example, imagine a trader selling deep out-of-the-money puts. Initially, the probability of these puts being exercised is low, and the P&L might show consistent small profits. A P&L-based VaR would reflect this low apparent risk. However, if the underlying asset’s price moves significantly against the trader, the puts move closer to being in the money, and the potential loss becomes substantial. A VaR model incorporating positional information would reflect this increased risk before the actual loss occurs, as the change in the portfolio’s value due to the changing market conditions and the resulting change in the value of the short puts is reflected in the pseudo history.

This illustrates a critical aspect of VaR modeling: its fundamental purpose is to demonstrate how risk changes when positions change. If a risk manager instructs a trader to reduce portfolio risk measured by VaR, and the trader reduces their positions, a P&L-based VaR would not reflect this reduction as historical P&L is unaffected. However, a VaR model based on the “pseudo history” of P&L would show a decrease in risk because the positions have changed.
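To make the pseudo-history idea concrete, here is a minimal sketch (hypothetical positions and risk-factor returns, and a purely linear portfolio): today's positions are re-valued against each day's historical factor moves, so the risk measure reacts as soon as the positions change, even though realized P&L does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 days of returns for two risk factors.
factor_returns = rng.normal(0.0, 0.01, size=(500, 2))

def historical_var(positions, factor_returns, confidence=0.99):
    """Historical-simulation VaR from a pseudo history of P&L.

    The pseudo history applies today's positions to each day's historical
    factor returns, so the VaR changes as soon as positions change,
    even though realized P&L does not.
    """
    pseudo_pnl = factor_returns @ positions         # one P&L figure per historical day
    return -np.quantile(pseudo_pnl, 1.0 - confidence)

positions_before = np.array([10_000_000.0, 5_000_000.0])   # dollar exposures
positions_after = positions_before * 0.5                    # trader halves the book

print(historical_var(positions_before, factor_returns))
print(historical_var(positions_after, factor_returns))      # VaR falls immediately
```

Halving the positions halves the historical-simulation VaR immediately, which is exactly the response a purely P&L-based VaR cannot show.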

Therefore, VaR models that cannot demonstrate how risk changes with position changes are not “fit for purpose” and fail a crucial conceptual soundness test in the model validation process. This is the central consideration regarding conceptual soundness.

In addition to this core consideration of how the model is used, quantitative tests can complement the assessment of conceptual soundness. These include sensitivity analysis, which shows the effects of data limitations or choices, and the calculation of statistical confidence intervals around VaR estimates. These tools help address key questions about the model’s performance and assess the impact of data issues and estimation errors, further informing how the model can be used for position management.

Conducting Sensitivity Analysis for a VaR Model

Sensitivity analysis for a VaR model examines how changes in inputs, particularly positions, affect the model’s output. This is vital for assessing the impact of simplifications or omissions, especially in constructing the “pseudo history” of portfolio value changes. It helps ensure these simplifications don’t materially distort the VaR result and offers a structured way to prioritize model enhancements by incorporating more risk factors or higher-order terms.

A fundamental property of Value-at-Risk is its linear homogeneity in positions. This is captured by the Euler equation:

$$\text{VaR}(V_{PT}) = \sum_{i \in P} \frac{\partial \text{VaR}}{\partial V_{iT}} V_{iT}$$

Where:

  • \(\text{VaR}(V_{PT})\): Value-at-Risk of portfolio \(P\) at time \(T\).
  • \(V_{iT}\): value of the \(i\)-th position in the portfolio at time \(T\).
  • \(\partial \text{VaR} / \partial V_{iT}\): marginal VaR, the partial derivative of VaR with respect to the value of the \(i\)-th position.

Each term on the right side is a component VaR, and the derivative is the marginal VaR. The marginal VaR tells us how VaR changes with a small change in a position’s size. It’s linked to the regression coefficient (\(\beta_i\)) from a regression of the change in the position’s value (\(\Delta V_{it}\)) on the change in the portfolio’s value (\(\Delta V_{pt}\)):

$$\Delta V_{it} = \alpha + \beta_i \Delta V_{pt} + \epsilon_t$$

From this regression, we derive the component VaR:

$$\text{component}_i \, \text{VaR}(V_{PT}) = \text{VaR}(V_{PT}) w_i \beta_i$$

Example (Quantitative): Suppose a portfolio has a VaR of $10 million. One position (i) has a weight (\(w_i\)) of 20% and a beta (\(\beta_i\)) of 1.5. Then, the component VaR for this position is $10 million × 0.20 × 1.5 = $3 million. This means that this particular position contributes $3 million to the overall portfolio VaR.

Therefore, VaR’s sensitivity to a position depends on its weight and its sensitivity to the overall portfolio value.
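A minimal sketch of this decomposition on hypothetical data follows; the betas are computed from position returns (value changes scaled by position values), which is the convention under which the weighted components aggregate exactly to the portfolio VaR.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical positions and a pseudo history of daily returns for each.
values = np.array([20e6, 30e6, 10e6])            # current position values (dollars)
weights = values / values.sum()

returns = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[1e-4, 4e-5, 1e-5],
         [4e-5, 2e-4, 2e-5],
         [1e-5, 2e-5, 5e-5]],
    size=500,
)
dV = returns * values                            # position value changes
dV_port = dV.sum(axis=1)                         # portfolio value change per day
port_return = dV_port / values.sum()

port_var = -np.quantile(dV_port, 0.01)           # 99% historical-simulation VaR

# beta_i: slope from regressing each position's return on the portfolio return.
betas = np.array([
    np.cov(returns[:, i], port_return)[0, 1] / np.var(port_return, ddof=1)
    for i in range(returns.shape[1])
])

component_var = port_var * weights * betas       # component VaR per the formula above
print(component_var, component_var.sum(), port_var)
```

The printed components sum to the portfolio VaR, which is the Euler property the formula expresses.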

Consider another example: A bank is concerned about the valuation of complex derivatives in its portfolio. Sensitivity analysis can reveal how much the overall VaR would change if the valuation model for these derivatives were adjusted, say, by changing a key input parameter like volatility.

A challenge arises when a full time series of position value changes is unavailable, precisely when understanding a position’s impact is most needed. This data scarcity hinders the regression approach. Using proxies might seem like a solution, but proxies can have differing volatility or correlations, potentially understating VaR contributions.

When the regression approach is impractical, alternative methods exist.

Tasche and Hallerbach show, using linear homogeneity, that:

$$\text{VaR}_{PT} = \Delta V_{PT}^* = \sum_{i \in P} \mathbb{E}(\Delta V_{iT} \mid \Delta V_{PT} = \Delta V_{PT}^*)$$

Here \(\Delta V_{PT}^*\) is the portfolio value change in the scenario that determines the VaR. This portfolio value change can be decomposed into the sum of the expected changes in the individual positions, conditional on the overall portfolio value change equaling the VaR.

The component VaR for a position can be estimated from the change in its value on the day that defines the historical simulation VaR scenario. If a full time series for a position is unavailable, its contribution can still be gauged by observing its loss on that day. Because a single observation is unreliable, averaging the ordered observations near the historical simulation VaR improves the estimate.

Example: Consider a newly acquired asset for which historical data is limited. Instead of a full regression, one could examine how this asset performed during the worst historical portfolio moves used in the VaR calculation. This gives a rough estimate of its contribution to VaR.
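A minimal sketch of this scenario-based estimate (hypothetical data): identify the historical day that defines the VaR, read off each position's loss on that day, and then smooth the estimate by averaging a few ordered observations around the VaR scenario.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical pseudo history of value changes for three positions.
dV = rng.normal(0.0, [1e5, 2e5, 5e4], size=(500, 3))
dV_port = dV.sum(axis=1)

confidence = 0.99
k = int(np.floor((1.0 - confidence) * len(dV_port)))   # rank of the VaR scenario
order = np.argsort(dV_port)                             # worst portfolio days first

# Single-scenario estimate: each position's loss on the VaR day.
var_day = order[k]
component_single = -dV[var_day]

# Smoothed estimate: average each position's loss over a window of
# ordered observations around the VaR scenario (here, +/- 2 ranks).
window = order[max(k - 2, 0): k + 3]
component_smoothed = -dV[window].mean(axis=0)

print(component_single, component_single.sum())
print(component_smoothed, component_smoothed.sum())
```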

This approach addresses the challenge of estimating omitted risks. If sufficient data existed, the risk would likely be included in the VaR model already. With limited data, the risk isn’t in the VaR, making precise estimation of the VaR’s sensitivity to the omission difficult.

This process applies to various valuation omissions. For instance, if a Taylor series expansion is used for valuation with only a few terms, the impact of omitting the next higher-order term can be estimated similarly.

In practice, sensitivity depends on portfolio composition, which can change rapidly. Thus, supervisors often require risks not in VaR to be estimated using both component and standalone VaR to understand the omission’s effect independently of the specific portfolio.

Examining the pseudo history of portfolio value changes tests the model’s sensitivity to changes in assumptions, such as the assumption that omitting certain positions or risk factors is immaterial. This analysis also provides a structured way to prioritize model changes to include more risk factors or higher-order terms.

Benefits of Sensitivity Analysis

  • Identifies key drivers of VaR: Reveals which positions or risk factors most influence VaR.
    • Example: Identifying a specific trading desk’s positions as highly sensitive to interest rate changes.
  • Assesses model robustness: Shows how VaR reacts to changes in inputs, assumptions, or data.
    • Example: Testing the impact of different volatility assumptions on VaR.
  • Quantifies the impact of omissions: Helps understand the potential impact of excluding certain risks or using simplified valuation methods.
    • Example: Estimating the impact of not including certain types of operational risk in the VaR calculation.
  • Prioritizes model improvements: Guides decisions on which risks or factors to include for greater accuracy.
    • Example: Determining that including a higher-order term in a Taylor expansion for option pricing significantly improves VaR accuracy.

Challenges of Sensitivity Analysis

  • Data limitations: Requires sufficient data, which may be scarce.
    • Example: Limited historical data for newly issued financial instruments.
  • Computational complexity: Can be computationally intensive for large portfolios or complex models.
    • Example: Monte Carlo simulations with a large number of scenarios and risk factors.
  • Interpretation of results: Requires careful interpretation of the sensitivity measures.
    • Example: Distinguishing between statistically significant but economically insignificant changes in VaR.
  • Proxy issues: Using proxies can introduce inaccuracies.
    • Example: Using a broad market index as a proxy for a specific stock with different risk characteristics.

Challenges in Calculating Confidence Intervals for VaR

When estimating a Value-at-Risk (VaR) figure for a portfolio, assessing the accuracy of that estimate is crucial. Statistically, this is done by placing confidence intervals around the VaR estimate, indicating that the true VaR value should fall within this interval a certain percentage of the time. This addresses the estimation risk inherent in VaR calculations. While assessing estimation risk is standard in many contexts, many financial institutions do not routinely calculate confidence intervals for their VaR estimates, despite VaR being a statistical framework. This lack of confidence interval calculation is a notable gap in model validation.

Jorion (1996) described a method for estimating the confidence interval of a VaR model based on the asymptotic standard error of a quantile:

$$SE(VaR_{PT}^{1-c}) = \sqrt{\frac{c(1-c)}{T f(VaR_{PT}^{1-c})^2}} $$

Where:

  • \(SE(VaR_{PT}^{1-c})\): standard error of the VaR estimate
  • \(c\): the VaR confidence level (so \(1-c\) is the tail probability)
  • \(T\): sample size (number of observations)
  • \(f(VaR_{PT}^{1-c})\): probability density function (PDF) of portfolio value changes, evaluated at the VaR estimate

A confidence interval built from this standard error shows the range within which the true VaR is expected to fall with a specified probability, such as 90% or 95%, making the degree of estimation risk explicit.

While the formula appears straightforward with known VaR, T, and c, the main challenge lies in evaluating the PDF at the VaR estimate. Typically, a distributional assumption is imposed on the changes in portfolio values to calculate confidence intervals. However, financial returns are known to be non-normal, making the normal distribution assumption inappropriate.

Furthermore, relying solely on the quantile estimate for the confidence interval might be insufficient. It may be preferable to base the confidence interval on the variance of the distribution of pseudo returns. The key challenge in applying Jorion’s formula is determining f(), the PDF.
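The sketch below applies Jorion’s formula to a hypothetical P&L history and fills in the problematic density term with a Gaussian kernel density estimate; the kernel estimate is one pragmatic, non-parametric choice for f, not something the formula itself prescribes.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)

pseudo_pnl = rng.standard_t(df=5, size=1000) * 1e5    # hypothetical P&L history
confidence = 0.99
tail = 1.0 - confidence

var_estimate = -np.quantile(pseudo_pnl, tail)

# The hard part of Jorion's formula: the density f evaluated at the VaR.
# A Gaussian kernel density estimate is used here as one non-parametric choice.
f_at_var = gaussian_kde(pseudo_pnl)(-var_estimate)[0]

T = len(pseudo_pnl)
se = np.sqrt(tail * (1.0 - tail) / (T * f_at_var ** 2))

# Approximate 95% confidence interval for the VaR estimate.
print(var_estimate, (var_estimate - 1.96 * se, var_estimate + 1.96 * se))
```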

Alternative, non-parametric approaches exist in academic literature. Let’s look at a few.

Order Statistics (Dowd 2006)

Order statistics provide a non-parametric method for constructing confidence intervals for VaR estimates. This approach relies on the ordered values of the historical data (the pseudo-history of portfolio value changes) to directly estimate the confidence interval, without requiring strong distributional assumptions. However, like other methods, it presents its own set of challenges that financial institutions must consider when calculating VaR confidence intervals.

The basic idea behind order statistics is to arrange the historical data points in ascending order.
If we have \(T\) historical observations of portfolio value changes, we order them such that \(\Delta V^{T}_1 \leq \Delta V^{T}_2 \leq \ldots \leq \Delta V^{T}_T.\) These ordered values are the order statistics.

The key concept for constructing confidence intervals is the probability that at least \(r\) out of \(T\) observations do not exceed a certain value, \(\Delta V\). This probability is given by the binomial distribution:

$$
G_r(\Delta V) = \sum_{j=r}^{T} \binom{T}{j} [F(\Delta V)]^j [1 - F(\Delta V)]^{T-j}
$$

Where:

  • \(G_r(\Delta V)\): The probability that at least \(r\) observations are less than or equal to \(\Delta V\).
  • \(\binom{T}{j}\): The binomial coefficient, calculated as \(T! / (j! \cdot (T-j)!)\).
  • \(F(\Delta V)\): The cumulative distribution function (CDF) of the portfolio value changes.

To construct a confidence interval for VaR, we use this binomial distribution to find two values,
\(\Delta V_{\text{lower}}\) and \(\Delta V_{\text{upper}}\), such that the probability of the true VaR falling between these values corresponds to our desired confidence level.

Here’s how it works in practice:

1. Ordering the data: The first step is to order the historical portfolio value changes.
This is straightforward but can become computationally intensive for very large datasets.

2. Determining \(r\) for the confidence bounds: For a given confidence level (e.g., 90%),
we need to find the appropriate values of \(r\) to determine the lower and upper bounds of the confidence interval.
For a 90% confidence interval, we would typically set:

  • Lower bound: \(r \approx 0.05\,T\)
  • Upper bound: \(r \approx 0.95\,T\)

These values of \(r\) correspond to the 5th and 95th percentiles of the ordered data.

3. Finding the confidence interval: Once we have the lower and upper values of \(r\), we need to solve the binomial distribution equation for \(\Delta V_{\text{lower}}\) and \(\Delta V_{\text{upper}}\), respectively.
This involves finding the values of \(\Delta V\) such that:

  • \(G_r(\Delta V_{\text{lower}}) = 0.05\)
  • \(G_r(\Delta V_{\text{upper}}) = 0.95\)

These equations are typically solved numerically.
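Because \(F\) is unknown, the numerical solution in practice amounts to reading off order statistics at ranks implied by the binomial distribution. The sketch below uses that shortcut on hypothetical data (the binomial percent-point function supplies the bracketing ranks); treat the rank arithmetic as approximate.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(4)

pseudo_pnl = np.sort(rng.standard_t(df=5, size=750) * 1e5)  # ordered value changes
T = len(pseudo_pnl)
p = 0.01                     # tail probability for a 99% VaR
ci = 0.90                    # desired confidence level for the interval

var_estimate = -np.quantile(pseudo_pnl, p)

# The number of observations below the true p-quantile is Binomial(T, p);
# its 5th and 95th percent points give ranks whose order statistics
# bracket the quantile with roughly 90% confidence.
lo_rank = int(binom.ppf((1 - ci) / 2, T, p))
hi_rank = min(int(binom.ppf(1 - (1 - ci) / 2, T, p)), T - 1)

# Confidence interval for the VaR (losses reported as positive numbers).
var_upper = -pseudo_pnl[lo_rank]     # more extreme order statistic -> larger VaR bound
var_lower = -pseudo_pnl[hi_rank]
print(var_estimate, (var_lower, var_upper))
```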

The order statistics approach poses several challenges:

  • Discrete nature of the Binomial distribution: The binomial distribution is discrete, meaning that \(r\) must be an integer. This can lead to some imprecision in the confidence level, especially for small sample sizes. For example, if \(T = 100\) and we want a 95% confidence interval, the upper value of \(r\) would be 95. However, if \(T = 50\), the upper value of \(r\) would be 47.5, which needs to be rounded to either 47 or 48, impacting the exact confidence level achieved. This “granularity” issue becomes less problematic with larger sample sizes.
  • Data requirements and sensitivity to extreme events: Like historical simulation, order statistics rely entirely on the historical data. If the historical data does not contain sufficient extreme events, the confidence intervals will not adequately capture tail risk. This is a significant limitation, especially when dealing with rare but potentially catastrophic events. The confidence interval will be based on the observed extremes in the data, even if those extremes are not truly representative of the tail of the underlying distribution.
  • Assumption of independence: The standard application of order statistics assumes that the data points are independent and identically distributed (i.i.d.).
    If there is autocorrelation or other forms of dependence in the returns, the confidence intervals derived using order statistics will be inaccurate.
    This is a significant concern in financial time series, which often exhibit volatility clustering and other forms of dependence.
  • Computational challenges for numerical solutions: Solving the binomial distribution equations numerically to find \(\Delta V_{\text{lower}}\) and \(\Delta V_{\text{upper}}\) can be computationally intensive, especially for very large datasets. Although generally less computationally expensive than bootstrap methods, this can still be a consideration.

Bootstrap Techniques (Christoffersen and Goncalves 2005)

Bootstrap techniques offer a robust, non-parametric approach to estimating confidence intervals for VaR, particularly useful when distributional assumptions are questionable. However, even these powerful methods come with their own set of challenges, directly relevant to the learning objective of describing the difficulties financial institutions face when calculating VaR confidence intervals.

The core idea is to treat the observed historical data as a proxy for the true distribution and resample from it with replacement to create numerous “bootstrap samples.” This allows us to empirically approximate the sampling distribution of the VaR estimator, crucial for constructing confidence intervals.

Here’s how bootstrap techniques are applied to different VaR estimation methods, with a focus on the challenges related to confidence interval calculation:

Historical Simulation VaR

The bootstrap procedure for historical simulation is conceptually simple: resample the historical returns with replacement, calculate the VaR on each bootstrapped sample, and use the distribution of these bootstrapped VaRs to form the confidence interval. However, challenges arise:

    • Data limitations and representativeness: The bootstrap relies heavily on the assumption that the historical data is representative of future market behavior. If the historical data does not contain sufficient extreme events or if market conditions have fundamentally changed, the bootstrapped samples (and therefore the confidence intervals) will not accurately reflect the true uncertainty surrounding the VaR estimate. This is a significant challenge, especially during periods of structural breaks in markets. The confidence intervals will be too narrow, underestimating the true risk.
    • Block Bootstrap for Dependence: The standard bootstrap assumes independence of returns. If returns exhibit autocorrelation (dependence over time), the standard bootstrap will produce inaccurate confidence intervals. While a “block bootstrap” can be used to address this by resampling blocks of consecutive observations, choosing the appropriate block size is a challenge. Too small a block size fails to capture the dependence, while too large a block size reduces the effective sample size and increases the variance of the bootstrap estimates.
    • Computational burden: While computationally less demanding than some other bootstrap applications, generating a large number of bootstrap samples (often thousands) can still be time-consuming for large portfolios, especially if the VaR calculation itself is complex. This can be a practical constraint for financial institutions with limited computational resources or tight deadlines.
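A minimal sketch of the basic i.i.d. bootstrap for historical simulation VaR described above, on hypothetical data; a block bootstrap would resample contiguous blocks of days instead of individual days.

```python
import numpy as np

rng = np.random.default_rng(5)

returns = rng.standard_t(df=5, size=500) * 0.01    # hypothetical daily P&L history
confidence = 0.99
n_boot = 5000

def hist_var(x, confidence=0.99):
    return -np.quantile(x, 1.0 - confidence)

point_estimate = hist_var(returns, confidence)

# Resample the history with replacement and recompute VaR on each sample.
boot_vars = np.array([
    hist_var(rng.choice(returns, size=len(returns), replace=True), confidence)
    for _ in range(n_boot)
])

# Percentile-bootstrap 90% confidence interval for the VaR estimate.
ci = np.quantile(boot_vars, [0.05, 0.95])
print(point_estimate, ci)
```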

GARCH VaR

The bootstrap for GARCH VaR addresses time-varying volatility by bootstrapping standardized residuals and re-estimating the GARCH model on each bootstrap sample. This is essential for capturing the uncertainty in the model parameters. However, the following challenges remain:

    • Model misspecification: The bootstrap assumes that the chosen GARCH model is the “true” model. If the GARCH model is misspecified (e.g., the wrong order of the GARCH process is chosen, or the distribution of the innovations is incorrectly assumed), the bootstrapped confidence intervals will be inaccurate. This is a crucial challenge, as it’s often difficult to definitively determine the true underlying data-generating process.
    • Computational intensity: Re-estimating the GARCH model for each bootstrap sample adds significantly to the computational burden. This can be a major challenge for large portfolios or complex GARCH specifications, making the process computationally infeasible in some cases.
    • Convergence issues: GARCH models can sometimes have convergence problems during estimation, especially with bootstrapped data that might contain unusual patterns. This can lead to unreliable bootstrap VaR estimates and thus inaccurate confidence intervals.

Filtered Historical Simulation

FHS combines the strengths of both historical simulation and GARCH modeling. The bootstrap for FHS avoids distributional assumptions about the innovations. However:

      • Challenges related to the GARCH filter: FHS still relies on a GARCH model to filter the returns and estimate conditional volatilities. Therefore, it inherits the challenges associated with GARCH model misspecification and computational intensity. Any issues with the GARCH filter will directly impact the accuracy of the bootstrapped confidence intervals.
      • Data limitations and extreme events: Like historical simulation, FHS still relies on the historical data. If the historical data does not adequately represent extreme market events, the bootstrapped confidence intervals will not fully capture tail risk. This is particularly problematic during periods of rapid market changes or crises.
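The sketch below shows the FHS mechanics under simplifying assumptions: the GARCH(1,1) parameters are illustrative fixed values rather than estimates, and only the standardized residuals are bootstrapped, so the re-estimation step that captures parameter uncertainty is deliberately left out to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(6)
returns = rng.standard_t(df=6, size=1000) * 0.01      # hypothetical return history

# Illustrative GARCH(1,1) parameters; in practice these are estimated (e.g., by MLE).
omega, alpha, beta = 1e-6, 0.08, 0.90

# Filter: conditional variances and standardized residuals.
sigma2 = np.empty_like(returns)
sigma2[0] = returns.var()
for t in range(1, len(returns)):
    sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
std_resid = returns / np.sqrt(sigma2)

# One-day-ahead volatility forecast.
sigma_next = np.sqrt(omega + alpha * returns[-1] ** 2 + beta * sigma2[-1])

# FHS: bootstrap the standardized residuals, rescale by the forecast volatility,
# and read off the 1% quantile of the simulated returns.
n_boot, confidence = 5000, 0.99
boot_vars = np.empty(n_boot)
for b in range(n_boot):
    sims = sigma_next * rng.choice(std_resid, size=len(std_resid), replace=True)
    boot_vars[b] = -np.quantile(sims, 1.0 - confidence)

point_var = -sigma_next * np.quantile(std_resid, 1.0 - confidence)
print(point_var, np.quantile(boot_vars, [0.05, 0.95]))   # VaR and 90% bootstrap CI
```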

Backtesting

Backtesting is a fundamental aspect of validating Value-at-Risk (VaR) models, involving the comparison of predicted VaR values with actual portfolio profit and loss (P&L). This process helps assess the accuracy and reliability of the VaR model.

Evolution of Backtesting Practices

Since the Basel Committee introduced VaR models for regulatory capital in 1996, backtesting has been a regulatory requirement. Early backtesting efforts revealed shortcomings in banks’ VaR models, with losses exceeding VaR occurring less frequently than expected but often clustered together, indicating dependence in the exceptions. Studies by Perignon and Smith (2006, 2010a, 2010b) and Szerszen and O’Brien (2017) highlighted these issues, with the latter finding that bank VaR models tended to be conservative both before and after the financial crisis, but not during it.

Initially, backtesting compared ex-ante VaR forecasts with ex-post actual P&L. However, this approach faced challenges because the reported P&L often included components like fee income and commissions, which are not typically part of the VaR model’s scope. This inclusion could lead to misleadingly conservative backtesting results, as noted by Fresard, Perignon, and Wilhelmsson (2011). More recently, regulatory changes, such as Basel 2.5 trading requirements, have led to improvements in backtesting practices. US bank holding companies now often report comparisons of VaR to the P&L that would have been realized if the portfolio had been held constant over the one-day horizon. This refined approach provides a more accurate assessment of the VaR model’s performance by focusing on the components of P&L that the model is designed to capture.

Core Principles of Backtesting

The basic principle of backtesting is that a properly calibrated VaR model should produce exceptions (instances where losses exceed VaR) at a rate consistent with its confidence level. For example, a 99% VaR model should, on average, experience exceptions approximately 1% of the time. This principle forms the basis of the unconditional coverage test, which checks if the overall number of exceptions is statistically in line with the expected number.

However, simply having the correct number of exceptions on average is not sufficient. The timing of the exceptions is also important. If exceptions cluster together in time, it suggests that the model is not adequately capturing the dynamics of market risk. This is where the concept of conditional coverage comes in. The conditional coverage test, introduced by Christoffersen (1998), examines whether exceptions are independent over time. A properly specified model should have exceptions that occur randomly, with no predictable patterns or clustering.

Testing Methodologies

Backtesting involves counting the number of times that actual losses exceed the VaR threshold. This is typically done by creating an indicator variable that takes a value of 1 when an exception occurs and 0 otherwise.

The unconditional coverage test assesses whether the probability of an exception is equal to 1 minus the VaR’s coverage rate. It doesn’t consider any information available at the time of the forecast. The conditional coverage test, on the other hand, examines whether the probability of an exception is 1 minus the coverage rate given all information available at the time of the forecast.

Christoffersen (1998) introduced the concept of modeling the dependence structure of exceptions using a first-order Markov chain, which considers the probability of an exception today given whether an exception occurred yesterday. This led to the development of tests for independence and conditional coverage, which is a joint test of both correct coverage and independence.
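A minimal sketch of the two likelihood-ratio tests described above (unconditional coverage and Christoffersen’s first-order Markov independence test), written for a generic 0/1 exception series; the conditional coverage statistic is their sum.

```python
import numpy as np
from scipy.stats import chi2

def _nlogp(n, p):
    # n * log(p), with the convention 0 * log(0) = 0 to avoid NaNs at boundaries.
    return 0.0 if n == 0 else n * np.log(p)

def coverage_tests(exceptions, p=0.01):
    """Unconditional coverage and first-order Markov independence LR tests.

    exceptions: 0/1 indicators (1 = loss exceeded VaR); p = expected exception
    probability, i.e., 1 minus the VaR coverage level.
    """
    x = np.asarray(exceptions, dtype=int)
    T, n1 = len(x), int(x.sum())
    pi_hat = n1 / T

    # Unconditional coverage: likelihood ratio for exception rate == p.
    lr_uc = -2 * (_nlogp(T - n1, 1 - p) + _nlogp(n1, p)
                  - _nlogp(T - n1, 1 - pi_hat) - _nlogp(n1, pi_hat))

    # Independence: first-order Markov chain transition counts.
    n00 = int(np.sum((x[:-1] == 0) & (x[1:] == 0)))
    n01 = int(np.sum((x[:-1] == 0) & (x[1:] == 1)))
    n10 = int(np.sum((x[:-1] == 1) & (x[1:] == 0)))
    n11 = int(np.sum((x[:-1] == 1) & (x[1:] == 1)))
    pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0
    pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0
    pi1 = (n01 + n11) / (T - 1)

    ll_alt = (_nlogp(n00, 1 - pi01) + _nlogp(n01, pi01)
              + _nlogp(n10, 1 - pi11) + _nlogp(n11, pi11))
    ll_null = _nlogp(n00 + n10, 1 - pi1) + _nlogp(n01 + n11, pi1)
    lr_ind = -2 * (ll_null - ll_alt)

    lr_cc = lr_uc + lr_ind   # joint conditional-coverage statistic
    return {"uc p-value": 1 - chi2.cdf(lr_uc, 1),
            "ind p-value": 1 - chi2.cdf(lr_ind, 1),
            "cc p-value": 1 - chi2.cdf(lr_cc, 2)}

# Example: 500 backtest days of simulated exceptions for a 99% VaR.
rng = np.random.default_rng(7)
print(coverage_tests(rng.random(500) < 0.018, p=0.01))
```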

More recently, methods have been developed to test for higher-order dependence (dependence over more than just the previous day) in exceptions, though these can be computationally challenging. Other approaches, such as the dynamic quantile (DQ) test by Engle and Manganelli (2004) and its logistic regression-based counterpart (LDQ), offer more flexible ways to test for dependence and incorporate additional information into the backtesting process. These regression-based approaches are often easier to implement and have been found to have good statistical power. Quantile regression-based tests have also been proposed, which directly regress the VaR on the actual portfolio value changes.

Practical Considerations

Several practical considerations are important in backtesting:

  • Significance Level: The choice of significance level for the statistical tests used in backtesting involves a trade-off between the risk of rejecting a correct model (Type I error) and the risk of failing to reject an incorrect model (Type II error). In risk management, the cost of failing to reject an incorrect model (underestimating risk) is often considered higher, suggesting that lower confidence levels may be appropriate.
  • Small Sample Size: Backtesting often involves relatively small sample sizes and even fewer exceptions. This can lead to low statistical power, making it difficult to reject incorrect models. To address this, some researchers advocate using Monte Carlo simulated p-values, as suggested by Dufour (2006), instead of relying on asymptotic distributions (see the sketch below).
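As a simple illustration of the simulated p-value idea (not Dufour’s full procedure, which also handles ties in discrete test statistics), the sketch below simulates exception counts under the null and compares the observed deviation from the expected count against the simulated ones.

```python
import numpy as np

def mc_pvalue(observed_exceptions, p=0.01, n_sim=9999, seed=0):
    """Monte Carlo p-value for the exception count, avoiding asymptotic approximations."""
    rng = np.random.default_rng(seed)
    x = np.asarray(observed_exceptions, dtype=int)
    T = len(x)

    # Test statistic: absolute deviation of the exception count from its expectation.
    stat = abs(x.sum() - p * T)

    # Simulate the statistic under the null of i.i.d. Bernoulli(p) exceptions.
    sims = rng.binomial(T, p, size=n_sim)
    sim_stats = np.abs(sims - p * T)

    # Proportion of simulated statistics at least as extreme as the observed one.
    return (1 + np.sum(sim_stats >= stat)) / (n_sim + 1)

rng = np.random.default_rng(8)
exceptions = rng.random(250) < 0.02     # hypothetical one-year backtest of a 99% VaR
print(mc_pvalue(exceptions, p=0.01))
```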

Challenges in Benchmarking VaR Models

Benchmarking, the comparison of a bank’s VaR model against an alternative model, is often the most neglected aspect of market risk model validation. While parallel runs of new and old models are common for initial checks, formal comparisons are rare due to the resource-intensive nature of building even a single VaR model. Constructing and maintaining two models significantly increases the burden. However, during model replacement or significant upgrades, a window of opportunity exists for cost-effective benchmarking.

The most common benchmarking practice involves simply plotting the VaR outputs of two models over time. This visual comparison provides limited insight, typically only revealing whether one model is more conservative than the other or if they exhibit similar behavior. This lack of formal statistical testing weakens the validation process.

Challenges in Benchmarking VaR Models:

Two primary challenges hinder the application of rigorous statistical tests for comparing VaR models:

  1. Non-IID errors due to changing trading portfolios: Trading portfolios change frequently, causing the errors (differences between predicted and realized P&L) to violate the assumption of independence and identical distribution (i.i.d.). This is particularly problematic for regression-based tests, which rely on the i.i.d. assumption. While Christoffersen et al. (2001) proposed methods to address this issue, their approach is limited to location-scale models, where the quantile is a linear function of volatility. This restriction limits the applicability of their methods to a subset of VaR models.
  2. Lack of available alternative models: A more practical challenge is the scarcity of readily available alternative VaR models for comparison. Banks rarely have two fully operational VaR models running simultaneously. Berkowitz and O’Brien (2002) addressed this by using a GARCH(1,1) VaR model estimated on the bank’s trading P&L as a benchmark. This data is readily available to all banks. While a P&L-based VaR model is generally not suitable for active risk management (as discussed in the section on conceptual soundness), it provides a convenient and often challenging benchmark for comparison.

Approaches to Overcome Benchmarking Challenges:

Despite these challenges, several approaches have been proposed to conduct more formal benchmarking of VaR models:

  1. Loss function-based comparisons: Diebold and Mariano (1995) laid the groundwork for comparing forecasts by emphasizing the importance of considering the forecaster’s specific loss function. This means that the evaluation of a VaR model should depend on the consequences of its predictions being incorrect. Different stakeholders may have different loss functions. For example, Lopez (1996) introduced the regulatory loss function, which focuses on the cost of exceeding VaR (exceptions) as this is what determines capital requirements. This loss function penalizes conservatism less than inaccurate underestimation of risk. This is in contrast to a more general loss function such as the “check” function used in quantile regressions, which penalizes both over and under estimation.
  2. Sign test: Sarma et al. (2003) applied the sign test, derived from Diebold and Mariano, to evaluate VaR models. The sign test compares the median of the loss differential between two competing models. The loss differential between models \(i\) and \(j\) is defined as:

    $$
    z_t = l_t^i - l_t^j
    $$

    Where \(l\) represents the loss function for the respective model. The sign test is then calculated and compared to a standard normal distribution:

    $$
    \frac{\sum_{\tau=t+1}^{t+N} 1(z_\tau > 0) - 0.5N}{\sqrt{0.25N}} \sim N(0,1)
    $$

    Rejecting the null hypothesis indicates that one model is significantly better than the other under the chosen loss function. The sign test is easy to implement (a minimal sketch appears after this list), making it a practical tool for banks to formalize their model comparisons.

  3. Comparison against a p&l-based var: As mentioned earlier, Berkowitz and O’Brien (2002) used a GARCH(1,1) model fitted to the bank’s trading P&L as a benchmark. This allows for a direct comparison between the bank’s “positional VaR” (based on positions and market data) and a “P&L VaR”. Empirical studies using this approach have shown that banks’ positional VaR models tend to be more conservative than P&L VaR models, especially when evaluated using the regulatory loss function. When using accuracy-based comparison metrics such as the check loss function, P&L VaR often outperforms positional VaR. This conservatism is often attributed to regulatory oversight, as discussed by Perignon, Deng, and Wang (2008).
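A minimal sketch of the sign test applied to a Lopez-style regulatory loss (hypothetical P&L and VaR series; the squared-excess loss used here is one common variant, and ties in the loss differential are dropped before computing the statistic).

```python
import numpy as np
from scipy.stats import norm

def regulatory_loss(pnl, var):
    """Lopez-style regulatory loss: penalize only days when the loss exceeds VaR.

    pnl and var are arrays of daily P&L and (positive) VaR forecasts.
    """
    excess = -pnl - var
    return np.where(excess > 0, 1 + excess ** 2, 0.0)

def sign_test(loss_i, loss_j):
    """Sign test on the loss differential z_t = l_t^i - l_t^j.

    A large positive statistic means model i loses more often than model j.
    """
    z = loss_i - loss_j
    z = z[z != 0]                      # tied days carry no sign information
    n = len(z)
    stat = (np.sum(z > 0) - 0.5 * n) / np.sqrt(0.25 * n)
    return stat, 2 * (1 - norm.cdf(abs(stat)))   # two-sided p-value

# Hypothetical backtest: daily P&L plus VaR forecasts from two competing models.
rng = np.random.default_rng(9)
pnl = rng.standard_t(df=5, size=500) * 1e5
var_model_a = np.full(500, 2.6e5)      # bank's positional VaR (illustrative)
var_model_b = np.full(500, 2.2e5)      # P&L-based benchmark VaR (illustrative)

stat, pval = sign_test(regulatory_loss(pnl, var_model_a),
                       regulatory_loss(pnl, var_model_b))
print(stat, pval)
```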

In conclusion, benchmarking VaR models is essential for robust model validation but faces challenges due to non-i.i.d. errors and the lack of readily available alternative models. However, methods like loss function-based comparisons, the sign test, and comparison against a P&L-based VaR provide practical ways to conduct more formal and informative benchmarking, improving the overall validation process.

Question

A bank’s model validation team is assessing the conceptual soundness of its Value-at-Risk (VaR) model. Which of the following is the MOST crucial consideration regarding conceptual soundness?

  1. The model’s ability to demonstrate how risk changes with position changes.
  2. The model’s adherence to a specific statistical distribution for market returns.
  3. The model’s ability to accurately backtest against historical profit and loss (P&L) data.
  4. The model’s inclusion of all possible risk factors.

The correct answer is A.

Conceptual soundness focuses on whether the model appropriately captures the relationship between risk and portfolio positions, which is fundamental to risk management. A sound VaR model should accurately reflect how risk varies as positions are adjusted.

B is incorrect: While adherence to a statistical distribution is important for the mathematical framework of the model, it is not the most critical aspect of conceptual soundness. Different VaR models may use various distributions, and reliance on a specific distribution might limit flexibility.

C is incorrect: Backtesting is an important aspect of model validation but pertains more to the performance testing of the model rather than its conceptual soundness. Conceptual soundness evaluates the logical framework and assumptions of the model, not its historical accuracy.

D is incorrect: Including all possible risk factors is often impractical and can lead to overfitting. Conceptual soundness is about capturing the material risks relevant to the portfolio and demonstrating how they change with positions, not exhaustively including every possible factor.
