Measuring Return, Volatility, and Correlation

After completing this reading, you should be able to:

  • Calculate, distinguish, and convert between simple and continuously compounded returns.
  • Define and distinguish between volatility, variance rate, and implied volatility.
  • Describe how the first two moments may be insufficient to describe non-normal distributions.
  • Explain how the Jarque-Bera test is used to determine whether returns are normally distributed.
  • Describe the power law and its use for non-normal distributions.
  • Define correlation and covariance and differentiate between correlation and dependence.
  • Describe properties of correlations between normally distributed variables when using a one-factor model.

Measurement of Returns

A return is the profit from an investment. Two common methods used to measure returns are:

  1. Simple Returns Method
  2. Continuously Compounded Returns Method.

The Simple Returns Method

Denoted by \(\text R_{\text t}\), the simple return is given by:

$$ \text R_{\text t}=\cfrac {\text P_{\text t}-\text P_{\text t-1}}{\text P_{\text t-1}} $$

Where

\(\text P_{\text t}\)=Price of an asset at time t (current time)

\(\text P_{\text t-1}\)=Price of an asset at time t-1 (past time)

The time scale is arbitrary: it can be daily, monthly, quarterly, or any other period. Under the simple returns method, one plus the return over multiple periods is the product of one plus the simple return in each period. Mathematically:

$$ \begin{align*} 1+\text R_{\text T}& = \prod_{\text t=1}^{\text T} (1+\text R_{\text t}) \\ \Rightarrow \text R_{\text T} & = \left(\prod_{\text t=1}^{\text T} (1+\text R_{\text t}) \right)-1 \\ \end{align*} $$

Example: Calculating the Simple Returns

Consider the following data.

$$ \begin{array}{c|c} \textbf{Time} & \textbf{Price} \\ \hline {0} & {100} \\ \hline {1} & {98.65} \\ \hline {2} & {98.50} \\ \hline {3} & {97.50} \\ \hline {4} & {95.67} \\ \hline {5} & {96.54} \\ \end{array} $$

Calculate the simple return based on the data for all periods.

Solution

We need to calculate the simple return over multiple periods which is given by:

$$ 1+\text R_{\text T}= \prod_{\text t=1}^{\text T} (1+\text R_{\text t}) $$

Consider the following table:

$$ \begin{array}{c|c|c|c} \textbf{Time} & \textbf{Price} & \bf{\text R_{\text t}} & \bf{1+\text R_{\text t}} \\ \hline {0} & {100} & {-} & {-} \\ \hline {1} & {98.65} & {-0.0135} & {0.9865} \\ \hline {2} & {98.50} & {-0.00152} & {0.998479} \\ \hline {3} & {97.50} & {-0.01015} & {0.989848} \\ \hline {4} & {95.67} & {-0.01877} & {0.981231} \\ \hline {5} & {96.54} & {0.009094} & {1.009094} \\ \hline {} & {} & \textbf{Product} & \bf{0.9654} \\ \end{array} $$

Note that

$$ \text R_{\text t}=\cfrac {\text P_{\text t}-\text P_{\text t-1}}{\text P_{\text t-1}} $$

So that

$$ \text R_1=\cfrac {\text P_1-\text P_0}{\text P_0} =\cfrac {98.65-100}{100}=-0.0135 $$

And

$$ \text R_2=\cfrac {\text P_2-\text P_1}{\text P_1} =\cfrac {98.50-98.65}{98.65}=-0.00152 $$

And so on.

Also note that:

$$ \prod_{\text t=1}^5 (1+ \text R_{\text t} )=0.9865×0.998479×…×1.009094=0.9654 $$

So,

$$ 1+\text R_{\text T}=0.9654 \Rightarrow \text R_{\text T}=-0.0346=-3.46\% $$
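The multi-period calculation above can be reproduced with a short Python sketch (illustrative only, using the prices from the table):

```python
# Prices from the table, time 0 through 5.
prices = [100, 98.65, 98.50, 97.50, 95.67, 96.54]

# Per-period simple returns: R_t = (P_t - P_{t-1}) / P_{t-1}
simple_returns = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

# Multi-period return: 1 + R_T is the product of the (1 + R_t) terms.
gross = 1.0
for r in simple_returns:
    gross *= 1 + r

R_T = gross - 1  # -0.0346, i.e., -3.46%
```

Note that the product of the gross returns telescopes to \(\text P_{\text T}/\text P_0 = 96.54/100\), so the same answer is obtained directly from the first and last prices.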

Continuously Compounded Returns Method

Denoted by \(\text r_{\text t}\), the continuously compounded return is the difference between the natural logarithms of the asset price at times t and t-1. It is given by:

$$ \text r_{\text t}=\text {ln } \text P_{\text t}-\text {ln} \text P_{\text t-1} $$

Computing the compounded return over multiple periods is straightforward because it is just the sum of the returns of each period. That is:

$$ \text r_{\text T}=\sum_{\text t=1}^{\text T} \text r_{\text t} $$

Example: Calculating Continuously Compounded Returns

Consider the following data.

$$ \begin{array}{c|c} \textbf{Time} & \textbf{Price} \\ \hline {0} & {100} \\ \hline {1} & {98.65} \\ \hline {2} & {98.50} \\ \hline {3} & {97.50} \\ \hline {4} & {95.67} \\ \hline {5} & {96.54} \\ \end{array} $$

What is the continuously compounded return based on the data over all periods?

Solution

The continuously compounded return over the multiple periods is given by

$$ \text r_{\text T}=\sum_{\text t=1}^{\text T} \text r_{\text t} $$

Where

$$ \text r_{\text t}=\text{ln }⁡ \text P_{\text t}-\text{ln }⁡\text P_{\text t-1} $$

Consider the following table:

$$ \begin{array}{c|c|c} \textbf{Time} & \textbf{Price} & \bf{\text r_{\text t}=\text{ln } \text P_{\text t}-\text{ln }\text P_{\text t-1}} \\ \hline {0} & {100} & {-} \\ \hline {1} & {98.65} & {-0.01359} \\ \hline {2} & {98.50} & {-0.00152} \\ \hline {3} & {97.50} & {-0.0102} \\ \hline {4} & {95.67} & {-0.01895} \\ \hline {5} & {96.54} & {0.009053} \\ \hline {} & \textbf{Sum} & \bf{-0.03521} \\ \end{array} $$

Note that

$$ \begin{align*} \text r_1 & =\text {ln⁡ } \text P_1 -\text {ln } \text P_0=\text {ln }⁡98.65-\text {ln }⁡100=-0.01359 \\ \text r_2 & =\text {ln⁡ } \text P_2 -\text {ln⁡ } \text P_1 =\text {ln }⁡98.50-\text {ln }⁡98.65=-0.00152 \\ \end{align*} $$

And so on.

Also,

$$ \text r_{\text T}=\sum_{\text t=1}^{5} \text r_{\text t}=(-0.01359)+(-0.00152)+⋯+0.009053=-0.03521=-3.521\% $$
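The same result can be sketched in Python (illustrative, with the prices from the table):

```python
import math

prices = [100, 98.65, 98.50, 97.50, 95.67, 96.54]

# Per-period log returns: r_t = ln(P_t) - ln(P_{t-1})
log_returns = [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

# The multi-period continuously compounded return is simply the sum.
r_T = sum(log_returns)  # about -0.03521, i.e., -3.521%
```

Because the logs telescope, \(\text r_{\text T} = \text{ln}(96.54/100)\), which again only depends on the first and last prices.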

Relationship between the Compounded and Simple Returns

Intuitively, the compounded return approximates the simple return. The approximation, however, is prone to significant error over longer time horizons, and thus it should be relied on only for short horizons.

The relationship between the compounded returns and the simple returns is given by the formula:

$$ 1+\text R_{\text t}=\text e^{\text r_{\text t} } $$

Example: Conversion Between the Simple and Compound Returns

What is the equivalent simple return for a 30% continuously compounded return?

Solution.

Using the formula:

$$ \begin{align*} 1+\text R_{\text t} & =\text e^{\text r_{\text t} } \\ \Rightarrow \text R_{\text t} & =\text e^{\text r_{\text t} }-1=\text e^{0.3}-1=0.3499=34.99\% \\ \end{align*} $$

It is worth noting that the compound return is never greater than the corresponding simple return. Moreover, simple returns are never less than -100%, unlike compound returns, which can be less than -100%. For instance, the equivalent compound return for a -65% simple return is:

$$ {\text r_{\text t} }=\text{ln }⁡(1-0.65)=-104.98\% $$
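Both conversions can be sketched as small helper functions (the function names are hypothetical):

```python
import math

def simple_from_compound(r):
    """Convert a continuously compounded return r to a simple return."""
    return math.exp(r) - 1

def compound_from_simple(R):
    """Convert a simple return R to a continuously compounded return."""
    return math.log(1 + R)

print(simple_from_compound(0.30))   # ~0.3499, i.e., 34.99%
print(compound_from_simple(-0.65))  # ~-1.0498, i.e., -104.98%
```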

Measurement of Volatility and Risk

The volatility of a variable, denoted \(\sigma\), is the standard deviation of its returns, measured over the period at which the returns are captured.

Consider the linear scaling of the mean and variance over the period at which the returns are measured. The model is given by:

$$ \text r_{\text t}=\mu+\sigma \text e_{\text t} $$

Where \(\text E(\text r_{\text t} )=\mu\) is the mean of the return and \(\text V(\text r_{\text t} )=\sigma^2\) is the variance of the return. \( \text e_{\text t}\) is a shock, assumed to be iid with mean 0 and variance 1. Moreover, the return is assumed to be iid and normally distributed with mean \(\mu\) and variance \(\sigma^2\), i.e., \(\text r_{\text t} {\sim}^{iid} \text N(\mu,\sigma^2)\). Note that the shock can also be expressed as \(\epsilon_{\text t}=\sigma \text e_{\text t}\), where \(\epsilon_{\text t}\sim \text N(0,\sigma^2)\).

Assume that we wish to calculate the returns under this model for 10 working days (two weeks). Since the model deals with the compound returns, we have:

$$ \sum_{\text i=1}^{10} {\text r}_{\text t+\text i }=\sum_{\text i=1}^{10} (\mu+\sigma \text e_{\text t+\text i})=10\mu +\sigma \sum_{\text i=1}^{10} \text e_{\text t+\text i} $$

So that the mean of the return over the 10 days is \(10 \mu\) and the variance is \(10\sigma^2\), since the \(\text e_{\text t}\) are iid. The volatility of the return is, therefore:

$$ \sqrt{10}\sigma $$

Therefore, the mean and variance of returns scale linearly with the holding period, while volatility scales with the square root of the holding period. This feature allows us to convert volatility between different periods.

For instance, given daily volatility, we obtain the yearly (annualized) volatility by scaling it by \(\sqrt{252}\). That is:

$$ \sigma_{\text{annual}}=\sqrt{252×\sigma_{\text{daily}}^2} $$

Note that 252 is the conventional number of trading days in a year in most markets.

Example: Calculating the Annualized Volatility

The monthly volatility of the price of gold is 4% in a given year. What is the annualized volatility of the gold price?

Solution

Using the scaling analogy, the corresponding annualized volatility is given by:

$$ \sigma_{\text{annual}}=\sqrt{12×0.04^2 }=13.86\% $$
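The square-root-of-time scaling can be written as a one-line helper (a sketch; the function name is an assumption):

```python
import math

def scale_volatility(vol, periods_per_year):
    """Annualize a per-period volatility via the square-root-of-time rule."""
    return math.sqrt(periods_per_year) * vol

print(scale_volatility(0.04, 12))   # monthly -> annual: ~0.1386, i.e., 13.86%
print(scale_volatility(0.01, 252))  # daily -> annual, using 252 trading days
```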

Variance Rate

The variance rate, also simply termed the variance, is the square of volatility. Like the mean, the variance rate is linear in the holding period and hence can be converted between periods. For instance, the annual variance rate is obtained from the monthly variance rate as:

$$ \sigma_{\text{annual}}^2=12×\sigma_{\text{monthly}}^2 $$

The variance of returns can be approximated as:

$$ {\hat \sigma}^2=\cfrac {1}{\text T} \sum_{\text t=1}^{\text T}(\text r_{\text t}-{\hat \mu})^2 $$

Where \({\hat \mu}\) is the sample mean of return, and T is the sample size.

Example: Calculating the Variance of Return

The investment returns of a certain entity for five consecutive days are 6%, 5%, 8%, 10%, and 11%. What is the variance estimator of returns?

Solution

We start by calculating the sample mean:

$$ \hat \mu=\cfrac {1}{5} (0.06+0.05+0.08+0.10+0.11)=0.08 $$

So that the variance estimator is:

$$ {\hat \sigma}^2=\cfrac {1}{\text T} \sum_{\text t=1}^{\text T}(\text r_{\text t}-\hat \mu)^2 $$

$$ =\cfrac {1}{5} \left[(0.06-0.08)^2+(0.05-0.08)^2+(0.08-0.08)^2+(0.10-0.08)^2+(0.11-0.08)^2 \right]=0.00052=0.052\% $$
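The estimator can be sketched directly in Python (illustrative, using the five daily returns):

```python
returns = [0.06, 0.05, 0.08, 0.10, 0.11]

T = len(returns)
mu_hat = sum(returns) / T                               # sample mean: 0.08
var_hat = sum((r - mu_hat) ** 2 for r in returns) / T   # variance estimator: 0.00052
```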

The Implied Volatility

Implied volatility is an alternative measure of volatility that is constructed using options valuation. The options (both put and call) have payouts that are nonlinear functions of the price of the underlying asset. For instance, the payout from the put option is given by:

$$ \text{max}(\text K-\text P_{\text T},0) $$

where \(\text P_{\text T}\) is the price of the underlying asset at maturity, K is the strike price, and T is the time to maturity. The payout from an option, and hence its price, is therefore sensitive to the variance of the return on the underlying asset.

The Black-Scholes-Merton model is commonly used for option pricing valuation. The model relates the price of an option to the risk-free rate of interest, the current price of the underlying asset, the strike price, time to maturity, and the variance of return.

For instance, the price of the call option can be denoted by:

$$ \text C_{\text t}=\text f(\text r_{\text f},\text T,\text P_{\text t},\sigma^2) $$

Where:

\(\text r_{\text f}\)= Risk-free rate of interest

T=Time to maturity

\(\text P_{\text t}\)=Current price of the underlying asset

\(\sigma^2\)=Variance of the return

The implied volatility is the value of \(\sigma\) that, together with the other inputs, reproduces the observed market price of the option. The implied volatility is an annualized value and does not need to be converted further.

The volatility index (VIX) measures the implied volatility of the S&P 500 over the coming 30 calendar days. The VIX is constructed from a variety of options with different strike prices. Similar indices exist for other assets such as gold, but they can only be constructed for highly liquid derivative markets and are thus not available for most financial assets.

The Financial Returns Distribution

Financial returns are often assumed to follow a normal distribution. A normal distribution is thin-tailed and has neither skewness nor excess kurtosis. This assumption is frequently invalid in practice because many return series are both skewed and heavy-tailed.

To determine whether it is appropriate to assume that the asset returns are normally distributed, we use the Jarque-Bera test.

The Jarque-Bera Test

The Jarque-Bera test checks whether the skewness and kurtosis of returns are compatible with those of a normal distribution.

Denoting the skewness by S and kurtosis by k, the hypothesis statement of the Jarque-Bera test is stated as:

\(\text H_0:\text S=0\) and k=3 (the returns are normally distributed)

vs

\(\text H_1:\text S\neq 0\) or \(k\neq 3\) (the returns are not normally distributed)

The test statistic (JB) is given by:

$$ { JB }=(T-1)\left( \cfrac { \hat { \text S } ^{ 2 } }{ 6 } +\cfrac { ({ \hat { \text k } }-3)^{ 2 } }{ 24 } \right) $$

Where T is the sample size.

The basis of the test is that, under the null of normality, the estimated skewness is asymptotically normally distributed with asymptotic variance 6/T, so that \((\text T-1)\frac {{\hat {\text S}}^2}{6}\) is approximately chi-squared distributed with one degree of freedom \((\chi_1^2)\). Similarly, the estimated kurtosis is asymptotically normally distributed with mean 3 and asymptotic variance 24/T, so that \((\text T-1)\cfrac { ({ \hat { \text k } }-3)^{ 2 } }{ 24 }\) is also approximately a \(\chi_1^2\) variable. Combining these results, and noting that the two statistics are asymptotically independent:

$$ \text{JB}\sim \chi_2^2 $$

The Decision Rule of the JB Test

When the test statistic is greater than the critical value, the null hypothesis is rejected; otherwise, we fail to reject the null. We use the chi-square table with 2 degrees of freedom:

$$\textbf{ Chi-square Distribution Table}$$

$$\begin{array}{l|ccccccccc}\hline \textbf{d.f.} & .995 & .99 & .975 & .95 & .9 & .1 & .05 & .025 & .01 \\ \hline 1 & 0.00 & 0.00 & 0.00 & 0.00 & 0.02 & 2.71 & 3.84 & 5.02 & 6.63 \\ 2 & 0.01 & 0.02 & 0.05 & 0.10 & 0.21 & 4.61 & 5.99 & 7.38 & 9.21 \\ 3 & 0.07 & 0.11 & 0.22 & 0.35 & 0.58 & 6.25 & 7.81 & 9.35 & 11.34 \\ 4 & 0.21 & 0.30 & 0.48 & 0.71 & 1.06 & 7.78 & 9.49 & 11.14 & 13.28 \\ 5 & 0.41 & 0.55 & 0.83 & 1.15 & 1.61 & 9.24 & 11.07 & 12.83 & 15.09 \\ 6 & 0.68 & 0.87 & 1.24 & 1.64 & 2.20 & 10.64 & 12.59 & 14.45 & 16.81 \\ 7 & 0.99 & 1.24 & 1.69 & 2.17 & 2.83 & 12.02 & 14.07 & 16.01 & 18.48 \\ 8 & 1.34 & 1.65 & 2.18 & 2.73 & 3.49 & 13.36 & 15.51 & 17.53 & 20.09 \\ 9 & 1.73 & 2.09 & 2.70 & 3.33 & 4.17 & 14.68 & 16.92 & 19.02 & 21.67 \\ 10 & 2.16 & 2.56 & 3.25 & 3.94 & 4.87 & 15.99 & 18.31 & 20.48 & 23.21 \\ 11 & 2.60 & 3.05 & 3.82 & 4.57 & 5.58 & 17.28 & 19.68 & 21.92 & 24.72 \\ 12 & 3.07 & 3.57 & 4.40 & 5.23 & 6.30 & 18.55 & 21.03 & 23.34 & 26.22 \end{array}$$

For example, the critical value of a \(\chi_2^2\) at a 5% significance level is 5.991; thus, if the computed test statistic is greater than 5.991, the null hypothesis is rejected.

Example: Conducting a JB Test

Investment returns are such that they have a skewness of 0.75 and a kurtosis of 3.15. If the sample size is 125, what is the JB test statistic? Can the returns be assumed to be normally distributed at the 5% significance level?

Solution

The test statistic is given by:

$$ \text{JB}=(\text T-1) \left(\cfrac {{\hat {\text S}}^2}{6}+\cfrac {({\hat {\text k}}-3)^2}{24}\right)=(125-1)\left(\cfrac {0.75^2}{6}+\cfrac {(3.15-3)^2}{24} \right)=11.74 $$

Since the test statistic is greater than the 5% critical value (5.991), then the null hypothesis that the data is normally distributed is rejected.
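The test can be sketched as a small Python function (the function name is an assumption; the formula is the \((\text T-1)\) version used above):

```python
def jarque_bera(T, skew, kurt):
    """JB statistic as defined above: (T - 1) * (S^2/6 + (k - 3)^2/24)."""
    return (T - 1) * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)

jb = jarque_bera(T=125, skew=0.75, kurt=3.15)  # ~11.74
reject_normality = jb > 5.991                  # 5% critical value of chi-sq(2): True
```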

The Power Law

The power law is an alternative method of determining whether returns are normal by studying the tails. For a normal distribution, the tail is thin, such that the probability of any return greater than \(\text k\sigma\) decreases sharply as k increases. Other distributions have tails that decrease relatively slowly, permitting large deviations.

Under the power law, the probability of observing a value greater than a given value x is defined as:

$$ \text P(\text X> \text x)=\text {kx}^{-\alpha } $$

Where k and \(\alpha\) are constants.

The tail behavior of distributions is effectively compared by considering the natural log of the tail probability, \(\text{ln }\text P(\text X>\text x)\). From the above equation:

$$ \text{ln } \text P(\text X > \text x)=\text {ln k}-\alpha \text{ ln x} $$

To test whether the above equation holds, a graph of \(\text{ln }\text P(\text X > \text x)\) is plotted against \(\text{ln }\text x\).

For a normal distribution, \(\text{ln }\text P(\text X > \text x)\) is quadratic in x, so it decays quickly, reflecting thin tails. For other distributions, such as the Student's t distribution, the plot is approximately linear in \(\text{ln }\text x\), so the tails decay slowly; these distributions have fatter tails and produce values far from the mean.

Dependence and Correlation of Random Variables

The two random variables X and Y are said to be independent if their joint density function is equal to the product of their marginal distributions. Formally stated:

$$ \text f_{\text X,\text Y}(\text x,\text y)=\text f_{\text X} (\text x)\cdot\text f_{\text Y} (\text y) $$

Otherwise, the random variables are said to be dependent. The dependence of random variables can be linear or nonlinear.

Linear vs Nonlinear Dependence

The linear relationship between random variables is measured using the correlation estimator known as Pearson's correlation.

Recall that given the linear equation:

$$ \text Y_{\text i}=\alpha+\beta \text X_{\text i}+\epsilon_{\text i} $$

The slope \(\beta\) is related to the correlation coefficient \(\rho\). If \(\beta=0\), then the random variables \(\text X_{\text i}\) and \(\text Y_{\text i}\) are uncorrelated; otherwise, \(\beta \neq 0\) implies a nonzero correlation. In fact, if the variances of the random variables are scaled so that both equal unity \((\sigma_{\text X}^2=\sigma_{\text Y}^2=1)\), the slope of the regression equation equals the correlation coefficient \((\beta=\rho)\). Thus, the regression equation reflects how correlation measures linear dependence.

Nonlinear dependence is complex and thus cannot be summarized using a single statistic.

Measures of Correlation

Correlation is commonly measured using the rank correlation (Spearman's rank correlation) and Kendall's \(\tau\) correlation coefficient. The values of a correlation coefficient lie between -1 and 1. A correlation of 0 indicates no measured relationship between the random variables (note that zero correlation does not, by itself, imply independence); otherwise, a positive (negative) correlation indicates an increasing (a decreasing) relationship between the random variables.

Rank Correlation

The rank correlation uses the ranks of observations of random variables X and Y. That is, rank correlation depends on the linear relationship between the ranks rather than the random variables themselves.

The ranks are such that 1 is assigned to the smallest value, 2 to the next value, and so on until the largest value is assigned n.

When values are tied, each tied observation is assigned the average of the rank positions they jointly occupy. Consider the raw ranks 1,2,3,3,3,4,5,6,7,7. The value ranked 3 is repeated three times, occupying positions 3, 4, and 5, so each receives the averaged rank \(\frac {(3+4+5)}{3}=4\). The value ranked 7 is repeated twice, occupying positions 9 and 10, so each receives \(\frac {(9+10)}{2}=9.5\). Note that we are averaging the positions that the repeated ranks would have assumed had they not been tied. The adjusted ranks are therefore: 1, 2, 4, 4, 4, 6, 7, 8, 9.5, 9.5.

Now, denote the rank of X by \(\text R_{\text X}\) and that of Y by \(\text R_{\text Y}\) then the rank correlation estimator is given by:

$$ \hat \rho_{\text s}=\cfrac {\widehat{\text{Cov}} (\text R_{\text X},\text R_{\text Y})}{ \sqrt{\hat {\text V}(\text R_{\text X})} \sqrt{\hat {\text V} (\text R_{\text Y})}} $$

Alternatively, when all the ranks are distinct (no repeated ranks), the rank correlation estimator is estimated as:

$$ \hat \rho_{\text s}=1-\cfrac {6\sum_{\text i=1}^{\text n}\left(\text R_{\text X_{\text i} }-\text R_{\text Y_{\text i}} \right)^2}{{\text n(\text n^2-1)}} $$

The intuition behind the last formula is that when highly ranked values of X are paired with correspondingly ranked values of Y, the differences \(\text R_{\text X_{\text i} }-\text R_{\text Y_{\text i}}\) are small, and the correlation tends to 1. On the other hand, if small rank values of X are matched with large rank values of Y, the differences are relatively large, and the correlation tends to -1.

When the variables X and Y have a linear relationship, the linear and rank correlations are equal. However, rank correlation is less efficient than linear correlation and is mainly used as a confirmatory check. On the other hand, rank correlation is insensitive to outliers because it depends only on the ranks and not on the values of X and Y.

Example: Calculating the Rank Correlation

Consider the following data.

$$ \begin{array}{c|c|c} \textbf{i} & \textbf{X} & \textbf{Y} \\ \hline {1} & {0.35} & {2.50} \\ \hline {2} & {1.73} & {6.65} \\ \hline {3} & {-0.45} & {-2.43} \\ \hline {4} & {-0.56} & {-5.04} \\ \hline {5} & {4.03} & {3.20} \\ \hline {6} & {3.21} & {2.31} \\ \end{array} $$

What is the value of rank correlation?

Solution

Consider the following table where the ranks of each variable have been filled and the square of their difference in ranks.

$$ \begin{array}{c|c|c|c|c|c} \textbf{i} & \textbf{X} & \textbf{Y} & \bf{\text R_{\text X}} & \bf{\text R_{\text Y}} & \bf{(\text R_{\text X}-\text R_{\text Y})^2} \\ \hline {1} & {0.35} & {2.50} & {3} & {4} & {1} \\ \hline {2} & {1.73} & {6.65} & {4} & {6} & {4} \\ \hline {3} & {-0.45} & {-2.43} & {2} & {2} & {0} \\ \hline {4} & {-0.56} & {-5.04} & {1} & {1} & {0} \\ \hline {5} & {4.03} & {3.20} & {6} & {5} & {1} \\ \hline {6} & {3.21} & {2.31} & {5} & {3} & {4} \\ \hline {} & {} & {} & {} & \text{Sum} & {10} \\ \end{array} $$

Since there are no repeated ranks, then the rank correlation is given by:

$$ \begin{align*} \hat \rho_{\text s}& =1-\cfrac {6\sum_{\text i=1}^{\text n}\left(\text R_{\text X_{\text i} }-\text R_{\text Y_{\text i}} \right)^2.}{{\text n(\text n^2-1)}} \\ & =1-\cfrac {6×10}{6(6^2-1)} =1-0.2857=0.7143 \\ \end{align*} $$
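The calculation can be reproduced in Python; `rank_no_ties` is a hypothetical helper that assigns rank 1 to the smallest value (it does not handle ties):

```python
def rank_no_ties(values):
    """Assign rank 1 to the smallest value, rank n to the largest."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

x = [0.35, 1.73, -0.45, -0.56, 4.03, 3.21]
y = [2.50, 6.65, -2.43, -5.04, 3.20, 2.31]

rx, ry = rank_no_ties(x), rank_no_ties(y)
n = len(x)
d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))  # 10
rho_s = 1 - 6 * d_squared / (n * (n ** 2 - 1))         # ~0.7143
```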

Kendall’s Tau (\(\tau\))

Kendall’s Tau is a non-parametric measure of the relationship between two random variables, say, X and Y. Kendall’s \(\tau\) compares the frequencies of concordant and discordant pairs.

Consider the pairs \((\text X_{\text i}, \text Y_{\text i})\) and \((\text X_{\text j}, \text Y_{\text j})\) for \(\text i \neq \text j\). The pairs are said to be concordant if the ranks of their components agree. That is, \( \text X_{\text i} > \text X_{\text j}\) when \( \text Y_{\text i} > \text Y_{\text j}\), or \( \text X_{\text i} < \text X_{\text j}\) when \( \text Y_{\text i} < \text Y_{\text j}\). In other words, they are concordant if they agree on the same directional position (they are consistent). When the pairs disagree, they are termed discordant. Note that ties are neither concordant nor discordant.

Intuitively, random variables with a high number of concordant pairs have a strong positive correlation, while those with a high number of discordant pairs are negatively correlated.

Kendall’s Tau is defined as:

$$ \hat \tau =\cfrac {\text n_{\text c}-\text n_{\text d}}{\cfrac {\text n(\text n-1)}{2}}=\cfrac {\text n_{\text c}}{\text n_{\text c}+\text n_{\text d}+\text n_{\text t} }-\cfrac {\text n_{\text d}}{\text n_{\text c}+\text n_{\text d}+\text n_{\text t}} $$

Where

\(\text n_{\text c}\)=number of concordant pairs

\(\text n_{\text d}\)=number of discordant pairs

\(\text n_{\text t}\)=number of ties

It is easy to see that Kendall’s Tau is equivalent to the difference between the proportions of concordant and discordant pairs. Moreover, when all the pairs are concordant, \(\hat \tau =1\), and when all pairs are discordant, \(\hat \tau =-1\).

Example: Calculating Kendall’s Tau

Consider the following data (same as the example above).

$$ \begin{array}{c|c|c} \textbf{i} & \textbf{X} & \textbf{Y} \\ \hline {1} & {0.35} & {2.50} \\ \hline {2} & {1.73} & {6.65} \\ \hline {3} & {-0.45} & {-2.43} \\ \hline {4} & {-0.56} & {-5.04} \\ \hline {5} & {4.03} & {3.20} \\ \hline {6} & {3.21} & {2.31} \\ \end{array} $$

What is Kendall’s \(\tau\) correlation coefficient?

Solution

The first step is to rank each data:

$$ \begin{array}{c|c|c|c|c} \textbf{i} & \textbf{X} & \textbf{Y} & \bf{\text R_{\text X}} & \bf{\text R_{\text Y}} \\ \hline {1} & {0.35} & {2.50} & {3} & {4} \\ \hline {2} & {1.73} & {6.65} & {4} & {6} \\ \hline {3} & {-0.45} & {-2.43} & {2} & {2} \\ \hline {4} & {-0.56} & {-5.04} & {1} & {1} \\ \hline {5} & {4.03} & {3.20} & {6} & {5} \\ \hline {6} & {3.21} & {2.31} & {5} & {3} \\ \end{array} $$

Next, arrange the ranks in order of \(\text R_{\text X}\). Then, for each row, the concordant count (C) is the number of \(\text R_{\text Y}\) ranks below it that are greater than the given rank of Y, and the discordant count (D) is the number of \(\text R_{\text Y}\) ranks below it that are less than the given rank of Y.

$$ \begin{array}{c|c|c|c} \bf{\text R_{\text X}} & \bf{\text R_{\text Y}} & \textbf{C} & \textbf{D}\\ \hline {1} & {1} & {5} & {0} \\ \hline {2} & {2} & {4} & {0} \\ \hline {3} & {4} & {2} & {1} \\ \hline {4} & {6} & {0} & {2} \\ \hline {5} & {3} & {1} & {0} \\ \hline {6} & {5} & {-} & {-} \\ \hline \textbf{Total} & {} & {12} & {3} \\ \end{array} $$

Note that C=4 in the second row is the number of \(\text R_{\text Y}\) ranks below it that are greater than 2 (namely 4, 6, 3, and 5), and D=0 is the number of ranks below it that are less than 2. Similarly, in the fourth row \((\text R_{\text Y}=6)\), both remaining ranks below it (3 and 5) are smaller, so C=0 and D=2. This is continued up to the second-last row, since there are no more ranks to look up.

So, \(\text n_{\text c}=12\) and \(\text n_{\text d}=3\)

$$ \Rightarrow \hat \tau =\cfrac {\text n_{\text c}-\text n_{\text d}}{\cfrac {\text n(\text n-1)}{2}}=\cfrac {12-3}{\cfrac {6(6-1)}{2}}=\cfrac {9}{15}=0.60 $$
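The pairwise counting can be checked directly in Python; the sketch below classifies every pair of observations from the example data by the sign of the product of their differences:

```python
x = [0.35, 1.73, -0.45, -0.56, 4.03, 3.21]
y = [2.50, 6.65, -2.43, -5.04, 3.20, 2.31]

n = len(x)
n_c = n_d = 0
for i in range(n):
    for j in range(i + 1, n):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:      # the pair agrees on direction: concordant
            n_c += 1
        elif s < 0:    # the pair disagrees: discordant
            n_d += 1
        # s == 0 would be a tie (neither concordant nor discordant)

tau = (n_c - n_d) / (n * (n - 1) / 2)  # (12 - 3) / 15 = 0.60
```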

Practice Question

Suppose that we know from experience that \(\alpha\) = 3 for a particular financial variable, and we observe that the probability that X > 10 is 0.04.

Determine the probability that X is greater than 20.

A. 125%

B. 0.5%

C. 4%

D. 0.1%

The correct answer is B.

From the given probability, we can get the value of constant k as follows:

$$ \begin{align*} \text {prob}(\text X > \text x) & = \text{kx}^{(-\alpha)} \\ 0.04 & = \text k(10)^{(-3)} \\ \text k & = 40 \\ \end{align*} $$

Thus,

$$ \text P(\text X > 20) = 40(20)^{(-3)} = 0.005 \text { or } 0.5\% $$
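The calculation can be checked with a short script (a sketch of the power-law tail formula above):

```python
alpha = 3

# Calibrate k from the observed tail probability P(X > 10) = 0.04:
# 0.04 = k * 10**(-alpha)  =>  k = 0.04 * 10**alpha
k = 0.04 * 10 ** alpha  # 40

# Probability of a value greater than 20 under the same power law:
p_20 = k * 20 ** (-alpha)  # 0.005, i.e., 0.5%
```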

Note: The power law provides an alternative to assuming normal distributions.
