
Tests of Independence

25 Aug 2023

Parametric versus Non-parametric Tests of Independence

A parametric test is a hypothesis test concerning a population parameter, used when the data satisfy specific distributional assumptions. If these assumptions are not met, non-parametric tests are used.

In summary, researchers use non-parametric testing when:

Data do not meet distributional assumptions.
There are outliers.
Data are given in the form of ranks.
The hypothesis test objective does not concern a parameter.

Hypotheses Concerning Population Correlation Coefficient

We frequently compare the population correlation coefficient to zero when testing for correlation. This helps us determine whether there’s a relationship between the variables. The population correlation coefficient, represented by \(\rho\), is used to test the relationship. There are three possible hypotheses:

Two-sided; \(H_0: \rho=0 \text{ versus } H_a: \rho\neq 0\).
One-sided right side; \(H_0: \rho \le 0 \text{ versus } H_a: \rho \gt 0\).
One-sided left side; \(H_0: \rho\geq0 \text{ versus } H_a: \rho \lt 0\).

Let’s assume that we have variables X and Y. The sample correlation, \(r_{XY}\), is used to test the above hypotheses.

Parametric Test of a Correlation

The parametric pairwise correlation coefficient, also known as Pearson correlation, is used to test the correlation in a parametric test. The formula for the sample correlation involves the sample covariance between the \(X\) and \(Y\) variables and their respective standard deviations, which is expressed as:

$$ r=\frac{S_{XY}}{S_XS_Y} $$

Where:

\(S_{XY}\)= Sample covariance between the \(X\) and \(Y\) variables.

\(S_X\)= Standard deviation of the \(X\) variable.

\(S_Y\) = Standard deviation of the \(Y\) variable.
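The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual \(n-1\) denominators for the sample covariance and standard deviations; the function name `sample_correlation` and the return series are hypothetical and not part of the original material.

```python
import math
from statistics import mean

def sample_correlation(x, y):
    """Return r = S_XY / (S_X * S_Y) using n - 1 in every denominator."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance S_XY
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # Sample standard deviations S_X and S_Y
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical monthly returns (in %), made up for illustration only
x = [1.2, -0.5, 0.8, 2.1, -1.0, 0.3]
y = [0.9, -0.2, 0.5, 1.8, -1.4, 0.1]
r = sample_correlation(x, y)
print(round(r, 4))
```

Because the \(n-1\) factors cancel in the ratio, using \(n\) in all three denominators would give the same \(r\).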

If the two variables are normally distributed, a t-test based on the sample correlation, \(r\), can determine whether the null hypothesis should be rejected. The formula for the t-statistic is:

$$ t=\frac{r\sqrt{n-2}}{\sqrt{\left(1-r^2\right)} } $$

Where:

\(r\)= Sample correlation.

\(n\)= Sample size.

\(\left(n-2\right)\)= Degrees of freedom.

The test statistic follows a t-distribution with \(n-2\) degrees of freedom. From the equation above, it is easy to see that as the sample size, \(n\), increases, the degrees of freedom increase and, for a given non-zero \(r\), so does the absolute value of the test statistic. In other words, as the sample size \(n\) increases, the power of the test increases. This implies that a false null hypothesis is more likely to be rejected as the sample size increases.
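The effect of the sample size on the test statistic can be illustrated with a short sketch. The function name `correlation_t_stat`, the value \(r=0.30\), and the three sample sizes are assumptions chosen for illustration only.

```python
import math

def correlation_t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Same hypothetical sample correlation, three hypothetical sample sizes
r = 0.30
for n in (12, 48, 200):
    # Prints: sample size, degrees of freedom, t-statistic
    print(n, n - 2, round(correlation_t_stat(r, n), 3))
```

With \(r\) held fixed, the printed t-statistic rises with \(n\), which is why the same sample correlation can be insignificant in a small sample and significant in a large one.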

Example: Parametric Test of a Correlation

The table below shows the sample correlations between the monthly returns of five different sector-specific exchange-traded funds (ETFs) and the overall market index (Market 1). There are 48 monthly observations, and the following ETFs are included in the analysis:

$$ \begin{array}{c|c|c|c|c|c|c}
& \text{ETF } 1 & \text{ETF } 2 & \text{ETF } 3 & \text{ETF } 4 & \text{ETF } 5 & \text{Market } 1 \\ \hline
\text{ETF } 1 & 1 \\ \hline
\text{ETF } 2 & 0.8214 & 1 \\ \hline
\text{ETF } 3 & 0.5672 & 0.6438 & 1 \\ \hline
\text{ETF } 4 & 0.4276 & 0.5789 & 0.4123 & 1 \\ \hline
\text{ETF } 5 & 0.7121 & 0.7942 & 0.6896 & 0.5614 & 1 \\ \hline
\text{Market } 1 & 0.8375 & 0.9096 & 0.7223 & 0.6954 & 0.7919 & 1
\end{array} $$

Using a 1% significance level and the following hypotheses: \(H_0:\rho=0 \text{ versus } H_a:\rho\neq 0\), calculate the t-statistic for the correlation between ETF 2 and ETF 4. Based on the calculated t-statistic, draw a conclusion about the significance of the correlation using the following sample t-table:

$$\begin{array}{|lccccc}
\hline \text { df } & \boldsymbol{p}=\mathbf{0 . 1 0} & \boldsymbol{p}=\mathbf{0 . 0 5} & \boldsymbol{p}=\mathbf{0 . 0 2 5} & \boldsymbol{p}=\mathbf{0 . 0 1} & \boldsymbol{p}=\mathbf{0 . 0 0 5} \\
\hline \mathbf{3 1} & 1.309 & 1.696 & 2.040 & 2.453 & 2.744 \\
\mathbf{3 2} & 1.309 & 1.694 & 2.037 & 2.449 & 2.738 \\
\mathbf{3 3} & 1.308 & 1.692 & 2.035 & 2.445 & 2.733 \\
\mathbf{3 4} & 1.307 & 1.691 & 2.032 & 2.441 & 2.728 \\
\mathbf{3 5} & 1.306 & 1.690 & 2.030 & 2.438 & 2.724 \\
\mathbf{3 6} & 1.306 & 1.688 & 2.028 & 2.434 & 2.719 \\
\mathbf{3 7} & 1.305 & 1.687 & 2.026 & 2.431 & 2.715 \\
\mathbf{3 8} & 1.304 & 1.686 & 2.024 & 2.429 & 2.712 \\
\mathbf{3 9} & 1.304 & 1.685 & 2.023 & 2.426 & 2.708 \\
\mathbf{4 0} & 1.303 & 1.684 & 2.021 & 2.423 & 2.704 \\
\mathbf{4 1} & 1.303 & 1.683 & 2.020 & 2.421 & 2.701 \\
\mathbf{4 2} & 1.302 & 1.682 & 2.018 & 2.418 & 2.698 \\
\mathbf{4 3} & 1.302 & 1.681 & 2.017 & 2.416 & 2.695 \\
\mathbf{4 4} & 1.301 & 1.680 & 2.015 & 2.414 & 2.692 \\
\mathbf{4 5} & 1.301 & 1.679 & 2.014 & 2.412 & 2.690 \\
\mathbf{4 6} & 1.300 & 1.679 & 2.013 & 2.410 & 2.687 \\
\mathbf{4 7} & 1.300 & 1.678 & 2.012 & 2.408 & 2.685 \\
\mathbf{4 8} & 1.299 & 1.677 & 2.011 & 2.407 & 2.682
\end{array}$$

Solution

To test the significance of the correlation between ETF 2 and ETF 4, we will use the t-test formula:

$$ t=\frac{r\sqrt{n-2}}{\sqrt{\left(1-r^2\right)} } $$

Where:

\(r\) = Sample correlation coefficient (in this case, \(r_{ETF2,ETF4}=0.5789\)).

\(n\) = Number of observations (48 in this case).

Now, let’s calculate the t-statistic:

$$ t=\frac{r\sqrt{n-2}}{\sqrt{\left(1-r^2\right)} }=\frac{0.5789\sqrt{48-2}}{\sqrt{1-{0.5789}^2}}\approx\frac{3.9263}{0.8154}=4.8152 $$

The calculated t-statistic for the correlation between ETF 2 and ETF 4 is 4.8152.

At the 1% significance level, with a two-tailed test and \(df=n-2=46\) degrees of freedom, the critical t-value is approximately \(\pm 2.687\) (the \(p=0.005\) column of the table, since the 1% is split between the two tails).

Conclusion: We reject the null hypothesis since our calculated t-statistic (4.8152) is greater than the critical value (+2.687). This indicates sufficient evidence to suggest that the correlation between ETF 2 and ETF 4 significantly differs from zero.
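As a rough check, the same calculation and decision rule can be reproduced in Python. The inputs are taken from the example (the ETF 2 and ETF 4 correlation, 48 observations, and the table value for \(df=46\)); the variable names are illustrative.

```python
import math

r, n = 0.5789, 48          # ETF 2 vs. ETF 4, from the correlation table
critical_value = 2.687     # df = 46, p = 0.005 column (two-tailed 1% test)

t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
# Two-sided rule: reject H0 when |t| exceeds the critical value
reject_null = abs(t_stat) > critical_value

print(f"t = {t_stat:.4f}, df = {n - 2}, reject H0: {reject_null}")
```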

Non-Parametric Test of Correlation: The Spearman Rank Correlation Coefficient

The Spearman rank correlation coefficient, \(r_S\), is the basis of a non-parametric test used to examine the relationship between two data sets when the population deviates from normality.

The Spearman rank correlation coefficient is like the Pearson correlation coefficient. The difference is that the Spearman coefficient is calculated based on the ranks of variables in the samples.

Consider two variables, \(X\) and \(Y\). We need to calculate Spearman’s Rank Correlation \(r_S\).

Steps of Calculating Spearman’s Rank Correlation Coefficient, \(\bf{{r}_{S}}\)

Rank the observations of each variable \(X\) and \(Y\) in descending order. Note that when there are tied values in the data, their ranks are calculated by taking the average of the ranks that would have been assigned to those values if they were not tied.
Find the difference between the ranks for each pair of observations.
Square each difference and calculate the sum of the squared differences, that is, \(\sum d_i^2\).
Use the following formula to find \(r_S\):

$$ r_s=1-\frac{6\sum_{i=1}^{n}d_i^2}{n\left(n^2-1\right)} $$

Where:

\(d_i\) = The difference between the ranks for each pair of observations.

\(n\) = Sample size.
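The tie-handling rule in the ranking step is the part most easily misapplied, so here is a minimal sketch of it. The function name `average_ranks` and the sample values are hypothetical.

```python
def average_ranks(values):
    """Rank values in descending order (largest = 1), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # 0-based positions start..end correspond to ranks start+1..end+1
        shared_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

# Hypothetical data with a tie: the two 7s would occupy ranks 1 and 2
print(average_ranks([7, 3, 7, 1]))  # [1.5, 3.0, 1.5, 4.0]
```

Note that the \(6\sum d_i^2\) formula is exact only when there are no ties; with many ties, computing the Pearson correlation of the averaged ranks is the usual alternative.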

Example: Calculating Spearman’s Rank Correlation Coefficient

An analyst is studying the relationship between returns for two sectors, steel and cement, over the past 5 years by using Spearman’s rank correlation coefficient. The hypotheses are \(H_0: r_S=0\) and \(H_a:r_S\neq0\). The returns of both sectors are provided below.

$$ \begin{array}{c|c|c}
\text{Year} & \text{Steel sector returns} & \text{Cement sector returns} \\ \hline
1 & 10\% & 8\% \\ \hline
2 & 6\% & 7\% \\ \hline
3 & 9\% & 5\% \\ \hline
4 & 12\% & 6\% \\ \hline
5 & 8\% & 9\%
\end{array} $$

Calculate Spearman’s rank correlation coefficient.

Solution

$$ \begin{array}{c|c|c|c|c|c|c}
\textbf{Year} & {\text{Steel} \\ \text{sector} \\ \text{returns} \\ \text{(X)} } & { \text{Cement} \\ \text{sector} \\ \text{returns} \\ \text{(Y)} } & { \text{Rank} \\ \text{order} \\ \text{for X} } & { \text{Rank} \\ \text{order} \\ \text{for Y} } & d & {{d}^{2}} \\ \hline
1 & 10\% & 8\% & 2 & 2 & 0 & 0 \\ \hline
2 & 6\% & 7\% & 5 & 3 & 2 & 4 \\ \hline
3 & 9\% & 5\% & 3 & 5 & -2 & 4 \\ \hline
4 & 12\% & 6\% & 1 & 4 & -3 & 9 \\ \hline
5 & 8\% & 9\% & 4 & 1 & 3 & 9 \\ \hline
& & & & & \text{Sum}= & 26
\end{array} $$

We can now use the formula:

$$ \begin{align*} r_s & =1-\frac{6\sum_{i=1}^{n}d_i^2}{n\left(n^2-1\right)}=1-\left[\frac{\left(6\times26\right)}{5\times\left(5^2-1\right)}\right]=1-1.3 \\
r_s & =-0.3 \end{align*} $$

This indicates a very weak negative correlation between the returns of the steel and cement sectors.
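The steel and cement calculation can be reproduced with a short script. This is a sketch that assumes no tied values, which holds for this data set; the helper name `descending_ranks` is illustrative.

```python
steel = [10, 6, 9, 12, 8]    # returns in %, years 1-5
cement = [8, 7, 5, 6, 9]

def descending_ranks(values):
    """Rank 1 goes to the largest value; assumes no ties, as in this data."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

rank_x = descending_ranks(steel)     # [2, 5, 3, 1, 4]
rank_y = descending_ranks(cement)    # [2, 3, 5, 4, 1]
d_squared = [(rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y)]
n = len(steel)
r_s = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))
print(sum(d_squared), round(r_s, 2))  # 26 -0.3
```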

Hypothesis Test for the Spearman Rank Correlation

The hypothesis test for the Spearman rank correlation depends on the sample size. If the sample size is small \((n\le30)\), we need a specialized table of critical values. On the other hand, if the sample size is large \((n \gt 30)\), we can perform a t-test using a test statistic similar to that of the Pearson correlation:

$$ t=\frac{r_s\sqrt{n-2}}{\sqrt{\left(1-r_s^2\right)} } $$

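Consider the above example. Assume we want to conduct a hypothesis test at a 5% significance level. The hypotheses are \(H_0: r_S=0\) and \(H_a: r_S\neq 0\). Because the sample is small \((n=5\le30)\), the t-test above does not apply, and the decision must be based on a specialized table of critical values for \(r_S\).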
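The sample-size rule can be expressed as a small helper. This is only a sketch: the function name `spearman_t_stat` is hypothetical, and the second call uses made-up values of \(r_S\) and \(n\) to show the large-sample branch.

```python
import math

def spearman_t_stat(r_s, n):
    """Return the large-sample t-statistic, or None when n <= 30."""
    if n <= 30:
        # Small sample: the t approximation is not used; consult a
        # table of critical values for r_S instead.
        return None
    return r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)

print(spearman_t_stat(-0.3, 5))    # None: n = 5 is a small sample
print(spearman_t_stat(0.40, 50))   # hypothetical r_S and n
```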

Question 1

Assume an investment analyst, John Smith, is studying the relationship between two stocks, \(X\) and \(Y\). Based on 100 observations, he has found that \(S_{XY} = 10, S_X= 2,\) and \(S_Y=8\). Smith needs to find the sample correlation \(r_{XY}\) and use it to perform a t-test to determine if there is a significant correlation between the returns of stocks \(X\) and \(Y\). The critical value for the test statistic at the 0.05 level of significance is approximately 1.96. He should conclude that the statistical relationship between \(X\) and \(Y\) is:

A. Significant because the test statistic falls outside the range of the critical values.

B. Significant because the absolute value of the test statistic is less than the critical value.

C. Insignificant because the test statistic falls outside the range of the critical values.

Solution

The correct answer is A.

Note that the sample correlation coefficient, \(r_{XY}\) is calculated using the following formula:

$$ r_{XY}=\frac{S_{XY}}{S_XS_Y} $$

Substituting the given values in this formula, we get:

$$ r_{XY}=\frac{10}{2\times8}=0.625 $$

To test the significance of the sample correlation, we can use a t-test with the following null and alternative hypotheses: \(H_0: \rho=0\) and \(H_a: \rho\neq0\).

The test statistic for this test is calculated using the following formula:

$$ t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$

Where:

\(r\) = Sample correlation coefficient.

\(n\) = Number of observations.

Substituting the given values into this formula, we get:

$$ t=\frac{0.625\sqrt{100-2}}{\sqrt{1-{0.625}^2}}\approx\frac{6.1872}{0.7806}=7.9262 $$

The critical value for the test statistic at the 0.05 level of significance is approximately 1.96.

Since our calculated test statistic (7.9262) is greater than the upper bound of the critical values for the test statistic (1.96), we reject the null hypothesis. This indicates sufficient evidence to suggest that the correlation between X and Y is significantly different from zero.

Therefore, John Smith should conclude that the statistical relationship between \(X\) and \(Y\) is significant because the test statistic falls outside the range of the critical values (Option A).

Question 2

An analyst is studying the relationship between returns for two sectors, steel and cement, over the past 5 years by using Spearman’s rank correlation coefficient. The returns of both sectors are provided below.

$$ \begin{array}{c|c|c}
\textbf{Year} & \textbf{Steel Sector Returns} & \textbf{Cement Sector Returns} \\ \hline
1 & 2.5\% & 3.2\% \\ \hline
2 & 5\% & 4.5\% \\ \hline
3 & 5.6\% & 4.2\% \\ \hline
4 & -3\% & -1.7\% \\ \hline
5 & 0.5\% & 1.1\%
\end{array} $$

The Spearman’s rank correlation coefficient is closest to:

A. 0.5

B. 0.6

C. 0.8

Solution

$$ \begin{array}{c|c|c|c|c|c|c}
\textbf{Year} & \bf{\text{Steel} \\ \text{Sector} \\ \text{Returns (X)} } & \bf{ \text{Cement} \\ \text{Sector} \\ \text{Returns (Y)} } & \bf{\text{Rank} \\ \text{of X} } & \bf{ \text{Rank} \\ \text{of Y} } & \bf d & \bf{d^2} \\ \hline
1 & 2.5\% & 3.2\% & 3 & 3 & 0 & 0 \\ \hline
2 & 5\% & 4.5\% & 2 & 1 & 1 & 1 \\ \hline
3 & 5.6\% & 4.2\% & 1 & 2 & -1 & 1 \\ \hline
4 & -3\% & -1.7\% & 5 & 5 & 0 & 0 \\ \hline
5 & 0.5\% & 1.1\% & 4 & 4 & 0 & 0 \\ \hline
& & & & & \textbf{Sum} & \bf 2
\end{array} $$

We now use the formula:

$$ \begin{align*} r_s & =1-\frac{6\sum_{i=1}^{n}d_i^2}{n\left(n^2-1\right)} \\ & =1-\frac{6\times2}{5\left(5^2-1\right)} \\ & =0.9 \end{align*} $$

Of the three choices, 0.8 is the closest to this value.
