A parametric test is a hypothesis test concerning a population parameter that relies on specific distributional assumptions about the data. If these assumptions are not met, non-parametric tests are used instead.
In summary, researchers use non-parametric testing when:
- The data do not meet distributional assumptions.
- The data contain outliers.
- The data are given in ranks or use an ordinal scale.
- The hypothesis being addressed does not concern a parameter.
We frequently compare the population correlation coefficient, represented by \(\rho\), to zero when testing for correlation. This helps us determine whether there is a relationship between the variables. There are three possible sets of hypotheses:
- Two-sided: \(H_0:\rho=0\) versus \(H_a:\rho\neq 0\).
- One-sided (right): \(H_0:\rho\le 0\) versus \(H_a:\rho\gt 0\).
- One-sided (left): \(H_0:\rho\ge 0\) versus \(H_a:\rho\lt 0\).
Let’s assume that we have variables \(X\) and \(Y\). The sample correlation, \(r_{XY}\), is used to test the above hypotheses.
In a parametric test, the pairwise correlation coefficient, also known as the Pearson correlation, is used. The sample correlation is the sample covariance between the \(X\) and \(Y\) variables divided by the product of their standard deviations:
$$ r=\frac{S_{XY}}{S_XS_Y} $$
Where:
\(S_{XY}\)= Sample covariance between the \(X\) and \(Y\) variables.
\(S_X\)= Standard deviation of the \(X\) variable.
\(S_Y\) = Standard deviation of the \(Y\) variable.
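As a quick illustration, the sketch below computes the sample correlation from the sample covariance and standard deviations in Python; the two return series are purely hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical monthly returns for two assets (illustrative values only)
x = np.array([0.020, -0.010, 0.030, 0.015, -0.005, 0.025])
y = np.array([0.018, -0.012, 0.028, 0.010, -0.002, 0.022])

# Sample covariance and sample standard deviations (ddof=1 for sample statistics)
s_xy = np.cov(x, y, ddof=1)[0, 1]
s_x = np.std(x, ddof=1)
s_y = np.std(y, ddof=1)

# r = S_XY / (S_X * S_Y)
r = s_xy / (s_x * s_y)

# Sanity check against NumPy's built-in correlation matrix
print(round(r, 4), round(np.corrcoef(x, y)[0, 1], 4))
```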
If the two variables are normally distributed, a t-test based on the sample correlation, \(r\), can determine whether the null hypothesis should be rejected. The formula for the t-test is:
$$ t=\frac{r\sqrt{n-2}}{\sqrt{\left(1-r^2\right)} } $$
Where:
\(r\)= Sample correlation.
\(n\)= Sample size.
\(\left(n-2\right)\)= Degrees of freedom.
The test statistic follows a t-distribution with \(n-2\) degrees of freedom. From the equation above, it is easy to see that as the sample size, \(n\), increases, the degrees of freedom increase. In other words, as the sample size \(n\) increases, the power of the test increases. This implies that a false null hypothesis is more likely to be rejected as the sample size increases.
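To illustrate this point, the following sketch (using SciPy's t-distribution and an arbitrary sample correlation of 0.4) shows that, holding \(r\) fixed, the t-statistic rises and the two-tailed 5% critical value falls as \(n\) grows:

```python
from scipy import stats

def corr_t_stat(r, n):
    """t-statistic for H0: rho = 0, given sample correlation r and sample size n."""
    return r * (n - 2) ** 0.5 / (1 - r ** 2) ** 0.5

# Fixed sample correlation, increasing sample size
for n in (10, 30, 100):
    t_stat = corr_t_stat(0.4, n)
    t_crit = stats.t.ppf(0.975, n - 2)  # two-tailed 5% critical value
    print(f"n={n}: t = {t_stat:.3f}, critical value = {t_crit:.3f}")
```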
Example: Parametric Test of a Correlation
The table below shows the sample correlations between the monthly returns of five different sector-specific exchange-traded funds (ETFs) and the overall market index (Market 1). There are 48 monthly observations, and the following ETFs are included in the analysis:
$$ \begin{array}{c|c|c|c|c|c|c}
& \text{ETF } 1 & \text{ETF } 2 & \text{ETF } 3 & \text{ETF } 4 & \text{ETF } 5 & \text{Market } 1 \\ \hline
\text{ETF } 1 & 1 \\ \hline
\text{ETF } 2 & 0.8214 & 1 \\ \hline
\text{ETF } 3 & 0.5672 & 0.6438 & 1 \\ \hline
\text{ETF } 4 & 0.4276 & 0.5789 & 0.4123 & 1 \\ \hline
\text{ETF } 5 & 0.7121 & 0.7942 & 0.6896 & 0.5614 & 1 \\ \hline
\text{Market } 1 & 0.8375 & 0.9096 & 0.7223 & 0.6954 & 0.7919 & 1
\end{array} $$
Using a 1% significance level and the following hypotheses: \(H_0:\rho=0 \text{ versus } H_a:\rho\neq 0\), calculate the t-statistic for the correlation between ETF 2 and ETF 4. Based on the calculated t-statistic, draw a conclusion about the significance of the correlation using the following sample t-table:
$$\begin{array}{|lccccc}
\hline \text { df } & \boldsymbol{p}=\mathbf{0 . 1 0} & \boldsymbol{p}=\mathbf{0 . 0 5} & \boldsymbol{p}=\mathbf{0 . 0 2 5} & \boldsymbol{p}=\mathbf{0 . 0 1} & \boldsymbol{p}=\mathbf{0 . 0 0 5} \\
\hline \mathbf{3 1} & 1.309 & 1.696 & 2.040 & 2.453 & 2.744 \\
\mathbf{3 2} & 1.309 & 1.694 & 2.037 & 2.449 & 2.738 \\
\mathbf{3 3} & 1.308 & 1.692 & 2.035 & 2.445 & 2.733 \\
\mathbf{3 4} & 1.307 & 1.691 & 2.032 & 2.441 & 2.728 \\
\mathbf{3 5} & 1.306 & 1.690 & 2.030 & 2.438 & 2.724 \\
\mathbf{3 6} & 1.306 & 1.688 & 2.028 & 2.434 & 2.719 \\
\mathbf{3 7} & 1.305 & 1.687 & 2.026 & 2.431 & 2.715 \\
\mathbf{3 8} & 1.304 & 1.686 & 2.024 & 2.429 & 2.712 \\
\mathbf{3 9} & 1.304 & 1.685 & 2.023 & 2.426 & 2.708 \\
\mathbf{4 0} & 1.303 & 1.684 & 2.021 & 2.423 & 2.704 \\
\mathbf{4 1} & 1.303 & 1.683 & 2.020 & 2.421 & 2.701 \\
\mathbf{4 2} & 1.302 & 1.682 & 2.018 & 2.418 & 2.698 \\
\mathbf{4 3} & 1.302 & 1.681 & 2.017 & 2.416 & 2.695 \\
\mathbf{4 4} & 1.301 & 1.680 & 2.015 & 2.414 & 2.692 \\
\mathbf{4 5} & 1.301 & 1.679 & 2.014 & 2.412 & 2.690 \\
\mathbf{4 6} & 1.300 & 1.679 & 2.013 & 2.410 & 2.687 \\
\mathbf{4 7} & 1.300 & 1.678 & 2.012 & 2.408 & 2.685 \\
\mathbf{4 8} & 1.299 & 1.677 & 2.011 & 2.407 & 2.682
\end{array}$$
Solution
To test the significance of the correlation between ETF 2 and ETF 4, we will use the t-test formula:
$$ t=\frac{r\sqrt{n-2}}{\sqrt{\left(1-r^2\right)} } $$
Where:
\(r\) = Sample correlation coefficient (in this case, \(r_{EFT2,ETF4}=0.5789\)).
\(n\) = Number of observations (48 in this case).
Now, let’s calculate the t-statistic:
$$ t=\frac{r\sqrt{n-2}}{\sqrt{\left(1-r^2\right)} }=\frac{0.5789\sqrt{48-2}}{\sqrt{1-{0.5789}^2}}=4.815$$
The calculated t-statistic for the correlation between ETF 2 and ETF 4 is 4.815.
At the 1% significance level, with a two-tailed test and \(df=n-2=46\) degrees of freedom, the critical t-value is approximately \(\pm 2.687\).
Conclusion: We reject the null hypothesis since the calculated t-statistic (4.815) is greater than the critical value (+2.687). There is sufficient evidence to suggest that the correlation between ETF 2 and ETF 4 differs significantly from zero.
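The figures above can be reproduced with a short script; it simply restates the example's inputs (\(r = 0.5789\), \(n = 48\), 1% significance) and recovers both the t-statistic and the critical value from the table.

```python
from scipy import stats

r, n, alpha = 0.5789, 48, 0.01

t_stat = r * (n - 2) ** 0.5 / (1 - r ** 2) ** 0.5
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)  # two-tailed critical value, df = 46

# Expect roughly 4.815, 2.687, and a rejection of H0
print(round(t_stat, 3), round(t_crit, 3), abs(t_stat) > t_crit)
```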
The Spearman rank correlation coefficient, \(r_S\), is a non-parametric test used to examine the relationship between two data sets when the population deviates from normality.
The Spearman rank correlation coefficient is similar to the Pearson correlation coefficient, except that it is calculated from the ranks of the observations in the samples rather than their values.
Consider two variables, \(X\) and \(Y\). To calculate Spearman’s rank correlation, \(r_S\), we rank the observations of each variable separately, compute the difference in ranks, \(d_i\), for each pair of observations, and apply the formula:
$$ r_S=1-\frac{6\sum_{i=1}^{n}d_i^2}{n\left(n^2-1\right)} $$
Where \(n\) is the number of observations.
Example: Calculating Spearman’s Rank Correlation Coefficient
An analyst is studying the relationship between returns for two sectors, steel and cement, over the past 5 years using Spearman’s rank correlation coefficient. The hypotheses are \(H_0: r_S=0\) and \(H_a:r_S\neq0\). The returns of both sectors are provided below.
$$ \begin{array}{c|c|c}
\text{Year} & \text{Steel sector returns} & \text{Cement sector returns} \\ \hline
1 & 10\% & 8\% \\ \hline
2 & 6\% & 7\% \\ \hline
3 & 9\% & 5\% \\ \hline
4 & 12\% & 6\% \\ \hline
5 & 8\% & 9\%
\end{array} $$
Spearman’s rank correlation coefficient is closest to:
Solution
$$ \begin{array}{c|c|c|c|c|c|c}
\textbf{Year} & {\text{Steel} \\ \text{sector} \\ \text{returns} \\ \text{(X)} } & { \text{Cement} \\ \text{sector} \\ \text{returns} \\ \text{(Y)} } & { \text{Rank} \\ \text{order} \\ \text{for X} } & { \text{Rank} \\ \text{order} \\ \text{for Y} } & d_i & d_i^2 \\ \hline
1 & 10\% & 8\% & 2 & 2 & 0 & 0 \\ \hline
2 & 6\% & 7\% & 5 & 3 & 2 & 4 \\ \hline
3 & 9\% & 5\% & 3 & 5 & -2 & 4 \\ \hline
4 & 12\% & 6\% & 1 & 4 & -3 & 9 \\ \hline
5 & 8\% & 9\% & 4 & 1 & 3 & 9 \\ \hline
& & & & & \text{Sum}= & 26
\end{array} $$
We can now use the formula:
$$ \begin{align*} r_s & =1-\frac{6\sum_{i=1}^{n}d_i^2}{n\left(n^2-1\right)}=1-\left[\frac{\left(6\times26\right)}{5\times\left(5^2-1\right)}\right]=1-1.3 \\
r_s & =-0.3 \end{align*} $$
This indicates a very weak negative correlation between the returns of the steel and cement sectors.
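As a cross-check, the hand-ranked result can be reproduced either from the formula or with SciPy's built-in spearmanr, as sketched below (the simple ranking helper assumes no ties, which holds for this data).

```python
from scipy import stats

# Sector returns from the example (in %)
steel = [10, 6, 9, 12, 8]
cement = [8, 7, 5, 6, 9]

# Formula-based calculation: rank each series (1 = largest), sum the squared rank differences
rank = lambda v: [sorted(v, reverse=True).index(x) + 1 for x in v]
d_sq = sum((rx - ry) ** 2 for rx, ry in zip(rank(steel), rank(cement)))
n = len(steel)
r_s = 1 - 6 * d_sq / (n * (n ** 2 - 1))

# Library cross-check: spearmanr ranks internally and returns (r_s, p-value)
r_scipy, _p = stats.spearmanr(steel, cement)
print(r_s, r_scipy)  # both -0.3
```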
The hypothesis test on the Spearman rank correlation depends on the sample size. If the sample size is small \((n\le30)\), we need a specialized table of critical values. On the other hand, if the sample size is large \((n \gt 30)\), we can perform a t-test using a test statistic similar to that of the Pearson correlation:
$$ t=\frac{r_s\sqrt{n-2}}{\sqrt{\left(1-r_s^2\right)} } $$
Consider the above example. Assume we want to conduct a hypothesis test at a 5% significance level, with hypotheses \(H_0: r_S=0\) and \(H_a: r_S\neq 0\). Because the sample size is small \((n=5\le30)\), the t-approximation is not appropriate, and a specialized table of critical values for the Spearman rank correlation coefficient would be needed to reach a conclusion.
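For a large sample, the test would proceed as in the sketch below; the sample size of 40 used there is purely hypothetical and is chosen only to illustrate the t-approximation, since the actual example has just 5 observations.

```python
from scipy import stats

def spearman_t_test(r_s, n, alpha=0.05):
    """Large-sample (n > 30) two-tailed t-test of H0: the rank correlation is zero."""
    df = n - 2
    t_stat = r_s * df ** 0.5 / (1 - r_s ** 2) ** 0.5
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return t_stat, t_crit, abs(t_stat) > t_crit

# Hypothetical: the same r_s of -0.3 observed over 40 periods instead of 5
t_stat, t_crit, reject = spearman_t_test(-0.3, 40)
print(round(t_stat, 3), round(t_crit, 3), reject)  # fails to reject H0 at the 5% level
```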
Question
Assume an investment analyst, John Smith, is studying the relationship between two stocks, \(X\) and \(Y\). Based on 100 observations, he has found that \(S_{XY} = 10, S_X= 2,\) and \(S_Y=8\). Smith needs to find the sample correlation \(r_{XY}\) and use it to perform a t-test to determine if there is a significant correlation between the returns of stocks \(X\) and \(Y\). The critical value for the test statistic at the 0.05 level of significance is approximately 1.96. He should conclude that the statistical relationship between \(X\) and \(Y\) is:
- A. Significant because the test statistic falls outside the range of the critical values.
- B. Significant because the absolute value of the test statistic is less than the critical value.
- C. Insignificant because the test statistic falls outside the range of the critical values.
Solution
The correct answer is A.
Note that the sample correlation coefficient, \(r_{XY}\) is calculated using the following formula:
$$ r_{XY}=\frac{S_{XY}}{S_XS_Y} $$
Substituting the given values in this formula, we get:
$$ r_{XY}=\frac{10}{2\times8}=0.625 $$
To test the significance of the sample correlation, we can use a t-test with the following null and alternative hypotheses: \(H_0:\rho=0\) and \(H_a:\rho\neq0\).
The test statistic for this test is calculated using the following formula:
$$ t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$
Where:
\(r\) = Sample correlation coefficient.
\(n\) = Number of observations.
Substituting the given values into this formula, we get:
$$ t=\frac{0.625\sqrt{100-2}}{\sqrt{1-{0.625}^2}}\approx\frac{6.1872}{0.7806}=7.9262 $$
The critical value for the test statistic at the 0.05 level of significance is approximately 1.96.
Since our calculated test statistic (7.9262) is greater than the upper bound of the critical values for the test statistic (1.96), we reject the null hypothesis. This indicates sufficient evidence to suggest that the correlation between \(X\) and \(Y\) is significantly different from zero.
Therefore, John Smith should conclude that the statistical relationship between \(X\) and \(Y\) is significant because the test statistic falls outside the range of the critical values (Option A).
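The arithmetic in this solution can be verified with a short script; the p-value line is an additional check not required by the question.

```python
from scipy import stats

s_xy, s_x, s_y, n = 10, 2, 8, 100

r = s_xy / (s_x * s_y)                               # 0.625
t_stat = r * (n - 2) ** 0.5 / (1 - r ** 2) ** 0.5    # about 7.93
p_value = 2 * stats.t.sf(abs(t_stat), n - 2)         # far below 0.05

print(round(r, 3), round(t_stat, 4), p_value)
```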