With categorical or discrete data, correlation is not suitable for assessing relationships between variables. Instead, we use a non-parametric test called the chi-square test of independence, which employs a chi-square distributed test statistic.
When examining the relationship between two categorical variables, we organize the data in a contingency table. We then apply a test of independence based on the chi-square distribution to assess whether a statistically significant relationship exists between the variables. The test statistic is calculated as follows:
$$ \chi^2=\sum_{i=1}^{m}\frac{\left(O_{ij}-E_{ij}\right)^2}{E_{ij}} $$
Where:
\(E_{ij}=\frac{\left(\text{Total row i}\right)\times\left(\text{Total column j}\right)}{\text{Overall Total}} \)
\(m\)= Number of cells in the table, i.e., the number of groups in the first class multiplied by the number of groups in the second class.
\(O_{ij}\)= Number of observations in each cell of row \(i\) and column \(j\) (i.e., observed frequency).
\(E_{ij}\)= Expected number of observations in each cell of row \(i\) and column \(j\), assuming independence (i.e., expected frequency).
The degrees of freedom are given by:
$$ \text{Degrees of freedom}=(r-1)(c-1) $$
Where:
\(r\)= Number of rows.
\(c\)= Number of columns.
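The statistic and its degrees of freedom can be sketched in a few lines of plain Python. This is an illustrative sketch rather than a reference implementation: the function name `chi_square_statistic` and the 2×2 table of counts are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    `observed` is a list of rows, each a list of observed frequencies.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Expected frequency under independence.
            e_ij = row_totals[i] * col_totals[j] / overall_total
            statistic += (o_ij - e_ij) ** 2 / e_ij

    degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2x2 table (not from the text): every expected frequency is 15.
toy_statistic, toy_df = chi_square_statistic([[10, 20], [20, 10]])
print(round(toy_statistic, 3), toy_df)  # 6.667 1
```

Each of the four cells contributes \((\pm 5)^2/15\), so the statistic is \(100/15 \approx 6.667\) with \((2-1)(2-1)=1\) degree of freedom.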
Example: Testing Independence Based on Contingency Table Data
The following contingency table shows the responses of two categories of investors (employed vs. retired) with regard to their primary investment objectives (growth, income, or both). The total sample size is 173.
$$ \begin{array}{c|c|c|c|c}
& \textbf{Growth} & \textbf{Income} & \textbf{Both} & \textbf{Total} \\ \hline
\text{Employed} & 52 & 25 & 10 & \bf{87} \\ \hline
\text{Retired} & 32 & 47 & 7 & \bf{86} \\ \hline
\textbf{Total} & \bf{84} & \bf{72} & \bf{17} & \bf{173}
\end{array} $$
Use a 5% level of significance to test whether there is any significant difference between employed and retired investors concerning primary investment objectives.
Solution
\(H_0\): There is no significant difference between employed and retired investors with regard to primary investment objectives.
\(H_a\): There is a significant difference between employed and retired investors with regard to primary investment objectives.
Step 1: We calculate the expected frequency of investors by their category (employed vs. retired) and investment objective using the following formula:
$$ E_{ij}=\frac{\left(\text{Total row i}\right)\times\left(\text{Total column j}\right)}{\text{Overall Total} } $$
$$ \begin{array}{c|c|c|c|c}
& \textbf{Growth} & \textbf{Income} & \textbf{Both} & \textbf{Total} \\ \hline
\text{Employed} & {\frac{\left(87\times84\right)}{173} =42.24} & {\frac{\left(87\times72\right)}{173} =36.21} & {\frac{\left(87\times17\right)}{173} =8.55} &\bf{87} \\ \hline
\text{Retired} & {\frac{\left(86\times84\right)}{173} =41.76} & {\frac{\left(86\times72\right)}{173} =35.79} & {\frac{\left(86\times17\right)}{173} =8.45} & \bf{86} \\ \hline
\textbf{Total} & \bf{84} & \bf{72} & \bf{17} & \bf{173}
\end{array} $$
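The expected frequencies in Step 1 can be reproduced with a short Python sketch. The counts come from the contingency table above; the variable names are illustrative only.

```python
# Observed counts from the contingency table
# (rows: Employed, Retired; columns: Growth, Income, Both).
observed = [[52, 25, 10], [32, 47, 7]]

row_totals = [sum(row) for row in observed]        # [87, 86]
col_totals = [sum(col) for col in zip(*observed)]  # [84, 72, 17]
overall_total = sum(row_totals)                    # 173

# E_ij = (total of row i) * (total of column j) / overall total
expected = [[r * c / overall_total for c in col_totals] for r in row_totals]

for label, row in zip(["Employed", "Retired"], expected):
    print(label, [round(e, 2) for e in row])
```

A useful sanity check is that the expected frequencies in each row and column add up to the same totals as the observed counts.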
Step 2: We calculate the scaled squared deviation, \(\frac{\left(O_{ij}-E_{ij}\right)^2}{E_{ij}}\), for each combination of investor category and investment objective. The expected frequencies are shown to two decimal places, but the results use the unrounded values:
$$ \begin{array}{c|c|c|c}
& \textbf{Growth} & \textbf{Income} & \textbf{Both} \\ \hline
\text{Employed} & \frac{\left(52-42.24\right)^2}{42.24}=2.254 & \frac{\left(25-36.21\right)^2}{36.21}=3.469 & \frac{\left(10-8.55\right)^2}{8.55}=0.246 \\ \hline
\text{Retired} & \frac{\left(32-41.76\right)^2}{41.76}=2.280 & \frac{\left(47-35.79\right)^2}{35.79}=3.510 & \frac{\left(7-8.45\right)^2}{8.45}=0.249 \\ \hline
\textbf{Total} & \bf{4.534} & \bf{6.979} & \bf{0.495}
\end{array} $$
Step 3: We calculate the value of \(\chi^2\):
$$ \chi^2=4.534+6.979+0.495=12.008 $$
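As a cross-check on Steps 2 and 3, the per-cell contributions, their column totals, and the overall statistic can be recomputed from the observed counts. This is a minimal sketch with illustrative variable names; it uses unrounded expected frequencies throughout.

```python
# Observed counts (rows: Employed, Retired; columns: Growth, Income, Both).
observed = [[52, 25, 10], [32, 47, 7]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
overall_total = sum(row_totals)

# Scaled squared deviation (O - E)^2 / E for every cell.
contributions = []
for row, r_total in zip(observed, row_totals):
    cells = []
    for o, c_total in zip(row, col_totals):
        e = r_total * c_total / overall_total
        cells.append((o - e) ** 2 / e)
    contributions.append(cells)

column_sums = [sum(col) for col in zip(*contributions)]
chi_square = sum(column_sums)

print([round(s, 3) for s in column_sums])  # [4.534, 6.979, 0.495]
print(round(chi_square, 3))                # 12.008
```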
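Step 4: We determine the critical value of \(\chi^2\). The table has 2 rows and 3 columns, so there are \((r-1)(c-1)=(2-1)(3-1)=2\) degrees of freedom. At a 5% level of significance with 2 degrees of freedom, the critical value from the chi-square table is 5.99.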
Decision rule: The calculated value of \(\chi^2 =12.008\) is greater than the critical value of 5.99, so we reject the null hypothesis. As such, sufficient evidence supports the conclusion that retired and employed investors have different primary investment objectives.
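For the special case of 2 degrees of freedom, which applies in this example, the chi-square distribution has the closed-form upper-tail probability \(P(\chi^2 > x)=e^{-x/2}\). Under that assumption, the critical value and the p-value can be checked with the Python standard library alone. The sketch below does not generalize to other degrees of freedom, where a chi-square table or a statistics library would be needed; the variable names are illustrative.

```python
import math

alpha = 0.05         # 5% level of significance
chi_square = 12.008  # test statistic from Step 3

# With 2 degrees of freedom, P(X > x) = exp(-x / 2),
# so the critical value solves exp(-x / 2) = alpha.
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi_square / 2)

print(round(critical_value, 2))     # 5.99
print(chi_square > critical_value)  # True
print(p_value < alpha)              # True
```

Comparing the statistic with the critical value and comparing the p-value with the significance level are equivalent ways of reaching the same decision.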
Question
Regarding the chi-square test of independence, which statement is accurate? The chi-square test of independence is:
A. A parametric hypothesis test.
B. Used to test whether two categorical variables are related to each other.
C. Used to test whether two continuous variables are related to each other.
Solution
The correct answer is B. The chi-square test of independence is a non-parametric hypothesis test that can be used to test whether two categorical variables are related.
A is incorrect because the chi-square test of independence is non-parametric, not parametric.
C is incorrect because the chi-square test of independence is used for categorical variables, not continuous variables.