Let \(\text{X}\) and \(\text{Y}\) be two discrete random variables, with a joint probability mass function, \(\text{f}(\text{x}, \text{y})\). Then, the random variables \(\text{X}\) and \(\text{Y}\) are said to be independent if and only if,
$$
\text{f}(\text{x}, \text{y})=\text{f}(\text{x}) * \text{f}(\text{y}), \quad \text { for all } \text{x}, \text{y}
$$
However, if the above condition does not hold, then the two random variables \(\text{X}\) and \(\text{Y}\) are dependent.
In this reading, we focus on the relationship between two or more discrete random variables. If we have two dependent discrete random variables, \(X\) and \(Y\), we would like to establish how one varies with respect to the other. If \(\text{X}\) increases, for example, does \(\text{Y}\) tend to increase or decrease? And if so, how strong is the dependence between the two? Two measures that help answer these questions are the covariance and the correlation coefficient.
Let \(\text{X}\) and \(\text{Y}\) be two discrete random variables. The covariance of \(\text{X}\) and \(\text{Y}\), denoted \(\operatorname{Cov}[\text{X}, \text{Y}]\), is defined by:
$$
\operatorname{Cov}[\text{X}, \text{Y}]=\text{E}[(\text{X}-\text{E}[\text{X}])(\text{Y}-\text{E}[\text{Y}])]
$$
This simplifies to:
$$
\operatorname{Cov}[\text{X}, \text{Y}]=\text{E}[\text{XY}]-\text{E}[\text{X}] * \text{E}[\text{Y}]
$$
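The intermediate steps are not shown above. One way to fill them in, using only the linearity of expectation and the fact that \(\text{E}[\text{X}]\) and \(\text{E}[\text{Y}]\) are constants, is:
$$
\begin{align}
\operatorname{Cov}[\text{X}, \text{Y}] &=\text{E}[(\text{X}-\text{E}[\text{X}])(\text{Y}-\text{E}[\text{Y}])] \\
&=\text{E}\big[\text{XY}-\text{X} \, \text{E}[\text{Y}]-\text{Y} \, \text{E}[\text{X}]+\text{E}[\text{X}] \, \text{E}[\text{Y}]\big] \\
&=\text{E}[\text{XY}]-\text{E}[\text{X}] \, \text{E}[\text{Y}]-\text{E}[\text{Y}] \, \text{E}[\text{X}]+\text{E}[\text{X}] \, \text{E}[\text{Y}] \\
&=\text{E}[\text{XY}]-\text{E}[\text{X}] \, \text{E}[\text{Y}]
\end{align}
$$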
Recall that in the univariate case, we defined the variance of a discrete random variable, \(\text{X}\), as:
$$
\operatorname{Var}(\text{X})=\text{E}\left(\text{X}^{2}\right)-[\text{E}(\text{X})]^{2}
$$
This can be expressed as:
$$
\operatorname{Var}(\text{X})=\text{E}\left(\text{X}^{2}\right)-[\text{E}(\text{X})]^{2}=\text{E}[\text{X} * \text{X}]-\text{E}[\text{X}] * \text{E}[\text{X}]
$$
Thus, we can conclude that,
$$
\operatorname{Var}(\text{X})=\operatorname{Cov}[\text{X}, \text{X}]
$$
Note:
The units of \(\operatorname{Cov}(\text{X}, \text{Y})\) are the product of those of \(\text{X}\) and \(\text{Y}\). So, for example, if \(X\) is time in hours, and \(\mathrm{Y}\) is a sum of money in \(\$\), then Cov is in \(\$\) hours.
\(\text{E}[\text{XY}]\) can be computed directly from the joint probability function, \(\text{f}(\text{x}, \text{y})\) while \(\text{E}[\text{X}]\) and \(\text{E}[\text{Y}]\) can be computed using their respective marginal probability functions, i.e.,
$$
\begin{align}
\text{E(X Y)} &=\sum_{\text {all x}} \sum_{\text {all y}} \text{xy} * \text{f(x, y)} \\
\text{E[X]} &=\sum_{\text{x}} \text{x} *\text{ f(x)} ; \text { and } \\
\text{E[Y]} &=\sum_{\text{y}} \text{y} * \text{f(y)}
\end{align}
$$
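As a minimal sketch of these three sums, the Python snippet below stores a joint probability mass function as a dictionary and computes the covariance from it. The joint distribution used here is hypothetical and chosen only for illustration, and the helper names `marginals` and `covariance` are assumptions, not part of the reading.

```python
from collections import defaultdict

# Hypothetical joint pmf f(x, y), stored as {(x, y): probability}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def marginals(joint):
    """Return the marginal pmfs of X and Y as two dictionaries."""
    fx, fy = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        fx[x] += p  # sum over y for each fixed x
        fy[y] += p  # sum over x for each fixed y
    return dict(fx), dict(fy)


def covariance(joint):
    """Cov[X, Y] = E[XY] - E[X] * E[Y] for a discrete joint pmf."""
    fx, fy = marginals(joint)
    e_xy = sum(x * y * p for (x, y), p in joint.items())  # from the joint pmf
    e_x = sum(x * p for x, p in fx.items())               # from the marginal of X
    e_y = sum(y * p for y, p in fy.items())               # from the marginal of Y
    return e_xy - e_x * e_y


cov_xy = covariance(joint)
print(round(cov_xy, 4))
```

For this hypothetical table, \(\text{E}[\text{XY}]=0.4\) and \(\text{E}[\text{X}]=\text{E}[\text{Y}]=0.5\), so the covariance is \(0.4-0.25=0.15\).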
Let \(\text{X}, \text{Y}\), and \(\text{Z}\) be random variables and let \(\text{a}, \text{b}, \text{c}\), and \(\text{d}\) be constants.
Then, the following properties hold:
i. \(\operatorname{Cov}(\text{X}, \text{Y})=\operatorname{Cov}(\text{Y}, \text{X})\)
ii. \(\operatorname{Cov}(\text{X}, \text{X})=\operatorname{Var}(\text{X})\)
iii. \(\operatorname{Cov}(a X, b Y)=a b \operatorname{Cov}(X, Y)\)
iv. \(\operatorname{Cov}[a X+b, c Y+d]=a c \cdot \operatorname{Cov}[X, Y]\)
v. \(\operatorname{Cov}[\text{X}, \text{Y}+\text{Z}]=\operatorname{Cov}[\text{X}, \text{Y}]+\operatorname{Cov}[\text{X}, \text{Z}]\)
vi. If \(X\) and \(Y\) are independent, \(\operatorname{Cov}[X, Y]=0\)
vii. \(\operatorname{Cov}(\text{X}, \text{c})=\text{E}[(\text{X}-\text{E}(\text{X}))(\text{c}-\text{c})]=\text{E}(0)=0\)
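The properties above can be spot-checked numerically. The sketch below assumes a small, hypothetical joint distribution for \((X, Y, Z)\) and arbitrary constants, and uses exact fractions so that equalities can be tested without rounding. It illustrates properties iv, v, and vii; it is not a proof.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y, Z), stored as {(x, y, z): probability}.
pmf = {
    (0, 1, 2): F(1, 4),
    (1, 0, 1): F(1, 4),
    (1, 2, 0): F(1, 8),
    (2, 1, 1): F(3, 8),
}


def expect(g):
    """E[g(X, Y, Z)] under the joint pmf."""
    return sum(g(x, y, z) * p for (x, y, z), p in pmf.items())


def cov(g, h):
    """Cov[g, h] = E[g * h] - E[g] * E[h]."""
    return expect(lambda x, y, z: g(x, y, z) * h(x, y, z)) - expect(g) * expect(h)


def X(x, y, z):
    return x


def Y(x, y, z):
    return y


def Z(x, y, z):
    return z


a, b, c, d = 3, -2, 5, 7  # arbitrary constants, chosen for illustration

# Property iv: Cov[aX + b, cY + d] = ac * Cov[X, Y]
lhs_iv = cov(lambda x, y, z: a * x + b, lambda x, y, z: c * y + d)
rhs_iv = a * c * cov(X, Y)

# Property v: Cov[X, Y + Z] = Cov[X, Y] + Cov[X, Z]
lhs_v = cov(X, lambda x, y, z: y + z)
rhs_v = cov(X, Y) + cov(X, Z)

# Property vii: the covariance of X with a constant is zero
cov_const = cov(X, lambda x, y, z: c)

print(lhs_iv == rhs_iv, lhs_v == rhs_v, cov_const == 0)
```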
A motor insurance company covers \(100 \%\) of losses that occur due to accident and only \(90 \%\) of losses that occur due to theft. Let \(X\) be the number of accident claims, and let \(Y\) be the number of theft claims. You are also provided with the following information:
\(\text{E}(\text{X})=100 ; \text{SD}(\text{X})=25 ; \text{E}(\text{Y})=30 ; \text{SD}(\mathrm{Y})=20\) and \(\operatorname{Cov}(\text{X}, \text{Y})=1000\)
Find the covariance between the accident claims and the insurance coverage.
Solution
Let \(\text{Z}\) be the insurance coverage so that,
$$
\text{Z}=\text{X}+0.90 \text{Y}
$$
We wish to find \(\operatorname{Cov}(\text{X}, \text{Z})=\operatorname{Cov}(\text{X}, \text{X}+0.90 \text{Y})\). At this point, we can apply the properties of covariance, i.e.,
$$
\begin{align}
\operatorname{Cov}(\text{X}, \text{X}+0.90 \text{Y}) &=\operatorname{Cov}(\text{X}, \text{X})+\operatorname{Cov}(\text{X}, 0.90 \text{Y}) \\
&=\operatorname{Var}(\text{X})+0.90 \operatorname{Cov}(\text{X}, \text{Y})=25^{2}+0.90 \times 1,000 \\
&=1,525\end{align}
$$
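The arithmetic in this solution can be checked in a few lines of Python. The variable names are illustrative; the numbers are the ones given in the question.

```python
# Values given in the example.
sd_x = 25           # SD(X), standard deviation of accident claims
cov_xy = 1000       # Cov(X, Y)
theft_share = 0.90  # proportion of theft losses covered

# Cov(X, X + 0.90Y) = Cov(X, X) + 0.90 * Cov(X, Y) = Var(X) + 0.90 * Cov(X, Y)
var_x = sd_x ** 2
cov_xz = var_x + theft_share * cov_xy
print(cov_xz)
```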
The covariance between \(\text{X}\) and \(\text{Y}\) is a measure of the strength of the “linear association” or “linear relationship” between the variables.
The covariance can be positive or negative, depending on the relationship between the variables \(\text{X}\) and \(\text{Y}\). A positive covariance indicates a positive association between the random variables \(X\) and \(Y\), while a negative covariance indicates a negative association. However, one of the major drawbacks of covariance as a measure of linear relationship is that its value depends on the variables' units of measurement. This can be corrected by computing the correlation coefficient, a dimensionless (unitless) quantity.
The correlation coefficient of two discrete random variables \(X\) and \(Y\), usually written as \(\text{Corr}(\text{X}, \text{Y})\) or \(\rho(\text{X}, \text{Y})\), is defined by:
$$
\rho(\text{X}, \text{Y})=\frac{\operatorname{Cov}(\text{X}, \text{Y})}{\sqrt{\operatorname{Var}(\text{X}) * \operatorname{Var}(\text{Y})}}=\frac{\operatorname{Cov}(\text{X}, \text{Y})}{\sigma_{\text{X}} * \sigma_{\text{Y}}}
$$
\(\operatorname{Var}(\text{X})\) and \(\operatorname{Var}(\text{Y})\) can be computed from their respective marginal distribution functions.
$$
\operatorname{Var}(\text{X})=\text{E}\left(\text{X}^{2}\right)-[\text{E}(\text{X})]^{2}
$$
and,
$$
\operatorname{Var}(\text{Y})=\text{E}\left(\text{Y}^{2}\right)-[\text{E}(\text{Y})]^{2}
$$
i. The correlation coefficient between any two random variables \(\text{X}\) and \(\text{Y}\) satisfies,
$$
-1 \leq \rho(\text{X}, \text{Y}) \leq 1
$$
ii. Let \(\text{Y}=\text{mX}+\text{c} ; \text{m} \neq 0\), then,
$$
\rho(\text{X, Y})=\left\{\begin{array}{rc}
1, & \text { if } m>0 \\
-1, & \text { if } m<0
\end{array}\right.
$$
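Property ii follows from the covariance properties listed earlier. A short derivation, assuming \(\operatorname{Var}(\text{X})>0\) so that the ratio is defined, is:
$$
\begin{align}
\operatorname{Cov}(\text{X}, \text{mX}+\text{c}) &=\text{m} \operatorname{Cov}(\text{X}, \text{X})=\text{m} \operatorname{Var}(\text{X}) \\
\operatorname{Var}(\text{mX}+\text{c}) &=\text{m}^{2} \operatorname{Var}(\text{X}) \\
\rho(\text{X}, \text{Y}) &=\frac{\text{m} \operatorname{Var}(\text{X})}{\sqrt{\operatorname{Var}(\text{X}) * \text{m}^{2} \operatorname{Var}(\text{X})}}=\frac{\text{m}}{|\text{m}|}
\end{align}
$$
The ratio \(\text{m} /|\text{m}|\) equals \(1\) when \(\text{m}>0\) and \(-1\) when \(\text{m}<0\).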
The correlation coefficient is a measure of the degree of linearity between two random variables, \(\text{X}\) and \(\text{Y}\). A value of \(\rho\) near \(+1\) or \(-1\) indicates a high degree of linearity between \(\text{X}\) and \(\text{Y}\), whereas a value near 0 indicates little or no linear relationship. A positive value of \(\rho\) indicates that \(\text{Y}\) tends to increase when \(\text{X}\) does, whereas a negative value indicates that \(\text{Y}\) tends to decrease when \(\text{X}\) increases. If \(\rho=0\), then \(\text{X}\) and \(\text{Y}\) are said to be uncorrelated.
An actuary wishes to determine the number of accidents in two neighboring towns, \(M\) and \(N\). Let \(X\) be the number of accidents in town \(M\), and let \(Y\) be the number of accidents in town \(N\). The actuary has established that \(X\) and \(Y\) are jointly distributed as in the table below:
$$\begin{array}{c|c|c|c}
\text{Y} \backslash \text{X} & 0 & 1 & 2 \\
\hline 1 & 0.1 & 0.1& 0 \\
\hline 2 & 0.1& 0.1 & 0.2 \\
\hline 3 & 0.2 & 0.1& 0.1\\
\end{array}$$
Calculate \(\operatorname{Cov}(\text{X}, \text{Y})\).
We will use the formula
$$
\operatorname{Cov(X, Y)}=\text{E[X Y]}-\text{E[X]E[Y]}
$$
Using data from the table,
$$
\begin{align}
\text{E}(\text{XY}) &=\sum_{\text {all } \text{x}} \sum_{\text{all y}} \text{xy} * \text{f}(\text{x}, \text{y}) \\
&=[0 \times 1] \times 0.1+[1 \times 1] \times 0.1+\cdots+[2 \times 3] \times 0.1=2
\end{align}
$$
The (marginal) probability mass function of \(\text{X}\) is:
$$\begin{array}{c|c|c|c}\text{X}& 0 & 1 & 2 \\
\hline \text{P}(\text{X}=\text{x}) & 0.4 & 0.3 & 0.3 \\
\end{array}$$
Thus,
$$
\text{E}(\text{X})=0 \times 0.4+1 \times 0.3+2 \times 0.3=0.9
$$
The (marginal) probability mass function of \(\text{Y}\) is:
$$\begin{array}{c|c|c|c}\text{Y}& 1 & 2 & 3 \\
\hline \text{P}(\text{Y}=\text{y})& 0.2& 0.4 & 0.4 \\
\end{array}$$
Thus,
$$
\text{E(Y)}=1 \times 0.2+2 \times 0.4+3 \times 0.4=2.2
$$
Hence,
$$
\operatorname{Cov}(\text{X, Y})=2-0.9 \times 2.2=0.02
$$
Using the results of the previous example (accidents in towns \(M\) and \(N\)), find the correlation coefficient between \(\text{X}\) and \(\text{Y}\).
We know that,
$$
\operatorname{Corr}(\text{X}, \text{Y})=\frac{\operatorname{Cov}(\text{X}, \text{Y})}{\sqrt{\operatorname{Var}(\text{X}) * \operatorname{Var}(\text{Y})}}
$$
Using the respective marginal distributions, we can calculate \(\operatorname{Var}(\text{X})\) and \(\operatorname{Var}(\text{Y})\), i.e.,
$$
\begin{align}
\operatorname{Var}(\text{X}) &=\text{E}\left(\text{X}^{2}\right)-[\text{E}(\text{X})]^{2} \\
&=\left[0^{2} \times 0.4+1^{2} \times 0.3+2^{2} \times 0.3\right]-0.9^{2}=0.69\end{align}
$$
Similarly,
$$
\begin{align}
\operatorname{Var}(\text{Y}) &=\text{E}\left(\text{Y}^{2}\right)-[\text{E}(\text{Y})]^{2} \\
&=\left[1^{2} \times 0.2+2^{2} \times 0.4+3^{2} \times 0.4\right]-2.2^{2}=0.56\end{align}
$$
Therefore,
$$
\operatorname{Corr}(X, Y)=\frac{0.02}{\sqrt{0.69 \times 0.56}} \approx 0.03
$$
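The covariance and correlation for the two-town table can be reproduced with a short Python sketch. It reads the table with the values of \(X\) across the columns and the values of \(Y\) down the rows, which is the orientation that agrees with the marginal distributions used in the solution. The helper name `moment` is an assumption made for this illustration.

```python
from math import sqrt

# Joint pmf from the table, keyed by (x, y):
# x = accidents in town M (columns 0, 1, 2), y = accidents in town N (rows 1, 2, 3).
joint = {
    (0, 1): 0.1, (1, 1): 0.1, (2, 1): 0.0,
    (0, 2): 0.1, (1, 2): 0.1, (2, 2): 0.2,
    (0, 3): 0.2, (1, 3): 0.1, (2, 3): 0.1,
}


def moment(g):
    """E[g(X, Y)] computed directly from the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_x = moment(lambda x, y: x)
e_y = moment(lambda x, y: y)
cov_xy = moment(lambda x, y: x * y) - e_x * e_y
var_x = moment(lambda x, y: x ** 2) - e_x ** 2
var_y = moment(lambda x, y: y ** 2) - e_y ** 2
corr_xy = cov_xy / sqrt(var_x * var_y)

print(round(cov_xy, 4), round(var_x, 4), round(var_y, 4), round(corr_xy, 4))
```

Summing over the joint table gives the same moments as summing over the marginals, so the sketch skips building the marginal tables explicitly.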
An actuary wishes to determine the relationship between the yearly number of days of hurricanes on two neighboring coasts, \(A\) and \(B\). Let \(X\) be the yearly number of days of hurricanes on coast \(A\), and let \(Y\) be the yearly number of days of hurricanes on coast \(B\). \(X\) and \(Y\) are jointly distributed as:
$$
\text{f(x, y)}=\frac{1}{33}(x+2 y) \quad x=1,2 \quad y=1,2,3
$$
Compute \(\operatorname{Corr}(\text{X}, \text{Y})\)
First, we need,
$$
\begin{align}
\text{E}(\text{XY})=& \sum_{\text {all } \text{x}} \sum_{\text{all} \text{y}} \text{xy} \text{f}(\text{x}, \text{y})=\sum_{\text{x}=1}^{2} \sum_{\text{y}=1}^{3} \text{xy} \frac{\text{x}+2 \text{y}}{33} \\
=&(1)(1) \frac{(1)+2(1)}{33}+(1)(2) \frac{(1)+2(2)}{33}+(1)(3) \frac{(1)+2(3)}{33}+(2)(1) \frac{(2)+2(1)}{33} \\
&+(2)(2) \frac{(2)+2(2)}{33}+(2)(3) \frac{(2)+2(3)}{33} \\
=&(1) \frac{3}{33}+(2) \frac{5}{33}+(3) \frac{7}{33}+(2) \frac{4}{33}+(4) \frac{6}{33}+(6) \frac{8}{33}=\frac{38}{11}
\end{align}
$$
We also need \(\operatorname{Var}(\text{X})\) and \(\operatorname{Var}(\text{Y})\). As such, we need to find the marginal probability mass functions for \(\text{X}\) and \(\text{Y}\) first.
The marginal probability mass function of \(\text{X}\) is given by:
$$
\text{f}_{\text{X}}(\text{x})=\sum_{\text{y=1}}^{3} \frac{1}{33}(\text{x}+2 \text{y})=\frac{\text{x}+2(1)}{33}+\frac{\text{x}+2(2)}{33}+\frac{\text{x}+2(3)}{33}=\frac{3\text{x}+12}{33}, \quad \text{ for x=1,2}
$$
Now,
$$
\text{E(X)}=\sum_{\text {all x}} \text{x} * \text{f}_{\text{X}}(\text{x})=\sum_{\text{x}=1}^{2} \text{x} * \frac{3\text{x}+12}{33}=(1) \frac{3(1)+12}{33}+(2) \frac{3(2)+12}{33}=\frac{51}{33}=\frac{17}{11}
$$
and,
$$
\text{E}\left(\text{X}^{2}\right)=\sum_{\text {all x}} \text{x}^{2} * \text{f}_{\text{X}}(\text{x})=\sum_{\text{x}=1}^{2} \text{x}^{2} \frac{3 \text{x}+12}{33}=(1)^{2} \frac{3(1)+12}{33}+(2)^{2} \frac{3(2)+12}{33}=\frac{87}{33}=\frac{29}{11}
$$
Thus,
$$
\operatorname{Var}(\text{X})=\text{E}\left(\text{X}^{2}\right)-[\text{E}(\text{X})]^{2}=\frac{29}{11}-\left(\frac{17}{11}\right)^{2}=\frac{30}{121}
$$
Similarly, the marginal probability mass function for \(\text{Y}\) is given by:
$$
\text{f}_{\text{Y}}(\text{y})=\sum_{\text{x=1}}^{2} \frac{1}{33}(\text{x}+2 y)=\frac{(1)+2 y}{33}+\frac{(2)+2\text{y}}{33}=\frac{4\text{y}+3}{33}, \text{for y=1,2,3}
$$
The mean and the variance of \(Y\) can be calculated as follows:
$$
\text{E(Y)}=\sum_{\text{y=1}}^{3} \text{y} \frac{\text{4y}+3}{33}=(1) \frac{4(1)+3}{33}+(2) \frac{4(2)+3}{33}+(3) \frac{4(3)+3}{33}=\frac{74}{33}
$$
and,
$$
\text{E}\left(\text{Y}^{2}\right)=\sum_{\text{y}=1}^{3} \text{y}^{2} \frac{4 \text{y}+3}{33}=(1)^{2} \frac{4(1)+3}{33}+(2)^{2} \frac{4(2)+3}{33}+(3)^{2} \frac{4(3)+3}{33}=\frac{62}{11}
$$
Thus,
$$
\operatorname{Var}(\text{Y})=\text{E}\left(\text{Y}^{2}\right)-[\text{E}(\text{Y})]^{2}=\frac{62}{11}-\left(\frac{74}{33}\right)^{2}=\frac{662}{1089}
$$
The covariance of \(\text{X}\) and \(\text{Y}\) is
$$
\operatorname{Cov}(\text{X, Y})=\text{E(X Y)}-\text{E(X)}\text{E(Y)}=\frac{38}{11}-\frac{17}{11} \times \frac{74}{33}=-\frac{4}{363}
$$
Hence,
$$
\rho(X, Y)=\frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{var}(X) \operatorname{var}(Y)}}=\frac{-\frac{4}{363}}{\sqrt{\frac{662}{1089} * \frac{30}{121}}}=-0.02838
$$
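Because every quantity in this example is a ratio of small integers, the fractions can be checked exactly. The sketch below uses Python's `fractions.Fraction` and converts to floating point only for the final square root. The helper name `moment` is an assumption made for this illustration.

```python
from fractions import Fraction
from math import sqrt

# f(x, y) = (x + 2y) / 33 for x = 1, 2 and y = 1, 2, 3
joint = {(x, y): Fraction(x + 2 * y, 33) for x in (1, 2) for y in (1, 2, 3)}
total = sum(joint.values())  # should equal 1 for a valid pmf


def moment(g):
    """E[g(X, Y)] as an exact fraction."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_x = moment(lambda x, y: x)
e_y = moment(lambda x, y: y)
e_xy = moment(lambda x, y: x * y)
var_x = moment(lambda x, y: x * x) - e_x ** 2
var_y = moment(lambda x, y: y * y) - e_y ** 2
cov_xy = e_xy - e_x * e_y

# The square root is irrational here, so switch to floats for the last step.
rho = float(cov_xy) / sqrt(float(var_x * var_y))

print(e_x, e_y, e_xy, var_x, var_y, cov_xy, round(rho, 5))
```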
Let \(X\) be the number of days of sickness for patient \(A\), and let \(Y\) be the number of days of sickness for patient \(B\). \(X\) and \(Y\) are jointly distributed as:
$$
\text{f(x, y)}=\text{c}\left(\text{x}^{2}+3\text{y}\right) \quad \text{x=1,2,3,4}; \quad \text{y=1,2}
$$
Determine \(\rho(\text{X}, \text{Y})\)
First, we need to find the value of \(\text{c}\) and then proceed to determine the marginal functions.
We know that:
$$
\begin{aligned}& \sum_{\text{x}} \sum_{\text{y}} \text{f(x, y)}=1 \\&
\Rightarrow c\left(1^{2}+3(1)\right)+c\left(1^{2}+3(2)\right)+\cdots+c\left(4^{2}+3(2)\right)=1\\
&\Rightarrow 4 \mathrm{c}+7 \mathrm{c}+7 \mathrm{c}+10 \mathrm{c}+12 \mathrm{c}+15 \mathrm{c}+19 \mathrm{c}+22 \mathrm{c}=96 \mathrm{c}=1 \\
&\therefore \mathrm{c}=\frac{1}{96}
\end{aligned}
$$
We know that,
$$
\rho(\text{X}, \text{Y})=\frac{\operatorname{Cov}(\text{X}, \text{Y})}{\sqrt{\operatorname{Var}(\text{X}) * \operatorname{Var}(\text{Y})}}
$$
We need to compute \(\operatorname{Cov}(\text{X}, \text{Y}), \operatorname{Var}(\text{X})\) and \(\operatorname{Var}(\text{Y})\).
Now,
$$
\begin{align}
\text{E(X Y)} &=\sum_{\text{x}=1}^{4} \sum_{\text{y}=1}^{2} \text{xy} \frac{\text{x}^{2}+3\text{y}}{96} \\&=(1)(1) \frac{4}{96}+(1)(2) \frac{7}{96}+(2)(1) \frac{7}{96}+(2)(2) \frac{10}{96}+(3)(1) \frac{12}{96}+(3)(2) \frac{15}{96} \\
&+(4)(1) \frac{19}{96}+(4)(2) \frac{22}{96}=\frac{75}{16}
\end{align}
$$
We also need \(\operatorname{Var}(\text{X})\) and \(\operatorname{Var}(\text{Y})\). As such, we need to find the marginal probability mass functions for \(\text{X}\) and \(\text{Y}\) before we proceed:
$$
\begin{gather}
\text{f}_{\text{X}}(\text{x})=\frac{2 \text{x}^{2}+9}{96}, \text { for } \text{x}=1,2,3,4 \text{ and } \text{f}_{\text{Y}}(\text{y})=\frac{12 \text{y}+30}{96}, \text { for } \text{y}=1,2 \\\\
\Rightarrow \text{E}(\text{X})=\sum_{\text{x}=1}^{4} \text{x} * \text{f}_{\text{X}}(\text{x})=\sum_{\text{x}=1}^{4} \text{x} \frac{2 \text{x}^{2}+9}{96}=(1) \frac{11}{96}+(2) \frac{17}{96}+(3) \frac{27}{96}+(4) \frac{41}{96}=\frac{145}{48}
\end{gather}
$$
and,
$$
\begin{aligned}
\operatorname{Var}(\text{X}) &=\sum_{\text{x}=1}^{4} \text{x}^{2} \text{f}_{\text{X}}(\text{x})-[\text{E}(\text{X})]^{2}=\sum_{\text{x}=1}^{4} \text{x}^{2} \frac{2 \text{x}^{2}+9}{96}-\left(\frac{145}{48}\right)^{2} \\
&=(1)^{2} \frac{11}{96}+(2)^{2} \frac{17}{96}+(3)^{2} \frac{27}{96}+(4)^{2} \frac{41}{96}-\left(\frac{145}{48}\right)^{2}=\frac{163}{16}-\left(\frac{145}{48}\right)^{2}=1.062
\end{aligned}
$$
Similarly,
$$
\text{E}(\text{Y})=\sum_{\text{y}=1}^{2} \text{y} * \text{f}_{\text{Y}}(\text{y})=\sum_{\text{y}=1}^{2} \text{y} \frac{12 \text{y}+30}{96}=(1) \frac{42}{96}+(2) \frac{54}{96}=\frac{25}{16}
$$
and,
$$
\operatorname{Var}(\text{Y})=\sum_{\text{y}=1}^{2} \text{y}^{2} \frac{12 \text{y}+30}{96}-\left(\frac{25}{16}\right)^{2}=(1)^{2} \frac{42}{96}+(2)^{2} \frac{54}{96}-\left(\frac{25}{16}\right)^{2}=\frac{63}{256}
$$
Therefore,
$$
\operatorname{Cov}(\text{X, Y})=\text{E(X Y)}-\text{E(X)E(Y)}=\frac{75}{16}-\left(\frac{145}{48}\right)\left(\frac{25}{16}\right)=-\frac{25}{768}
$$
And lastly,
$$
\rho(\text{X, Y})=-\frac{\frac{25}{768}}{\sqrt{1.062 \times\left(\frac{63}{256}\right)}}=-0.0637
$$
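The same steps, starting from the unknown constant \(\text{c}\), can be sketched in Python. The normalizing constant is recovered by requiring the probabilities to sum to 1, and exact fractions are kept until the final square root. The names `weights` and `moment` are assumptions made for this illustration.

```python
from fractions import Fraction
from math import sqrt

# Unnormalized weights x^2 + 3y on the support x = 1..4, y = 1, 2
weights = {(x, y): x ** 2 + 3 * y for x in (1, 2, 3, 4) for y in (1, 2)}

# The pmf must sum to 1, so c is the reciprocal of the total weight.
c = Fraction(1, sum(weights.values()))
joint = {xy: c * w for xy, w in weights.items()}


def moment(g):
    """E[g(X, Y)] as an exact fraction."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_x = moment(lambda x, y: x)
e_y = moment(lambda x, y: y)
cov_xy = moment(lambda x, y: x * y) - e_x * e_y
var_x = moment(lambda x, y: x * x) - e_x ** 2
var_y = moment(lambda x, y: y * y) - e_y ** 2
rho = float(cov_xy) / sqrt(float(var_x * var_y))

print(c, e_x, e_y, cov_xy, var_x, var_y, round(rho, 4))
```

In exact form, \(\operatorname{Var}(\text{X})=\frac{2447}{2304}\), which rounds to the value \(1.062\) used above.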
Learning Outcome
Topic 3. e: Multivariate Random Variables – Calculate joint moments, such as the covariance and the correlation coefficient.