Let \(X\) and \(Y\) be two discrete random variables, with a joint probability mass function, \(f\left(x, y\right)\). Then, the random variables \(X\) and \(Y\) are said to be independent if and only if,
$$ f\left(x,\ y\right)=f\left(x\right)\times f\left(y\right),\ \ \ \ \ \text{for all } x, y $$
However, if the above condition does not hold, then the two random variables, \(X\) and \(Y\), are dependent.
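As a rough illustration of the definition above, the sketch below checks the factorization condition cell by cell. The two joint distributions and the helper name `is_independent` are hypothetical and are not taken from this reading; exact fractions are used so that the equality test is not affected by rounding.

```python
from fractions import Fraction as F

def is_independent(joint):
    """Return True if f(x, y) == f_X(x) * f_Y(y) for every (x, y) pair."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    # Marginals: sum the joint pmf over the other variable.
    fx = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    fy = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == fx[x] * fy[y] for x in xs for y in ys)

# Hypothetical joint pmfs (illustration only), stored as {(x, y): probability}.
independent_pmf = {(0, 0): F(1, 6), (0, 1): F(1, 3), (1, 0): F(1, 6), (1, 1): F(1, 3)}
dependent_pmf = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1, 2)}

print(is_independent(independent_pmf))  # True
print(is_independent(dependent_pmf))    # False
```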
In this reading, we focus on the relationship between two or more discrete random variables. If we have two dependent discrete random variables, \(X\) and \(Y\), we want to establish how one varies with respect to the other. If \(X\) increases, for example, does \(Y\) also tend to increase or decrease? And if so, how strong is the dependence between the two? Two measures that can help us answer these questions are the covariance and the correlation coefficient.
Recall that the variance of a random variable \(X\) is defined as:
$$ \begin{align*} Var\left(X\right) & =E[\left(X-\mu\right)^2] \\
& =E\left[X^2\right]-{(E\left[X\right])}^2 \end{align*} $$
Now, covariance is actually just a generalization of the above definition when we have two random variables.
Let \(X\) and \(Y\) be two discrete random variables. The covariance of \(X\) and \(Y\), denoted \(Cov[X,Y]\), is defined by:
$$ Cov\left[X, Y\right]=E[(X-E\left[X\right])(Y-E[Y])] $$
This simplifies to:
$$ Cov\left[X, Y\right]=E\left[XY\right]-E\left[X\right]\times E[Y] $$
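The simplification follows from expanding the product and applying the linearity of expectation, treating \(E[X]\) and \(E[Y]\) as constants:

$$ \begin{align*}
Cov\left[X, Y\right] & =E\left[XY-X\,E[Y]-Y\,E[X]+E[X]E[Y]\right] \\
& =E\left[XY\right]-E\left[X\right]E\left[Y\right]-E\left[Y\right]E\left[X\right]+E\left[X\right]E\left[Y\right] \\
& =E\left[XY\right]-E\left[X\right]\times E[Y]
\end{align*} $$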
Recall that in the univariate case, we defined the variance of a discrete random variable, \(X\), as:
$$ Var\left(X\right)=E\left(X^2\right)-\left[E\left(X\right)\right]^2 $$
Which can be expressed as:
$$ Var\left(X\right)=E\left(X^2\right)-\left[E\left(X\right)\right]^2=E\left[X\times X\right]-E\left[X\right]\times E[X] $$
Thus, we can conclude that,
$$ Var(X)=Cov\left[X, X\right] $$
Note:
The units of \({Cov}({X},{Y})\) are the product of those of \({X}\) and \({Y}\). So, for example, if \(X\) is time in hours and \(Y\) is a sum of money in dollars, then the covariance is in dollar-hours.
\(E\left[XY\right]\) can be computed directly from the joint probability function, \(f\left(x, y\right)\) while \(E\left[X\right]\) and \(E[Y]\) can be computed using their respective marginal probability functions, i.e.,
$$ \begin{align*}
E\left(XY\right) & =\sum_{all\ x}\sum_{all\ y}{xy\times f\left(x,y\right);} \\
E\left[X\right] & =\sum_{x}{x\times f(x)};\ \text{and} \\
E\left[Y\right] & =\sum_{y}{y\times f\left(y\right) } \end{align*} $$
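The three sums above translate almost directly into code. The following is a minimal sketch, assuming the joint probability function is stored as a dictionary keyed by \((x, y)\); the function name `covariance` and the numbers in `joint` are hypothetical and serve only as an illustration.

```python
def covariance(joint):
    """Cov(X, Y) = E[XY] - E[X]E[Y] for a joint pmf given as {(x, y): f(x, y)}."""
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    # Summing x * f(x, y) over all pairs equals summing x * f_X(x) over x.
    e_x = sum(x * p for (x, _), p in joint.items())
    e_y = sum(y * p for (_, y), p in joint.items())
    return e_xy - e_x * e_y

# Hypothetical joint pmf, for illustration only.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(covariance(joint))  # approximately 0.15 (E[XY] = 0.4, E[X] = E[Y] = 0.5)
```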
Let \(X\), \(Y\), and \(Z\) be random variables, and let \(a\), \(b\), and \(c\) be non-zero constants.
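Then, the following properties hold:
- \(Cov\left(X, Y\right)=Cov\left(Y, X\right)\)
- \(Cov\left(X, X\right)=Var\left(X\right)\)
- \(Cov\left(X, c\right)=0\)
- \(Cov\left(aX, bY\right)=ab\,Cov\left(X, Y\right)\)
- \(Cov\left(X+a, Y+b\right)=Cov\left(X, Y\right)\)
- \(Cov\left(X, Y+Z\right)=Cov\left(X, Y\right)+Cov\left(X, Z\right)\)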
A motor insurance company covers 100% of losses that occur due to accident and only 90% of losses that occur due to theft. Let \(X\) be the number of accident claims, and let \(Y\) be the number of theft claims. You are also provided with the following information:
$$ E\left(X\right)=100; SD\left(X\right)=25; E\left(Y\right)=30; SD\left(Y\right)=20 \text{ and } Cov\left(X, Y\right)=1000 $$
Find the covariance between the accident claims and the insurance coverage.
Solution
Let \(Z\) be the insurance coverage so that,
$$ Z=X+0.90Y $$
We wish to find \(Cov\left(X, Z\right)=Cov\left(X, X+0.90Y\right)\). At this point, we can apply the properties of covariance, i.e.,
$$ \begin{align*}
Cov\left(X, X+0.90Y\right) & =Cov\left(X, X\right)+Cov\left(X, 0.90Y\right)\\
& =Var\left(X\right)+0.90Cov\left(X, Y\right)={25}^2+0.90\times1,000 \\
& =1,525
\end{align*} $$
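The arithmetic in this solution can be reproduced in a few lines. This is only a sketch of the calculation above; the variable names are our own.

```python
# Values given in the example.
sd_x = 25
cov_xy = 1000
theft_share = 0.90  # Z = X + 0.90 * Y

var_x = sd_x ** 2                      # Cov(X, X) = Var(X)
cov_xz = var_x + theft_share * cov_xy  # Cov(X, X) + 0.90 * Cov(X, Y)
print(cov_xz)  # 1525.0
```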
The covariance between \(X\) and \(Y\) is a measure of the strength of the “linear association” or “linear relationship” between the variables.
The covariance can have a positive or a negative sign depending on the relationship between the variables \(X\) and \(Y\). When the covariance is positive, it means we have a positive association between the random variables \(X\) and \(Y\), while a negative covariance implies a negative association exists between the variables \(X\) and \(Y\).
However, one of the major drawbacks of covariance as a measure of linear relationships is that its value depends on the variables’ units of measurement. This can be corrected by computing the correlation coefficient, a dimensionless (unitless) quantity.
Remarks:
$$ Var\left(X+Y\right)=Cov\left(X+Y,X+Y\right) $$
But by algebra, we know that:
$$ \begin{align*}
\left(X+Y\right)\left(X+Y\right) & =X^2+2XY+Y^2 \\
\Rightarrow Var\left(X+Y\right) & =Cov\left(X+Y,X+Y\right)\\ & =Cov\left(X, X\right)+2Cov\left(X, Y\right)+Cov(Y,Y) \\
& =Var(X) +2Cov\left(X, Y\right)+Var(Y)
\end{align*} $$
Now, if \(X\) and \(Y\) are independent, \(Cov\left(X,Y\right)=0\)
$$ \Rightarrow Var\left(X+Y\right)=Var\left(X\right)+Var(Y) $$
Again,
$$ Var\left(X-Y\right)=Cov\left(X-Y,X-Y\right) $$
But by algebra, we know that,
$$ \begin{align*}
\left(X-Y\right)\left(X-Y\right) & =X^2-2XY+Y^2 \\
\Rightarrow Var\left(X-Y\right) & =Cov\left(X-Y,X-Y\right) \\ & =Cov\left(X,X\right)-2Cov\left(X,Y\right)+Cov(Y,Y) \\
& =Var\left(X\right)-2Cov\left(X,Y\right)+Var\left(Y\right)
\end{align*} $$
But, if \(X\) and \(Y\) are independent, \(Cov\left(X,Y\right)=0\)
$$ \Rightarrow Var\left(X-Y\right)=Var\left(X\right)+Var(Y) $$
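The identities for \(Var(X+Y)\) and \(Var(X-Y)\) can be checked numerically on any joint distribution. The sketch below uses a small hypothetical dependent distribution (not from this reading) and exact fractions, computing each variance directly from its definition and comparing it with the right-hand side of the identity.

```python
from fractions import Fraction as F

def expect(joint, g):
    """E[g(X, Y)] under a joint pmf stored as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def var_of(joint, g):
    """Var(g(X, Y)) = E[g^2] - (E[g])^2."""
    return expect(joint, lambda x, y: g(x, y) ** 2) - expect(joint, g) ** 2

# Hypothetical dependent joint pmf, for illustration only.
joint = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(2, 10), (1, 1): F(4, 10)}

var_x = var_of(joint, lambda x, y: x)
var_y = var_of(joint, lambda x, y: y)
cov_xy = (expect(joint, lambda x, y: x * y)
          - expect(joint, lambda x, y: x) * expect(joint, lambda x, y: y))

var_sum = var_of(joint, lambda x, y: x + y)
var_diff = var_of(joint, lambda x, y: x - y)

print(var_sum == var_x + 2 * cov_xy + var_y)   # True
print(var_diff == var_x - 2 * cov_xy + var_y)  # True
```

Here the covariance is \(1/10\), so the variance of the sum (\(69/100\)) exceeds the variance of the difference (\(29/100\)).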
The correlation coefficient of two discrete random variables \(X\) and \(Y\), usually written as \(Corr(X, Y)\) or \(\rho(X, Y)\), is defined by:
$$ \rho\left(X, Y\right)=\frac{Cov(X,Y)}{\sqrt{Var\left(X\right)\times Var\left(Y\right)}}=\frac{Cov(X,Y)}{\sigma_X\times \sigma_Y},\ \ \ \ -1\le\rho\left(X, Y\right)\le 1 $$
\(Var\left(X\right)\) and \(Var\left(Y\right)\) can be computed from their respective marginal distribution functions.
$$ Var\left(X\right)=E\left(X^2\right)-\left[E\left(X\right)\right]^2 $$
and,
$$ Var\left(Y\right)=E\left(Y^2\right)-\left[E\left(Y\right)\right]^2 $$
Properties of Correlation:
Note:
The correlation coefficient is a measure of the degree of linearity between two random variables, \(X\) and \(Y\). A value of \(\rho\) near +1 or -1 indicates a high degree of linearity between \(X\) and \(Y\), whereas a value near 0 indicates that such linearity is absent. A positive value of \(\rho\) indicates that \(Y\) tends to increase when \(X\) does, whereas a negative value indicates that \(Y\) tends to decrease when \(X\) increases. If \(\rho={0}\), then \(X\) and \(Y\) are said to be uncorrelated.
$$ \rho_{X,Y}=\frac{Cov\left(X,Y\right)}{\sigma_X\cdot \sigma_Y},\ \ {-1\le \rho}_{X,Y} \le1 $$
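To see why the correlation coefficient is preferred when units differ, the sketch below re-expresses \(X\) in minutes instead of hours. The joint distribution and the helper `cov_and_corr` are hypothetical, chosen only for illustration: the covariance is multiplied by 60, while the correlation coefficient is unchanged up to floating-point rounding.

```python
import math

def cov_and_corr(joint):
    """Return (Cov(X, Y), Corr(X, Y)) for a joint pmf stored as {(x, y): probability}."""
    e_x = sum(x * p for (x, _), p in joint.items())
    e_y = sum(y * p for (_, y), p in joint.items())
    cov = sum((x - e_x) * (y - e_y) * p for (x, y), p in joint.items())
    var_x = sum((x - e_x) ** 2 * p for (x, _), p in joint.items())
    var_y = sum((y - e_y) ** 2 * p for (_, y), p in joint.items())
    return cov, cov / math.sqrt(var_x * var_y)

# Hypothetical joint pmf: X in hours, Y in dollars (illustration only).
hours = {(1, 10): 0.3, (1, 20): 0.1, (2, 10): 0.2, (2, 20): 0.4}
# Same distribution with X re-expressed in minutes.
minutes = {(60 * x, y): p for (x, y), p in hours.items()}

cov_h, rho_h = cov_and_corr(hours)
cov_m, rho_m = cov_and_corr(minutes)
print(cov_m / cov_h)  # close to 60: covariance depends on the units
print(rho_h, rho_m)   # approximately equal: correlation does not
```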
An actuary analyzes a company’s annual personal auto claims, \(M\), and annual commercial auto claims, \(N\). The analysis reveals that \(Var\left(M\right)=1600\), \(Var\left(N\right)=900\), and the correlation between \(M\) and \(N\) is 0.64.
Calculate \(Var\left(M+N\right)\).
Solution:
We know that
$$ \begin{align*} Var\left(M+N\right) & =Var\left(M\right)+2Cov(M,N)\ +Var\left(N\right) \\
& =1600+2Cov\left(M,N\right)+900=2500+2Cov\left(M,N\right) \end{align*} $$
But we know that,
$$ \begin{align*} \rho_{M,N} & =\frac{Cov(M,N)}{\sigma_M\sigma_N} \ \Rightarrow 0.64=\frac{Cov(M,N)}{(40)(30)} \Rightarrow Cov\left(M,N\right)=768 \\
\therefore Var\left(M+N\right) & =2500+2\left(768\right)=4036
\end{align*} $$
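The same calculation can be written as a short script; this is just a numerical check of the steps above, with variable names of our own choosing.

```python
import math

var_m, var_n, rho = 1600, 900, 0.64

# Cov(M, N) = rho * sigma_M * sigma_N
cov_mn = rho * math.sqrt(var_m) * math.sqrt(var_n)
var_total = var_m + 2 * cov_mn + var_n
print(round(cov_mn, 6), round(var_total, 6))  # 768.0 4036.0
```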
An actuary wishes to determine the number of accidents in two neighboring towns, \(M\) and \(N\). Let \(X\) be the number of accidents in town \(M\), and let \(Y\) be the number of accidents in town \(N\). The actuary has established that \(X\) and \(Y\) are jointly distributed, as in the table below:
$$ \begin{array}{c|c|c|c} {\begin{matrix} X \\ \huge{\diagdown} \\ Y \end{matrix}} & {0} & {1} & {2} \\ \hline {1} & {0.1} & {0.1} & {0} \\ \hline {2} & {0.1} & {0.1} & {0.2} \\ \hline {3} & {0.2} & {0.1} & {0.1} \end{array} $$
Calculate \(Cov(X, Y)\)
Solution
We will use the formula
$$ Cov \left(X,\ Y\right)=E\left[XY\right]-E\left[X\right]E\left[Y\right] $$
Using data from the table,
$$ \begin{align*} E\left(XY\right) & =\sum_{all\ x}\sum_{all\ y}{xy\times f(x,y)} \\ & =\left[0\times1\right]\times0.1+\left[1\times1\right]\times0.1+\ldots+[2\times3] \times 0.1=2 \end{align*} $$
The (marginal) probability mass function of \(X\) is:
$$ \begin{array}{c|c|c|c}
X & 0 & 1 & 2 \\ \hline
{P}(X=x) & 0.4 & 0.3 & 0.3
\end{array} $$
Thus,
$$ E\left(X\right)=0\times0.4+1\times0.3+2\times0.3=0.9 $$
The (marginal) probability mass function of \(Y\) is:
$$ \begin{array}{c|c|c|c}
Y & 1 & 2 & 3 \\ \hline
{P}(Y=y) & 0.2 & 0.4 & 0.4
\end{array} $$
Thus,
$$ E\left(Y\right)=1\times0.2+2\times0.4+3\times0.4=2.2 $$
Hence,
$$ Cov\left(X,\ Y\right)=2-0.9\times2.2=0.02 $$
Using the results in Example 1 above, find the correlation coefficient between \(X\) and \(Y\).
Solution
We know that,
$$
Corr\left(X, Y\right)=\frac{Cov\left(X, Y\right)}{\sqrt{Var\left(X\right)\times Var\left(Y\right)}} $$
Using the respective marginal distributions, we can calculate \(Var(X)\) and \(Var(Y)\):
$$ \begin{align*}
Var\left(X\right) & =E\left(X^2\right)-\left[E\left(X\right)\right]^2 \\
& =\left[0^2\times0.4+1^2\times0.3+2^2\times0.3\right]-{0.9}^2=0.69 \end{align*} $$
Similarly,
$$ \begin{align*}
Var\left(Y\right) & =E\left(Y^2\right)-\left[E\left(Y\right)\right]^2 \\
& =\left[1^2\times0.2+2^2\times0.4+3^2\times0.4\right]-{2.2}^2=0.56
\end{align*} $$
Therefore,
$$ Corr\left(X,\ Y\right)=\frac{0.02}{\sqrt{0.69\times0.56}}\approx0.03 $$
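The covariance and correlation for the accident table can be verified with exact arithmetic. The sketch below assumes the table is read with \(X\) across the columns (0, 1, 2) and \(Y\) down the rows (1, 2, 3), which is the reading consistent with the marginal distributions shown above.

```python
import math
from fractions import Fraction as F

# Joint pmf from the table: keys are (x, y), columns x = 0, 1, 2 and rows y = 1, 2, 3.
rows = {1: ["0.1", "0.1", "0"], 2: ["0.1", "0.1", "0.2"], 3: ["0.2", "0.1", "0.1"]}
joint = {(x, y): F(p) for y, probs in rows.items() for x, p in enumerate(probs)}

def expect(g):
    """E[g(X, Y)] under the joint pmf above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - e_x * e_y
var_x = expect(lambda x, y: x * x) - e_x ** 2
var_y = expect(lambda x, y: y * y) - e_y ** 2
rho = float(cov) / math.sqrt(var_x * var_y)

print(cov, var_x, var_y)  # 1/50 69/100 14/25
print(round(rho, 4))      # 0.0322
```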
An actuary wishes to determine the relationship between the yearly number of days of hurricanes on two neighboring coasts, \(A\) and \(B\). Let \(X\) be the yearly number of days of hurricanes on coast \(A\), and let \(Y\) be the yearly number of days of hurricanes on coast \(B\). \(X\) and \(Y\) are jointly distributed as:
$$ f\left(x,y\right)=\frac{1}{33}\left(x+2y\right)\ \ \ \ \ \ \ x=1,2\ \ \ \ y=1,2,3. $$
Compute \(Corr\left(X, Y\right)\).
Solution
First, we need,
$$ \begin{align*}
E\left(XY\right) & =\sum_{\text{all } x}\sum_{\text{all } y}{xy \ f\left(x,y\right)}=\sum_{x=1}^{2}\sum_{y=1}^{3}{xy\frac{x+2y}{33}} \\
& =\left(1\right)\left(1\right)\frac{\left(1\right)+2\left(1\right)}{33}+\left(1\right)\left(2\right)\frac{\left(1\right)+2\left(2\right)}{33}+\left(1\right)\left(3\right)\frac{\left(1\right)+2\left(3\right)}{33} \\ & +\left(2\right)\left(1\right)\frac{\left(2\right)+2\left(1\right)}{33}+\left(2\right)\left(2\right)\frac{\left(2\right)+2\left(2\right)}{33} +\left(2\right)\left(3\right)\frac{\left(2\right)+2\left(3\right)}{33} \\
& =\left(1\right)\frac{3}{33}+\left(2\right)\frac{5}{33} +\left(3\right)\frac{7}{33}+\left(2\right)\frac{4}{33} +\left(4\right)\frac{6}{33}+\left(6\right)\frac{8}{33} \\ & =\frac{38}{11}
\end{align*} $$
We also need \(Var(X)\) and \(Var(Y)\). As such, we need to find the marginal probability mass functions for \(X\) and \(Y\) first.
The marginal probability mass function of X is given by:
$$ \begin{align*} f\left(x\right) & =\sum_{y=1}^{3}{\frac{1}{33}\left(x+2y\right)} \\ & =\frac{x+2\left(1\right)}{33}+\frac{x+2\left(2\right)}{33}+\frac{x+2\left(3\right)}{33} \\ & =\frac{3x+12}{33}, \text{ for } x=1, 2
\end{align*} $$
Now,
$$ \begin{align*}
E\left(X\right) & =\sum_{\text{all } x}{x{\times f}_X\left(x\right)} \\ & =\sum_{x=1}^{2}{x\times \frac{3x+12}{33}} \\ & =\left(1\right)\frac{3\left(1\right)+12}{33}+\left(2\right)\frac{3\left(2\right)+12}{33} \\ & =\frac{51}{33}=\frac{17}{11}
\end{align*} $$
and,
$$ \begin{align*}
E\left(X^2\right) & =\sum_{\text{all } x}{x^2{\times f}_X\left(x\right)} \\ & =\sum_{x=1}^{2}{x^2\frac{3x+12}{33}} \\ & =\left(1\right)^2\frac{3\left(1\right)+12}{33}+\left(2\right)^2\frac{3\left(2\right)+12}{33} \\ & =\frac{87}{33}=\frac{29}{11}
\end{align*} $$
Thus,
$$ Var\left(X\right)=E\left(X^2\right)-\left[E\left(X\right)\right]^2=\frac{29}{11}-\left(\frac{17}{11}\right)^2=\frac{30}{121} $$
Similarly, the marginal probability mass function for \(Y\) is given by:
$$ f_Y\left(y\right)=\sum_{x=1}^{2}{\frac{1}{33}\left(x+2y\right)}=\frac{\left(1\right)+2y}{33}+\frac{\left(2\right)+2y}{33}=\frac{4y+3}{33},\ \ \ \text{for } y=1,\ 2,\ 3 $$
The mean and the variance of \(Y\) can be calculated as follows:
$$ \begin{align*}
E\left(Y\right) & =\sum_{y=1}^{3}{y\frac{4y+3}{33}} \\ & =\left(1\right)\frac{4\left(1\right)+3}{33}+\left(2\right)\frac{4\left(2\right)+3}{33}+\left(3\right)\frac{4\left(3\right)+3}{33} \\ & =\frac{74}{33} \end{align*} $$
and,
$$ \begin{align*}
E\left(Y^2\right) & =\sum_{y=1}^{3}{y^2\frac{4y+3}{33}} \\ & = \left(1\right)^2\frac{4\left(1\right)+3}{33}+\left(2\right)^2\frac{4\left(2\right)+3}{33}+\left(3\right)^2\frac{4\left(3\right)+3}{33} \\ & =\frac{62}{11}
\end{align*} $$
Thus,
$$ Var\left(Y\right)=E\left(Y^2\right)-\left[E\left(Y\right)\right]^2=\frac{62}{11}-\left(\frac{74}{33}\right)^2=\frac{662}{1089} $$
The covariance of \(X\) and \(Y\) is
$$ Cov\left(X, Y\right)=E\left(XY\right)-E\left(X\right)E\left(Y\right)=\frac{38}{11}-\frac{17}{11}\times\frac{74}{33}=-\frac{4}{363} $$
Hence,
$$ \rho\left(X, Y\right)=\frac{Cov\left(X, Y\right)}{\sqrt{Var\left(X\right)Var\left(Y\right)}}=\frac{-\frac{4}{363}}{\sqrt{\frac{662}{1089}\times \frac{30}{121}}}=-0.02838 $$
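As a check on the fractions in this solution, the sketch below rebuilds the joint distribution from \(f(x,y)=\frac{1}{33}(x+2y)\), derives the marginals by summation, and recomputes each quantity exactly. The variable names are our own.

```python
import math
from fractions import Fraction as F

xs, ys = (1, 2), (1, 2, 3)
joint = {(x, y): F(x + 2 * y, 33) for x in xs for y in ys}

# Marginal pmfs, obtained by summing the joint pmf over the other variable.
f_x = {x: sum(joint[x, y] for y in ys) for x in xs}
f_y = {y: sum(joint[x, y] for x in xs) for y in ys}

e_x = sum(x * p for x, p in f_x.items())
e_y = sum(y * p for y, p in f_y.items())
var_x = sum(x ** 2 * p for x, p in f_x.items()) - e_x ** 2
var_y = sum(y ** 2 * p for y, p in f_y.items()) - e_y ** 2
cov = sum(x * y * p for (x, y), p in joint.items()) - e_x * e_y
rho = float(cov) / math.sqrt(var_x * var_y)

print(e_x, e_y)           # 17/11 74/33
print(var_x, var_y, cov)  # 30/121 662/1089 -4/363
print(round(rho, 5))      # -0.02838
```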
Let \(X\) be the number of days of sickness for patient \(A\) and let \(Y\) be the number of days of sickness for patient \(B\). \(X\) and \(Y\) are jointly distributed as:
$$ f\left(x,y\right)=c\left(x^2+3y\right)\ \ \ \ \ \ x=1,2,3,4,\ \ \ y=1, 2 $$
Determine \(\rho\left(X, Y\right)\).
Solution:
First, we need to find the value of \(c\) and then proceed to determine the marginal functions.
We know that:
$$ \sum_{x}\sum_{y}{f(x, y)}=1 $$
$$ \begin{align*}
& c\left(1^2+3\left(1\right)\right)+c\left(1^2+3\left(2\right)\right)+\ldots+c\left(4^2+3\left(2\right)\right)=1 \\
\Rightarrow & 4c+7c+7c+10c+12c+15c+19c+22c=96c=1 \\
\therefore & c =\frac{1}{96}
\end{align*} $$
We know that,
$$ \rho\left(X, Y\right)=\frac{Cov\left(X, Y\right)}{\sqrt{Var\left(X\right)\times Var\left(Y\right)}} $$
We need to compute \(Cov\left(X, Y\right)\), \(Var\left(X\right)\) and \(Var\left(Y\right)\).
Now,
$$ \begin{align*}
E\left(XY\right) & =\sum_{x=1}^{4}\sum_{y=1}^{2}{xy\frac{x^2+3y}{96}} \\
& =\left(1\right)\left(1\right)\frac{4}{96}+\left(1\right)\left(2\right)\frac{7}{96}+\left(2\right)\left(1\right)\frac{7}{96}+\left(2\right)\left(2\right)\frac{10}{96}+\left(3\right)\left(1\right)\frac{12}{96} \\ & +\left(3\right)\left(2\right)\frac{15}{96} +\left(4\right)\left(1\right)\frac{19}{96}+\left(4\right)\left(2\right)\frac{22}{96}=\frac{75}{16}
\end{align*} $$
We also need \(Var(X)\) and \(Var(Y)\). As such, we need to find the marginal probability mass functions for \(X\) and \(Y\) before we proceed:
$$ f_X\left(x\right)=\frac{2x^2+9}{96},\ \text{ for } x=1, 2, 3, 4 \text{ and } f_Y\left(y\right)=\frac{12y+30}{96}, \text{ for } y=1, 2 $$
$$ \begin{align*} \Rightarrow E\left(X\right) & =\sum_{x=1}^{4}{x{\times f}_X\left(x\right)} \\ & =\sum_{x=1}^{4}{x\frac{2x^2+9}{96}}=\left(1\right)\frac{11}{96}+\left(2\right)\frac{17}{96}+\left(3\right)\frac{27}{96}+\left(4\right)\frac{41}{96} \\ & =\frac{145}{48} \end{align*} $$
and,
$$ \begin{align*}
Var\left(X\right) & =\sum_{x=1}^{4}{x^2f_X\left(x\right)-\left[E\left(X\right)\right]^2} =\sum_{x=1}^{4}{x^2\frac{2x^2+9}{96}}-\left(\frac{145}{48}\right)^2 \\ & =\left(1\right)^2\frac{11}{96}+\left(2\right)^2\frac{17}{96}+\left(3\right)^2\frac{27}{96}+\left(4\right)^2\frac{41}{96}-\left(\frac{145}{48}\right)^2 \\ & =\frac{163}{16}-\left(\frac{145}{48}\right)^2=1.062
\end{align*} $$
Similarly,
$$ \begin{align*} E\left(Y\right) & =\sum_{y=1}^{2}{y\times f_Y\left(y\right)} \\ & =\sum_{y=1}^{2}{y\frac{12y+30}{96} }\\ & ={\left(1\right)\frac{42}{96}+\left(2\right)\frac{54}{96} } \\ & =\frac{25}{16} \end{align*} $$
and,
$$ \begin{align*} Var(Y)& =\sum_{y=1}^{2}{y^2\frac{12y+30}{96}-\left(\frac{25}{16}\right)^2} \\ & =\left(1\right)^2\frac{42}{96}+\left(2\right)^2\frac{54}{96}-\left(\frac{25}{16}\right)^2 \\ & =\frac{63}{256} \end{align*} $$
Therefore,
$$ \begin{align*} Cov\left(X, Y\right) & =E\left(XY\right)-E\left(X\right)E\left(Y\right) \\ & =\frac{75}{16}-\left(\frac{145}{48}\right)\left(\frac{25}{16}\right) \\ & =-\frac{25}{768} \end{align*} $$
And lastly,
$$ \rho\left(X, Y\right)=-\frac{\frac{25}{768}}{\sqrt{1.062\cdot \left(\frac{63}{256}\right)}}=-0.0637 $$
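The whole solution, including the normalizing constant \(c\), can be reproduced with a short script. This is a sketch under the same setup as the example: the weights \(x^2+3y\) are summed to find \(c\), and the moments are then computed from the normalized joint distribution.

```python
import math
from fractions import Fraction as F

xs, ys = (1, 2, 3, 4), (1, 2)
# Unnormalized weights x^2 + 3y; c is the reciprocal of their sum.
weights = {(x, y): x ** 2 + 3 * y for x in xs for y in ys}
c = F(1, sum(weights.values()))
joint = {xy: c * w for xy, w in weights.items()}

def expect(g):
    """E[g(X, Y)] under the normalized joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - e_x * e_y
var_x = expect(lambda x, y: x ** 2) - e_x ** 2
var_y = expect(lambda x, y: y ** 2) - e_y ** 2
rho = float(cov) / math.sqrt(var_x * var_y)

print(c, cov, var_y)           # 1/96 -25/768 63/256
print(round(float(var_x), 3))  # 1.062
print(round(rho, 4))           # -0.0637
```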
Question
A technology company offers a warranty that covers both hardware failures and software issues within the first year. The variance of the number of hardware failures reported is 9. The variance of the number of software issues reported is 4. The covariance between the number of hardware failures and software issues is -2.
Calculate the variance of the total number of hardware and software issues reported under this warranty.
- A. 12
- B. 10
- C. 9
- D. 6
- E. 13
Solution
The correct answer is C.
To calculate the variance of the total number of issues (both hardware and software) reported, we use the formula for the variance of the sum of two dependent random variables:
$$ \begin{align*} Var(\text{Total}) & = Var(\text{Hardware}) + Var(\text{Software}) \\ & + 2 \cdot Cov(\text{Hardware},\text{ Software}) \end{align*} $$
We are given:
\(Var(\text{Hardware}) = 9\)
\(Var(\text{Software}) = 4\)
\(Cov(\text{Hardware}, \text{ Software}) = -2\)
Plugging these values into the formula, we get:
$$ Var (\text{Total}) = 9 + 4 +2(-2) = 9 $$
Therefore, the variance of the total number of warranty issues reported is 9.
Learning Outcome
Topic 3. e: Multivariate Random Variables – Calculate joint moments, such as the covariance and the correlation coefficient.