Calculate joint moments, such as the covariance and the correlation coefficient for discrete random variables only

Let \(\text{X}\) and \(\text{Y}\) be two discrete random variables, with a joint probability mass function, \(\text{f}(\text{x}, \text{y})\). Then, the random variables \(\text{X}\) and \(\text{Y}\) are said to be independent if and only if,
$$
\text{f}(\text{x}, \text{y})=\text{f}(\text{x}) * \text{f}(\text{y}), \quad \text { for all } \text{x}, \text{y}
$$
However, if the above condition does not hold, then the two random variables \(\text{X}\) and \(\text{Y}\) are dependent.

In this reading, we focus on the relationship between two or more discrete random variables. If we have two dependent discrete random variables, \(X\) and \(Y\), we would like to establish how one varies with respect to the other. If \(\text{X}\) increases, for example, does \(\text{Y}\) tend to increase or to decrease? And if so, how strong is the dependence between the two? Two measures that can help us answer these questions are the covariance and the correlation coefficient.

Covariance

Let \(\text{X}\) and \(\text{Y}\) be two discrete random variables. The covariance of \(\text{X}\) and \(\text{Y}\), denoted \(\operatorname{Cov}[\text{X}, \text{Y}]\), is defined by:
$$
\operatorname{Cov}[\text{X}, \text{Y}]=\text{E}[(\text{X}-\text{E}[\text{X}])(\text{Y}-\text{E}[\text{Y}])]
$$
This simplifies to:
$$
\operatorname{Cov}[\text{X}, \text{Y}]=\text{E}[\text{XY}]-\text{E}[\text{X}] * \text{E}[\text{Y}]
$$
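The simplification follows from expanding the product and applying the linearity of expectation. Writing \(\mu_{X}=\text{E}[\text{X}]\) and \(\mu_{Y}=\text{E}[\text{Y}]\) (shorthand introduced here only for brevity), a short derivation is:
$$
\begin{align}
\operatorname{Cov}[\text{X}, \text{Y}] &=\text{E}\left[\text{XY}-\mu_{Y} \text{X}-\mu_{X} \text{Y}+\mu_{X} \mu_{Y}\right] \\
&=\text{E}[\text{XY}]-\mu_{Y} \text{E}[\text{X}]-\mu_{X} \text{E}[\text{Y}]+\mu_{X} \mu_{Y} \\
&=\text{E}[\text{XY}]-\text{E}[\text{X}] * \text{E}[\text{Y}]
\end{align}
$$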
Recall that in the univariate case, we defined the variance of a discrete random variable, \(\text{X}\), as:

$$
\operatorname{Var}(\text{X})=\text{E}\left(\text{X}^{2}\right)-[\text{E}(\text{X})]^{2}
$$
This can be expressed as:
$$
\operatorname{Var}(\text{X})=\text{E}\left(\text{X}^{2}\right)-[\text{E}(\text{X})]^{2}=\text{E}[\text{X} * \text{X}]-\text{E}[\text{X}] * \text{E}[\text{X}]
$$

Thus, we can conclude that,
$$
\operatorname{Var}(\text{X})=\operatorname{Cov}[\text{X}, \text{X}]
$$

Note:

The units of \(\operatorname{Cov}(\text{X}, \text{Y})\) are the product of those of \(\text{X}\) and \(\text{Y}\). So, for example, if \(X\) is time in hours, and \(\mathrm{Y}\) is a sum of money in \(\$\), then Cov is in \(\$\) hours.

\(\text{E}[\text{XY}]\) can be computed directly from the joint probability function, \(\text{f}(\text{x}, \text{y})\) while \(\text{E}[\text{X}]\) and \(\text{E}[\text{Y}]\) can be computed using their respective marginal probability functions, i.e.,
$$
\begin{align}
\text{E(X Y)} &=\sum_{\text{all x}} \sum_{\text{all y}} \text{xy} * \text{f(x, y)} \\
\text{E[X]} &=\sum_{\text{x}} \text{x} *\text{ f(x)} ; \text { and } \\
\text{E[Y]} &=\sum_{\text{y}} \text{y} * \text{f(y)}
\end{align}
$$
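As a rough illustration of these three sums, the following Python sketch computes the covariance from a joint probability mass function stored as a dictionary. The function name `covariance` and the small joint pmf are hypothetical, chosen only for illustration; they do not come from the reading.

```python
def covariance(joint):
    """Return Cov(X, Y) for a joint pmf given as {(x, y): probability}."""
    # E[XY] comes straight from the joint pmf.
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    # Summing x * f(x, y) over all pairs is the same as summing
    # x * f_X(x) over the marginal pmf of X (likewise for Y).
    e_x = sum(x * p for (x, _), p in joint.items())
    e_y = sum(y * p for (_, y), p in joint.items())
    return e_xy - e_x * e_y


# Hypothetical joint pmf (not from the text): X and Y each take the values 0 or 1.
joint_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
cov_xy = covariance(joint_pmf)
print(round(cov_xy, 4))
```

Here \(\text{E}[\text{XY}]=0.4\) and \(\text{E}[\text{X}]=\text{E}[\text{Y}]=0.5\), so the covariance is \(0.4-0.25=0.15\).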

Properties of Covariance

Let \(\text{X}, \text{Y}\), and \(\text{Z}\) be random variables and let \(\text{a}, \text{b}, \text{c}\), and \(\text{d}\) be constants.

Then, the following properties hold:

i. \(\operatorname{Cov}(\text{X}, \text{Y})=\operatorname{Cov}(\text{Y}, \text{X})\)

ii. \(\operatorname{Cov}(\text{X}, \text{X})=\operatorname{Var}(\text{X})\)

iii. \(\operatorname{Cov}(a X, b Y)=a b \operatorname{Cov}(X, Y)\)

iv. \(\operatorname{Cov}[a X+b, c Y+d]=a c \cdot \operatorname{Cov}[X, Y]\)

v. \(\operatorname{Cov}[\text{X}, \text{Y}+\text{Z}]=\operatorname{Cov}[\text{X}, \text{Y}]+\operatorname{Cov}[\text{X}, \text{Z}]\)

vi. If \(X\) and \(Y\) are independent, \(\operatorname{Cov}[X, Y]=0\)

vii. \(\operatorname{Cov}(\text{X}, \text{c})=\text{E}[(\text{X}-\text{E}(\text{X}))(\text{c}-\text{c})]=\text{E}(0)=0\)
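These identities can be spot-checked numerically. The sketch below assumes a hypothetical sample space of four equally likely outcomes and arbitrary constants, none of which come from the reading, and uses exact fractions so that both sides can be compared with `==`. A check of this kind illustrates properties iv and v; it does not prove them.

```python
from fractions import Fraction

# Hypothetical sample space: four equally likely outcomes, each giving (x, y, z).
outcomes = [(1, 2, 0), (2, 1, 3), (3, 5, 1), (4, 4, 2)]
p = Fraction(1, len(outcomes))


def expect(g):
    """Expected value of g(x, y, z) over the equally likely outcomes."""
    return sum(p * g(x, y, z) for x, y, z in outcomes)


def cov(g, h):
    """Cov(g, h) = E[g * h] - E[g] * E[h]."""
    return expect(lambda x, y, z: g(x, y, z) * h(x, y, z)) - expect(g) * expect(h)


X = lambda x, y, z: x
Y = lambda x, y, z: y
Z = lambda x, y, z: z

a, b, c, d = 2, 5, -3, 7  # arbitrary constants chosen for the check

# Property iv: Cov(aX + b, cY + d) = ac * Cov(X, Y)
lhs_iv = cov(lambda x, y, z: a * x + b, lambda x, y, z: c * y + d)
rhs_iv = a * c * cov(X, Y)

# Property v: Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
lhs_v = cov(X, lambda x, y, z: y + z)
rhs_v = cov(X, Y) + cov(X, Z)

print(lhs_iv == rhs_iv, lhs_v == rhs_v)
```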

Example: Properties of Covariance

A motor insurance company covers \(100 \%\) of losses that occur due to accident and only \(90 \%\) of losses that occur due to theft. Let \(X\) be the number of accident claims, and let \(Y\) be the number of theft claims. You are also provided with the following information:

\(\text{E}(\text{X})=100 ; \text{SD}(\text{X})=25 ; \text{E}(\text{Y})=30 ; \text{SD}(\mathrm{Y})=20\) and \(\operatorname{Cov}(\text{X}, \text{Y})=1000\)

Find the covariance between the accident claims and the insurance coverage.

Solution

Let \(\text{Z}\) be the insurance coverage so that,

$$
\text{Z}=\text{X}+0.90 \text{Y}
$$
We wish to find \(\operatorname{Cov}(\text{X}, \text{Z})=\operatorname{Cov}(\text{X}, \text{X}+0.90 \text{Y})\). At this point, we can apply the properties of covariance, i.e.,

$$
\begin{align}
\operatorname{Cov}(\text{X}, \text{X}+0.90 \text{Y}) &=\operatorname{Cov}(\text{X}, \text{X})+\operatorname{Cov}(\text{X}, 0.90 \text{Y}) \\
&=\operatorname{Var}(\text{X})+0.90 \operatorname{Cov}(\text{X}, \text{Y})=25^{2}+0.90 \times 1,000 \\
&=1,525\end{align}
$$
The covariance between \(\text{X}\) and \(\text{Y}\) is a measure of the strength of the “linear association” or “linear relationship” between the variables.

The covariance can be positive or negative, depending on the relationship between the variables \(\text{X}\) and \(\text{Y}\). A positive covariance means there is a positive association between the random variables \(X\) and \(Y\), while a negative covariance means there is a negative association between them. However, one of the major drawbacks of covariance as a measure of linear relationships is that its value depends on the variables’ units of measurement. This can be corrected by computing a measure known as the correlation coefficient, a dimensionless (unitless) quantity.

Correlation Coefficient

The correlation coefficient of two discrete random variables \(X\) and \(Y\), usually written as \(\text{Corr}(\text{X}, \text{Y})\) or \(\rho(\text{X}, \text{Y})\), is defined by:
$$
\rho(\text{X}, \text{Y})=\frac{\operatorname{Cov}(\text{X}, \text{Y})}{\sqrt{\operatorname{Var}(\text{X}) * \operatorname{Var}(\text{Y})}}=\frac{\operatorname{Cov}(\text{X}, \text{Y})}{\sigma_{\text{X}} * \sigma_{\text{Y}}}
$$
\(\operatorname{Var}(\text{X})\) and \(\operatorname{Var}(\text{Y})\) can be computed from their respective marginal probability mass functions:
$$
\operatorname{Var}(\text{X})=\text{E}\left(\text{X}^{2}\right)-[\text{E}(\text{X})]^{2}
$$
and,
$$
\operatorname{Var}(\text{Y})=\text{E}\left(\text{Y}^{2}\right)-[\text{E}(\text{Y})]^{2}
$$

Properties of Correlation Coefficient

i. The correlation coefficient between any two random variables \(\text{X}\) and \(\text{Y}\) satisfies,
$$
-1 \leq \rho(\text{X}, \text{Y}) \leq 1
$$
ii. Let \(\text{Y}=\text{mX}+\text{c} ; \text{m} \neq 0\), then,
$$
\rho(\text{X, Y})=\left\{\begin{array}{rc}
1, & \text { if } m>0 \\
-1, & \text { if } m<0
\end{array}\right.
$$
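Property ii follows from the covariance properties listed earlier. A brief derivation, assuming \(\operatorname{Var}(\text{X})>0\):
$$
\begin{align}
\operatorname{Cov}(\text{X}, \text{mX}+\text{c}) &=\text{m} \operatorname{Cov}(\text{X}, \text{X})=\text{m} \operatorname{Var}(\text{X}) \\
\operatorname{Var}(\text{mX}+\text{c}) &=\operatorname{Cov}(\text{mX}+\text{c}, \text{mX}+\text{c})=\text{m}^{2} \operatorname{Var}(\text{X}) \\
\rho(\text{X}, \text{Y}) &=\frac{\text{m} \operatorname{Var}(\text{X})}{\sqrt{\operatorname{Var}(\text{X}) * \text{m}^{2} \operatorname{Var}(\text{X})}}=\frac{\text{m}}{|\text{m}|}
\end{align}
$$
The ratio \(\text{m} /|\text{m}|\) equals \(1\) when \(\text{m}>0\) and \(-1\) when \(\text{m}<0\).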

Note:

The correlation coefficient is a measure of the degree of linearity between two random variables, \(\text{X}\) and \(\text{Y}\). A value of \(\rho\) near \(+1\) or \(-1\) indicates a high degree of linearity between \(\text{X}\) and \(\text{Y}\), whereas a value near 0 indicates that such linearity is absent. A positive value of \(\rho\) indicates that \(\text{Y}\) tends to increase when \(\text{X}\) does, whereas a negative value indicates that \(\text{Y}\) tends to decrease when \(\text{X}\) increases. If \(\rho=0\), then \(\text{X}\) and \(\text{Y}\) are said to be uncorrelated.

Example 1: Covariance and Correlation Coefficient

An actuary wishes to determine the number of accidents in two neighboring towns, \(M\) and \(N\). Let \(X\) be the number of accidents in town \(M\), and let \(Y\) be the number of accidents in town \(N\). The actuary has established that \(X\) and \(Y\) are jointly distributed as in the table below:

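$$\begin{array}{c|c|c|c}
\text{Y} \backslash \text{X} & 0 & 1 & 2 \\
\hline 1 & 0.1 & 0.1& 0 \\
\hline 2 & 0.1& 0.1 & 0.2 \\
\hline 3 & 0.2 & 0.1& 0.1\\
\end{array}$$
Here, the columns give the values of \(\text{X}\) and the rows give the values of \(\text{Y}\). Calculate \(\operatorname{Cov}(\text{X}, \text{Y})\).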

Solution

We will use the formula

$$
\operatorname{Cov(X, Y)}=\text{E[X Y]}-\text{E[X]E[Y]}
$$
Using data from the table,

$$
\begin{align}
\text{E}(\text{XY}) &=\sum_{\text {all } \text{x}} \sum_{\text{all y}} \text{xy} * \text{f}(\text{x}, \text{y}) \\
&=[0 \times 1] \times 0.1+[1 \times 1] \times 0.1+\cdots+[2 \times 3] \times 0.1=2
\end{align}
$$
The (marginal) probability mass function of \(\text{X}\) is:

$$\begin{array}{c|c|c|c}\text{X}& 0 & 1 & 2 \\
\hline \text{P}(\text{X}=\text{x}) & 0.4 & 0.3 & 0.3 \\
\end{array}$$
Thus,

$$
\text{E}(\text{X})=0 \times 0.4+1 \times 0.3+2 \times 0.3=0.9
$$
The (marginal) probability mass function of \(\text{Y}\) is:

$$\begin{array}{c|c|c|c}\text{Y}& 1 & 2 & 3 \\
\hline \text{P}(\text{Y}=\text{y})& 0.2& 0.4 & 0.4 \\
\end{array}$$
Thus,

$$
\text{E(Y)}=1 \times 0.2+2 \times 0.4+3 \times 0.4=2.2
$$
Hence,
$$
\operatorname{Cov}(\text{X, Y})=2-0.9 \times 2.2=0.02
$$

Example 2: Covariance and Correlation Coefficient

Using the results in Example 1 above, find the correlation coefficient between \(\text{X}\) and \(\text{Y}\).

Solution

We know that,

$$
\operatorname{Corr}(\text{X}, \text{Y})=\frac{\operatorname{Cov}(\text{X}, \text{Y})}{\sqrt{\operatorname{Var}(\text{X}) * \operatorname{Var}(\text{Y})}}
$$
Using the respective marginal distributions, we can calculate \(\operatorname{Var}(\text{X})\) and \(\operatorname{Var}(\text{Y})\). i.e.,
$$
\begin{align}
\operatorname{Var}(\text{X}) &=\text{E}\left(\text{X}^{2}\right)-[\text{E}(\text{X})]^{2} \\
&=\left[0^{2} \times 0.4+1^{2} \times 0.3+2^{2} \times 0.3\right]-0.9^{2}=0.69\end{align}
$$
Similarly,
$$
\begin{align}
\operatorname{Var}(\text{Y}) &=\text{E}\left(\text{Y}^{2}\right)-[\text{E}(\text{Y})]^{2} \\
&=\left[1^{2} \times 0.2+2^{2} \times 0.4+3^{2} \times 0.4\right]-2.2^{2}=0.56\end{align}
$$
Therefore,

$$
\operatorname{Corr}(X, Y)=\frac{0.02}{\sqrt{0.69 \times 0.56}} \approx 0.03
$$
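The arithmetic in Examples 1 and 2 can be reproduced with a few lines of Python. This is only a sketch: the joint pmf is typed in from the table in Example 1 with keys `(x, y)`, reading the columns as values of \(\text{X}\) and the rows as values of \(\text{Y}\) (consistent with the marginal tables in the solution), and the helper name `expect` is hypothetical.

```python
import math

# Joint pmf from the table in Example 1, keyed as (x, y).
joint = {
    (0, 1): 0.1, (1, 1): 0.1, (2, 1): 0.0,
    (0, 2): 0.1, (1, 2): 0.1, (2, 2): 0.2,
    (0, 3): 0.2, (1, 3): 0.1, (2, 3): 0.1,
}


def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: x * y) - e_x * e_y
var_x = expect(lambda x, y: x ** 2) - e_x ** 2
var_y = expect(lambda x, y: y ** 2) - e_y ** 2
corr_xy = cov_xy / math.sqrt(var_x * var_y)

print(round(cov_xy, 4), round(var_x, 4), round(var_y, 4), round(corr_xy, 4))
```

Up to floating-point rounding, the values agree with the hand calculations: a covariance of 0.02, variances of 0.69 and 0.56, and a correlation of roughly 0.03.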

Example 3: Covariance and Correlation Coefficient

An actuary wishes to determine the relationship between the yearly number of days of hurricanes on two neighboring coasts, \(A\) and \(B\). Let \(X\) be the yearly number of days of hurricanes on coast \(A\), and let \(Y\) be the yearly number of days of hurricanes on coast \(B\). \(X\) and \(Y\) are jointly distributed as:
$$
\text{f(x, y)}=\frac{1}{33}(x+2 y) \quad x=1,2 \quad y=1,2,3
$$
Compute \(\operatorname{Corr}(\text{X}, \text{Y})\)

Solution

First, we need,
$$
\begin{align}
\text{E}(\text{XY})=& \sum_{\text {all } \text{x}} \sum_{\text{all} \text{y}} \text{xy} \text{f}(\text{x}, \text{y})=\sum_{\text{x}=1}^{2} \sum_{\text{y}=1}^{3} \text{xy} \frac{\text{x}+2 \text{y}}{33} \\
=&(1)(1) \frac{(1)+2(1)}{33}+(1)(2) \frac{(1)+2(2)}{33}+(1)(3) \frac{(1)+2(3)}{33}+(2)(1) \frac{(2)+2(1)}{33} \\
&+(2)(2) \frac{(2)+2(2)}{33}+(2)(3) \frac{(2)+2(3)}{33} \\
=&(1) \frac{3}{33}+(2) \frac{5}{33}+(3) \frac{7}{33}+(2) \frac{4}{33}+(4) \frac{6}{33}+(6) \frac{8}{33}=\frac{38}{11}
\end{align}
$$
We also need \(\operatorname{Var}(\text{X})\) and \(\operatorname{Var}(\text{Y})\). As such, we need to find the marginal probability mass functions for \(\text{X}\) and \(\text{Y}\) first.

The marginal probability mass function of \(\text{X}\) is given by:

$$
\text{f}_{\text{X}}(\text{x})=\sum_{\text{y=1}}^{3} \frac{1}{33}(\text{x}+2 \text{y})=\frac{\text{x}+2(1)}{33}+\frac{\text{x}+2(2)}{33}+\frac{\text{x}+2(3)}{33}=\frac{3\text{x}+12}{33}, \quad \text{ for x=1,2}
$$
Now,
$$
\text{E(X)}=\sum_{\text {all x}} \text{x} * \text{f}_{\text{X}}(\text{x})=\sum_{\text{x}=1}^{2} \text{x} * \frac{3\text{x}+12}{33}=(1) \frac{3(1)+12}{33}+(2) \frac{3(2)+12}{33}=\frac{51}{33}=\frac{17}{11}
$$
and,
$$
\text{E}\left(\text{X}^{2}\right)=\sum_{\text {all x}} \text{x}^{2} * \text{f}_{\text{X}}(\text{x})=\sum_{\text{x}=1}^{2} \text{x}^{2} \frac{3 \text{x}+12}{33}=(1)^{2} \frac{3(1)+12}{33}+(2)^{2} \frac{3(2)+12}{33}=\frac{87}{33}=\frac{29}{11}
$$
Thus,

$$
\operatorname{Var}(\text{X})=\text{E}\left(\text{X}^{2}\right)-[\text{E}(\text{X})]^{2}=\frac{29}{11}-\left(\frac{17}{11}\right)^{2}=\frac{30}{121}
$$
Similarly, the marginal probability mass function for \(\text{Y}\) is given by:
$$
\text{f}_{\text{Y}}(\text{y})=\sum_{\text{x=1}}^{2} \frac{1}{33}(\text{x}+2 y)=\frac{(1)+2 y}{33}+\frac{(2)+2\text{y}}{33}=\frac{4\text{y}+3}{33}, \text{for y=1,2,3}
$$
The mean and the variance of \(Y\) can be calculated as follows:

$$
\text{E(Y)}=\sum_{\text{y=1}}^{3} \text{y} \frac{\text{4y}+3}{33}=(1) \frac{4(1)+3}{33}+(2) \frac{4(2)+3}{33}+(3) \frac{4(3)+3}{33}=\frac{74}{33}
$$
and,

$$
\text{E}\left(\text{Y}^{2}\right)=\sum_{\text{y}=1}^{3} \text{y}^{2} \frac{4 \text{y}+3}{33}=(1)^{2} \frac{4(1)+3}{33}+(2)^{2} \frac{4(2)+3}{33}+(3)^{2} \frac{4(3)+3}{33}=\frac{62}{11}
$$
Thus,
$$
\operatorname{Var}(\text{Y})=\text{E}\left(\text{Y}^{2}\right)-[\text{E}(\text{Y})]^{2}=\frac{62}{11}-\left(\frac{74}{33}\right)^{2}=\frac{662}{1089}
$$

The covariance of \(\text{X}\) and \(\text{Y}\) is

$$
\operatorname{Cov}(\text{X, Y})=\text{E(X Y)}-\text{E(X)}\text{E(Y)}=\frac{38}{11}-\frac{17}{11} \times \frac{74}{33}=-\frac{4}{363}
$$
Hence,
$$
\rho(X, Y)=\frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X) * \operatorname{Var}(Y)}}=\frac{-\frac{4}{363}}{\sqrt{\frac{30}{121} * \frac{662}{1089}}} \approx-0.02838
$$
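Because every quantity in Example 3 is a ratio of integers, the steps can be checked exactly. The sketch below uses Python's `fractions.Fraction` and follows the same route as the solution: build the joint pmf, sum it to get the marginals, then compute the moments. The variable names are illustrative only.

```python
from fractions import Fraction
import math

xs, ys = (1, 2), (1, 2, 3)
# Joint pmf from Example 3: f(x, y) = (x + 2y) / 33.
f = {(x, y): Fraction(x + 2 * y, 33) for x in xs for y in ys}

# Marginal pmfs: sum the joint pmf over the other variable.
f_x = {x: sum(f[x, y] for y in ys) for x in xs}
f_y = {y: sum(f[x, y] for x in xs) for y in ys}

e_xy = sum(x * y * p for (x, y), p in f.items())
e_x = sum(x * p for x, p in f_x.items())
e_y = sum(y * p for y, p in f_y.items())
var_x = sum(x ** 2 * p for x, p in f_x.items()) - e_x ** 2
var_y = sum(y ** 2 * p for y, p in f_y.items()) - e_y ** 2
cov_xy = e_xy - e_x * e_y

# The square root is irrational, so the last step falls back to floats.
rho = float(cov_xy) / math.sqrt(float(var_x * var_y))

print(e_x, e_y, cov_xy, round(rho, 5))
```

The exact results are \(\text{E}(\text{X})=17/11\), \(\text{E}(\text{Y})=74/33\) and \(\operatorname{Cov}(\text{X}, \text{Y})=-4/363\), giving a correlation of about \(-0.02838\).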

Example 4: Covariance and Correlation Coefficient

Let \(X\) be the number of days of sickness for patient \(A\), and let \(Y\) be the number of days of sickness for patient \(B\). \(X\) and \(Y\) are jointly distributed as:

$$
\text{f(x, y)}=\text{c}\left(\text{x}^{2}+3\text{y}\right) \quad \text{x=1,2,3,4}; \quad \text{y=1,2}
$$
Determine \(\rho(\text{X}, \text{Y})\)

Solution:

First, we need to find the value of \(\text{c}\) and then proceed to determine the marginal functions.

We know that:
$$
\begin{aligned}
& \sum_{\text{x}} \sum_{\text{y}} \text{f(x, y)}=1 \\
&\Rightarrow \mathrm{c}\left(1^{2}+3(1)\right)+\mathrm{c}\left(1^{2}+3(2)\right)+\cdots+\mathrm{c}\left(4^{2}+3(2)\right)=1 \\
&\Rightarrow 4 \mathrm{c}+7 \mathrm{c}+7 \mathrm{c}+10 \mathrm{c}+12 \mathrm{c}+15 \mathrm{c}+19 \mathrm{c}+22 \mathrm{c}=1 \\
&\Rightarrow 96 \mathrm{c}=1 \\
&\therefore \mathrm{c}=\frac{1}{96}
\end{aligned}
$$
We know that,
$$
\rho(\text{X}, \text{Y})=\frac{\operatorname{Cov}(\text{X}, \text{Y})}{\sqrt{\operatorname{Var}(\text{X}) * \operatorname{Var}(\text{Y})}}
$$
We need to compute \(\operatorname{Cov}(\text{X}, \text{Y}), \operatorname{Var}(\text{X})\) and \(\operatorname{Var}(\text{Y})\).

Now,
$$
\begin{align}
\text{E(X Y)} &=\sum_{\text{x}=1}^{4} \sum_{\text{y}=1}^{2} \text{xy} \frac{\text{x}^{2}+3\text{y}}{96} \\&=(1)(1) \frac{4}{96}+(1)(2) \frac{7}{96}+(2)(1) \frac{7}{96}+(2)(2) \frac{10}{96}+(3)(1) \frac{12}{96}+(3)(2) \frac{15}{96} \\
&+(4)(1) \frac{19}{96}+(4)(2) \frac{22}{96}=\frac{75}{16}
\end{align}
$$
We also need \(\operatorname{Var}(\text{X})\) and \(\operatorname{Var}(\text{Y})\). As such, we need to find the marginal probability mass functions for \(\text{X}\) and \(\text{Y}\) before we proceed:

$$
\begin{gather}
\text{f}_{\text{X}}(\text{x})=\frac{2 \text{x}^{2}+9}{96}, \text { for } \text{x}=1,2,3,4 \text{ and } \text{f}_{\text{Y}}(\text{y})=\frac{12 \text{y}+30}{96}, \text { for } \text{y}=1,2 \\\\
\Rightarrow \text{E}(\text{X})=\sum_{\text{x}=1}^{4} \text{x} * \text{f}_{\text{X}}(\text{x})=\sum_{\text{x}=1}^{4} \text{x} \frac{2 \text{x}^{2}+9}{96}=(1) \frac{11}{96}+(2) \frac{17}{96}+(3) \frac{27}{96}+(4) \frac{41}{96}=\frac{145}{48}
\end{gather}
$$
and,

$$
\begin{aligned}
\operatorname{Var}(\text{X}) &=\sum_{\text{x}=1}^{4} \text{x}^{2} \text{f}_{\text{X}}(\text{x})-[\text{E}(\text{X})]^{2}=\sum_{\text{x}=1}^{4} \text{x}^{2} \frac{2 \text{x}^{2}+9}{96}-\left(\frac{145}{48}\right)^{2} \\
&=(1)^{2} \frac{11}{96}+(2)^{2} \frac{17}{96}+(3)^{2} \frac{27}{96}+(4)^{2} \frac{41}{96}-\left(\frac{145}{48}\right)^{2}=\frac{163}{16}-\left(\frac{145}{48}\right)^{2}=1.062
\end{aligned}
$$
Similarly,
$$
\text{E}(\text{Y})=\sum_{\text{y}=1}^{2} \text{y} * \text{f}_{\text{Y}}(\text{y})=\sum_{\text{y}=1}^{2} \text{y} \frac{12 \text{y}+30}{96}=(1) \frac{42}{96}+(2) \frac{54}{96}=\frac{25}{16}
$$
and,
$$
\operatorname{Var}(\text{Y})=\sum_{\text{y}=1}^{2} \text{y}^{2} \frac{12 \text{y}+30}{96}-\left(\frac{25}{16}\right)^{2}=(1)^{2} \frac{42}{96}+(2)^{2} \frac{54}{96}-\left(\frac{25}{16}\right)^{2}=\frac{63}{256}
$$
Therefore,
$$
\operatorname{Cov}(\text{X, Y})=\text{E(X Y)}-\text{E(X)E(Y)}=\frac{75}{16}-\left(\frac{145}{48}\right)\left(\frac{25}{16}\right)=-\frac{25}{768}
$$
And lastly,
$$
\rho(\text{X, Y})=-\frac{\frac{25}{768}}{\sqrt{1.062 \times\left(\frac{63}{256}\right)}}=-0.0637
$$
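Example 4 adds one step to the earlier examples: the constant \(\text{c}\) has to be found from the requirement that the probabilities sum to 1. A minimal sketch of that step and the remaining calculations, again using exact fractions and a hypothetical helper named `expect`:

```python
from fractions import Fraction
import math

# Support from Example 4: x = 1, 2, 3, 4 and y = 1, 2.
support = [(x, y) for x in (1, 2, 3, 4) for y in (1, 2)]

# The pmf must sum to 1, so c is the reciprocal of the unnormalized total.
c = Fraction(1, sum(x ** 2 + 3 * y for x, y in support))
joint = {(x, y): c * (x ** 2 + 3 * y) for x, y in support}


def expect(g):
    """E[g(X, Y)] under the joint pmf, kept exact with Fraction."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)
cov_xy = expect(lambda x, y: x * y) - e_x * e_y
var_x = expect(lambda x, y: x ** 2) - e_x ** 2
var_y = expect(lambda x, y: y ** 2) - e_y ** 2

# Only the final square root needs floating-point arithmetic.
rho = float(cov_xy) / math.sqrt(float(var_x * var_y))

print(c, cov_xy, var_x, var_y, round(rho, 4))
```

The exact variance of \(\text{X}\) is \(2447/2304 \approx 1.062\), which matches the rounded value used in the solution, and the correlation comes out at about \(-0.0637\).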

Learning Outcome

Topic 3. e: Multivariate Random Variables – Calculate joint moments, such as the covariance and the correlation coefficient.
