# Covariance and Correlation Coefficient for Joint Random Variables

In past chapters we introduced the mathematical expectations of two random variables, along with their special names: the mean and the variance.

These were:

$$\mu_X = E(X); \mu_Y=E(Y) \qquad \text{and} \qquad \sigma_{X}^{2}=E[(X-\mu_X)^2]; \sigma_{Y}^2=E[(Y-\mu_Y)^2]$$

Now we introduce two new special names:

If \(u(X,Y) = (X-\mu_X)(Y-\mu_Y)\), then

$$E[u(X,Y)]= E[(X-\mu_X)(Y-\mu_Y)] = \sigma_{XY} = Cov(X,Y)$$

is called the **covariance** of \(X\) and \(Y\).

If the standard deviations \(\sigma_{X}\) and \(\sigma_{Y}\) are positive, then

$$\rho = \frac{Cov(X,Y)}{\sigma_{X}\sigma_{Y}}= \frac{\sigma_{XY}}{\sigma_{X}\sigma_{Y}}$$

is called the **correlation coefficient** of \(X\) and \(Y\).

It is rather convenient that the mean and variance of either variable can be computed from the joint pmf (or pdf) or from that variable's marginal pmf (or pdf). For example, in the discrete case for \(X\),

\begin{align*} \mu_X = E(X) & = \sum_{x}\sum_{y}xf(x,y)\\ & = \sum_{x}x\bigg[\sum_{y}f(x,y)\bigg] = \sum_{x}xf_X(x). \end{align*}

However, to compute the covariance, we need the joint pmf (or pdf).
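The marginal computation above is easy to mirror in code. The Python sketch below is an illustration added here, not part of the text; the pmf \(f(x,y)=(x+2y)/33\) is just a convenient example. It sums a joint pmf over \(y\) to recover \(f_X\) and confirms that both routes give the same \(E(X)\):

```python
from fractions import Fraction as F

# Example joint pmf f(x, y) = (x + 2y)/33 for x in {1, 2}, y in {1, 2, 3};
# any joint pmf stored as a dict would work the same way.
joint = {(x, y): F(x + 2 * y, 33) for x in (1, 2) for y in (1, 2, 3)}

# Marginal of X: sum the joint pmf over all y for each fixed x.
f_X = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (1, 2)}

# E(X) computed from the joint pmf and from the marginal pmf agree.
mu_from_joint = sum(x * p for (x, _), p in joint.items())
mu_from_marginal = sum(x * p for x, p in f_X.items())
assert mu_from_joint == mu_from_marginal == F(17, 11)
```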

Before considering the significance of the covariance and correlation coefficient, let us note a few simple facts. First,

\begin{align*} E[(X - \mu_X)(Y-\mu_Y)] & = E(XY - \mu_XY - \mu_YX + \mu_X\mu_Y)\\ & = E(XY) - \mu_XE(Y) - \mu_YE(X) + \mu_X\mu_Y, \end{align*}

because, even in the bivariate situation, \(E\) is still a linear or distributive operator. Thus,

$$Cov(X,Y) = E(XY) - \mu_X\mu_Y - \mu_Y\mu_X + \mu_X\mu_Y = E(XY) - \mu_X\mu_Y.$$

Since \(\rho = Cov(X,Y)/\sigma_X\sigma_Y\), we also have

$$E(XY) = \mu_X\mu_Y + \rho\sigma_X\sigma_Y.$$

That is, the expected value of the product of two random variables is equal to the product \(\mu_X\mu_Y\) of their expectations plus their covariance \(\rho\sigma_X\sigma_Y\).
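As a quick sanity check, the definition and the shortcut formula can be compared on any small joint pmf. The Python sketch below is illustrative only (the pmf values are made up); it computes the covariance both ways with exact fractions:

```python
from fractions import Fraction as F

# A small arbitrary joint pmf (hypothetical values, chosen only for illustration).
joint = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(3, 8), (1, 1): F(1, 8)}

mu_X = sum(x * p for (x, y), p in joint.items())
mu_Y = sum(y * p for (x, y), p in joint.items())

# Covariance from the definition E[(X - mu_X)(Y - mu_Y)] ...
cov_def = sum((x - mu_X) * (y - mu_Y) * p for (x, y), p in joint.items())
# ... and from the shortcut E(XY) - mu_X * mu_Y.
cov_short = sum(x * y * p for (x, y), p in joint.items()) - mu_X * mu_Y

assert cov_def == cov_short  # the two formulas always agree
```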

Let’s see this in an example:

**Example:** Let \(X\) and \(Y\) have the joint pmf

$$f(x,y)=\frac{1}{33}(x+2y) \qquad x=1,2\qquad y=1,2,3.$$

The marginal probability mass functions are, respectively,

$$f_X(x) = \sum_{y=1}^{3}\frac{1}{33}(x+2y) = \frac{3x+12}{33}$$

and

$$f_Y(y)= \sum_{x=1}^{2}\frac{1}{33}(x+2y)= \frac{4y+3}{33}.$$

Since \(f(x,y) \neq f_X(x)f_Y(y)\), \(X\) and \(Y\) are dependent. The mean and the variance of \(X\) are, respectively,

$$\mu_X = \sum_{x=1}^{2}x\frac{3x+12}{33} = \frac{17}{11}$$

and

$$\sigma_{X}^2 = \sum_{x=1}^{2} x^2\frac{3x+12}{33} - \bigg(\frac{17}{11}\bigg)^2 = \frac{29}{11} - \frac{289}{121} = \frac{30}{121}$$

The mean and the variance of \(Y\) are, respectively

$$\mu_Y = \sum_{y=1}^{3}y\frac{4y+3}{33} = \frac{74}{33}$$

And,

$$\sigma_Y^2 = \sum_{y=1}^{3}y^2 \frac{4y+3}{33} - \bigg(\frac{74}{33}\bigg)^2 = \frac{62}{11} - \bigg(\frac{74}{33}\bigg)^2 = \frac{662}{1089} \approx 0.608$$

The covariance of \(X\) and \(Y\) is

\begin{align*} Cov(X,Y) & = \sum_{x=1}^{2}\sum_{y=1}^{3}xy\frac{x+2y}{33} - \bigg(\frac{17}{11}\bigg)\bigg(\frac{74}{33}\bigg)\\ & = (1)(1)\frac{3}{33} + (1)(2)\frac{5}{33} + (1)(3)\frac{7}{33} + (2)(1)\frac{4}{33}\\ & \quad + (2)(2)\frac{6}{33} + (2)(3)\frac{8}{33} - \bigg(\frac{17}{11}\bigg)\bigg(\frac{74}{33}\bigg)\\ & = \frac{114}{33} - \bigg(\frac{17}{11}\bigg)\bigg(\frac{74}{33}\bigg) = \frac{1254}{363} - \frac{1258}{363} = -\frac{4}{363} \end{align*}

And lastly, the correlation coefficient for this example is

$$\rho = \frac{-4/363}{\sqrt{(30/121)(662/1089)}} \approx -0.0284$$
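These computations are easy to verify mechanically. The following Python sketch (an illustration added here, using exact rational arithmetic) recomputes every quantity in the example from the joint pmf:

```python
from fractions import Fraction as F
import math

# Joint pmf from the example: f(x, y) = (x + 2y)/33, x = 1, 2; y = 1, 2, 3.
joint = {(x, y): F(x + 2 * y, 33) for x in (1, 2) for y in (1, 2, 3)}

mu_X = sum(x * p for (x, y), p in joint.items())                   # 17/11
mu_Y = sum(y * p for (x, y), p in joint.items())                   # 74/33
var_X = sum(x * x * p for (x, y), p in joint.items()) - mu_X**2    # 30/121
var_Y = sum(y * y * p for (x, y), p in joint.items()) - mu_Y**2    # 662/1089
cov = sum(x * y * p for (x, y), p in joint.items()) - mu_X * mu_Y  # E(XY) - mu_X*mu_Y

assert cov == F(-4, 363)
rho = float(cov) / math.sqrt(float(var_X) * float(var_Y))
print(round(rho, 4))  # -> -0.0284
```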

Insight into the correlation coefficient \(\rho\) of two discrete random variables \(X\) and \(Y\) may be gained by thoughtfully examining the definition of \(\rho\), namely,

$$\rho = \frac{\sum_{x}\sum_{y}(x – \mu_X)(y – \mu_Y)f(x,y)}{\sigma_X\sigma_Y},$$

where \(\mu_X, \mu_Y, \sigma_X\), and \(\sigma_Y\) denote the respective means and standard deviations. If positive probabilities are assigned to pairs \((x,y)\) in which \(x\) and \(y\) are either both above or both below their respective means, then the corresponding terms in the summation that defines \(\rho\) are positive, because the factors \((x - \mu_X)\) and \((y - \mu_Y)\) are either both positive or both negative. If, on the other hand, most of the probability falls on pairs \((x,y)\) in which one component is below its mean and the other above its mean, then the correlation coefficient will tend to be negative, because the products \((x - \mu_X)(y - \mu_Y)\) carrying the higher probabilities are negative.

**Other important remarks:**

Suppose that \(X\) and \(Y\) are independent, so that \(f(x,y) = f_X(x)f_Y(y)\). Suppose also that we want to find the expected value of the product \(u(X)v(Y)\). Subject to the existence of the expectations, we know that

\begin{align*} E[u(X)v(Y)] & = \sum_{S_X}\sum_{S_Y}u(x)v(y)f(x,y)\\ & = \sum_{S_X}\sum_{S_Y} u(x)v(y)f_X(x)f_Y(y)\\ & = \sum_{S_X}u(x)f_X(x)\sum_{S_Y}v(y)f_Y(y)\\ & = E[u(X)]E[v(Y)]. \end{align*}

This formula can be used to show that the correlation coefficient of two independent variables is zero. For, in standard notation, we have

\begin{align*} Cov(X,Y) & = E[(X-\mu_X)(Y-\mu_Y)]\\ & = E(X -\mu_X)E(Y-\mu_Y) = 0. \end{align*}

The converse, however, is not necessarily true: zero correlation does not, in general, imply independence. It is important to keep the relationship straight: independence implies zero correlation, but zero correlation does not imply independence.
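A standard counterexample makes the one-way implication concrete: take \(X\) uniform on \(\{-1, 0, 1\}\) and \(Y = X^2\). The Python sketch below (illustrative) shows that \(Cov(X,Y) = 0\) even though \(Y\) is completely determined by \(X\):

```python
from fractions import Fraction as F

# Classic counterexample: X uniform on {-1, 0, 1} and Y = X**2.
joint = {(x, x * x): F(1, 3) for x in (-1, 0, 1)}

mu_X = sum(x * p for (x, y), p in joint.items())   # 0
mu_Y = sum(y * p for (x, y), p in joint.items())   # 2/3
cov = sum(x * y * p for (x, y), p in joint.items()) - mu_X * mu_Y
assert cov == 0                                    # uncorrelated...

# ...but not independent: f(1, 1) != f_X(1) * f_Y(1).
f_X1 = F(1, 3)   # P(X = 1)
f_Y1 = F(2, 3)   # P(Y = 1) = P(X = -1) + P(X = 1)
assert joint[(1, 1)] != f_X1 * f_Y1
```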

Here are some properties of covariance.

- \(Cov(X,Y) = Cov(Y,X)\)
- \(Cov(X,X) = Var(X)\)
- \(Cov(aX,Y) = a Cov(X,Y)\)
- \(Cov\bigg(\sum_{i=1}^{n}X_i,\sum_{j=1}^{m}Y_j\bigg) = \sum_{i=1}^{n}\sum_{j=1}^{m}Cov(X_i,Y_j)\)
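These properties can be spot-checked numerically. The sketch below (the joint pmf is an arbitrary made-up example) verifies symmetry, \(Cov(X,X)=Var(X)\), scaling, and additivity with exact fractions:

```python
from fractions import Fraction as F

# A small arbitrary joint pmf (hypothetical values, for illustration only).
joint = {(1, 1): F(1, 6), (1, 2): F(2, 6), (2, 1): F(2, 6), (2, 2): F(1, 6)}

def E(g):
    """Expectation of g(x, y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def cov(u, v):
    """Cov(u(X, Y), v(X, Y)) via the shortcut formula."""
    return E(lambda x, y: u(x, y) * v(x, y)) - E(u) * E(v)

X = lambda x, y: x
Y = lambda x, y: y

assert cov(X, Y) == cov(Y, X)                         # symmetry
assert cov(X, X) == E(lambda x, y: x * x) - E(X)**2   # Cov(X, X) = Var(X)
assert cov(lambda x, y: 5 * x, Y) == 5 * cov(X, Y)    # scaling, a = 5
assert cov(lambda x, y: x + x * x, Y) == cov(X, Y) + cov(lambda x, y: x * x, Y)  # additivity
```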

The correlation coefficient is a measure of the degree of linearity between \(X\) and \(Y\). A value of \(\rho\) near \(+1\) or \(-1\) indicates a high degree of linearity between \(X\) and \(Y\), whereas a value near 0 indicates that such linearity is absent. A positive value of \(\rho\) indicates that \(Y\) tends to increase when \(X\) does, whereas a negative value indicates that \(Y\) tends to decrease when \(X\) increases. If \(\rho=0\), then \(X\) and \(Y\) are said to be *uncorrelated*.

**Example 2:** To show again how the covariance and the correlation coefficient are calculated, we take a function similar to one we have already worked with:

$$ f(x,y) = c(x^2 + 3y) \qquad x=1,2,3,4,\quad y=1,2. $$

First we need to find the value of \(c\); then we can calculate the marginal functions (we have computed them before for a similar function, so we only need to adjust the values):

\begin{align*} f(1,1)+f(1,2)+f(2,1)+f(2,2)+f(3,1)+f(3,2)+f(4,1)+f(4,2) = & 1\\ 4c+7c+7c+10c+12c+15c+19c+22c = & 1\\ 96c = & 1\\ c = & \frac{1}{96} \end{align*}

With this we find its marginal functions:

$$ f_X(x)= \frac{2x^2+9}{96} \qquad f_Y(y) = \frac{12y+30}{96} $$

Since we already know the form of the marginal functions, we only need to find the appropriate values.

First, we compute for \(X\),

\begin{align*} \mu_X & = \sum_{x=1}^{4}xf_X(x)\\ & = \sum_{x=1}^{4}x\frac{2x^2+9}{96} = (1) \frac{11}{96} + (2) \frac{17}{96} + (3)\frac{27}{96} + (4) \frac{41}{96} \\ & = \frac{11}{96}+ \frac{34}{96} + \frac{81}{96} + \frac{164}{96} = \frac{290}{96} = \frac{145}{48} \end{align*}

And,

\begin{align*} \sigma_{X}^2 & = \sum_{x=1}^{4}x^2f_X(x) - \mu_X^2\\ & = \sum_{x=1}^{4}x^2\frac{2x^2+9}{96} - \bigg(\frac{145}{48}\bigg)^2\\ & = (1)^2 \frac{11}{96} + (2)^2 \frac{17}{96} + (3)^2 \frac{27}{96} + (4)^2 \frac{41}{96} - \bigg(\frac{145}{48}\bigg)^2\\ & = \frac{11}{96} + \frac{68}{96}+\frac{243}{96} + \frac{656}{96} - \bigg(\frac{145}{48}\bigg)^2\\ & = \frac{163}{16} - \bigg(\frac{145}{48}\bigg)^2 \approx 1.062 \end{align*}

Then we find the needed values for \(Y\),

\begin{align*} \mu_Y & = \sum_{y=1}^{2}yf_Y(y)\\ & = \sum_{y=1}^{2}y\frac{12y+30}{96} = (1)\frac{42}{96} + (2)\frac{54}{96}\\ & = \frac{42}{96} + \frac{108}{96} = \frac{150}{96} = \frac{25}{16} \end{align*}

And,

\begin{align*} \sigma_{Y}^2 & = \sum_{y=1}^{2}y^2f_Y(y) - \mu_Y^2 \\ & = \sum_{y=1}^{2} y^2 \frac{12y+30}{96} - \bigg(\frac{25}{16}\bigg)^2 \\ & = (1)^2 \frac{42}{96} + (2)^2 \frac{54}{96} - \bigg(\frac{25}{16}\bigg)^2 \\ & = \frac{42}{96} + \frac{216}{96} - \frac{625}{256} = \frac{43}{16} - \frac{625}{256} = \frac{63}{256} \end{align*}

After we have found these values we can apply what we discussed so far,

\begin{align*} Cov(X,Y) & = \sum_{x=1}^{4}\sum_{y=1}^{2}xyf(x,y) - \mu_X\mu_Y\\ & = \sum_{x=1}^{4}\sum_{y=1}^{2}xy\frac{x^2+3y}{96} - \bigg(\frac{145}{48}\bigg)\bigg(\frac{25}{16}\bigg)\\ & = (1)(1)\frac{4}{96} + (1)(2)\frac{7}{96} + (2)(1)\frac{7}{96} + (2)(2)\frac{10}{96} + (3)(1)\frac{12}{96} \\ & \quad + (3)(2)\frac{15}{96} + (4)(1)\frac{19}{96} + (4)(2)\frac{22}{96} - \bigg(\frac{145}{48}\bigg)\bigg(\frac{25}{16}\bigg)\\ & = \frac{4}{96} + \frac{14}{96} + \frac{14}{96} +\frac{40}{96} + \frac{36}{96} + \frac{90}{96} + \frac{76}{96} + \frac{176}{96} - \frac{3625}{768}\\ & = \frac{450}{96} - \frac{3625}{768} = \frac{3600}{768} - \frac{3625}{768} = -\frac{25}{768} \end{align*}

And lastly,

\begin{align*} \rho & = \frac{Cov(X,Y)}{\sqrt{\sigma_{X}^2\sigma_{Y}^2}}\\ & = \frac{-25/768}{\sqrt{(1.062)(63/256)}} \approx -0.0637 \end{align*}
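As before, the whole example can be verified mechanically. The Python sketch below (an illustration, not part of the original computation) recomputes each quantity with exact fractions:

```python
from fractions import Fraction as F
import math

# Joint pmf from Example 2: f(x, y) = (x**2 + 3y)/96, x = 1..4, y = 1, 2.
joint = {(x, y): F(x * x + 3 * y, 96) for x in range(1, 5) for y in (1, 2)}
assert sum(joint.values()) == 1   # c = 1/96 normalizes the pmf

mu_X = sum(x * p for (x, y), p in joint.items())                   # 145/48
mu_Y = sum(y * p for (x, y), p in joint.items())                   # 25/16
var_X = sum(x * x * p for (x, y), p in joint.items()) - mu_X**2
var_Y = sum(y * y * p for (x, y), p in joint.items()) - mu_Y**2    # 63/256
cov = sum(x * y * p for (x, y), p in joint.items()) - mu_X * mu_Y  # -25/768

assert cov == F(-25, 768)
rho = float(cov) / math.sqrt(float(var_X) * float(var_Y))
print(round(rho, 4))  # -> -0.0637
```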

# Covariance for Continuous Random Variables

The formulas for the covariance and the correlation coefficient are exactly the same as those explored in the previous section. The properties are also the same and can be proved for the discrete and continuous cases in the same way, under the same assumptions.

Since the formulas are identical, we need only find the marginal functions and the individual means, and then apply the same formula. Instead of proceeding as in the discrete case, we recommend the following steps for the covariance.

- Find \(E(X)\) and \(E(Y)\) directly with iterated integrals
- Find \(E(XY)\) with the iterated integral \(\int_{X}\int_{Y}xyf(x,y)\,dy\,dx\)
- Calculate \(Cov(X,Y)\) as \(E(XY) - E(X)E(Y)\)

After finding the covariance, the reader should already be familiar with how to find the variances and then calculate the correlation coefficient.
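The three steps can be sketched numerically. The Python sketch below is an illustration: the pdf \(f(x,y) = x + y\) on the unit square is a hypothetical example, and the midpoint-rule integrator is a simple stand-in for the iterated integrals:

```python
# Numerical sketch of the three steps, using a midpoint-rule double integral.
# Hypothetical example pdf: f(x, y) = x + y on 0 < x < 1, 0 < y < 1.
def f(x, y):
    return x + y

def dbl(g, n=200):
    """Midpoint-rule approximation of the integral of g over (0,1) x (0,1)."""
    h = 1.0 / n
    return sum(g((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

EX = dbl(lambda x, y: x * f(x, y))        # step 1: E(X)
EY = dbl(lambda x, y: y * f(x, y))        #         E(Y)
EXY = dbl(lambda x, y: x * y * f(x, y))   # step 2: E(XY)
cov = EXY - EX * EY                       # step 3: Cov(X, Y)

# Exact values for this pdf: E(X) = E(Y) = 7/12, E(XY) = 1/3, Cov = -1/144.
assert abs(cov - (-1 / 144)) < 1e-4
```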

**Example:** Let

\begin{equation*} f(x,y)= \begin{cases} \frac{1}{4}(2x+y), & 0 < x < 1,\ 0 < y < 2\\ 0, & \text{otherwise} \end{cases} \end{equation*}

Find \(Cov(X,Y)\).

We apply the steps as proposed:

\begin{align*} E[X] & = \int_{0}^{1}\int_{0}^{2}xf(x,y)\,dy\,dx\\ & = \int_{0}^{1}\int_{0}^{2}x\,\frac{1}{4}(2x+y)\,dy\,dx\\ & \text{First we solve the inner integral:}\\ & \quad \frac{1}{2}x^2\int_{0}^{2}dy + \frac{1}{4}x \int_{0}^{2}y\,dy = \frac{1}{2}x^2[y]_{0}^{2} + \frac{1}{4}x\bigg[\frac{y^2}{2}\bigg]_{0}^{2} = x^2+\frac{x}{2}\\ & \text{Then the outer integral:}\\ & = \int_{0}^{1}\bigg(x^2 + \frac{x}{2}\bigg)dx = \bigg[\frac{x^3}{3}+\frac{x^2}{4}\bigg]_{0}^{1} = \frac{1}{3}+\frac{1}{4}= \frac{7}{12} \end{align*}

\begin{align*} E[Y] & = \int_{0}^{2}\int_{0}^{1}yf(x,y)\,dx\,dy\\ & = \int_{0}^{2}\int_{0}^{1}y\,\frac{1}{4}(2x+y)\,dx\,dy\\ & \text{First:}\\ & \quad \frac{1}{2}y\int_{0}^{1}x\,dx+\frac{1}{4}y^2\int_{0}^{1}dx=\frac{1}{2}y\bigg[\frac{x^2}{2}\bigg]_{0}^{1} + \frac{1}{4}y^2[x]_{0}^{1} = \frac{y}{4}+\frac{y^2}{4}\\ & \text{Then:}\\ & =\int_{0}^{2}\bigg(\frac{y}{4}+\frac{y^2}{4}\bigg)dy = \bigg[\frac{y^2}{8}+\frac{y^3}{12}\bigg]_{0}^{2} = \frac{1}{2}+\frac{2}{3} = \frac{7}{6} \end{align*}

\begin{align*} E[XY] & = \int_{0}^{1}\int_{0}^{2}xyf(x,y)\,dy\,dx\\ & = \int_{0}^{1}\int_{0}^{2}xy\,\frac{1}{4}(2x+y)\,dy\,dx\\ & \text{First:}\\ & \quad \frac{1}{2}x^2\int_{0}^{2}y\,dy+ \frac{1}{4}x\int_{0}^{2}y^2\,dy = \frac{1}{2}x^2\bigg[\frac{y^2}{2}\bigg]_{0}^{2} + \frac{1}{4}x\bigg[\frac{y^3}{3}\bigg]_{0}^{2} = x^2+\frac{2x}{3}\\ & \text{Then:}\\ & = \int_{0}^{1}\bigg(x^2 + \frac{2x}{3}\bigg)dx = \bigg[\frac{x^3}{3} + \frac{x^2}{3}\bigg]_{0}^{1} = \frac{1}{3} + \frac{1}{3} = \frac{2}{3} \end{align*}

Having found all these values, we can calculate the covariance for this function:

\begin{align*} Cov(X,Y) & = E[XY] - E[X]E[Y]\\ & = \frac{2}{3} - \bigg(\frac{7}{12}\bigg)\bigg(\frac{7}{6}\bigg) = \frac{2}{3} - \frac{49}{72} = -\frac{1}{72} \approx -0.0139 \end{align*}
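As a check, the same result can be reproduced numerically. The sketch below is illustrative; the midpoint-rule integrator stands in for the iterated integrals over the pdf \(f(x,y)=\frac{1}{4}(2x+y)\) on \(0<x<1\), \(0<y<2\):

```python
# Numerical check of the continuous example with a midpoint-rule double integral.
def f(x, y):
    return (2 * x + y) / 4

def dbl(g, n=200):
    """Midpoint rule over the rectangle (0,1) x (0,2)."""
    hx, hy = 1.0 / n, 2.0 / n
    return sum(g((i + 0.5) * hx, (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

assert abs(dbl(f) - 1) < 1e-6            # f integrates to 1 on this region
EX = dbl(lambda x, y: x * f(x, y))       # 7/12
EY = dbl(lambda x, y: y * f(x, y))       # 7/6
EXY = dbl(lambda x, y: x * y * f(x, y))  # 2/3
cov = EXY - EX * EY
assert abs(cov - (-1 / 72)) < 1e-4       # Cov(X, Y) = -1/72
```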

**Learning Outcome**

**Topic 3.f: Multivariate Random Variables – Calculate joint moments, such as the covariance and the correlation coefficient.**