# Bivariate Distributions of the discrete type (Joint Probability)

Sometimes certain events can be defined by the interaction of two measurements. These types of events that are explained by the interaction of the two variables constitute what we call bivariate distributions. Let’s look at an example:

Let’s say your boss has decided to randomly pick the leaders of a new committee and you’re interested in the male-to-female ratio. The committee will consist of a president, a vice president, and a secretary. Let’s assume that the group has 20 individuals, comprising 13 females and 7 males. You proceed to define \(X\) as the number of males and \(Y\) as the number of females on the committee. Since \(X + Y = 3\), there are exactly 4 possible outcomes \((x, y)\): (0,3), (1,2), (2,1), and (3,0).
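The probabilities of these four outcomes can be sketched in Python. This is only an illustration under an added assumption: every set of three people is equally likely to be chosen, so the counts follow a hypergeometric pattern and the specific roles (president, vice president, secretary) do not affect \(X\) and \(Y\). The function name `committee_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb


def committee_pmf(males=7, females=13, seats=3):
    """Return {(x, y): P(X = x, Y = y)} for a committee drawn without replacement.

    Assumes every group of `seats` people is equally likely.
    """
    total = comb(males + females, seats)  # number of possible committees
    pmf = {}
    for x in range(seats + 1):
        y = seats - x  # the number of females is determined by x
        pmf[(x, y)] = Fraction(comb(males, x) * comb(females, y), total)
    return pmf


pmf = committee_pmf()
for (x, y), p in pmf.items():
    print(f"P(X={x}, Y={y}) = {p}")
```

Under this assumption the four probabilities are 143/570, 91/190, 91/380, and 7/228, and they sum to 1.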

Let’s now look at a formal definition of bivariate distribution:

Let \(X\) and \(Y\) be two random variables defined on a discrete space. Let \(S\) denote the corresponding two-dimensional space of \(X\) and \(Y\), the two random variables of the discrete type. The probability that \(X = x\) and \(Y = y\) is denoted by \(f(x, y) = P(X = x, Y = y)\). The function \(f(x, y)\) is called the joint probability mass function (joint pmf) of \(X\) and \(Y\). It specifies how the total probability of 1 is divided up amongst the possible values of (x,y) and so gives the (joint/bivariate) probability distribution of (X,Y). It has the following properties:

- \(0\le f(x,y)\le 1\)
- \(\sum_{(x,y) \in S}\sum f(x, y) = 1\)
- \(P[(X,Y) \in A] = \sum_{(x,y) \in A}\sum f(x,y)\), where \(A\) is a subset of the space \(S\)

The first condition tells us that the probability of any possible outcome cannot be less than 0 or greater than 1. The second condition requires the probabilities over the entire support *S* to sum to 1. The third condition tells us that the probability of a given event *A* is found by summing the probabilities of the (*x*, *y*) values in *A*.

In short, for two discrete random variables defined on the same probability space, the joint probability mass function (pmf) gives the probability of each pair of values \((x, y)\). It obeys the usual rules of probability: each value lies between 0 and 1, the values sum to 1 over the space \(S\), and the probability of any subset \(A\) of \(S\) is the sum of the joint pmf over the points in \(A\).

Example: If we roll a pair of fair dice, there are 36 equally likely outcomes, each with a probability of \(\frac{1}{36}\). Let \(X\) denote the smaller and \(Y\) the larger outcome on the dice. For example, consider \(f(5, 6) = P(X = 5, Y = 6)\). There are two ways in which this event could happen: a 5 on die one and a 6 on die two, or a 6 on die one and a 5 on die two. Thus, the probability of this particular event is \(\frac{1}{36} + \frac{1}{36} = \frac{2}{36} \).

However, an event like \(f(4,4)\) can only occur in 1 way: a 4 in both dice. In this case, its probability is \(\frac{1}{36}\). The joint pmf can therefore be presented as follows:

\begin{equation*} f(x,y) = \left\{ \begin{array}{rl} \frac{1}{36} & \text{if } 1 \leq x = y \leq 6,\\ \frac{2}{36} & \text{if } 1 \leq x < y \leq 6. \end{array}\right. \end{equation*}

The numbers on the margins(axis) of Figure 1 are the column and row total probabilities. These totals describe the probability mass functions of \(X\) and \(Y\), respectively.
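The dice example can be reproduced by enumerating all 36 outcomes. The following Python sketch (variable names are illustrative) builds the joint pmf of the smaller and larger outcome and then adds up the rows and columns to obtain the totals for \(X\) and \(Y\).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint pmf of X = smaller outcome and Y = larger outcome of two fair dice.
joint = defaultdict(Fraction)  # missing keys start at Fraction(0)
for d1, d2 in product(range(1, 7), repeat=2):
    joint[(min(d1, d2), max(d1, d2))] += Fraction(1, 36)

# Row and column totals: sum the joint pmf over the other variable.
f_X = defaultdict(Fraction)
f_Y = defaultdict(Fraction)
for (x, y), p in joint.items():
    f_X[x] += p
    f_Y[y] += p

print(joint[(5, 6)], joint[(4, 4)])       # 1/18 1/36
print([str(f_X[x]) for x in range(1, 7)])
print([str(f_Y[y]) for y in range(1, 7)])
```

The totals work out to \(P(X = x) = \frac{13 - 2x}{36}\) and \(P(Y = y) = \frac{2y - 1}{36}\) for values 1 through 6.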

## Marginal Probability Mass Function.

Now forget about the bivariate distribution for a moment. Suppose we are interested in the distribution of one of the two variables without considering the values that the other variable can take. This would give us the marginal probability mass function.

Let \(X\) and \(Y\) have the joint probability mass function \(f(x, y)\) with space \(S\). The probability mass function of \(X\) alone, which is called the marginal probability mass function of \(X\), is defined by

\begin{equation*} f_X(x)= \sum_{y}f(x,y) = P(X = x), \qquad x \in S_X, \end{equation*} where the summation is taken over all possible values of \(y\) for each given \(x\) in the \(x\) space \(S_X\).

That is, the summation is over all \((x, y) \in S\) with a given value \(x\). Similarly, the marginal probability mass function of \(Y\) is defined by

\begin{equation*} f_Y(y)= \sum_{x}f(x,y) = P(Y = y), \qquad y \in S_Y, \end{equation*} where the summation is taken over all possible values of \(x\) for each given \(y\) in the \(y\) space \(S_Y\).

The random variables \(X\) and \(Y\) are independent if and only if, for every \(x \in S_X\) and every \(y \in S_Y\),

$$ P(X = x, Y = y) = P(X = x)P(Y = y) $$

or, equivalently,

$$ f(x, y) = f_{X}(x)f_{Y}(y);$$

otherwise, X and Y are said to be dependent.

The marginal pmfs condense the joint pmf into one function per variable by summing over the other variable. Comparing their product with the joint pmf at every point of the support shows whether the variables are independent. When they are, the joint pmf can be handled one variable at a time.

Let’s look at an example:

Let the joint pmf of \(X\text{ and }Y\) be

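$$f(x,y) = \frac{x^2y^3}{126},\qquad x = 1,2,3,\qquad y = 1,2.$$

The constant \(126 = (1 + 4 + 9)(1 + 8)\) ensures that the probabilities sum to 1.

The marginal probability mass functions are

\begin{align} f_X(x) &= \sum_{y=1}^{2}\frac{x^2y^3}{126}\\ & = \frac{x^2(1)^3}{126}+\frac{x^2(2)^3}{126}\\ & = \frac{x^2}{126} + \frac{8x^2}{126}\\ & = \frac{9x^2}{126} = \frac{x^2}{14}, \qquad x=1,2,3, \end{align} and, \begin{align*} f_Y(y) & = \sum_{x=1}^{3}\frac{x^2y^3}{126}\\ & = \frac{1y^3}{126} + \frac{4y^3}{126} + \frac{9y^3}{126}\\ & = \frac{14y^3}{126} = \frac{y^3}{9}, \qquad y=1,2. \end{align*} In this case, \({ f }_{ X }(x){ f }_{ Y }(y) = \frac{x^2}{14}\cdot\frac{y^3}{9} = \frac{x^2y^3}{126} = f(x,y)\text{ for }x=1,2,3\text{ and }y=1,2\)

thus, \(X\text{ and }Y\) are independent. By contrast, the dice variables above are dependent: \(f(6,1) = 0\) because the smaller outcome cannot exceed the larger, yet \(f_X(6)f_Y(1) = \frac{1}{36}\cdot\frac{1}{36} \neq 0\).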
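The factorisation criterion \(f(x,y) = f_X(x)f_Y(y)\) can also be checked numerically. The sketch below is a minimal illustration with hypothetical helper names (`marginals`, `is_independent`). It tests two pmfs: one proportional to \(x^2y^3\) on \(x = 1,2,3\), \(y = 1,2\) (the weights are divided by their total, 126, so that the probabilities sum to 1), and the smaller/larger dice pmf from the earlier example.

```python
from fractions import Fraction
from itertools import product


def marginals(pmf):
    """Sum a joint pmf {(x, y): p} over the other variable."""
    f_X, f_Y = {}, {}
    for (x, y), p in pmf.items():
        f_X[x] = f_X.get(x, 0) + p
        f_Y[y] = f_Y.get(y, 0) + p
    return f_X, f_Y


def is_independent(pmf):
    """True if f(x, y) == f_X(x) * f_Y(y) for every x in S_X and y in S_Y."""
    f_X, f_Y = marginals(pmf)
    return all(pmf.get((x, y), 0) == f_X[x] * f_Y[y]
               for x, y in product(f_X, f_Y))


# pmf proportional to x^2 * y^3, normalised so that it sums to 1.
weights = {(x, y): x**2 * y**3 for x in (1, 2, 3) for y in (1, 2)}
total = sum(weights.values())  # 126
product_form = {xy: Fraction(w, total) for xy, w in weights.items()}

# Smaller/larger outcome of two fair dice.
dice = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (min(d1, d2), max(d1, d2))
    dice[key] = dice.get(key, 0) + Fraction(1, 36)

print(is_independent(product_form))  # True
print(is_independent(dice))          # False
```

Exact fractions are used so that the equality test is not affected by floating-point rounding. The dice pmf fails the test because pairs such as \((6, 1)\) have joint probability 0 while both marginals are positive.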

### Expectations of functions of two variables

Now that we have introduced the concept of bivariate distributions at length, note that we may sometimes encounter multivariate distributions involving more than two random variables. For this reason, it is convenient to switch notation from \(X\text{ and }Y\) to \(X_1\text{ and }X_2\). (The \(X\) and \(Y\) notation is still used in some of the integrals below, with the understanding that \(X_1 = X\) and \(X_2 = Y\).)

The expected value of a function \(u(X_1,X_2)\) of the random variables \((X_1,X_2)\) can be found by summing the product:

**value \(\times\) probability of assuming that value**

over all values (or combinations of) \((x_1,x_2)\)

When dealing with discrete random variables therefore,

$$ E[u(X_1,X_2)] = \sum_{(x_1,x_2) \in S}\sum u(x_1,x_2)f(x_1,x_2), $$

where the double summation is over all possible values of \(x_1\) and \(x_2\).

**Example**:

\(X_1\) and \(X_2\) have the following joint pmf:

\(f({ x }_{ 1 },{ x }_{ 2 })=\frac { { x }_{ 1 }{ x }_{ 2 }^{ 2 } }{ 30 } \) where \(x_1\) = 1,2,3 and \(x_2\) = 1,2

Calculate the expected value of [\(X_1\) + \(X_2\)]

Solution:

\(E[{ X }_{ 1 }+{ X }_{ 2 }]=\sum _{ { x }_{ 1 }=1 }^{ 3 }{ \sum _{ { x }_{ 2 }=1 }^{ 2 }{ ({ x }_{ 1 }+{ x }_{ 2 })P({ X }_{ 1 }={ x }_{ 1 },{ X }_{ 2 }={ x }_{ 2 }) } } \)

\(E[{ X }_{ 1 }+{ X }_{ 2 }]=\sum _{ { x }_{ 1 }=1 }^{ 3 }{ \sum _{ { x }_{ 2 }=1 }^{ 2 }{ ({ x }_{ 1 }+{ x }_{ 2 })(\cfrac { x_{ 1 }{ x }_{ 2 }^{ 2 } }{ 30 } ) } } \)

\(E[{ X }_{ 1 }+{ X }_{ 2 }]=\sum _{ { x }_{ 2 }=1 }^{ 2 }{ (1+{ x }_{ 2 })\cfrac { { 1x }_{ 2 }^{ 2 } }{ 30 } +(2+{ x }_{ 2 })\cfrac { { 2x }_{ 2 }^{ 2 } }{ 30 } +(3+{ x }_{ 2 })\cfrac { { 3x }_{ 2 }^{ 2 } }{ 30 } }\)

\(E[{ X }_{ 1 }+{ X }_{ 2 }]=(2)\cfrac { 1 }{ 30 } +(3)\cfrac { 2 }{ 30 } +(4)\cfrac { 3 }{ 30 } +(3)\cfrac { 4 }{ 30 } +(4)\cfrac { 8 }{ 30 } +(5)\cfrac { 12 }{ 30 } =\cfrac { 124 }{ 30 } =\cfrac { 62 }{ 15 } \approx 4.13\)
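The same expectation can be computed by looping over the support, which is a convenient way to check the arithmetic. The helper name `expectation` below is hypothetical; the pmf is the one given in the example.

```python
from fractions import Fraction


def expectation(u, pmf):
    """E[u(X1, X2)]: sum of value times probability over the support."""
    return sum(u(x1, x2) * p for (x1, x2), p in pmf.items())


# Joint pmf f(x1, x2) = x1 * x2^2 / 30 for x1 = 1, 2, 3 and x2 = 1, 2.
pmf = {(x1, x2): Fraction(x1 * x2**2, 30)
       for x1 in (1, 2, 3) for x2 in (1, 2)}

e_sum = expectation(lambda x1, x2: x1 + x2, pmf)
e_x1 = expectation(lambda x1, x2: x1, pmf)
e_x2 = expectation(lambda x1, x2: x2, pmf)

print(e_sum)        # 62/15
print(e_x1 + e_x2)  # 62/15
```

The second print shows that \(E[X_1 + X_2] = E[X_1] + E[X_2]\) (here \(\frac{7}{3} + \frac{9}{5}\)), as linearity of expectation requires.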

# Probability density function for Bivariate variables

Joint distributions of two random variables of the discrete type can be extended to two random variables of the continuous type. The definitions are the same, except that the summations are replaced with integrals. The joint probability density function (joint pdf) of two continuous-type random variables is an integrable function \(f(x,y)\) with the following properties (the same properties extended to this case):

\(\text{(a) }f(x,y)\geq 0,\text{ where }f(x,y)=0\text{ when }(x,y)\text{ is not in the support (space) } S\text{ of }X\text{ and } Y\).

\(\text{(b) }\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x,y)\,dx\,dy=1\).

\(\text{(c) }P[(X,Y) \in A] = \int_A\int f(x,y)\,dx\,dy, \text{ where } \{(X,Y) \in A\}\) is an event defined in the plane.

The mathematical expectations are the same as the discrete case, with integrals replacing summations. The respective marginal pdfs of continuous-type random variables \(X\) and \(Y\) are given by:

$$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy, \qquad x \in S_X,$$

and

$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx, \qquad y \in S_Y,$$

where \(S_X\) and \(S_Y\) are the respective spaces of \(X\) and \(Y\). Example:

Let \(X\) and \(Y\) have the joint pdf

$$ f(x,y) = \left(\frac{4}{5}\right)(1 + xy), \qquad 0\leq x \leq 1, \qquad 0\leq y\leq 1. $$

The constant \(\frac{4}{5}\) ensures that the pdf integrates to 1, since \(\int_{0}^{1}\int_{0}^{1}(1+xy)\,dx\,dy = 1 + \frac{1}{4} = \frac{5}{4}\).

The marginal pdfs are

\begin{align} f_X(x) &= \int_{0}^{1}\left(\frac{4}{5}\right)(1+xy)\,dy \\ & = \frac{4}{5} \left(\int_{0}^{1}1\,dy + x \int_{0}^{1} y\,dy\right)\\ & = \left(\frac{4}{5}\right)\left(1+\frac{x}{2}\right), \qquad 0\leq x \leq 1, \end{align} and \begin{align} f_Y(y) &= \int_{0}^{1}\left(\frac{4}{5}\right)(1+xy)\,dx \\ & = \frac{4}{5} \left(\int_{0}^{1}1\,dx + y \int_{0}^{1} x\,dx\right)\\ & = \left(\frac{4}{5}\right)\left(1+\frac{y}{2}\right), \qquad 0\leq y \leq 1. \end{align}

Given a joint probability distribution, sometimes you may be required to determine values related to only one of the two variables, say, E(X) or V (X). To do so, first calculate the marginal distribution of the relevant variable, say X, and then proceed to work out the solution as we would normally do for the univariate case.

**Example:**

The joint density of X and Y is given by:

\({ f }_{ XY }(x,y)=\cfrac { 2x+y }{ 3000 }\), where \(10 < x < 20\) and \(-5 < y < 5\)

(a) Find E(X)

To find the pdf of the marginal distribution of \(X\), we integrate out \(Y\):

\({ f }_{ X }(x)=\int _{ -5 }^{ 5 }{ \cfrac { 2x+y }{ 3000 } \,dy } =\cfrac { 1 }{ 3000 } { \left[ 2xy+\cfrac { { y }^{ 2 } }{ 2 } \right] }_{ -5 }^{ 5 }=\cfrac { 1 }{ 3000 } \left\{ \left( 10x+\cfrac { 25 }{ 2 } \right) - \left( -10x+\cfrac { 25 }{ 2 } \right) \right\} =\cfrac { 20x }{ 3000 } =\cfrac { x }{ 150 } \)

Therefore, the marginal distribution of \(X\) is \({ f }_{ X }(x)=\cfrac { x }{ 150 } ,\quad 10<x<20\)

\(E(X)=\int _{ 10 }^{ 20 }{ x\left( \cfrac { x }{ 150 } \right) dx } =\cfrac { 1 }{ 150 } { \left[ \cfrac { { x }^{ 3 } }{ 3 } \right] }_{ 10 }^{ 20 }=\cfrac { 1 }{ 150 } \left( \cfrac { 8000 }{ 3 } -\cfrac { 1000 }{ 3 } \right) =\cfrac { 7000 }{ 450 } =\cfrac { 140 }{ 9 } \)

(b) Find the variance of X

\({ \sigma }_{ x }^{ 2 }=E{ (X }^{ 2 })-{ \left[ E(X) \right] }^{ 2 }\)

\(E({ X }^{ 2 })=\int _{ 10 }^{ 20 }{ { x }^{ 2 }\left( \cfrac { x }{ 150 } \right) dx } =\cfrac { 1 }{ 150 } { \left[ \cfrac { { x }^{ 4 } }{ 4 } \right] }_{ 10 }^{ 20 }=\cfrac { 1 }{ 150 } \left( \cfrac { 160{,}000 }{ 4 } -\cfrac { 10{,}000 }{ 4 } \right) =\cfrac { 1 }{ 150 } \cdot \cfrac { 150{,}000 }{ 4 } =250\)

Therefore,

\({ \sigma }_{ x }^{ 2 }=250-{ \left( \cfrac { 140 }{ 9 } \right) }^{ 2 }=\cfrac { 20{,}250-19{,}600 }{ 81 } =\cfrac { 650 }{ 81 } \approx 8.02\)
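As a numerical cross-check, the moments of \(X\) can also be approximated directly from the joint density without first deriving the marginal. The sketch below uses a simple midpoint rule on a grid. The helper `midpoint_2d` and the grid size are illustrative choices, not part of the original method.

```python
def midpoint_2d(g, x_lo, x_hi, y_lo, y_hi, n=400):
    """Approximate the double integral of g over a rectangle (midpoint rule)."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy


def f(x, y):
    """Joint density (2x + y) / 3000 on 10 < x < 20, -5 < y < 5."""
    return (2 * x + y) / 3000


box = (10, 20, -5, 5)
mass = midpoint_2d(f, *box)                             # should be close to 1
e_x = midpoint_2d(lambda x, y: x * f(x, y), *box)       # close to 140/9
e_x2 = midpoint_2d(lambda x, y: x**2 * f(x, y), *box)   # close to 250
var_x = e_x2 - e_x**2                                   # close to 650/81

print(round(mass, 4), round(e_x, 4), round(e_x2, 2), round(var_x, 3))
```

The approximations agree with the exact values \(E(X) = \frac{140}{9} \approx 15.56\), \(E(X^2) = 250\), and \(\sigma_x^2 = \frac{650}{81} \approx 8.02\) to several decimal places.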

# Cumulative distribution functions

Let \(X_1\) and \(X_2\) be random variables. Their joint cumulative distribution function is defined by:

$$F(a,b) = P(X_1 \leq a, X_2 \leq b), \quad -\infty < a,b < \infty$$

The distribution of \(X_1\) can be obtained from the joint distribution of \(X_1\) and \(X_2\) as follows:

\begin{align} F_{X_1}(a) & = P(X_1 \leq a)\\ & = P(X_1 \leq a, X_2 < \infty)\\ & = P \left(\lim_{b \rightarrow \infty}\{X_1 \leq a, X_2 \leq b\}\right)\\ & = \lim_{b \rightarrow \infty} P(X_1 \leq a, X_2 \leq b)\\ & = \lim_{b \rightarrow \infty} F(a,b)\\ & \equiv F(a,\infty) \end{align}

Note that the preceding set of equalities uses the fact that probability is a continuous set function. Similarly, the cumulative distribution of \(X_2\) is given by \begin{align} F_{X_2}(b) & = P(X_2 \leq b)\\ & = \lim_{a \rightarrow \infty} F(a,b)\\ & \equiv F(\infty,b) \end{align}

All joint probability statements about \(X_1\) and \(X_2\) can, in theory, be answered in terms of their joint distribution function. For instance, suppose we wanted to compute the joint probability that \(X_1\) is greater than \(a\) and \(X_2\) is greater than \(b\). This could be done as follows:

\begin{align} P(X_1 > a, X_2 > b) & = 1 - P((X_1 > a, X_2 > b)^c)\\ & = 1 - P((X_1 > a)^c \cup (X_2 > b)^c)\\ & = 1 - P((X_1 \leq a) \cup (X_2 \leq b))\\ & = 1 - (P(X_1 \leq a) + P(X_2 \leq b) - P(X_1 \leq a, X_2 \leq b))\\ & = 1 - F_{X_1}(a) - F_{X_2}(b) + F(a,b) \end{align}
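The last identity is easy to verify on a small discrete example. The sketch below reuses the pmf \(f(x_1,x_2) = \frac{x_1x_2^2}{30}\) from the expectation example and compares the direct sum over the region \(X_1 > a, X_2 > b\) with the expression built from the joint cumulative distribution function. The function names are hypothetical, and a value at least as large as every point of the support stands in for \(\infty\).

```python
from fractions import Fraction

# Joint pmf f(x1, x2) = x1 * x2^2 / 30 for x1 = 1, 2, 3 and x2 = 1, 2.
pmf = {(x1, x2): Fraction(x1 * x2**2, 30)
       for x1 in (1, 2, 3) for x2 in (1, 2)}

BIG = 10  # any bound above the largest support value acts as "infinity"


def joint_cdf(a, b):
    """F(a, b) = P(X1 <= a, X2 <= b)."""
    return sum(p for (x1, x2), p in pmf.items() if x1 <= a and x2 <= b)


def survival_direct(a, b):
    """P(X1 > a, X2 > b), summed directly from the pmf."""
    return sum(p for (x1, x2), p in pmf.items() if x1 > a and x2 > b)


def survival_from_cdf(a, b):
    """1 - F_{X1}(a) - F_{X2}(b) + F(a, b), using only the joint cdf."""
    return 1 - joint_cdf(a, BIG) - joint_cdf(BIG, b) + joint_cdf(a, b)


print(survival_direct(1, 1))    # 2/3
print(survival_from_cdf(1, 1))  # 2/3
```

For \(a = b = 1\), both routes give \(1 - \frac{1}{6} - \frac{1}{5} + \frac{1}{30} = \frac{2}{3}\).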

# Sums of independent random variables

It is often important to be able to calculate the distribution of \(X_1 + X_2\) from the distributions of \(X_1\) and \(X_2\) when \(X_1\) and \(X_2\) are independent. Suppose that \(X_1\) and \(X_2\) are independent, continuous random variables having probability density functions \(f_{X_1}\) and \(f_{X_2}\). The cumulative distribution of \(X_1 + X_2\) is obtained as follows:

\begin{align} F_{X_1+X_2}(a) & = P(X_1 + X_2 \leq a)\\ & = \int\int_{x_1+x_2\leq a}f_{X_1}(x_1)f_{X_2}(x_2)\,dx_1\,dx_2\\ & = \int_{-\infty}^{\infty}\int_{-\infty}^{a-x_2} f_{X_1}(x_1)f_{X_2}(x_2)\,dx_1\,dx_2\\ & = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{a-x_2} f_{X_1}(x_1)\,dx_1\right)f_{X_2}(x_2)\,dx_2\\ & = \int_{-\infty}^{\infty}F_{X_1}(a - x_2)f_{X_2}(x_2)\,dx_2 \end{align}

The cumulative distribution function \(F_{X_1 + X_2}\) is called the convolution of the distributions \(F_{X_1}\) and \(F_{X_2}\) (the cumulative distribution functions of \(X_1\) and \(X_2\), respectively). By differentiating the expression above with respect to \(a\), we find that the probability density function \(f_{X_1+X_2}\) of \(X_1 + X_2\) is given by

\begin{align} f_{X_1+X_2}(a) & = \frac{d}{da}\int_{-\infty}^{\infty}F_{X_1}(a - x_2)f_{X_2}(x_2)\,dx_2\\ & = \int_{-\infty}^{\infty} \frac{d}{da}F_{X_1}(a - x_2)f_{X_2}(x_2)\,dx_2\\ & = \int_{-\infty}^{\infty} f_{X_1}(a - x_2)f_{X_2}(x_2)\,dx_2 \end{align}
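To see the convolution formula at work, consider a case not covered above and chosen purely for illustration: \(X_1\) and \(X_2\) independent and uniform on \([0, 1]\). Their sum is known to have the triangular density \(a\) on \([0,1]\) and \(2 - a\) on \((1,2]\). The Python sketch below approximates the convolution integral with a midpoint rule and compares it with that closed form. The function names and the step count are assumptions made for the example.

```python
def uniform_pdf(x):
    """Density of a Uniform(0, 1) random variable."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def convolve_pdf(f1, f2, a, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f1(a - t) * f2(t) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h
        total += f1(a - t) * f2(t)
    return total * h


def triangular_pdf(a):
    """Known density of the sum of two independent Uniform(0, 1) variables."""
    if 0.0 <= a <= 1.0:
        return a
    if 1.0 < a <= 2.0:
        return 2.0 - a
    return 0.0


for a in (0.25, 0.5, 1.0, 1.5):
    approx = convolve_pdf(uniform_pdf, uniform_pdf, a, 0.0, 1.0)
    print(a, round(approx, 3), triangular_pdf(a))
```

The integration range \([0, 1]\) is enough here because \(f_{X_2}\) vanishes outside it. For densities with unbounded support the limits would need to be wide enough to capture essentially all of the probability mass.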

**Learning Outcome**

**Topic 3.a: Multivariate Random Variables – Explain and perform calculations concerning joint probability functions, probability density functions, and cumulative distribution functions.**