# Conditional Distributions

**Conditional probability** is a key part of Bayes’ theorem. In plain language, it is the probability that one thing is true given that another thing is true. It differs from **joint probability**, which is the probability that both things are true, without any knowledge that one of them has already occurred. Before diving deeper into the math, let’s look at some practical applications of conditional distributions.

Assume that a law enforcement department is looking into the connection between road accidents and intoxication among automobile drivers. On one hand, the department could estimate the probability that a driver is intoxicated **and** involved in an accident. That would be a joint probability. On the other hand, the department could determine the probability that a driver is involved in an accident **given that** they are intoxicated (that is, it is already known that the driver is intoxicated). This would be a conditional probability. In general, these two probabilities are not the same: conditioning on additional information changes the probability.

In the medical field, a team of doctors could gather data to help them analyze and predict cases of kidney failure. The probability that a patient’s left **and** right kidneys are both infected is a joint probability, whereas the probability that the left kidney is infected **if we know that** the right one is infected is a conditional probability. We can use an Euler diagram to demonstrate the difference. In the diagram, each large square has area 1, and the smaller squares represent probabilities.

Let *X* be the event that a patient’s left kidney is infected, and let *Y* be the event that the right kidney is infected. On the left side of the diagram, the green area represents the probability that both of the patient’s kidneys are infected. This is the joint probability \(P(X,Y)\). If *Y* is true (i.e., given that the right kidney is definitely infected), then the space of everything not in *Y* is dropped and everything in *Y* is rescaled to the size of the original space. The rescaled green area on the right-hand side is now the conditional probability of *X* given *Y*, expressed as \(P(X|Y)\). Put differently, this is the probability that the left kidney is infected if we know that the right kidney is infected. It’s also important to note that the conditional probability of *X* given *Y* is not necessarily equal to the conditional probability of *Y* given *X*.
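The rescaling idea can be sketched numerically. The three probabilities below are made-up values chosen only for illustration; they are not taken from any data in this reading.

```python
from fractions import Fraction

# Hypothetical probabilities, chosen only for illustration.
p_x_and_y = Fraction(1, 10)  # P(X, Y): both kidneys infected
p_x = Fraction(2, 5)         # P(X): left kidney infected
p_y = Fraction(1, 4)         # P(Y): right kidney infected

# Conditioning on Y keeps only the part of the space where Y holds
# and rescales it so that it has total probability 1.
p_x_given_y = p_x_and_y / p_y  # P(X | Y)
p_y_given_x = p_x_and_y / p_x  # P(Y | X)

print(p_x_given_y)  # 2/5
print(p_y_given_x)  # 1/4
```

With these assumed numbers, \(P(X|Y) = 0.4\) while \(P(Y|X) = 0.25\), which illustrates that the two conditional probabilities need not agree.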

## Discrete Conditional Functions

The **conditional probability mass function of** \(X\), given that \(Y = y\), is defined by

$$ g(x|y) = \frac{f(x,y)}{f_Y(y)}, \qquad \text{provided that } f_Y(y) > 0. $$

Similarly, the **conditional probability mass function of** \(Y\), given that \(X = x\), is defined by

$$ h(y|x) = \frac{f(x,y)}{f_X(x)}, \qquad \text{provided that } f_X(x) > 0. $$

Compare this definition with the conditional probability of events,

$$ P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad \text{provided that } P(B) > 0, $$

where the event \(B\) is known to have occurred and the probability of \(A\) is rescaled accordingly. In the bivariate case, the joint pmf \(f(x,y)\) plays the role of the intersection \(P(A \cap B)\), and the marginal pmf of the conditioning variable plays the role of \(P(B)\): \(f_X(x)\) when we condition on \(X = x\), and \(f_Y(y)\) when we condition on \(Y = y\).
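The definition of \(g(x|y)\) can be sketched as a small helper. The function name `conditional_pmf_x_given_y` and the joint table below are hypothetical, introduced only to illustrate the formula and its requirement that \(f_Y(y) > 0\).

```python
from fractions import Fraction

def conditional_pmf_x_given_y(joint, y):
    """Return g(x|y) = f(x, y) / f_Y(y) as a dict keyed by x.

    `joint` maps (x, y) pairs to probabilities.
    """
    # Marginal pmf of Y at y: sum the joint pmf over all x.
    f_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if f_y == 0:
        raise ValueError("g(x|y) is undefined because f_Y(y) = 0")
    return {x: p / f_y for (x, yy), p in joint.items() if yy == y}

# Hypothetical joint pmf on x in {0, 1} and y in {0, 1}.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

g_given_1 = conditional_pmf_x_given_y(joint, 1)
print(g_given_1[0], g_given_1[1])  # 3/7 4/7
```

Exact fractions are used here so that the results can be compared with hand calculations without rounding.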

**Example 1: discrete conditional probability function**

A bivariate distribution has the following probability function:

Determine:

**(I) the marginal distribution of X, and the marginal distribution of Y**

The marginal distribution of *X* can be found by summing the columns in the table:

\(P(X = 0) = 0.4, \; P(X = 1) = 0.3, \; P(X = 2) = 0.3\)

The marginal distribution of *Y* can be found by summing the rows in the table:

\(P(Y = 1) = 0.2, \; P(Y = 2) = 0.4, \; P(Y = 3) = 0.4\)

**(II) the conditional distribution of \(Y|X=2\)**

Using the definition of conditional probability:

\(P(Y=1|X=2)=\cfrac { P(Y=1,X=2) }{ P(X=2) } =\cfrac { 0 }{ 0.3 } =0\)

\(P(Y=2|X=2)=\cfrac { P(Y=2,X=2) }{ P(X=2) } =\cfrac { 0.2 }{ 0.3 } \approx 0.67\)

\(P(Y=3|X=2)=\cfrac { P(Y=3,X=2) }{ P(X=2) } =\cfrac { 0.1 }{ 0.3 } \approx 0.33\)
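As a quick check, the same conditional distribution can be recomputed from the three joint probabilities quoted in the calculation above. This is only a sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Joint probabilities P(Y = y, X = 2), as quoted in the calculation above.
joint_at_x2 = {1: Fraction(0), 2: Fraction(2, 10), 3: Fraction(1, 10)}

# Marginal P(X = 2): sum the joint probabilities over y.
p_x2 = sum(joint_at_x2.values())

# Conditional distribution of Y given X = 2.
cond = {y: p / p_x2 for y, p in joint_at_x2.items()}

for y, p in cond.items():
    print(f"P(Y={y} | X=2) = {p} (about {float(p):.2f})")
```

The exact values are 0, 2/3 and 1/3, and they sum to 1 as any conditional distribution must.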

**Example 2: discrete conditional probability function**

Let \(X\) and \(Y\) have the joint pmf

$$ f(x,y) = \frac{5x+3y}{81}, \quad x=1,2, \quad y=1,2,3. $$

Summing the joint pmf over the other variable gives the marginal pmfs:

$$ f_X(x)= \frac{15x + 18}{81}, \quad x=1,2, $$

and

$$ f_Y(y) = \frac{6y+15}{81}, \quad y=1,2,3. $$

Thus, the conditional pmf of \(Y\), given that \(X=x\), is equal to

$$ h(y|x) = \frac{(5x+3y)/81}{(15x+18)/81} = \frac{5x+3y}{15x+18}, \quad y=1,2,3, \quad \text{when } x=1 \text{ or } 2. $$

We can now find the probability of a specific event, for example:

$$ P(Y=1|X=2)= h(1|2)=\frac{13}{48} $$
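The marginal and conditional pmfs of this example can be reproduced with exact arithmetic. The sketch below follows the formulas above; the Python function names are illustrative choices.

```python
from fractions import Fraction

X_VALUES = (1, 2)
Y_VALUES = (1, 2, 3)

def f(x, y):
    """Joint pmf f(x, y) = (5x + 3y) / 81."""
    return Fraction(5 * x + 3 * y, 81)

def f_X(x):
    """Marginal pmf of X: sum the joint pmf over y."""
    return sum(f(x, y) for y in Y_VALUES)

def f_Y(y):
    """Marginal pmf of Y: sum the joint pmf over x."""
    return sum(f(x, y) for x in X_VALUES)

def h(y, x):
    """Conditional pmf of Y given X = x."""
    return f(x, y) / f_X(x)

print(f_X(2))   # 16/27, which equals (15*2 + 18)/81
print(h(1, 2))  # 13/48
```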

*The candidate can practice this concept by finding the conditional pmf in the other direction, \(g(x|y)\).*

If we compute all the probabilities for this conditional probability function, we see that they behave like the probability mass functions seen in the last chapter. Let’s keep \(X=2\) fixed and check this:

$$ P(Y=2|X=2)= h(2|2)=\frac{16}{48} \text{ and } P(Y=3|X=2)= h(3|2)=\frac{19}{48} $$

Summing these values with \(X\) fixed at 2 gives

$$ \sum_{y=1}^{3} h(y|2) = h(1|2) + h(2|2) + h(3|2) = \frac{13}{48} + \frac{16}{48} + \frac{19}{48} = 1. $$

The same holds for every value of \(x\) when we sum over all values of \(y\), and likewise for the conditional function \(g(x|y)\) when we sum over all values of \(x\).
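A short sketch can confirm this for every conditioning value in Example 2. The closed form used for \(g(x|y)\) is obtained by dividing the joint pmf by \(f_Y(y) = (6y+15)/81\).

```python
from fractions import Fraction

X_VALUES = (1, 2)
Y_VALUES = (1, 2, 3)

def h(y, x):
    """h(y|x) = (5x + 3y) / (15x + 18), from Example 2."""
    return Fraction(5 * x + 3 * y, 15 * x + 18)

def g(x, y):
    """g(x|y) = f(x, y) / f_Y(y) = (5x + 3y) / (6y + 15)."""
    return Fraction(5 * x + 3 * y, 6 * y + 15)

# Each conditional pmf sums to 1 over its own variable.
for x in X_VALUES:
    print(f"sum over y of h(y|{x}) =", sum(h(y, x) for y in Y_VALUES))
for y in Y_VALUES:
    print(f"sum over x of g(x|{y}) =", sum(g(x, y) for x in X_VALUES))
```

Every line printed shows a total of 1.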

Thus, \(h(y|x)\) and \(g(x|y)\) both satisfy the conditions of a probability mass function, and we can do the same operations we did on a joint pmf, such as:

*Computing conditional probabilities such as*

$$ P(a < Y < b| X = x) = \sum_{\{y:a < y < b\}} h(y|x) $$

and *conditional expectations such as*

$$ E[u(Y)|X = x] = \sum_{y}u(y)h(y|x) $$

in a manner similar to those associated with unconditional probabilities and expectations.
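Both formulas translate directly into sums over the support of \(Y\). The sketch below reuses \(h(y|x)\) from Example 2; the helper names `cond_prob_between` and `cond_expectation` are hypothetical.

```python
from fractions import Fraction

Y_VALUES = (1, 2, 3)

def h(y, x):
    """h(y|x) = (5x + 3y) / (15x + 18), from Example 2."""
    return Fraction(5 * x + 3 * y, 15 * x + 18)

def cond_prob_between(a, b, x):
    """P(a < Y < b | X = x): sum h(y|x) over y strictly between a and b."""
    return sum(h(y, x) for y in Y_VALUES if a < y < b)

def cond_expectation(u, x):
    """E[u(Y) | X = x]: sum u(y) * h(y|x) over all y."""
    return sum(u(y) * h(y, x) for y in Y_VALUES)

print(cond_prob_between(1, 4, 2))             # h(2|2) + h(3|2) = 35/48
print(cond_expectation(lambda y: y ** 2, 2))  # E(Y^2 | X = 2) = 31/6
```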

Two special conditional moments are the **conditional mean** of \(Y\), given that \(X = x\), defined by

$$ \mu_{Y|x}=E(Y|x)=\sum_{y}yh(y|x), $$

and the **conditional variance** of \(Y\), given that \(X = x\), defined by

$$ \sigma_{Y|x}^2 = E\{[Y - E(Y|x)]^2|x\} = \sum_{y}[y-E(Y|x)]^2h(y|x). $$

The conditional variance can also be computed with the shortcut formula:

$$ \sigma_{Y|x}^2 = E(Y^2|x) - [E(Y|x)]^2. $$
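The second form follows from the definition by expanding the square and using \(\sum_{y} h(y|x) = 1\) and \(\sum_{y} y\,h(y|x) = \mu_{Y|x}\):

\begin{align} \sigma_{Y|x}^2 & = \sum_{y}(y-\mu_{Y|x})^2h(y|x)\\ & = \sum_{y}y^2h(y|x) - 2\mu_{Y|x}\sum_{y}yh(y|x) + \mu_{Y|x}^2\sum_{y}h(y|x)\\ & = E(Y^2|x) - 2\mu_{Y|x}^2 + \mu_{Y|x}^2 = E(Y^2|x) - [E(Y|x)]^2. \end{align}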

We would use the same logic to find the conditional mean \(\mu_{X|y}\) and the conditional variance \(\sigma_{X|y}^2\).

Using the values we found in Example 2, let’s compute the conditional mean and variance:

\begin{align} \mu_{Y|2} & = E(Y|X=2) = \sum_{y=1}^{3}yh(y|2)\\ & = \sum_{y=1}^{3}y\left(\frac{10+3y}{48}\right) = 1 \left(\frac{13}{48}\right) + 2 \left(\frac{16}{48}\right) + 3 \left(\frac{19}{48}\right)\\ & = \frac{17}{8} \end{align}

and

\begin{align} \sigma_{Y|2}^2 & = E\left[\left(Y - \frac{17}{8}\right)^2\bigg|X=2\right] = \sum_{y=1}^{3}\left(y-\frac{17}{8}\right)^2\left(\frac{10+3y}{48}\right)\\ & = \left(1-\frac{17}{8}\right)^2\frac{13}{48}+\left(2-\frac{17}{8}\right)^2\frac{16}{48}+\left(3-\frac{17}{8}\right)^2\frac{19}{48}\\ & = \left(\frac{81}{64}\right)\frac{13}{48}+\left(\frac{1}{64}\right)\frac{16}{48}+\left(\frac{49}{64}\right)\frac{19}{48} = \frac{125}{192} \approx 0.651. \end{align}
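These two results can be checked with exact arithmetic. The sketch below computes the conditional variance both from the definition and from the shortcut \(E(Y^2|x) - [E(Y|x)]^2\); the function names are illustrative.

```python
from fractions import Fraction

Y_VALUES = (1, 2, 3)

def h(y, x):
    """h(y|x) = (5x + 3y) / (15x + 18), from Example 2."""
    return Fraction(5 * x + 3 * y, 15 * x + 18)

def cond_mean(x):
    """E(Y | X = x)."""
    return sum(y * h(y, x) for y in Y_VALUES)

def cond_var(x):
    """Definition: E{[Y - E(Y|x)]^2 | x}."""
    mu = cond_mean(x)
    return sum((y - mu) ** 2 * h(y, x) for y in Y_VALUES)

def cond_var_shortcut(x):
    """Shortcut: E(Y^2|x) - [E(Y|x)]^2."""
    second_moment = sum(y ** 2 * h(y, x) for y in Y_VALUES)
    return second_moment - cond_mean(x) ** 2

print(cond_mean(2))          # 17/8
print(cond_var(2))           # 125/192, about 0.651
print(cond_var_shortcut(2))  # 125/192
```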

## Continuous Conditional Functions

*All the concepts studied in the previous section can be extended to the continuous case.*

If \(X\) and \(Y\) have a joint probability density function \(f(x,y)\), then the conditional probability density function of \(X\), given that \(Y=y\), is defined, for all values of \(y\) such that \(f_Y(y) > 0\), by

$$ g(x|y) = \frac{f(x,y)}{f_Y(y)}. $$

We can express the conditional probability density function of \(Y\), given that \(X = x\), written \(h(y|x)\), in a similar manner. Conditional probabilities are then obtained by integrating the conditional density, for example \(P(a < X < b \mid Y = y) = \int_{a}^{b} g(x|y)\,dx\).

Expected values and variances are defined as follows. Let \(X\) and \(Y\) have a distribution of the continuous type with joint pdf \(f(x,y)\) and marginal pdfs \(f_X(x)\) and \(f_Y(y)\), respectively. Then the conditional pdf, mean, and variance of \(Y\), given that \(X = x\), are, respectively,

$$ h(y|x) = \frac{f(x,y)}{f_X(x)}, \qquad \text{provided that }f_X(x)>0; $$

$$ E(Y|x)= \int_{-\infty}^{\infty}yh(y|x)dy; $$

and

\begin{align} \text{Var}(Y|x) & = E\{[Y - E(Y|x)]^2|x\}\\ & = \int_{-\infty}^{\infty}[y-E(Y|x)]^2h(y|x)dy\\ & = E[Y^2|x]-[E(Y|x)]^2. \end{align}

Similar expressions are associated with the conditional distribution of \(X\), given that \(Y = y\).
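These integrals can also be approximated numerically. The sketch below assumes a hypothetical joint pdf \(f(x,y) = x + y\) on the unit square (chosen for illustration, not taken from this reading) and uses a hand-written composite Simpson's rule, so it needs no external libraries.

```python
def joint_pdf(x, y):
    """Assumed joint pdf f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def simpson(func, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3

def conditional_moments(x, y_low=0.0, y_high=1.0):
    """Return (E(Y|x), Var(Y|x)) by numerical integration over y."""
    # Marginal pdf f_X(x): integrate the joint pdf over y.
    f_x = simpson(lambda y: joint_pdf(x, y), y_low, y_high)
    if f_x <= 0:
        raise ValueError("h(y|x) is undefined because f_X(x) = 0")

    def h(y):
        """Conditional pdf h(y|x) = f(x, y) / f_X(x)."""
        return joint_pdf(x, y) / f_x

    mean = simpson(lambda y: y * h(y), y_low, y_high)
    second_moment = simpson(lambda y: y * y * h(y), y_low, y_high)
    return mean, second_moment - mean ** 2

mean, var = conditional_moments(0.5)
print(mean, var)  # about 0.5833 and 0.0764 (exact values 7/12 and 11/144)
```

For this assumed density, \(f_X(0.5) = 1\) and \(h(y|0.5) = y + 0.5\), so the numerical results can be compared against the exact values \(7/12\) and \(11/144\) obtained by integrating by hand.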

**Example:** Let \(X\) and \(Y\) be jointly continuous random variables with joint pdf

$$ f(x,y) = 6x, \qquad 0\le x \le y \le 1. $$

The constant 6 ensures that the density integrates to 1, since \(\int_{0}^{1}\int_{x}^{1} x\,dy\,dx = \frac{1}{6}\). Integrating out the other variable gives the marginal pdfs

$$ f_X(x) = \int_{x}^{1} 6x\,dy = 6x(1-x), \qquad 0 \le x \le 1, $$

$$ f_Y(y) = \int_{0}^{y} 6x\,dx = 3y^2, \qquad 0 \le y \le 1. $$

Before doing any calculations, we must think about the region defined by the bounds: it is the triangular region enclosed by \(y=x\), \(y=1\) and \(x=0\), shown in Figure 2.1. This is where the density has all of its mass. For a fixed value \(x\), \(Y\) can only take values in \(x \le y \le 1\), so every integral over \(y\) runs from \(x\) to 1.

This fact must be kept in mind in all calculations over regions of this kind. Applying the definition, for \(0 < x < 1\) we have

$$ h(y|x) = \frac{f(x,y)}{f_X(x)} = \frac{6x}{6x(1-x)} = \frac{1}{1-x}, \qquad x \le y \le 1. $$

Then the conditional mean of \(Y\), given that \(X = x\), is

$$ E(Y|x)= \int_{x}^{1} y\frac{1}{1-x}dy = \bigg[\frac{y^2}{2(1-x)}\bigg]_{y=x}^{y=1} = \frac{1-x^2}{2(1-x)} = \frac{1+x}{2}. $$

As a way of practicing this, you can calculate \(E(X|y)\) yourself.

The conditional variance of \(Y\), given that \(X=x\), is

\begin{align} E\{[Y-E(Y|x)]^2|x\} & = \int_{x}^{1}\bigg(y-\frac{1+x}{2}\bigg)^2 \frac{1}{1-x}dy\\ & = \bigg[ \frac{1}{3(1-x)}\bigg(y - \frac{1+x}{2}\bigg)^3\bigg]_{y=x}^{y=1}\\ & = \frac{1}{3(1-x)}\bigg[\bigg(\frac{1-x}{2}\bigg)^3-\bigg(-\frac{1-x}{2}\bigg)^3\bigg] = \frac{(1-x)^2}{12}. \end{align}

Finding the variance of \(X\), given that \(Y=y\), follows the same steps, and you can work it out yourself as practice. Sketching the region before integrating is a good habit and will help the reader complete more complex exercises, particularly those where the boundaries of the region are not straight lines.

**Learning Outcome**

**Topic 3.a: Multivariate Random Variables – Determine conditional and marginal probability functions, probability density functions, and cumulative distribution functions.**