Determine conditional and marginal probability functions

Conditional Distributions

Conditional probability is a key part of Bayes’ theorem. In plain language, it is the probability of one thing being true given that another thing is true. It differs from joint probability, which is the probability that both things are true without knowing that one of them must be true. Before diving deeper into the math, let’s look at some practical applications of conditional distributions.

Assume that a law enforcement department is looking into the connection between road accidents and intoxication among automobile drivers. On one hand, the department could come up with the probability that a driver is intoxicated and involved in an accident. That would be a joint probability. On the other hand, the department could determine the probability that a driver is involved in an accident given that they are intoxicated (it’s already known that the driver is intoxicated). This would be a conditional probability. These two probabilities are generally not the same: conditioning on additional information changes the probability.

In the medical field, a team of doctors could gather data to help them analyze and predict cases of kidney failure. The probability that a patient’s left and right kidneys are both infected is a joint probability, whereas the probability that the left kidney is infected given that the right one is known to be infected is a conditional probability. We can use an Euler diagram to demonstrate the difference. In the diagram, each large square has area 1, and the smaller squares represent probabilities.

Let X be the event that a patient’s left kidney is infected, and let Y be the event that the right kidney is infected. On the left side of the diagram, the green area represents the probability that both of the patient’s kidneys are infected. This is the joint probability P(X,Y). If Y is true (i.e., the right kidney is definitely infected), then the space of everything not in Y is dropped and everything in Y is rescaled to the size of the original space. The rescaled green area on the right-hand side is now the conditional probability of X given Y, expressed as P(X|Y). Put differently, this is the probability that the left kidney is infected if we know that the right kidney is infected. It’s also important to note that the conditional probability of X given Y is not necessarily equal to the conditional probability of Y given X.

In simple terms, we define a conditional distribution as the distribution of one random variable given the value of another random variable.
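To make the distinction between joint and conditional probability concrete, here is a minimal Python sketch of the kidney example. The three probabilities are made up purely for illustration (the text gives no numbers), and the helper name `conditional` is hypothetical.

```python
from fractions import Fraction

# Hypothetical probabilities, chosen only for illustration (not real data).
p_left = Fraction(10, 100)    # P(X): left kidney infected
p_right = Fraction(15, 100)   # P(Y): right kidney infected
p_both = Fraction(6, 100)     # P(X, Y): both infected (joint probability)


def conditional(p_joint, p_given):
    """Return P(A | B) = P(A, B) / P(B); requires P(B) > 0."""
    if p_given <= 0:
        raise ValueError("conditioning event must have positive probability")
    return p_joint / p_given


p_left_given_right = conditional(p_both, p_right)   # P(X | Y)
p_right_given_left = conditional(p_both, p_left)    # P(Y | X)

print(p_left_given_right, p_right_given_left)       # 2/5 3/5
```

With these assumed numbers, the joint probability (0.06), P(X|Y) (0.4), and P(Y|X) (0.6) are all different, which is the point the diagram makes.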

Discrete Conditional Functions

The conditional probability mass function of \(X\), given that \(Y = y\), is defined by

$$ g(x|y) = \frac{f(x,y)}{f_Y(y)}, \qquad \text{provided that } f_Y(y) > 0. $$

Similarly, the conditional probability mass function of \(Y\), given that \(X = x\), is defined by

$$ h(y|x) = \frac{f(x,y)}{f_X(x)}, \qquad \text{provided that } f_X(x) > 0. $$

Compare this definition with the conditional probability of two events, which has the same form:

$$ P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad \text{provided that } P(B) > 0, $$

where \(B\) is the event known to have occurred, and \(A \cap B\) is the part of \(A\) that remains possible once \(B\) has occurred (if the intersection is empty, \(A\) cannot occur given \(B\), so \(P(A|B) = 0\)). In the bivariate case, the joint pmf \(f(x,y)\) plays the role of \(P(A \cap B)\), and the marginal pmf of the conditioning variable, for example \(f_X(x)\) when conditioning on \(X = x\), plays the role of \(P(B)\).

Example 1: Discrete conditional probability function

A bivariate distribution has the following probability function:


(I) the marginal distribution of X, and the marginal distribution of Y

The marginal distribution of X can be found by summing the columns in the table:

\(P(X = 0) = 0.4, P(X = 1) = 0.3 , P(X = 2) = 0.3\)

The marginal distribution of Y can be found by summing the rows in the table:

\(P(Y = 1) = 0.2, P(Y = 2) = 0.4 , P(Y = 3) = 0.4\)

(II)  the conditional distribution of \(Y|X=2\)

Using the definition of conditional probability:

\(P(Y=1|X=2)=\cfrac { P(Y=1,X=2) }{ P(X=2) } =\cfrac { 0 }{ 0.3 } =0\)

\(P(Y=2|X=2)=\cfrac { P(Y=2,X=2) }{ P(X=2) } =\cfrac { 0.2 }{ 0.3 } \approx 0.67\)

\(P(Y=3|X=2)=\cfrac { P(Y=3,X=2) }{ P(X=2) } =\cfrac { 0.1 }{ 0.3 } \approx 0.33\)
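The three conditional probabilities above amount to rescaling the \(X = 2\) joint probabilities by \(P(X = 2)\). The following Python sketch repeats that step using only the joint values that appear in the calculation above; the variable names are illustrative.

```python
from fractions import Fraction

# Joint probabilities P(Y = y, X = 2) taken from the calculation above.
joint_x2 = {1: Fraction(0), 2: Fraction(2, 10), 3: Fraction(1, 10)}

# Marginal P(X = 2): sum the joint probabilities over all values of y.
p_x2 = sum(joint_x2.values())

# Conditional pmf of Y given X = 2: divide each joint value by the marginal.
cond_y_given_x2 = {y: p / p_x2 for y, p in joint_x2.items()}

for y, p in cond_y_given_x2.items():
    print(f"P(Y={y} | X=2) = {p} ~ {float(p):.2f}")
```

Working with exact fractions shows that the rounded values 0.67 and 0.33 are 2/3 and 1/3, so the conditional distribution sums to exactly 1.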

Example 2: Discrete conditional probability function

Let \(X\) and \(Y\) have the joint pmf

$$ f(x,y) = \frac{5x+3y}{81}, \quad x=1,2, \quad y=1,2,3. $$

We proceed to find their marginal functions

$$ f_X(x)= \frac{15x + 18}{81}, \quad x=1,2, $$


$$ f_Y(y) =\frac{6y+15}{81} \quad y=1,2,3. $$
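For reference, each marginal function is obtained by summing the joint pmf over all values of the other variable. Written out term by term:

$$ f_X(x) = \sum_{y=1}^{3} \frac{5x+3y}{81} = \frac{(5x+3)+(5x+6)+(5x+9)}{81} = \frac{15x+18}{81}, $$

$$ f_Y(y) = \sum_{x=1}^{2} \frac{5x+3y}{81} = \frac{(5+3y)+(10+3y)}{81} = \frac{6y+15}{81}. $$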

Thus, the conditional pmf of \(Y\), given that \(X=x\), is equal to

$$ h(y|x) = \frac{(5x+3y)/81}{(15x+18)/81} = \frac{5x+3y}{15x+18}, \quad y=1,2,3, \quad \text{when } x=1 \text{ or } 2. $$

From this, we can find the probability of any particular event. For example:

$$ P(Y=1|X=2)= h(1|2)=\frac{13}{48} $$

The candidate can practice this concept by finding the opposite conditional pmf, \(g(x|y)\).

If we compute all the probabilities given by these conditional probability functions, we see that they behave like the probability mass functions seen in the last chapter. Let’s keep \(X=2\) fixed and check this:

$$ P(Y=2|X=2)= h(2|2)=\frac{16}{48} \text{ and } P(Y=3|X=2)= h(3|2)=\frac{19}{48} $$

Summing these values with \(X = 2\) fixed gives

$$ \sum_{y=1}^{3} P(Y=y|X=2) = h(1|2) + h(2|2) + h(3|2) = \frac{13}{48} + \frac{16}{48} + \frac{19}{48} = 1 $$

The same holds for every value of \(X\): for each fixed \(x\), the values \(h(y|x)\) sum to 1 over \(y\). Likewise, for each fixed \(y\), the values \(g(x|y)\) sum to 1 over \(x\).
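The check above can be repeated for every conditioning value with a few lines of Python. This is a minimal sketch built directly from the joint pmf of Example 2; the function names mirror the notation in the text, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

XS = (1, 2)       # support of X
YS = (1, 2, 3)    # support of Y


def f(x, y):
    """Joint pmf of Example 2: f(x, y) = (5x + 3y) / 81."""
    return Fraction(5 * x + 3 * y, 81)


def f_X(x):
    """Marginal pmf of X: sum the joint pmf over y."""
    return sum(f(x, y) for y in YS)


def f_Y(y):
    """Marginal pmf of Y: sum the joint pmf over x."""
    return sum(f(x, y) for x in XS)


def h(y, x):
    """Conditional pmf of Y given X = x."""
    return f(x, y) / f_X(x)


def g(x, y):
    """Conditional pmf of X given Y = y."""
    return f(x, y) / f_Y(y)


# Each conditional pmf sums to 1 over its own variable.
h_totals = {x: sum(h(y, x) for y in YS) for x in XS}
g_totals = {y: sum(g(x, y) for x in XS) for y in YS}

print(h(1, 2))    # 13/48
print(h_totals, g_totals)
```

The function `g` also gives one way to check an answer to the practice exercise on \(g(x|y)\).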

Thus, \(h(y|x)\) and \(g(x|y)\) both satisfy the conditions of a probability mass function, and we can carry out the same operations we did with a joint pmf. We can compute conditional probabilities such as

$$ P(a < Y < b| X = x) = \sum_{\{y:a < y < b\}} h(y|x) $$

and conditional expectations such as

$$ E[u(Y)|X = x] = \sum_{y}u(y)h(y|x) $$

in a manner similar to those associated with unconditional probabilities and expectations.

Two special conditional moments are the conditional mean of \(Y\), given that \(X = x\), defined by

$$ \mu_{Y|x}=E(Y|x)=\sum_{y}yh(y|x), $$

and the conditional variance of \(Y\), given that \(X = x\), defined by

$$ \sigma_{Y|x}^2 = E\{[Y - E(Y|x)]^2|x\} = \sum_{y}[y-E(Y|x)]^2h(y|x). $$

This can also be computed as:

$$ \sigma_{Y|x}^2 = E(Y^2|x) - [E(Y|x)]^2. $$

We would use the same logic to find the conditional mean \(\mu_{X|y}\) and the conditional variance \(\sigma_{X|y}^2\).

Using the values we found in Example 2, let’s compute the conditional mean and variance of \(Y\) given that \(X = 2\):

\begin{align} \mu_{Y|2} & = E(Y|X=2) = \sum_{y=1}^{3}yh(y|2)\\ & = \sum_{y=1}^{3}y\left(\frac{10+3y}{48}\right) = 1 \left(\frac{13}{48}\right) + 2 \left(\frac{16}{48}\right) + 3 \left(\frac{19}{48}\right)\\ & = \frac{17}{8} \end{align}


\begin{align} \sigma_{Y|2}^2 & = E\left[\left(Y - \frac{17}{8}\right)^2\bigg|X=2\right] = \sum_{y=1}^{3}\left(y-\frac{17}{8}\right)^2\left(\frac{10+3y}{48}\right)\\ & = \left(1-\frac{17}{8}\right)^2\frac{13}{48}+\left(2-\frac{17}{8}\right)^2\frac{16}{48}+\left(3-\frac{17}{8}\right)^2\frac{19}{48}\\ & = \left(\frac{81}{64}\right)\frac{13}{48}+\left(\frac{1}{64}\right)\frac{16}{48}+\left(\frac{49}{64}\right)\frac{19}{48} \approx 0.651. \end{align}
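As a cross-check on the arithmetic, the sketch below computes the conditional mean and variance of \(Y\) given \(X = 2\) in Python, using the conditional pmf \(h(y|x) = (5x+3y)/(15x+18)\) from Example 2. It evaluates the variance both from the definition and from the shortcut formula \(E(Y^2|x) - [E(Y|x)]^2\); the function names are illustrative.

```python
from fractions import Fraction

YS = (1, 2, 3)    # support of Y


def h(y, x):
    """Conditional pmf of Y given X = x from Example 2."""
    return Fraction(5 * x + 3 * y, 15 * x + 18)


def conditional_mean(x):
    """E(Y | X = x): weight each y by h(y | x)."""
    return sum(y * h(y, x) for y in YS)


def conditional_variance(x):
    """Var(Y | X = x) from the definition E{[Y - E(Y|x)]^2 | x}."""
    mean = conditional_mean(x)
    return sum((y - mean) ** 2 * h(y, x) for y in YS)


def conditional_variance_shortcut(x):
    """Var(Y | X = x) from E(Y^2 | x) - [E(Y | x)]^2."""
    second_moment = sum(y ** 2 * h(y, x) for y in YS)
    return second_moment - conditional_mean(x) ** 2


mean_y_given_2 = conditional_mean(2)
var_y_given_2 = conditional_variance(2)

print(mean_y_given_2)                          # 17/8
print(var_y_given_2, float(var_y_given_2))
```

Both variance formulas return the same exact fraction, 125/192, which rounds to the value 0.651 obtained above.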

Continuous Conditional Functions

If X and Y have a joint probability density function \( \text{f}(\text{x},\text{y}) \), then the conditional probability density function of X given that \( \text{Y}=\text{y}\) is defined, for all values of y such that \( \text{f}_\text{Y}(\text{y})>0 \), by

$$ \text{g}\left(\text{x}|\text{y}\right)=\frac{\text{f}\left(\text{x},\text{y}\right)}{\text{f}_{\text{Y}}(\text{y})} $$

Similarly, the conditional probability density function of Y given that \( \text{X}=\text{x} \), \( \text{h}(\text{y}|\text{x}) \), is defined, for all values of x such that \( \text{f}_\text{X}(\text{x})>0 \), by

$$ \text{h}\left(\text{y}|\text{x}\right)=\frac{\text{f}\left(\text{x},\text{y}\right)}{\text{f}_\text{X}\left(\text{x}\right)} $$

Example: Continuous Conditional Functions

The random variables X and Y have a joint density function:

$$ \text{f}\left(\text{x},\text{y}\right)=\frac{1}{20}\left(2\text{x}+5\text{y}\right) \ \ \ 0 < \text{x} < 2,\ 0<\text{y} $$

Find the conditional density function of X given that \(\text{Y}=\text{y} \).


We know that:

$$ \text{g}\left(\text{x}|\text{y}\right)=\frac{\text{f}\left(\text{x},\text{y}\right)}{\text{f}_\text{Y}(\text{y})} $$


The marginal density of Y is obtained by integrating the joint density over x:

$$ \begin{align*} \text{f}_\text{Y}\left(\text{y}\right)&=\int_{-\infty}^{\infty}{\text{f}\left(\text{x},\text{y}\right)\,\text{dx}},\ \ \ \ \ \ \ \text{y}\in \text{S}_\text{y} \\ &=\int_{\text{x}=0}^{2}{\frac{1}{20}(2\text{x}+5\text{y})\,\text{dx}} \\ &=\frac{1}{20}\left[\text{x}^2+5\text{xy}\right]_{\text{x}=0}^{\text{x}=2} \\ &=\frac{1}{20}\left(\left[2^2+5\text{y}\times2\right]-\left[0^2+5\text{y}\times0\right]\right) \\ &=\frac{4+10\text{y}}{20} \\ \therefore \text{f}_\text{Y}\left(\text{y}\right) & =\frac{4+10\text{y}}{20}=\frac{2+5\text{y}}{10} \end{align*} $$


Dividing the joint density by this marginal density gives:

$$ \begin{align*} \text{g}\left(\text{x}|\text{y}\right)&=\frac{\frac{1}{20}\left(2\text{x}+5\text{y}\right)}{\frac{1}{20}\left(4+10\text{y}\right)} \\ &=\frac{2\text{x}+5\text{y}}{4+10\text{y}}=\frac{2\text{x}+5\text{y}}{2(2+5\text{y})},\ \ 0<\text{x} < 2 \end{align*} $$
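A conditional density must integrate to 1 over the conditioned variable. The Python sketch below checks this numerically for \( (2\text{x}+5\text{y})/(4+10\text{y}) \) on \(0<\text{x}<2\) using a simple midpoint rule. The value of y used here is an arbitrary assumption for illustration, and `integrate_midpoint` is a hypothetical helper, not a library function.

```python
def g(x, y):
    """Conditional density of X given Y = y: (2x + 5y) / (4 + 10y), for 0 < x < 2."""
    return (2 * x + 5 * y) / (4 + 10 * y)


def integrate_midpoint(func, lower, upper, steps=10_000):
    """Approximate the integral of func over [lower, upper] with the midpoint rule."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


y_value = 0.5    # arbitrary positive value of y, assumed only for this check
total = integrate_midpoint(lambda x: g(x, y_value), 0.0, 2.0)

print(round(total, 6))
```

Because the integrand is linear in x, the midpoint rule is exact up to floating-point rounding, so the total should agree with 1 to many decimal places for any positive y.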

Similarly, the conditional pdf of Y given that \( \text{X}=\text{x} \) is

$$ \text{h}\left(\text{y}|\text{x}\right)=\frac{\text{f}\left(\text{x},\text{y}\right)}{\text{f}_\text{X}\left(\text{x}\right)},\ \ \ \ \ \text{ provided that } \text{f}_\text{X}\left(\text{x}\right)>0. $$

Learning Outcome

Topic 3.b: Multivariate Random Variables – Determine conditional and marginal probability functions, probability density functions, and cumulative distribution functions.