In probability and statistics, we often encounter experiments that involve multiple events occurring simultaneously.
These events might be related to each other or have an intersection. A fundamental understanding of joint probability functions, probability density functions, and cumulative distribution functions is essential. These concepts play a pivotal role in various practical applications, making them a key focus area for the SOA Exam P.
Let \(X\) and \(Y\) be two discrete random variables defined on a two-dimensional discrete space, \(S\).
The joint probability mass function of \(X\) and \(Y\) is defined as:
$$ f\left(x, y\right)= P \left(X = x, Y = y\right) $$
In other words, this means that \(f\left(x, y\right)\) gives us the probability that random variable \(X\) takes the value \(x\), and random variable \(Y\) takes the value \(y\) simultaneously.
The conditional probability function of \(X\), given that \(Y = y\), is given by:
$$ P\left(X=x\middle| Y=y\right)=p\left(x\middle| y\right)=\frac{p\left(x,y\right)}{p_Y\left(y\right)} $$
An insurance company collects data on accidents in different regions. The data is characterized as follows, with the number of accidents, \(X\), in each region:
$$ \begin{array}{c|c|c|c}
\text{Region} & \bf{X=0} & \bf{X=1} & \bf{X=2} \\ \hline
A & 0.23 & 0.10 & 0.13 \\ \hline
B & 0.10 & 0.15 & 0.02 \\ \hline
C & 0.01 & 0.18 & 0.08
\end{array} $$
Find the conditional distribution of \(X|\text{Region } A\).
Solution:
To determine the conditional distribution of \(X\) given Region A, denoted as \(P(X=x | \text{Region } A)\), we will use the conditional probability formula:
$$ P(X=x |\text{Region } A)= \frac{P\left(X=x \text{ and Region } A\right)}{P\left(\text{Region } A\right)} $$
First, we need to find \(P(\text{Region } A)\), which we can directly calculate from the table:
$$ P\left(\text{Region } A\right)=0.23+0.10+0.13=0.46 $$
Therefore, we can now calculate the conditional probabilities for each value, \(x\):
$$ \begin{align*}
P\left(X=0\middle| \text{Region } A\right) & = \frac{0.23}{0.46} = 0.5 \\
P\left(X=1\middle| \text{Region } A\right) & = \frac{0.10}{0.46} =0.2174 \\
P(X=2 |\text{Region } A) & =\frac{0.13}{0.46}= 0.2826 \end{align*} $$
So, the conditional distribution of \(X\) given Region \(A\) is as follows:
$$ \begin{array}{c|c}
X|\text{Region } A & P(X|\text{Region } A) \\ \hline
0 & 0.5 \\ \hline
1 & 0.2174 \\ \hline
2 & 0.2826
\end{array} $$
Note that,
$$ \text{Conditional probability}=\frac{\text{Joint probability}}{\text{Marginal probability}} $$
We will discuss this in detail in the next reading.
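The Region A calculation can be reproduced with a short script. This is a minimal sketch: the dictionary layout and the helper name `conditional_given_region` are illustrative choices, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# Joint probabilities P(Region = r, X = x), copied from the table in the example.
joint = {
    "A": {0: Fraction(23, 100), 1: Fraction(10, 100), 2: Fraction(13, 100)},
    "B": {0: Fraction(10, 100), 1: Fraction(15, 100), 2: Fraction(2, 100)},
    "C": {0: Fraction(1, 100), 1: Fraction(18, 100), 2: Fraction(8, 100)},
}


def conditional_given_region(joint, region):
    """Return P(X = x | Region = region) as a dict keyed by x."""
    marginal = sum(joint[region].values())  # P(Region = region)
    return {x: p / marginal for x, p in joint[region].items()}


cond_a = conditional_given_region(joint, "A")
for x, p in cond_a.items():
    print(x, round(float(p), 4))
```

Each conditional probability is the joint probability divided by the marginal probability of the region, so the three values sum to 1.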
Suppose a certain local bank had three deposit or withdrawal counters. Two investors arrive at the counters at different times when the counters are serving no other customers. Each investor chooses a counter at random, independently of the other.
Let \(X\) be the number of investors who select counter 1, and let \(Y\) be the number of investors who select counter 2.
Solution
First, we have to consider the sample space associated with the experiment.
Let the pair \(\left\{i,j\right\}\) represent the simple event that the first investor selects counter \(i\) and the second investor chooses counter \(j\), where \(i, j=1, 2, 3\).
By using the mn rule, the sample space consists of:
$$ 3\times3=9 \text{ sample points} $$
Since each investor chooses a counter at random, the sample points are equally likely, and each has a probability of \(\frac{1}{9}\).
Thus, the sample space for the experiment is given as:
$$ S=\left[\left\{1,1\right\},\left\{1,2\right\},\left\{1,3\right\},\left\{2,1\right\},\left\{2,2\right\},\left\{2,3\right\},\left\{3,1\right\},\left\{3,2\right\},\left\{3,3\right\}\right] $$
We know that:
$$ f \left(x, y\right)= P \left(X =x, Y = y\right) $$
Therefore, the joint probability of \(X\) and \(Y\) is given as follows:
$$ \begin{array}{c|c|c|c} {\begin{matrix} X \\ \huge{\diagdown} \\ Y \end{matrix}} & {0} & {1} & {2} \\ \hline 0 & \frac{1}{9} & \frac{2}{9} & \frac{1}{9} \\ \hline 1 & \frac{2}{9} & \frac{2}{9} & 0 \\ \hline 2 & \frac{1}{9} & 0 & 0 \end{array} $$
We know that,
$$ f \left(x, y\right)= P \left(X = x, Y = y\right) $$
Thus,
$$ \begin{align*} P\left(X=2, Y=0 \text{ or } 1\right) & =P\left(X=2, Y=0\right)+P\left(X=2, Y=1\right) \\
& =\frac{1}{9}+0=\frac{1}{9} \end{align*} $$
We are required to find \(P\left(Y=2\right)\). Since this event places no restriction on \(X\), it is the same as finding \(P\left(Y=2, X=0,1,2\right)\). That is, we sum the joint probabilities over all the possible values of \(X\).
Thus,
$$ P\left(Y=2\right)=\frac{1}{9}+0+0=\frac{1}{9} $$
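The bank-counter example can be checked by brute-force enumeration of the nine sample points. The sketch below is illustrative only; the variable names are assumptions rather than part of the original solution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

counters = (1, 2, 3)
# Each ordered pair (i, j): investor 1 picks counter i, investor 2 picks counter j.
sample_space = list(product(counters, repeat=2))
point_prob = Fraction(1, len(sample_space))  # 1/9 for each sample point

# Accumulate f(x, y) = P(X = x, Y = y) over the sample space.
joint_pmf = Counter()
for outcome in sample_space:
    x = outcome.count(1)  # number of investors at counter 1
    y = outcome.count(2)  # number of investors at counter 2
    joint_pmf[(x, y)] += point_prob

# P(X = 2, Y = 0 or 1): add the two joint probabilities (missing keys count as 0).
p_x2_y01 = joint_pmf[(2, 0)] + joint_pmf[(2, 1)]

# P(Y = 2): sum the joint pmf over every value of X.
p_y2 = sum(p for (x, y), p in joint_pmf.items() if y == 2)

print(dict(joint_pmf))
print(p_x2_y01, p_y2)
```

Enumerating the sample space first and then mapping each point to \((x, y)\) mirrors the hand calculation: every cell of the joint table is a count of sample points divided by 9.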
An insurance company collects data on the number of claims made by male and female policyholders. Let \(X\) be the number of claims from males, and \(Y\) be the number of claims from females. \(X\) and \(Y\) have the following joint probability distribution:
$$ f\left(x,y\right)=\frac{y}{9x},\ \ \ \text{for } x=1, 2;\ y=1, 2, 3 $$
Calculate \(P\left(X+\frac{Y}{2}=2\right)\).
Solution
We first determine the pairs \(\left(x,y\right)\) which satisfy the condition that \(x+\frac{y}{2}=2\).
\(x+\frac{y}{2}=2\) only for the pair (1, 2).
Now, we can proceed to calculate the required probability:
$$ P\left(X+\frac{Y}{2}=2\right)=\frac{2}{9\times1}=\frac{2}{9} $$
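As a quick check of this result, the support can be scanned for pairs that satisfy the condition. This is a sketch under the stated pmf; the names `f`, `support`, and `matching` are illustrative.

```python
from fractions import Fraction


def f(x, y):
    """Joint pmf from the example: f(x, y) = y / (9x)."""
    return Fraction(y, 9 * x)


support = [(x, y) for x in (1, 2) for y in (1, 2, 3)]

# Sanity check: a valid pmf sums to 1 over its support.
total = sum(f(x, y) for x, y in support)

# Keep only the pairs with x + y/2 = 2, using exact arithmetic.
matching = [(x, y) for x, y in support if x + Fraction(y, 2) == 2]
prob = sum(f(x, y) for x, y in matching)

print(matching, prob)
```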
An analyst is concerned about the annual number of tsunamis in two countries, \(M\) and \(N\).
Let \(X\) and \(Y\) be the annual number of tsunamis in countries \(M\) and \(N\), respectively.
The analyst determines that \(X\) and \(Y\) are jointly distributed as below:
$$ f\left(x,y\right)=\frac{xy}{10},\ \ \ \text{for } \ x=0, 1;y=0, 1, 2, 3, 4 $$
Calculate \(P\left(X+Y \lt 3\right)\).
Solution
\(x+y \lt 3\) for the pairs, \(\left(0,0\right); \left(0,1\right); \left(0,2\right); \left(1,0\right)\) and \(\left(1,1\right)\)
Therefore,
$$ \begin{align*} P\left(X+Y \lt 3\right) & =\frac{0}{10}+\frac{0}{10}+\frac{0}{10}+\frac{0}{10}+\frac{1}{10} \\
& =\frac{1}{10} \end{align*} $$
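The same kind of event probability can be computed by filtering the joint pmf with a predicate. The helper `event_probability` below is a hypothetical name used for illustration, not something defined in the reading.

```python
from fractions import Fraction

# Joint pmf from the example: f(x, y) = xy / 10 on x in {0, 1}, y in {0, ..., 4}.
pmf = {(x, y): Fraction(x * y, 10) for x in (0, 1) for y in range(5)}


def event_probability(pmf, predicate):
    """Sum f(x, y) over all support points where predicate(x, y) is true."""
    return sum(p for (x, y), p in pmf.items() if predicate(x, y))


p_less_than_3 = event_probability(pmf, lambda x, y: x + y < 3)
print(p_less_than_3)
```

Only the pair \((1, 1)\) contributes, because every pair with \(x = 0\) or \(y = 0\) has probability zero.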
The joint cumulative distribution function, \(F_{XY}\left(x,y\right)\) of two discrete random variables, \(X\) and \(Y\), is defined as the probability that the random variable \(X\) is less than or equal to a specified value of \(x\) and that the random variable \(Y\) is less than or equal to a specified value of \(y\), namely,
$$ F_{XY}\left(x,y\right)=P\left(X\le x, Y\le y\right) $$
Now, consider an experiment involving a sample of size \(n\), i.e., \(X_1, X_2,\ldots, X_n\), with joint probability mass function \(f\). The joint cumulative distribution function of \(X_1, X_2,\ldots, X_n\) is given by:
$$ F\left(x_1, x_2,\ldots, x_n\right)=\sum_{w_1\le x_1}\sum_{w_2\le x_2}{\ldots\sum_{w_n\le x_n}f\left(w_1, w_2, \ldots, w_n\right)} $$
The following result holds for two random variables, \(X\) and \(Y\):
$$ P\left(x_1 \lt X\le x_2, {y}_1 \lt Y\le y_2\right)= F\left(x_2,y_2\right)+ F\left(x_1, y_1\right)- F\left(x_1, y_2\right)- F\left(x_2,y_1\right) $$
The above result holds whenever \(x_1\lt x_2\) and \(y_1 \lt y_2\).
To prove the above result, suppose we have two discrete random variables, \(X\) and \(Y\), and we wish to find the probability that \(x_1 \lt X\le x_2\) and \(y_1 \lt Y\le y_2\). This can be expressed as:
$$ P\left(x_1 \lt X\le x_2, { y}_1 \lt Y\le y_2\right) $$
Now, we can break the event \(\left\{X\le x_2, Y\le y_2\right\}\) into three mutually exclusive cases depending on whether \(X\) is less than or equal to \(x_1\) and \(Y\) is less than or equal to \(y_1\):
$$ \begin{align*} P\left(X\le x_2, Y\le y_2\right) = & P\left(x_1 \lt X\le x_2, {y}_1 \lt Y\le y_2\right)+ P\left(X\le x_1, Y \le y_2\right) \\ + & P\left(x_1 \lt X \le x_2, Y \le y_1\right) \end{align*} $$
From the definition of the joint cumulative distribution function,
$$ P\left(X \le x_2, Y \le y_2\right)= F\left(x_2, y_2\right) $$
$$ P\left(X\le x_1, Y \le y_2\right)= F\left(x_1, y_2\right) $$
$$ P\left(x_1 \lt X \le x_2, Y\le y_1\right)= F\left(x_2, y_1\right)- F\left(x_1, y_1\right) $$
And when we substitute the above equations in the decomposition and solve for the required probability, we get:
$$ P\left(x_1 \lt X\le x_2, { y}_1 \lt Y\le y_2\right)= F\left(x_2,y_2\right)+ F\left(x_1, y_1\right)- F\left(x_1, y_2\right)- F\left(x_2,y_1\right) $$
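The rectangle identity can also be verified numerically. The sketch below reuses the joint pmf from the bank-counter example purely as test data; the function names `cdf`, `rect_direct`, and `rect_from_cdf` are illustrative assumptions.

```python
from fractions import Fraction

# Joint pmf of the bank-counter example, keyed by (x, y).
pmf = {
    (0, 0): Fraction(1, 9), (1, 0): Fraction(2, 9), (2, 0): Fraction(1, 9),
    (0, 1): Fraction(2, 9), (1, 1): Fraction(2, 9), (0, 2): Fraction(1, 9),
}


def cdf(x, y):
    """F(x, y) = P(X <= x, Y <= y), summed directly from the pmf."""
    return sum(p for (a, b), p in pmf.items() if a <= x and b <= y)


def rect_direct(x1, x2, y1, y2):
    """P(x1 < X <= x2, y1 < Y <= y2) by summing the pmf over the rectangle."""
    return sum(p for (a, b), p in pmf.items() if x1 < a <= x2 and y1 < b <= y2)


def rect_from_cdf(x1, x2, y1, y2):
    """The same probability using only the four corner values of F."""
    return cdf(x2, y2) + cdf(x1, y1) - cdf(x1, y2) - cdf(x2, y1)


print(rect_direct(0, 2, 0, 1), rect_from_cdf(0, 2, 0, 1))
```

Both functions agree for any corners with \(x_1 \lt x_2\) and \(y_1 \lt y_2\), which is what the identity asserts.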
An actuary is conducting an analysis of the number of days of sickness and the number of medical appointments for a group of policyholders. Let \(X\) be the random variable representing the number of days of sickness, and \(Y\) be the random variable representing the number of medical appointments.
The joint probability mass function (pmf) for \(X\) and \(Y\) is given in the table below:
$$ \begin{array}{c|c|c|c} {\begin{matrix} X \\ \huge{\diagdown} \\ Y \end{matrix}} & {0} & {1} & {2} \\ \hline 0 & \frac{1}{8} & \frac{1}{6} & \frac{1}{4} \\ \hline 1 & \frac{1}{6} & \frac{1}{8} & \frac{1}{6} \end{array} $$
Find \(F_{XY}\left(0.5,1\right)\)
Solution
By definition of a joint cumulative distribution function,
$$ \begin{align*} F_{XY}\left(0.5,1\right) & =P\left(X\le 0.5, Y\le 1\right) \\
& =P_{XY}\left(0,0\right)+P_{XY}\left(0,1\right)=\frac{1}{8}+\frac{1}{6}=\frac{7}{24} \\
\therefore F_{XY}\left(0.5,1\right) & =\frac{7}{24} \end{align*} $$
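A joint CDF value such as \(F_{XY}\left(0.5,1\right)\) is just a filtered sum of the pmf. In the sketch below, the table rows are assumed to index \(X\) and the columns \(Y\), which is consistent with the solution above; the function name `joint_cdf` is illustrative.

```python
from fractions import Fraction

# Joint pmf keyed by (x, y); assumes table rows are X = 0, 1 and columns are Y = 0, 1, 2.
pmf = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 4),
    (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 6),
}


def joint_cdf(x, y):
    """F(x, y) = P(X <= x, Y <= y); x and y need not be support points."""
    return sum(p for (a, b), p in pmf.items() if a <= x and b <= y)


result = joint_cdf(0.5, 1)
print(result)
```

Because \(X\) takes only integer values, \(X \le 0.5\) is the same event as \(X = 0\), so only the cells \((0,0)\) and \((0,1)\) are included.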
An insurance company operates in two neighboring cities, \(A\) and \(B\). In June 2022, they collected data on the number of road accidents in each city. Let \(X\) represent the number of accidents in city \(A\), and \(Y\) represent the number of accidents in city \(B\). \(X\) and \(Y\) have the following joint cumulative distribution function:
$$ F\left(x,y\right)=\left({0.8}^x\right)\left({0.2}^y\right), \text{ for } x=0, 1, 2\ldots \text{ and } y=0, 1,2\ldots $$
Find the probability that in June 2022, there were exactly 3 accidents in city \(A\) and exactly 3 accidents in city \(B\).
Solution:
We wish to find \(P\left(X=3, Y=3\right)\).
We know that,
$$ F\left(x,y\right)=P\left(X\le x, Y\le y\right) $$
We also know that,
$$ P\left(x_1 \lt X\le x_2, { y}_1 \lt Y\le y_2\right)= F\left(x_2,y_2\right)+ F\left(x_1, y_1\right)- F\left(x_1, y_2\right)- F\left(x_2,y_1\right) $$
$$ \begin{align*}
\Rightarrow P\left(X=3, Y=3\right) & =F\left(3,3\right)-F\left(2,3\right)-F\left(3,2\right)+F\left(2,2\right)\\
& =\left({0.8}^3\right)\left({0.2}^3\right)+\left({0.8}^2\right)\left({0.2}^2\right)-\left({0.8}^2\right)\left({0.2}^3\right) \\ & -\left({0.8}^3\right)\left({0.2}^2\right) \\
& =0.004096 \end{align*} $$
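The arithmetic in this solution can be reproduced by differencing \(F\) at the four corners. This sketch takes the function \(F\) exactly as stated in the problem; the helper name `point_mass` is an assumption for illustration.

```python
def F(x, y):
    """The function F(x, y) as given in the problem statement."""
    return (0.8 ** x) * (0.2 ** y)


def point_mass(x, y):
    """P(X = x, Y = y) for integer-valued X and Y, using
    P(x-1 < X <= x, y-1 < Y <= y) = F(x,y) + F(x-1,y-1) - F(x-1,y) - F(x,y-1)."""
    return F(x, y) + F(x - 1, y - 1) - F(x - 1, y) - F(x, y - 1)


p_33 = point_mass(3, 3)
print(round(p_33, 6))
```

Setting \(x_1 = x - 1\) and \(y_1 = y - 1\) isolates a single support point because \(X\) and \(Y\) take only integer values.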
Question
A clinical trial is testing a new medication that either improves a patient’s condition (represented by 1) or has no effect (represented by 0). Let \(X\) represent the actual effect of the medication on a patient, and let \(Y\) represent the observed effect as reported by the patient. The joint probability function of \(X\) and \(Y\) is given by:
- \(P[X = 0, Y = 0] = 0.700\)
- \(P[X = 1, Y = 0] = 0.100\)
- \(P[X = 0, Y = 1] = 0.050\)
- \(P[X = 1, Y = 1] = 0.150\)
Calculate the variance of the observed effect given that the actual effect is positive, \(\text{Var}\left(Y\middle| X = 1\right)\).
- A. 0.12
- B. 0.21
- C. 0.24
- D. 0.35
- E. 0.42
Solution
The correct answer is C.
First, we need to calculate the conditional probabilities \(P\left(Y=0\middle| X=1\right)\) and \(P\left(Y=1\middle| X=1\right)\)
The conditional probability \(P(Y=0|X=1)\) is calculated as:
$$ P\left(Y=0\middle| X=1\right)=\frac{P\left(X=1,Y=0\right)}{P\left(X=1\right)} $$
We know that \(P\left(X=1\right)=P\left(X=1,Y=0\right)+P\left(X=1,Y=1\right)=0.100+0.150=0.250\)
Now, we calculate \(P(Y=0|X=1)\):
$$ P\left(Y=0\middle| X=1\right)=\frac{0.100}{0.250}=0.4 $$
Similarly, we calculate \(P(Y = 1∣X = 1)\):
$$ P\left(Y=1\middle| X=1\right)=\frac{P\left(X=1,Y=1\right)}{P\left(X=1\right)}=\frac{0.150}{0.250}=0.6 $$
Given that \(Y\) is a Bernoulli random variable, the variance of \(Y\) given \(X\) is:
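$$ \text{Var}\left(Y\middle| X = 1\right) = p\left(1 - p\right) $$
Where \(p = P\left(Y = 1\middle| X = 1\right)\). Substituting the calculated value for \(p\):
$$ \text{Var}\left(Y\middle| X = 1\right) = 0.600\left(1 - 0.600\right) = 0.600\left(0.400\right) = 0.240 $$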
Note: A Bernoulli random variable is a discrete random variable that takes the value 1 with probability \(p\) and the value 0 with probability \(1-p\). In this problem, the observed effect is a Bernoulli random variable because each trial (i.e., each patient's reported response to the medication) has only two possible outcomes.
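The conditional variance can also be computed directly from the joint probabilities, without relying on the Bernoulli shortcut. This is a minimal sketch; the function name `conditional_variance_y` and the `(x, y)` key layout are illustrative assumptions.

```python
from fractions import Fraction

# Joint probabilities from the question, keyed by (x, y).
joint = {
    (0, 0): Fraction(700, 1000),
    (1, 0): Fraction(100, 1000),
    (0, 1): Fraction(50, 1000),
    (1, 1): Fraction(150, 1000),
}


def conditional_variance_y(joint, x):
    """Var(Y | X = x) = E[Y^2 | X = x] - (E[Y | X = x])^2."""
    p_x = sum(p for (a, _), p in joint.items() if a == x)        # marginal P(X = x)
    cond = {b: p / p_x for (a, b), p in joint.items() if a == x}  # P(Y = b | X = x)
    mean = sum(b * p for b, p in cond.items())
    second_moment = sum(b * b * p for b, p in cond.items())
    return second_moment - mean ** 2


var_y_given_x1 = conditional_variance_y(joint, 1)
print(float(var_y_given_x1))
```

The general formula gives the same value as \(p(1-p)\) with \(p = 0.6\), which confirms option C.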
Learning Outcome
Topic 3. a: Multivariate Random Variables – Explain and perform calculations concerning joint probability functions, probability density functions, and cumulative distribution functions.