Determine the distribution of a transformation of jointly distributed random variables

Consider first a transformation of a single random variable \(X\) with pdf \(f(x)\). In the continuous case, if \(Y = u(X)\) is an increasing or decreasing function of \(X\) with inverse \(X = v(Y)\), then the pdf of \(Y\) is

$$ g(y) = |v'(y)|f[v(y)],\quad c < y < d, $$

where the support \(c < y < d\) corresponds to the support of \(X\), say, \(a < x < b\), through the transformation \(x = v(y)\).
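As a quick illustration of this rule (an example of our own choosing), let \(X\) have pdf \(f(x) = 2x\) for \(0 < x < 1\), and let \(Y = u(X) = X^2\). Then \(X = v(Y) = \sqrt{Y}\), \(v'(y) = 1/(2\sqrt{y})\), and

$$ g(y) = \bigg|\frac{1}{2\sqrt{y}}\bigg|\,2\sqrt{y} = 1,\quad 0 < y < 1, $$

so \(Y\) is uniform on \((0,1)\). This is no surprise: \(x^2\) is exactly the cdf of \(X\), and applying a continuous cdf to its own random variable always produces a uniform.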

One remark deserves attention: if the function \(Y = u(X)\) does not have a single-valued inverse, the determination of the distribution of \(Y\) will not be as simple.

Now consider the bivariate case. When the inverse is single-valued, the rule is much the same as in the one-variable case, with the derivative replaced by the Jacobian. That is, if \(X_1\) and \(X_2\) are two continuous-type random variables with joint pdf \(f(x_1,x_2)\), and the transformation \(Y_1 = u_1(X_1,X_2)\), \(Y_2 = u_2(X_1,X_2)\) has the single-valued inverse \(X_1 = v_1(Y_1,Y_2)\), \(X_2 = v_2(Y_1,Y_2)\), then the joint pdf of \(Y_1\) and \(Y_2\) is

$$g(y_1,y_2) = |J|f[v_1(y_1,y_2),v_2(y_1,y_2)],\quad (y_1,y_2)\in S_Y,$$

where the Jacobian \(J\) is the determinant

\begin{equation*} J = \begin{vmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2}\\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{vmatrix} = \frac{\partial x_1}{\partial y_1} \frac{\partial x_2}{\partial y_2} - \frac{\partial x_1}{\partial y_2} \frac{\partial x_2}{\partial y_1} \ne 0. \end{equation*}

We find the support \(S_Y\) of \(Y_1,Y_2\) by considering the mapping of the support \(S_X\) of \(X_1,X_2\) under the transformation \(y_1 = u_1(x_1,x_2)\), \(y_2 = u_2(x_1,x_2)\). This method of finding the distribution of \(Y_1\) and \(Y_2\) is called the change-of-variables technique.

It is often the mapping of the support \(S_X\) of \(X_1,X_2\) onto that of \(Y_1,Y_2\) (say, \(S_Y\)) that causes the biggest challenge. That is, in most cases it is easy to solve for \(x_1\) and \(x_2\) in terms of \(y_1\) and \(y_2\), say, $$ x_1=v_1(y_1,y_2),\quad x_2=v_2(y_1,y_2),$$ and then compute the Jacobian

\begin{equation*} J = \begin{vmatrix} \frac{\partial v_1(y_1,y_2)}{\partial y_1} & \frac{\partial v_1(y_1,y_2)}{\partial y_2}\\ \frac{\partial v_2(y_1,y_2)}{\partial y_1} & \frac{\partial v_2(y_1,y_2)}{\partial y_2} \end{vmatrix} \end{equation*}
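When the algebra gets heavier, the Jacobian can be computed symbolically. Here is a minimal sketch using Python's sympy, applied to the inverse transformation from Example 1 below (the variable names are our own):

```python
import sympy as sp

y1, y2 = sp.symbols("y1 y2")

# Inverse transformation: x1 = v1(y1, y2), x2 = v2(y1, y2).
x1 = (y1 + y2) / 2
x2 = (y1 - y2) / 2

# Jacobian matrix of (x1, x2) with respect to (y1, y2), and its determinant.
J = sp.Matrix([x1, x2]).jacobian(sp.Matrix([y1, y2]))
detJ = sp.simplify(J.det())

print(J)     # Matrix([[1/2, 1/2], [1/2, -1/2]])
print(detJ)  # -1/2, so |J| = 1/2
```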

Let’s see an example:

Example 1: Let \(X_1\) and \(X_2\) have a joint pdf \(f(x_1,x_2)\). Let \(Y_1 = X_1 + X_2\), \(Y_2 = X_1 - X_2\) be a transformation of \(X_1,X_2\). Find the joint density function of \(Y_1\) and \(Y_2\) in terms of \(f_{X_1,X_2}.\)

Applying the change-of-variables technique, we first solve for \(X_1\) and \(X_2\):

\begin{align*} Y_1 + Y_2 & = (X_1 + X_2) + (X_1 - X_2) = 2X_1, & X_1 & = \frac{Y_1 + Y_2}{2},\\ Y_1 - Y_2 & = (X_1 + X_2) - (X_1 - X_2) = 2X_2, & X_2 & = \frac{Y_1 - Y_2}{2}. \end{align*} Now we can find our desired value: \begin{equation*} J = \begin{vmatrix} \frac{1}{2} & \frac{1}{2}\\ \frac{1}{2} & -\frac{1}{2} \end{vmatrix} =\frac{1}{2}\bigg(-\frac{1}{2}\bigg) - \frac{1}{2}\cdot\frac{1}{2} = -\frac{1}{2} \end{equation*}

Applying the formula, we get

$$g(y_1,y_2) = \bigg|-\frac{1}{2}\bigg|f(x_1,x_2) = \frac{1}{2}f\bigg(\frac{y_1 + y_2}{2},\frac{y_1 - y_2}{2}\bigg)$$

Given a specific joint pdf for \(X_1,X_2\), we need only substitute it into this equation and adjust the bounds if needed. After that, one can do all the calculations covered in previous chapters: find marginal distributions, means, and everything else that can be done with a joint pdf.
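As a numerical sanity check of this result (a sketch of our own; for concreteness we assume \(X_1\) and \(X_2\) are independent standard normals, which is not required by the formula), we can compare a Monte Carlo estimate of the density of \((Y_1,Y_2)\) with the right-hand side at a test point:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2_000_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y1, y2 = x1 + x2, x1 - x2

# Empirical density near the test point (a1, a2), estimated via a small box.
a1, a2, h = 0.5, -0.3, 0.05
hits = np.mean((np.abs(y1 - a1) < h) & (np.abs(y2 - a2) < h))
empirical = hits / (2 * h) ** 2

# Change-of-variables formula: g(y1, y2) = (1/2) f((y1+y2)/2, (y1-y2)/2),
# with f the product of two standard normal pdfs (independence assumed).
theoretical = 0.5 * norm.pdf((a1 + a2) / 2) * norm.pdf((a1 - a2) / 2)

print(empirical, theoretical)  # the two values should agree closely
```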

Note: In other texts one will find \(g(y_1,y_2) = f_{X_1,X_2}(x_1,x_2)\,|J(x_1,x_2)|^{-1}\). This is the same calculation we are doing: one can prove that \(|J(x_1,x_2)|^{-1} = |J(y_1,y_2)|\). The method chosen is a matter of which one finds easier or more comfortable to work with.

In that formulation, the Jacobian is taken with respect to the forward transformation \(y_1 = g_1(x_1,x_2)\), \(y_2 = g_2(x_1,x_2)\), so the Jacobian matrix looks like this:

\begin{equation*} J(x_1,x_2) = \begin{bmatrix} \frac{\partial g_1}{\partial x_1} & \frac{\partial g_1}{\partial x_2}\\ \frac{\partial g_2}{\partial x_1} & \frac{\partial g_2}{\partial x_2} \end{bmatrix} \end{equation*}
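A quick symbolic check of the claim \(|J(x_1,x_2)|^{-1} = |J(y_1,y_2)|\), using the transformation from Example 2 below (a sympy sketch of our own):

```python
import sympy as sp

x1, x2, y1, y2 = sp.symbols("x1 x2 y1 y2", positive=True)

# Forward Jacobian J(x1, x2) for y1 = x1/x2, y2 = x2.
J_fwd = sp.Matrix([x1 / x2, x2]).jacobian(sp.Matrix([x1, x2])).det()

# Inverse Jacobian J(y1, y2) for x1 = y1*y2, x2 = y2.
J_inv = sp.Matrix([y1 * y2, y2]).jacobian(sp.Matrix([y1, y2])).det()

# |J(x1, x2)|^(-1), rewritten in terms of y1 and y2, should equal |J(y1, y2)|.
print(sp.simplify((1 / J_fwd).subs({x1: y1 * y2, x2: y2})))  # y2
print(sp.simplify(J_inv))                                    # y2
```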

To see this alternative in action, let’s take a simple example.

Example 2: Let \(X_1\) and \(X_2\) have a joint pdf \(f(x_1,x_2)\). Let \(Y_1 = \frac{X_1}{X_2} \), \(Y_2 = X_2\) be a transformation of \(X_1,X_2\). Find the joint density function of \(Y_1\) and \(Y_2\) in terms of \(f_{X_1,X_2}.\)

First, let’s find the partial derivatives:

$$\frac{\partial Y_1}{\partial X_1} = \frac{(1)X_2 - (0)X_1}{X^2_2}=\frac{1}{X_2} ;\qquad \frac{\partial Y_1}{\partial X_2} = \frac{(0)X_2 - (1)X_1}{X^2_2}=-\frac{X_1}{X^2_2}$$

$$\frac{\partial Y_2}{\partial X_1} = 0 ;\qquad \frac{\partial Y_2}{\partial X_2} = 1$$

We then have \(\det J(x_1,x_2) = \frac{1}{X_2}\), so \(|J(x_1,x_2)|^{-1} = |X_2|\). And from how \(Y_1, Y_2\) are defined, we know that \(X_1 = Y_1 Y_2\) and \(X_2 = Y_2\). This leads us to:

$$g(y_1,y_2) = |y_2|\, f(y_1 y_2,\,y_2)$$

Let’s say this pair of variables had the joint pdf

$$f(x_1,x_2) = x_1 + x_2, \qquad 0\leq x_1 \leq 1,\ 0 \leq x_2 \leq 1$$

If we carry out the transformation as explained, then \(f(y_1y_2,\,y_2) = y_1y_2 + y_2 = y_2(y_1+1)\), and (since \(y_2 \geq 0\) on the support) the pdf transforms into:

$$g(y_1,y_2) = y_2\cdot y_2(y_1+1) = y_2^2(y_1+1), \qquad y_1 \geq 0,\ 0 \leq y_2 \leq 1,\ y_1y_2 \leq 1$$

Mapping \(S_X\) into \(S_Y\) deserves care even in this exercise: the constraint \(0 \leq x_1 \leq 1\) becomes \(0 \leq y_1y_2 \leq 1\), so \(S_Y\) is not a rectangle, and \(y_1 = x_1/x_2\) can be arbitrarily large when \(x_2\) is near zero.
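Because the support is the delicate part, a quick numerical probe is reassuring (a sketch of our own: the points are scattered uniformly over \(S_X\) merely to trace out \(S_Y\); they are not draws from \(f\)):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.uniform(0.0, 1.0, n)  # probe points spread over S_X
x2 = rng.uniform(0.0, 1.0, n)

y1, y2 = x1 / x2, x2

# Every transformed point satisfies the support description
# S_Y = {y1 >= 0, 0 <= y2 <= 1, y1 * y2 <= 1}:
assert np.all(y1 >= 0) and np.all(y2 <= 1) and np.all(y1 * y2 <= 1 + 1e-12)

# ...and y1 is unbounded: small x2 values give arbitrarily large ratios.
print(y1.max())
```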

Determine the distribution of order statistics from a set of independent random variables

Order statistics

Order statistics are the observations of the random sample, arranged in magnitude from the smallest to the largest. In recent years, the importance of order statistics has increased because of the more frequent use of nonparametric inference and robust procedures. However, order statistics have always been prominent because, among other things, they are needed to determine rather simple statistics such as the sample median, the sample range, and the empirical cdf.

For study purposes, we will assume that the \(n\) independent observations come from a continuous-type distribution. This means, among other things, that the probability of any two observations being equal is zero. That is, the probability is \(1\) that the observations can be ordered from smallest to largest without ties. Of course, in practice we do frequently observe ties; but if the probability of a tie is small, the distribution theory that follows will hold approximately.

Transformation of order statistics

If \(X_1,X_2,\cdots,X_n\) are observations of a random sample of size \(n\) from a continuous-type distribution, we let the random variables

$$ Y_1 < Y_2 < \cdots < Y_n $$

denote the order statistics of that sample. That is,

\begin{align*} Y_1 & = \text{smallest of } X_1,X_2,\cdots,X_n\\ Y_2 &= \text{second smallest of } X_1,X_2,\cdots,X_n\\ & \vdots\\ Y_n & =\text{largest of } X_1,X_2,\cdots,X_n \end{align*}

The joint density function of the order statistics is obtained by noting that the order statistics \(Y_1,\cdots, Y_n\) take the values \(y_1 \leq y_2 \leq \cdots \leq y_n\) if and only if, for some permutation \((i_1,i_2,\cdots,i_n)\) of \((1,2,\cdots,n)\),

$$X_{i_1} = y_1,\ X_{i_2}=y_2,\ \cdots,\ X_{i_n} = y_n.$$

Since the joint pdf of \(X_1,\cdots,X_n\) is \(f(x_1)\cdots f(x_n)\) and there are \(n!\) such permutations, the joint pdf of the order statistics is \(g(y_1,\cdots,y_n) = n!\,f(y_1)\cdots f(y_n)\) for \(y_1 \leq \cdots \leq y_n\).

There is a very simple procedure for determining the cdf of the rth order statistic, \(Y_r\); we will develop it after a couple of examples.

From the order statistics one can compute the sample median, the sample range, and other such statistics. Let’s see one simple example:

Example 1: An experiment yielded \(n = 5\) data points, \(x_1 = 0.34, x_2 = 0.54, x_3 = 0.43, x_4 = 0.67, x_5 = 0.14\), drawn from a distribution with pdf \(f(x) = 2x,\ 0 < x < 1\). The order statistics are the sorted observations:

$$y_1 = 0.14 < y_2 = 0.34 < y_3 = 0.43 < y_4 = 0.54 < y_5 = 0.67.$$

Note that \(y_3 = 0.43\) is the middle order statistic, and it equals the sample median. Also, \(y_5 - y_1 = 0.67 - 0.14 = 0.53\) is the value of the sample range.
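In code this is nothing more than sorting; a minimal numpy sketch with the values above:

```python
import numpy as np

x = np.array([0.34, 0.54, 0.43, 0.67, 0.14])
y = np.sort(x)                 # order statistics y_1 <= ... <= y_5

print(y[2])                    # sample median y_3 = 0.43
print(round(y[-1] - y[0], 2))  # sample range y_5 - y_1 = 0.53
```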

Example 2: Let’s take the same distribution as in the previous exercise, but now treat \(Y_1 < \cdots < Y_5\) as random rather than observed, and consider the event \(\{Y_3 \leq 1/3\}\). Because the order statistics are ordered, this happens exactly when at least three of the five observations are at most \(1/3\), so the event can be treated as a binomial experiment. On a single trial,

$$P(X_i \leq 1/3) = \int_{0}^{1/3}2x\,dx = \bigg(\frac{1}{3}\bigg)^{2} = \frac{1}{9}. $$

Then,

$$ P(Y_3 \leq 1/3) = \binom{5}{3}\bigg(\frac{1}{9}\bigg)^3\bigg(\frac{8}{9}\bigg)^2 + \binom{5}{4}\bigg(\frac{1}{9}\bigg)^4\bigg(\frac{8}{9}\bigg) + \bigg(\frac{1}{9}\bigg)^5 \approx 0.0115.$$
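This binomial-tail sum is easy to verify with a few lines of Python (a minimal sketch using only the standard library):

```python
from math import comb

p = 1 / 9  # P(X_i <= 1/3) under f(x) = 2x on (0, 1)
n, r = 5, 3

# P(Y_3 <= 1/3) = P(at least 3 of the 5 observations fall at or below 1/3)
prob = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))
print(round(prob, 4))  # 0.0115
```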

If we now replace the fixed cutoff \(1/3\) with an arbitrary value, we reach an interesting result:

Example 3: For an arbitrary cutoff \(y\), the same binomial argument gives

$$G(y) = P(Y_3 \leq y) = \binom{5}{3}\big(y^2\big)^3\big(1-y^2\big)^2 + \binom{5}{4}\big(y^2\big)^4\big(1-y^2\big) + \big(y^2\big)^5$$

where \(y^2\) is nothing more than the integral from Example 2 computed over the region \([0,y]\), that is, \(F(y)\). Differentiating \(G(y)\) for \(0 < y < 1\),

\begin{align*} g(y) = G'(y) & = \binom{5}{3}3\big(y^2\big)^2(2y)\big(1-y^2\big)^2+\binom{5}{3}\big(y^2\big)^3(2)\big(1-y^2\big)(-2y) \\ & \quad + \binom{5}{4}4\big(y^2\big)^3(2y)\big(1-y^2\big) + \binom{5}{4}\big(y^2\big)^4(-2y)+ 5\big(y^2\big)^4(2y)\\ & = 6y\binom{5}{3}\big(y^2\big)^2\big(1-y^2\big)^2 - 4y\binom{5}{3}\big(y^2\big)^3\big(1-y^2\big) \\ & \quad + 8y \binom{5}{4}\big(y^2\big)^3\big(1-y^2\big) - 2y\binom{5}{4}\big(y^2\big)^4 + 10y \big(y^2\big)^4 \end{align*}

Since \(F(y) = y^2\) and \(f(y) = 2y\) when \(0 < y < 1\), we can rewrite this as:

\begin{align*} g(y) & = \binom{5}{3}3f(y)[F(y)]^2[1-F(y)]^2 -\binom{5}{3}2f(y)[F(y)]^3[1-F(y)] \\ & \quad + \binom{5}{4}4f(y)[F(y)]^3[1-F(y)]-\binom{5}{4}f(y)[F(y)]^4 + 5f(y)[F(y)]^4, \qquad 0 < y < 1 \end{align*}
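One can also let a computer algebra system do the differentiation (a minimal sympy sketch of our own; the comparison line anticipates the general formula derived in the next part):

```python
import sympy as sp

y = sp.symbols("y", positive=True)
F = y**2           # cdf F(y) = y^2 of f(x) = 2x on (0, 1)
f = sp.diff(F, y)  # f(y) = 2y

# G(y) = P(Y_3 <= y) as the binomial sum from Example 3.
G = sum(sp.binomial(5, k) * F**k * (1 - F) ** (5 - k) for k in range(3, 6))
g = sp.expand(sp.diff(G, y))

# General order-statistic formula with n = 5, r = 3:
# g_3(y) = 5!/(2! 2!) [F(y)]^2 [1 - F(y)]^2 f(y).
coeff = sp.factorial(5) / (sp.factorial(2) * sp.factorial(2))
g_formula = coeff * F**2 * (1 - F) ** 2 * f

print(sp.simplify(g - g_formula))  # 0: the two expressions agree
print(sp.integrate(g, (y, 0, 1)))  # 1: g is a valid pdf on (0, 1)
```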

We have thus expressed the pdf of \(Y_3\) entirely in terms of \(f\) and \(F\). In this case the transformed expression is more complex than the original function, but it contains a pattern worth extracting.

At the end of this example we used an interesting formula; let’s construct it. Let \(Y_1 < Y_2 < \cdots < Y_n\) be the order statistics of \(n\) independent observations from a distribution of the continuous type with cdf \(F(x)\) and pdf \(F'(x) = f(x)\), where \(0 < F(x) < 1\) for \(a < x < b\) and \(F(a) = 0,\ F(b) = 1.\) (It is possible that \(a=-\infty\) and/or \(b=+\infty\).) The event that the rth order statistic \(Y_r\) is at most \(y\), \(\{Y_r\leq y\}\), can occur if and only if at least \(r\) of the \(n\) observations are less than or equal to \(y\). That is, the probability of “success” on each trial is \(F(y)\), and we must have at least \(r\) successes. Thus,

$$ G_r(y) = P(Y_r\leq y) = \sum_{k=r}^{n}\binom{n}{k}[F(y)]^k[1-F(y)]^{n-k}.$$

Rewriting this, we have

$$G_r(y) = \sum_{k=r}^{n-1}\binom{n}{k}[F(y)]^k[1-F(y)]^{n-k}+[F(y)]^n.$$

Hence, the pdf of \(Y_r\) is

\begin{align*} g_r(y) = G_r'(y) = & \sum_{k=r}^{n-1}\binom{n}{k}(k)[F(y)]^{k-1}f(y)[1-F(y)]^{n-k}\\ & + \sum_{k=r}^{n-1}\binom{n}{k}[F(y)]^k(n-k)[1-F(y)]^{n-k-1}[-f(y)]\\ & + n[F(y)]^{n-1}f(y). \end{align*}

But

$$\binom{n}{k}k=\frac{n!}{(k-1)!(n-k)!}\quad\text{and}\quad\binom{n}{k}(n-k)=\frac{n!}{k!(n-k-1)!},$$

so, replacing these in the pdf of \(Y_r\), we obtain

$$g_r(y)=\frac{n!}{(r-1)!(n-r)!}[F(y)]^{r-1}[1-F(y)]^{n-r}f(y),\quad a < y < b,$$

which is precisely the first term (\(k=r\)) of the first summation in \(g_r(y) = G_r'(y)\) above. The remaining terms sum to zero, because the second term of the first summation (when \(k = r + 1\)) equals the negative of the first term in the second summation (when \(k=r\)), and so on. Finally, the last term of the second summation equals the negative of \(n[F(y)]^{n-1}f(y)\).

We then see that the pdf of the smallest order statistic is

$$g_1(y) = n[1-F(y)]^{n-1}f(y),\quad a < y < b,$$

and the pdf of the largest order statistic is

$$g_n(y) = n[F(y)]^{n-1}f(y),\quad a < y < b.$$
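Both formulas are easy to test by simulation (a sketch of our own, reusing \(f(x) = 2x\), \(F(x) = x^2\) on \((0,1)\); sampling is by inverse transform, \(X = \sqrt{U}\)). Integrating \(g_1\) and \(g_n\) gives the cdfs \(1-[1-F(y)]^n\) and \([F(y)]^n\), which we compare with empirical frequencies:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, t = 5, 200_000, 0.5

# Draw 'trials' samples of size n from f(x) = 2x via inverse transform
# (F(x) = x^2, so X = sqrt(U) with U uniform on (0, 1)).
x = np.sqrt(rng.uniform(size=(trials, n)))
y_min, y_max = x.min(axis=1), x.max(axis=1)

F_t = t**2
print(np.mean(y_min <= t), 1 - (1 - F_t) ** n)  # cdf of Y_1: 1 - [1 - F(t)]^n
print(np.mean(y_max <= t), F_t**n)              # cdf of Y_n: [F(t)]^n
```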

Note: There is another way to construct the expression for the pdf of \(Y_r\). For this, we must recall the multinomial probability and then consider the probability element \(g_r(y)(\Delta y)\) of \(Y_r\). If the length \(\Delta y\) is small, \(g_r(y)(\Delta y)\) represents approximately the probability

$$P(y < Y_r \leq y + \Delta y).$$

Thus, we want the probability, \(g_r(y)(\Delta y)\), that \(r-1\) trials fall below \(y\), that \(n-r\) trials fall above \(y + \Delta y\), and that one trial falls between \(y\) and \(y + \Delta y\). Recall that the probabilities of these events on a single trial are

\begin{align} P(X \leq y) & = F(y),\\ P(X > y + \Delta y) & = 1 – F(y + \Delta y) \approx 1 – F(y),\\ P(y < X \leq y + \Delta y) & \approx f(y)(\Delta y). \end{align}

As a result, the multinomial probability is approximately

$$ g_r(y)(\Delta y) = \frac{n!}{(r-1)!1!(n-r)!}[F(y)]^{r-1}[1-F(y)]^{n-r}[f(y)(\Delta y)].$$

If we divide both sides by \(\Delta y\), the formula for \(g_r(y)\) results.


Learning Outcome

Topic 3.g: Multivariate Random Variables – Determine the distribution of a transformation of jointly distributed random variables. Determine the distribution of order statistics from a set of independent random variables.

