Order statistics are the observations of a random sample arranged in ascending order, i.e., from the smallest to the largest. In recent years, the importance of order statistics has increased because nonparametric inference and robust procedures are used more often. However, order statistics have always been prominent because, among other things, they are used to compute simple statistics such as the sample median and the sample range.
For our purposes, we will assume that a random sample \(X_{1}, X_{2}, \ldots, X_{n}\) of \(n\) independent observations is drawn from a continuous population random variable \(X\). Note that each observation \(X_{i}\) has the same distribution as the population random variable \(X\). Because the population is continuous, the probability that any two observations are equal is zero. That is, with probability 1 the observations can be ordered from the smallest to the largest without ties. In practice, of course, we do frequently observe ties, but if the probability of a tie is small, the distribution theory that follows holds approximately.
From the order statistics, one can compute the sample median, the sample range, and other statistics.
Let us look at a straightforward example:
Suppose an experiment yields \(n=5\) observations, \(x_{1}=0.45, x_{2}=0.96, x_{3}=0.65, x_{4}=0.76, x_{5}=0.25\), from a population with pdf \(f(x)=3 x^{2}, 0<x<1\).
Determine the sample median and the sample range.
The order statistics are:
$$
y_{1}=0.25<y_{2}=0.45<y_{3}=0.65<y_{4}=0.76<y_{5}=0.96
$$
Since \(n=5\) is odd, the middle order statistic, \(y_{3}=0.65\), is the sample median.
The sample range is \(y_{5}-y_{1}=0.96-0.25=0.71\).
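The same computation takes a few lines of Python. This is a minimal sketch using the built-in `sorted`; the variable names are illustrative, and the median indexing shown assumes \(n\) is odd.

```python
# Observed sample from the example (n = 5)
x = [0.45, 0.96, 0.65, 0.76, 0.25]

# Order statistics y_1 < y_2 < ... < y_5: sort the sample in ascending order
y = sorted(x)

n = len(y)
median = y[(n - 1) // 2]       # middle order statistic y_3 (valid because n is odd)
sample_range = y[-1] - y[0]    # y_5 - y_1

print(y)                       # [0.25, 0.45, 0.65, 0.76, 0.96]
print(median)                  # 0.65
print(round(sample_range, 2))  # 0.71
```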
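Now, let \(Y_{1}<Y_{2}<\cdots<Y_{5}\) be the order statistics of a random sample of size 5 from this population, whose values are not yet known, and consider the event \(Y_{4} \leq \frac{1}{3}\). Because the order statistics are ordered, this means that the other three, \(Y_{1}, Y_{2}\), and \(Y_{3}\), must also be at most \(\frac{1}{3}\); equivalently, at least four of the five observations \(X_{i}\) are at most \(\frac{1}{3}\). We can therefore treat each observation as a trial in a binomial experiment, where “success” is the event \(X_{i} \leq \frac{1}{3}\). The probability of success is: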
$$
\text{P}\left(\text{X}_{i} \leq \frac{1}{3}\right)=\int_{0}^{\frac{1}{3}} 3 \text{x}^{2} d x=\frac{1}{27}
$$
Since \(Y_{4} \leq \frac{1}{3}\) requires at least four successes in five trials,
$$
\text{P}\left(\text{Y}_{4} \leq \frac{1}{3}\right)=\binom{5}{4}\left(\frac{1}{27}\right)^{4}\left(\frac{26}{27}\right)+\left(\frac{1}{27}\right)^{5}=\frac{131}{27^{5}} \approx 0.00000913
$$
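The binomial calculation above can be done in exact arithmetic with Python's standard `fractions.Fraction` and `math.comb` (Python 3.8+). This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# P(X_i <= 1/3) = (1/3)^3 for the pdf f(x) = 3x^2 on (0, 1)
p = Fraction(1, 3) ** 3

n, r = 5, 4
# Y_4 <= 1/3 exactly when at least r = 4 of the n = 5 observations are <= 1/3
prob = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))

print(p)            # 1/27
print(prob)         # 131/14348907
print(float(prob))  # approximately 9.13e-06
```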
Using the same reasoning, we can find the cdf of \(Y_{4}\), which we denote by \(G(y)\). By definition,
$$
G(y)=\text{P}\left(Y_{4} \leq y\right)
$$
For each observation, the probability of success is now
$$
\text{P}\left(X_{i} \leq y\right)=\int_{0}^{y} 3 x^{2}\, d x=\left[x^{3}\right]_{0}^{y}=y^{3}, \quad 0<y<1
$$
Then, since \(Y_{4} \leq y\) requires at least four of the five observations to be at most \(y\),
$$
\begin{aligned}
G(y) &=\text{P}\left(Y_{4} \leq y\right) \\
&=\binom{5}{4}\left(y^{3}\right)^{4}\left(1-y^{3}\right)+\left(y^{3}\right)^{5} \\
&=5 y^{12}-4 y^{15}, \quad 0<y<1
\end{aligned}
$$
Now, to find the probability density function \(g(y)\) of \(Y_{4}\) for \(0<y<1\), we simply differentiate the cdf \(G(y)\):
$$
\begin{aligned}
g(y) &=G^{\prime}(y)=60 y^{11}-60 y^{14} \\
&=\frac{5 !}{3 !\, 1 !}\left(y^{3}\right)^{3}\left(1-y^{3}\right) 3 y^{2}, \quad 0<y<1
\end{aligned}
$$
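To sanity-check the cdf of the fourth order statistic \(Y_{4}\), here is a minimal Monte Carlo sketch in Python. It assumes inverse-transform sampling: since the population cdf is \(x^{3}\) on \((0,1)\), \(X=U^{1/3}\) with \(U\) uniform on \((0,1)\) has pdf \(3x^{2}\). The helper names `cdf_y4`, `pdf_y4`, and `sample_x`, the seed, and the cutoff 0.8 are illustrative assumptions. `cdf_y4` is the closed form \(5y^{12}-4y^{15}\) of \(\binom{5}{4}(y^{3})^{4}(1-y^{3})+(y^{3})^{5}\), and `pdf_y4` is its derivative \(60y^{11}(1-y^{3})\).

```python
import random

def cdf_y4(y):
    """cdf of Y_4 for n = 5 draws from f(x) = 3x^2: 5y^12 - 4y^15 on (0, 1)."""
    return 5 * y**12 - 4 * y**15

def pdf_y4(y):
    """pdf of Y_4 on (0, 1): 60 y^11 (1 - y^3), the derivative of cdf_y4."""
    return 60 * y**11 * (1 - y**3)

def sample_x(rng):
    """Draw X with cdf x^3 by inverse transform: X = U^(1/3)."""
    return rng.random() ** (1 / 3)

rng = random.Random(2024)  # fixed seed so the run is reproducible
trials = 100_000
y0 = 0.8
hits = 0
for _ in range(trials):
    ys = sorted(sample_x(rng) for _ in range(5))
    if ys[3] <= y0:        # ys[3] is Y_4 (Python indexes from 0)
        hits += 1

print(hits / trials, cdf_y4(y0))  # the two values should be close
```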
From the above example, we can generalize these results to the \(r^{\text{th}}\) order statistic of a sample of any size.
Let \(X_{1}, X_{2}, \ldots, X_{n}\) be independent observations of a random sample of size \(n\) from a continuous population \(X\) with cdf \(F(x)\) and pdf \(F^{\prime}(x)=f(x)\). Let \(Y_{1}<Y_{2}<\cdots<Y_{n}\) denote the order statistics of the sample, i.e., the observations arranged from the smallest to the largest, namely,
\(Y_{1}=\) smallest of \(X_{1}, X_{2}, \ldots, X_{n}\)
\(Y_{2}=\) second smallest of \(X_{1}, X_{2}, \ldots, X_{n}\)
\(\vdots \)
\(Y_{n}=\) largest of \(X_{1}, X_{2}, \ldots, X_{n}\)
There is a simple procedure, based on the binomial distribution, for determining the cdf of the \(r^{\text{th}}\) order statistic, \(Y_{r}\).
The event that the \(r^{\text{th}}\) order statistic is at most \(y\), \(Y_{r} \leq y\), occurs if and only if at least \(r\) of the \(n\) independent observations are less than or equal to \(y\). Treating each observation as a trial, with “success” meaning \(X_{i} \leq y\), the probability of success on each trial is \(F(y)\), and we need at least \(r\) successes. Thus, using the binomial distribution with probability of success \(p=F(y)\), the cdf of \(Y_{r}\) is given by
$$
\text{G}_{\text{r}}(\text{y})=\text{P}\left(Y_{r} \leq \text{y}\right)=\sum_{\text{k=r}}^{\text{n}}\left(\begin{array}{l}
\text{n} \\
\text{k}
\end{array}\right)[\text{F(y)}]^{\text{k}}[1-\text{F(y)}]^{\text{n}-\text{k}}
$$
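The binomial sum translates directly into code. This is a minimal Python sketch that assumes the caller supplies the value \(F(y)\) of the population cdf; the function name `order_stat_cdf` is hypothetical. The check at the end reuses the earlier example (\(n=5\), \(r=4\), \(F(y)=y^{3}\)).

```python
from math import comb

def order_stat_cdf(r, n, F_y):
    """P(Y_r <= y) for the r-th order statistic of n i.i.d. continuous draws.

    F_y is the population cdf evaluated at y, i.e. F(y). Y_r <= y happens
    exactly when at least r of the n observations are <= y, so this is a
    binomial upper-tail probability with success probability F(y).
    """
    return sum(comb(n, k) * F_y**k * (1 - F_y) ** (n - k) for k in range(r, n + 1))

# Earlier example: n = 5, r = 4, F(y) = y^3, where the cdf is 5y^12 - 4y^15
y = 0.9
print(order_stat_cdf(4, 5, y**3), 5 * y**12 - 4 * y**15)  # should agree
```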
Separating the \(k=n\) term, we have
$$
G_{r}(y)=\sum_{k=r}^{n-1}\binom{n}{k}[F(y)]^{k}[1-F(y)]^{n-k}+[F(y)]^{n}
$$
Hence, differentiating term by term, the pdf of \(Y_{r}\) is
$$
\begin{aligned}
g_{r}(y)=G_{r}^{\prime}(y) &=\sum_{k=r}^{n-1}\binom{n}{k} k[F(y)]^{k-1} f(y)[1-F(y)]^{n-k} \\
&\quad+\sum_{k=r}^{n-1}\binom{n}{k}[F(y)]^{k}(n-k)[1-F(y)]^{n-k-1}[-f(y)]+n[F(y)]^{n-1} f(y)
\end{aligned}
\tag{eqn*}
$$
But, for \(r \leq k \leq n-1\),
$$
\binom{n}{k} k=\frac{n !}{(k-1) !(n-k) !} \quad \text{and} \quad \binom{n}{k}(n-k)=\frac{n !}{k !(n-k-1) !}
$$
Substituting these identities into \(\text{eqn*}\) above, all but one term cancel, leaving
$$
g_{r}(y)=\frac{n !}{(r-1) !(n-r) !}[F(y)]^{r-1}[1-F(y)]^{n-r} f(y), \quad a<y<b
$$
where \((a, b)\) is the interval on which \(f(y)>0\), i.e., the support of \(X\).
This is exactly the \(k=r\) term of the first summation in \(\text{eqn*}\). The remaining terms sum to zero: the second term of the first summation (when \(k=r+1\)) is the negative of the first term of the second summation (when \(k=r\)), and so on. Finally, the last term of the second summation (when \(k=n-1\)) is the negative of \(n[F(y)]^{n-1} f(y)\).
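The cancellation argument can be checked numerically. The sketch below evaluates eqn* term by term, with the exponent \(n-k-1\) on \([1-F(y)]\) in the second summation (which is what differentiating \([1-F(y)]^{n-k}\) produces), and compares it with the compact formula for \(g_{r}(y)\). The helper names `pdf_eqn_star` and `pdf_compact` are hypothetical. Because the identity is algebraic in \(F(y)\) and \(f(y)\), any values with \(0<F(y)<1\) and \(f(y)>0\) can be plugged in; 0.37 and 1.3 are arbitrary choices.

```python
from math import comb, factorial

def pdf_eqn_star(r, n, F, f):
    """Evaluate eqn* term by term, where F = F(y) and f = f(y)."""
    first = sum(comb(n, k) * k * F ** (k - 1) * f * (1 - F) ** (n - k)
                for k in range(r, n))
    second = sum(comb(n, k) * F**k * (n - k) * (1 - F) ** (n - k - 1) * (-f)
                 for k in range(r, n))
    return first + second + n * F ** (n - 1) * f

def pdf_compact(r, n, F, f):
    """Closed form n!/((r-1)!(n-r)!) F^(r-1) (1-F)^(n-r) f."""
    c = factorial(n) // (factorial(r - 1) * factorial(n - r))
    return c * F ** (r - 1) * (1 - F) ** (n - r) * f

# The two columns should match for every r
for r in range(1, 6):
    print(r, pdf_eqn_star(r, 5, 0.37, 1.3), pdf_compact(r, 5, 0.37, 1.3))
```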
So, in summary,
1. The cdf of \(Y_{r}\) is given by,
$$\text{G}_{r}(\text{y})=\text{P}\left(\text{Y}_{\text{r}} \leq \text{y}\right)=\sum_{\text{k}=\text{r}}^{\text{n}}\left(\begin{array}{l}
\text{n} \\
\text{k}
\end{array}\right)[\text{F(y)}]^{\text{k}}[1-\text{F(y)}]^{\text{n}-\text{k}}
$$
2. The pdf of \(Y_{r}\) is given by,
$$
\text{g}_{\text{r}}(\text{y})=\frac{\text{n} !}{(\text{r}-1) !(\text{n}-\text{r}) !}[\text{F(y)}]^{\text{r}-1}[1-\text{F(y)}]^{\text{n}-\text{r}}\text{ f(y)}
$$
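A minimal Python sketch of the pdf formula, assuming the caller passes the values \(F(y)\) and \(f(y)\); `order_stat_pdf` and `total_probability` are hypothetical names. As a sanity check, it integrates each \(g_{r}\) numerically (midpoint rule) for the population \(f(x)=3x^{2}\) of the earlier example; each result should be close to 1.

```python
from math import factorial

def order_stat_pdf(r, n, F_y, f_y):
    """g_r(y) = n!/((r-1)!(n-r)!) F(y)^(r-1) [1 - F(y)]^(n-r) f(y)."""
    c = factorial(n) // (factorial(r - 1) * factorial(n - r))
    return c * F_y ** (r - 1) * (1 - F_y) ** (n - r) * f_y

def F(x):
    """Population cdf from the earlier example: x^3 on (0, 1)."""
    return x**3

def f(x):
    """Population pdf from the earlier example: 3x^2 on (0, 1)."""
    return 3 * x**2

def total_probability(r, n, m=10_000):
    """Midpoint-rule approximation of the integral of g_r over (0, 1)."""
    h = 1.0 / m
    return h * sum(order_stat_pdf(r, n, F((i + 0.5) * h), f((i + 0.5) * h))
                   for i in range(m))

for r in range(1, 6):
    print(r, round(total_probability(r, 5), 6))  # each should be close to 1
```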
Recall that among the order statistics \(Y_{1}<Y_{2}<\cdots<Y_{n}\), \(Y_{1}\) is the smallest (the minimum) of \(X_{1}, X_{2}, \ldots, X_{n}\) and \(Y_{n}\) is the largest (the maximum) of \(X_{1}, X_{2}, \ldots, X_{n}\), namely,
$$
\text{Y}_{1}=\min \left(\text{X}_{1}, \text{X}_{2}, \ldots, \text{X}_{\text{n}}\right)
$$
and,
$$
\text{Y}_{\text{n}}=\max \left(\text{X}_{1}, \text{X}_{2}, \ldots, \text{X}_{\text{n}}\right)
$$
Setting \(r=1\) in the pdf of \(Y_{r}\), the pdf of the smallest (minimum) order statistic is
$$
\text{g}_{1}(\text{y})=\text{n}[1-\text{F(y)}]^{\text{n}-1} \text{f(y)}, \text{a}<\text{y}<\text{b};
$$
and, setting \(r=n\), the pdf of the largest (maximum) order statistic is
$$
\text{g}_{n}(\text{y})=\text{n}[\text{F(y)}]^{\text{n}-1} \text{f(y)}, \text{a}<\text{y}<\text{b}
$$
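Integrating \(g_{1}\) and \(g_{n}\) gives the cdfs \(1-[1-F(y)]^{n}\) and \([F(y)]^{n}\). The Monte Carlo sketch below checks both for a Uniform(0, 1) population, where \(F(y)=y\); the population, sample size, seed, and cutoffs are arbitrary choices for illustration, and `min_cdf`/`max_cdf` are hypothetical helper names.

```python
import random

def min_cdf(F_y, n):
    """P(Y_1 <= y) = 1 - [1 - F(y)]^n, the integral of g_1."""
    return 1 - (1 - F_y) ** n

def max_cdf(F_y, n):
    """P(Y_n <= y) = [F(y)]^n, the integral of g_n."""
    return F_y ** n

# Uniform(0, 1) population, so F(y) = y
rng = random.Random(7)
n, trials = 4, 100_000
y_lo, y_hi = 0.3, 0.7
min_hits = max_hits = 0
for _ in range(trials):
    xs = [rng.random() for _ in range(n)]
    min_hits += min(xs) <= y_lo  # a True comparison counts as 1
    max_hits += max(xs) <= y_hi

print(min_hits / trials, min_cdf(y_lo, n))  # simulated vs exact
print(max_hits / trials, max_cdf(y_hi, n))  # simulated vs exact
```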
Two machines used in a manufacturing process each have an operating life \(Y\) (in years) with pdf given by
$$
\text{f(y)}= \begin{cases}\frac{1}{200} e^{-\frac{\text{y}}{200}}, \quad\text{y}>0 \\ 0, \quad \quad\quad \quad \text { otherwise }\end{cases}
$$
The machines operate independently, but if one machine breaks down, the manufacturing process must be stopped.
Find the pdf of \(X\), the length of time the manufacturing process runs.
Since manufacturing stops as soon as one machine fails, \(X\) is the smaller of the two operating lives:
$$
\text{X}=\min \left(\text{Y}_{1},\text{Y}_{2}\right)
$$
where \(Y_{1}\) and \(Y_{2}\) are the independent operating lives of the two machines (not order statistics), each with the pdf defined above.
From the pdf of the minimum order statistic with \(n=2\),
$$
g_{X}(y)=2[1-F(y)] f(y), \quad y>0
$$
Now,
$$
\text{F(y)}=\int_{0}^{\text{y}} \frac{1}{200}\text{e}^{-\frac{\text{t}}{200}}\text{dt}=-\text{e}^{-\frac{\text{y}}{200}}+1
$$
Thus,
$$
g_{X}(y)=2\left[1-\left(-e^{-\frac{y}{200}}+1\right)\right] \cdot \frac{1}{200} e^{-\frac{y}{200}}=\frac{1}{100} e^{-\frac{y}{100}}
$$
More precisely,
$$
g_{\text{X}}(\text{y})= \begin{cases}\frac{1}{100} e^{-\frac{y}{100}}, \quad \text{y}>0 \\ 0, \quad \quad \quad\text { otherwise }\end{cases}
$$
Note that the mean life of each machine is 200 years, while the mean length of the manufacturing process is only 100 years: \(X\) is exponential with mean 100.
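A quick Monte Carlo check of this result, as a minimal sketch using Python's standard `random.expovariate` (whose argument is the rate, here \(1/200\)). The seed and number of trials are arbitrary. The simulated mean of \(X\) should be close to 100, and the simulated \(\text{P}(X \leq 50)\) close to \(1-e^{-1/2}\), which follows from \(g_{X}(y)=\frac{1}{100} e^{-\frac{y}{100}}\).

```python
import math
import random

MEAN_LIFE = 200.0        # mean operating life of each machine (years)
rng = random.Random(42)  # fixed seed for reproducibility
trials = 200_000

# X = min(Y_1, Y_2): the process stops at the first machine failure.
# random.expovariate takes the rate, i.e. 1 / mean.
xs = [min(rng.expovariate(1 / MEAN_LIFE), rng.expovariate(1 / MEAN_LIFE))
      for _ in range(trials)]

sim_mean = sum(xs) / trials
t = 50.0
sim_cdf = sum(x <= t for x in xs) / trials

print(sim_mean)                         # should be near 100
print(sim_cdf, 1 - math.exp(-t / 100))  # P(X <= 50): simulated vs exact
```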
Learning Outcome
Topic 3.f, Multivariate Random Variables: Determine the distribution of order statistics from a set of independent random variables.