
Determine the distribution of order statistics from a set of independent random variables

01 Sep 2022

Order Statistics

Order statistics are the values of a random sample arranged in ascending order, i.e., from the smallest to the largest. In recent years, the importance of order statistics has increased because of the more frequent use of nonparametric inference and robust procedures. However, order statistics have always been prominent because, among other things, they are needed to determine simple statistics such as the sample median and the sample range.

For the purposes of this discussion, we will assume that a random sample $X_{1}, X_{2}, \ldots, X_{n}$ comprising $n$ independent observations is obtained from a continuous population random variable $X$. Each observation $X_{i}$ has the same distribution as the population random variable $X$. Because $X$ is continuous, the probability of any two observations being equal is zero. That is, with probability 1 the observations can be ordered from the smallest to the largest without any two values being equal. Of course, in practice, we do frequently observe ties, but if the probability of a tie is small, the distribution theory that follows will hold approximately.

Statistics such as the sample median and the sample range are computed directly from the order statistics.

Let us look at a straightforward example:

Example 1: Order Statistics

Suppose an experiment yields $n=5$ observations, $x_{1}=0.45, x_{2}=0.96, x_{3}=0.65, x_{4}=0.76, x_{5}=0.25$, each drawn from a distribution with pdf $f(x)=3 x^{2}, 0<x<1$.

Determine the sample median and sample range.

Solution

The order statistics are:

$$
y_{1}=0.25<y_{2}=0.45<y_{3}=0.65<y_{4}=0.76<y_{5}=0.96
$$

The middle order statistic, $y_{3}=0.65$, is the sample median.

The sample range is $y_{5}-y_{1}=0.96-0.25=0.71$.
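The calculation above can be reproduced in a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the median is taken as the middle order statistic, which assumes an odd sample size.

```python
# Observations from Example 1
x = [0.45, 0.96, 0.65, 0.76, 0.25]

# The order statistics y_1 <= ... <= y_n are the sorted sample values
y = sorted(x)
n = len(y)

# For odd n, the sample median is the middle order statistic
sample_median = y[n // 2]

# The sample range is the largest minus the smallest order statistic
sample_range = y[-1] - y[0]

print(y)
print(sample_median)
print(round(sample_range, 2))
```

For an even sample size, the median would usually be taken as the average of the two middle order statistics instead.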

Now, suppose the values of $Y_{1}, \ldots, Y_{5}$ have not yet been observed, and consider the event $Y_{4}<\frac{1}{3}$. Because the order statistics are ordered, $Y_{1}$, $Y_{2}$ and $Y_{3}$ must then be less than $\frac{1}{3}$ as well, so at least four of the five observations fall below $\frac{1}{3}$. This event can be treated as a binomial experiment with five independent trials, in which the probability of success (the event that $X_{i}<\frac{1}{3}$) is:

$$
P\left(X_{i} \leq \frac{1}{3}\right)=\int_{0}^{\frac{1}{3}} 3 x^{2}\, dx=\frac{1}{27}
$$
We must have at least four successes, so that:

$$
P\left(Y_{4} \leq \frac{1}{3}\right)=\binom{5}{4}\left(\frac{1}{27}\right)^{4}\left(\frac{26}{27}\right)+\left(\frac{1}{27}\right)^{5}=\frac{131}{27^{5}} \approx 0.00000913
$$
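As a quick arithmetic check, the binomial tail probability for at least four successes in five trials with success probability $\frac{1}{27}$ can be evaluated directly. The helper name `prob_at_least` is our own and is used only for illustration.

```python
from math import comb

def prob_at_least(r, n, p):
    """P(at least r successes in n independent trials, each with success probability p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))

# P(X_i <= 1/3) is the integral of 3x^2 from 0 to 1/3, i.e. (1/3)^3 = 1/27
p = (1 / 3) ** 3

# At least 4 of the 5 observations must fall below 1/3
tail = prob_at_least(4, 5, p)
print(tail)  # roughly 9.13e-06, i.e. 131 / 27**5
```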
The same argument gives the cdf of $Y_{4}$ at any point $0<y<1$, which we denote by $G(y)$.

By definition,
$$
G(y)=P\left(Y_{4} \leq y\right)
$$
From the population pdf, each observation satisfies
$$
P\left(X_{i} \leq y\right)=\int_{0}^{y} 3 x^{2}\, dx=\left[x^{3}\right]_{0}^{y}=y^{3}
$$
Then, since at least four of the five observations must be at most $y$,
$$
\begin{aligned}
G(y) &=P\left(Y_{4} \leq y\right) \\
&=\binom{5}{4}\left(y^{3}\right)^{4}\left(1-y^{3}\right)+\left(y^{3}\right)^{5}
\end{aligned}
$$
To find the probability density function $g(y)$ for $0<y<1$, we differentiate the cdf. Writing $G(y)=5 y^{12}-4 y^{15}$,

$$
\begin{aligned}
g(y) &=G^{\prime}(y)=60 y^{11}-60 y^{14} \\
&=\frac{5 !}{3 !\, 1 !}\left[\left(y^{3}\right)^{3}\left(1-y^{3}\right) 3 y^{2}\right]
\end{aligned}
$$
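One way to sanity-check a derived pdf is to compare it with a numerical derivative of the cdf. The sketch below does this for the fourth-smallest of five observations, taking the cdf $5(y^{3})^{4}(1-y^{3})+(y^{3})^{5}$ and the candidate pdf $\frac{5!}{3!\,1!}(y^{3})^{3}(1-y^{3})\,3y^{2}$. The function names and the step size `h` are our own choices.

```python
def cdf_y4(y):
    """cdf of the 4th-smallest of 5 observations when P(X_i <= y) = y**3."""
    u = y**3
    return 5 * u**4 * (1 - u) + u**5

def pdf_y4(y):
    """Candidate pdf: 5!/(3! 1!) * (y^3)^3 * (1 - y^3) * 3y^2."""
    u = y**3
    return 20 * u**3 * (1 - u) * 3 * y**2

def central_difference(func, y, h=1e-6):
    """Symmetric finite-difference approximation to func'(y)."""
    return (func(y + h) - func(y - h)) / (2 * h)

for y in (0.3, 0.6, 0.9):
    # The two columns should agree to several decimal places
    print(y, central_difference(cdf_y4, y), pdf_y4(y))
```

Agreement at a few points does not prove the formula, but a wrong exponent or coefficient would show up immediately as a mismatch.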
From the above example, we can generalize the results of order statistics.

Cumulative Distribution Function and Density Function for the rth Order Statistic

Let $X_{1}, X_{2}, \ldots, X_{n}$ be independent observations forming a random sample of size $n$ from a continuous population $X$ with cdf $F(x)$ and pdf $F^{\prime}(x)=f(x)$ on the support $a<x<b$. Let $Y_{1}<Y_{2}<\cdots<Y_{n}$ denote the order statistics of that sample, i.e., the observations arranged from the smallest to the largest, namely,

$Y_{1}=$ smallest of $X_{1}, X_{2}, \ldots, X_{n}$

$Y_{2}=$ second smallest of $X_{1}, X_{2}, \ldots, X_{n}$
$\vdots $
$Y_{n}=$ largest of $X_{1}, X_{2}, \ldots, X_{n}$

There is a very simple procedure for determining the cdf of the $r^{t h}$ order statistic, $Y_{r}$, and it relies mainly on the binomial distribution.

The event that the $r^{t h}$ order statistic $Y_{r}$ is at most $y$, $Y_{r} \leq y$, can occur if and only if at least $r$ of the $n$ independent observations are less than or equal to $y$. That is, treating each observation as a trial, the probability of “success” on each trial is $F(y)$, and we must have at least $r$ successes. Thus, using the binomial distribution with probability of success $p=F(y)$, the cdf of $Y_{r}$ is given by,

$$
\text{G}_{\text{r}}(\text{y})=\text{P}\left(Y_{r} \leq \text{y}\right)=\sum_{\text{k=r}}^{\text{n}}\left(\begin{array}{l}
\text{n} \\
\text{k}
\end{array}\right)[\text{F(y)}]^{\text{k}}[1-\text{F(y)}]^{\text{n}-\text{k}}
$$
Rewriting this, we have,
$$
G_{r}(y)=\sum_{k=r}^{n-1}\binom{n}{k}[F(y)]^{k}[1-F(y)]^{n-k}+[F(y)]^{n}
$$
Hence, the pdf of $Y_{r}$ is
$$
\begin{aligned}
g_{r}(y)=G_{r}^{\prime}(y) &=\sum_{k=r}^{n-1}\binom{n}{k} k\,[F(y)]^{k-1} f(y)\,[1-F(y)]^{n-k} \\
&+\sum_{k=r}^{n-1}\binom{n}{k}[F(y)]^{k}(n-k)[1-F(y)]^{n-k-1}[-f(y)]+n[F(y)]^{n-1} f(y) \qquad \ldots \text{eqn*}
\end{aligned}
$$
But,
$$
\binom{n}{k} k=\frac{n !}{(k-1) !\,(n-k) !} \quad \text { and } \quad \binom{n}{k}(n-k)=\frac{n !}{k !\,(n-k-1) !}
$$
Then replacing in $\text{eqn*}$ above,
$$
\text{g}_{\text{r}}(\text{y})=\frac{\text{n} !}{(\text{r}-1) !(\text{n}-\text{r}) !}[\text{F(y)}]^{\text{r}-1}[1-\text{F(y)}]^{\text{n}-\text{r}} \text{f(y)}, \quad \text{a}<\text{y}<\text{b}
$$
This is the first term (when $k=r$) of the first summation in $\text{eqn*}$. The remaining terms in $\text{eqn*}$ sum to zero: the second term of the first summation (when $k=r+1$) equals the negative of the first term of the second summation (when $k=r$), and so on. Finally, the last term of the second summation (when $k=n-1$) equals the negative of $n[F(y)]^{n-1} f(y)$.
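The cancellation argument can be checked numerically. The sketch below differentiates the cdf term by term with the product rule (both summations plus the final $n[F(y)]^{n-1}f(y)$ term) and compares the total with the closed-form pdf. Here `F` and `f` are plain numbers standing in for $F(y)$ and $f(y)$ at a fixed $y$; the function names and test values are illustrative assumptions.

```python
from math import comb, factorial

def pdf_term_by_term(r, n, F, f):
    """Sum every term obtained by differentiating the cdf of Y_r with the product rule."""
    first = sum(comb(n, k) * k * F ** (k - 1) * f * (1 - F) ** (n - k)
                for k in range(r, n))
    second = sum(comb(n, k) * F**k * (n - k) * (1 - F) ** (n - k - 1) * (-f)
                 for k in range(r, n))
    return first + second + n * F ** (n - 1) * f

def pdf_closed_form(r, n, F, f):
    """n! / ((r-1)! (n-r)!) * F^(r-1) * (1-F)^(n-r) * f."""
    coeff = factorial(n) // (factorial(r - 1) * factorial(n - r))
    return coeff * F ** (r - 1) * (1 - F) ** (n - r) * f

# Example: r = 3, n = 7, with F(y) = 0.4 and f(y) = 1.3 at some fixed y
print(pdf_term_by_term(3, 7, 0.4, 1.3))
print(pdf_closed_form(3, 7, 0.4, 1.3))
```

The two printed values should agree up to floating-point rounding, which reflects the telescoping of all terms other than the first.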

So, in summary,

1. The cdf of $Y_{r}$ is given by,
$$\text{G}_{r}(\text{y})=\text{P}\left(\text{Y}_{\text{r}} \leq \text{y}\right)=\sum_{\text{k}=\text{r}}^{\text{n}}\left(\begin{array}{l}
\text{n} \\
\text{k}
\end{array}\right)[\text{F(y)}]^{\text{k}}[1-\text{F(y)}]^{\text{n}-\text{k}}
$$

2. The pdf of $Y_{r}$ is given by,

$$
\text{g}_{\text{r}}(\text{y})=\frac{\text{n} !}{(\text{r}-1) !(\text{n}-\text{r}) !}[\text{F(y)}]^{\text{r}-1}[1-\text{F(y)}]^{\text{n}-\text{r}}\text{ f(y)}
$$
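The cdf formula in the summary can also be compared with a simulation. The sketch below assumes the population from Example 1, with cdf $F(x)=x^{3}$ on $0<x<1$, and draws observations by inverse transform ($X=U^{1/3}$ for uniform $U$). The function names, the seed, and the number of trials are arbitrary choices made for illustration.

```python
import random
from math import comb

def order_stat_cdf(y, r, n, F):
    """P(Y_r <= y): at least r of n observations are <= y, each with probability F(y)."""
    p = F(y)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))

def simulate_cdf(y, r, n, draw, trials=20000, seed=0):
    """Estimate P(Y_r <= y) by repeatedly sampling n observations and sorting them."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(draw(rng) for _ in range(n))
        if sample[r - 1] <= y:  # sample[r - 1] is the r-th smallest value
            hits += 1
    return hits / trials

F = lambda x: x**3                        # population cdf assumed from Example 1
draw = lambda rng: rng.random() ** (1 / 3)  # inverse-transform draw from that population

exact = order_stat_cdf(0.8, 3, 5, F)
approx = simulate_cdf(0.8, 3, 5, draw)
print(exact, approx)
```

The simulated value should lie close to the exact one, with the gap shrinking as the number of trials grows.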
Recall that for order statistics $Y_{1}, Y_{2}, \ldots, Y_{n}$, $Y_{1}$ is the smallest (the minimum) of $X_{1}, X_{2}, \ldots, X_{n}$ and $Y_{n}$ is the largest (the maximum) of $X_{1}, X_{2}, \ldots, X_{n}$, namely,

$$
\text{Y}_{1}=\min \left(\text{X}_{1}, \text{X}_{2}, \ldots, \text{X}_{\text{n}}\right)
$$
and,
$$
\text{Y}_{\text{n}}=\max \left(\text{X}_{1}, \text{X}_{2}, \ldots, \text{X}_{\text{n}}\right)
$$
Setting $r=1$ in the pdf of $Y_{r}$ shows that the pdf of the smallest (minimum) order statistic is
$$
g_{1}(y)=n[1-F(y)]^{n-1} f(y), \quad a<y<b,
$$
and setting $r=n$ shows that the pdf of the largest (maximum) order statistic is
$$
g_{n}(y)=n[F(y)]^{n-1} f(y), \quad a<y<b
$$
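To see these two densities in action, the sketch below evaluates them for a Uniform$(0,1)$ population, which is our own choice of example and not part of the reading. For that population the minimum and maximum of $n$ observations have means $\frac{1}{n+1}$ and $\frac{n}{n+1}$, so a simple midpoint-rule integration should recover those values. All names are illustrative.

```python
def pdf_min(y, n, F, f):
    """pdf of the smallest order statistic: n [1 - F(y)]^(n-1) f(y)."""
    return n * (1 - F(y)) ** (n - 1) * f(y)

def pdf_max(y, n, F, f):
    """pdf of the largest order statistic: n [F(y)]^(n-1) f(y)."""
    return n * F(y) ** (n - 1) * f(y)

def midpoint_integral(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

n = 5
F = lambda y: y      # Uniform(0, 1) cdf (assumed population)
f = lambda y: 1.0    # Uniform(0, 1) pdf

total_min = midpoint_integral(lambda y: pdf_min(y, n, F, f), 0, 1)
mean_min = midpoint_integral(lambda y: y * pdf_min(y, n, F, f), 0, 1)
mean_max = midpoint_integral(lambda y: y * pdf_max(y, n, F, f), 0, 1)

print(total_min)  # a density should integrate to about 1
print(mean_min)   # about 1 / (n + 1)
print(mean_max)   # about n / (n + 1)
```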

Example 2: Order Statistics

Two machines in the manufacturing industry each have an operating life (in years) $Y$ with a pdf given by

$$
\text{f(y)}= \begin{cases}\frac{1}{200} e^{-\frac{\text{y}}{200}}, \quad\text{y}>0 \\ 0, \quad \quad\quad \quad \text { otherwise }\end{cases}
$$

The machines operate independently, but if one machine breaks down, the manufacturing process must be stopped.

Find the pdf of $X$, the length of time for which the manufacturing process runs.

Solution

Since manufacturing stops as soon as either machine fails, $X$ must be:
$$
\text{X}=\min \left(\text{Y}_{1},\text{Y}_{2}\right)
$$
where $Y_{1}$ and $Y_{2}$ are independent random variables, each with the pdf given above.

We know that,
$$
\text{g}_{\text{X}}(\text{y})=\text{n}[1-\text{F(y)}]^{\text{n}-1} \text{f(y)}, \text{a}<\text{y}<\text{b};
$$
Now,
$$
\text{F(y)}=\int_{0}^{\text{y}} \frac{1}{200}\text{e}^{-\frac{\text{t}}{200}}\text{dt}=-\text{e}^{-\frac{\text{y}}{200}}+1
$$
Thus,

$$
g_{X}(y)=2\left[1-\left(-e^{-\frac{y}{200}}+1\right)\right]^{2-1} \cdot \frac{1}{200} e^{-\frac{y}{200}}=2 e^{-\frac{y}{200}} \cdot \frac{1}{200} e^{-\frac{y}{200}}=\frac{1}{100} e^{-\frac{y}{100}}
$$
More precisely,

$$
g_{\text{X}}(\text{y})= \begin{cases}\frac{1}{100} e^{-\frac{y}{100}}, \quad \text{y}>0 \\ 0, \quad \quad \quad\text { otherwise }\end{cases}
$$
Note that the mean life of each machine is 200 years, while the mean length of the manufacturing process is only 100 years.
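The result of Example 2 can be checked numerically by evaluating the minimum-order-statistic formula with $n=2$ and comparing it with an exponential pdf of mean 100. This is a minimal sketch, and the function and constant names are our own.

```python
from math import exp

MEAN_LIFE = 200.0  # mean operating life of each machine, in years

def f(y):
    """pdf of a single machine's life: (1/200) e^(-y/200) for y > 0."""
    return exp(-y / MEAN_LIFE) / MEAN_LIFE

def F(y):
    """cdf of a single machine's life: 1 - e^(-y/200)."""
    return 1 - exp(-y / MEAN_LIFE)

def pdf_min_of_two(y):
    """n [1 - F(y)]^(n-1) f(y) with n = 2."""
    return 2 * (1 - F(y)) * f(y)

def exponential_pdf(y, mean):
    """pdf of an exponential distribution with the given mean."""
    return exp(-y / mean) / mean

for y in (1.0, 50.0, 200.0, 500.0):
    # Both columns should match: the minimum is exponential with mean 100
    print(y, pdf_min_of_two(y), exponential_pdf(y, 100.0))
```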

Learning Outcome

Topic 3.f: Multivariate Random Variables - Determine the distribution of order statistics from a set of independent random variables.

Offered by AnalystPrep
