Determine the distribution of order statistics from a set of independent random variables

Order Statistics

Order statistics are the observations of a random sample arranged in ascending order, i.e., from the smallest to the largest. In recent years, the importance of order statistics has increased because of the more frequent use of nonparametric inference and robust procedures. However, order statistics have always been prominent because, among other things, they are used to determine simple statistics such as the sample median and the sample range.

For the purposes of this reading, we assume that a random sample \(X_1, X_2, \ldots, X_n\) comprising \(n\) independent observations is obtained from a continuous population random variable \(X\). Note that each observation \(X_i\) has the same distribution as the population random variable \(X\). It is also worth noting that, because \(X\) is continuous, the probability of any two observations being equal is zero. That is, with probability 1 the observations can be ordered from the smallest to the largest without having two equal values. Of course, in practice, we do frequently observe ties, but if the probability of a tie is small, the distribution theory that follows will hold approximately.

From the order statistics, one can compute quantities such as the sample median and the sample range.

Let us look at a straightforward example:

Example 1: Order Statistics

An experiment yields \(n=5\) observations, \(x_1=0.45,\ x_2=0.96,\ x_3=0.65,\ x_4=0.76,\ x_5=0.25\), each drawn from a distribution with pdf \(f\left(x\right)=3x^2,\ \ 0\lt x \lt 1\).

Determine the sample median and sample range.

Solution

The order statistics are:

$$ y_1=0.25 \lt y_2=0.45 \lt y_3=0.65 \lt y_4=0.76 \lt y_5=0.96 $$

It is simple enough to note that \(y_3=0.65\) is the middle statistic, and this is equal to the sample median.

The sample range is the difference between the largest and the smallest order statistics, \(y_5-y_1=0.96-0.25=0.71\).
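
The same calculation can be sketched in a few lines of Python. The variable names below are illustrative; the data are the five observations from the example.

```python
# Observations from Example 1, in the order they were recorded.
sample = [0.45, 0.96, 0.65, 0.76, 0.25]

# The order statistics are the sample sorted in ascending order.
order_stats = sorted(sample)

n = len(order_stats)
# For odd n, the sample median is the middle order statistic.
sample_median = order_stats[n // 2]
# The sample range is the largest minus the smallest order statistic.
sample_range = order_stats[-1] - order_stats[0]

print(order_stats)             # [0.25, 0.45, 0.65, 0.76, 0.96]
print(sample_median)           # 0.65
print(round(sample_range, 2))  # 0.71
```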

Now, let \(Y_1 \lt Y_2 \lt \cdots \lt Y_5\) be the order statistics of a sample of size 5 whose values have not yet been observed, and consider the event \(Y_4 \lt \frac{1}{3}\). Because the order statistics are ordered, \(Y_1\), \(Y_2\), and \(Y_3\) must also be less than \(\frac{1}{3}\), so at least four of the five observations fall below \(\frac{1}{3}\). This event can be treated as a binomial experiment in which a success is the event \(X_i \lt \frac{1}{3}\). The probability of success is:

$$ P\left(X_i\le\frac{1}{3}\right)=\int_{0}^{\frac{1}{3}}{3x^2\ dx}=\frac{1}{27} $$

We must have at least four successes, so that:

$$ P\left(Y_4\le\frac{1}{3}\right)=\binom{5}{4}\left(\frac{1}{27}\right)^4\left(\frac{26}{27}\right)+\left(\frac{1}{27}\right)^5=\frac{131}{27^5}\approx 0.00000913 $$
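
As a rough check, the "at least four successes out of five" probability can be evaluated directly. The helper name `prob_at_least` is an assumption made for this sketch, not part of the original text.

```python
from math import comb

def prob_at_least(r, n, p):
    """P(at least r successes in n independent trials, success probability p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

# Success probability for one observation: P(X <= 1/3) = (1/3)^3 = 1/27.
p = (1 / 3) ** 3

# The fourth-smallest value is <= 1/3 exactly when at least four of the
# five observations are <= 1/3.
prob = prob_at_least(4, 5, p)
print(prob)
```

The function sums the binomial terms for \(k=4\) and \(k=5\), which are the two terms in the expression above.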

Now, we can use the same argument to find the cdf of \(Y_4\), which we denote by \(F\left(y\right)\).

We know that,

$$ F\left(y\right)=P\left(Y_4 \lt y\right) $$

And for each individual observation:

$$ P\left(X_i \lt y\right)=\int_{0}^{y}{3x^2\ dx}=\left[x^3\right]_0^y=y^3,\ \ 0\lt y \lt 1 $$

Then, since at least four of the five observations must be less than \(y\),

$$ \begin{align*}
F\left(y\right)&=P\left(Y_4 \lt y\right) \\ & =\binom{5}{4}\left(y^3\right)^4\left(1-y^3\right)+\left(y^3\right)^5 \end{align*} $$

Now, to find the probability density function \(f(y)\) for \(0\lt y \lt 1\), we simply differentiate the cumulative distribution function, \(F\left(y\right)\), i.e.,

$$\begin{align*} f\left(y\right) &=F^\prime\left(y\right) \\ & =\frac{5!}{3!1!}\left[\left(y^3\right)^3\left(1-y^3\right)3y^2\right] \\ & =60y^{11}\left(1-y^3\right) \end{align*} $$
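
A small simulation can be used to sanity-check the binomial expression for the cdf, which counts at least four of the five observations below \(y\) and therefore describes the fourth-smallest value. This is a sketch under stated assumptions: the function names, the seed, the evaluation point \(y=0.9\), and the number of trials are all illustrative choices. Observations are drawn by inverse-transform sampling, since \(U^{1/3}\) has cdf \(x^3\) when \(U\) is uniform on \((0,1)\).

```python
import random

def cdf_fourth_smallest(y):
    """P(at least 4 of 5 observations are < y) when each has cdf x^3 on (0, 1)."""
    p = y ** 3
    return 5 * p**4 * (1 - p) + p**5

def simulate_fourth_smallest(rng):
    # Inverse-transform sampling: U^(1/3) has cdf x^3 on (0, 1).
    draws = sorted(rng.random() ** (1 / 3) for _ in range(5))
    return draws[3]  # index 3 is the 4th smallest of the 5 values

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
y = 0.9

empirical = sum(simulate_fourth_smallest(rng) <= y for _ in range(trials)) / trials
print(empirical, cdf_fourth_smallest(y))
```

The two printed values should agree to about two decimal places; the gap shrinks as the number of trials grows.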

Let \(x_1, x_2, \cdots, x_{n-1}, x_n\) be the observed values of a sample.

We can denote:

$$ m=\text{the } 1^{st} \text{ order statistic}=min(x_1,x_2,\cdots,x_n) $$

$$ \widetilde{m}=\text{the } n^{th}\text{ order statistic}=max(x_1,x_2,\cdots,x_n) $$

Before the sample is observed, the minimum and maximum are random variables. We can let:

$$ M=\text{the random variable for the } 1^{st} \text{ order statistic}=min(X_1,X_2,\cdots,X_n) $$

$$ \widetilde{M}=\text{the random variable for the } n^{th} \text{ order statistic}=max(X_1,X_2,\cdots,X_n) $$

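Now, let’s say we want to find the following probabilities:

\(Pr(M\le 7)\) (CDF of \(M\)): convoluted event

\(Pr{\left(M \gt 183\right)}\) (survival function of \(M\)): not convoluted event

\(Pr{\left(\widetilde{M}\geq 100\right)}\) (survival function of \(\widetilde{M}\)): convoluted event

\(Pr{\left(\widetilde{M} \lt 1000\right)}\) (CDF of \(\widetilde{M}\)): not convoluted event

Here, an event is "not convoluted" when it can be written directly as an intersection of events on the individual observations: \(M \gt x\) holds exactly when every \(X_i \gt x\), and \(\widetilde{M} \lt x\) holds exactly when every \(X_i \lt x\). The "convoluted" events are the complements of these.

So, for probabilities involving \(M\), start with the survival function.

For probabilities involving \(\widetilde{M}\), start with the cumulative distribution function.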

For instance, if we wanted to calculate \(Pr(M\le 7)\), we can rewrite it using the complement:

$$ Pr{\left(M\le 7\right)}=1-Pr{\left(M\gt 7\right)} $$

Likewise,

$$ Pr{\left(\widetilde{M}\geq 100\right)}=1-Pr{\left(\widetilde{M}\lt 100\right)} $$

Now, let us assume we have two random variables only:

Order statistic: \(X_1,X_2\)

Suppose that \(X_1\) and \(X_2\) are independent

Suppose further that \(X_1\) and \(X_2\) are identically distributed to a common distribution, \(X\).

$$ M=min(X_1, X_2) $$

$$ Pr{\left(M \gt x\right)}=Pr(X_1 \gt x\cap X_2 \gt x) $$

We can let \(E_1=X_1 \gt x\) and \(E_2=X_2 \gt x\), so that we have:

$$ \begin{align*} Pr{\left(M \gt x\right)} & =\Pr{\left(E_1\cap E_2\right)} \\
& =\Pr{\left(E_1\right)}\cdot Pr{\left(E_2\right)} \text{ if } E_1 \text{ and } E_2 \text{ are independent} \\
\Rightarrow Pr{\left(M \gt x\right)} & =Pr(X_1 \gt x\cap X_2 \gt x) \\ & =Pr{\left(X_1 \gt x\right)}\cdot Pr(X_2 \gt x)
\end{align*} $$

Now, if we assume further that \(X_1\) and \(X_2\) are identically distributed with a common distribution, \(X\), i.e., \(Pr{\left(X_1 \gt x\right)}=Pr{\left(X_2 \gt x\right)}=Pr(X \gt x)\),

$$ \begin{align*} \Rightarrow Pr{\left(M \gt x\right)} & = Pr(X_1 \gt x\cap X_2 \gt x) \\ & =Pr{\left(X_1 \gt x\right)}\cdot Pr(X_2 \gt x) \\ & =[Pr(X \gt x)]^2 \end{align*} $$

Similarly, for the maximum,

$$ \widetilde{M}=max(X_1,X_2) $$

$$ \begin{align*} Pr{\left(\widetilde{M}\le x\right)} & =Pr(X_1\le x\cap X_2\le x) \\ & =Pr{\left(X_1\le x\right)}\cdot Pr(X_2\le x) \\ & =[Pr(X\leq x)]^2 \end{align*} $$

Note that we can extend this to more than two variables.
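
For a sample of \(n\) independent and identically distributed observations, the same two steps (independence, then identical distribution) give:

$$ \begin{align*} Pr{\left(M \gt x\right)} & =Pr(X_1 \gt x\cap X_2 \gt x\cap\cdots\cap X_n \gt x)=[Pr(X \gt x)]^n \\ Pr{\left(\widetilde{M}\le x\right)} & =Pr(X_1\le x\cap X_2\le x\cap\cdots\cap X_n\le x)=[Pr(X\le x)]^n \end{align*} $$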

Example 2

A random sample of 10 observations from an exponential distribution with mean of 1000 is collected. Determine the probability that the maximum of the observations is greater than 1200.

Solution

$$ \text{Order statistic: } X_1\ \ \ X_2\ \ \cdots\ \ X_9\ \ X_{10} $$

Since we have a random sample of 10 observations, the \(X_i\), \(i=1,\ldots,10\), are independent and identically distributed (iid) with the same distribution as \(X\sim Exp(\theta=1000)\), where \(\theta\) is the mean.

$$ \widetilde{M}=max(X_1,X_2,\cdots,X_{10}) $$

\(\Pr{\left(\widetilde{M} \gt 1200\right)}\) (convoluted event)

$$ \begin{align*}
Pr{\left(\widetilde{M} \gt 1200\right)} & =1-Pr(\widetilde{M}\le 1200) \\
Pr{\left(\widetilde{M}\le 1200\right)} & =Pr(X_1\le 1200\cap X_2\le 1200\cap \cdots \cap X_{10}\le 1200) \\
& =Pr{\left(X_1\le 1200\right)}\cdot Pr{\left(X_2\le 1200\right)}\cdot \cdots\cdot Pr{\left(X_{10}\le 1200\right)}\ \text{(independence)} \\
& =[Pr(X \leq 1200)]^{10} \ \text{(identically distributed)} \\
& =\left[1-e^{-\frac{1200}{1000}}\right]^{10}=\left[1-e^{-1.2}\right]^{10} \\
\therefore Pr{\left(\widetilde{M}\gt 1200\right)}& =1-\left[1-e^{-1.2}\right]^{10}=0.9722\ldots
\end{align*} $$
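
The arithmetic can be reproduced with a short Python sketch. The helper name `max_survival` is an assumption made for illustration; it applies the "start with the CDF" rule for the maximum of iid observations.

```python
from math import exp

def max_survival(cdf_at_x, n):
    """P(max of n iid observations > x), given the common cdf value F(x)."""
    return 1 - cdf_at_x ** n

mean = 1000  # mean of the exponential distribution
x = 1200     # threshold for the maximum
n = 10       # sample size

# Exponential cdf with the given mean: F(x) = 1 - exp(-x / mean).
cdf_at_x = 1 - exp(-x / mean)

print(round(max_survival(cdf_at_x, n), 4))  # 0.9722
```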

Cumulative Distribution Function and Density Function for the rth Order Statistic

Let \(X_1,X_2, \ldots, X_n\) be independent observations forming a random sample of size \(n\) from a continuous population \(X\) with cdf \(F(x)\) and pdf \(F^\prime\left(x\right)=f\left(x\right)\). Let \(Y_1 \lt Y_2 \lt \ldots \lt Y_n\) denote the order statistics of that sample, i.e., the observations arranged from the smallest to the largest, namely,

$$ \begin{align*}
Y_1 &= \text{smallest of } X_1, X_2,\ldots, X_n \\
Y_2 &= \text{second smallest of } X_1, X_2,\ldots, X_n \\
&\ \ \vdots \\
Y_n &= \text{largest of } X_1, X_2,\ldots, X_n \end{align*} $$

There is a very simple procedure for determining the cdf of the \(r^{th}\) order statistic, \(Y_r\), which relies on the binomial distribution.

The event that the \(r^{th}\) order statistic \(Y_r\) is at most \(y\), i.e., \(Y_r\le y\), occurs if and only if at least \(r\) of the \(n\) independent observations are less than or equal to \(y\). Treat each observation as a trial in which "success" means the observation is at most \(y\). The probability of success on each trial is \(F(y)\), and we must have at least \(r\) successes. Thus, using the binomial distribution with probability of success \(p=F(y)\), the cdf of \(Y_r\) is given by,

$$ G_r\left(y\right)=P\left(Y_r\le y\right)=\sum_{k=r}^{n}{\binom{n}{k}\left[F\left(y\right)\right]^k\left[1-F\left(y\right)\right]^{n-k}} $$

Rewriting this, we have,

$$ G_r\left(y\right)=\sum_{k=r}^{n-1}{\binom{n}{k}\left[F\left(y\right)\right]^k\left[1-F\left(y\right)\right]^{n-k}+\left[F\left(y\right)\right]^n} $$

Hence, the pdf of \(Y_r\) is

$$ \begin{align*} g_r\left(y\right)=G_r^\prime\left(y\right)&=\sum_{k=r}^{n-1}{\binom{n}{k}\left(k\right)\left[F\left(y\right)\right]^{k-1}f\left(y\right)\left[1-F\left(y\right)\right]^{n-k}} \\ & +\sum_{k=r}^{n-1}{\binom{n}{k}\left[F\left(y\right)\right]^k\left(n-k\right)\left[1-F\left(y\right)\right]^{n-k-1}\left[-f\left(y\right)\right]} \\ & +n\left[F\left(y\right)\right]^{n-1}f\left(y\right)\ \ \ldots\ldots eqn\ast \end{align*} $$

But,

$$ \binom{n}{k}k=\frac{n!}{\left(k-1\right)!\left(n-k\right)!}\ \text{ and } \ \binom{n}{k}\left(n-k\right)=\frac{n!}{k!\left(n-k-1\right)!} $$

Then replacing in \(eqn\ast\) above,

$$ g_r\left(y\right)=\frac{n!}{\left(r-1\right)!\left(n-r\right)!}\left[F\left(y\right)\right]^{r-1}\left[1-F\left(y\right)\right]^{n-r}f\left(y\right),\ \ \ \ a \lt y \lt b $$

This is actually the first term of the first summation in \(eqn\ast\). On the other hand, the remaining terms in \(eqn\ast\) sum to zero because the second term of the first summation (when \(k=r+1\)) equals the negative of the first term in the second summation (when \(k=r\)), and so on. Finally, the last term of the second summation equals the negative of \(n\left[F\left(y\right)\right]^{n-1}f\left(y\right)\).
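
Writing out the general terms makes the cancellation explicit. For \(k=r,\ldots,n-2\), the \((k+1)^{th}\) term of the first summation and the \(k^{th}\) term of the second summation are

$$ \begin{align*} \binom{n}{k+1}\left(k+1\right)\left[F\left(y\right)\right]^{k}\left[1-F\left(y\right)\right]^{n-k-1}f\left(y\right) & =\frac{n!}{k!\left(n-k-1\right)!}\left[F\left(y\right)\right]^{k}\left[1-F\left(y\right)\right]^{n-k-1}f\left(y\right) \\ -\binom{n}{k}\left(n-k\right)\left[F\left(y\right)\right]^{k}\left[1-F\left(y\right)\right]^{n-k-1}f\left(y\right) & =-\frac{n!}{k!\left(n-k-1\right)!}\left[F\left(y\right)\right]^{k}\left[1-F\left(y\right)\right]^{n-k-1}f\left(y\right) \end{align*} $$

which sum to zero. For \(k=n-1\), the second summation contributes \(-\binom{n}{n-1}\left(1\right)\left[F\left(y\right)\right]^{n-1}f\left(y\right)=-n\left[F\left(y\right)\right]^{n-1}f\left(y\right)\), which cancels the final term.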

So, in summary,

  1. The cdf of \(Y_r\) is given by,

    $$ G_r\left(y\right)=P\left(Y_r\le y\right)=\sum_{k=r}^{n}{\binom{n}{k}\left[F\left(y\right)\right]^k\left[1-F\left(y\right)\right]^{n-k}} $$

  2. The pdf of \(Y_r\) is given by,

    $$ g_r\left(y\right)=\frac{n!}{\left(r-1\right)!\left(n-r\right)!}\left[F\left(y\right)\right]^{r-1}\left[1-F\left(y\right)\right]^{n-r}f\left(y\right) $$
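
The two summary formulas translate directly into code. The sketch below is illustrative only: the function names are assumptions, and a Uniform(0, 1) population (\(F(y)=y\), \(f(y)=1\)) is chosen purely as a convenient test case. It also compares the pdf with a numerical derivative of the cdf.

```python
from math import comb, factorial

def order_stat_cdf(y, r, n, F):
    """G_r(y) = P(Y_r <= y): at least r of the n observations are <= y."""
    p = F(y)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

def order_stat_pdf(y, r, n, F, f):
    """g_r(y) = n!/((r-1)!(n-r)!) * F(y)^(r-1) * (1 - F(y))^(n-r) * f(y)."""
    coeff = factorial(n) // (factorial(r - 1) * factorial(n - r))
    return coeff * F(y)**(r - 1) * (1 - F(y))**(n - r) * f(y)

# Illustrative population (an assumption for this sketch): Uniform(0, 1).
F = lambda y: y
f = lambda y: 1.0

y, r, n, h = 0.5, 3, 5, 1e-6
# A central-difference derivative of the cdf should match the pdf.
numeric_pdf = (order_stat_cdf(y + h, r, n, F) - order_stat_cdf(y - h, r, n, F)) / (2 * h)

print(order_stat_cdf(y, r, n, F))     # 0.5
print(order_stat_pdf(y, r, n, F, f))  # 1.875
print(numeric_pdf)                    # close to 1.875
```

Passing \(F\) and \(f\) as functions keeps the helpers independent of any particular population distribution.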

Recall that for the order statistics \(Y_1 , Y_2, \ldots, Y_n\), \(Y_1\) is the smallest (the minimum) of \(X_1, X_2,\ldots, X_n\) and \(Y_n\) is the largest (the maximum) of \(X_1, X_2,\ldots, X_n\), namely,

$$ Y_1=min{\left(X_1, X_2,\ldots, X_n\right)} $$

and,

$$ Y_n=max{\left(X_1, X_2,\ldots, X_n\right) } $$

Setting \(r=1\) in the pdf of \(Y_r\) shows that the pdf of the smallest (minimum) order statistic is

$$ g_1\left(y\right)=n\left[1-F\left(y\right)\right]^{n-1}f\left(y\right),\ \ \ a\lt y \lt b, $$

and setting \(r=n\) shows that the pdf of the largest (maximum) order statistic is

$$ g_n\left(y\right)=n\left[F\left(y\right)\right]^{n-1}f\left(y\right),\ \ \ \ a \lt y \lt b. $$

Example 3: Order Statistics

Two machines in a manufacturing plant each have an operating life (in years) \(Y\) with a pdf given by

$$ f(y)=\left\{ \begin{matrix} \frac{1}{200}e^{-\frac{y}{200}}, & y \gt 0 \\ 0, & \text{otherwise} \end{matrix} \right. $$

The machines operate independently, but if one machine breaks down, the manufacturing process must be stopped.

Find the pdf of \(X\), the length of time the manufacturing process runs.

Solution

Since manufacturing stops when one machine fails, then \(X\) must be:

$$ X=min(Y_1,Y_2) $$

Where \(Y_1\) and \(Y_2\) are independent random variables with the given pdf defined above.

We know that,

$$ g_X\left(y\right)=n\left[1-F\left(y\right)\right]^{n-1}f\left(y\right),\ \ \ a \lt y \lt b, $$

Now,

$$ F\left(y\right)=\int_{0}^{y}{\frac{1}{200}e^{-\frac{t}{200}}dt=}-e^{-\frac{y}{200}}+1 $$

Thus,

$$ g_X\left(y\right)=2\left[1-\left(-e^{-\frac{y}{200}}+1\right)\right]^{2-1}\cdot\frac{1}{200}e^{-\frac{y}{200}}=2e^{-\frac{y}{200}}\cdot\frac{1}{200}e^{-\frac{y}{200}}=\frac{1}{100}e^{-\frac{y}{100}} $$

More precisely,

$$ g_X\left(y\right)= \left\{ \begin{matrix} \frac{1}{100}e^{-\frac{y}{100}}, & y \gt 0 \\ 0, & \text{otherwise} \end{matrix} \right. $$

Note that the mean life of each machine is 200 years, while the mean length of the manufacturing process is only 100 years, since \(X\) is exponentially distributed with mean 100.
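
A quick simulation offers a rough check of the mean process length. This is a minimal sketch: the seed and number of trials are arbitrary choices, and Python's `random.expovariate` takes the rate (one over the mean) as its argument.

```python
import random

rng = random.Random(42)  # fixed seed for a repeatable run
rate = 1 / 200           # each machine: exponential life with mean 200
trials = 200_000

# The process stops at the first failure, so its length is min(Y1, Y2).
process_lengths = [
    min(rng.expovariate(rate), rng.expovariate(rate)) for _ in range(trials)
]

mean_length = sum(process_lengths) / trials
print(mean_length)  # should be close to 100
```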

Question

A batch of 100 independent components is tested for durability, and their failure times (in hours) are recorded. Assume the failure times are uniformly distributed between 0 and 1000 hours. Which of the following best describes the distribution of the failure time of the component that fails first (the 1st order statistic)?

  A. U(0, 10)
  B. U(0, 100)
  C. U(0, 1000)
  D. Exp(0.1)
  E. Exp(0.01)

Solution

The correct answer is D.

The first order statistic in a sample of independent random variables is the minimum value, so the failure time of the component that fails first is \(Y_{\left(1\right)}\), the minimum of the \(n=100\) failure times. The probability density function of the first order statistic can be found using the following formula:

$$ f_{Y_{\left(1\right)}}\left(y\right)=n\cdot \left[1-F_Y\left(y\right)\right]^{n-1}\cdot f_Y\left(y\right) $$

Where:

  • \(n\) is the sample size,
  • \(F_Y\left(y\right)\) is the cumulative distribution function of \(Y\),
  • \(f_Y\left(y\right)\) is the probability density function of \(Y\).

Given that the components’ failure times are uniformly distributed between 0 and 1000 hours, each component has a failure time distribution \(U(0,1000)\), with a density function:

$$ f_Y\left(y\right)=\frac{1}{1000-0}=\frac{1}{1000},\ \ \ 0\lt y\lt 1000 $$

And a cumulative distribution function:

$$ F_Y\left(y\right)=\frac{\left(y-0\right)}{1000-0}=\frac{y}{1000} $$

Substitute these into the formula for \(f_{Y_{\left(1\right)}}\left(y\right)\):

$$ f_{Y_{\left(1\right)}}\left(y\right)=100\cdot \left[1-\frac{y}{1000}\right]^{99}\cdot \frac{1}{1000}=\frac{1}{10}\left(1-\frac{y}{1000}\right)^{99},\ \ \ 0\lt y\lt 1000 $$

This density is not constant, so the distribution is not uniform, which rules out options A, B, and C. It is not exactly exponential either, but for a sample this large it is very close to one. The survival function of the minimum is

$$ Pr{\left(Y_{\left(1\right)}\gt y\right)}=\left(1-\frac{y}{1000}\right)^{100}\approx e^{-\frac{y}{10}} $$

because \(\left(1-\frac{x}{n}\right)^n\approx e^{-x}\) for large \(n\). This is the survival function of an exponential distribution with rate parameter \(\lambda=\frac{n}{b-a}\).

Hence, of the listed options, the failure time of the component that fails first is best described by an exponential distribution with \(\lambda = \frac{100}{1000} = 0.1\).
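
A quick numerical comparison illustrates how an exponential distribution with rate \(\frac{n}{b-a}\) relates to the minimum of \(n\) iid \(U(a,b)\) observations, whose exact survival function is \(\left(1-\frac{y-a}{b-a}\right)^n\). The function names and the evaluation points below are illustrative assumptions.

```python
from math import exp

n, a, b = 100, 0.0, 1000.0
rate = n / (b - a)  # 0.1, the rate of the approximating exponential

def min_survival_exact(y):
    """P(minimum > y) for n iid Uniform(a, b) observations."""
    return (1 - (y - a) / (b - a)) ** n

def min_survival_exp(y):
    """Survival function of an exponential distribution with the rate above."""
    return exp(-rate * y)

points = (1, 5, 10, 20, 40)
for y in points:
    print(y, round(min_survival_exact(y), 4), round(min_survival_exp(y), 4))

# Largest gap between the two survival functions at the chosen points.
max_gap = max(abs(min_survival_exact(y) - min_survival_exp(y)) for y in points)
print(max_gap)
```

The two columns are close but not identical, which is what one would expect from a large-sample approximation rather than an exact identity.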

Learning Outcome

Topic 3.f: Multivariate Random Variables - Determine the distribution of order statistics from a set of independent random variables.
