Order statistics are the observations of a random sample arranged in ascending order, i.e., from the smallest to the largest; each ordered value has its own distribution. In recent years, the importance of order statistics has increased because of the more frequent use of nonparametric inferences and robust procedures. However, order statistics have always been prominent because, among other things, they are applied in determining rather simple statistics such as the sample median, the sample range, etc.
For the purposes of this discussion, we will assume that a random sample \(X_1, X_2, \ldots, X_n\) comprising \(n\) independent observations is obtained from a continuous population random variable \(X\). Note that each observation \(X_i\) has the same distribution as the population random variable \(X\). Because the population is continuous, the probability of any two observations being equal is zero. That is, with probability 1 the observations can be ordered from the smallest to the largest without having two equal values. Of course, in practice, we do frequently observe ties, but if the probability of a tie is small, the distribution theory that follows will hold approximately.
From the order statistics, one can compute the sample median, the sample range, and other statistics.
Let us look at a straightforward example:
An experiment yields \(n=5\) data points, \(x_1=0.45,\ x_2=0.96,\ x_3=0.65,\ x_4=0.76,\ x_5=0.25\), each drawn from a population with pdf \(f\left(x\right)=3x^2,\ \ 0\lt x \lt 1\).
Determine the sample median and the sample range.
Solution
The order statistics are:
$$ y_1=0.25 \lt y_2=0.45 \lt y_3=0.65 \lt y_4=0.76 \lt y_5=0.96 $$
The middle order statistic, \(y_3=0.65\), is the sample median.
The sample range is the difference between the largest and the smallest order statistics, \(y_5-y_1=0.96-0.25=0.71\).
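The same calculation can be sketched in a few lines of Python. This is only an illustration using the standard library; the variable names are chosen here and are not part of the original example.

```python
from statistics import median

# Observed sample from the example above.
sample = [0.45, 0.96, 0.65, 0.76, 0.25]

# Sorting the observations gives the observed order statistics y_1 <= ... <= y_n.
order_stats = sorted(sample)

sample_median = median(order_stats)              # middle order statistic when n is odd
sample_range = order_stats[-1] - order_stats[0]  # y_n - y_1

print(order_stats)
print(sample_median)
print(round(sample_range, 2))
```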
Now, let \(Y_1 \lt Y_2 \lt \cdots \lt Y_5\) be the order statistics of five observations that have not yet been observed, and consider the event \(Y_4 \le \frac{1}{3}\). Since the order statistics are ordered, \(Y_1\), \(Y_2\), and \(Y_3\) must then also be at most \(\frac{1}{3}\), so the event occurs exactly when at least four of the five observations are at most \(\frac{1}{3}\). This can be treated as a binomial experiment with five independent trials in which the probability of success (the event that \(X_i \le \frac{1}{3}\)) is:
$$ P\left(X_i\le\frac{1}{3}\right)=\int_{0}^{\frac{1}{3}}{3x^2\ dx}=\frac{1}{27} $$
We must have at least four successes, so that:
$$ P\left(Y_4\le\frac{1}{3}\right)=\binom{5}{4}\left(\frac{1}{27}\right)^4\left(\frac{26}{27}\right)+\left(\frac{1}{27}\right)^5=\frac{131}{27^5}\approx 0.00000913 $$
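As a quick check of this binomial calculation, the sketch below evaluates the probability of at least four successes in five trials with exact fractions. The helper name `prob_at_least` is hypothetical and introduced only for this illustration.

```python
from fractions import Fraction
from math import comb

def prob_at_least(r, n, p):
    """P(at least r successes in n independent trials with success probability p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

# P(X_i <= 1/3) = (1/3)^3 for the pdf f(x) = 3x^2 on (0, 1).
p = Fraction(1, 3) ** 3

# At least four of the five observations must fall at or below 1/3.
prob = prob_at_least(4, 5, p)
print(prob, float(prob))
```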
Now, we can use the same reasoning as in the above example to find the cdf of \(Y_4\), which we denote by \(F\left(y\right)\).
We know that,
$$ F\left(y\right)=P\left(Y_4 \le y\right) $$
And from the population pdf:
$$ P\left(X_i \le y\right)=\int_{0}^{y}{3x^2\ dx}=\left[x^3\right]_0^y=y^3 $$
Then,
$$ \begin{align*}
F\left(y\right)&=P\left(Y_4 \le y\right) \\ & =\binom{5}{4}\left(y^3\right)^4\left(1-y^3\right)+\left(y^3\right)^5 \end{align*} $$
Now, to find the probability density function \(f(y)\) of \(Y_4\) for \(0\lt y \lt 1\), we simply differentiate the cumulative distribution function, \(F\left(y\right)\), i.e.,
$$\begin{align*} f\left(y\right) &=F^\prime\left(y\right) \\ & =\frac{5!}{3!1!}\left(y^3\right)^3\left(1-y^3\right)3y^2 \\ & =60y^{11}\left(1-y^3\right) \end{align*} $$
Let \(x_1,\ x_2,\ \cdots,\ x_{n-1},\ x_n\) be an observed sample.
We can denote:
$$ m=\text{the } 1^{st} \text{ order statistic}=min(x_1,x_2,\cdots,x_n) $$
$$ \widetilde{m}=\text{the } n^{th}\text{ order statistic}=max(x_1,x_2,\cdots,x_n) $$
Before the sample is observed, the corresponding random variables are:
$$ M=\text{the random variable for the } 1^{st} \text{ order statistic}=min(X_1,X_2,\cdots,X_n) $$
$$ \widetilde{M}=\text{the random variable for the } n^{th} \text{ order statistic}=max(X_1,X_2,\cdots,X_n) $$
Now, let’s say we want to find the following probabilities:
- \(Pr(M\le 7)\) (CDF of \(M\)): convoluted event
- \(Pr{\left(M \gt 183\right)}\) (survival function of \(M\)): not a convoluted event
- \(Pr{\left(\widetilde{M}\geq 100\right)}\) (survival function of \(\widetilde{M}\)): convoluted event
- \(Pr{\left(\widetilde{M} \lt 1000\right)}\) (CDF of \(\widetilde{M}\)): not a convoluted event
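For probabilities involving \(M\), start with the survival function, since \(M \gt x\) occurs exactly when every observation exceeds \(x\).
For probabilities involving \(\widetilde{M}\), start with the cumulative distribution function, since \(\widetilde{M}\le x\) occurs exactly when every observation is at most \(x\).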
For instance, to calculate \(Pr(M\le 7)\), we can rewrite it in terms of the survival function:
$$ Pr{\left(M\le 7\right)}=1-Pr{\left(M\gt 7\right)} $$
Likewise,
$$ Pr{\left(\widetilde{M}\geq 100\right)}=1-Pr{\left(\widetilde{M}\lt 100\right)} $$
Now, let us assume we have two random variables only:
Sample: \(X_1,X_2\)
Suppose that \(X_1\) and \(X_2\) are independent
Suppose further that \(X_1\) and \(X_2\) are identically distributed to a common distribution, \(X\).
$$ M=min(X_1, X_2) $$
$$ Pr{\left(M \gt x\right)}=Pr(X_1 \gt x\cap X_2 \gt x) $$
We can let \(E_1=\{X_1 \gt x\}\) and \(E_2=\{X_2 \gt x\}\), so that we have:
$$ \begin{align*} Pr{\left(M \gt x\right)} & =\Pr{\left(E_1\cap E_2\right)} \\
& =\Pr{\left(E_1\right)}\cdot Pr{\left(E_2\right)} \text{ if } E_1 \text{ and } E_2 \text{ are independent} \\
\Rightarrow Pr{\left(M \gt x\right)} & =Pr(X_1 \gt x\cap X_2 \gt x) \\ & =Pr{\left(X_1 \gt x\right)}\cdot Pr(X_2 \gt x)
\end{align*} $$
Now, if we assume further that \(X_1\) and \(X_2\) are identically distributed with common distribution \(X\), i.e., \(Pr{\left(X_1 \gt x\right)}=Pr{\left(X_2 \gt x\right)}=Pr(X \gt x)\), then
$$ \begin{align*} Pr{\left(M \gt x\right)} & = Pr(X_1 \gt x\cap X_2 \gt x) \\ & =Pr{\left(X_1 \gt x\right)}\cdot Pr(X_2 \gt x) \\ & =\left[Pr(X \gt x)\right]^2 \end{align*} $$
Similarly, for the maximum,
$$ \widetilde{M}=max(X_1,X_2) $$
$$ \begin{align*} Pr{\left(\widetilde{M}\le x\right)} & =Pr(X_1\le x\cap X_2\le x) \\ & =Pr{\left(X_1\le x\right)}\cdot Pr(X_2\le x) \\ & =\left[Pr(X\leq x)\right]^2 \end{align*} $$
Note that we can extend this to more than two variables.
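As a sketch of that extension, assume \(X_1, X_2, \ldots, X_n\) are independent and identically distributed with common distribution \(X\). The same argument, applied to \(n\) events instead of two, gives:

$$ \begin{align*} Pr{\left(M \gt x\right)} & =Pr(X_1 \gt x\cap X_2 \gt x\cap \cdots \cap X_n \gt x)=\left[Pr(X \gt x)\right]^n \\ Pr{\left(\widetilde{M}\le x\right)} & =Pr(X_1\le x\cap X_2\le x\cap \cdots \cap X_n\le x)=\left[Pr(X\le x)\right]^n \end{align*} $$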
A random sample of 10 observations is collected from an exponential distribution with a mean of 1000. Determine the probability that the maximum of the observations is greater than 1200.
Solution
$$ \text{Sample: } X_1,\ X_2,\ \cdots,\ X_9,\ X_{10} $$
Since we have a random sample of 10 observations, \(\left\{X_i\right\}_{i=1}^{10}\) are iid with the same distribution as \(X\sim Exp(\theta=1000)\).
$$ \widetilde{M}=max(X_1,X_2,\cdots,X_{10}) $$
\(\Pr{\left(\widetilde{M} \gt 1200\right)}\) (convoluted event)
$$ \begin{align*}
Pr{\left(\widetilde{M} \gt 1200\right)} & =1-Pr(\widetilde{M}\le 1200) \\
Pr{\left(\widetilde{M}\le 1200\right)} & =Pr(X_1\le 1200\cap X_2\le 1200\cap \cdots \cap X_{10}\le 1200) \\
& =Pr{\left(X_1\le 1200\right)}\cdot Pr{\left(X_2\le 1200\right)}\cdot \cdots\cdot Pr{\left(X_{10}\le 1200\right)}\ \text{(independence)} \\
& =[Pr(X \leq 1200)]^{10} \ \ \ \text{(identically distributed)} \\
& =[1-e^{-1.2}]^{10} \\
\therefore Pr{\left(\widetilde{M}\gt 1200\right)}& =1-\left[1-e^{-1.2}\right]^{10}=0.9722\ldots
\end{align*} $$
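The result can be checked numerically. The sketch below evaluates the closed form and compares it with a simple Monte Carlo simulation; the function name `prob_max_exceeds`, the seed, and the number of trials are arbitrary choices made for this illustration.

```python
import math
import random

def prob_max_exceeds(t, n, mean):
    """P(max of n iid exponential(mean) observations > t) = 1 - F(t)^n."""
    cdf = 1.0 - math.exp(-t / mean)
    return 1.0 - cdf ** n

exact = prob_max_exceeds(1200, 10, 1000)

# Monte Carlo check: draw many samples of size 10 and count how often
# the sample maximum exceeds 1200.
rng = random.Random(0)   # fixed seed, chosen arbitrarily
trials = 100_000
hits = sum(
    max(rng.expovariate(1 / 1000) for _ in range(10)) > 1200
    for _ in range(trials)
)

print(round(exact, 4), hits / trials)
```

Note that `expovariate` takes the rate, so a mean of 1000 corresponds to a rate of 1/1000. The simulated proportion should land close to the closed-form value, up to sampling noise.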
Let \(X_1,X_2, \ldots, X_n\) be the independent observations of a random sample of size \(n\) from a continuous population \(X\) with cdf \(F(x)\) and pdf \(F^\prime\left(x\right)=f\left(x\right)\). Let \(Y_1 \lt Y_2 \lt \ldots \lt Y_n\) denote the order statistics of that sample, i.e., the observations arranged from the smallest to the largest, namely,
$$ \begin{align*}
Y_1 & = \text{smallest of } X_1, X_2,\ldots, X_n \\
Y_2 & = \text{second smallest of } X_1, X_2,\ldots, X_n \\
& \ \ \vdots \\
Y_n & = \text{largest of } X_1, X_2,\ldots, X_n
\end{align*} $$
There is a simple procedure for determining the cdf of the \(r^{th}\) order statistic, \(Y_r\), which relies on the binomial distribution.
The event that the \(r^{th}\) order statistic \(Y_r\) is at most \(y\), i.e., \(Y_r\le y\), can occur if and only if at least \(r\) of the \(n\) independent observations are less than or equal to \(y\). Treating each observation as a trial, the probability of "success" on each trial is \(F(y)\), and we must have at least \(r\) successes. Thus, using the binomial distribution with probability of success \(p=F(y)\), the cdf of \(Y_r\) is given by,
$$ G_r\left(y\right)=P\left(Y_r\le y\right)=\sum_{k=r}^{n}{\binom{n}{k}\left[F\left(y\right)\right]^k\left[1-F\left(y\right)\right]^{n-k}} $$
Rewriting this, we have,
$$ G_r\left(y\right)=\sum_{k=r}^{n-1}{\binom{n}{k}\left[F\left(y\right)\right]^k\left[1-F\left(y\right)\right]^{n-k}}+\left[F\left(y\right)\right]^n $$
Hence, the pdf of \(Y_r\) is
$$ \begin{align*} g_r\left(y\right)=G_r^\prime\left(y\right)&=\sum_{k=r}^{n-1}{\binom{n}{k}\left(k\right)\left[F\left(y\right)\right]^{k-1}f\left(y\right)\left[1-F\left(y\right)\right]^{n-k}} \\ & \quad +\sum_{k=r}^{n-1}{\binom{n}{k}\left[F\left(y\right)\right]^k\left(n-k\right)\left[1-F\left(y\right)\right]^{n-k-1}\left[-f\left(y\right)\right]}+n\left[F\left(y\right)\right]^{n-1}f\left(y\right) \quad \ldots\ldots eqn\ast \end{align*} $$
But,
$$ \binom{n}{k}k=\frac{n!}{\left(k-1\right)!\left(n-k\right)!}\ \text{ and } \ \binom{n}{k}\left(n-k\right)=\frac{n!}{k!\left(n-k-1\right)!} $$
Then replacing in \(eqn\ast\) above,
$$ g_r\left(y\right)=\frac{n!}{\left(r-1\right)!\left(n-r\right)!}\left[F\left(y\right)\right]^{r-1}\left[1-F\left(y\right)\right]^{n-r}f\left(y\right),\ \ \ \ a \lt y \lt b $$
This is actually the first term of the first summation in \(eqn\ast\). On the other hand, the remaining terms in \(eqn\ast\) sum to zero because the second term of the first summation (when \(k=r+1\)) equals the negative of the first term in the second summation (when \(k=r\)), and so on. Finally, the last term of the second summation equals the negative of \(n\left[F\left(y\right)\right]^{n-1}f\left(y\right)\).
So, in summary,
$$ G_r\left(y\right)=P\left(Y_r\le y\right)=\sum_{k=r}^{n}{\binom{n}{k}\left[F\left(y\right)\right]^k\left[1-F\left(y\right)\right]^{n-k}} $$
$$ g_r\left(y\right)=\frac{n!}{\left(r-1\right)!\left(n-r\right)!}\left[F\left(y\right)\right]^{r-1}\left[1-F\left(y\right)\right]^{n-r}f\left(y\right) $$
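The two summary formulas can be sketched in Python and checked against each other numerically. The function names `order_stat_cdf` and `order_stat_pdf` are hypothetical, and the population \(F(x)=x^3\), \(f(x)=3x^2\) on \((0,1)\) is borrowed from the earlier example purely as a test case.

```python
from math import comb, factorial

def order_stat_cdf(y, r, n, F):
    """G_r(y): probability that at least r of the n observations are <= y."""
    p = F(y)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

def order_stat_pdf(y, r, n, F, f):
    """g_r(y) = n! / ((r-1)! (n-r)!) * F(y)^(r-1) * (1 - F(y))^(n-r) * f(y)."""
    c = factorial(n) // (factorial(r - 1) * factorial(n - r))
    return c * F(y)**(r - 1) * (1 - F(y))**(n - r) * f(y)

# Test population: f(x) = 3x^2 and F(x) = x^3 on (0, 1).
F = lambda x: x**3
f = lambda x: 3 * x**2

# Compare the closed-form pdf with a central-difference derivative of the cdf
# for the 4th order statistic of a sample of size 5.
y, h, r, n = 0.8, 1e-6, 4, 5
numeric = (order_stat_cdf(y + h, r, n, F) - order_stat_cdf(y - h, r, n, F)) / (2 * h)
closed_form = order_stat_pdf(y, r, n, F, f)

print(closed_form, numeric)
```

If the telescoping argument is right, the two printed numbers should agree up to the finite-difference error.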
Recall that for the order statistics \(Y_1, Y_2, \ldots, Y_n\), \(Y_1\) is the smallest (the minimum) of \(X_1, X_2,\ldots, X_n\) and \(Y_n\) is the largest (the maximum) of \(X_1, X_2,\ldots, X_n\), namely,
$$ Y_1=min{\left(X_1, X_2,\ldots, X_n\right)} $$
and,
$$ Y_n=max{\left(X_1, X_2,\ldots, X_n\right) } $$
Setting \(r=1\) in the formula for \(g_r\left(y\right)\), the pdf of the smallest (minimum) order statistic is
$$ g_1\left(y\right)=n\left[1-F\left(y\right)\right]^{n-1}f\left(y\right),\ \ \ a\lt y \lt b, $$
and, setting \(r=n\), the pdf of the largest (maximum) order statistic is
$$ g_n\left(y\right)=n\left[F\left(y\right)\right]^{n-1}f\left(y\right),\ \ \ \ a \lt y \lt b. $$
Two machines in a manufacturing plant each have an operating life (in years) \(Y\) with a pdf given by
$$ f(y)=\left\{ \begin{matrix} \frac{1}{200}e^{-\frac{y}{200}}, & y \gt 0 \\ 0, & \text{otherwise} \end{matrix} \right. $$
The machines operate independently, but if one machine breaks down, the manufacturing process must be stopped.
Find the pdf of \(X\), the length of time the manufacturing process runs.
Solution
Since manufacturing stops when one machine fails, then \(X\) must be:
$$ X=min(Y_1,Y_2) $$
Where \(Y_1\) and \(Y_2\) are independent random variables with the given pdf defined above.
We know that,
$$ g_X\left(y\right)=n\left[1-F\left(y\right)\right]^{n-1}f\left(y\right),\ \ \ a \lt y \lt b, $$
Now, with \(n=2\),
$$ F\left(y\right)=\int_{0}^{y}{\frac{1}{200}e^{-\frac{t}{200}}dt}=1-e^{-\frac{y}{200}} $$
Thus,
$$ g_X\left(y\right)=2\left[1-\left(1-e^{-\frac{y}{200}}\right)\right]^{2-1}\cdot\frac{1}{200}e^{-\frac{y}{200}}=2e^{-\frac{y}{200}}\cdot\frac{1}{200}e^{-\frac{y}{200}}=\frac{1}{100}e^{-\frac{y}{100}} $$
More precisely,
$$ g_X\left(y\right)= \left\{ \begin{matrix} \frac{1}{100}e^{-\frac{y}{100}}, & y \gt 0 \\ 0, & \text{otherwise} \end{matrix} \right. $$
Note that the mean life of each machine is 200 years, while the mean life of the manufacturing process is 100 years.
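A short simulation offers a sanity check on this result. It is a sketch only: the seed and number of trials are arbitrary, and it relies on the derived pdf implying a survival probability of \(e^{-1}\) at 100 years.

```python
import random

rng = random.Random(1)   # arbitrary fixed seed for reproducibility
trials = 200_000

# Each machine's life is exponential with mean 200 (rate 1/200);
# the process stops at the first failure, so X = min(Y_1, Y_2).
process_lives = [
    min(rng.expovariate(1 / 200), rng.expovariate(1 / 200))
    for _ in range(trials)
]

mean_life = sum(process_lives) / trials

# Empirical P(X > 100), to compare with exp(-100/100) from the derived pdf.
surv_100 = sum(x > 100 for x in process_lives) / trials

print(mean_life, surv_100)
```

The simulated mean should be close to 100 and the simulated survival probability close to \(e^{-1}\approx 0.368\), up to sampling noise.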
Question
A batch of 100 independent components is tested for durability, and their failure times (in hours) are recorded. Assume the failure times are uniformly distributed between 0 and 1000 hours. Which of the following best describes the distribution of the failure time of the component that fails first (the 1st order statistic)?
- A. U(0, 10)
- B. U(0, 100)
- C. U(0, 1000)
- D. Exp(0.1)
- E. Exp(0.01)
Solution
The correct answer is D.
The first order statistic of a sample is its minimum value. For \(n\) independent observations with common cdf \(F_Y\) and pdf \(f_Y\), the probability density function of the first order statistic \(Y_{\left(1\right)}\) (the minimum) is:
$$ f_{Y_{\left(1\right)}}\left(y\right)=n\cdot \left[1-F_Y\left(y\right)\right]^{n-1}\cdot f_Y\left(y\right) $$
Where:
- \(n\) is the sample size,
- \(F_Y\left(y\right)\) is the cumulative distribution function of \(Y\),
- \(f_Y\left(y\right)\) is the probability density function of \(Y\).
Given that the components' failure times are uniformly distributed between 0 and 1000 hours, each component has a failure time distribution \(U(0,1000)\), with a density function:
$$ f_Y\left(y\right)=\frac{1}{1000-0}=\frac{1}{1000},\ \ \ 0 \lt y \lt 1000 $$
And a cumulative distribution function:
$$ F_Y\left(y\right)=\frac{\left(y-0\right)}{1000-0}=\frac{y}{1000} $$
Substitute these into the formula for \(f_{Y_{\left(1\right)}}\left(y\right)\):
$$ f_{Y_{\left(1\right)}}\left(y\right)=100\cdot \left[1-\frac{y}{1000}\right]^{99}\cdot \frac{1}{1000},\ \ \ 0 \lt y \lt 1000 $$
This density is not uniform, which rules out options A, B, and C. Strictly speaking, it is not exponential either, since its support ends at 1000, but for a large sample it is very closely approximated by an exponential distribution. Using \(\left(1-x\right)^n\approx e^{-nx}\) for small \(x\), the survival function of the minimum satisfies
$$ P\left(Y_{\left(1\right)} \gt y\right)=\left(1-\frac{y}{1000}\right)^{100}\approx e^{-\frac{100y}{1000}}=e^{-0.1y} $$
In general, the rate parameter of the approximating exponential distribution is \(\lambda=\frac{n}{b-a}\).
Hence, among the given options, the failure time of the component that fails first is best described by an exponential distribution with \(\lambda = \frac{100}{1000} = 0.1\).
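As a numerical sanity check on this answer, the sketch below compares the exact survival function of the minimum of 100 independent \(U(0, 1000)\) observations, \(\left(1-\frac{y}{1000}\right)^{100}\), with the survival function of an exponential distribution with rate 0.1. The function names are hypothetical and used only for this illustration.

```python
import math

n, a, b = 100, 0.0, 1000.0

def min_survival_exact(y):
    """P(Y_(1) > y) = [1 - F(y)]^n for n iid U(a, b) observations."""
    return (1 - (y - a) / (b - a)) ** n

def min_survival_exp(y, rate=n / (b - a)):
    """Survival function of an exponential distribution with the given rate."""
    return math.exp(-rate * y)

# Tabulate both survival functions at a few failure times (in hours).
for y in (1, 5, 10, 20, 50):
    print(y, round(min_survival_exact(y), 4), round(min_survival_exp(y), 4))
```

The two columns agree closely for this sample size, though not exactly: the exact survival function reaches zero at 1000 hours, whereas the exponential one never does.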
Learning Outcome
Topic 3. f: Multivariate Random Variables - Determine the distribution of order statistics from a set of independent random variables.