Calculation and Interpretation of Confidence Intervals

A confidence interval (C.I) is a range of values within which statisticians believe, at a stated level of confidence, the true value of a population parameter lies. It differs from a point estimate, which is a single numerical value.

Breaking down Confidence Interval

When constructing a confidence interval, we must specify the probability that the interval contains the true value of the parameter of interest. This probability is denoted (1 – α), where α is the level of significance. In statistical terminology, 1 – α is called the degree of confidence (or confidence level).

We define a 100(1 – α)% confidence interval for a given parameter, say θ, by specifying two random variables $\hat{\theta}_1(X)$ and $\hat{\theta}_2(X)$ such that $P\{\hat{\theta}_1(X) < \theta < \hat{\theta}_2(X)\} = 1 - \alpha$.

The most common choice, both in the exam and in practice, is α = 0.05, which gives a 95% confidence interval.

Consequently, $P\{\hat{\theta}_1(X) < \theta < \hat{\theta}_2(X)\} = 0.95$ specifies $(\hat{\theta}_1(X), \hat{\theta}_2(X))$ as a 95% C.I for θ. The main task for candidates is to construct and interpret a confidence interval. Suppose we repeatedly drew samples of the same size from the same population and built an interval from each sample in the same way. About 95% of those intervals would contain the true parameter value and about 5% would not. Hence the term "confidence" interval. The 95% describes the long-run behaviour of the procedure. Any single computed interval either contains θ or does not.
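The repeated-sampling interpretation can be checked with a short simulation. The sketch below is a minimal Python illustration. It assumes a normal population with known σ, which is the first scenario discussed below. It also uses arbitrary, hypothetical settings: μ = 0, σ = 1, n = 25, 10,000 trials and a fixed seed. It counts how often the interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ captures the true mean.

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_coverage(mu=0.0, sigma=1.0, n=25, conf=0.95, trials=10_000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu.

    All default arguments are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)  # sigma is known, so the width is fixed
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials


print(simulated_coverage())  # close to, but generally not exactly, 0.95
```

Each individual interval in the simulation either contains μ or it does not. Only the long-run proportion settles near the stated confidence level.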

Constructing Confidence Intervals

To construct a confidence interval, we choose an appropriate margin, then subtract it from and add it to a point estimate. A confidence interval takes the following form:

C.I = point estimate ± reliability factor * standard error

Where:

Point estimate is the calculated value of a sample statistic, such as the sample mean, $\bar{x}$.

Reliability factor is a value that depends on the sampling distribution involved and on (1 – α), the degree of confidence (the probability that the interval contains the true parameter value).

Standard error is the standard error of the point estimate.

Different Scenarios

1. Normal distribution with a known variance:

We can calculate the C.I for the mean as,

$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

Here, the reliability factor is $z_{\alpha/2}$, the z-score that leaves a probability of α/2 in the upper (right-hand) tail of the standard normal distribution.
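Below is a minimal Python sketch of this formula. It assumes the population standard deviation σ is known. The function name `z_interval` is hypothetical. The σ = 2 in the usage line is an illustrative value and is not taken from the text. The critical value comes from the standard library's `statistics.NormalDist`, so no table lookup is needed.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided C.I for the mean of a normal population with known sigma."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability factor z_{alpha/2}
    standard_error = sigma / math.sqrt(n)
    margin = z * standard_error
    return xbar - margin, xbar + margin


# Illustrative inputs: xbar = 126.6, sigma = 2 (assumed known), n = 5
low, high = z_interval(126.6, 2.0, 5)
print(round(low, 3), round(high, 3))  # 124.847 128.353
```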

The following table lists the reliability factors (standard normal critical values) that analysts use most often.

Degree of confidence (1 – α) | Level of significance α (α/2 in each tail) | $z_{\alpha/2}$
90% | 10% | 1.645
95% | 5% | 1.960
99% | 1% | 2.576
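The z values in the table can be computed rather than memorised. This short sketch uses Python's standard-library `statistics.NormalDist.inv_cdf` to compute $z_{\alpha/2}$ for each confidence level. The function name `z_critical` is hypothetical. The exact 99% value is about 2.5758, which different sources round to 2.575, 2.576 or 2.58.

```python
from statistics import NormalDist


def z_critical(conf):
    """Return z_{alpha/2}: the z-score leaving alpha/2 in the upper tail."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)


# Prints 1.6449, 1.9600 and 2.5758 in the z column
for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%}  alpha = {1 - conf:.0%}  z = {z_critical(conf):.4f}")
```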

2. Normal distribution with unknown variance:

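When the population variance is unknown, we construct the C.I for the mean by replacing the z-score from the first scenario with a t-score. We also replace the unknown σ with S, the sample standard deviation. Why is the t-distribution used? Estimating σ with S introduces extra uncertainty. For a normal population, the statistic $(\bar{x} - \mu)/(S/\sqrt{n})$ follows a t-distribution with n – 1 degrees of freedom. The t-distribution has fatter tails than the standard normal, so the resulting interval is wider, which reflects that extra uncertainty.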

Thus,

C.I = $\bar{x} \pm t_{\alpha/2} \cdot \frac{S}{\sqrt{n}}$

$t_{\alpha/2}$ is the t-score that leaves a probability of α/2 in the upper tail of the t-distribution. The number of degrees of freedom is determined by the sample size: d.f. = n – 1.
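Python's standard library has no t-distribution. The following sketch therefore computes $t_{\alpha/2}$ directly; it is an illustration, not a production routine. It integrates the t density with Simpson's rule and then bisects for the quantile. The function names are hypothetical. The sketch reproduces the table value 2.776 used in the example below. It also shows the fatter tails: for small d.f. the critical value is well above 1.960, and it approaches 1.960 as d.f. grows.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids the overflow math.gamma would hit for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus symmetry at 0."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha, df):
    """t_{alpha/2}: the t-score leaving alpha/2 in the upper tail."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it holds the root
        hi *= 2
    for _ in range(50):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (4, 29, 1000):
    print(df, round(t_critical(0.05, df), 3))  # roughly 2.776, 2.045, 1.962
```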

3. Confidence interval for the population mean when the variance is unknown and the sample size is large (any type of distribution):

By the Central Limit Theorem, the sampling distribution of the sample mean is approximately normal for just about any population distribution with finite variance, provided the sample size is large (n ≥ 30 as a rule of thumb). We can therefore use the relevant z-score, together with the sample standard deviation S, when constructing a confidence interval for the population mean: $\bar{x} \pm z_{\alpha/2} \cdot \frac{S}{\sqrt{n}}$. Some analysts prefer the t-distribution whenever the population variance is unknown, even if n ≥ 30, because it gives a slightly wider, more conservative interval. Nonetheless, the z-statistic remains justified in large samples. This is because the t-distribution converges to the standard normal as the degrees of freedom increase.
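The sketch below illustrates the Central Limit Theorem argument. It builds large-sample z-intervals, $\bar{x} \pm z_{\alpha/2}\,S/\sqrt{n}$, from a clearly non-normal population. The exponential population (mean 1), the sample size n = 50, the number of trials and the seed are all hypothetical choices for illustration. The function names are also hypothetical. The simulated coverage typically comes out close to, and often slightly below, the nominal 95%. This is consistent with the normal approximation being reasonable rather than exact.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def large_sample_z_interval(sample, conf=0.95):
    """Approximate C.I for the mean using z_{alpha/2} and the sample S."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 divisor
    xbar = fmean(sample)
    return xbar - margin, xbar + margin


def coverage_from_exponential(n=50, trials=5_000, seed=7):
    """Share of intervals containing the true mean of an Exp(1) population."""
    rng = random.Random(seed)
    true_mean = 1.0  # an exponential with rate 1 has mean 1
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        low, high = large_sample_z_interval(sample)
        hits += low < true_mean < high  # True counts as 1
    return hits / trials


print(coverage_from_exponential())
```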

Example

A teacher draws a sample of five 12-year-old children from the school’s population and records their heights as follows:

{124, 124, 128, 130, 127}

Assume that the heights have a normal distribution where both μ and σ are unknown. Calculate a two-tailed 95% confidence interval for the mean height of 12-year-olds.

Solution:

Since the population variance is unknown and the sample is small (n = 5 < 30), we use the t-score rather than the z-score. Because the population is normal, the t-distribution with n – 1 = 4 degrees of freedom is exact here. Thus, the C.I for the mean takes the form

C.I = $\bar{x} \pm t_{\alpha/2} \cdot \frac{S}{\sqrt{n}}$

From the data, $\bar{x} = 633/5 = 126.6$. The squared deviations from the mean sum to 27.2, so $S^2 = 27.2/4 = 6.8$ and $S = \sqrt{6.8} \approx 2.608$.

You can read off the t-score value from the t-distribution table, where you will find that

$P(-2.776 < t_4 < 2.776) = 0.95$, i.e. $t_{0.025,\,4} = 2.776$

Therefore, C.I = $126.6 \pm 2.776 \times 2.608/\sqrt{5}$

= 126.6 ± 3.24

Thus, our confidence interval for μ is (123.4, 129.8).

Question:

Use the data from the example above to calculate a two-tailed 99% confidence interval for the population mean.

A. (121.4, 130.8)

B. (121.2, 132.0)

C. (126.6, 135.3)

Solution

C.I = $\bar{x} \pm t_{\alpha/2} \cdot \frac{S}{\sqrt{n}}$

$t_{0.005,\,4} = 4.604$

The other inputs remain the same as in the example above.

Therefore, C.I = $126.6 \pm 4.604 \times 2.608/\sqrt{5}$

= 126.6 ± 5.37

The C.I for the mean is (121.2, 132.0), so the answer is B.

As you might have observed, the interval widens as the level of confidence increases.
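The worked example and the practice question can be checked from the raw heights. The sketch below is a minimal Python illustration. It computes $\bar{x}$ and $S$ (with the n – 1 divisor) from the five listed heights using the standard `statistics` module. The t-table values 2.776 and 4.604 for 4 degrees of freedom are taken as given inputs. The helper name `t_interval` is hypothetical.

```python
import math
from statistics import fmean, stdev

heights = [124, 124, 128, 130, 127]


def t_interval(data, t_crit):
    """Two-sided C.I for the mean: xbar +/- t_crit * S / sqrt(n)."""
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)  # sample standard deviation, n - 1 divisor
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# t-table values for d.f. = n - 1 = 4, taken from the text
ci_95 = t_interval(heights, 2.776)
ci_99 = t_interval(heights, 4.604)
print([round(v, 1) for v in ci_95])  # [123.4, 129.8]
print([round(v, 1) for v in ci_99])  # [121.2, 132.0]
```

The 99% interval is wider because the reliability factor rises from 2.776 to 4.604, while $\bar{x}$, S and n stay the same.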

Calculate and interpret a confidence interval for a population mean, given a normal distribution with 1) a known population variance, 2) an unknown population variance, or 3) an unknown variance and a large sample size.
