
Measures of dispersion describe the variability, or spread, of data in a sample or population. They are usually used alongside measures of central tendency, such as the mean and the median. Common measures of dispersion include the range, the mean absolute deviation, the variance, and the standard deviation.

Measures of dispersion are important because they indicate how well a measure of central tendency represents the data. For example, a large standard deviation means that individual observations are widely spread around the mean. In that case, the mean may not be representative of the data.

The range is the difference between the highest and the lowest values in a data set:

$$ \text{Range} = \text{Maximum value} - \text{Minimum value} $$

Consider the following scores of 10 Level I candidates:

{78 56 67 51 43 89 57 67 78 50}

$$ \text{Range} = 89 - 43 = 46 $$
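As a quick check, the same calculation can be done in Python. This is a minimal sketch, and the variable names are illustrative only.

```python
# Scores of the 10 Level I candidates from the example above
scores = [78, 56, 67, 51, 43, 89, 57, 67, 78, 50]

# Range = maximum value - minimum value
score_range = max(scores) - min(scores)
print(score_range)  # 46
```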

- The range is easy to compute.
- The range is not a reliable measure of dispersion. Because it is based on only two observations from the distribution, it says nothing about the shape of the distribution.
- The range is highly sensitive to outliers, since a single extreme value changes it directly.

The mean absolute deviation (MAD) is a measure of dispersion equal to the **average of the absolute values** of the deviations of individual observations from the arithmetic mean:

$$ \text{MAD} =\frac { \sum { |{ X }_{ i }-\bar { X } | } }{ n } $$

Remember that the sum of the deviations from the arithmetic mean is always zero, because positive and negative deviations cancel out. That is why the MAD uses **absolute values**.
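The cancellation can be illustrated with the ten candidate scores from the range example. This short sketch assumes nothing beyond those scores. It shows that the raw deviations sum to zero (up to floating-point rounding), while their absolute values do not.

```python
# Candidate scores reused from the range example
scores = [78, 56, 67, 51, 43, 89, 57, 67, 78, 50]
mean = sum(scores) / len(scores)  # 63.6

deviations = [x - mean for x in scores]

# Positive and negative deviations cancel, so this is zero up to float rounding
raw_sum = sum(deviations)

# Absolute values cannot cancel, so this stays positive
abs_sum = sum(abs(d) for d in deviations)

print(raw_sum, abs_sum)
```

Dividing `abs_sum` by the number of observations gives the MAD of these scores.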

Six financial analysts have reported the following returns on six different large-cap stocks over 2021:

{6% 7% 12% 2% 3% 11%}

Calculate the mean absolute deviation and interpret it.

**Solution**

First, we have to calculate the arithmetic mean:

$$ \bar{X} =\cfrac {(6\% +7\% +12\% +2\% +3\% +11\%)}{6} = 6.83\% $$

Next, compute the MAD:

$$ \begin{align*} \text{MAD} & = \cfrac {|6\% - 6.83\%| + |7\% - 6.83\%| + |12\% - 6.83\%| + |2\% - 6.83\%| + |3\% - 6.83\%| + |11\% - 6.83\%|} {6} \\ & =\cfrac {0.83\% + 0.17\% + 5.17\% + 4.83\% + 3.83\% + 4.17\%}{6} \\ & = 3.17\% \end{align*} $$

*Interpretation*: On average, an individual return deviates by 3.17% from the mean return of 6.83%.
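The MAD formula can be written as a small helper function. Python's standard library has no mean-absolute-deviation function, so `mean_absolute_deviation` is a hypothetical helper written for this illustration. Returns are entered in percent.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations of the values from their arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


# Returns (in %) on the six large-cap stocks from the example
returns = [6, 7, 12, 2, 3, 11]
mad = mean_absolute_deviation(returns)
print(f"MAD = {mad:.2f}%")  # MAD = 3.17%
```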

The population *variance*, denoted by \(\sigma^2\), is the average of the squared deviations from the mean:

$$ { \sigma }^{ 2 }=\frac { \sum { { \left( { X }_{ i }-\mu \right) }^{ 2 } } }{ N } $$

The *standard deviation* is simply the square root of the variance.

Using the returns from the example above, the population variance is calculated as follows:

$$ \begin{align*} { \sigma }^{ 2 } & =\frac { \left\{ { \left( 6\%-6.83 \%\right) }^{ 2 }+{ \left( 7\%-6.83\% \right) }^{ 2 }+{ \left( 12\%-6.83\% \right) }^{ 2 }+{ \left( 2\%-6.83\% \right) }^{ 2 }+{ \left( 3\%-6.83\% \right) }^{ 2 }+{ \left( 11\%-6.83\% \right) }^{ 2 } \right\} }{ 6 } \\ & = 13.81(\%^2) \\ & = 0.001381 \\ \end{align*} $$

Therefore, the average squared deviation from the mean return of 6.83% is 13.81 (\(\%^2\)), or 0.001381 in decimal form.

The standard deviation is \(0.001381^{ 0.5 } = 0.0372\) or \(3.72\%\).
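The population formulas can also be sketched directly in Python and cross-checked against the standard library's `statistics.pvariance` and `statistics.pstdev`, both of which divide by \(N\). Returns are in percent, so the variance is in \(\%^2\). Dividing by 10,000 converts it to decimal form.

```python
import math
import statistics

returns = [6, 7, 12, 2, 3, 11]  # in %, treated as the whole population

# Population variance: average squared deviation from the mean (divide by N)
mu = sum(returns) / len(returns)
pop_var = sum((x - mu) ** 2 for x in returns) / len(returns)
pop_sd = math.sqrt(pop_var)

# Cross-check against the standard library's population functions
assert math.isclose(pop_var, statistics.pvariance(returns))
assert math.isclose(pop_sd, statistics.pstdev(returns))

print(f"variance = {pop_var:.2f} (%^2) = {pop_var / 10_000:.6f}")
print(f"std dev  = {pop_sd:.2f}%")
```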

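Analysts usually interpret returns using the standard deviation rather than the variance. The standard deviation is expressed in the same units as the data (here, percent), whereas the variance is expressed in squared units (\(\%^2\)).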

The sample variance, \(S^2\), is the dispersion measure that applies when we are working with a sample as opposed to a population.

$$ { S }^{ 2 }=\frac { \left\{ \sum { { \left( { X }_{ i }- \bar { X } \right) }^{ 2 } } \right\} }{ n-1 } $$

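Note that we divide by \(n - 1\) rather than \(n\). The deviations are measured from the sample mean \(\bar{X}\) instead of the unknown population mean \(\mu\), so dividing by \(n\) would, on average, underestimate the population variance. Dividing by \(n - 1\) makes \(S^2\) an **unbiased** estimator of \(\sigma^2\).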

The sample standard deviation, \(S\), is simply the sample variance’s square root.
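A small simulation can illustrate why the \(n - 1\) divisor matters. The setup below is an illustrative assumption, not from the text: a standard normal population with true variance 1, samples of size 5, 20,000 trials, and a fixed seed. Averaged over many samples, dividing by \(n\) should land near \((n-1)/n = 0.8\) of the true variance, while dividing by \(n - 1\) should land near 1.

```python
import random


def simulate_variance_estimators(n=5, trials=20_000, seed=42):
    """Average the divide-by-n and divide-by-(n-1) variance estimates over many
    samples drawn from a standard normal population (true variance = 1)."""
    rng = random.Random(seed)
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations from xbar
        total_biased += ss / n          # population-style formula applied to a sample
        total_unbiased += ss / (n - 1)  # sample variance S^2
    return total_biased / trials, total_unbiased / trials


biased, unbiased = simulate_variance_estimators()
print(f"divide by n: {biased:.3f}, divide by n - 1: {unbiased:.3f}")
```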

Assume that the returns in the previous example were sampled from a population of 100 returns. Calculate the sample mean, the sample variance, and the sample standard deviation.

**Solution**

The sample mean will still be 6.83%.

Hence,

$$ \begin{align*} { S }^{ 2 } & =\frac { \left\{ { \left( 6\%-6.83 \%\right) }^{ 2 }+{ \left( 7\%-6.83\% \right) }^{ 2 }+{ \left( 12\%-6.83\% \right) }^{ 2 }+{ \left( 2\%-6.83\% \right) }^{ 2 }+{ \left( 3\%-6.83\% \right) }^{ 2 }+{ \left( 11\%-6.83\% \right) }^{ 2 } \right\} }{ 5 } \\ & = 16.57(\%^2) \\ & = 0.001657 \\ \end{align*} $$

Therefore,

$$ \begin{align*} S & = 0.001657^{\frac {1}{2}} \\ & = 0.0407 \text{ or } 4.07\% \end{align*} $$
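For comparison with the population calculation, this sketch treats the same six returns as a sample. It divides by \(n - 1\) and cross-checks the result against `statistics.variance` and `statistics.stdev`, which use the same divisor.

```python
import math
import statistics

returns = [6, 7, 12, 2, 3, 11]  # in %, treated as a sample

n = len(returns)
xbar = sum(returns) / n
sample_var = sum((x - xbar) ** 2 for x in returns) / (n - 1)  # divide by n - 1
sample_sd = math.sqrt(sample_var)

# statistics.variance and statistics.stdev also use the n - 1 divisor
assert math.isclose(sample_var, statistics.variance(returns))
assert math.isclose(sample_sd, statistics.stdev(returns))

print(f"mean = {xbar:.2f}%, S^2 = {sample_var:.2f} (%^2), S = {sample_sd:.2f}%")
```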

**Question**

You have been given the following data:

{12 13 54 56 25}

Assuming that this is the complete data for a certain population, the population standard deviation is *closest* to:

A. 19.34.

B. 374.00.

C. 1,870.00.

**Solution**

The correct answer is **A**.

$$ \mu =\cfrac {(12 + 13 + \cdots +25)}{5} =\cfrac {160}{5} = 32 $$

Hence,

$$ \begin{align*} { \sigma }^{ 2 } & =\frac { \left\{ { \left( 12-32 \right) }^{ 2 }+{ \left( 13-32 \right) }^{ 2 }+{ \left( 54-32 \right) }^{ 2 }+{ \left( 56-32 \right) }^{ 2 }+{ \left( 25-32 \right) }^{ 2 } \right\} }{ 5 } \\ & =\cfrac {1870}{5} = 374 \\ \end{align*} $$

Therefore,

$$ \sigma =\sqrt{374} = 19.34 $$
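The answer can be verified with Python's `statistics` module, treating the five values as a complete population. The sketch also computes the sum of squared deviations, which shows where the distractors come from: B (374) is the variance, and C (1,870) is the sum of squared deviations before dividing by \(N\).

```python
import statistics

data = [12, 13, 54, 56, 25]  # treated as the complete population

mu = statistics.mean(data)                    # 32
ss = sum((x - mu) ** 2 for x in data)         # sum of squared deviations (choice C)
pop_var = statistics.pvariance(data)          # divides by N (choice B)
pop_sd = statistics.pstdev(data)              # square root of the variance (choice A)

print(ss, pop_var, round(pop_sd, 2))
```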