Measures of dispersion are used to describe the variability or spread in a sample or population. They are usually used in conjunction with measures of central tendency, such as the mean and the median. The most common measures of dispersion are the range, the mean absolute deviation, the variance, and the standard deviation.
Measures of dispersion are important because they give us an idea of how well the measures of central tendency represent data. For example, if the standard deviation is large, then there are large differences between individual data points. Consequently, the mean may not be representative of the data.
Range is the difference between the highest and the lowest scores in a set of data, i.e.,
$$ \text{Range} = \text{Maximum value} - \text{Minimum value} $$
Consider the following scores of 10 Level I candidates:
{78 56 67 51 43 89 57 67 78 50}
$$ \text{Range} = 89 - 43 = 46 $$
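The same calculation can be sketched in a few lines of Python. This is only an illustration; the function name `value_range` is our own choice and is not part of the original material.

```python
# Scores of the 10 candidates from the example above.
scores = [78, 56, 67, 51, 43, 89, 57, 67, 78, 50]


def value_range(data):
    """Return the difference between the largest and smallest value."""
    return max(data) - min(data)


score_range = value_range(scores)
print(score_range)  # 46
```

Note that the range depends only on the two extreme observations, so a single outlier can change it substantially.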
The mean absolute deviation (MAD) is a measure of dispersion equal to the average of the absolute values of the deviations of individual observations from the arithmetic mean. Therefore:
$$ \text{MAD} =\frac { \sum { |{ X }_{ i }-\bar { X } | } }{ n } $$
Remember that the sum of the deviations from the arithmetic mean is always zero, which is why we use absolute values.
Six financial analysts have reported the following returns on six different large-cap stocks over 2021:
{6% 7% 12% 2% 3% 11%}
Calculate the mean absolute deviation and interpret it.
Solution
First, we have to calculate the arithmetic mean:
$$ \bar{X} =\cfrac {(6\% +7\% +12\% +2\% +3\% +11\%)}{6} = 6.83\% $$
Next, we can now compute the MAD:
$$ \begin{align*} \text{MAD} & = \cfrac {\left\{ |6\% - 6.83\%|+ |7\% - 6.83\%| + |12\% - 6.83\%| + |2\% - 6.83\%| + |3\% - 6.83\%| + |11\% - 6.83\%| \right\}} {6} \\ & =\cfrac {0.83\%+0.17\%+5.17\%+4.83\%+3.83\%+4.17\%}{6} \\ & = 3.17\% \\ \end{align*} $$
Interpretation: It means that, on average, an individual return deviates by 3.17% from the mean return of 6.83%.
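As a rough check of the arithmetic, the MAD can be computed in Python. The sketch below works in percent units; the function name `mean_absolute_deviation` is our own and not from any library. It also shows why absolute values are needed: the signed deviations cancel out.

```python
# Returns on the six large-cap stocks, in percent.
returns_pct = [6, 7, 12, 2, 3, 11]


def mean_absolute_deviation(data):
    """Average absolute deviation of the observations from their mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


mean_return = sum(returns_pct) / len(returns_pct)

# Signed deviations sum to zero (up to floating-point error).
signed_sum = sum(x - mean_return for x in returns_pct)

mad = mean_absolute_deviation(returns_pct)
print(round(mean_return, 2))  # 6.83
print(round(mad, 2))          # 3.17
```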
The population variance, denoted by \(\sigma^2\), is the average of the squared deviations from the population mean, \(\mu\). Therefore:
$$ { \sigma }^{ 2 }=\frac { \sum { { \left( { X }_{ i }-\mu \right) }^{ 2 } } }{ N } $$
And the standard deviation is simply the square root of variance.
Working with data from the example above, the variance will be calculated as follows:
$$ \begin{align*} { \sigma }^{ 2 } & =\frac { \left\{ { \left( 6\%-6.83 \%\right) }^{ 2 }+{ \left( 7\%-6.83\% \right) }^{ 2 }+{ \left( 12\%-6.83\% \right) }^{ 2 }+{ \left( 2\%-6.83\% \right) }^{ 2 }+{ \left( 3\%-6.83\% \right) }^{ 2 }+{ \left( 11\%-6.83\% \right) }^{ 2 } \right\} }{ 6 } \\ & = 13.81(\%^2) \\ & = 0.001381 \\ \end{align*} $$
Therefore, the average squared deviation from the mean of 0.0683 is 0.001381.
The standard deviation is \(0.001381^{ 0.5 } = 0.0372\) or \(3.72\%\).
Analysts use the standard deviation rather than the variance to interpret returns because it is expressed in the same units as the returns themselves and is therefore much easier to comprehend.
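The population variance and standard deviation for the six returns can be reproduced with a short Python sketch. It works in percent units, so the variance comes out in percent squared. The helper `population_variance` is our own illustration; the result is cross-checked against `statistics.pvariance` from the Python standard library.

```python
import math
import statistics

# Returns on the six stocks, in percent, treated as a full population.
returns_pct = [6.0, 7.0, 12.0, 2.0, 3.0, 11.0]


def population_variance(data):
    """Average squared deviation from the population mean (divide by N)."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


pop_var = population_variance(returns_pct)
pop_sd = math.sqrt(pop_var)

# Cross-check against the standard library implementation.
assert math.isclose(pop_var, statistics.pvariance(returns_pct))

print(round(pop_var, 2))  # 13.81 (percent squared)
print(round(pop_sd, 2))   # 3.72 (percent)
```

To convert the variance from percent squared to decimal form, divide by \(100^2\): \(13.81 / 10{,}000 = 0.001381\).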
The sample variance, \(S^2\), is the dispersion measure that applies when we are working with a sample as opposed to a population.
$$ { S }^{ 2 }=\frac { \left\{ \sum { { \left( { X }_{ i }- \bar { X } \right) }^{ 2 } } \right\} }{ n-1 } $$
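Note that we divide by \(n - 1\) rather than \(n\). This correction is necessary because dividing by \(n\) would, on average, underestimate the population variance; dividing by \(n - 1\) removes that bias.
The sample standard deviation, \(S\), is simply the square root of the sample variance.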
Assume that the returns realized in the previous example were sampled from a population comprising 100 returns. The sample mean and the corresponding sample variance are closest to:
Solution
The sample mean will still be 6.83%.
Hence,
$$ \begin{align*} { S }^{ 2 } & =\frac { \left\{ { \left( 6\%-6.83 \%\right) }^{ 2 }+{ \left( 7\%-6.83\% \right) }^{ 2 }+{ \left( 12\%-6.83\% \right) }^{ 2 }+{ \left( 2\%-6.83\% \right) }^{ 2 }+{ \left( 3\%-6.83\% \right) }^{ 2 }+{ \left( 11\%-6.83\% \right) }^{ 2 } \right\} }{ 5 } \\ & = 16.57(\%^2) \\ & = 0.001657 \\ \end{align*} $$
Therefore,
$$ \begin{align*} S & = 0.001657^{\frac {1}{2}} \\ & = 0.0407 \text{ or } 4.07\% \end{align*} $$
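The sample statistics can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by \(n - 1\). This is only a sketch in percent units; the variable names are our own.

```python
import statistics

# The six returns, in percent, now treated as a sample.
returns_pct = [6.0, 7.0, 12.0, 2.0, 3.0, 11.0]
n = len(returns_pct)

sample_mean = statistics.mean(returns_pct)
sample_var = statistics.variance(returns_pct)  # divides by n - 1
sample_sd = statistics.stdev(returns_pct)      # square root of sample_var

# For comparison: the population variance divides by n instead.
pop_var = statistics.pvariance(returns_pct)

print(round(sample_mean, 2))  # 6.83
print(round(sample_var, 2))   # 16.57 (percent squared)
print(round(sample_sd, 2))    # 4.07 (percent)
```

Because the sum of squared deviations is the same in both cases, the two variances differ only by the factor \(n / (n - 1)\): here \(13.81 \times 6/5 \approx 16.57\).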
Question
You have been given the following data:
{12 13 54 56 25}
Assuming that this is complete data from a certain population, the population standard deviation is closest to:
A. 19.34.
B. 374.00.
C. 1,870.00.
The correct answer is A.
$$ \mu =\cfrac {(12 + 13 + \cdots +25)}{5} =\cfrac {160}{5} = 32 $$
Hence,
$$ \begin{align*} { \sigma }^{ 2 } & =\frac { \left\{ { \left( 12-32 \right) }^{ 2 }+{ \left( 13-32 \right) }^{ 2 }+{ \left( 54-32 \right) }^{ 2 }+{ \left( 56-32 \right) }^{ 2 }+{ \left( 25-32 \right) }^{ 2 } \right\} }{ 5 } \\ & =\cfrac {1870}{5} = 374 \\ \end{align*} $$
Therefore,
$$ \sigma =\sqrt{374} = 19.34 $$
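The answer can be verified with a short Python sketch (variable names are our own). It also shows where the two distractors come from: option C is the sum of squared deviations, and option B is the population variance before taking the square root.

```python
import math

# Complete population data from the question.
data = [12, 13, 54, 56, 25]

mu = sum(data) / len(data)

# Sum of squared deviations from the population mean (option C).
sum_sq_dev = sum((x - mu) ** 2 for x in data)

# Population variance: divide by N (option B).
pop_var = sum_sq_dev / len(data)

# Population standard deviation (option A, the correct answer).
pop_sd = math.sqrt(pop_var)

print(mu)                # 32.0
print(sum_sq_dev)        # 1870.0
print(pop_var)           # 374.0
print(round(pop_sd, 2))  # 19.34
```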