Level 1 CFA® Exam:
Measures of Dispersion
Measures of dispersion include range, mean absolute deviation (MAD), variance, and standard deviation.
Measures of dispersion tell us about how observations are dispersed around the mean.
Let's take the following example. There are two sets of numbers. Set 1 includes numbers: 1, 5, 5, and 9, and Set 2 includes numbers: 4, 5, 5, and 6. For both sets, the arithmetic mean, the median, and the mode are the same and equal to 5. However, the two sets differ from each other. We intuitively know that the observations in Set 2 are more concentrated and the observations in Set 1 are more dispersed. Measures of dispersion allow us to quantify the dispersion of observations.
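As a quick illustration of the example above, a minimal Python sketch using the standard library's statistics module confirms that the two sets share the same measures of central tendency while clearly differing in spread:

```python
import statistics

set_1 = [1, 5, 5, 9]
set_2 = [4, 5, 5, 6]

# Mean, median, and mode coincide for both sets (all equal to 5)...
for data in (set_1, set_2):
    print(statistics.mean(data), statistics.median(data), statistics.mode(data))
# ...yet the observations in Set 1 are visibly more spread out than in Set 2.
```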
Note that the lower the dispersion in a dataset, the better measures of central tendency reflect the characteristics of the set and the more useful they are in its analysis. If all the observations had the same value, each would be equal to the arithmetic mean, and if we randomly picked any observation from the set, we would know for sure what number it would be.
Such certainty is, however, rare. Lack of certainty is intuitively connected to risk: the more dispersed a dataset, the more uncertain we feel, which means that more risk is involved.
If we randomly select one number from Set 1, we can get a value equal to the mean, which is 5, but also a much smaller value, 1, or a much greater value, 9. If we randomly select a number from Set 2, we can also get a number equal to the mean, namely 5, or either of the two numbers close to the mean, 4 or 6.
If the numbers in Set 1 and Set 2 were rates of return on investments, a reasonable person would choose Set 2: both sets have the same arithmetic mean, but the volatility and dispersion of Set 2 are smaller than those of Set 1, so the risk involved in such an investment is lower.
Range
Range is defined as the difference between the largest and the smallest value in a dataset. It's expressed with the following formula:

\(\text{Range}=\text{Maximum value}-\text{Minimum value}\)
Theoretically, the greater the range, the more dispersed the observations. Note, however, that range is based only on the two extreme values, which is why it's a deeply flawed measure.
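In code, the range is just the difference between the maximum and the minimum — a one-line sketch for the two sets from our example:

```python
set_1 = [1, 5, 5, 9]
set_2 = [4, 5, 5, 6]

# Range uses only the two extreme observations, ignoring everything in between.
range_1 = max(set_1) - min(set_1)  # 9 - 1 = 8
range_2 = max(set_2) - min(set_2)  # 6 - 4 = 2
print(range_1, range_2)
```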
Mean Absolute Deviation
Mean absolute deviation is the arithmetic mean of the absolute values of the deviations of the individual elements of the dataset around the mean and is expressed with the following formula:

\(MAD=\frac{\sum_{i=1}^{n}\left|X_{i}-\bar{X}\right|}{n}\)
(...)
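A minimal sketch of the MAD calculation for Set 1 from our example (mean of 5, absolute deviations 4, 0, 0, 4):

```python
from statistics import mean

data = [1, 5, 5, 9]
m = mean(data)  # arithmetic mean = 5

# Average of the absolute deviations around the mean
mad = sum(abs(x - m) for x in data) / len(data)
print(mad)  # (4 + 0 + 0 + 4) / 4 = 2.0
```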
Undoubtedly, variance and standard deviation are the most popular measures of dispersion. They are commonly used in many fields of study. For example, in portfolio theory, variance reflects investment risk and serves as a measure of the volatility of rates of return.
Variance
Variance is the mean of the squared deviations of individual observations around the mean.
Variance is difficult to interpret, but it's often easier to work with in complicated computations. Still, market practitioners prefer standard deviation, which is the square root of variance.
Sample variance is expressed like this:

\(s^{2}=\frac{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}}{n-1}\)
Standard Deviation
Standard deviation is the square root of the variance, so it's expressed like this:

\(s=\sqrt{\frac{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}}{n-1}}\)
As you can see, variance and standard deviation are calculated based on all observations in a dataset. This is one of the qualities that make variance and standard deviation better measures than, for example, range.
Both measures, that is standard deviation and variance, involve the sum of squared deviations around the mean. The deviations are squared for the same reason the MAD formula uses absolute values: the square of any real number is nonnegative, so positive and negative deviations cannot cancel out.
Variance is difficult to interpret because it's a squared value. By taking the square root of the variance, that is by calculating the standard deviation, we return to more intuitively comprehensible values. Both variance and standard deviation tell us how much observations are dispersed around the mean.
Standard deviation informs us about how much on average the elements of a set differ from the mean.
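Treating Set 1 from our example as a sample (so the denominator is \(n-1\)), the two measures can be sketched with Python's statistics module, whose variance and stdev functions use the sample denominator:

```python
import statistics

sample = [1, 5, 5, 9]

# statistics.variance and statistics.stdev divide by n - 1 (sample formulas)
var = statistics.variance(sample)  # 32 / 3
sd = statistics.stdev(sample)      # square root of the sample variance
print(round(var, 2), round(sd, 2))
```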
Let's now return to our example from the beginning of the lesson. Let’s assume that Set 1 and Set 2 are entire populations and not samples (in this case we will use \(n\) instead of \(n-1\) in the denominator).
For Set 1, which includes numbers 1, 5, 5, and 9, the standard deviation equals:
\(\sigma=\sqrt{\frac{(1-5)^2+(5-5)^2+(5-5)^2+(9-5)^2}{4}}=\sqrt{\frac{(-4)^2+0^2+0^2+4^2}{4}}=\sqrt{8}\approx 2.83\)
Whereas for Set 2, including numbers 4, 5, 5, and 6, the standard deviation equals:
\(\sigma=\sqrt{\frac{(4-5)^2+(5-5)^2+(5-5)^2+(6-5)^2}{4}}=\sqrt{\frac{(-1)^2+0^2+0^2+1^2}{4}}=\sqrt{0.5}\approx 0.71\)
We said before that Set 2 is less risky, and now statistical tools have proved our intuition right. Set 2 is characterized by lower average dispersion: on average, its items deviate from the mean by 0.71. In the case of Set 1, the standard deviation equals as much as 2.83, so on average the items in Set 1 deviate from the mean by 2.83. In other words, the values of the set including 4, 5, 5, and 6 are more concentrated around the mean than the values of the set including 1, 5, 5, and 9.
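The hand calculations above can be verified with statistics.pstdev, which treats the data as an entire population and divides by \(n\):

```python
import statistics

set_1 = [1, 5, 5, 9]
set_2 = [4, 5, 5, 6]

# pstdev uses the population denominator n, matching the manual calculation
sd_1 = statistics.pstdev(set_1)  # sqrt(8)
sd_2 = statistics.pstdev(set_2)  # sqrt(0.5)
print(round(sd_1, 2), round(sd_2, 2))
```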
Suppose you have the following data on the annual rates of return of a fund.
Year | Rate of return (%) |
---|---|
2014 | 4 |
2015 | 15 |
2016 | 31 |
2017 | 22 |
2018 | -19 |
2019 | -9 |
2020 | 12 |
2021 | 4 |
2022 | -1 |
Calculate and interpret the range, MAD, sample variance, and sample standard deviation.
(...)
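One way to work through the exercise is a short Python script; it follows the definitions above, with the sample denominator \(n-1\) for variance and standard deviation:

```python
import statistics

returns = [4, 15, 31, 22, -19, -9, 12, 4, -1]  # 2014-2022, in %

data_range = max(returns) - min(returns)  # 31 - (-19) = 50
m = statistics.mean(returns)
# MAD: average absolute deviation around the mean
mad = sum(abs(r - m) for r in returns) / len(returns)
var = statistics.variance(returns)  # sample variance (n - 1 denominator)
sd = statistics.stdev(returns)      # sample standard deviation

print(data_range)     # 50
print(round(mad, 2))  # 11.95
print(round(var, 2))  # 237.78
print(round(sd, 2))   # 15.42
```

So the fund's returns span 50 percentage points, deviate from the mean by about 11.95 pp on average, and have a sample standard deviation of roughly 15.42 pp.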
- Measures of dispersion tell us about how observations are dispersed around the mean.
- Range is defined as the difference between the largest and the smallest value in a dataset.
- Mean absolute deviation (MAD) is the arithmetic mean of the absolute values of the deviations of the individual elements of the dataset around the mean.
- Variance is the mean of the squared deviations of individual observations around the mean.
- Standard deviation is the square root of variance.