# Level 1 CFA® Exam: Measures of Location

In this lesson, we're going to deal with measures of central tendency and measures of location.

Measures of location are a broader category that includes measures of central tendency. Both measures of central tendency and measures of location allow us to characterize an entire population or a sample drawn from it.

A population consists of all elements of a group. We can see it as the set of all the members of the group we're interested in. A descriptive characteristic of a population is called a parameter; the population mean is one example.

A sample is a subset of a population. A sample is usually selected randomly, and a random sample is another key term in statistics. A sample is described by a so-called sample statistic (statistic for short), for example, the sample mean.

Measures of central tendency help us determine where the center of the analyzed data lies. The most common measures of central tendency are the arithmetic mean, the mode, and the median. Measures of central tendency are a sub-type of measures of location.

Measures of location are a broader concept than measures of central tendency. Measures of location provide us with information on observations in different locations, not only in the center. Since measures of location include measures of central tendency, they tell us where the data are centered, but they also show how the observations are located (distributed) across the whole dataset.

Measures of central tendency and measures of location help to determine the similarities between the elements of a dataset, whereas the differences are examined using measures of dispersion, skewness, and peakedness, which you're going to learn about in the next lessons.

For now, let's focus on measures of central tendency.

The measures of central tendency that we're going to discuss for your level 1 CFA exam include:

- arithmetic mean,
- median,
- mode,
- weighted mean,
- geometric mean, and
- harmonic mean.

The arithmetic mean can be most simply defined as the sum of all observations divided by the number of observations.

If we're dealing with a sample, the arithmetic mean can be computed using the following formula:

\(\bar{X}=\sum_{i=1}^n\frac{X_i}{n}\)

- \(\bar{X}\) - sample mean
- n - number of observations in a sample
- \(X_i\) - value of the i-th observation

Note: When you multiply the arithmetic mean by the number of observations, the result will be the sum of the observations. Also, remember that the sum of deviations of individual elements of a set from the mean equals zero.
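A few lines of Python make the formula and both properties from the note concrete. This is a minimal sketch on a hypothetical six-element sample; `arithmetic_mean` is an illustrative helper name, not a standard function.

```python
def arithmetic_mean(xs):
    """Sum of all observations divided by the number of observations."""
    return sum(xs) / len(xs)


sample = [4, 8, 15, 16, 23, 42]  # hypothetical sample, n = 6
mean = arithmetic_mean(sample)

# Mean multiplied by n gives back the sum of the observations.
total_from_mean = mean * len(sample)

# Deviations from the mean sum to zero (exactly here; in general up to rounding).
deviation_sum = sum(x - mean for x in sample)

print(mean)             # 18.0
print(total_from_mean)  # 108.0
print(deviation_sum)    # 0.0
```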

The arithmetic mean is a popular and frequently used measure, as it is easy to calculate and can be interpreted intuitively. We should remember, however, that it doesn't always properly reflect the characteristics of a dataset. One of the reasons is that the arithmetic mean takes into account all elements of a dataset including outliers.

Outliers are extreme observations, that is, observations that differ markedly from the majority of observations for a variable. So, outliers take either extremely high or extremely low values.

When the highest or the lowest value lies far from the central value, it pulls the arithmetic mean toward itself, so the mean can also become very high or very low, which may distort the characteristics of the examined data.

On the other hand, the fact that the mean includes all elements of a dataset is an advantage when compared to such measures of central tendency as the mode or the median.

When we detect outliers, the first thing we should do is check our data for possible errors. An outlier may have been drawn from a different population, or it may simply have been recorded incorrectly. If there are no errors, we generally have two options: either we leave the data as they are, that is, we don't remove the outliers, or we remove the outliers.

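If we decide to limit the influence of outliers, two alternatives to the ordinary arithmetic mean come in handy:

- the trimmed mean, which excludes a stated percentage of the lowest and highest values before averaging, and
- the winsorized mean, which replaces those extreme values with the nearest values that are kept.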

The following example shows how the trimmed mean and winsorized mean are calculated.

Our sample dataset includes the following 60 numbers:

-111, -33, -12, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 327, 576, 5012

You notice the outliers in the dataset and decide to remove them. What will the dataset look like for the 10% trimmed mean and for the 90% winsorized mean?
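The two adjustments can be sketched in Python. This sketch assumes the usual convention that a 10% trimmed mean discards the lowest 5% and the highest 5% of the sorted observations, and that a 90% winsorized mean replaces those same 5% tails with the nearest values that are kept. With 60 observations, that means 3 values at each end. The helper names `trimmed_mean` and `winsorized_mean` are hypothetical.

```python
def trimmed_mean(data, pct):
    """Mean after discarding pct / 2 of the sorted observations from each tail.

    pct is the total share removed, e.g. 0.10 drops the lowest 5% and the highest 5%.
    """
    xs = sorted(data)
    k = round(len(xs) * pct / 2)          # observations dropped from each tail
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def winsorized_mean(data, pct):
    """Mean after replacing the extreme (1 - pct) / 2 of each tail.

    pct is the share left untouched, e.g. 0.90 sets the lowest 5% to the smallest
    retained value and the highest 5% to the largest retained value.
    """
    xs = sorted(data)
    k = round(len(xs) * (1 - pct) / 2)    # observations replaced in each tail
    low, high = xs[k], xs[len(xs) - 1 - k]
    clipped = [min(max(x, low), high) for x in xs]
    return sum(clipped) / len(clipped)


# The 60-number sample from the example.
data = [-111, -33, -12] + list(range(101, 155)) + [327, 576, 5012]

print(len(data))                        # 60
print(round(sum(data) / len(data), 2))  # 210.73  (plain arithmetic mean)
print(trimmed_mean(data, 0.10))         # 127.5   (keeps 101..154 only)
print(winsorized_mean(data, 0.90))      # 127.5   (3 lowest set to 101, 3 highest to 154)
```

Because the retained values 101 to 154 are spread symmetrically, both adjusted means coincide here, well below the plain mean that the three large outliers pull upward.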

(...)

Another measure of central tendency is the median.

The median is the value of the middle element of a set. Therefore, the median divides a dataset into two equal parts: half of the observations are smaller than the median and the other half are greater.

- When there is an even number of elements in a set, the median is the arithmetic mean of the two neighboring middle numbers.
- When there is an odd number, the median is the element in the middle.

An advantage of the median is that it does not depend on the size of extreme values, so it is insensitive to outliers.

What is more, we can determine the median even if we don't know all observations precisely. To find it, we need to sort the items in the set from the smallest to the greatest value. Then, we need to determine the location of the middle value. When there is an odd number of observations, the median equals the value of the item in the middle. So, for a number of observations equal to \(n\), where \(n\) is an odd number, the position of the middle value is determined using the following formula:

\(\text{position of middle value}=\frac{n+1}{2}\)

Again, when there is an odd number of observations, the location of the middle value is clearly defined and all we need to do is read the value from the set.

However, when the number of observations is even, we need to take the arithmetic mean of the two values closest to the middle. In that case, the median can be calculated using the following formula:

\(\text{median}=\frac{\text{value in position }\frac{n}{2}+\text{value in position }\left(\frac{n}{2}+1\right)}{2}\)
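The odd and even cases translate directly into code. In the sketch below, `median` is a hypothetical helper (Python's standard library also offers `statistics.median`); the comments use 1-based positions to match the formulas, and the last call shows that one extreme value does not move the median.

```python
def median(data):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    xs = sorted(data)
    n = len(xs)
    if n % 2 == 1:
        # odd n: the middle value sits at position (n + 1) / 2 (1-based)
        return xs[(n + 1) // 2 - 1]
    # even n: average the values at positions n / 2 and n / 2 + 1 (1-based)
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


print(median([7, 1, 5]))           # 5    (sorted: 1, 5, 7)
print(median([7, 1, 5, 3]))        # 4.0  (sorted: 1, 3, 5, 7 -> (3 + 5) / 2)
print(median([7, 1, 5, 3, 1000]))  # 5    (the outlier does not affect it)
```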

(...)

Now, let's discuss a few other measures of central tendency, that is:

- the weighted mean,
- the geometric mean, and
- the harmonic mean.

The weighted mean is often used in portfolio analysis or the portfolio approach.

For example, we can calculate the rate of return achieved by a mutual fund that holds securities of 10 different companies. In addition to the rates of return on the individual stocks, we need to know their weights in the portfolio to compute the weighted mean return.

The weighted mean equals the sum of the products of the observations and their weights:

\(\bar{X}_w=\sum_{i=1}^n{w_i\times X_i}\)

- \(\bar{X}_w\) - weighted mean
- \(w_i\) - weight of the i-th observation
- \(X_i\) - value of the i-th observation
- n - number of observations in a sample
- \(\sum_{i=1}^n{w_i}=1\) - the weights sum to one


Note that the sum of all weights must always equal 1.

In the case of the weighted mean return, the observations are the returns on the individual stocks.
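A minimal sketch of the weighted mean return in Python, assuming portfolio weights and returns are both expressed as decimals and the weights sum to 1. The three-stock portfolio and the helper name `weighted_mean` are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Sum of w_i * X_i, with weights that must sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical three-stock portfolio: returns and portfolio weights.
returns = [0.10, 0.05, -0.02]
weights = [0.5, 0.3, 0.2]

portfolio_return = weighted_mean(returns, weights)  # 0.05 + 0.015 - 0.004
print(round(portfolio_return, 4))                   # 0.061, i.e. 6.1%
```

Checking that the weights sum to 1 up front catches the most common input mistake before any averaging happens.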

Another measure of central tendency is the geometric mean. The geometric mean is often used to compute the average rate of return over a series of periods or to calculate the growth rate. The geometric mean of a set of observations is given by the following formula:

\(G=\sqrt[n]{\prod_{i=1}^nX_i}\)

- G - geometric mean
- n - number of observations
- \({X_i}\) - value of the i-th observation

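The geometric mean (G) is always less than or equal to the arithmetic mean (A) and greater than or equal to the harmonic mean (H), with equality only when all observations are equal:

\(H\le{G}\le{A}\)

For the geometric mean to be defined, every observation must be greater than or equal to zero; the comparison with the harmonic mean additionally requires all observations to be strictly positive.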
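The definition can be sketched in Python as follows. `geometric_mean` is a hypothetical helper (for positive data, the standard library's `statistics.geometric_mean` computes the same quantity), and the tiny datasets are made up to show that G does not exceed A and that the two coincide when all observations are equal.

```python
import math


def geometric_mean(xs):
    """n-th root of the product of n non-negative observations."""
    if any(x < 0 for x in xs):
        raise ValueError("the geometric mean needs non-negative observations")
    return math.prod(xs) ** (1 / len(xs))


data = [2, 8]              # hypothetical observations
g = geometric_mean(data)   # sqrt(2 * 8) = 4.0
a = sum(data) / len(data)  # (2 + 8) / 2 = 5.0

print(g, a, g <= a)            # 4.0 5.0 True
print(geometric_mean([5, 5]))  # 5.0, equal observations give G = A
```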

Very often we use the geometric mean while analyzing rates of return. When the observations are periodic rates of return, the geometric mean return is computed as follows:

\(R_G=\sqrt[T]{\prod_{t=1}^T(1+R_t)}-1\)

- \(R_G\) - geometric mean return
- T - number of periods
- \(R_t\) - return in period t
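The return version can be sketched the same way, with returns expressed as decimals (each greater than -1). The helper `geometric_mean_return` and the two-year history are hypothetical; the example is chosen to show why the arithmetic mean can overstate multi-period performance.

```python
import math


def geometric_mean_return(returns):
    """R_G = (product of (1 + R_t)) ** (1 / T) - 1, returns given as decimals."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# Hypothetical two-year history: +50% followed by -50%.
returns = [0.50, -0.50]

print(sum(returns) / len(returns))              # 0.0     arithmetic mean
print(round(geometric_mean_return(returns), 4))  # -0.134  geometric mean
```

A +50% year followed by a -50% year leaves the investor with 75% of the starting capital, and \((1+R_G)^2 = 0.75\), whereas the arithmetic mean of 0% would suggest no loss at all.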


The last type of mean we're going to discuss here is the harmonic mean. The harmonic mean has fewer applications than the arithmetic mean and the geometric mean. It can be used, however, to determine the average purchase price paid for stocks if we bought them in several periods for the same amount or to calculate the average time necessary for the production of a given product. The harmonic mean uses the reciprocals of the observations. It can be represented with this expression:

\(\bar{X}_H=\frac{n}{\sum_{i=1}^n\frac{1}{X_i}}\)

- \(\bar{X}_H\) - harmonic mean
- \(X_i\) - value of the i-th observation, \(X_i>0\)
- n - number of observations


The harmonic mean equals the number of observations divided by the sum of the inverse values of observations.
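The equal-budget stock purchase can be sketched in Python. Assuming a hypothetical investor spends the same amount in each period, the average cost per share (total spent divided by shares bought) equals the harmonic mean of the purchase prices. The prices, the budget, and the `harmonic_mean` helper are illustrative; the last lines also check \(H\le{G}\le{A}\) for these prices.

```python
import math


def harmonic_mean(xs):
    """n divided by the sum of the reciprocals; observations must be positive."""
    if any(x <= 0 for x in xs):
        raise ValueError("the harmonic mean needs positive observations")
    return len(xs) / sum(1 / x for x in xs)


# Hypothetical: 1,000 invested each month at share prices of 10 and then 20.
prices = [10, 20]
budget = 1_000
shares = sum(budget / p for p in prices)  # 100 + 50 = 150 shares bought
avg_cost = budget * len(prices) / shares  # 2,000 spent / 150 shares

print(round(avg_cost, 2))               # 13.33
print(round(harmonic_mean(prices), 2))  # 13.33

# H <= G <= A for the same prices
h = harmonic_mean(prices)
g = math.prod(prices) ** (1 / len(prices))  # geometric mean, about 14.14
a = sum(prices) / len(prices)               # arithmetic mean, 15.0
print(h <= g <= a)                          # True
```

Because more shares are bought when the price is low, the average cost per share lands below the simple average price of 15.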

(...)

Measures of location are a broader concept than measures of central tendency. The latter, as their name suggests, refer to the middle of the data. Measures of location provide us with information on observations in different locations, not only in the center. Therefore, measures of central tendency are a sub-type of measures of location.

For your level 1 CFA exam, we're going to take a look at measures of location such as:

- percentiles,
- quartiles,
- quintiles, and
- deciles.

### Percentiles

Let's start with percentiles. A given percentile is a value below which a given percentage of observations is located. Percentiles divide a set into a hundred parts. For example, if we take the 30th percentile, 30% of observations have a lower value and 70% a higher value than this percentile.

The location of a percentile can be determined as follows:

(...)

### Quintiles

Quintiles divide a dataset into five equal parts. For example, the first quintile is a value below which 20% of the dataset is located. Note that the location of the third quintile is the same as the location of the 60th percentile, and the location of the fourth quintile is the same as the location of the 80th percentile.

### Deciles

The last measure we're going to discuss is the decile. Deciles divide a dataset into ten equal parts.

Take the following example. The fourth decile is equal to the value of the 40th percentile. The fifth decile is the median of the population. 70% of the dataset lies below the seventh decile, and the remaining 30% has a higher value.
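Since quartiles, quintiles, and deciles are all special percentiles, one percentile function covers them. This is a minimal sketch that assumes the common location rule \(L_y=(n+1)\frac{y}{100}\) with linear interpolation between neighboring sorted values; other conventions exist and can give slightly different results. The helper names `percentile`, `quartile`, `quintile`, and `decile` and the dataset are hypothetical.

```python
import math


def percentile(data, y):
    """y-th percentile, assuming the location rule L = (n + 1) * y / 100.

    Assumption: linear interpolation between neighbouring sorted values,
    with L clamped to the first and last observation.
    """
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * y / 100    # 1-based location in the sorted data
    loc = min(max(loc, 1), n)  # keep the location inside the dataset
    lower = math.floor(loc)    # 1-based position at or just below loc
    frac = loc - lower         # share of the gap to the next value
    if frac == 0:
        return xs[lower - 1]
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


def quartile(data, k):
    return percentile(data, 25 * k)  # k-th quartile = (25k)-th percentile


def quintile(data, k):
    return percentile(data, 20 * k)  # k-th quintile = (20k)-th percentile


def decile(data, k):
    return percentile(data, 10 * k)  # k-th decile = (10k)-th percentile


data = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # hypothetical, n = 9

print(quartile(data, 1))  # 25.0  (location 2.5, halfway between 20 and 30)
print(quintile(data, 3))  # 60    (same as the 60th percentile, location 6)
print(decile(data, 4))    # 40    (same as the 40th percentile, location 4)
print(decile(data, 5))    # 50    (the median)
```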

Suppose you have the following data on the annual rates of return of a fund.

Year | Rate of return (%) |
---|---|
2014 | 4 |
2015 | 15 |
2016 | 31 |
2017 | 22 |
2018 | -19 |
2019 | -9 |
2020 | 12 |
2021 | 4 |
2022 | -1 |

Calculate and interpret the average annual rate of return as the arithmetic mean and as the geometric mean. Then, determine the median, the mode, the first quartile, and the eighth decile.
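If you want to check your hand calculations, Python's standard `statistics` module covers most of the steps. This is a sketch rather than a worked solution: it converts the returns to decimals for the geometric mean and assumes quantiles use the \((n+1)\frac{y}{100}\) location rule with linear interpolation, which is what `statistics.quantiles(..., method="exclusive")` implements. The variable names are illustrative.

```python
import math
import statistics

# Annual fund returns from the table, in percent (2014-2022).
returns_pct = [4, 15, 31, 22, -19, -9, 12, 4, -1]

# Arithmetic mean: average single-year return, in percent.
arith = statistics.mean(returns_pct)

# Geometric mean: compound annual return, in percent.
growth = math.prod(1 + r / 100 for r in returns_pct)
geo = (growth ** (1 / len(returns_pct)) - 1) * 100

median_ret = statistics.median(returns_pct)
mode_ret = statistics.mode(returns_pct)

# "exclusive" places the y-th percentile at location (n + 1) * y / 100.
q1 = statistics.quantiles(returns_pct, n=4, method="exclusive")[0]
d8 = statistics.quantiles(returns_pct, n=10, method="exclusive")[7]

print(round(arith, 2), round(geo, 2))  # 6.56 5.54
print(median_ret, mode_ret, q1, d8)    # 4 4 -5.0 22.0
```

The geometric mean comes out below the arithmetic mean, as \(G\le{A}\) requires, because the volatile yearly returns reduce compound growth.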

(...)

- Measures of central tendency and measures of location help to determine the similarities between the elements of a dataset.
- The arithmetic mean is the sum of all observations divided by the number of observations.
- The sum of deviations of individual elements of a set from the mean equals zero.
- Outliers are extreme observations, that is, observations that differ markedly from the majority of observations for a variable.
- If we decide to limit the influence of outliers, we can use either the trimmed mean or the winsorized mean.
- The median is the value of the middle element of a set.
- The mode is the most frequently occurring element of a set.
- If a single value occurs most frequently, we're dealing with a unimodal distribution.
- The weighted mean is often used in portfolio analysis or the portfolio approach.
- The harmonic mean can be used to determine the average purchase price paid for stocks if we bought them in several periods for the same amount.
- A given percentile is a value below which a given percentage of observations is located.
- To find the location of a percentile, we need to sort the dataset in ascending order.