Which measure of central tendency would be a better choice to use if a data set has some extreme values?

The preferred measure of central tendency often depends on the shape of the distribution. Of the three measures of central tendency, the mean is the most heavily influenced by outliers and skewness.

In a symmetrical distribution with a single peak, the mean, median, and mode are all equal. In these cases, the mean is often the preferred measure of central tendency.

[Figure: a symmetrical distribution, where mean = median = mode]

For distributions that have outliers or are skewed, the median is often the preferred measure of central tendency because the median is more resistant to outliers than the mean. Below you will see how the direction of skewness impacts the order of the mean, median, and mode. Note that the mean is pulled in the direction of the skewness (i.e., the direction of the tail).

[Figure: a distribution skewed to the left, with the mean, median, and mode marked]

[Figure: a distribution skewed to the right, with the mean, median, and mode marked]
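To make the resistance of the median concrete, here is a minimal sketch using Python's standard `statistics` module. The scores are hypothetical values invented for illustration; they do not come from any data set in this text.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration only.
scores = [70, 72, 75, 78, 80, 82, 85]

# The same scores with one extreme value added.
with_outlier = scores + [300]

print("without outlier:", mean(scores), median(scores))
print("with outlier:   ", mean(with_outlier), median(with_outlier))

# How far each measure moved when the extreme value was added.
mean_shift = mean(with_outlier) - mean(scores)
median_shift = median(with_outlier) - median(scores)
print("mean moved by", mean_shift, "and median moved by", median_shift)
```

In this sketch the single extreme value drags the mean up by more than 25 points, while the median moves by only 1 point.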

What happens to the mean and median if we add a constant to each observation in a data set, or multiply each observation by a constant?

Consider, for example, an instructor who curves an exam by adding five points to each student’s score. What effect does this have on the mean and the median? Adding a constant to each value shifts both the mean and the median by that same constant.

Returning to the earlier example with 10 aptitude scores, if 5 were added to each score, the mean of the new data set would be 87.1 (the original mean of 82.1 plus 5) and the new median would be 86 (the original median of 81 plus 5).

Similarly, if each observed data value were multiplied by a constant, the mean and median would both be multiplied by that constant. Returning to the 10 aptitude scores, if all of the original scores were doubled, then the new mean and new median would be double the original mean and median. As we will learn shortly, the effect is not the same on the variance!
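The two rules above can be checked with a short sketch. The scores below are hypothetical (they are not the 10 aptitude scores referred to in the text), and `statistics.pvariance` (the population variance) is included only as a preview of why the variance behaves differently.

```python
from statistics import mean, median, pvariance

# Hypothetical exam scores, not the aptitude scores from the text.
scores = [60, 70, 75, 80, 95]

shifted = [x + 5 for x in scores]  # add a constant to every score
scaled = [x * 2 for x in scores]   # multiply every score by a constant

for label, data in [("original", scores), ("plus 5", shifted), ("times 2", scaled)]:
    print(label, "mean:", mean(data), "median:", median(data),
          "variance:", pvariance(data))
```

Adding 5 raises the mean and the median by 5 and leaves the variance unchanged; doubling the scores doubles the mean and the median but multiplies the variance by 4 (the square of the constant).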

Looking Ahead!

Why would you want to know this? One reason, especially for those moving on to more applied statistics (e.g., regression, ANOVA), is data transformation. Many applied statistical methods assume that the data are normal, or very nearly bell-shaped. When the data are not normal, statisticians may transform them using one of several techniques, such as a logarithmic transformation. We just need to remember that the original data were transformed!
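As a rough illustration of a logarithmic transformation, the sketch below uses hypothetical right-skewed values chosen so that the numbers come out tidy. The base-2 logarithm is an arbitrary choice for this example; in practice the base and the transformation depend on the data.

```python
import math
from statistics import mean, median

# Hypothetical right-skewed data, invented for illustration.
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# Base-2 log chosen only because it gives whole numbers for these values.
logged = [math.log2(x) for x in raw]

print("raw:    mean", mean(raw), "median", median(raw))
print("logged: mean", mean(logged), "median", median(logged))
```

Before the transformation the mean sits far above the median, as expected for right-skewed data; after it, the mean and median coincide for this particular data set.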

Shape

The shape of the data helps us to determine the most appropriate measure of central tendency. The three most important descriptions of shape are Symmetric, Left-skewed, and Right-skewed. Skewness is a measure of the degree of asymmetry of the distribution.

Symmetric

  • mean, median, and mode are all the same here
  • no skewness is apparent
  • the distribution is described as symmetric
[Figure: A symmetrical distribution, where mean = median = mode.]

Left-skewed or Skewed Left

  • mean < median
  • long tail on the left
[Figure: A left-skewed distribution, with the mean, median, and mode marked.]

Right-skewed or Skewed Right

  • mean > median
  • long tail on the right
[Figure: A right-skewed distribution, with the mean, median, and mode marked.]

Note! When the data are very skewed, it is better to use the median as the measure of central tendency, since the median is not much affected by extreme values.
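The mean-versus-median comparisons listed above can be turned into a quick check. This is only a rough heuristic sketched for illustration: the function name `skew_direction` and the sample data are hypothetical, and a mean equal to the median does not by itself prove that a distribution is symmetric.

```python
from statistics import mean, median

def skew_direction(data):
    """Rough heuristic: compare the mean with the median."""
    m, md = mean(data), median(data)
    if m < md:
        return "left-skewed"   # mean pulled toward a long left tail
    if m > md:
        return "right-skewed"  # mean pulled toward a long right tail
    return "no skew indicated"

# Hypothetical data sets, invented for illustration.
left_tail = [1, 7, 8, 9, 9, 10, 10]
right_tail = [1, 1, 2, 2, 3, 4, 10]
balanced = [1, 2, 3, 4, 5]

for name, data in [("left_tail", left_tail),
                   ("right_tail", right_tail),
                   ("balanced", balanced)]:
    print(name, "->", skew_direction(data))
```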

Learning Outcomes

  • Recognize, describe, and calculate the measures of the center of data: mean, median, and mode.

By now, everyone should know how to calculate mean, median and mode. They each give us a measure of Central Tendency (i.e. where the center of our data falls), but often give different answers. So how do we know when to use each? Here are some general rules:

  1. The mean is the most frequently used measure of central tendency and is generally considered the best one. However, there are some situations where either the median or the mode is preferred.
  2. The median is the preferred measure of central tendency when:
    1. There are a few extreme scores in the distribution of the data. (NOTE: Remember that a single outlier can have a great effect on the mean.)
    2. There are some missing or undetermined values in your data.
    3. There is an open-ended distribution. For example, if a data field records the number of children and the options are 0, 1, 2, 3, 4, 5, or “6 or more,” then the “6 or more” category is open-ended and makes calculating the mean impossible, since we do not know the exact values in that category.
    4. You have data measured on an ordinal scale.
  3. The mode is the preferred measure when data are measured on a nominal (and sometimes even an ordinal) scale.
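Two of the rules above can be sketched in code: the mode for nominal data, and the median for an open-ended distribution. All of the data below are hypothetical, and `median_category` is a helper written for this example (it is not a standard library function). It returns the category that contains the middle observation, using the lower middle one when the count is even.

```python
from statistics import mode

# Nominal data: the mean and median are meaningless, but the mode is not.
colors = ["red", "blue", "blue", "green", "blue", "red"]  # hypothetical
most_common_color = mode(colors)
print("mode:", most_common_color)

# Open-ended data: number of children, with a top category of "6 or more".
categories = ["0", "1", "2", "3", "4", "5", "6 or more"]
counts = [10, 15, 30, 20, 10, 5, 10]  # hypothetical frequencies

def median_category(categories, counts):
    """Return the ordered category holding the (lower) middle observation."""
    target = (sum(counts) + 1) // 2  # position of the middle observation
    running = 0
    for category, count in zip(categories, counts):
        running += count
        if running >= target:
            return category

middle = median_category(categories, counts)
print("median category:", middle)
```

The mean cannot be computed here because the exact values in the “6 or more” category are unknown, but the median can still be located as long as the middle observation falls in one of the closed categories.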

Which measure of central tendency would be a better choice to use if a data set has some extreme values, and why?

What is the most appropriate measure of central tendency when the data has outliers? The median is usually preferred in these situations because the value of the mean can be distorted by the outliers.

What is the best measure of central tendency for this data set?

When the data are roughly symmetric and free of outliers, the mean is widely preferred as the best measure of central tendency because it is the measure that includes all the values in the data set in its calculation, and a change in any of the scores will affect the value of the mean. When the data are skewed or contain outliers, the median is usually the better choice.

Which central tendency is affected by extreme values?

The mean is the measure of central tendency most likely to be affected by an extreme value. Of the three measures, it is the only one whose value depends on every observation, because it is the sum of the values divided by the number of observations.

What is the best central tendency to use?

The mean is the most frequently used measure of central tendency and is generally considered the best one. However, there are some situations where either the median or the mode is preferred. For example, the median is the preferred measure when there are a few extreme scores in the distribution of the data.