What do you call the measure of center that can be calculated by dividing the sum of all data by the total number of data?

Q: What is the measure of the center of the data?

The two most widely used measures of the center of the data are the mean (average) and the median.

Let me take you back to the days when you were an under-21 college student, figuring out who you were and what you wanted to be when you finally grew up. For some of you this may be a lifetime ago, and for others, it may have seemed as if those days happened yesterday (literally, yesterday).

Most college students must take one, if not two, courses in mathematics during their college careers, regardless of their degree program. Most of the time, elementary statistics is the course selected, probably because it’s the easiest math elective for most people. In short, lots of people have an elementary knowledge of statistics. So, why are average-oriented metrics put on such a pedestal?

In elementary statistics, you most likely learned about the four measures of center and about outliers. If you don’t remember, that’s OK, it’s probably been a long time since, or you probably weren’t a math person and wanted to forget everything you had learned as quickly as possible.

The four measures of center are mean, median, mode, and midrange.

Mean – The mean is what you know as the average. It is calculated by adding up all of the values in a set and dividing the sum by the total number of values in that set. The mean is very sensitive to outliers (more on outliers in a little bit).

Example: The mean of 1, 3, 5, 5, 5, 7, and 29 is about 7.8571.

Median – The median is not the same thing as the mean, even though in popular parlance, the two terms are often used interchangeably. The median is the number that is in the middle of a data set that is organized from lowest to highest or from highest to lowest. The median doesn’t represent a true average, but is not as greatly affected by the presence of outliers as is the mean.

Example: The median of 1, 3, 5, 5, 5, 7, and 29 is 5 (the number in the middle).

Mode – The mode is the number that repeats most often in a data set. It’s seldom used in statistics as a reliable measure of center.

Example: The mode of 1, 3, 5, 5, 5, 7, and 29 is 5 (it repeats 3 times – the other values only appear one time each).

Midrange – The midrange is calculated by adding the highest and lowest values of a data set together, and dividing the sum by 2. The midrange is hardly ever used as a measure of center.

Example: The midrange of 1, 3, 5, 5, 5, 7, and 29 is 15 (29 + 1 = 30; 30 / 2 = 15).
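
If you want to check the four examples above yourself, here is a minimal sketch using Python's standard `statistics` module. The `midrange` helper is a name made up for this illustration; the standard library does not provide one.

```python
from statistics import mean, median, mode

# The example data set used throughout this section.
data = [1, 3, 5, 5, 5, 7, 29]


def midrange(values):
    """Hypothetical helper: halfway point between the lowest and highest value."""
    return (min(values) + max(values)) / 2


print(round(mean(data), 4))  # 7.8571
print(median(data))          # 5
print(mode(data))            # 5
print(midrange(data))        # 15.0
```

Four measures, three different answers, all from the same seven numbers.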

With four different measures of center, I’ve been able to come up with four different correct calculations for an average. Each measure of center has its benefits, and each is sensitive to outliers to a different degree. Depending on the set of data, the measure of center may lose strength and implied value because of how it is calculated and how it is used.

Outliers – Outliers are numbers in a data set that are either way bigger or way smaller than the other numbers in a data set.

Example: In the 1, 3, 5, 5, 5, 7, and 29 data set, the number 29 is an outlier because of how much greater it is than all of the other numbers in the set. 29 is the only number that doesn’t “fit” in this set.
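
"Way bigger or way smaller" is a judgment call. One common convention, which is not the only one and is brought in here purely as an assumption, flags any value more than 1.5 times the interquartile range beyond the first or third quartile. A rough sketch, with `find_outliers` as a hypothetical helper name:

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    The 1.5 * IQR fence is a widely used rule of thumb, assumed here;
    it is not a definition taken from this article.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [1, 3, 5, 5, 5, 7, 29]
print(find_outliers(data))  # [29]
```

With this rule, 29 is the only value in the set that gets flagged, which matches the intuition that it doesn't "fit".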

What is the meaning of all of this?
The meaning of all of this is to take your averages (average order value, average conversion rate, average time on site, and others) with a tiny grain of salt. Use average-oriented metrics cautiously and with skeptical optimism, as the presence of a mere few outliers in your data can distort the figures and not provide a true representation of what is really happening.

Take this extreme example of the revenue of five separate orders placed on a web site:

$4.94
$4.39
$7.01
$6.33
$553.93

Your “realistic” average order value here should be $5.67 (the four “normal” values added up and divided by four). But if we’re looking at a report from a web analytics tool, it would report the average order value as $115.32. Clearly, there is a massive difference between $5.67 and $115.32.
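
The arithmetic is easy to reproduce. The sketch below uses only Python's standard library; the variable names are illustrative, and the median is added as a comparison that a typical analytics report would not show by default.

```python
from statistics import mean, median

# Revenue of the five orders listed above.
orders = [4.94, 4.39, 7.01, 6.33, 553.93]

# The four "normal" orders, leaving out the $553.93 outlier.
typical_orders = orders[:4]

print(round(mean(typical_orders), 2))  # 5.67
print(round(mean(orders), 2))          # 115.32
print(median(orders))                  # 6.33
```

The median of all five orders stays close to the "realistic" figure, while the mean is dragged up by the single large order.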

To obtain real insights that will help your web site and your organization, you’ll have to dive much deeper than the averages to extract meaningful information and data. Know your measures of center and your outliers, so that you can decide if your averages are realistic representations of what’s happening on your web site.

Until next time, I will leave you with one of my favorite all-time quotes, which fits right into this topic. Think about it the next time you’re obsessing over averages:

“A statistician drowned while crossing a river that was on average six inches deep.”

What is Central Tendency?

Measures of central tendency are summary statistics that represent the center point or typical value of a dataset. Examples of these measures include the mean, median, and mode. These statistics indicate where most values in a distribution fall and are also referred to as the central location of a distribution. You can think of central tendency as the propensity for data points to cluster around a middle value.

In statistics, the mean, median, and mode are the three most common measures of central tendency. Each one calculates the central point using a different method. Choosing the best measure of central tendency depends on the type of data you have. In this post, I explore the mean, median, and mode as measures of central tendency, show you how to calculate them, and how to determine which one is best for your data.

Locating the Measures of Central Tendency

Most articles about the mean, median, and mode focus on how you calculate these measures of central tendency. I’ll certainly do that, but I’m going to start with a slightly different approach. My philosophy throughout my blog is to help you intuitively grasp statistics by focusing on concepts. Consequently, I’m going to start by illustrating the central point of several datasets graphically, so you understand the goal. Then, we’ll move on to choosing the best measure of central tendency for your data and the calculations.

The three distributions below represent different data conditions. In each distribution, look for the region where the most common values fall. Even though the shapes and type of data are different, you can find that central tendency. That’s the area in the distribution where the most common values are located. These examples cover the mean, median, and mode.

As the graphs highlight, you can see where most values tend to occur. That’s the concept. Measures of central tendency represent this idea with a value. Coming up, you’ll learn that as the distribution and kind of data changes, so does the best measure of central tendency. Consequently, you need to know the type of data you have, and graph it, before choosing between the mean, median, and mode!

Related posts: Guide to Data Types and How to Graph Them

Whether you’re using the mean, median, or mode, the central tendency is only one characteristic of a distribution. Another aspect is the variability around that central value. While measures of variability are the topic of a different article (link below), this property describes how far away the data points tend to fall from the center. The graph below shows how distributions with the same central tendency (mean = 100) can actually be quite different. The panel on the left displays a distribution that is tightly clustered around the mean, while the distribution on the right is more spread out. It is crucial to understand that the central tendency summarizes only one aspect of a distribution and that it provides an incomplete picture by itself.
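
To make the point concrete, here is a small sketch with two made-up datasets (they are not the data behind the graph) that share a mean of 100 but differ greatly in spread. The standard deviation is used as the measure of variability.

```python
from statistics import mean, pstdev

# Hypothetical datasets, chosen only so that both have a mean of 100.
tight = [98, 99, 100, 101, 102]
wide = [60, 80, 100, 120, 140]

for name, values in (("tight", tight), ("wide", wide)):
    # pstdev is the population standard deviation.
    print(name, mean(values), round(pstdev(values), 2))
```

Both lines report the same mean, yet the second dataset has a standard deviation many times larger than the first.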

Related post: Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation

Mean

The mean is the arithmetic average, and it is probably the measure of central tendency that you are most familiar with. Calculating the mean is very simple. You just add up all of the values and divide by the number of observations in your dataset.

The calculation of the mean incorporates all values in the data. If you change any value, the mean changes. However, the mean doesn’t always locate the center of the data accurately. Observe the histograms below where I display the mean in the distributions.

In a symmetric distribution, the mean locates the center accurately.

However, in a skewed distribution, the mean can miss the mark. In the histogram above, it is starting to fall outside the central area. This problem occurs because outliers have a substantial impact on the mean as a measure of central tendency. Extreme values in an extended tail pull the mean away from the center. As the distribution becomes more skewed, the mean is drawn further away from the center. Consequently, it’s best to use the mean as a measure of the central tendency when you have a symmetric distribution. More about this issue when we look at the mean vs median!

In statistics, we generally use the arithmetic mean, which is the type I discuss in this post. However, there are other types of means, such as the geometric mean. Read my post about the geometric mean to learn when it is a better measure.

When to use the mean: Symmetric distribution, Continuous data

Related posts: Using Histograms to Understand Your Data and What is the Mean?

Median

The median is the middle value. It is the value that splits the dataset in half, making it a natural measure of central tendency.

To find the median, order your data from smallest to largest, and then find the data point that has an equal number of values above it and below it. The method for locating the median varies slightly depending on whether your dataset has an even or odd number of values. I’ll show you how to find the median for both cases. In the examples below, I use whole numbers for simplicity, but you can have decimal places.

In the dataset with the odd number of observations, notice how the number 12 has six values above it and six below it. Therefore, 12 is the median of this dataset.

When there is an even number of values, you count in to the two innermost values and then take the average. The average of 27 and 29 is 28. Consequently, 28 is the median of this dataset.
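
The two cases can be sketched in a few lines of Python. The function name `median_of` and both datasets are hypothetical; the numbers were picked only so that the results match the medians described above (12 with six values on each side, and 28 from the innermost values 27 and 29).

```python
def median_of(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two innermost values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [3, 5, 7, 9, 10, 11, 12, 14, 16, 18, 21, 23, 25]  # 13 values
even_data = [21, 23, 25, 27, 29, 31, 33, 35]                 # 8 values

print(median_of(odd_data))   # 12
print(median_of(even_data))  # 28.0
```

In practice, `statistics.median` from the standard library does the same job.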

Outliers and skewed data have a smaller effect on the median than on the mean. To understand why, imagine we have the Median dataset below and find that the median is 46. However, we discover data entry errors and need to change four values, which are shaded in the Median Fixed dataset. We’ll make them all significantly higher so that we now have a skewed distribution with large outliers.

As you can see, the median doesn’t change at all. It is still 46. When comparing the mean vs median, the mean depends on all values in the dataset while the median does not. Consequently, when some of the values are more extreme, the effect on the median is smaller. Of course, with other types of changes, the median can change. When you have a skewed distribution, the median is a better measure of central tendency than the mean.
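
Here is a sketch of the same experiment with made-up numbers. These are not the values from the Median and Median Fixed datasets; they are hypothetical and chosen only so that the median is 46 both before and after the change.

```python
from statistics import mean, median

# Hypothetical "before" data with a median of 46.
original = [12, 25, 33, 41, 46, 52, 58, 63, 69]

# Hypothetical "after" data: the four largest values are made far larger.
fixed = [12, 25, 33, 41, 46, 520, 580, 630, 690]

print(median(original), median(fixed))                  # 46 46
print(round(mean(original), 2), round(mean(fixed), 2))  # 44.33 286.33
```

The middle position is still occupied by 46, so the median does not move, while the mean jumps because every value feeds into it.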

Related post: Skewed Distributions

Mean vs Median as Measures of Central Tendency

Now, let’s compare the mean vs median as measures of central tendency on symmetrical and skewed distributions to see how they perform. The histograms below allow us to compare these two statistics directly.

In a symmetric distribution, the mean and median both find the center accurately. They are approximately equal, and both are valid measures of central tendency.

In a skewed distribution, the outliers in the tail pull the mean away from the center towards the longer tail. For this example, the mean and median differ by over 9,000. The median better represents the central tendency for the skewed distribution.

These data are based on the U.S. household income for 2006. Income is the classic example of when to use the median instead of the mean because its distribution tends to be skewed. The median indicates that half of all incomes fall below 27581, and half are above it. For these data, the mean overestimates where most household incomes fall.

To learn more about incomes and their right-skewed distributions, read my post about Global Income Distributions.

Statisticians say that the median is a robust statistic while the mean is sensitive to outliers and skewed distributions.

When to use the median: Skewed distribution, Continuous data, Ordinal data

Related posts: Median Definition and Uses and What are Robust Statistics?

Mode

The mode is the value that occurs the most frequently in your data set, making it a different type of measure of central tendency than the mean or median.

To find the mode, sort the values in your dataset by numeric values or by categories. Then identify the value that occurs most often.

On a bar chart, the mode is the highest bar. If the data have multiple values that are tied for occurring the most frequently, you have a multimodal distribution. If no value repeats, the data do not have a mode. Learn more about bimodal distributions.
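
The three situations (one mode, several modes, no mode) can be sketched with a simple frequency count. `find_modes` is a hypothetical helper, and it follows this article's convention that data with no repeated value has no mode. Python's built-in `statistics.multimode` behaves differently in that case and returns every value.

```python
from collections import Counter


def find_modes(values):
    """Return all values tied for most frequent, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # No value repeats, so by this article's convention there is no mode.
        return []
    return [value for value, count in counts.items() if count == top]


print(find_modes([5, 4, 5, 3, 5, 2, 4, 5, 1, 5]))        # [5]
print(find_modes(["vanilla", "mint", "mint", "vanilla"]))  # ['vanilla', 'mint']
print(find_modes([1.2, 3.4, 5.6]))                         # []
```

Because the function only counts occurrences, it works for categories as well as numbers.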

In the dataset below, the value 5 occurs most frequently, which makes it the mode. These data might represent a 5-point Likert scale.

Typically, you use the mode with categorical, ordinal, and discrete data. In fact, the mode is the only measure of central tendency that you can use with categorical data—such as the most preferred flavor of ice cream. However, with categorical data, there isn’t a central value because you can’t order the groups. With ordinal and discrete data, the mode can be a value that is not in the center. Again, the mode represents the most common value.

In the graph of service quality, Very Satisfied is the mode of this distribution because it is the most common value in the data. Notice how it is at the extreme end of the distribution. I’m sure the service providers are pleased with these results!

Learn more about How to Find the Mode.

Related post: Bar Charts: Using, Examples, and Interpreting

Finding the mode as the central tendency for continuous data

In the continuous data below, no values repeat, indicating this dataset has no mode for a measure of central tendency. With continuous data, it is unlikely that two or more values will be exactly equal because there are an infinite number of values between any two values.

When you are working with the raw continuous data, don’t be surprised if there is no mode. However, you can find the mode for continuous data by locating the maximum value on a probability distribution plot. If you can identify a probability distribution that fits your data, find the peak value and use it as the mode.
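
As a sketch of that idea, assume the data follow a lognormal distribution. Fitting it amounts to taking the mean and standard deviation of the logged values, and the peak of the fitted curve then sits at exp(mu - sigma^2). The data below are simulated with arbitrary parameters and are not real income figures.

```python
import math
import random
from statistics import mean, pstdev

random.seed(0)

# Simulated right-skewed data; the parameters 10.0 and 0.8 are arbitrary.
values = [random.lognormvariate(10.0, 0.8) for _ in range(5000)]

# Fit a lognormal distribution by estimating mu and sigma on the log scale.
logs = [math.log(v) for v in values]
mu = mean(logs)
sigma = pstdev(logs)

fitted_mode = math.exp(mu - sigma ** 2)        # peak of the fitted density
fitted_median = math.exp(mu)                   # middle of the fitted distribution
fitted_mean = math.exp(mu + sigma ** 2 / 2)    # mean of the fitted distribution

print(round(fitted_mode), round(fitted_median), round(fitted_mean))
```

For any right-skewed lognormal fit, the three values come out in the order mode, median, mean, which is the same ordering seen in the income example.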

The probability distribution plot displays a lognormal distribution that has a mode of 16700. This distribution corresponds to the U.S. household income example in the median section.

When to use the mode: Categorical data, Ordinal data, Count data, Probability Distributions

What is the Best Measure of Central Tendency—the Mean, Median, or Mode?

When you have a symmetrical, single-peaked distribution for continuous data, the mean, median, and mode are equal. In this case, analysts tend to use the mean because it includes all of the data in the calculations. However, if you have a skewed distribution, the median is often the best measure of central tendency.

When you have ordinal data, the median or mode is usually the best choice. For categorical data, you must use the mode.
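
These guidelines can be condensed into a small lookup. The function below is only a sketch of the rules of thumb in this section; `suggest_measure` and its arguments are names invented for illustration, and it is no substitute for graphing your data first.

```python
def suggest_measure(data_type, skewed=False):
    """Suggest a measure of central tendency from the guidelines above.

    data_type: "categorical", "ordinal", or "continuous" (hypothetical labels).
    skewed: whether a continuous distribution is noticeably skewed.
    """
    if data_type == "categorical":
        return "mode"
    if data_type == "ordinal":
        return "median or mode"
    if data_type == "continuous":
        return "median" if skewed else "mean"
    raise ValueError(f"unknown data type: {data_type!r}")


print(suggest_measure("continuous"))               # mean
print(suggest_measure("continuous", skewed=True))  # median
print(suggest_measure("categorical"))              # mode
```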

In cases where you are deciding between the mean vs median as the better measure of central tendency, you are also determining which types of statistical hypothesis tests are appropriate for your data—if that is your ultimate goal. I have written an article that discusses when to use parametric (mean) and nonparametric (median) hypothesis tests along with the advantages and disadvantages of each type.

Analysts frequently use measures of central tendency to describe their datasets. Learn how to Analyze Descriptive Statistics in Excel.


What is the measure of center called?

In a distribution with an odd number of observations, the median value is the middle value. Advantage of the median: The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical.

Which measure of center is the sum of a data set divided by the number of values it contains?

The mean of a data set is the sum of the values divided by the number of values. The median of a data set is the middle value when the values are written in numerical order.