Measures of Dispersion in Statistics
Measures of dispersion play an important role in any data set. These measures go along with the measures of central tendency and show you the variability of your data.
Measures of central tendency summarize your data with a single typical value. They’re good for figuring out where the values of a variable cluster in a specific sample or group of people. The three basic ones are the mean, median, and mode.
Measures of dispersion go hand in hand with the measures of central tendency. They’re also essential to reading any data set because they show you how variable your data is. Their important role in statistics has been reinforced by Wild and Pfannkuch (1999).
According to them, our perception of the variability of the data is one of the basic components of statistical thinking. The way we perceive the variability gives us information on the dispersion, or spread, of the data around a mean or a median.
A mean, or an average, is very common in statistics, but it’s easy to misinterpret. This happens especially when the values of the variable are widely dispersed. That’s when the measures of dispersion come into play (2).
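To see why a mean on its own can mislead, here is a minimal sketch with two made-up groups. The numbers and the helper name `max_distance` are invented for illustration only.

```python
from statistics import mean

# Hypothetical data: two groups with the same mean but very different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

def max_distance(data):
    """Largest distance between any single value and the mean."""
    m = mean(data)
    return max(abs(x - m) for x in data)

print(mean(tight), mean(wide))                  # both means are 50
print(max_distance(tight), max_distance(wide))  # 2 versus 40
```

Both groups report the same average, yet a value in the second group can sit 40 units away from it.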
Understanding random variability involves three important components (2):
- The perception of how common it is in the world around you.
- Whether there are competing explanations for it.
- The ability to quantify it (which means understanding the concept of dispersion and knowing how it applies).
What Are Measures of Dispersion For?
Measures of dispersion are important in any statistical study when you’re trying to draw conclusions from data. This is because they directly affect the margin of error you’re working with. The greater the dispersion in a sample, the wider that margin of error will be.
They can also help you figure out if your data is far from its central tendency. What that does is show you whether or not your central tendency is actually a good way to represent the people you’ve sampled for your study. This is very helpful when it comes to comparing distributions and understanding the risks of making certain decisions (1).
To sum up, the greater the dispersion, the less representative your central tendency is. Here are the most common measures of dispersion:
- Range
- Average deviation
- Variance
- Standard deviation
- Coefficient of variation (or relative standard deviation)
How Each of Them Works
Range is generally best for making your first comparisons because it only looks at the two extremes of your data. This is also why it’s generally only worth using with small sample sizes (1). The basic definition of range is the difference between the largest and smallest values in your data.
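As a quick sketch, taking the range as the largest value minus the smallest, the calculation looks like this. The sample values and the function name `value_range` are made up for illustration.

```python
def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)

scores = [4, 8, 15, 16, 23, 42]  # hypothetical sample
print(value_range(scores))       # 38

# A single extreme value changes the range a lot, which is why
# it only works as a rough first look at the data.
print(value_range(scores + [400]))  # 396
```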
Then, there’s average deviation. It’s helpful because it shows you how far the data would be from the mean if every value were exactly the same distance from it (1). The deviation of a value is the absolute value of the difference between that value and the mean. So, the average deviation is basically just the average of all the deviations (3).
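Here is a minimal sketch of that calculation, assuming each deviation is measured as an absolute distance from the mean. The data set and the name `average_deviation` are invented for the example.

```python
from statistics import mean

def average_deviation(data):
    """Average of the absolute distances between each value and the mean."""
    m = mean(data)
    deviations = [abs(x - m) for x in data]
    return mean(deviations)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean is 5
print(average_deviation(data))   # 1.5
```

The deviations here are 3, 1, 1, 1, 0, 0, 2, and 4, which add up to 12. Divided by the 8 values, that gives 1.5.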
Variance is an algebraic function of all the values, and it’s well suited to inferential statistics (1). The variance is basically the average of the squared deviations from the mean.
Standard deviation is the most common measure of dispersion for samples taken from the same population (1). It’s the square root of the variance (3), which puts it back in the same units as the original data.
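Both measures are available in Python's standard `statistics` module, so a short sketch is enough to show how they relate. The data is made up. Note that the module separates the population versions, which divide by n, from the sample versions, which divide by n - 1. Which one you want depends on whether your data is the whole group or a sample from it.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean is 5

# Population versions: squared deviations are averaged over n.
pop_var = pvariance(data)  # equals 4 for this data
pop_sd = pstdev(data)      # equals 2, the square root of 4

# Sample versions: squared deviations are divided by n - 1 instead.
sample_var = variance(data)  # 32 / 7, a little larger than 4
sample_sd = stdev(data)

# In both cases the standard deviation is the square root of the variance.
print(isclose(pop_sd, sqrt(pop_var)))
print(isclose(sample_sd, sqrt(sample_var)))
```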
Coefficient of Variation
This measure is mainly used to compare the variation between two sets of data that are measured on different scales or in different units. For example, you might collect the height and weight of students at a school and want to know which of the two varies more. It can help you figure out which distribution has its data grouped most tightly around the mean, and so which mean is the more representative measurement.
The coefficient of variation is the easiest to compare of all the measures of dispersion we’ve talked about because it gives you a dimensionless number. In other words, it’s independent of the units of the variables in your groups. It’s the standard deviation divided by the mean, and you’ll generally see it shown as a percentage (3).
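A rough sketch of the height and weight comparison, taking the coefficient of variation as the standard deviation divided by the mean, might look like this. The student data is invented, and using the sample standard deviation (`stdev`) is an assumption made for the example.

```python
from math import isclose
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(data) / mean(data) * 100

# Hypothetical measurements for five students.
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [45, 55, 60, 70, 85]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(round(cv_height, 1), round(cv_weight, 1))  # roughly 6.8 and 24.2

# Changing the unit (centimeters to meters) leaves the result unchanged.
heights_m = [h / 100 for h in heights_cm]
print(isclose(coefficient_of_variation(heights_m), cv_height))
```

In this made-up sample, weight varies far more than height relative to its own mean. You couldn't read that off the two standard deviations directly, because one is in centimeters and the other in kilograms.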
These measures of dispersion are ways to see how much variability there is in your sample. They’ll also tell you how representative your central tendency is. If the variability is low, that means your data is relatively close to that central value, which is then a good representation of the overall data set.
On the other hand, if you have a high level of variability, that means the data is spread out rather than concentrated. High variability means a central tendency that isn’t very representative. If this is the case, you’ll need to pull from a bigger pool of data. A larger sample won’t make the underlying values any less spread out, but it does shrink the margin of error around your estimate, which the high variability had made large.
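One common way to see how spread, sample size, and margin of error fit together is the normal-approximation formula for the margin of error of a mean: z times the standard deviation divided by the square root of the sample size. The sketch below assumes that formula and a 95% confidence level (z = 1.96). The function name and the numbers are illustrative only.

```python
from math import sqrt

def margin_of_error(sd, n, z=1.96):
    """Approximate margin of error of a sample mean (normal approximation)."""
    return z * sd / sqrt(n)

# Same spread, growing sample: the margin of error shrinks.
for n in (25, 100, 400):
    print(n, round(margin_of_error(10, n), 2))  # 3.92, 1.96, 0.98

# Same sample size, twice the spread: the margin of error doubles.
print(round(margin_of_error(20, 100), 2))  # 3.92
```

Because of the square root, you need four times as many observations to cut the margin of error in half.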