Standard Deviation with Mean Example
Dylan | Aug 17, 2020
Standard deviation is one way to measure the spread in a dataset. For example, if we went out on the street and measured the heights (in inches) of five random strangers, we might get the following sample: 72, 74, 67, 71, 64. The mean (average) of our sample is 69.6 and, after some calculations, we would discover that our standard deviation, represented by the lowercase Greek letter sigma (σ), equals about 4.04 (using the sample formula described later in this post). What exactly does σ tell us?
The Empirical Rule
The Empirical Rule states that for a normal distribution, nearly all of the data will be within three standard deviations of the mean. This is also commonly referred to as the 68-95-99.7 Rule due to the following:
- 68% of the data will fall within one standard deviation from the mean
- 95% of the data will fall within two standard deviations from the mean
- 99.7% of the data will fall within three standard deviations from the mean
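The three percentages above can be checked numerically. The snippet below is a minimal sketch, assuming a perfectly normal distribution; the helper name fraction_within is made up for this illustration. It relies on the identity P(|Z| < k) = erf(k / √2) for a standard normal variable Z.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

Rounded to one decimal place, this gives 68.3%, 95.4% and 99.7%, which is where the 68-95-99.7 shorthand comes from.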
On a plot of the normal curve, the centerline represents the mean. Assuming the heights of strangers are normally distributed, we would find that, based on our sample, about 68% of heights should fall in the range (mean-σ, mean+σ) = (69.6-4.04, 69.6+4.04), or between 65.56 and 73.64 inches.
Following the same rule, 95% of heights should fall in the range (mean-2σ, mean+2σ), or between 61.52 and 77.68 inches, and 99.7% should fall within (mean-3σ, mean+3σ), or between 57.48 and 81.72 inches.
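Here is one way to reproduce these ranges in Python. It is a sketch rather than the only approach: it uses the standard library's statistics module, and the helper empirical_ranges is a name invented for this example. Note that statistics.stdev applies the sample formula (dividing by n - 1), which is what produces the 4.04 figure for the five strangers.

```python
import statistics

# Heights (in inches) of the five strangers from the example.
sample = [72, 74, 67, 71, 64]

mean = statistics.mean(sample)
# statistics.stdev uses the sample formula (divides by n - 1).
sd = statistics.stdev(sample)

def empirical_ranges(center, spread):
    """Return (k, low, high) for k = 1, 2 and 3 standard deviations."""
    return [(k, center - k * spread, center + k * spread) for k in (1, 2, 3)]

print(f"mean = {mean:.1f}, standard deviation = {sd:.2f}")
# Round first so the ranges line up with the hand calculation.
for k, low, high in empirical_ranges(round(mean, 1), round(sd, 2)):
    print(f"within {k} standard deviation(s): {low:.2f} to {high:.2f} inches")
```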
Calculating Standard Deviation
The formula used to calculate standard deviation depends on whether your dataset represents an entire population or just a sample. Often in statistics, we do not have access to every person or thing in a population. In that case, we use the formula for the sample standard deviation. The two formulas are almost identical, with the sample formula requiring one small adjustment. Let’s explore both below.
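For reference, the two formulas can be written side by side. The notation here is an assumption of this sketch and not taken from the rest of the post: N is the population size, n is the sample size, x_i is an individual measurement, x̄ is the sample mean and s is the sample standard deviation.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

The only structural difference is the divisor: N for a population, n - 1 for a sample.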
Population Standard Deviation
Imagine a fifth-grade teacher has measured the heights of her class, 10 students in total. She recorded the following heights (in inches): 62, 61, 59, 64, 60, 54, 57, 53, 60, 55. Now she would like to know the standard deviation of her students’ heights. To calculate this, she will need to perform four steps.
Step One: Calculate the Mean
The mean is represented by the Greek letter μ (pronounced "mu"). She calculates this by summing all of the heights and dividing by the total number of students she measured.
62+61+59+64+60+54+57+53+60+55 = 585
μ = 585 / 10 = 58.5
She determines the average height is 58.5 inches.
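If you would rather let Python do the arithmetic, Step One might look like the sketch below. The variable names are just illustrative choices.

```python
# Heights (in inches) of the ten students.
heights = [62, 61, 59, 64, 60, 54, 57, 53, 60, 55]

total = sum(heights)         # sum of all heights
mu = total / len(heights)    # divide by the number of students

print(total, mu)  # 585 58.5
```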
Step Two: Square each number after subtracting the mean from it
Take each of the heights and subtract the mean from it before squaring the resulting number.
(62-58.5)² = 3.5² = 12.25
(61-58.5)² = 2.5² = 6.25
(59-58.5)² = 0.5² = 0.25
(64-58.5)² = 5.5² = 30.25
(60-58.5)² = 1.5² = 2.25
(54-58.5)² = (-4.5)² = 20.25
(57-58.5)² = (-1.5)² = 2.25
(53-58.5)² = (-5.5)² = 30.25
(60-58.5)² = 1.5² = 2.25
(55-58.5)² = (-3.5)² = 12.25
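Step Two translates naturally into a list comprehension. This is a minimal sketch with illustrative variable names. One thing to watch for: the subtraction has to happen inside parentheses before squaring, because in Python (as in ordinary algebra) the expression -4.5 ** 2 evaluates to -20.25, not 20.25.

```python
heights = [62, 61, 59, 64, 60, 54, 57, 53, 60, 55]
mu = sum(heights) / len(heights)  # 58.5, from Step One

# Subtract the mean from each height, then square the difference.
squared_diffs = [(h - mu) ** 2 for h in heights]

for h, d in zip(heights, squared_diffs):
    print(f"({h}-{mu})^2 = {d}")
```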
Step Three: Calculate the mean of the values from Step Two
Just like she did earlier to find the mean height, she will now sum each of the resulting values calculated in the previous step and divide by the number of students.
12.25+6.25+0.25+30.25+2.25+20.25+2.25+30.25+2.25+12.25 = 118.5
σ² = 118.5 / 10 = 11.85
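The same averaging step in Python, as a sketch. It also cross-checks the result against statistics.pvariance from the standard library, which computes the population variance (dividing by the number of values).

```python
import statistics

heights = [62, 61, 59, 64, 60, 54, 57, 53, 60, 55]
mu = sum(heights) / len(heights)
squared_diffs = [(h - mu) ** 2 for h in heights]

# Mean of the squared differences from Step Two.
variance = sum(squared_diffs) / len(squared_diffs)
print(sum(squared_diffs), variance)

# Cross-check with the standard library's population variance.
print(statistics.pvariance(heights))
```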
Good work so far! I promise we’re almost finished.
Step Four: Calculate the square root of Step Three
This value that we calculated in Step Three is known as the variance; however, we’re after the standard deviation which happens to just be the square root of the variance.
σ = √11.85 ≈ 3.44
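Putting the four steps together, a small function can compute the population standard deviation from scratch. The function name population_sd is made up for this sketch; the standard library's statistics.pstdev does the same job and is used here only as a cross-check.

```python
import math
import statistics

def population_sd(values):
    """Population standard deviation, following the four steps above."""
    mu = sum(values) / len(values)                  # Step One: mean
    squared_diffs = [(v - mu) ** 2 for v in values] # Step Two: squared differences
    variance = sum(squared_diffs) / len(values)     # Step Three: variance
    return math.sqrt(variance)                      # Step Four: square root

heights = [62, 61, 59, 64, 60, 54, 57, 53, 60, 55]
sigma = population_sd(heights)

print(f"{sigma:.2f}")                        # 3.44
print(f"{statistics.pstdev(heights):.2f}")   # 3.44
```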
Interpreting the Results
If you remember the Empirical Rule from earlier, we can now use the σ of student heights to estimate that, assuming the heights are normally distributed, about 68% of student heights should fall within one standard deviation of the mean height.
(58.5-3.44, 58.5+3.44): 68% of heights between 55.06 and 61.94 inches
Likewise, about 95% of student heights should fall within two standard deviations of the mean height.
(58.5-2*3.44, 58.5+2*3.44): 95% between 51.62 and 65.38 inches
Finally, following the same pattern, we can assume that roughly 99.7% of student heights will fall within three standard deviations of the mean.
(58.5-3*3.44, 58.5+3*3.44): 99.7% between 48.18 and 68.82 inches
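As a sanity check, the sketch below counts how many of the ten measured heights actually land inside each range. The helper count_within is a hypothetical name for this example. Keep in mind that the Empirical Rule describes an idealized normal distribution, so a class of only ten students should not be expected to match the percentages exactly.

```python
import statistics

heights = [62, 61, 59, 64, 60, 54, 57, 53, 60, 55]
mu = statistics.mean(heights)
sigma = statistics.pstdev(heights)  # population formula (divides by N)

def count_within(data, center, spread, k):
    """Count the values lying within k standard deviations of the center."""
    low, high = center - k * spread, center + k * spread
    return sum(low <= x <= high for x in data)

counts = {k: count_within(heights, mu, sigma, k) for k in (1, 2, 3)}
for k, c in counts.items():
    print(f"within {k} standard deviation(s): {c} of {len(heights)} students")
```

For this class, 5 of the 10 students fall within one standard deviation, and all 10 fall within two, so the 68% figure is only a rough guide at this sample size.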
Note: If you do not have the complete data on your population, the formula changes slightly. To compute a sample standard deviation, you must subtract 1 from your sample size before dividing in Step Three. In other words, divide the sum of the squared differences by n-1 instead of n, then take the square root as before. This caveat has to do with degrees of freedom.
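To see the effect of that adjustment, here is a sketch that treats the ten heights as if they were a sample drawn from a larger group of students. That is a hypothetical reading, used only for comparison. In Python's statistics module, pstdev divides by n and stdev divides by n - 1.

```python
import math
import statistics

heights = [62, 61, 59, 64, 60, 54, 57, 53, 60, 55]
n = len(heights)
mean = sum(heights) / n
sum_sq = sum((h - mean) ** 2 for h in heights)  # 118.5

population_sd = math.sqrt(sum_sq / n)        # divide by n
sample_sd = math.sqrt(sum_sq / (n - 1))      # divide by n - 1

print(f"population: {population_sd:.2f}, sample: {sample_sd:.2f}")
# The standard library agrees with both hand-rolled versions.
print(f"{statistics.pstdev(heights):.2f}, {statistics.stdev(heights):.2f}")
```

The sample version comes out slightly larger (about 3.63 versus 3.44), and the gap shrinks as the number of measurements grows.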
I hope you found this post easy to follow along with and helpful. Please let me know if anything remains unclear. I always do my best to answer any questions in the comments below. Thanks for reading and enjoy applying standard deviation to your data science projects in the wild!