Variance describes how far the data points are from the center (the mean). Statisticians realized quickly that if they just added up each distance from the mean, the total would always be 0, because the positive and negative distances cancel out.
Person | Score | Distance from Mean |
Person1 | 100 | 100 - 95 = 5 |
Person2 | 95 | 95 - 95 = 0 |
Person3 | 90 | 90 - 95 = -5 |
Mean of Scores: 95
Instead of adding 5 + 0 + (-5) = 0, statisticians square each distance, which makes every value zero or positive so nothing cancels out. Squaring also gives big distances more weight than small ones. So, the variance calculation better describes how far each data point is from the mean.
Person | Score | Squared Distance from Mean |
Person1 | 100 | (100 - 95)² = 25 |
Person2 | 95 | (95 - 95)² = 0 |
Person3 | 90 | (90 - 95)² = 25 |
Now the total squared distance is 25 + 0 + 25 = 50, and the average squared distance is 50 divided by 3, about 16.67.
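If it helps to see the same arithmetic in code, here is a minimal Python sketch of the three-score example above. The variable names are just ones I picked for illustration.

```python
# Scores from the three-person example above.
scores = [100, 95, 90]
mean = sum(scores) / len(scores)               # 95.0

# Plain distances from the mean cancel out when added.
distances = [s - mean for s in scores]         # [5.0, 0.0, -5.0]
total_distance = sum(distances)                # 0.0

# Squared distances do not cancel.
squared = [d ** 2 for d in distances]          # [25.0, 0.0, 25.0]
total_squared = sum(squared)                   # 50.0
average_squared = total_squared / len(scores)  # about 16.67

print(total_distance, total_squared, round(average_squared, 2))
```

The last line should print the total distance of 0, the total squared distance of 50, and the average squared distance rounded to 16.67.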
Here is an example of finding the variance within a population of two people.
Population of 2 | Score | Distance between data point and mean | Squared distance between data point and mean |
John | 15 | 15 - 25 = -10 | (15 - 25)² = 100 |
Pam | 35 | 35 - 25 = 10 | (35 - 25)² = 100 |
Mean: | 25 | Total Squared Distance | 100 + 100 = 200 |
Average Squared Distance | 200 ÷ 2 = 100 |
The mean is 25, so the squared distance for each person is 100. The average squared distance is (100 + 100) ÷ 2 = 100, which is the population variance.
This second example has a population of 4 people, and the mean is still 25.
Population of 4 | Score | Distance between data point and mean | Squared distance between data point and mean |
John | 15 | 15 - 25 = -10 | (15 - 25)² = 100 |
Pam | 35 | 35 - 25 = 10 | (35 - 25)² = 100 |
Ann | 30 | 30 - 25 = 5 | (30 - 25)² = 25 |
Tommy | 20 | 20 - 25 = -5 | (20 - 25)² = 25 |
Mean: | 25 | Total Squared Distance | 100 + 100 + 25 + 25 = 250 |
So the population variance decreases from 100 to 250 ÷ 4 = 62.5, because the two new scores are closer to the mean. Your question is based on the idea that the average of 100, 100, 25, 25 is equal to the average of 62.5, 62.5, 62.5, 62.5. We could represent this in the calculation of the total squared distances.
Population of 4 | Squared distance between data point and mean | Hypothetical squared distance |
John | 100 | 62.5 |
Pam | 100 | 62.5 |
Ann | 25 | 62.5 |
Tommy | 25 | 62.5 |
Sum: | 250 | 250 |
Mean: | 250 ÷ 4 = 62.5 | 250 ÷ 4 = 62.5 |
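Here is a short Python sketch of the same idea, in case you want to check the numbers yourself. The helper name `population_variance` is my own, not a standard function. The `pvariance` function from Python's built-in `statistics` module is used only as a cross-check.

```python
from statistics import pvariance

def population_variance(values):
    """Return (squared distances, their average) for a whole population."""
    mean = sum(values) / len(values)
    squared = [(v - mean) ** 2 for v in values]
    return squared, sum(squared) / len(squared)

two_people = [15, 35]             # John, Pam
four_people = [15, 35, 30, 20]    # John, Pam, Ann, Tommy

squared_two, variance_two = population_variance(two_people)
squared_four, variance_four = population_variance(four_people)

# Replace every actual squared distance with the average (the variance).
hypothetical_four = [variance_four] * len(four_people)

print(variance_two, variance_four)                 # 100.0 62.5
print(sum(squared_four), sum(hypothetical_four))   # 250.0 250.0
print(pvariance(four_people) == variance_four)     # True
```

The two sums match because the variance is the average of the squared distances, so multiplying it by the number of people gives back the total.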
So, if you know the population variance is 25, then you could estimate that, on average, each person's squared distance from the mean is 25. In a sample of 20, the table would look like this:
Sample of 20 | Squared distance between data point and mean | Hypothetical squared distance |
Person 1 | Unknown | 25 |
Person 2 | Unknown | 25 |
Person 3 | Unknown | 25 |
Person 4 | Unknown | 25 |
Person 5 | Unknown | 25 |
Person 6 | Unknown | 25 |
Person 7 | Unknown | 25 |
Person 8 | Unknown | 25 |
Person 9 | Unknown | 25 |
Person 10 | Unknown | 25 |
Person 11 | Unknown | 25 |
Person 12 | Unknown | 25 |
Person 13 | Unknown | 25 |
Person 14 | Unknown | 25 |
Person 15 | Unknown | 25 |
Person 16 | Unknown | 25 |
Person 17 | Unknown | 25 |
Person 18 | Unknown | 25 |
Person 19 | Unknown | 25 |
Person 20 | Unknown | 25 |
Sum: | Unknown | 500 |
Mean: | Unknown | 25 |
This means that the Sum of Squared Distances (SS) is 20 × 25 = 500, on average. If we wanted to calculate a sample variance, then we would divide SS by the degrees of freedom (df = 20 - 1 = 19): 500 ÷ 19 ≈ 26.32.
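The same calculation can be sketched in a few lines of Python. This assumes, as the question does, that the population variance of 25 is known and that every person in the sample is given that value as their squared distance. The variable names are mine.

```python
population_variance = 25   # assumed known, as in the question
n = 20                     # sample size

# Estimated Sum of Squared Distances: every person contributes 25.
ss = population_variance * n        # 500

# A sample variance divides SS by the degrees of freedom, n - 1.
df = n - 1                          # 19
sample_variance = ss / df           # about 26.32

print(ss, df, round(sample_variance, 2))
```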
The question is hard conceptually because it isn’t realistic. You almost never know the population variance, and it isn’t used to estimate the variance in theoretical samples.
But you can calculate SS by assuming that every person in the sample has a squared distance equal to the population variance, and then adding up those values.
If you have any further questions, just shoot me a text! Khan Academy walks through these calculations step by step.