
Given a population of [7, 14, 21, 28], I am sampling two data points at a time, calculating the (unbiased) sample variance, averaging the results, and expecting the mean of the results to be an estimate of the population's variance (61.25). But I'm not getting the right answer:

There are 6 equi-probable samples of two data points:

| sample | sample variance | population variance of sample |
|---|---|---|
| [7, 14] | 24.5 | 12.25 |
| [7, 21] | 98.0 | 49.0 |
| [7, 28] | 220.5 | 110.25 |
| [14, 21] | 24.5 | 12.25 |
| [14, 28] | 98.0 | 49.0 |
| [21, 28] | 24.5 | 12.25 |
| mean | 81.67 | 40.833 |

Multiplying the mean of the rightmost column (40.833) by 3/2 gives 61.25.

Why isn't the mean of the sample variances equal to the population variance, 61.25? And why does multiplying the mean of the per-sample population variances by 3/2 give the correct population variance?

WalksB

1 Answer


Your 2-element samples are drawn without replacement. The Bessel-corrected sample variance is unbiased for the population variance when you sample with replacement, that is, when you choose the two elements $x_1$ and $x_2$ independently from the population.

Most importantly, you are missing the samples $[7,7], [14,14], [21,21], [28,28]$. They would have sample variance $0$. Since you are missing these zeros, your average sample variance is too large.

Include them, and also include the samples where $x_1>x_2$ (like $[14,7]$), and behold, your average sample variance is $61.25$.
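If you want to check this numerically, here is a minimal sketch using only Python's standard library. The variable names are my own; `statistics.variance` uses the $n-1$ divisor (Bessel-corrected) and `statistics.pvariance` uses the $n$ divisor.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [7, 14, 21, 28]

# All 16 ordered samples of size 2 drawn WITH replacement,
# including [7, 7] etc. and both orderings such as [7, 14] and [14, 7].
samples = list(product(population, repeat=2))

# Bessel-corrected (n - 1 divisor) variance of each sample.
sample_variances = [variance(s) for s in samples]

mean_sample_variance = mean(sample_variances)
population_variance = pvariance(population)

print(len(samples), mean_sample_variance, population_variance)
```

Both printed variances should agree, since averaging over all 16 equally likely samples is exactly the expectation of the estimator.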

As to the magnitude of the error: With replacement you have $16$ different samples. You were missing the $4$ that have zero sample variance. (You were also not considering the $x_1>x_2$ samples, but that does not matter here because of symmetry.) So you were overestimating the population variance by a factor of $16/12 = 4/3$. This checks out: your $81\frac{2}{3}$ is $(4/3) \times 61.25$.

As to the "population variances" (really: uncorrected sample variances) in the rightmost column: they carry an additional bias, namely the well-known bias of uncorrected sample variances. They are too small by a factor of $(n-1)/n = 1/2$, where $n=2$ is the sample size. Compounding these two errors, their average is wrong by a factor of $(4/3) \cdot (1/2) = 2/3$. This explains why multiplying by $3/2$ fixes both errors.
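The two factors can also be checked directly on the six without-replacement samples from the question. Again this is only a sketch with names I made up, assuming the same population:

```python
from itertools import combinations
from statistics import mean, pvariance, variance

population = [7, 14, 21, 28]
sigma2 = pvariance(population)  # true population variance

# The 6 unordered samples of size 2 drawn WITHOUT replacement.
pairs = list(combinations(population, 2))

# Mean of the Bessel-corrected variances (n - 1 divisor).
mean_corrected = mean([variance(p) for p in pairs])
# Mean of the uncorrected variances (n divisor), the rightmost column.
mean_uncorrected = mean([pvariance(p) for p in pairs])

ratio_corrected = mean_corrected / sigma2      # expected to be about 4/3
ratio_uncorrected = mean_uncorrected / sigma2  # expected to be about 2/3

print(ratio_corrected, ratio_uncorrected, mean_uncorrected * 3 / 2)
```

The first ratio reflects only the missing zero-variance samples; the second also includes the $(n-1)/n$ factor, which is why rescaling by $3/2$ recovers the population variance.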