1

I still get a bit confused when calculating the variance of a sample:

Suppose I am given values for $\sum_{i=1}^n x_i$, $\sum_{i=1}^n x_i^2$ and $n$. Which formula is best to use? I get confused about whether to divide by $n$ or by $n-1$.

To be clearer: is either of the two formulas below preferred, or do you know of a more accurate formula for estimating the variance from a sample?

$$\frac{1}{n}\sum_{i=1}^n x_i^2-\left(\frac{1}{n}\sum_{i=1}^n x_i\right)^2$$

$$\frac{1}{n-1}\sum_{i=1}^n x_i^2-\left(\frac{1}{n-1}\sum_{i=1}^n x_i\right)^2$$

kay
  • 229
  • You don't calculate the variance from a sample. You only estimate it. There is no "best" estimator. http://stats.stackexchange.com/questions/17890/what-is-the-difference-between-n-and-n-1-in-calculating-population-variance http://math.stackexchange.com/questions/61251/intuitive-explanation-of-bessels-correction – leonbloy May 30 '13 at 21:01
  • @leonbloy I have tried to edit the question so that hopefully you have a better idea of what I am trying to ask. Thank you for your feedback. – kay May 30 '13 at 21:39
  • I voted to close as a duplicate; all you need to know is explained in the links above. BTW you have the terms mixed up: you "calculate" the "sample variance" in order to "estimate" the "variance". And, I repeat, there is no single measure of "accuracy" or "best" for an estimator; each has its advantages. – leonbloy May 30 '13 at 22:38

1 Answer

0

To estimate the mean you simply use

$$\hat{\mu}=\frac{1}{n}\sum_{i=1}^nx_i$$

Using this estimated mean, you get an unbiased estimator of the variance by computing

$$\hat{\sigma}^2=\frac{1}{n-1}\sum_{i=1}^n(x_i-\hat{\mu})^2$$
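Since the question supplies only $\sum_{i=1}^n x_i$, $\sum_{i=1}^n x_i^2$ and $n$, it may help to note that this estimator can be rewritten in terms of those quantities. Expanding the square and using $\sum_{i=1}^n x_i=n\hat{\mu}$, one gets

$$\sum_{i=1}^n(x_i-\hat{\mu})^2=\sum_{i=1}^n x_i^2-2\hat{\mu}\sum_{i=1}^n x_i+n\hat{\mu}^2=\sum_{i=1}^n x_i^2-\frac{1}{n}\left(\sum_{i=1}^n x_i\right)^2$$

so that

$$\hat{\sigma}^2=\frac{1}{n-1}\left(\sum_{i=1}^n x_i^2-\frac{1}{n}\left(\sum_{i=1}^n x_i\right)^2\right)$$

Note that neither of the two expressions in the question has exactly this form: the mean is always computed with $n$, and only the final sum of squared deviations is divided by $n-1$.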

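It can be shown that if you divide by $n$ instead of by $n-1$, the estimator for the variance is biased: for independent, identically distributed samples with variance $\sigma^2$, its expected value is $\frac{n-1}{n}\sigma^2$, so on average it underestimates the variance.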
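As a rough illustration, here is a minimal Python sketch that computes both versions from the three quantities given in the question. The function name `variance_from_sums` and the `unbiased` flag are made up for this example, and the sample data are arbitrary.

```python
def variance_from_sums(sum_x, sum_x2, n, unbiased=True):
    """Estimate the variance from sum(x_i), sum(x_i^2) and n.

    Hypothetical helper for illustration only.
    unbiased=True divides by n - 1, unbiased=False divides by n.
    """
    if n < 1 or (unbiased and n < 2):
        raise ValueError("need n >= 2 for the unbiased estimator (n >= 1 otherwise)")
    # Sum of squared deviations from the sample mean, written via the raw sums:
    # sum((x_i - mean)^2) = sum(x_i^2) - (sum(x_i))^2 / n
    squared_deviations = sum_x2 - sum_x ** 2 / n
    divisor = n - 1 if unbiased else n
    return squared_deviations / divisor


# Arbitrary example data, reduced to the three summary quantities.
data = [2, 4, 4, 4, 5, 5, 7, 9]
sum_x = sum(data)                   # 40
sum_x2 = sum(x * x for x in data)   # 232
n = len(data)                       # 8

print(variance_from_sums(sum_x, sum_x2, n))                  # divides by n - 1
print(variance_from_sums(sum_x, sum_x2, n, unbiased=False))  # divides by n
```

One caveat: when the mean is large compared with the spread of the data, subtracting the two sums can lose precision in floating point, so if the raw data are available the form with $(x_i-\hat{\mu})^2$ is usually the safer one to compute.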

Matt L.
  • 10,636