
I know there are already some questions about this, but mine is a little different. I know that the variance is calculated to measure how spread out the data is with respect to the mean.

So computing the variance amounts to taking the squared differences between the values and the mean, summing them, and dividing by the number of data points we have.
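As a concrete illustration, here is a minimal Python sketch of that recipe (the data values are made up for the example):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary example values

mean = sum(data) / len(data)
# Population variance: the average of the squared deviations from the mean.
var_population = sum((x - mean) ** 2 for x in data) / len(data)

# statistics.pvariance divides by N, while statistics.variance divides
# by N-1 -- the distinction question (2) below asks about.
assert abs(var_population - statistics.pvariance(data)) < 1e-12
print(var_population)  # 4.0 for this data
```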

Now, I have 2 questions:

1) Why don't we use absolute values instead of squaring them? (Maybe absolute values are not differentiable or something?)

2) Why do we use 'N-1' instead of 'N' when dividing?

Reckoner
  • You can use absolute values. It does make things harder though. – Angina Seng Apr 14 '19 at 07:04
  • Because if we divide by $N$, the statistic will be biased. And I believe that variance is supposed to come from the 'scalar product', which is covariance. – Jakobian Apr 14 '19 at 07:05
  • If we try to calculate the mean square error of an estimator, we will see that it is really a sum of $2$ things, the variance and the squared bias (the identity is written out after these comments). We often want to minimize bias, even at the cost of variance. – Jakobian Apr 14 '19 at 07:12
  • @Jakobian Could you explain a bit more? I did not understand what you mean by a biased statistic. – Reckoner Apr 14 '19 at 07:19
  • @LordSharktheUnknown How would it make things harder? I just took a sample and calculated the variance using the absolute-value method, and didn't find it hard. What else do I need to do? – Reckoner Apr 14 '19 at 07:20
  • @Reckoner An unbiased statistic is a statistic whose bias is equal to $0$. A biased one would be the opposite. – Jakobian Apr 14 '19 at 07:28
  • @Jakobian Got the terms from you now, will read about these on my own and will ask if I have any questions. – Reckoner Apr 14 '19 at 07:30
  • For (1) see https://math.stackexchange.com/questions/717339/why-is-variance-squared while for (2) see https://math.stackexchange.com/questions/707272/why-do-statisticians-like-n-1-instead-of-n – Henry Jan 08 '20 at 17:22
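For reference, the decomposition mentioned in the comments is the standard identity: for an estimator $\hat{\theta}$ of a parameter $\theta$,

$$\mathbb{E}\big[(\hat{\theta}-\theta)^2\big] = \mathbb{V}ar(\hat{\theta}) + \big(\mathbb{E}[\hat{\theta}]-\theta\big)^2,$$

i.e. mean square error $=$ variance $+$ bias$^2$, and an estimator is called unbiased when the second term is $0$.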

1 Answer


For (1), you would lose certain useful properties. For example, if $X$ and $Y$ are two independent random variables, then $\mathbb{V}ar(X+Y)=\mathbb{V}ar(X)+\mathbb{V}ar(Y)$. However, if we define $V(A)=\mathbb{E}[|A-\mathbb{E}[A]|]$, then $V(A)+V(B) = \mathbb{E}[|A-\mathbb{E}[A]|] + \mathbb{E}[|B-\mathbb{E}[B]|]=\mathbb{E}[|A-\mathbb{E}[A]|+|B-\mathbb{E}[B]|]$, which in general is not equal to $V(A+B)$ and can only be related to it through inequalities.
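A quick numerical check of this (a Monte Carlo sketch in Python; the two distributions are arbitrary choices) shows the additivity holding for variance, up to sampling noise, but failing for the absolute-deviation measure $V$:

```python
import random

random.seed(0)
n = 200_000

# Independent draws: X uniform on [0, 1], Y standard normal.
xs = [random.random() for _ in range(n)]
ys = [random.gauss(0.0, 1.0) for _ in range(n)]
sums = [x + y for x, y in zip(xs, ys)]

def var(sample):
    """Population variance: mean of squared deviations."""
    m = sum(sample) / len(sample)
    return sum((v - m) ** 2 for v in sample) / len(sample)

def abs_dev(sample):
    """The absolute-value variant V(A) = E|A - E[A]|."""
    m = sum(sample) / len(sample)
    return sum(abs(v - m) for v in sample) / len(sample)

# Additive for independent variables: both values are ~1.0833 (= 1/12 + 1).
print(var(sums), var(xs) + var(ys))
# Not additive: roughly 0.83 versus roughly 1.05.
print(abs_dev(sums), abs_dev(xs) + abs_dev(ys))
```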

For (2), suppose you want to estimate the true variance $\sigma^2$ of a distribution. You collect a set of data, say $\{X_i\}_{i=1}^n$, and construct the statistic $S=\frac{1}{n-1}\sum_{i=1}^n (X_i-\overline{X})^2$. A standard theorem tells you that $\mathbb{E}[S]=\sigma^2$, so $S$ is an unbiased estimator. If you divide by $n$ instead of $n-1$, the expectation becomes $\frac{n-1}{n}\sigma^2 \ne \sigma^2$, so that estimator systematically underestimates the true variance.
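To see the bias concretely, here is a small simulation sketch (Python, with an arbitrarily chosen normal distribution and sample size): averaging the divide-by-$n$ estimator over many samples lands near $\frac{n-1}{n}\sigma^2$, while the divide-by-$(n-1)$ version lands near $\sigma^2$.

```python
import random

random.seed(1)
sigma2 = 4.0             # true variance of the sampled distribution
n = 5                    # sample size per trial
trials = 100_000

sum_div_n = 0.0          # accumulates the divide-by-n estimates
sum_div_n_minus_1 = 0.0  # accumulates the divide-by-(n-1) estimates

for _ in range(trials):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    sum_div_n += ss / n
    sum_div_n_minus_1 += ss / (n - 1)

print(sum_div_n / trials)          # ~3.2 = (n-1)/n * sigma^2: biased low
print(sum_div_n_minus_1 / trials)  # ~4.0 = sigma^2: unbiased
```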