
I know there are already some questions about this, but mine is a little different. I know that the variance is calculated to measure how spread out the data is with respect to the mean.

So computing the variance amounts to taking the squared differences between the values and the mean, summing them, and dividing by the number of data points we have.
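As a concrete illustration, here is a minimal Python sketch of that recipe (the data values are made up for the example):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary example values

mean = sum(data) / len(data)
# Population variance: the average of the squared deviations from the mean.
var_population = sum((x - mean) ** 2 for x in data) / len(data)

# statistics.pvariance divides by N, while statistics.variance divides
# by N-1 -- the distinction question (2) below asks about.
assert abs(var_population - statistics.pvariance(data)) < 1e-12
print(var_population)  # 4.0 for this data
```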

Now, I have 2 questions:

1) Why don't we use absolute values instead of squaring them? (Maybe absolute values are not differentiable or something?)

2) Why do we use 'N-1' instead of 'N' when dividing?

Reckoner
  • You can use absolute values. It does make things harder though. – Angina Seng Apr 14 '19 at 07:04
  • Because if we divide by $N$, the statistic will be biased. And I believe that variance is supposed to come from the 'scalar product', which is covariance. – Jakobian Apr 14 '19 at 07:05
  • If we try to calculate the mean square error of an estimator, we will see that it is really a sum of $2$ things, the variance and the squared bias (the identity is written out after these comments). We often want to minimize bias, even at the cost of variance. – Jakobian Apr 14 '19 at 07:12
  • @Jakobian Could you explain a bit more? I did not understand what you mean by a biased statistic. – Reckoner Apr 14 '19 at 07:19
  • @LordSharktheUnknown How would it make things harder? I just took a sample and calculated the variance using the absolute-value method, and didn't find it hard. What else do I need to do? – Reckoner Apr 14 '19 at 07:20
  • @Reckoner An unbiased statistic is a statistic whose bias is equal to $0$. A biased one would be the opposite. – Jakobian Apr 14 '19 at 07:28
  • @Jakobian Got the terms from you now, will read about these on my own and will ask if I have any questions. – Reckoner Apr 14 '19 at 07:30
  • For (1) see https://math.stackexchange.com/questions/717339/why-is-variance-squared while for (2) see https://math.stackexchange.com/questions/707272/why-do-statisticians-like-n-1-instead-of-n – Henry Jan 08 '20 at 17:22
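For reference, the decomposition mentioned in the comments is the standard identity: for an estimator $\hat{\theta}$ of a parameter $\theta$,

$$\mathbb{E}\big[(\hat{\theta}-\theta)^2\big] = \mathbb{V}ar(\hat{\theta}) + \big(\mathbb{E}[\hat{\theta}]-\theta\big)^2,$$

i.e. mean square error $=$ variance $+$ bias$^2$, and an estimator is called unbiased when the second term is $0$.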

1 Answer


For (1), you would lose certain useful properties. For example, if $X$ and $Y$ are two independent random variables, then $\mathbb{V}ar(X+Y)=\mathbb{V}ar(X)+\mathbb{V}ar(Y)$. However, if we define $V(A)=\mathbb{E}[|A-\mathbb{E}[A]|]$, then $V(A)+V(B) = \mathbb{E}[|A-\mathbb{E}[A]|] + \mathbb{E}[|B-\mathbb{E}[B]|]=\mathbb{E}[|A-\mathbb{E}[A]|+|B-\mathbb{E}[B]|]$, which in general is not equal to $V(A+B)$ and can only be related to it through inequalities.
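A quick numerical check of this (a Monte Carlo sketch in Python; the two distributions are arbitrary choices) shows the additivity holding for variance, up to sampling noise, but failing for the absolute-deviation measure $V$:

```python
import random

random.seed(0)
n = 200_000

# Independent draws: X uniform on [0, 1], Y standard normal.
xs = [random.random() for _ in range(n)]
ys = [random.gauss(0.0, 1.0) for _ in range(n)]
sums = [x + y for x, y in zip(xs, ys)]

def var(sample):
    """Population variance: mean of squared deviations."""
    m = sum(sample) / len(sample)
    return sum((v - m) ** 2 for v in sample) / len(sample)

def abs_dev(sample):
    """The absolute-value variant V(A) = E|A - E[A]|."""
    m = sum(sample) / len(sample)
    return sum(abs(v - m) for v in sample) / len(sample)

# Additive for independent variables: both values are ~1.0833 (= 1/12 + 1).
print(var(sums), var(xs) + var(ys))
# Not additive: roughly 0.83 versus roughly 1.05.
print(abs_dev(sums), abs_dev(xs) + abs_dev(ys))
```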

For (2), suppose you want to estimate the true variance $\sigma^2$ of a distribution. You collect a set of data, say $\{X_i\}_{i=1}^n$, and construct the statistic $S=\frac{1}{n-1}\sum_{i=1}^n (X_i-\overline{X})^2$. A standard theorem tells you that $\mathbb{E}[S]=\sigma^2$, so $S$ is an unbiased estimator. If you divide by $n$ instead of $n-1$, the expectation becomes $\frac{n-1}{n}\sigma^2 \ne \sigma^2$, so that estimator systematically underestimates the true variance.
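To see the bias concretely, here is a small simulation sketch (Python, with an arbitrarily chosen normal distribution and sample size): averaging the divide-by-$n$ estimator over many samples lands near $\frac{n-1}{n}\sigma^2$, while the divide-by-$(n-1)$ version lands near $\sigma^2$.

```python
import random

random.seed(1)
sigma2 = 4.0             # true variance of the sampled distribution
n = 5                    # sample size per trial
trials = 100_000

sum_div_n = 0.0          # accumulates the divide-by-n estimates
sum_div_n_minus_1 = 0.0  # accumulates the divide-by-(n-1) estimates

for _ in range(trials):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    sum_div_n += ss / n
    sum_div_n_minus_1 += ss / (n - 1)

print(sum_div_n / trials)          # ~3.2 = (n-1)/n * sigma^2: biased low
print(sum_div_n_minus_1 / trials)  # ~4.0 = sigma^2: unbiased
```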