3

If we wanted to measure how much the values $x_1, \ldots ,x_n$ of a sample differ from the mean $\mu$, it seems more intuitive to me to use the formula $$\frac{\sum\limits_{i=1}^{n} |\mu -x_i|}{n}$$ instead of the formula for variance. I've read about some geometric interpretations of variance as well as standard deviation, yet this just seems to push the question further back, as we could ask what reason we have to care more about the distance between the vectors $(x_1,\ldots ,x_n)$ and $(\mu ,\ldots ,\mu)$ as opposed to just the average distance between a possible value $x_0$ and $\mu$.

Some explanations of the variance formula point to the fact that variance pays more attention to values farther from the mean, but two immediate questions come to mind: Why should we give more weight to values farther from the mean? And why should we do so by squaring the respective distances instead of, say, cubing them?
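The weighting effect in question can be seen numerically. A minimal sketch with a hypothetical sample containing one outlier, comparing how much that outlier contributes to the mean absolute deviation versus the mean squared deviation:

```python
import statistics

# Hypothetical sample: four small values plus one outlier.
data = [1, 2, 3, 4, 100]
mu = statistics.mean(data)  # 22.0

# Mean absolute deviation vs. mean squared deviation (population variance).
mad = sum(abs(mu - x) for x in data) / len(data)
var = sum((mu - x) ** 2 for x in data) / len(data)

print(mad)  # 31.2
print(var)  # 1522.0

# Share of each measure contributed by the outlier alone:
print(abs(mu - 100) / (mad * len(data)))        # 0.5  (half of the MAD)
print((mu - 100) ** 2 / (var * len(data)))      # ~0.8 (most of the variance)
```

The outlier accounts for half of the total absolute deviation but about 80% of the total squared deviation, which is what "pays more attention to values farther from the mean" means concretely.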

Sam
  • 4,734
  • 3
    You are free to take absolute difference. There are many situations in which it is preferable to variance (or standard deviation). Variance is far better analytically (as the absolute value function has no derivative at $0$) and sometimes that is important. See, for instance, this question – lulu Jun 05 '20 at 23:29
  • 3
    It is mostly because it is more convenient mathematically ($L^2$ is a good space); that is, it leads to simpler formulas when dealing with random variables in general. For instance, the variance of a sum is the sum of the variances, but there are others. – zwim Jun 05 '20 at 23:31
  • 2
    @zwim That holds for a sum of independent (more generally, uncorrelated) random variables; note the variance of $2X$ is $4V(X)$ – reuns Jun 06 '20 at 00:07
  • Deviation from $\mu$ in the senses of mean absolute, mean square, mean absolute cube, and mean absolute square root are: $$ \frac{1}{n}\sum_{i=1}^n |\mu-x_i| , \quad \frac{1}{n}\sum_{i=1}^n |\mu-x_i|^2 , \quad \frac{1}{n}\sum_{i=1}^n |\mu-x_i|^3 , \quad \frac{1}{n}\sum_{i=1}^n |\mu-x_i|^{1/2}$$ For the mean-square sense we can remove the absolute values because $|\mu-x_i|^2 = (\mu-x_i)^2$, which gives us a nice-and-smooth function that is often easier to work with, as the lulu comment describes. – Michael Jun 06 '20 at 06:34
  • 2
    See here https://math.stackexchange.com/questions/3645198/query-on-the-standard-deviation-formula/3645250#3645250 how Gauss himself argues about that question. – Michael Hoppe Jun 06 '20 at 10:10
  • See here https://stats.stackexchange.com/questions/118/why-square-the-difference-instead-of-taking-the-absolute-value-in-standard-devia – leonbloy Jun 07 '20 at 03:00

1 Answer

3

Although both are measures of dispersion, the use of one over the other often boils down to statistical inference as well as level of difficulty in solving decision problems.

  1. If one uses $g(a)=E|X-a|$ to infer a parameter of a random variable $X$, then $g$ is minimized by the median; whereas if one uses $h(a)=E[(X-a)^2]$, the minimum is attained at the mean.
  2. Computationally, $h$ is easier to use in optimization problems, since one can apply standard differentiation methods; $g$ is more complicated to optimize because the absolute value has no derivative at $0$.
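Point 1 can be checked numerically. A minimal sketch with a hypothetical sample standing in for $X$, minimizing the empirical versions of $g$ and $h$ over a grid of candidate values of $a$:

```python
import statistics

# Hypothetical sample standing in for the random variable X.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]

def g(a):
    """Empirical E|X - a|."""
    return sum(abs(x - a) for x in xs) / len(xs)

def h(a):
    """Empirical E[(X - a)^2]."""
    return sum((x - a) ** 2 for x in xs) / len(xs)

# Scan a grid of candidate values of a and keep the minimizer of each.
grid = [i / 100 for i in range(0, 2001)]  # 0.00, 0.01, ..., 20.00
a_g = min(grid, key=g)
a_h = min(grid, key=h)

print(a_g, statistics.median(xs))  # minimizer of g matches the median: 4.0
print(a_h, statistics.mean(xs))    # minimizer of h matches the mean: 6.2
```

The grid search is crude but makes the point: $g$ bottoms out at the median and $h$ at the mean, independent of any calculus.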

With computing power nowadays, one can handle both to obtain mean estimates as well as median estimates.

Mittens
  • 39,145