2

I don't understand the necessity of squaring the result of $x_i - \bar{x}$ in $$\sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1}}.$$ In fact, I don't even understand why $N - 1$ is in the denominator instead of just $N$. Could someone explain it or recommend a good text about it? All the books on error theory or even statistics that I have found are either too abstract or too simplistic. Thanks in advance.

thiago
  • 153
  • There are arguments for $N$ instead of $N-1$. The estimator (for variance, but not for standard deviation!) is unbiased if we use $N-1$. The estimator for the variance that we get by using $N$ is biased, but in some ways better! It is on average wrong (but not by much if $N$ is large) but on average closer to the truth than the estimator based on $N-1$. Complicated. – André Nicolas Mar 07 '14 at 18:41
  • To understand an estimator of the variance, you first want to understand the variance itself. It's not clear which is your case. Given that you ask about "the square", I'd say this has already been answered: http://math.stackexchange.com/questions/460940/why-the-definition-of-variance-is-such – leonbloy Mar 07 '14 at 20:43

5 Answers

4

The square is used to remove the effect of the sign of $x_i - \overline{x}$. Suppose your mean was 0, and you had measurements at -2 and +2. These would cancel, but squaring gets rid of that issue.

Now, you might ask, "why not use absolute value?" Great question! The reason is that if we use absolute value, variances are no longer additive. With the squared definition, for independent $x_1, \ldots, x_m$ we have $\textrm{Var}(x_1 + x_2 + \cdots + x_m) = \textrm{Var}(x_1) + \textrm{Var}(x_2) + \cdots + \textrm{Var}(x_m)$; the absolute-value version has no such property.
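A minimal NumPy sketch of this contrast, using independent normal samples (the helper `mad` is my own shorthand for mean absolute deviation, not a library function):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=1_000_000)  # Var(x) ~ 4
y = rng.normal(0.0, 3.0, size=1_000_000)  # Var(y) ~ 9

def mad(a):
    """Mean absolute deviation from the mean."""
    return np.mean(np.abs(a - a.mean()))

# Variances add for independent samples...
print(np.var(x + y), np.var(x) + np.var(y))  # both ~ 13
# ...but mean absolute deviations do not.
print(mad(x + y), mad(x) + mad(y))           # ~ 2.88 vs ~ 3.99
```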

As far as the $n-1$ term... it has to do with the fact that with $n$ data points, we get $n-1$ degrees of freedom. Dividing by $n-1$ rather than $n$ reduces bias.
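A small Monte Carlo sketch of this bias, assuming normally distributed data with known variance $4$:

```python
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0                                  # population variance (sigma = 2)
N = 5                                           # small N makes the bias visible
samples = rng.normal(0.0, 2.0, size=(100_000, N))

biased   = samples.var(axis=1, ddof=0).mean()   # divide by N
unbiased = samples.var(axis=1, ddof=1).mean()   # divide by N - 1

print(biased)    # ~ 3.2, i.e. (N-1)/N * true_var
print(unbiased)  # ~ 4.0, matching true_var
```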

Emily
  • 35,688
  • 6
  • 93
  • 141
1

It is necessary to square the deviations from the mean because you want both positive and negative deviations to contribute to the spread (note that $\sum (x_i - \bar{x})$ is always zero). Another possibility is to take absolute values, but the formula above turns out to have nicer properties (such as additivity of variances, as pointed out by Arkamis).

Regarding the $N-1$ in the denominator: you would underestimate the standard deviation when dividing by $N$, since the true mean is not as close to $x_1, \ldots, x_N$ as the sample mean $\bar{x}$ is (in fact, $\bar{x}$ is calculated to be ''as close'' to the data points as possible). That $N$ has to be replaced by $N-1$ can be derived by working out the expected value of your formula for the variance (the square of the SD). It turns out that with $N-1$ the expected value equals the population variance, i.e. the sample variance is an unbiased estimator of the true variance.
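A small simulation sketch of this underestimation, assuming a normal population with true mean $0$ and true variance $4$:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
# Draw many samples from a population with true mean 0 and true variance 4.
samples = rng.normal(0.0, 2.0, size=(100_000, N))

sq_dev_sample_mean = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).mean()
sq_dev_true_mean   = (samples ** 2).mean()      # true mean is 0

print(sq_dev_sample_mean)  # ~ 3.2 = (N-1)/N * 4: too small
print(sq_dev_true_mean)    # ~ 4.0: the actual variance
```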

user133281
  • 16,073
1

Squaring $x_i-\bar x$:

If we didn't square it, we would just be adding up $x_i - \bar x$, and that will always give us zero. What we want instead is to total "how far" each $x_i$ is from $\bar x$.

So, we need to make sure we're taking the average of some positive quantity representing how far $x_i$ is from $\bar x$; one good choice is $(x_i - \bar x)^2$. Another example is $|x_i - \bar x|$, which leads to the average absolute deviation. It turns out that the standard deviation tends to be "nicer" for most uses, though both results measure how "spread out" your data is.
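A minimal sketch comparing the two spread measures on a small made-up dataset:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean = data.mean()                                           # 5.0

sd  = np.sqrt(np.sum((data - mean) ** 2) / (len(data) - 1))  # sample SD ~ 2.14
aad = np.mean(np.abs(data - mean))                           # average absolute deviation = 1.5

print(sd, aad)  # the SD weights the outlying 9 more heavily
```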

Using $N-1$:

Using the $N-1$ instead of $N$ is called Bessel's correction; the Wikipedia page on Bessel's correction gives a few proofs of why you need the $N-1$ in order to get a better estimate of the population standard deviation.

Ben Grossmann
  • 225,327
  • Thank you and everyone! Now I get it. And I'm sorry for the duplicate question, but I did not really know what terms to use to find a question similar to mine. – thiago Mar 08 '14 at 17:43
1

Squaring the Deviations

The variance of a sample measures the spread of the values in a sample or distribution. We could do this with any function of $|x_i-\bar{x}|$. The reason that we use $(x_i-\bar{x})^2$ is because the variance computed this way has very nice properties. Here are a couple:

$1$. The variance of the sum of independent variables is the sum of their variances.

Since $X$ and $Y$ are independent, their probabilities multiply. Therefore, $$ \begin{align} \hspace{-1cm}\mathrm{Var}(X+Y) &=\sum_{i=1}^n\sum_{j=1}^m\Big[(x_i+y_j)-(\bar{x}+\bar{y})\Big]^2p_iq_j\\ &=\sum_{i=1}^n(x_i-\bar{x})^2p_i+\sum_{j=1}^m(y_j-\bar{y})^2q_j+2\sum_{i=1}^n(x_i-\bar{x})p_i\sum_{j=1}^m(y_j-\bar{y})q_j\\ &=\sum_{i=1}^n(x_i-\bar{x})^2p_i+\sum_{j=1}^m(y_j-\bar{y})^2q_j\\ &=\mathrm{Var}(X)+\mathrm{Var}(Y)\tag{1} \end{align} $$ (the cross term vanishes since $\sum_{i=1}^n(x_i-\bar{x})p_i=0$).
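A direct numerical sketch of $(1)$ for two small made-up discrete distributions (`var_discrete` is my own helper, not a library function):

```python
import numpy as np

# Two independent discrete variables: values with probabilities p_i and q_j.
x, p = np.array([-1.0, 0.0, 2.0]), np.array([0.2, 0.5, 0.3])
y, q = np.array([1.0, 4.0]),       np.array([0.6, 0.4])

def var_discrete(values, probs):
    m = np.sum(values * probs)                  # mean
    return np.sum((values - m) ** 2 * probs)    # variance about the mean

# Joint outcomes x_i + y_j occur with probability p_i * q_j (independence).
s  = np.add.outer(x, y).ravel()
pq = np.outer(p, q).ravel()

print(var_discrete(s, pq))                      # Var(X+Y) = 3.4
print(var_discrete(x, p) + var_discrete(y, q))  # Var(X) + Var(Y) = 3.4
```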

$2$. The mean is the point from which the mean square deviation is minimized: $$ \begin{align} \sum_{i=1}^n(x_i-a)^2p_i &=\sum_{i=1}^n(x_i^2-2ax_i+a^2)p_i\\ &=\sum_{i=1}^n\left(x_i^2-2\bar{x}x_i+\bar{x}^2+(\bar{x}-a)(2x_i-\bar{x}-a)\right)p_i\\ &=\left(\sum_{i=1}^n(x_i-\bar{x})^2p_i\right)+(\bar{x}-a)^2\tag{2} \end{align} $$
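A quick numerical sketch of identity $(2)$, reusing small made-up values and weights (`msq` is my own helper):

```python
import numpy as np

x, p = np.array([-1.0, 0.0, 2.0]), np.array([0.2, 0.5, 0.3])
xbar = np.sum(x * p)                       # the mean, 0.4

def msq(a):
    """Weighted mean square deviation from the point a."""
    return np.sum((x - a) ** 2 * p)

for a in [xbar, 1.0, -2.0]:
    # Left and right sides of (2) agree; the minimum is at a = xbar.
    print(a, msq(a), msq(xbar) + (xbar - a) ** 2)
```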

Dividing by $\mathbf{n-1}$

Considering $(2)$, it can be seen that the mean square of a sample measured from the mean of the sample will be smaller than the mean square of the sample measured from the mean of the distribution. In this answer, this idea is quantified to show that $$ \mathrm{E}[v_s]=\frac{n{-}1}{n}v_d\tag{3} $$ where $\mathrm{E}[v_s]$ is the expected value of the sample variance and $v_d$ is the distribution variance. $(3)$ explains why we estimate the distribution variance as $$ v_d=\frac1{n-1}\sum_{i=1}^n(x_i-\bar{x})^2\tag{4} $$ where $\bar{x}$ is the sample mean.

robjohn
  • 345,667
0

Hint: You can measure the spread either by the absolute deviation $|x-\bar{x}|$ or by the squared differences. Take the sequence $-3, 0, 3$ as an example; its mean $\bar{x}$ is $0$. If you just summed $x-\bar{x}$ without taking absolute values, the spread would come out as $0$. The mean of the squared deviations avoids this situation and gives you an objective measure of spread (to bring the unit of spread back to that of the original measurements, you take its square root). As for dividing by $N-1$: it is the number of observations minus the number of degrees of freedom used up by estimation. Here you have already calculated $\bar{x}$, which is one estimated parameter, and hence you subtract one.
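A minimal sketch of this $-3, 0, 3$ example:

```python
import numpy as np

x = np.array([-3.0, 0.0, 3.0])
xbar = x.mean()                                          # 0.0

print(np.sum(x - xbar))                                  # 0.0: raw deviations cancel
print(np.sqrt(np.sum((x - xbar) ** 2) / (len(x) - 1)))   # 3.0: the sample SD
```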