Basic Question
Is there an intuitive explanation of standard deviation in terms of Euclidean distance in $n$-dimensional space?
Longer Version of Question
To begin a more detailed sketch of my question, for simplicity let's just focus on the simple case of a discrete random variable that is uniformly distributed. In this case, the variance is given by the following formula, which I've abducted straight from Wikipedia:
$$ \frac1{n}\sum_{i =1}^n (x_i - \mu)^2$$
where $\mu$ is the mean. The standard deviation is then the square root of this. Now, I can't help noticing that the square root of the sum alone (without the $\frac1n$ factor) is the Euclidean distance from the vector $X = (x_1, x_2, \dots, x_n)$ to the vector $\vec \mu = (\mu, \mu, \dots, \mu)$. That is, the standard deviation can be expressed as:
$$ \frac1{\sqrt{n}}|X - \vec \mu |$$
So I wonder, is there any significant conceptual relationship between this distance $|X - \vec \mu |$ and standard deviation, or is this just a coincidence?
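As a quick numerical sanity check of the identity above, here is a minimal Python sketch. The function name `std_via_distance` and the sample data are illustrative choices of mine, not from any source; the sketch compares the rescaled Euclidean distance against the standard library's population standard deviation, `statistics.pstdev`.

```python
import math
import statistics


def std_via_distance(xs):
    """Return |X - mu_vec| / sqrt(n), where mu_vec = (mu, ..., mu)."""
    n = len(xs)
    mu = sum(xs) / n
    # Euclidean distance from the data vector to the constant mean vector.
    distance = math.dist(xs, [mu] * n)
    return distance / math.sqrt(n)


# Hypothetical sample data, chosen only for illustration.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# The two values should agree up to floating-point rounding.
print(std_via_distance(xs), statistics.pstdev(xs))
```

Note that this uses the population form (dividing by $n$), matching the formula quoted above, rather than the sample form that divides by $n - 1$.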
Even More Details...
I have looked up many explanations of standard deviation and its cousin variance. Here are some that I've seen already, each sort of following from the previous one:
- We square the values before summing to get rid of the sign, which is obviously not important. This explanation is often criticised by hardcore statisticians and I can sort of see why: it doesn't explain why squaring beats taking the absolute value.
- We square the values so that we pay a greater price for greater deviations. This explains why squaring beats taking absolute values. But why not raise to the power of $4$, or $6$, or any other even power before summing? What is so special about $2$?
- The thing that is so special about $2$ is that it's the second moment of inertia, whereas the mean is the first moment, so mechanically it makes sense. I don't follow this. My intuition is totally OK with the mean: the point where, if I put my finger, the weights on either side will balance. But the second moment is harder for me to imagine physically like this.
Note, this is a question about intuition. I "understand" the mathematical formula at a shallow level: what all its terms mean, and how to calculate it given a dataset. But I am not comfortable with my grasp on why this formula is "the best" one to use in so many applications, e.g. the least squares method to fit data. I'm particularly confused as to why squaring is chosen as opposed to raising to some other even power, e.g. $9234324$.
And this is where my intuition steps in and tries to provide an explanation that goes right back to the fundamental theorem of Pythagoras: Euclidean distance. Here is my thought process: "The number $2$ is special. It's the unique power that makes Euclidean distance work. So maybe it's also the unique number that makes variance work." But then why the multiplying factor of $\frac1{\sqrt{n}}$? Is it simply a case of swallowing it up and accepting the definition, or can this intuition be resolved somehow?