In school, we are taught the following: suppose you collect some data $X : x_1, x_2, \dots, x_n$. Regardless of the underlying distribution of $X$, you can always quantify the Standard Deviation of $X$ as:
$$ SD(X) = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}$$
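For concreteness, here is a minimal sketch of the formula above in plain Python. The data values are made up for illustration. The standard-library function `statistics.pstdev` uses the same $1/n$ denominator, so it should agree with the hand-written version.

```python
import math
import statistics

def population_sd(xs):
    """Standard deviation with the 1/n denominator, as in the formula above."""
    n = len(xs)
    mean = sum(xs) / n
    # Average squared deviation from the sample mean, then take the square root.
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

# Illustrative data (not from any real experiment).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_sd(data))      # 2.0
print(statistics.pstdev(data))  # 2.0
```

Here the mean is 5, the squared deviations sum to 32, and $\sqrt{32/8} = 2$.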
I am interested in learning where this formula comes from. Informally, I can see that it measures how much each $x_i$ deviates from the average, but I would like to understand the mathematical justification for why, in theory, this formula can be applied to any data regardless of the underlying probability distribution.
After some reading on this topic, here is what I have come up with so far:
- Suppose you have a Random Variable $X$ with probability density $f(x)$, and a finite set of independent random observations $x_1, x_2, \dots, x_n$ drawn from this same $f(x)$. I am guessing (???) that, regardless of what $f(x)$ is, the following statement is true:
$$ E(X) = \int x \, f(x) \, dx \approx \frac{1}{n}\sum_{i=1}^{n} x_i $$
- And by extension, for the same $X$ and the same observations $x_1, x_2, \dots, x_n$, I am guessing (???) that, regardless of what $f(x)$ is, the following statement is also true:
$$ E(X^2) = \int x^2 \, f(x) \, dx \approx \frac{1}{n}\sum_{i=1}^{n} x_i^2 $$
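A quick simulation can illustrate (not prove) both guesses on a distribution that is clearly not normal. As an assumed example it draws from an Exponential distribution with rate 1, for which $E(X) = 1$ and $E(X^2) = 2$; the seed and sample size are arbitrary choices.

```python
import random

def sample_moments(xs):
    """Return the first and second sample moments: (1/n) sum x_i and (1/n) sum x_i^2."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    return m1, m2

rng = random.Random(0)  # fixed seed so the run is reproducible
# Assumed example: Exponential(rate=1), where E(X) = 1 and E(X^2) = 2.
xs = [rng.expovariate(1.0) for _ in range(200_000)]
m1, m2 = sample_moments(xs)
print(f"sample mean       {m1:.3f}  vs E(X)   = 1")
print(f"sample 2nd moment {m2:.3f}  vs E(X^2) = 2")
```

Nothing in `sample_moments` uses the density itself; it only sees the observations.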
As I understand it, these are useful results because they let you approximate quantities that depend on $f(x)$ without explicitly knowing what $f(x)$ is.
Using some basic algebra, I can see that:
$$ E(X^2) - (E(X))^2 \approx \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left[\frac{1}{n}\sum_{i=1}^{n} x_i\right]^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 $$
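The last equality is an exact algebraic identity for any numbers $x_1, \dots, x_n$, not an approximation. Writing $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and expanding the square:

$$ \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2\bar{x}\cdot\frac{1}{n}\sum_{i=1}^{n} x_i + \bar{x}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2\bar{x}^2 + \bar{x}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 $$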
And $E(X^2) - (E(X))^2$ is equal to the Variance of $X$, denoted $\operatorname{Var}(X)$ (whose definition is $E[(X - E(X))^2]$). Thus we can approximate the Standard Deviation of $X$, $\sqrt{\operatorname{Var}(X)}$, without relying on the underlying probability distribution $f(x)$ of $X$.
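The chain "sample moments, then variance, then square root" can be sketched end to end. This assumes, purely for illustration, draws from Uniform(0, 1), whose true standard deviation is $\sqrt{1/12} \approx 0.2887$. It compares the moment-based value with `statistics.pstdev` (the $1/n$ formula) and with that known value.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility
# Assumed example: Uniform(0, 1), whose true SD is sqrt(1/12).
xs = [rng.random() for _ in range(100_000)]
n = len(xs)

m1 = sum(xs) / n                 # estimates E(X)
m2 = sum(x * x for x in xs) / n  # estimates E(X^2)
sd_from_moments = math.sqrt(m2 - m1 ** 2)

print(sd_from_moments)
print(statistics.pstdev(xs))     # same quantity via the 1/n deviation formula
print(math.sqrt(1 / 12))         # true SD of Uniform(0, 1)
```

The first two printed values agree up to floating-point rounding, since they are the same quantity computed two ways; the third is what they are both estimating.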
Can someone please tell me if my analysis is correct?
Thanks!
PS: Can we use the Law of Large Numbers to prove that:
$$ \frac{1}{n} \sum_{i=1}^n x_i^k \xrightarrow[n \to \infty]{} \int_{\mathbb R} x^k \, f(x) \, \mathrm{d}x $$
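A small simulation can illustrate (though not prove) the convergence in the PS. The idea being tested is that $y_i = x_i^k$ are themselves independent draws, so their average behaves like any other sample mean, assuming $E(|X|^k)$ is finite. As illustrative choices it uses Uniform(0, 1), for which $\int x^k f(x)\,dx = \frac{1}{k+1}$, and $k = 3$.

```python
import random

def kth_sample_moment(xs, k):
    """(1/n) * sum of x_i**k, the left-hand side of the limit above."""
    return sum(x ** k for x in xs) / len(xs)

rng = random.Random(2)    # fixed seed for reproducibility
k = 3
true_value = 1 / (k + 1)  # E(X^k) for Uniform(0, 1), an assumed example

# The estimate should drift toward true_value as n grows.
for n in (10, 1_000, 100_000):
    xs = [rng.random() for _ in range(n)]
    est = kth_sample_moment(xs, k)
    print(f"n={n:>7}  estimate={est:.4f}  true={true_value:.4f}")
```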