I am trying to learn how to derive the formula for the "distribution of the sample variance" from first principles - regardless of the underlying probability distribution for $X$.

Part 1: In general, for some random variable $X$, we can write the variance of $X$ as:

$$Var(X) = E(X^2) - [E(X)]^2$$

where:

$$E(X) = \int x f(x) dx$$

Part 2: In general, we can write the formula for the "sample variance" for any random variable $X$ as:

$$S^2_x = \frac{\sum (X_i - \bar{x})^2}{n-1}$$

This means that we are now required to determine:

$$Var(S^2_x) = E(S^2_x) - [E(S^2_x)]^2$$

Part 3: I started with the $[E(S^2_x)]^2$ term, which requires $E(S^2_x)$.

First, I tried to re-arrange this expression prior to taking the Expected Value:

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1} \left( \sum x_i^2 + n\bar{x}^2 - 2\bar{x}\sum x_i \right) = \frac{1}{n-1} \left( \sum x_i^2 + n\bar{x}^2 - 2n\bar{x}^2 \right) = \frac{1}{n-1} \left( \sum x_i^2 - n\bar{x}^2 \right)$$
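As a quick sanity check on the rearrangement above, here is a minimal sketch that compares both sides of the identity $\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$ using exact rational arithmetic. The function names and the sample values are made up for illustration.

```python
from fractions import Fraction

def centered_sum_of_squares(xs):
    """Left-hand side: sum of squared deviations from the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs)

def shortcut_sum_of_squares(xs):
    """Right-hand side: sum of squares minus n times the squared mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum(x ** 2 for x in xs) - n * xbar ** 2

# Hypothetical sample; Fractions avoid floating-point rounding.
sample = [Fraction(v) for v in (2, 3, 5, 7, 11)]
lhs = centered_sum_of_squares(sample)
rhs = shortcut_sum_of_squares(sample)
print(lhs, rhs, lhs == rhs)
```

Dividing either side by $n-1$ gives the sample variance.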

Then, if we remember the following relationships:

$$\begin{align*} \operatorname{Var}(X_i) &= \mathbb{E}(X_i^2) - (\mathbb{E}(X_i))^2 \\ \sigma^2 &= \mathbb{E}(X_i^2) - \mu^2 \\ \mathbb{E}(X_i^2) &= \sigma^2 + \mu^2 \end{align*}$$

$$\begin{align*} \operatorname{Var}(\bar{x}) &= \mathbb{E}(\bar{x}^2) - (\mathbb{E}(\bar{x}))^2 \\ \frac{\sigma^2}{n} &= \mathbb{E}(\bar{x}^2) - \mu^2 \\ \mathbb{E}(\bar{x}^2) &= \frac{\sigma^2}{n} + \mu^2 \end{align*}$$

We can resume taking the Expected Value of $S^2$:

$$\begin{align*} \mathbb{E}(S^2) &= \frac{1}{n-1} \left( \mathbb{E}\left(\sum X_i^2\right) - n\mathbb{E}(\bar{x}^2) \right) \\ &= \frac{1}{n-1} \left( n(\sigma^2 + \mu^2) - n \left(\frac{\sigma^2}{n} + \mu^2\right) \right) \\ &= \frac{1}{n-1} (\sigma^2(n-1)) \\ &= \sigma^2 \end{align*}$$

Note that this is a well-known fact in probability theory: $S^2$ is an unbiased estimator of $\sigma^2$, i.e. $E(S^2) = \sigma^2$.
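To see that the unbiasedness result does not depend on normality, here is a small sketch that computes $E(S^2)$ exactly by enumerating every possible sample of size $n$ from a finite distribution. The distribution `pmf`, the sample size, and the helper names are assumptions chosen for illustration; the draws are assumed to be i.i.d.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs):
    """Sample variance with the n - 1 denominator."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def exact_expected_sample_variance(pmf, n):
    """E(S^2) over all n-tuples of i.i.d. draws from a finite pmf {value: prob}."""
    total = Fraction(0)
    for draws in product(pmf.items(), repeat=n):
        prob = Fraction(1)
        for _, p in draws:
            prob *= p  # independence: joint probability is the product
        values = [v for v, _ in draws]
        total += prob * sample_variance(values)
    return total

# Hypothetical skewed, non-normal distribution on {0, 1, 4}.
pmf = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 3), Fraction(4): Fraction(1, 6)}
mu = sum(v * p for v, p in pmf.items())
sigma2 = sum(v ** 2 * p for v, p in pmf.items()) - mu ** 2
expected_s2 = exact_expected_sample_variance(pmf, n=3)
print(expected_s2 == sigma2)
```

Because the arithmetic is exact, the comparison is an equality rather than an approximate Monte Carlo check.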

Part 4: Now I need to take the expected value of $(S^2_x)^2$; this is where I get stuck. I know I can expand the square as:

$$\begin{align*} (S^2_x)^2 &= \left(\frac{\sum (X_i - \bar{x})^2}{n-1}\right)^2 \\ &= \frac{1}{(n-1)^2} \left( \sum X_i^2 + n\bar{x}^2 - 2\bar{x}\sum X_i \right)^2\end{align*}$$

However, I am not sure how to take the Expected Values of the terms in the above expression.

Can someone please help me continue this derivation?

Thanks!

References:

  • I found the question "Variance of sample variance?", but the answers there do not show how to proceed with the approach I am currently using.
stats_noob
  • Probably a dupe; see here: https://math.stackexchange.com/questions/72975/variance-of-sample-variance – nicola Jun 06 '23 at 06:27
  • @ Nicola: thank you for your reply! I consulted that question that you linked. While similar content is discussed there - I don't think the approach I am using is shown. Thank you so much! – stats_noob Jun 06 '23 at 07:35
  • I think you are confusing the exponents. For example: $Var(S^2) = E\left[{(S^2)}^2\right] - \left(E\left[{S^2}\right]\right)^2$, not what you wrote in Part 2. Try using another symbol for the sample variance (e.g. $V$). – Renato Fernandes Jun 16 '23 at 22:27
  • A hint: use the following equivalence $\sum(X_i - \bar{x})^2 = \sum\left(X_i - \mu - (\bar{x} - \mu)\right)^2$ and expand the RHS of the expression. Please notice that $(\bar{x} - \mu)$ is a constant in the summation. – Renato Fernandes Jun 16 '23 at 22:47
  • @ Renato Fernandes: Thank you for your comments and pointing this out! If you have time, can you please show me how to proceed? – stats_noob Jun 17 '23 at 17:55
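
A sketch of how the hint in the comments can be carried through, assuming the $X_i$ are i.i.d. with mean $\mu$, variance $\sigma^2$ and finite fourth central moment. The symbols $Z_i = X_i - \mu$, $\bar{Z} = \bar{x} - \mu$ and $\mu_4 = \mathbb{E}(Z_i^4)$ are introduced here for illustration and do not appear in the original post.

$$\begin{align*} (n-1)S^2 &= \sum (Z_i - \bar{Z})^2 = \sum Z_i^2 - n\bar{Z}^2 \\ \mathbb{E}\left[\left(\sum Z_i^2\right)^2\right] &= n\mu_4 + n(n-1)\sigma^4 \\ \mathbb{E}\left[\left(\sum Z_i^2\right) n\bar{Z}^2\right] &= \mu_4 + (n-1)\sigma^4 \\ \mathbb{E}\left[\left(n\bar{Z}^2\right)^2\right] &= \frac{\mu_4}{n} + \frac{3(n-1)\sigma^4}{n} \end{align*}$$

Each line uses $\mathbb{E}(Z_i) = 0$ and independence, so any product containing a lone factor $Z_i$ has zero expectation. Combining the three terms as $\mathbb{E}[A^2] - 2\mathbb{E}[AB] + \mathbb{E}[B^2]$ with $A = \sum Z_i^2$ and $B = n\bar{Z}^2$:

$$\begin{align*} \mathbb{E}\left[(S^2)^2\right] &= \frac{\mu_4}{n} + \frac{(n^2 - 2n + 3)\sigma^4}{n(n-1)} \\ \operatorname{Var}(S^2) &= \mathbb{E}\left[(S^2)^2\right] - \sigma^4 = \frac{\mu_4}{n} - \frac{(n-3)\sigma^4}{n(n-1)} \end{align*}$$

As a consistency check, for a normal distribution $\mu_4 = 3\sigma^4$, which reduces the last line to $2\sigma^4/(n-1)$.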

1 Answer

This is a short algebraic domain transformation:

$$Y = (X - Ex[X])^2$$

$$\Pr[Y < y] = \Pr[(X - Ex[X])^2 < y] = \Pr[-\sqrt{y} < X - Ex[X] < \sqrt{y}] = \Pr[Ex[X] - \sqrt{y} < X < Ex[X] + \sqrt{y}]$$

$$Ex[Y] = \int_{-\infty}^{\infty} (x - Ex[X])^2 \, d\Pr[X < x]$$

etc.

Roland F
  • @ Roland F: thank you for your answer! Can you please show me how this logic can be applied to my question? Thanks! – stats_noob Jun 06 '23 at 14:29
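
For anyone checking a hand derivation, the widely quoted closed form for i.i.d. samples with finite fourth central moment $\mu_4$ is $Var(S^2) = \frac{\mu_4}{n} - \frac{(n-3)\sigma^4}{n(n-1)}$. The sketch below compares it against $E[(S^2)^2] - (E[S^2])^2$ computed by exhaustive enumeration. The distribution `pmf`, the sample size, and the helper names are assumptions made up for illustration.

```python
from fractions import Fraction
from itertools import product
from math import prod

def sample_variance(xs):
    """Sample variance with the n - 1 denominator."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def exact_moments_of_s2(pmf, n):
    """Return (E[S^2], E[(S^2)^2]) over all n-tuples of i.i.d. draws."""
    first = Fraction(0)
    second = Fraction(0)
    for draws in product(pmf.items(), repeat=n):
        prob = prod(p for _, p in draws)  # joint probability under independence
        s2 = sample_variance([v for v, _ in draws])
        first += prob * s2
        second += prob * s2 ** 2
    return first, second

# Hypothetical skewed, non-normal distribution on {0, 1, 4}.
pmf = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 3), Fraction(4): Fraction(1, 6)}
n = 4
mu = sum(v * p for v, p in pmf.items())
sigma2 = sum((v - mu) ** 2 * p for v, p in pmf.items())
mu4 = sum((v - mu) ** 4 * p for v, p in pmf.items())

e_s2, e_s2_squared = exact_moments_of_s2(pmf, n)
var_s2 = e_s2_squared - e_s2 ** 2
closed_form = mu4 / n - (n - 3) * sigma2 ** 2 / (n * (n - 1))
print(var_s2, closed_form, var_s2 == closed_form)
```

Exact fractions are used so the two values can be compared with `==`; the enumeration grows as (support size)^n, so this only suits small cases.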