I'm following a proof of simple linear regression as detailed in Chapter 24 of The Probability Lifesaver by Steven J. Miller. There's one step involving the variance that the author describes as "simple algebra", but I can't fill in the blanks:

$$ = N^2 \left( \frac{1}{N}\sum_{n=1}^N x_n^2 - \bar{x}^2 \right) $$

$$ = N^2 \left( \frac{1}{N}\sum_{n=1}^N (x_n - \bar{x})^2 \right) $$

It seems clear that this is an application of the theorem:

$$\operatorname{Var}(X) = E[(X-E[X])^2] = E[X^2] - (E[X])^2$$

However, I'm lost as to how to demonstrate this using only algebra and the properties of summations, as the author suggests. I tried looking at this answer for clarity but didn't find much.

1 Answer

We have
$$
\begin{aligned}
\frac{1}{N} \sum_{n=1}^N (x_n-\bar{x})^2 &= \frac{1}{N} \sum_{n=1}^N \left( x_n^2 - 2 \bar{x} x_n + \bar{x}^2 \right) \\
&= \frac{1}{N}\sum_{n=1}^N x_n^2 - \frac{2}{N}\bar{x} \left( \sum_{n=1}^N x_n \right) + \bar{x}^2 ,
\end{aligned}
$$
since $\frac{1}{N} \sum_{n=1}^N a = a$ for $a$ a constant (here $a = \bar{x}^2$). The key thing now is to recognise the definition of $\bar{x}$, namely
$$ \bar{x} = \frac{1}{N} \sum_{n=1}^N x_n . $$
That means the middle term is actually $-2\bar{x} \cdot \bar{x} = -2\bar{x}^2$, which combines with the final $+\bar{x}^2$ to leave $-\bar{x}^2$. Hence the sum becomes
$$ \frac{1}{N} \sum_{n=1}^N (x_n-\bar{x})^2 = \left( \frac{1}{N}\sum_{n=1}^N x_n^2 \right) - \bar{x}^2 = \frac{1}{N}\sum_{n=1}^N \left( x_n^2 - \bar{x}^2 \right) , $$
as required. Multiplying both sides by $N^2$ gives the step in the book.
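As a sanity check on the algebra, here is a minimal Python sketch that evaluates both forms of the variance on a small sample. The sample values and the helper names (`mean`, `variance_centered`, `variance_shortcut`) are made up for illustration and do not come from the book; exact fractions are used so the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean (1/N) * sum(values), computed exactly."""
    return sum(values, Fraction(0)) / len(values)


def variance_centered(xs):
    """(1/N) * sum((x_n - x_bar)^2): the 'centered' form."""
    x_bar = mean(xs)
    return mean([(x - x_bar) ** 2 for x in xs])


def variance_shortcut(xs):
    """(1/N) * sum(x_n^2) - x_bar^2: the 'shortcut' form."""
    x_bar = mean(xs)
    return mean([x ** 2 for x in xs]) - x_bar ** 2


# Hypothetical sample, chosen only so the numbers come out round.
sample = [Fraction(v) for v in (2, 4, 4, 4, 5, 5, 7, 9)]

centered = variance_centered(sample)
shortcut = variance_shortcut(sample)
print(centered, shortcut)  # prints "4 4"
```

For this sample $\bar{x} = 5$, the squared deviations sum to 32 and the squares sum to 232, so both forms give $32/8 = 29 - 25 = 4$. A numerical check like this does not replace the proof, but it is a quick way to catch a sign or exponent slip.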

Chappers