
Let $X_i$, $i=1,\dots,n$, be independent with mean $\mu$ and variance $\sigma^2$; for example, they could be i.i.d. normally distributed.

I wonder how to calculate $ \mathrm{E} \{[\sum_i (X_i - \bar{X})^2]^2\} $?

My first try:

$$ \mathrm{E} \{[\sum_i (X_i - \bar{X})^2]^2\} = \mathrm{Var} [\sum_i (X_i - \bar{X})^2] + \{\mathrm{E} [\sum_i (X_i - \bar{X})^2]\}^2 $$

I know $ \mathrm{E} [\sum_i (X_i - \bar{X})^2] = (n-1) \sigma^2, $ so I am tempted to find $\mathrm{Var} [\sum_i (X_i - \bar{X})^2]$, which seems to me harder than the original problem.

My second try: $$ \mathrm{E} \Big[\sum_i (X_i - \bar{X})^2\Big]^2 = \mathrm{E} \Big[\sum_i (X_i^2 - 2 X_i \bar{X} + \bar{X}^2)\Big]^2 = \mathrm{E} \Big[\sum_i X_i^2 - n \bar{X}^2\Big]^2, $$ using $\sum_i X_i = n \bar{X}$ in the last step. Unsure how to proceed.
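A quick NumPy simulation (a sketch added for illustration; the normal parent, seed, and constants are arbitrary choices) confirms the identity above and the mean $(n-1)\sigma^2$ quoted earlier:

```python
# Illustrative check (arbitrary constants): the two expressions for the sum of
# squared deviations agree pathwise, and their mean is close to (n-1)*sigma^2.
import numpy as np

rng = np.random.default_rng(1)
n, mu, sigma, reps = 6, 3.0, 1.5, 100_000
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)

p = ((x - xbar[:, None]) ** 2).sum(axis=1)
q = (x ** 2).sum(axis=1) - n * xbar ** 2
assert np.allclose(p, q)                # the identity holds pathwise
print(p.mean(), (n - 1) * sigma ** 2)   # both approximately 11.25
```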

Thanks.

Jonas
  • If you write down the exact definition of $\bar{X}$ it's not so bad after expanding out the product carefully. Also you probably meant to write covariance, not variance (and adjust the product accordingly). – Alex R. Sep 09 '14 at 16:19
  • Thanks, I was also stuck at the expanded expression before I tried the covariance. – Jonas Sep 09 '14 at 16:20
  • I'd have mentioned independence if that was intended, and if it wasn't, then that's also a huge omission. – Michael Hardy Sep 09 '14 at 18:05
  • @MichaelHardy: yes, they are independent. – Jonas Sep 09 '14 at 18:23

1 Answer


This is known as a moments-of-moments problem ... it is often convenient to express such problems in power-sum notation, namely $s_r = \sum_{i=1}^n X_i^r$. For your problem, express:

$$p = \sum_{i=1}^n (X_i - \bar{X})^2 = s_2 - \frac{s_1^2}{n}$$

Then, you seek $E[p^2]$, which is simply the $1^{st}$ RawMoment of $p^2$:

$$\mathrm{E}[p^2] \;=\; \frac{(n-1)\left[(n-1)\,\mu_4 + \left(n^2 - 2n + 3\right)\mu_2^2\right]}{n}$$

where:

  • RawMomentToCentral is a function from the mathStatica package for Mathematica, and
  • $\mu_i$ denotes the $i^{th}$ central moment of the population of $X$.

Note that the solution obtained is completely general and valid for any distribution whose moments exist ... not just for the Normal case.
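As an illustrative check of that generality (a Monte Carlo sketch, not mathStatica output; an $\mathrm{Exponential}(1)$ parent has central moments $\mu_2 = 1$ and $\mu_4 = 9$):

```python
# Monte Carlo sketch (illustrative): check the general formula against a
# skewed, non-normal parent. Exponential(1) has mu2 = 1 and mu4 = 9.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 5, 500_000
x = rng.exponential(scale=1.0, size=(reps, n))
p = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

mu2, mu4 = 1.0, 9.0
theory = (n - 1) * ((n - 1) * mu4 + (n ** 2 - 2 * n + 3) * mu2 ** 2) / n
print((p ** 2).mean(), theory)  # both approximately 43.2
```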

For the Normal case:

For your specific case, i.e. with a $N(\mu, \sigma^2)$ parent, $\mu_2 = \sigma^2$ and $\mu_4 = 3 \sigma^4$, so the general solution simplifies to:

$$\sigma^4 (n^2 - 1)$$
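A matching Monte Carlo sketch for the Normal case (again illustrative, with arbitrary constants):

```python
# Monte Carlo sketch (illustrative constants): for a N(mu, sigma^2) parent,
# E[p^2] should equal sigma^4 * (n^2 - 1).
import numpy as np

rng = np.random.default_rng(3)
n, sigma, reps = 5, 2.0, 500_000
x = rng.normal(1.0, sigma, size=(reps, n))
p = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
print((p ** 2).mean(), sigma ** 4 * (n ** 2 - 1))  # both approximately 384
```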

Notes

  1. As disclosure, I should add that I am one of the authors of the software used above.

  2. Jonas asks (comments below): "What is the theorem that the software is based on?"

The theorem is known as the fundamental expectation result, which expresses the expectation of an augmented symmetric function in terms of moments of the parent population. For more detail, see Stuart and Ord (1994, Section 12.5), or see Chapter 7 of our Springer book, Rose and Smith (2002, Section 7.4) ... a free download of which is available here:

http://www.mathstatica.com/book/bookcontents.html

It is possible to do such calculations by hand ... but it rapidly gets extremely tedious, and such problems are far more easily solved with a computer. In fact, using the software we found a number of errors in the tables in Stuart and Ord, as well as errors in solutions derived by Fisher (the famous one).
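For instance, here is a sketch of that routine algebra in SymPy (written here for illustration; it is not the mathStatica code): expand $p^2$ for a fixed $n$ and take expectations term by term. Independence factorizes each monomial's expectation, and since $p$ is invariant to location shifts one may set $\mathrm{E}[X_i] = 0$, so raw moments coincide with the central moments $\mu_r$:

```python
# Illustrative SymPy sketch (not mathStatica): expectation of a polynomial in
# independent, mean-zero X_i, applied to p^2 for a small fixed n.
import sympy as sp

n = 4  # any small fixed sample size
X = sp.symbols(f"x0:{n}")
mu2, mu3, mu4 = sp.symbols("mu2 mu3 mu4")
mom = {0: sp.Integer(1), 1: sp.Integer(0), 2: mu2, 3: mu3, 4: mu4}

xbar = sum(X) / n
p = sum((xi - xbar) ** 2 for xi in X)

def E(poly):
    """E[poly]: independence factorizes E over each monomial's variables."""
    total = sp.Integer(0)
    for term in sp.Add.make_args(sp.expand(poly)):
        val = sp.Integer(1)
        for base, exp in term.as_powers_dict().items():
            # X-symbols map to their moments; numeric factors pass through
            val *= mom[int(exp)] if base in X else base ** exp
        total += val
    return sp.expand(total)

general = (n - 1) * ((n - 1) * mu4 + (n ** 2 - 2 * n + 3) * mu2 ** 2) / n
print(sp.simplify(E(p ** 2) - general))  # -> 0
```

For any small fixed $n$, the difference between the term-by-term expectation and the general expression above simplifies to zero.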

wolfies
  • Thanks. What is the theorem that the software is based on? Is the derivation of the theorem simple? – Jonas Sep 09 '14 at 17:43
  • References are also appreciated. – Jonas Sep 09 '14 at 17:59
  • Nice question: in reply, please see note added above (in body) – wolfies Sep 09 '14 at 18:02
  • This answer seems to ignore the simple geometry involved. – Michael Hardy Sep 09 '14 at 18:17
  • It's a general approach to any such question: not just the simple case considered here, nor the special assumption of Normality. – wolfies Sep 09 '14 at 18:21
  • @MichaelHardy: as wolfies says, it seems the problem is distribution-free. – Jonas Sep 09 '14 at 18:27
  • wolfies: I am still looking for an analytical derivation for my problem. Thanks. – Jonas Sep 09 '14 at 18:27
  • Jonas, the very first comment to your question indicates how such derivations can be done. Your example, when fully expanded, is a quartic form in the data and therefore the expectation (because it's a linear operator) becomes a homogeneous polynomial (in a suitable sense) of the first four moments of $X_1$. Mathematica merely is doing that routine algebra under the hood. An algebraic theory has been developed; it is explained in great detail in Kendall & Stuart (5th Ed.). – whuber Sep 09 '14 at 19:33
  • @whuber: I can't find that book. Can you post a solution for that? Thanks. – Jonas Sep 09 '14 at 20:00
  • See Advanced Theory of Statistics, Volume I, chapters 3 ("Moments and Cumulants") and 12 ("Cumulants of Sampling Distributions--(2)"). – whuber Sep 09 '14 at 20:10
  • If the answer doesn't depend on the distribution other than through the first two moments, then it should be the same as the answer with a normal distribution wih those moments, and then you can use geometry including orthogonal projections. However, I've deleted my answer for now. I may do some further edits and then reinstate it. – Michael Hardy Sep 09 '14 at 20:26
  • @whuber: I can only find a book of the second edition (1945) by Kendall himself. It has Ch. 3 on Moments and Cumulants, but there is no chapter on "Cumulants of Sampling Distributions--(2)". I quickly looked through Ch. 3 and didn't find where moments of sample moments are treated, but I often miss things. Where in the book do you find it? – Jonas Sep 09 '14 at 21:28
  • @MichaelHardy: In your original post, why is the dof $n-1$ and not $n$? – Jonas Sep 09 '14 at 21:45
  • @Jonas : Could you post a separate question on that with a comment below it calling it to my attention? – Michael Hardy Sep 09 '14 at 23:59
  • @Jonas: The results are $(1/\sigma^2)\sum_{i=1}^n(X_i-\mu)^2\sim\chi^2_n$ and $(1/\sigma^2)\sum_{i=1}^n(X_i-\bar X)^2\sim\chi^2_{n-1}$. Can you see why you would expect the latter sum of squares to be smaller than the former (except when $\bar X=\mu$)? – Michael Hardy Sep 10 '14 at 00:00
  • @Michael Do you mean the latter sum of squares is bigger than the former instead, because the latter is "variance + squared bias", while the former is just variance? – Jonas Sep 10 '14 at 01:16
  • No. The former is bigger. – Michael Hardy Sep 10 '14 at 01:18
  • @Jonas: I wrote this earlier answer about this: http://math.stackexchange.com/questions/61251/intuitive-explanation-of-bessels-correction/61409#61409 – Michael Hardy Sep 10 '14 at 01:53
  • @Jonas : I am somewhat baffled as to what would make you think the latter sum is a variance plus the square of a bias. Normally such things would be functions of $\mu$ and $\sigma$, i.e. of the parameters that index this family of probability distributions, and not functions of the random variables. – Michael Hardy Sep 10 '14 at 01:55