4

What are the prerequisites to prove the central limit theorem? In my statistics textbook it is stated without a complete proof, so I guess I need more than calculus. However, do I need more than undergraduate real and complex analysis?

In what book can I find a complete proof of the central limit theorem?

Avatrin
  • 1,527

2 Answers

9

The proofs of simple versions of the central limit theorem (for instance, for a sample drawn iid from some distribution) use techniques involving characteristic functions or moment generating functions that can be justified with undergraduate real analysis. These proofs can be found in most books (for instance, Statistical Inference by Casella and Berger) or on Wikipedia. I'll do my best to adapt an argument below. For instance, consider the following central limit theorem, which can be proved using facts from undergraduate real analysis:

Theorem (CLT): Let $X_1, X_2,\ldots$ be a sequence of iid random variables whose moment generating functions exist in a neighborhood of $0$. Let $E(X_i)=\mu$ and $Var(X_i) =\sigma^2$. Define $\bar{X}_n=(1/n) \sum_{i=1}^n X_i$. Then the random variables $\sqrt{n}(\bar{X}_n-\mu)/\sigma$ converge in distribution to $N(0,1)$.
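As a quick numerical illustration of this statement (not part of the proof; the exponential distribution, sample size, and number of replications below are arbitrary choices), one can simulate $\sqrt{n}(\bar{X}_n-\mu)/\sigma$ and compare its empirical distribution with $N(0,1)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0          # mean and standard deviation of Exponential(1)
n, reps = 500, 20_000         # sample size and number of simulated samples

# Simulate reps samples of size n and standardize the sample means.
X = rng.exponential(scale=1.0, size=(reps, n))
Z = np.sqrt(n) * (X.mean(axis=1) - mu) / sigma

# The Kolmogorov-Smirnov distance to N(0,1) should be small for large n.
print(stats.kstest(Z, "norm"))
```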

The proof of this relies on the properties of moment generating functions, $M_X(t)=E(e^{tX}).$ These satisfy three properties that are important:

First: $$\frac{d^n}{dt^n}M_X(t)\Big|_{t=0}=E(X^n),$$ which can be seen by differentiating the Taylor expansion of $M_X$: $$M_X(t)=1+tE(X)+\frac{t^2 E(X^2)}{2!}+\ldots.$$
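For a concrete check of this first property (using, as an arbitrary example, an Exponential(1) variable, whose MGF is $1/(1-t)$ for $t<1$ and whose moments are $E(X^n)=n!$), one can differentiate symbolically:

```python
import sympy as sp

t = sp.symbols("t")
M = 1 / (1 - t)   # MGF of an Exponential(1) random variable, valid for t < 1

# The n-th derivative of the MGF at t = 0 should equal E(X^n) = n!.
for n in range(1, 5):
    print(n, sp.diff(M, t, n).subs(t, 0), sp.factorial(n))
```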

Second: If $M_{X_n}(t) \to M_X(t)$, then $X_n$ converges in distribution to $X$. Proving this is the most complicated and technical part of the argument for the above theorem. It can be proved with facts from undergraduate real analysis, but the proof is fairly technical.

Third: If two random variables $X_1$, $X_2$ are independent, then the moment generating function of the sum $X_1+X_2$ is the product of the moment generating functions of $X_1$ and $X_2$. This follows directly from independence and the definition of the MGF.
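A quick Monte Carlo illustration of this third property (a sketch only; the gamma distributions and the value of $t$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
t = 0.3
X1 = rng.gamma(shape=2.0, scale=1.0, size=1_000_000)  # independent of X2
X2 = rng.gamma(shape=3.0, scale=1.0, size=1_000_000)

# Estimate M_{X1+X2}(t) and M_{X1}(t) * M_{X2}(t); they should agree closely.
print(np.mean(np.exp(t * (X1 + X2))))
print(np.mean(np.exp(t * X1)) * np.mean(np.exp(t * X2)))
```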

Once we have these facts, we just need to show that the moment generating functions of the standardized sums converge to the moment generating function of a $N(0,1)$ random variable. Let $Y_i=(X_i-\mu)/\sigma$, so that $\sqrt{n}(\bar{X}_n-\mu)/\sigma=\sum_{i=1}^n Y_i/\sqrt{n}$ is the quantity whose limiting distribution we want. By the definition of the MGF and the third property (the $Y_i$ are iid), $$\begin{align*} M_{\sum_{i=1}^n Y_i/\sqrt{n}} (t) &=M_{\sum Y_i} (t/\sqrt{n})\\ &=M_{Y_1}(t/\sqrt{n})^n.\end{align*}$$ Taking the Taylor expansion of $M_{Y_1}$ (using the first property, with $E(Y_1)=0$ and $E(Y_1^2)=1$) and then letting $n\to \infty$ gives $$\lim_{n\to \infty} M_{Y_1}(t/\sqrt{n})^n=\lim_{n \to \infty} \left[1+\frac{t^2}{2n}+o(t^2/n)\right]^n = e^{t^2/2},$$ since $e^t=\lim_{n\to\infty}[1+t/n]^n$ and the error term vanishes in the limit. The right-hand side is the MGF of a $N(0,1)$ random variable, so the result follows from the second property.
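The limit can also be checked numerically (a sketch; the standardized Exponential(1) variable below, with MGF $M_Y(s)=e^{-s}/(1-s)$, is just a convenient example):

```python
import numpy as np

def M_Y(s):
    # MGF of Y = X - 1 with X ~ Exponential(1), i.e. the standardized variable;
    # valid for s < 1.
    return np.exp(-s) / (1 - s)

t = 1.0
for n in [10, 100, 1_000, 10_000]:
    print(n, M_Y(t / np.sqrt(n)) ** n)   # should approach e^{t^2/2}

print("limit", np.exp(t**2 / 2))
```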

With a small amount of complex analysis, we can drop the requirement that the moment generating functions exist by using characteristic functions instead, which do not have the existence issues moment generating functions have. There are many other central limit theorems that apply in more general settings and require more sophisticated techniques, probably a bit beyond undergraduate analysis.

Rina
  • 336
  • 1
  • 4
  • Can the above techniques be used to prove that any distribution tends to N(0,1)? –  May 19 '18 at 15:07
  • 1
    The theorem is not true for ANY distribution. The most typical counter-example is the Cauchy distribution. – olaphus Mar 28 '22 at 22:00
0

[This is a very heuristic explanation of the CLT, taken from Lecture 10 of Brad Osgood's lectures here.]

Let the random variables ${ X _1, X _2, \ldots }$ be i.i.d. with mean ${ \mu < \infty }$ and variance ${ \sigma ^2 < \infty }.$ Say ${ f(x) }$ is their common density.

We are interested in how the sums ${ S _n = X _1 + \ldots + X _n }$ are approximately distributed.
Note ${ S _n }$ has mean ${ n \mu }$ and variance ${ n \sigma ^2 }$ (which go to ${ \infty }$ as ${ n \to \infty }$), so to study convergence consider normalised sums ${ \frac{S _n - n\mu}{\sqrt{n} \sigma} }$ having mean ${ 0 }$ and variance ${ 1 }.$ It suffices to approximate the density of ${ \frac{S _n - n\mu}{\sqrt{n} \sigma} }.$

The normalised sums are $${ \frac{S _n - n \mu}{\sqrt{n} \sigma } = \frac{1}{\sqrt{n}} \sum _{i=1} ^{n} \left( \frac{X _i - \mu}{\sigma} \right) = \frac{1}{\sqrt{n}} \sum _{i=1} ^{n} Z _i ,}$$ where the ${ Z _i = \frac{X _i - \mu}{\sigma} }$ have mean ${ 0 },$ variance ${ 1 },$ and density ${ g(x) = \sigma f(\mu + \sigma x) }.$

Note the density of ${ \sum _{i=1} ^{n} Z _i }$ is the ${ n }$-fold convolution ${ g ^{*n} (x) }.$ So the density ${ f _n (x) }$ of ${ \frac{S _n - n\mu}{\sqrt{n} \sigma} = \frac{1}{\sqrt{n}} \sum _{i=1} ^{n} Z _i }$ is $${ f _n (x) = \sqrt{n} g ^{*n} (\sqrt{n} x) }.$$ Taking the Fourier transform, $${ (\mathcal{F} f _n) (s) = \sqrt{n} \frac{1}{\sqrt{n}} (\mathcal{F} g ^{*n}) \left(\frac{s}{\sqrt{n}}\right) ,}$$ that is, $${ F _n (s) = \left( G \left(\frac{s}{\sqrt{n}}\right) \right) ^n . }$$ Hence the Fourier transform is $${ \begin{align*} F _n (s) &= \left( \int _{-\infty} ^{\infty} e ^{- 2 \pi i (\frac{s}{\sqrt{n}}) x} g(x) \, dx \right) ^{n} \\ &\approx \left( \int _{-\infty} ^{\infty} \left( 1 - \frac{2 \pi i s x}{\sqrt{n}} - \frac{4 \pi ^2 s ^2 x ^2}{2 n} \right) g(x) \, dx \right) ^n \\ &= \left( 1 - \frac{2 \pi ^2 s ^2}{n} \right) ^{n} \\ &\approx e ^{- 2 \pi ^2 s ^2 } . \end{align*} }$$

Here we used that the ${ Z _i }$ have mean ${ 0 }$ and variance ${ 1 }$, so that ${ \int _{-\infty} ^{\infty} x g(x) \, dx = 0 }$ and ${ \int _{-\infty} ^{\infty} x ^2 g(x) \, dx = 1 }.$
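This convergence can be checked numerically (a sketch; the standardized Exponential(1) density, whose transform under the convention above is ${ G(s)=e^{2\pi i s}/(1+2\pi i s) }$, is an arbitrary choice):

```python
import numpy as np

def G(s):
    # Fourier transform (convention: integral of e^{-2*pi*i*s*x} g(x) dx) of the
    # standardized Exponential(1) density g(x) = e^{-(x+1)} for x >= -1.
    return np.exp(2j * np.pi * s) / (1 + 2j * np.pi * s)

s = 0.25
for n in [10, 100, 1_000, 10_000]:
    print(n, G(s / np.sqrt(n)) ** n)   # F_n(s); should approach e^{-2 pi^2 s^2}

print("limit", np.exp(-2 * np.pi**2 * s**2))
```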

Hence the required density ${ f _n (x) }$ is the inverse Fourier transform $${ f _n (x) \approx \int _{-\infty} ^{\infty} e ^{2 \pi i s x} e ^{- 2 \pi ^2 s ^2} \, ds .}$$


Computing the inverse Fourier transform:

Differentiating $${ f _n (x) \approx \int _{-\infty} ^{\infty} e ^{2 \pi i s x} e ^{- 2 \pi ^2 s ^2} \, ds }$$ (note that the RHS is independent of ${ n }$) under the integral sign gives $${ \begin{align*} f _n ' (x) &= \int _{-\infty} ^{\infty} (2 \pi i s) e ^{2 \pi i s x} e ^{- 2 \pi ^2 s ^2} \, ds \\ &= 2 \pi i \left( e ^{2 \pi i s x} \left(\frac{e ^{- 2 \pi ^2 s ^2}}{-4 \pi ^2}\right) \biggr\vert _{-\infty} ^{\infty} + \frac{1}{4 \pi ^2} \int _{-\infty} ^{\infty} e ^{- 2 \pi ^2 s ^2} (2 \pi i x) e ^{2 \pi i s x} \, ds \right) \\ &= 2 \pi i \left( \frac{ix}{2\pi} f _n (x) \right) \\ &= - x f _n (x), \end{align*} }$$ where the second line is integration by parts. Solving this differential equation gives $${ f _n (x) = A e ^{- \frac{x ^2}{2}} },$$ and since ${ f _n (x) }$ is a density, $${ f _n (x) = \frac{1}{\sqrt{2 \pi}} e ^{-\frac{x ^2}{2}}. }$$ So as ${ n \to \infty },$ the density of the normalised sums ${ \frac{S _n - n \mu}{\sqrt{n}\sigma} }$ is roughly ${ \frac{1}{\sqrt{2 \pi}} e ^{-\frac{x ^2}{2}} }.$
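As a last numerical sanity check (again just a sketch), one can approximate the inverse Fourier transform of ${ e^{-2\pi^2 s^2} }$ on a grid and compare it with the standard normal density:

```python
import numpy as np

# Riemann-sum approximation of the inverse Fourier transform of e^{-2 pi^2 s^2};
# it should reproduce the standard normal density (2 pi)^{-1/2} e^{-x^2/2}.
s, ds = np.linspace(-5, 5, 20_001, retstep=True)
for x in [0.0, 0.5, 1.0, 2.0]:
    integrand = np.exp(2j * np.pi * s * x) * np.exp(-2 * np.pi**2 * s**2)
    print(x, (integrand.real * ds).sum(), np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))
```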