How is this formula on Stirling's approximation derived?

Question

The following paragraph (equation 1.41) is from the book "Information Theory, Inference, and Learning Algorithms".

I don't quite understand how the first approximation in 1.41 is derived. Can anyone help?

It does not make much sense. They seem to use the De Moivre–Laplace theorem, which is more general than $(1.40)$. $(1.40)$ indeed follows quickly from the standard form of Stirling's formula. — Gary, Mar 15 '23 at 14:34
Does this answer your question? Understanding the idea behind Stirling's approximation — tryst with freedom, Jun 01 '23 at 14:01

score 1 · Answer 1 · answered Mar 15 '23 at 15:17

The sum is being approximated by an integral. In particular you need

$$ \sum_{r = -N/2}^{N/2} e^{-r^2/2\sigma^2} \approx \sqrt{2\pi} \sigma. $$

The sum is approximated by the integral

$$ \int_{-N/2}^{N/2} e^{-r^2/2\sigma^2} \: dr. $$

Now, $N/2$ is "large enough" that we can replace it by infinity without changing the integral too much. Why? We have $\sigma = \sqrt{N/4}$ as stated in the text. It follows that the integrand equals $e^{-N/2}$ at each of the endpoints $r = \pm N/2$ and is smaller beyond them, and $N$ is large.
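To spell out the endpoint value (just the arithmetic, using $\sigma^2 = N/4$ from the text), substitute $r = \pm N/2$ into the exponent:

$$ \frac{r^2}{2\sigma^2} \bigg|_{r = \pm N/2} = \frac{N^2/4}{2 \cdot N/4} = \frac{N}{2}, \qquad \text{so} \qquad e^{-r^2/2\sigma^2} \Big|_{r = \pm N/2} = e^{-N/2}. $$

For example, $N = 100$ already gives $e^{-50} \approx 2 \times 10^{-22}$, which is negligible next to the peak value $1$ at $r = 0$.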

So we can replace the bounds on the integral with $\pm \infty$ to get

$$ \int_{-\infty}^\infty e^{-r^2/2\sigma^2} \: dr$$

Then change variables, letting $u = r/(\sigma \sqrt{2})$, so that $dr = \sigma\sqrt{2} \: du$, to get

$$ \sigma\sqrt{2} \int_{-\infty}^\infty e^{-u^2} \: du $$

and finally use the fact that that integral is $\sqrt{\pi}$ to get the result. This is called the Gaussian integral and is usually proven by integrating $e^{-(x^2+y^2)}$ over the plane in both rectangular and polar coordinates.
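As a sanity check on the approximation $\sum_{r=-N/2}^{N/2} e^{-r^2/2\sigma^2} \approx \sqrt{2\pi}\,\sigma$, here is a minimal numerical sketch. It assumes $N$ is even and $\sigma^2 = N/4$ as in the text; the function names are mine and purely illustrative.

```python
import math


def gaussian_sum(N):
    """Sum of exp(-r^2 / (2 sigma^2)) over integers r = -N/2, ..., N/2.

    Assumes N is even and sigma^2 = N/4, as in the text.
    """
    sigma2 = N / 4.0
    half = N // 2
    return sum(math.exp(-r * r / (2.0 * sigma2)) for r in range(-half, half + 1))


def gaussian_integral_value(N):
    """Closed form sqrt(2 pi) * sigma of the integral over the whole real line."""
    sigma = math.sqrt(N / 4.0)
    return math.sqrt(2.0 * math.pi) * sigma


if __name__ == "__main__":
    for N in (10, 100, 1000):
        s = gaussian_sum(N)
        target = gaussian_integral_value(N)
        # The ratio should approach 1 as N grows.
        print(N, s, target, s / target)
```

The ratio is already close to 1 for small $N$ and gets closer as $N$ grows, which is consistent with the claim that extending the limits to $\pm\infty$ changes very little.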

I think the more difficult step is when the sum involving the exponentials is itself introduced. — Gary, Mar 15 '23 at 22:35
I think you're right - and the original post did ask about the "first approximation" — Michael Lugo, Mar 16 '23 at 15:25

score 0 · Answer 2 · answered Mar 15 '23 at 14:30

Here is my interpretation. It is not very formal, so take it as intuition rather than a proof.

Let $X_1, X_2, \dots$ be a sequence of i.i.d. Bernoulli random variables with probability $\frac12$. Then, by the CLT, the distribution of $A_N=\frac{X_1+\dots+X_N-N/2}{\sqrt{N}/2}$ is well approximated by that of a $\mathcal N(0,1)$ random variable when $N$ is large enough; let $A$ be that Gaussian random variable. Also observe that for $k=-N/2, \dots, N/2$, $\mathbb P[\sqrt N A_N /2=k]={N\choose k+N/2}2^{-N}$, so that, writing $\sigma^2 = N/4$,
\begin{align*}
{N\choose k+N/2}2^{-N} &= \mathbb P[\sqrt N A_N/2 = k]\\
&= \mathbb P[\sqrt N A_N/2 \leq k]-\mathbb P[\sqrt N A_N/2 \leq k-1]\\
&\approx \mathbb P[\sqrt N A/2 \leq k]-\mathbb P[\sqrt N A/2 \leq k-1]\\
&=\mathbb P\bigg[\sqrt N A/2 \in [k-1,k]\bigg]\\
&=\int_{\frac{2(k-1)}{\sqrt N}}^\frac{2k}{\sqrt N} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) dx\\
&\approx \frac{2k-2(k-1)}{\sqrt N}\cdot \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{4k^2}{2N}\right)\\
&=\frac{1}{\sqrt{2\pi \sigma^2}}\exp\left( -\frac{k^2}{2\sigma^2} \right)
\end{align*}

Setting $k = 0$, this indeed gives \begin{align*} {N\choose N/2} \approx \frac{1}{\sqrt{2\pi\sigma^2}} 2^N \end{align*}
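A quick numerical comparison can make this estimate of the central binomial coefficient concrete. This is only a sketch under the same assumptions as above (even $N$, $\sigma^2 = N/4$); the helper names are hypothetical.

```python
import math


def central_binomial(N):
    """Exact value of C(N, N/2) for even N."""
    return math.comb(N, N // 2)


def gaussian_estimate(N):
    """Estimate 2^N / sqrt(2 pi sigma^2) with sigma^2 = N/4 (assumed, as in the text)."""
    sigma2 = N / 4.0
    return 2.0 ** N / math.sqrt(2.0 * math.pi * sigma2)


if __name__ == "__main__":
    for N in (10, 100, 1000):
        exact = central_binomial(N)
        approx = gaussian_estimate(N)
        # The ratio exact / approx should tend to 1 as N grows.
        print(N, exact / approx)
```

The ratio stays slightly below 1 and approaches 1 as $N$ increases, so the Gaussian estimate mildly overshoots for small $N$.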

Now it almost feels as if the author used this approximation twice and then cancelled the sum over $k$ because it is the sum of a probability distribution, but I cannot really know what they had in mind.
