
How do we get the functional form for the entropy of a binomial distribution? Do we use Stirling's approximation?

According to Wikipedia, the entropy is:

$$\frac{1}{2} \log_2 \big( 2\pi e\, np(1-p) \big) + O \left( \frac{1}{n} \right)$$

Every attempt I have made so far has been futile, so I would be extremely appreciative if someone could guide me or provide some hints for the computation.
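For what it's worth, the target formula is easy to confirm numerically; a minimal sketch, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.stats import binom

n, p = 100, 0.3
pmf = binom.pmf(np.arange(n + 1), n, p)
# Exact entropy in bits: -sum_k p_k log2 p_k (terms with p_k = 0 are dropped).
H_exact = -np.sum(pmf[pmf > 0] * np.log2(pmf[pmf > 0]))
# The asymptotic formula, with the O(1/n) term dropped.
H_approx = 0.5 * np.log2(2 * np.pi * np.e * n * p * (1 - p))
print(H_exact, H_approx)  # the two agree to roughly two decimal places
```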

    A comment: the entropy of the normal distribution with variance $\sigma^2$ is ${1 \over 2} \log (2\pi e \sigma^2)$, which can be computed by a fairly straightforward integration. Perhaps using Stirling's approximation you can reduce the computation of the entropy of the binomial to this same integral plus some error terms. (I haven't actually tried to do this.) – Michael Lugo Nov 25 '12 at 19:51
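The normal-entropy formula quoted in this comment can be verified symbolically; a minimal sketch, assuming SymPy is available (it works in nats, i.e. with the natural logarithm):

```python
import sympy as sp

x = sp.symbols('x', real=True)
sigma = sp.symbols('sigma', positive=True)
# Density of the normal distribution with mean 0 and variance sigma**2.
f = sp.exp(-x**2 / (2 * sigma**2)) / (sp.sqrt(2 * sp.pi) * sigma)
logf = sp.expand_log(sp.log(f), force=True)
# Differential entropy in nats: -integral of f*log(f) over the real line.
H = sp.integrate(-f * logf, (x, -sp.oo, sp.oo))
print(sp.simplify(H))  # equivalent to log(2*pi*e*sigma**2)/2
```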

1 Answer


This answer follows roughly the suggestion of @MichaelLugo in the comments.

We are interested in the sum $$H = -\sum_{k=0}^n {n\choose k}p^k(1-p)^{n-k} \log_2\left[{n\choose k}p^k(1-p)^{n-k} \right].$$ For $n$ large we can use the de Moivre–Laplace theorem, $$H \simeq -\int_{-\infty}^\infty dx \, \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \log_2\left\{\frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \right\},$$ where $\mu = n p$ and $\sigma^2 = n p(1-p)$. Expanding the logarithm and using the normalization $E(1) = 1$ and the second central moment $E\left((X-\mu)^2\right) = \sigma^2$ of the normal distribution, $$\begin{eqnarray*} H &\simeq& \int_{-\infty}^\infty dx \, \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \left[\log_2(\sqrt{2\pi}\sigma) + \frac{(x-\mu)^2}{2\sigma^2} \log_2 e \right] \\ &=& \log_2(\sqrt{2\pi}\sigma) + \frac{\sigma^2}{2\sigma^2} \log_2 e \\ &=& \frac{1}{2} \log_2 (2\pi e\sigma^2), \end{eqnarray*}$$ and so $$H \simeq \frac{1}{2} \log_2 \left[2\pi e n p(1-p)\right].$$ Higher-order terms can be found, essentially by deriving a more careful (and less simple) version of de Moivre–Laplace.
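To see the advertised $O(1/n)$ size of the neglected terms, one can compare the exact sum with the leading-order approximation as $n$ grows; a quick numerical sketch, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import binom

p = 0.3
for n in [50, 100, 200, 400, 800]:
    pmf = binom.pmf(np.arange(n + 1), n, p)
    H_exact = -np.sum(pmf[pmf > 0] * np.log2(pmf[pmf > 0]))
    H_approx = 0.5 * np.log2(2 * np.pi * np.e * n * p * (1 - p))
    # If the error is O(1/n), then n*(H_exact - H_approx) should stay
    # roughly constant as n doubles.
    print(n, H_exact - H_approx, n * (H_exact - H_approx))
```

The scaled difference settles to a roughly constant value, consistent with an $O(1/n)$ correction.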

  • How is the integration done in the second to last step? – Thomas Ahle Jul 15 '14 at 18:55
  • @ThomasAhle: Here we are asking for the zeroth and second central moments of the normal distribution. But $E(1) = 1$ and $E((X-\mu)^2) = \sigma^2$. If you are more interested in tricks for evaluating Gaussian integrals, see here, for example. Cheers! – user26872 Jul 15 '14 at 20:14
  • @user26872: Is there any analogue for the multinomial distribution? Thanks in advance! – Egorova Lena Jan 14 '16 at 13:28
  • @EgorovaLena: There is a generalized de Moivre–Laplace theorem that should be useful in this regard. See these notes, for example. (A numerical sketch along these lines appears after these comments.) – user26872 Jan 14 '16 at 19:58
  • Why is the "$dx$" before the term being integrated? I know that it is a multiplication, but doesn't convention put the increment of integration at the far right side of the integral? $\int \sin(x) \, dx$ vs. $\int dx \, \sin(x)$. Is there a reason for the difference? – EngrStudent Mar 09 '16 at 15:54
  • @EngrStudent: I tend to think about integration as an operation that acts to the right, like differentiation, rather than from outside to inside as it is often written: $\int_{y_1}^{y_2} dy \int_{x_1}^{x_2} dx \, (\ldots)$ rather than $\int_{y_1}^{y_2} \int_{x_1}^{x_2} (\ldots) \, dx \, dy$. This is probably due to my background in physics, where this notation is common. – user26872 Mar 11 '16 at 00:11
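Regarding the multinomial question above: extrapolating the answer's method (an assumption here, not something derived in this thread) by replacing the normal density with a multivariate normal whose covariance determinant is $n^{k-1} p_1 \cdots p_k$ suggests $H \approx \frac{1}{2} \log_2 \left[ (2\pi e n)^{k-1} p_1 \cdots p_k \right]$ for $k$ categories. A quick numerical sketch for a trinomial, assuming SciPy:

```python
import numpy as np
from scipy.stats import multinomial

n, p = 60, [0.2, 0.3, 0.5]
# Exact entropy in bits, summed over all outcomes (k1, k2, k3) with k1+k2+k3 = n.
H_exact = 0.0
for k1 in range(n + 1):
    for k2 in range(n - k1 + 1):
        q = multinomial.pmf([k1, k2, n - k1 - k2], n, p)
        if q > 0:
            H_exact -= q * np.log2(q)
# Conjectured analogue of the binomial formula, with k = 3 categories.
H_approx = 0.5 * np.log2((2 * np.pi * np.e * n) ** 2 * np.prod(p))
print(H_exact, H_approx)  # the two are close, as the heuristic predicts
```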