Use De Moivre–Laplace to approximate $1 - \sum_{k=0}^{n} {n \choose k} p^{k}(1-p)^{n-k} \log\left(1+\left(\frac{p}{1-p}\right)^{n-2k}\right)$

Question

I am trying to use De Moivre–Laplace theorem to approximate $$1 - \sum_{k=0}^{n} {n \choose k} p^{k}(1-p)^{n-k} \log\left(1+\left(\frac{p}{1-p}\right)^{n-2k}\right)$$

The point of the approximation is to avoid the sum, which is expensive to compute when $n$ is large.

The De Moivre–Laplace theorem gives $${n \choose k} p^{k}(1-p)^{n-k} \approx \frac{1}{\sqrt{2 \pi np(1-p)}}e^{-\frac{(k-np)^2}{2np(1-p)}}.$$ Replacing the sum by an integral then gives, with $\log$ taken to base 2, \begin{align} F &= 1 - \sum_{k=0}^{n} {n \choose k} p^{k}(1-p)^{n-k} \log\left(1+\left(\frac{p}{1-p}\right)^{n-2k}\right) \\&\approx 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi np(1-p)}}e^{-\frac{(x-np)^2}{2np(1-p)}}\log\left(1+\left(\frac{p}{1-p}\right)^{n-2x}\right) dx. \end{align}
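To see how closely the integral tracks the sum, both can be evaluated numerically. The following is a minimal sketch, assuming a base-2 logarithm (as in the integral) and a plain trapezoidal rule over $np \pm 8\sigma$; the function names `exact_F` and `gaussian_F` and the grid parameters are illustrative choices, not part of the question.

```python
import math

LN2 = math.log(2.0)

def softplus(t):
    """Numerically stable natural log of (1 + e^t)."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def exact_F(n, p):
    """F = 1 - sum_k C(n,k) p^k (1-p)^(n-k) log2(1 + (p/(1-p))^(n-2k))."""
    a = math.log(p / (1.0 - p))
    total = 0.0
    for k in range(n + 1):
        # Binomial pmf computed in log space to avoid overflow for large n.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        total += math.exp(log_pmf) * softplus((n - 2 * k) * a) / LN2
    return 1.0 - total

def gaussian_F(n, p, steps=4000, width=8.0):
    """Same quantity with the pmf replaced by the normal density (trapezoidal rule)."""
    mu = n * p
    var = n * p * (1.0 - p)
    sigma = math.sqrt(var)
    a = math.log(p / (1.0 - p))
    lo = mu - width * sigma
    h = 2.0 * width * sigma / steps
    total = 0.0
    for j in range(steps + 1):
        x = lo + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        density = math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        total += weight * density * softplus((n - 2.0 * x) * a) / LN2
    return 1.0 - h * total

if __name__ == "__main__":
    for p in (0.3, 0.45, 0.5):
        print(p, exact_F(100, p), gaussian_F(100, p))
```

Writing $\log(1+e^{t})$ through `softplus` keeps the large powers of $p/(1-p)$ from overflowing, so the same code can be reused for large $n$.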

My calculation is inspired by the question "Entropy of a binomial distribution".

If anyone has another suggestion for approximating $F$, or a closed form for it, I would like to hear it. So far I have tried approximating $F$ with a least-squares fit using a tanh function as the fit function.

Thanks for your reply, I'll edit it right away. Any ideas on how to tackle this problem? — Kees Til, Aug 04 '18 at 20:49
I didn't get it. There is something seriously wrong here. The sum formula is eventually independent of $k$ but dependent on $p$ and the final integral is independent of $p$ but dependent on $k$. What does $k$ mean after the integration? Sounds strange, right? — Seyhmus Güngören, Aug 04 '18 at 21:06
I guess $k$ is eventually $x$ and the integration over $p$ is probably a typo. I do not get the limit $-\infty..+\infty$ though when originally we have $0..n$. — Diger, Aug 04 '18 at 21:12
@Diger, my proof is inspired by this question https://math.stackexchange.com/questions/244455/entropy-of-a-binomial-distribution.
The reason I take the integral to $-\infty$ is to cover the whole normal distribution. Plotting this function shows that it can take nonzero values on the negative axis. — Kees Til, Aug 04 '18 at 21:16
@SeyhmusGüngören you were right, it was a typo; it is supposed to be a $dx$. I take the integral from $-\infty$ to $\infty$ since we look at the normal distribution, which can take negative values on the x-axis. — Kees Til, Aug 04 '18 at 21:25
I cannot completely rule out, but it seems to me that it is not possible to get a closed form solution. Just take $p=0.5$, then you will get just the integral over a Gaussian function. In other cases, you are integrating a function which is Gaussian multiplied by some other function. You can easily solve this equation numerically. Why do you need it? — Seyhmus Güngören, Aug 04 '18 at 21:41
@SeyhmusGüngören I am calculating the mutual information within some system. The idea is that if I have some source node $x_0$ and look at the mutual information with some set of nodes at distance $d$, I can use this as a measure of the dynamical impact of this source node. The problem is that when I calculate these for big systems my computer is really slow, therefore I want an approximation or closed form of this mutual information. — Kees Til, Aug 04 '18 at 21:46
Actually it doesn't matter. So for $p<0.5$ use the approximation $\log(1+y)\approx y$ and for large $y$, we have $\log(1+y)\approx \log(y)$. One more thing: you have $y=a^{f(x)}$ and you can write this as $\exp(f(x)\log a)$, so in the integral you have the product of two exponentials. I think these forms are well studied. — Seyhmus Güngören, Aug 04 '18 at 22:07
I tried this as well, however it worked badly as a fit. Nice thinking though :D — Kees Til, Aug 04 '18 at 22:18
take $p$ small enough and $n$ large enough. I think it should work. Btw. It will be very good, if you could put the results that you already have to the question description. Plus: the answer of wolfram.alpha for the evaluation of this integral, just the link. — Seyhmus Güngören, Aug 04 '18 at 22:28
$n$ needs to grow large as i used the de Moivre–Laplace theorem. — Kees Til, Aug 05 '18 at 09:35

Yuri Negometyanov · Answer 1 · score 5 · 2018-08-12T17:19:52.170

$$\color{brown}{\textbf{Transformations}}$$

Without loss of generality, assume that $$q=\dfrac p{1-p}\in(0,1)\tag1$$ holds. Otherwise, the roles of the two complementary events (that is, of $p$ and $1-p$) can be swapped.

This allows the expression in question to be written in the form \begin{align} &S(n,p)=1 - (1-p)^n\sum_{k=0}^{n} {n \choose k} q^k\log\left(1+q^{n-2k}\right),\tag2\\[4pt] \end{align} or, using $\sum_{k=0}^{n}{n \choose k}q^{n-k}=(1+q)^n$ and $(1-p)(1+q)=1$, \begin{align} S(n,p)&=1 - (1-p)^n\sum_{k=0}^{n} {n \choose k}q^kq^{n-2k} - (1-p)^n\sum_{k=0}^{n} {n \choose k}q^k\left(\log\left(1+q^{n-2k}\right)-q^{n-2k}\right)\\[4pt] &=1 - (1-p)^n(1+q)^n - (1-p)^n\sum_{k=0}^{n} {n \choose k}q^k\left(\log\left(1+q^{n-2k}\right)-q^{n-2k}\right)\\[4pt] &= - (1-p)^n\sum_{k=0}^{n} {n \choose k}q^k\left(\log\left(1+q^{n-2k}\right)-q^{n-2k}\right).\tag3 \end{align} Formula $(3)$ can simplify the calculations, because it avoids subtracting two nearly equal values.
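The step from $(2)$ to $(3)$ can be checked numerically for small $n$. This is a minimal sketch, assuming the natural logarithm and direct floating-point evaluation (fine for small $n$ only); the function names `S_direct` and `S_rearranged` are hypothetical.

```python
import math

def S_direct(n, p):
    """Formula (2): 1 - (1-p)^n * sum_k C(n,k) q^k log(1 + q^(n-2k))."""
    q = p / (1.0 - p)
    s = sum(math.comb(n, k) * q**k * math.log1p(q ** (n - 2 * k))
            for k in range(n + 1))
    return 1.0 - (1.0 - p) ** n * s

def S_rearranged(n, p):
    """Formula (3): -(1-p)^n * sum_k C(n,k) q^k (log(1 + x) - x), x = q^(n-2k)."""
    q = p / (1.0 - p)
    s = 0.0
    for k in range(n + 1):
        x = q ** (n - 2 * k)
        s += math.comb(n, k) * q**k * (math.log1p(x) - x)
    return -(1.0 - p) ** n * s

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.45):
        print(p, S_direct(12, p), S_rearranged(12, p))
```

For large $n$ the powers of $q$ would need to be handled in log space; the sketch only illustrates that the two forms agree.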

$$\color{brown}{\textbf{How to calculate this.}}$$

Note that the sum in $(3)$ contains both positive and negative powers of $q.$ This means that as $n\to \infty$ the sum contains terms of very different scales.

The calculations in formula $(3)$ can be divided into two parts.

$\color{green}{\textbf{The Maclaurin series.}}$

The Maclaurin series for the logarithmic part converges when $\mathbf{\color{blue}{q^{n-2k} < 1}}.$ This corresponds to the values $k<\frac n2$ in the case $\mathbf{q<1}$ and to the values $k>\frac n2$ in the case $\mathbf{q>1}.$ Then the Maclaurin series in the form $$\log(1+q^{n-2k}) = \sum_{i=1}^\infty\frac{(-1)^{i+1}}{i}q^{(n-2k)i}\tag4$$ can be used.

If $\mathbf{\color{blue}{q^{n-2k} > 1}},$ then $$\log(1+q^{n-2k}) = \log(q^{n-2k}(1+q^{2k-n})) = (n-2k)\log q + \log(1+q^{2k-n}).\tag5$$

If $\mathbf{\color{blue}{q^{n-2k} = 1}},$ then the left-hand side of $(4)$ equals $\log2.$

If $\mathbf{\color{blue}{q^{n-2k} \lesssim 1}},$ then $$\log(1+q^{n-2k}) = \log\frac{1+r}{1-r} = 2r\sum_{i=0}^\infty\frac{r^{2i}}{2i+1},\quad \text{ where } r=\frac{q^{n-2k}}{2+q^{n-2k}}\approx\frac{q^{n-2k}}3,\tag6$$ and a few terms of the series suffice.
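The difference between the two expansions near $q^{n-2k}=1$ can be illustrated numerically. This is a minimal sketch, writing $x$ for $q^{n-2k}$ and using the standard identity $\log(1+x)=2\operatorname{artanh}\frac{x}{2+x}$, whose series has only positive odd-power terms; the function names are illustrative.

```python
import math

def log1p_maclaurin(x, terms):
    """Partial sum of log(1+x) = sum_{i>=1} (-1)^(i+1) x^i / i, valid for |x| < 1."""
    return sum((-1) ** (i + 1) * x**i / i for i in range(1, terms + 1))

def log1p_artanh(x, terms):
    """Partial sum of log(1+x) = 2 * sum_{i>=0} r^(2i+1) / (2i+1), r = x / (2 + x)."""
    r = x / (2.0 + x)
    return 2.0 * sum(r ** (2 * i + 1) / (2 * i + 1) for i in range(terms))

if __name__ == "__main__":
    x = 0.9  # close to 1, where the Maclaurin series converges slowly
    exact = math.log1p(x)
    for terms in (3, 5, 10):
        print(terms,
              abs(log1p_maclaurin(x, terms) - exact),
              abs(log1p_artanh(x, terms) - exact))
```

Since $r\approx x/3$ near $x=1$, each extra term of the second series shrinks the error by roughly a factor of nine, while the Maclaurin terms decay only like $x^i/i$.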

$\color{green}{\textbf{The double summations.}}$

After substituting $(4)$ or $(5)$ into $(3)$, the sums can be rearranged. For example, $$\sum_{k=0}^{L}{n \choose k}q^k\sum_{i=1}^\infty\frac{(-1)^{i+1}}{i}q^{(n-2k)i}= \sum_{i=1}^\infty\frac{(-1)^{i+1}}{i}\sum_{k=0}^{L}{n \choose k}q^kq^{(n-2k)i}$$ $$= \sum_{i=1}^\infty\frac{(-1)^{i+1}}{i}q^{ni}\sum_{k=0}^{L}{n \choose k}\left(q^{1-2i}\right)^{k},$$ where the order of summation can be chosen to suit the given data.
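The interchange of the two sums can be checked on a small case. This is a minimal sketch, assuming $L<\frac n2$ and $q<1$ so that every $q^{n-2k}$ in the partial sum is below 1, and using the exponent $(n-2k)i$ that results from expanding $\log(1+q^{n-2k})$; the function names and the truncation at 40 terms are illustrative.

```python
import math

def partial_direct(n, q, L):
    """sum_{k=0}^{L} C(n,k) q^k log(1 + q^(n-2k)), evaluated directly."""
    return sum(math.comb(n, k) * q**k * math.log1p(q ** (n - 2 * k))
               for k in range(L + 1))

def partial_rearranged(n, q, L, terms=40):
    """Same partial sum with the series index i outermost.

    Uses q^k * q^((n-2k) i) = q^(n i) * (q^(1-2i))^k.
    """
    total = 0.0
    for i in range(1, terms + 1):
        base = q ** (1 - 2 * i)
        inner = sum(math.comb(n, k) * base**k for k in range(L + 1))
        total += (-1) ** (i + 1) / i * q ** (n * i) * inner
    return total

if __name__ == "__main__":
    print(partial_direct(10, 0.5, 3), partial_rearranged(10, 0.5, 3))
```

The inner sum over $k$ is a truncated binomial sum with base $q^{1-2i}$, which grows quickly with $i$, so for large $n$ the factors are better combined in log space.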

edited Aug 12 '18 at 17:19

answered Aug 11 '18 at 00:07

hmmmm, i am quite curious now. Does the limit converge to 0 as $n \rightarrow \infty?$ – Kees Til Aug 11 '18 at 00:15
@KeesTil If $k<\frac n2,$ then we have suitable Maclaurin series. If $k>\frac n2,$ then the factor $(1-p)^n$ provides the convergence. – Yuri Negometyanov Aug 11 '18 at 00:37
i dont see how p/(1-p) is always in (0,1) here. $0.9/.1 = 9 \notin (0,1)$ – Kees Til Aug 11 '18 at 18:10
@KeesTil Yes, that's valid. Fortunately, $p$ and $1-p$ are the probabilities of complementary events, whose labels can be swapped for this task. – Yuri Negometyanov Aug 11 '18 at 19:10
True, I was looking into this but I don't see how I can make an approximation of $S(n)$ where we don't have the sum term :( – Kees Til Aug 11 '18 at 19:18
@KeesTil I see that $S(p)=S(1-p).$ Please check this. Thanks for the useful comments. – Yuri Negometyanov Aug 11 '18 at 19:29
i can try using rvvs solution with your expression but the Taylor series does not look pretty. – Kees Til Aug 11 '18 at 20:02
See the updated version. – Yuri Negometyanov Aug 12 '18 at 08:51
I don't get the last sentence where you say that the inner sums can be calculated via the de Moivre–Laplace theorem. How can we do this, given that $q+1 \neq 1$? I use this definition btw: https://en.wikipedia.org/wiki/De_Moivre%E2%80%93Laplace_theorem – Kees Til Aug 12 '18 at 10:33
@KeesTil You are right. Thanks. Fixed. – Yuri Negometyanov Aug 12 '18 at 12:48
nice, i think the blue terms should be $q^{2n-k}$ not $q^{n-2k}$? – Kees Til Aug 12 '18 at 16:53
@KeesTil, Of course :) – Yuri Negometyanov Aug 12 '18 at 17:18

score 4 · Answer 2 · answered Aug 07 '18 at 15:41

The expression looks very much like the Bernstein approximation of a function ($1-f(x)$) on $[0,1]$. But the argument of the $\log$ function (specifically the exponent $n-2k$) ruins everything.

Here is a quick idea. Denote $$ y(p)=\sum_{k=0}^{n} {n \choose k} p^{k}(1-p)^{n-k} \log\left(1+\left(\frac{p}{1-p}\right)^{n-2k}\right). $$

Let us assume that we can represent $y(p)$ in the form $y(p)=\sum_{m=0}^\infty y_m p^m$, where $y_m$ are constants not depending on $p$.

Note that $y(p)=y(1-p)$. Let us consider the equation $$ y(p)+y(1-p)=f(p). \tag{eq1}\label{eq1} $$ Although we can write out the expression for $f(p)$, let us pretend that we don't know what $f(p)$ looks like. But for sure, $f(p)$ must satisfy $f(p)=f(1-p)$. It is known (see for example http://eqworld.ipmnet.ru/en/solutions/fe/fe1116.pdf) that equations like \eqref{eq1} have a solution, for example, $$ \tag{eq2}\label{eq2} y(p)=f(p) \sin^2({\pi p \over 2}). $$ By expanding $\sin^2({\pi p \over 2})$ into the Maclaurin series we get $$ y(p)=f(p) \sum_{m=1}^\infty {(-1)^{m+1} 2^{2m-1} \over (2m)!} {p^{2m} \pi^{2m} \over 2^{2m}}. $$
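As a quick check that this particular form does satisfy the functional equation, one can substitute it directly, using only $f(1-p)=f(p)$ and $\sin\left(\frac\pi2-\theta\right)=\cos\theta$ (a worked step, not part of the original answer):

$$\begin{aligned} y(p)+y(1-p) &= f(p)\sin^2\frac{\pi p}{2} + f(1-p)\sin^2\frac{\pi(1-p)}{2} \\ &= f(p)\sin^2\frac{\pi p}{2} + f(p)\cos^2\frac{\pi p}{2} \\ &= f(p). \end{aligned}$$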

Let us assume that $f(p)$ is an analytic function i.e. $f(p)=\sum_{m=0}^\infty {f^{(m)}(0)\over m!} p^m$. By writing \eqref{eq2} in the series form we have: $$ \sum_{m=0}^\infty y_m p^m = \left ( \sum_{m=0}^\infty {f^{(m)}(0)\over m!} p^m \right ) \left ( \sum_{m=1}^\infty {(-1)^{m+1} \over (2m)!} {p^{2m} \pi^{2m} \over 2} \right ). $$

From this relation it may be possible to find the expressions for $f^{(m)}(0)$ through $y_m$ by equating the coefficients of $p^m$. If this works out, we go back to the right-hand side of \eqref{eq2} and try to find how many terms in the product $$ \left ( f^{(0)}(0) + f^{(1)}(0) p + f^{(2)}(0) {p^2 \over 2} + \dots \right ) \left ( \sum_{m=1}^\infty {(-1)^{m+1} \over (2m)!} {p^{2m} \pi^{2m} \over 2} \right ) $$

yield the approximate value.

It is impossible (at least I don't see how) to use Bernstein approximation because of the (n-2k) degree. But the idea above is the one you can try. — rrv, Aug 08 '18 at 05:21
I tried this method, however my terms were very complicated and I could not see anything that simplified. My Mathematica code:
A = FullSimplify[ Series[Sum[ Binomial[n, k]*p^k (1 - p)^(n - k) Log[2, 1 + (p/(1 - p))^(n - 2 k)], {k, 0, n}] , {p, 0, 2}]]

B = FullSimplify[ Series[Sum[((-1)^(m + 1))/(2 m)! (p^(2 m) Pi^(2 m)/2), {m, 1, Infinity}], {p, 0, 2}]] — Kees Til, Aug 08 '18 at 09:30

Maxim · Answer 3 · score 0 · 2018-08-11T21:33:08.130

We can apply a method similar to this. Since the summand has a sharp peak around $k = n/2$, we can take an expansion valid for large $n$ and for $k$ close to $n/2$ and then, also due to the tails being small, extend the summation range indefinitely:

$$a_k = \binom n k p^k q^{n - k} \ln \left( 1 + \left( \frac p q \right)^{n - 2 k} \right), \quad q = 1 - p, \\ a_{n/2 + i} \sim \sqrt {\frac 2 {\pi n}} \left( 2 \sqrt {p q} \right)^n \left( \frac p q \right)^i \ln \left( 1 + \left( \frac q p \right) ^{2 i} \right), \\ \sum_{k = 0}^n a_k \sim \sqrt {\frac 2 {\pi n}} \left( 2 \sqrt {p q} \right)^n \sum_{i = -\infty}^\infty \left( \frac p q \right)^i \ln \left( 1 + \left( \frac q p \right) ^{2 i} \right), \\ n \to \infty, p \text{ fixed}, 0 < p < 1, p \neq \frac 1 2.$$
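The asymptotic formula can be compared with the exact sum numerically. This is a minimal sketch, assuming even $n$ (so that $k=n/2+i$ runs over integers), the natural logarithm, and a truncation of the infinite sum at $|i|\le 200$, which is adequate for moderate $p$ but may overflow for $p$ very close to 0 or 1; the function names are illustrative.

```python
import math

def softplus(t):
    """Numerically stable natural log of (1 + e^t)."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def exact_sum(n, p):
    """sum_k C(n,k) p^k q^(n-k) ln(1 + (p/q)^(n-2k)) with q = 1 - p."""
    q = 1.0 - p
    a = math.log(p / q)
    total = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(q))
        total += math.exp(log_pmf) * softplus((n - 2 * k) * a)
    return total

def asymptotic_sum(n, p, cutoff=200):
    """sqrt(2/(pi n)) (2 sqrt(pq))^n * sum_i (p/q)^i ln(1 + (q/p)^(2i)), |i| <= cutoff."""
    q = 1.0 - p
    a = math.log(p / q)
    # The sum over i depends on p only, so it can be cached when p is fixed.
    const = sum(math.exp(i * a) * softplus(-2 * i * a)
                for i in range(-cutoff, cutoff + 1))
    return math.sqrt(2.0 / (math.pi * n)) * (2.0 * math.sqrt(p * q)) ** n * const

if __name__ == "__main__":
    for n in (200, 2000):
        print(n, exact_sum(n, 0.3) / asymptotic_sum(n, 0.3))
```

The printed ratio should approach 1 as $n$ grows; both quantities are exponentially small in $n$, so comparing their ratio is more informative than comparing their difference.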

edited Aug 11 '18 at 21:33

answered Aug 11 '18 at 17:19

i don't really get it right now, but you now have a summation term from $-\infty$ to $\infty$, does that not make the computation expensive? – Kees Til Aug 11 '18 at 18:08
Depends on what your goal is. If $p$ is fixed, the sum over $i$ is a constant. – Maxim Aug 11 '18 at 18:17
can you write out the sum for fixed $p$, i don't see the solution directly – Kees Til Aug 11 '18 at 18:30
Do you mean a closed form? This infinite sum probably doesn't have one. – Maxim Aug 11 '18 at 18:45
too bad my p is not fixed so i dont think this will work for me :( – Kees Til Aug 11 '18 at 18:53
