
Let $x_i$ be iid nonnegative discrete random variables with $E[x_i]=N/M$ for some integers $N, M$, with variance $\sigma^2$, and with higher moments known and finite.

Then, the sum $\displaystyle S = \sum_{i=1}^M x_i$ will have $E[S]=N$.

I'm interested in the probability that $S$ takes that precise value: $A=P\left(S=E[S]\right)$.

Applying the central limit theorem, I can write

$\displaystyle A \approx \frac{1}{\sqrt{ 2 \pi M \sigma^2}}$
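As a quick sanity check, here is a minimal numerical sketch (the distribution on $\{0,1,2\}$ and the value $M=400$ are my own assumptions, chosen only for illustration) that computes $A$ exactly by convolving the pmf and compares it with the CLT value:

```python
import numpy as np

# Toy example (assumed, just for illustration): x_i takes the values 0, 1, 2
# with P = [1/2, 1/4, 1/4], so E[x_i] = 3/4, and M = 400 summands give an
# integer E[S] = 300.
vals = np.arange(3)
pmf = np.array([0.5, 0.25, 0.25])
mu = vals @ pmf                               # 0.75
sigma2 = (vals**2) @ pmf - mu**2              # variance of a single x_i

M = 400
pmf_S = np.array([1.0])
for _ in range(M):                            # exact pmf of S = x_1 + ... + x_M
    pmf_S = np.convolve(pmf_S, pmf)

A_exact = pmf_S[int(round(M * mu))]           # P(S = E[S]) = P(S = 300)
A_clt = 1.0 / np.sqrt(2 * np.pi * M * sigma2)
print(A_exact, A_clt, A_exact / A_clt - 1)    # relative error is O(1/M)
```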

My question is: can this approximation be refined?

ADDED: Some example/context/motivation:

Let's consider $X$ as a sum of $N$ Bernoullis (0/1) with probability $p$, such that $E(X)=Np$ is an integer. We can compute exactly the probability that $X$ attains its expected value; it's a Binomial (writing $q = 1-p$):

$\displaystyle P = P(X= N p) = {N \choose N p} p^{N p} q^{N q} \hspace{2cm}$ [1a]

We might also get an approximate value of that probability using the CLT (Central Limit Theorem):

$\displaystyle P \approx \frac{1}{\sqrt{2 \pi N p q}} \hspace{2cm} $ [2a]
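For concreteness, a small numerical comparison of [1a] and [2a] (the values $N=100$, $p=0.3$ are an assumed example, not part of the problem):

```python
from math import comb, sqrt, pi

N, p = 100, 0.3                              # assumed example with N*p an integer
q = 1 - p
k = int(round(N * p))                        # the expected value, 30

P_exact = comb(N, k) * p**k * q**(N - k)     # [1a]
P_clt = 1 / sqrt(2 * pi * N * p * q)         # [2a]
print(P_exact, P_clt, P_exact / P_clt - 1)   # relative error is O(1/(N p q))
```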

If we take [1a] and use the Stirling approximation $K! \approx (K/e)^K \sqrt{2 \pi K}$, we get the same value. Fine.

Now, we may try to refine the approximation, starting from either [1a] or [2a].

Plugging the next-order Stirling approximation into [1a], we get (if I am not mistaken)

$\displaystyle P \approx \frac{1}{\sqrt{2 \pi N p q}} \left(1 - \frac{1- p q}{12 N p q} \right) \hspace{2cm} $ [1b]
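With the same assumed example values ($N=100$, $p=0.3$), [1b] removes essentially all of the $O(1/N)$ error of [2a]:

```python
from math import comb, sqrt, pi

N, p = 100, 0.3                                    # same assumed example as above
q = 1 - p
k = int(round(N * p))

P_exact = comb(N, k) * p**k * q**(N - k)           # [1a]
P_clt = 1 / sqrt(2 * pi * N * p * q)               # [2a]
P_1b = P_clt * (1 - (1 - p*q) / (12 * N * p * q))  # [1b]
print(P_exact / P_clt - 1, P_exact / P_1b - 1)     # O(1/N) vs O(1/N^2)
```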

To refine the CLT, one can think of:

  • use some "continuity correction" to evaluate more precisely the (hypothetical) Gaussian integral

  • add some terms from the Edgeworth expansions

  • do none of the above, because the CLT does not justify those procedures in this scenario (just one value of a discrete variable)

I'm not sure which is the correct way.

But let's try the first one: the next order approximation of the integral gives me (again, if I'm not mistaken)

$\displaystyle P \approx \frac{1}{\sqrt{2 \pi N p q}} \left(1 - \frac{1}{24 N p q} \right) \hspace{2cm} $ [2b]

This is not the same as [1b], but it's close.
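A numerical comparison (again with the assumed values $N=100$, $p=0.3$) suggests that [1b] tracks the exact value much more closely than [2b]:

```python
from math import comb, sqrt, pi

N, p = 100, 0.3                                    # assumed example values
q = 1 - p
k = int(round(N * p))

P_exact = comb(N, k) * p**k * q**(N - k)           # [1a]
P_clt = 1 / sqrt(2 * pi * N * p * q)               # [2a]
P_1b = P_clt * (1 - (1 - p*q) / (12 * N * p * q))  # [1b]
P_2b = P_clt * (1 - 1 / (24 * N * p * q))          # [2b]
print(P_exact / P_1b - 1, P_exact / P_2b - 1)      # [1b] is the better correction
```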

Is this just a coincidence? Was it a reasonable thing to do? Should I look (also/instead) at the Edgeworth expansion?

leonbloy
  • This is an unjustified approximation. Indeed, the probability can be as low as zero. E.g., let $x_i$ take the value 1 with probability 5/6 and otherwise let it have the value 3. $N/M$ = 4/3, all moments are finite. The expectation of the sum of three of these $x_i$ equals 4, an even number, but the sum can attain only odd values. – whuber Feb 14 '11 at 23:35
  • Right. Let's add that $x$ takes nonnegative (and smooth) values on all integers greater than some $x_0$ – leonbloy Feb 15 '11 at 00:00
  • What if the expected value is not an integer? Do you take the floor? The closest integer value? – Raskolnikov Feb 15 '11 at 22:30
  • The expected value is an integer, it's a feature-condition of my problem. Anyway, I guess it's enough to consider the problem of just estimating the probability that the sum takes some given integer value, not necessarily the mean. – leonbloy Feb 16 '11 at 01:01
  • I think the Edgeworth expansion should work fine in general. The characteristic function of a discrete distribution looks like that of a continuous distribution, at least around $t=0$. The only difference (as noted by @whuber) will be when the characteristic function has multiple points of modulus $1$, which will occur just when the original distribution is nonzero only at $an+b$ for $n \in \mathbb{Z}$ and some $a>1$. – mjqxxxx Feb 16 '11 at 02:32

1 Answer


For a discrete random variable $X$ with support $\mathbb{Z}$, the Fourier transform of the probability distribution $P_x \equiv P[X=x]$ is given by
$$ \tilde{P}(k) = \sum_{x=-\infty}^{\infty} e^{ikx} P_x = E\left[e^{ikX}\right] = e^{h(k)}, $$
where
$$ h(k) = \sum_{n=1}^{\infty} \kappa_{n} \frac {(ik)^{n}}{n!} $$
is the natural logarithm of the characteristic function of $X$, and $\kappa_{n}$ is the $n$th cumulant of $X$. Recall that $\kappa_{1} = \mu$ is the mean and $\kappa_{2} = \sigma^2$ is the variance.

The probability that a sum of $M$ independent variables $X_i$ with the same distribution is exactly $x \in \mathbb{Z}$ is then
$$ \begin{eqnarray} P\left[\sum_{i=1}^{M} X_i = x\right] &=& \int_{-\pi}^{\pi}\frac{dk}{2\pi} e^{-ikx}\tilde{P}(k)^M \\ &=& \int_{-\pi}^{\pi}\frac{dk}{2\pi} e^{Mh(k)-ikx} \\ &=& \int_{-\pi}^{\pi}\frac{dk}{2\pi} e^{ik(M\mu - x) - \frac{1}{2}M\sigma^2 k^2} \exp\left(\sum_{n=3}^{\infty}M\kappa_{n}\frac{(ik)^{n}}{n!}\right). \end{eqnarray} $$

Considering the desired case where $x = M\mu \in \mathbb{Z}$, and making the change of variable $k \rightarrow k/(\sigma\sqrt{M})$, we have
$$ P\left[\sum_{i=1}^{M} X_i = M\mu\right] = \frac{1}{\sigma\sqrt{2\pi M}}\int_{-\pi\sigma\sqrt{M}}^{\pi\sigma\sqrt{M}} d\Phi(k) \exp\left(\sum_{n=3}^{\infty} \sigma^{-n}M^{1-\frac{1}{2}n}\kappa_{n}\frac{(ik)^{n}}{n!}\right), $$
where $d\Phi(k) = \phi(k)\,dk$ is the standard normal distribution (with mean $0$ and variance $1$). Here we assume that the exponential decays rapidly away from $k=0$, so we may replace the limits of integration by $\pm\infty$. Then, expanding the exponential in inverse powers of $M$, and using the fact that the $n$th central moment of the standard normal distribution vanishes for odd $n$ and is equal to $(n-1)!!$ for even $n$, we obtain the following:
$$ P\left[\sum_{i=1}^{M} X_i = M\mu\right] = \frac{1}{\sigma\sqrt{2\pi M}}\left(1 + \frac{\kappa_4}{8M\sigma^4} - \frac{5\kappa_3^2}{24M\sigma^6} + O(M^{-2})\right). $$
This is essentially the Edgeworth expansion.

If $X$ has the Bernoulli distribution with probability of success $p = \frac{1}{2}(1+a)$ (and of failure $q=\frac{1}{2}(1-a)$), then it is straightforward to verify that
$$ \begin{eqnarray} \kappa_2 &=& \sigma^2 = pq = \frac{1}{4}(1-a^2) \\ \kappa_3 &=& \frac{1}{4}(1-a^2)(-a) = -\frac{1}{4}a(1-a^2) \\ \kappa_4 &=& \frac{1}{8}(1-a^2)(3a^2-1), \end{eqnarray} $$
and hence
$$ \begin{eqnarray} \frac{5\kappa_3^2}{24\sigma^6} &=& \frac{5a^2}{6(1-a^2)} \\ \frac{\kappa_4}{8\sigma^4} &=& \frac{3a^2 - 1}{4(1-a^2)}, \end{eqnarray} $$
for a total correction term proportional to
$$ -\frac{5\kappa_3^2}{24M\sigma^6} + \frac{\kappa_4}{8M\sigma^4} = \frac{9a^2-3-10a^2}{12M(1-a^2)} = -\frac{3+a^2}{12M(1-a^2)} = -\frac{1-pq}{12Mpq}, $$
which agrees with the Stirling approximation to the exact result.
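As a sanity check of this expansion, here is a minimal numerical sketch with an assumed aperiodic toy distribution on $\{0,1,2\}$ (chosen only for illustration); the exact probability is computed by convolving the pmf:

```python
import numpy as np

# Assumed toy distribution on {0, 1, 2} (aperiodic, so the lattice caveat
# raised in the comments above does not apply).
vals = np.arange(3)
pmf = np.array([0.5, 0.25, 0.25])
mu = vals @ pmf
m = [((vals - mu)**n) @ pmf for n in range(5)]   # central moments m[0..4]
sigma2 = m[2]
kappa3 = m[3]                                    # third cumulant = third central moment
kappa4 = m[4] - 3 * sigma2**2                    # fourth cumulant

M = 400                                          # E[sum] = 300, an integer
pmf_S = np.array([1.0])
for _ in range(M):                               # exact pmf of the sum
    pmf_S = np.convolve(pmf_S, pmf)
exact = pmf_S[int(round(M * mu))]

leading = 1 / np.sqrt(2 * np.pi * M * sigma2)
edgeworth = leading * (1 + kappa4 / (8 * M * sigma2**2)
                         - 5 * kappa3**2 / (24 * M * sigma2**3))
print(exact, leading, edgeworth)                 # the Edgeworth error is O(M^-2)
```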

mjqxxxx
  • Great answer! Only one doubt: I'm not sure if the approximation introduced (when extending the integral, which assimilates our discrete-lattice variable to a continuous one) produces an error less than $O(1/M)$. If your procedure is equivalent to an Edgeworth expansion of the distribution function evaluated at the points $u-1/2$, $u+1/2$, then I think Edgeworth gives an extra error for lattice variables (sort of a discretization error) that requires special correction if we want a total error of order better than $1/\sqrt{M}$ (e.g., http://goo.gl/c3WTB ). I'll check all this and let you know. – leonbloy Feb 17 '11 at 11:24