
Let $x_i$ be iid nonnegative discrete random variables with $E[x_i]=N/M$ for some integers $N, M$, with variance $\sigma^2$, and with higher moments known and finite.

Then, the sum $\displaystyle S = \sum_{i=1}^M x_i$ will have $E[S]=N$.

I'm interested in the probability that $S$ takes that precise value: $A=P\left(S=E[S]\right)$.

Applying the central limit theorem, I can write

$\displaystyle A \approx \frac{1}{\sqrt{ 2 \pi M \sigma^2}}$
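As a quick sanity check, here is a minimal numerical sketch (the distribution on $\{0,1,2\}$ and the value $M=400$ are my own assumptions, chosen only for illustration) that computes $A$ exactly by convolving the pmf and compares it with the CLT value:

```python
import numpy as np

# Toy example (assumed, just for illustration): x_i takes the values 0, 1, 2
# with P = [1/2, 1/4, 1/4], so E[x_i] = 3/4, and M = 400 summands give an
# integer E[S] = 300.
vals = np.arange(3)
pmf = np.array([0.5, 0.25, 0.25])
mu = vals @ pmf                               # 0.75
sigma2 = (vals**2) @ pmf - mu**2              # variance of a single x_i

M = 400
pmf_S = np.array([1.0])
for _ in range(M):                            # exact pmf of S = x_1 + ... + x_M
    pmf_S = np.convolve(pmf_S, pmf)

A_exact = pmf_S[int(round(M * mu))]           # P(S = E[S]) = P(S = 300)
A_clt = 1.0 / np.sqrt(2 * np.pi * M * sigma2)
print(A_exact, A_clt, A_exact / A_clt - 1)    # relative error is O(1/M)
```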

My question is: can this approximation be refined?

ADDED: Some example/context/motivation:

Let's consider $X$ as a sum of $N$ Bernoullis (0/1) with probability $p$, such that $E(X)=Np$ is an integer. We can compute exactly the probability that $X$ attains its expected value; it's a Binomial (writing $q = 1-p$):

$\displaystyle P = P(X= N p) = {N \choose N p} p^{N p} q^{N q} \hspace{2cm}$ [1a]

We might also get an approximate value of that probability using the CLT (Central Limit Theorem):

$\displaystyle P \approx \frac{1}{\sqrt{2 \pi N p q}} \hspace{2cm} $ [2a]
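For concreteness, a small numerical comparison of [1a] and [2a] (the values $N=100$, $p=0.3$ are an assumed example, not part of the problem):

```python
from math import comb, sqrt, pi

N, p = 100, 0.3                              # assumed example with N*p an integer
q = 1 - p
k = int(round(N * p))                        # the expected value, 30

P_exact = comb(N, k) * p**k * q**(N - k)     # [1a]
P_clt = 1 / sqrt(2 * pi * N * p * q)         # [2a]
print(P_exact, P_clt, P_exact / P_clt - 1)   # relative error is O(1/(N p q))
```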

If we take [1a] and use the Stirling approximation $K! \approx (K/e)^K \sqrt{2 \pi K}$, we get the same value. Fine.

Now, we may try to refine the approximation, starting from either [1a] or [2a].

Plugging the next-order Stirling approximation into [1a], we get (if I am not mistaken)

$\displaystyle P \approx \frac{1}{\sqrt{2 \pi N p q}} \left(1 - \frac{1- p q}{12 N p q} \right) \hspace{2cm} $ [1b]
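With the same assumed example values ($N=100$, $p=0.3$), [1b] removes essentially all of the $O(1/N)$ error of [2a]:

```python
from math import comb, sqrt, pi

N, p = 100, 0.3                                    # same assumed example as above
q = 1 - p
k = int(round(N * p))

P_exact = comb(N, k) * p**k * q**(N - k)           # [1a]
P_clt = 1 / sqrt(2 * pi * N * p * q)               # [2a]
P_1b = P_clt * (1 - (1 - p*q) / (12 * N * p * q))  # [1b]
print(P_exact / P_clt - 1, P_exact / P_1b - 1)     # O(1/N) vs O(1/N^2)
```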

To refine the CLT, one can think of:

  • use some "continuity correction" to evaluate more precisely the (hypothetical) Gaussian integral

  • add some terms from the Edgeworth expansions

  • do none of the above, because the CLT does not justify those procedures in this scenario (just one value of a discrete variable)

I'm not sure which is the correct way.

But let's try the first one: the next order approximation of the integral gives me (again, if I'm not mistaken)

$\displaystyle P \approx \frac{1}{\sqrt{2 \pi N p q}} \left(1 - \frac{1}{24 N p q} \right) \hspace{2cm} $ [2b]

This is not the same as [1b], but it's close.
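A numerical comparison (again with the assumed values $N=100$, $p=0.3$) suggests that [1b] tracks the exact value much more closely than [2b]:

```python
from math import comb, sqrt, pi

N, p = 100, 0.3                                    # assumed example values
q = 1 - p
k = int(round(N * p))

P_exact = comb(N, k) * p**k * q**(N - k)           # [1a]
P_clt = 1 / sqrt(2 * pi * N * p * q)               # [2a]
P_1b = P_clt * (1 - (1 - p*q) / (12 * N * p * q))  # [1b]
P_2b = P_clt * (1 - 1 / (24 * N * p * q))          # [2b]
print(P_exact / P_1b - 1, P_exact / P_2b - 1)      # [1b] is the better correction
```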

Is this just a coincidence? Was it a reasonable thing to do? Should I look (also/instead) at the Edgeworth expansion?

leonbloy
  • This is an unjustified approximation. Indeed, the probability can be as low as zero. E.g., let $x_i$ take the value 1 with probability 5/6 and otherwise let it have the value 3. $N/M$ = 4/3, all moments are finite. The expectation of the sum of three of these $x_i$ equals 4, an even number, but the sum can attain only odd values. – whuber Feb 14 '11 at 23:35
  • Right. Let's add that $x$ takes nonnegative (and smooth) values on all integers greater than some $x_0$ – leonbloy Feb 15 '11 at 00:00
  • What if the expected value is not an integer? Do you take the floor? The closest integer value? – Raskolnikov Feb 15 '11 at 22:30
  • The expected value is an integer, it's a feature-condition of my problem. Anyway, I guess it's enough to consider the problem of just estimating the probability that the sum takes some given integer value, not necessarily the mean. – leonbloy Feb 16 '11 at 01:01
  • I think the Edgeworth expansion should work fine in general. The characteristic function of a discrete distribution looks like that of a continuous distribution, at least around $t=0$. The only difference (as noted by @whuber) will be when the characteristic function has multiple points of modulus $1$, which will occur just when the original distribution is nonzero only at $an+b$ for $n \in \mathbb{Z}$ and some $a>1$. – mjqxxxx Feb 16 '11 at 02:32

1 Answer


For a discrete random variable $X$ with support $\mathbb{Z}$, the Fourier transform of the probability distribution $P_x \equiv P[X=x]$ is given by
$$ \tilde{P}(k) = \sum_{x=-\infty}^{\infty} e^{ikx} P_x = E\left[e^{ikX}\right] = e^{h(k)}, $$
where
$$ h(k) = \sum_{n=1}^{\infty} \kappa_{n} \frac {(ik)^{n}}{n!} $$
is the natural logarithm of the characteristic function of $X$, and $\kappa_{n}$ is the $n$th cumulant of $X$. Recall that $\kappa_{1} = \mu$ is the mean and $\kappa_{2} = \sigma^2$ is the variance.

The probability that a sum of $M$ independent variables $X_i$ with the same distribution is exactly $x \in \mathbb{Z}$ is then
$$ \begin{eqnarray} P\left[\sum_{i=1}^{M} X_i = x\right] &=& \int_{-\pi}^{\pi}\frac{dk}{2\pi} e^{-ikx}\tilde{P}(k)^M \\ &=& \int_{-\pi}^{\pi}\frac{dk}{2\pi} e^{Mh(k)-ikx} \\ &=& \int_{-\pi}^{\pi}\frac{dk}{2\pi} e^{ik(M\mu - x) - \frac{1}{2}M\sigma^2 k^2} \exp\left(\sum_{n=3}^{\infty}M\kappa_{n}\frac{(ik)^{n}}{n!}\right). \end{eqnarray} $$

Considering the desired case where $x = M\mu \in \mathbb{Z}$, and making the change of variable $k \rightarrow k/(\sigma\sqrt{M})$, we have
$$ P\left[\sum_{i=1}^{M} X_i = M\mu\right] = \frac{1}{\sigma\sqrt{2\pi M}}\int_{-\pi\sigma\sqrt{M}}^{\pi\sigma\sqrt{M}} d\Phi(k) \exp\left(\sum_{n=3}^{\infty} \sigma^{-n}M^{1-\frac{1}{2}n}\kappa_{n}\frac{(ik)^{n}}{n!}\right), $$
where $d\Phi(k) = \phi(k)\,dk$ is the standard normal distribution (with mean $0$ and variance $1$). Here we assume that the exponential decays rapidly away from $k=0$, so we may replace the limits of integration by $\pm\infty$. Then, expanding the exponential in inverse powers of $M$, and using the fact that the $n$th central moment of the standard normal distribution vanishes for odd $n$ and is equal to $(n-1)!!$ for even $n$, we obtain the following:
$$ P\left[\sum_{i=1}^{M} X_i = M\mu\right] = \frac{1}{\sigma\sqrt{2\pi M}}\left(1 + \frac{\kappa_4}{8M\sigma^4} - \frac{5\kappa_3^2}{24M\sigma^6} + O(M^{-2})\right). $$
This is essentially the Edgeworth expansion.

If $X$ has the Bernoulli distribution with probability of success $p = \frac{1}{2}(1+a)$ (and of failure $q=\frac{1}{2}(1-a)$), then it is straightforward to verify that
$$ \begin{eqnarray} \kappa_2 &=& \sigma^2 = pq = \frac{1}{4}(1-a^2) \\ \kappa_3 &=& \frac{1}{4}(1-a^2)(-a) = -\frac{1}{4}a(1-a^2) \\ \kappa_4 &=& \frac{1}{8}(1-a^2)(3a^2-1), \end{eqnarray} $$
and hence
$$ \begin{eqnarray} \frac{5\kappa_3^2}{24\sigma^6} &=& \frac{5a^2}{6(1-a^2)} \\ \frac{\kappa_4}{8\sigma^4} &=& \frac{3a^2 - 1}{4(1-a^2)}, \end{eqnarray} $$
for a total correction term proportional to
$$ -\frac{5\kappa_3^2}{24M\sigma^6} + \frac{\kappa_4}{8M\sigma^4} = \frac{9a^2-3-10a^2}{12M(1-a^2)} = -\frac{3+a^2}{12M(1-a^2)} = -\frac{1-pq}{12Mpq}, $$
which agrees with the Stirling approximation to the exact result.
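As a sanity check of this expansion, here is a minimal numerical sketch with an assumed aperiodic toy distribution on $\{0,1,2\}$ (chosen only for illustration); the exact probability is computed by convolving the pmf:

```python
import numpy as np

# Assumed toy distribution on {0, 1, 2} (aperiodic, so the lattice caveat
# raised in the comments above does not apply).
vals = np.arange(3)
pmf = np.array([0.5, 0.25, 0.25])
mu = vals @ pmf
m = [((vals - mu)**n) @ pmf for n in range(5)]   # central moments m[0..4]
sigma2 = m[2]
kappa3 = m[3]                                    # third cumulant = third central moment
kappa4 = m[4] - 3 * sigma2**2                    # fourth cumulant

M = 400                                          # E[sum] = 300, an integer
pmf_S = np.array([1.0])
for _ in range(M):                               # exact pmf of the sum
    pmf_S = np.convolve(pmf_S, pmf)
exact = pmf_S[int(round(M * mu))]

leading = 1 / np.sqrt(2 * np.pi * M * sigma2)
edgeworth = leading * (1 + kappa4 / (8 * M * sigma2**2)
                         - 5 * kappa3**2 / (24 * M * sigma2**3))
print(exact, leading, edgeworth)                 # the Edgeworth error is O(M^-2)
```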

mjqxxxx
  • Great answer! Only one doubt: I'm not sure if the approximation introduced (when extending the integral, which assimilates our discrete-lattice variable to a continuous one) produces an error less than $O(1/M)$. If your procedure is equivalent to an Edgeworth expansion of the distribution function evaluated at the points $u-1/2$, $u+1/2$, then I think Edgeworth gives an extra error for lattice variables (sort of a discretization error) that requires special correction if we want a total error of order better than $1/\sqrt{M}$ (e.g., http://goo.gl/c3WTB ). I'll check all this and let you know. – leonbloy Feb 17 '11 at 11:24