
We have a set of $t$ independent random variables $X_i \sim \mathrm{Bin}(n_i, p_i)$. We know that $$\mathrm{Pr}[X_i \geq z] = \sum_{j=z}^{n_i} { n_i \choose j } p_i^j (1-p_i)^{n_i -j}.$$

But is there an easy way to compute:

$$\mathrm{Pr}\left[\sum_{i=1}^t X_i \geq z\right]?$$

MY IDEAS: This should have something to do with convolution, but I am not sure.

Is it easier to compute $$\mathrm{Pr}\left[\sum_{i=1}^t X_i \leq z \right]?$$

What I thought of is maybe: $$\mathrm{Pr}\left[\sum_{i=1}^t X_i \leq z \right]=\sum_{j=0}^{z} \mathrm{Pr}\left[\sum_{i=1}^t X_i = j \right]$$ but this seems to be quite hard with $t$ random variables.
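For what it's worth, the convolution idea does work numerically when the $n_i$ are moderate: the PMF of a sum of independent variables is the convolution of their PMFs, after which the tail probability is just a partial sum. A minimal Python sketch (the helper name `tail_prob` is mine, not standard):

```python
# Exact tail probability of a sum of independent binomials via convolution.
# Feasible when sum(n_i) is moderate, since the PMF has sum(n_i) + 1 entries.
import numpy as np
from scipy.stats import binom

def tail_prob(ns, ps, z):
    """Pr[X_1 + ... + X_t >= z] for independent X_i ~ Bin(n_i, p_i)."""
    pmf = np.array([1.0])  # PMF of the empty sum: point mass at 0
    for n, p in zip(ns, ps):
        pmf = np.convolve(pmf, binom.pmf(np.arange(n + 1), n, p))
    return pmf[int(z):].sum()  # Pr[sum >= z]

# Example with three binomials of different parameters:
print(tail_prob([10, 20, 15], [0.1, 0.3, 0.5], 12))
```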

I would appreciate any hint. Even if you don't write an answer, I'd be interested to know whether the problem is too difficult or too easy. Thank you.

  • Partial sums of rows of Pascal's triangle don't have a closed form, but you can either (1) use an approximation or (2) use the central limit theorem; for large $t$ the latter gives a good approximation. – Alex Jul 10 '14 at 11:48
  • I am very interested in an approximation. How could one approximate this? – user136457 Jul 10 '14 at 11:49
  • I am mainly interested in approximations where $n_i p_i \leq 1$ or $n_i p_i \gg 1$, but not for $n_i p_i = O(1)$, if this is relevant. – user136457 Jul 10 '14 at 11:51
  • When $t$ is large, approximations with the normal distribution are the way to go. When $t$ is small, you can rely on the behaviour of the hypergeometric function $\phantom{}_2 F_1$, as shown in http://math.stackexchange.com/questions/870063/prove-by-induction-that-a-k-sum-limits-n-2k3k-binom3kn-cdot-frac/870480#870480. – Jack D'Aurizio Jul 18 '14 at 16:16

2 Answers


The CLT is indeed applicable here as an approximation to the CDF of a sum of binomial random variables (in particular, the Lindeberg-Feller CLT), as long as we assume that the variances $\mathrm{Var}(X_i)=n_ip_i(1-p_i)$ are bounded from above, or grow slowly enough that each individual variance becomes small compared to the variance of the entire sum (see the Lindeberg condition). This is satisfied as long as the $n_i$ don't drastically increase with $t$.
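As a quick illustration of this condition (a sketch with made-up parameters, not from the original answer), one can check numerically that no single variance dominates the variance of the sum:

```python
# Informal Lindeberg-type sanity check: the largest individual variance
# should be a small fraction of the variance of the whole sum.
import numpy as np

ns = np.array([10, 20, 15, 30])
ps = np.array([0.1, 0.3, 0.5, 0.2])
variances = ns * ps * (1 - ps)            # Var(X_i) = n_i p_i (1 - p_i)
print(variances.max() / variances.sum())  # small ratio -> CLT should be reliable
```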

Given the above assumption (which is usually reasonable), we can use the Lindeberg-Feller CLT to approximate the sum of $t$ binomial random variables:

Let $X_i \sim \mathrm{Bin}(n_i,p_i)$ and $s_t= \sqrt{\sum\limits_{i=1}^t n_ip_i(1-p_i)}$. Then the Lindeberg-Feller CLT states that:

$$\frac{\sum\limits_{i=1}^t (X_i-n_ip_i)}{s_t}\xrightarrow{d}\mathcal{N}(0,1).$$

Therefore, we can approximate your sum of binomials with a normal distribution:

$$\frac{\sum\limits_{i=1}^t (X_i-n_ip_i)}{s_t}\xrightarrow{d}\mathcal{N}(0,1)\implies \sum\limits_{i=1}^t (X_i-n_ip_i)\xrightarrow{d} \mathcal{N}(0,s_t^2) \implies \sum\limits_{i=1}^t X_i \xrightarrow{d}\mathcal{N}\left(\sum\limits_{i=1}^t n_ip_i,\,s_t^2\right).$$

Thus, for large $t$, $P\left(\sum\limits_{i=1}^t X_i \leq z\right)\approx\Phi\left(\frac{z-\sum\limits_{i=1}^t n_ip_i}{s_t}\right)$.

Now, the sum of these binomials takes only integer values, so its CDF only needs to be evaluated at integer $z$. Therefore, to get an approximation to $P\left(\sum_{i=1}^t X_i \geq z\right)$, just use the above approximation to estimate $1-P\left(\sum_{i=1}^t X_i < z\right)=1-P\left(\sum_{i=1}^t X_i \leq z-1\right)\approx 1-\Phi\left(\frac{z-1-\sum\limits_{i=1}^t n_ip_i}{s_t}\right)$.
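Here is a short numerical sketch of this approximation (the helper name `tail_prob_clt` is mine); for moderate $n_i$ it can be checked against an exact convolution of the binomial PMFs:

```python
# Normal (Lindeberg-Feller CLT) approximation to Pr[sum X_i >= z].
import numpy as np
from scipy.stats import norm

def tail_prob_clt(ns, ps, z):
    """Approximate Pr[X_1 + ... + X_t >= z] as 1 - Phi((z - 1 - mean) / s_t)."""
    ns, ps = np.asarray(ns), np.asarray(ps)
    mean = (ns * ps).sum()                     # sum of n_i p_i
    s_t = np.sqrt((ns * ps * (1 - ps)).sum())  # s_t as defined above
    return 1 - norm.cdf((z - 1 - mean) / s_t)

print(tail_prob_clt([10, 20, 15], [0.1, 0.3, 0.5], 12))
```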

  • Is the Lindeberg condition satisfied if one can show that $\frac{\sqrt{n_i p_i (1-p_i)}}{s_t}\rightarrow 0$ for every $i$? – user136457 Jul 16 '14 at 11:37
  • @user136457 Yes, if by $\rightarrow$ you mean as $t\rightarrow\infty$. See the bottom of the page linked for the Lindeberg condition. –  Jul 16 '14 at 11:48
  • Thanks for your answer; this was really helpful and got an upvote. As I am also interested in an approximation not in distribution, but of the form $1-P[\sum X_i < z] \sim g$ for some $g$: do you know something about this? Could this be possible? – user136457 Jul 16 '14 at 11:51
  • @user136457 I think there is some confusion about terminology here: what I just gave you was an approximation to $1-P[\sum X_i < z]$. In other words, in your notation, $g = 1-\Phi\left(\frac{z-1-\sum\limits_{i=1}^t n_ip_i}{s_t}\right)$. –  Jul 16 '14 at 11:55
  • @user136457 (clarified) A sum of random variables is a function, hence when a sum of random variables converges in distribution, it is the same as saying that it has an approximation. –  Jul 16 '14 at 11:56
  • OK, let me state it differently. This approximation is in distribution, isn't it? So, as I am not very good at mathematics, I am not sure whether this gives me bounds on the error. What I would additionally like to know is whether the relative error is in $o(1)$, and whether one can give a concrete term for the relative error $\frac{1-P[\sum X_i < z]}{g}$? – user136457 Jul 16 '14 at 11:59

I'll give you a general idea. If you want the details, look up an article by Tomas Woersch (here).

So you have $$ S_m(n) = \sum_{k=0}^{m} \binom{n}{k}x^k = 1 + \binom{n}{1}x +\ldots + \binom{n}{m}x^m\\ \frac{S_m(n)}{\binom{n}{m}x^m} = 1 + \ldots + \frac{\binom{n}{1}x}{\binom{n}{m}x^m} + \frac{1}{\binom{n}{m}x^m} = 1 + \ldots + a_{m-1}x^{1-m} +a_m x^{-m} $$ Now you need to do the following: obtain the ratios of binomial coefficients I denoted by $a_k$, and then approximate them using Stirling's formula. Use the fact that $$(1 - o(1))\sqrt{2 \pi n} \left(\frac{n}{e}\right)^n \leq n! \leq (1+o(1))\sqrt{2 \pi n} \left(\frac{n}{e}\right)^n$$
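To make the recipe concrete, here is a rough Python sketch (all helper names are mine, and Stirling's formula is used as a plain approximation to $\ln n!$, ignoring the $1 \pm o(1)$ factors):

```python
# Approximate S_m(n) / (C(n,m) x^m) = sum_k a_k x^{k-m} using Stirling's
# formula for the log-factorials in the binomial-coefficient ratios.
import math

def ln_fact(n):
    """Stirling's approximation to ln(n!)."""
    if n < 2:
        return 0.0
    return 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)

def ln_binom(n, k):
    return ln_fact(n) - ln_fact(k) - ln_fact(n - k)

def normalized_partial_sum(n, m, x):
    """S_m(n) divided by its leading term C(n,m) x^m."""
    lead = ln_binom(n, m) + m * math.log(x)
    return sum(math.exp(ln_binom(n, k) + k * math.log(x) - lead)
               for k in range(m + 1))

# The leading term alone contributes 1; the full sum shows how dominant it is.
print(normalized_partial_sum(100, 30, 0.2))
```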

  • So I tried to understand what you mean and also had a look at Woersch's paper, but somehow I miss the link to the original question. $S_m(n)=\Pr[\mathrm{Bin}(n,p)\leq m]$ for $x=p(1-p)^{n/k -1}$? And what you write lets me think that $S_m(n)$ can be approximated by ${ n \choose m} x^m$, thus by the last summand, which makes a lot of sense. But now I am interested in $\Pr[\sum_i X_i \geq z] = 1- \Pr[\sum_i X_i < z]$; do I then somehow have to use convolution again? – user136457 Jul 10 '14 at 13:07
  • Taken together: I think I understood what you have written to get an approximation of $S_m(n)$, but I have no idea how to use $S_m(n)$ to get the probability from above; I think it has something to do with convolution? – user136457 Jul 10 '14 at 13:09
  • Here $x = \frac{p}{1-p}$ (the common factor $(1-p)^n$ is pulled out of the sum). If $X_k \sim \mathrm{Binomial}(n_k,p)$, then the sum of $m$ such variables with $m n_k = n$ is also binomial with parameters $n$ and $p$. – Alex Jul 10 '14 at 13:23
  • The problem is that $X_k \sim \mathrm{Bin}(n_k, p_k)$, thus with different probabilities $p_k$. Is something similar possible then? If it helps, we will have $X_k \sim \mathrm{Bin}(n_k, \frac{k}{l})$ for some $l$. – user136457 Jul 10 '14 at 13:29
  • OK, I might have misunderstood your question. Have a look here, maybe, for random variables that are independent but not identically distributed: http://sites.stat.psu.edu/~dhunter/asymp/fall2002/lectures/ln04.pdf – Alex Jul 10 '14 at 13:33
  • Thanks for your help and this link, but I don't think this is what I am interested in, as I don't want limits in distribution.

    Assuming general distributions of the $X_i$: isn't it possible to find some easier term for this?

    Or is it possible to find an approximation using the mean of all the $p_k$?

    Or is it possible to find upper and lower bounds using the minimum and maximum values of $p_k$?

    – user136457 Jul 10 '14 at 13:38