1

This is an extension to question 821984:

Given $N$ balls independently distributed randomly among $m$ bins, what is the probability that at least one bin gets exactly $k$ balls? And is this probability maximized when $N = k m$?

To be more clear, for fixed $N$ and $m$, there is for each integer $k$ a probability $P_{mN}(k)$ that at least one bin gets exactly $k$ balls. Let $k_0$ be a value of $k$ at which this probability is largest, i.e. $\max_k P_{mN}(k) = P_{mN}(k_0)$. Is it true for all fixed $(N,m)$ that $$ \frac{N}{m} - 1 < k_0 < \frac{N}{m} + 1 \ ? $$

For example, for $m=3, N=7$ we get $$ P_{37}(0) = \frac{127}{729}, P_{37}(1) = \frac{406}{729}, P_{37}(2) = \frac{462}{729}, P_{37}(3) = \frac{420}{729}, \\ P_{37}(4) = \frac{280}{729}, P_{37}(5) = \frac{84}{729}, P_{37}(6) = \frac{14}{729}, P_{37}(7) = \frac{1}{729} $$ with the maximum occurring at $k=2$, which satisfies $\frac73 - 1 < 2 < \frac73+1$.
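Small cases like this one can be checked by brute force. The following is a minimal sketch, assuming the balls are distinguishable and each one lands in a uniformly random bin independently of the others; the function name `prob_some_bin_has_k` is made up for this illustration. `Fraction` prints values in lowest terms, so the denominators shown need not be $729$.

```python
from fractions import Fraction
from itertools import product


def prob_some_bin_has_k(m, N):
    """Return [P(0), ..., P(N)], where P(k) is the probability that at least
    one of m bins holds exactly k of N independently, uniformly thrown balls."""
    hits = [0] * (N + 1)
    # Each tuple assigns a bin index to every (distinguishable) ball.
    for assignment in product(range(m), repeat=N):
        counts = [0] * m
        for b in assignment:
            counts[b] += 1
        # Count this outcome once for every distinct occupancy value it realises.
        for k in set(counts):
            hits[k] += 1
    total = m ** N
    return [Fraction(h, total) for h in hits]


probs = prob_some_bin_has_k(3, 7)
for k, p in enumerate(probs):
    print(k, p)
best_k = max(range(len(probs)), key=lambda k: probs[k])
print("argmax k =", best_k)
```

The enumeration visits all $m^N$ assignments, so it is only practical for small $m$ and $N$.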

I can produce an expression for the probability by inclusion-exclusion over the number $r$ of bins required to contain exactly $k$ balls (for $k \geq 1$): $$ P(k,m,N) = \frac{1}{m^N}\sum_{r=1}^{\lfloor N/k \rfloor}(-1)^{r+1} \left( \begin{array}{c}m\\r\end{array} \right) (m-r)^{N-rk}\prod_{s=0}^{r-1} \left( \begin{array}{c}N-sk\\k\end{array} \right) $$ (The alternating signs come from the usual mantra of subtracting the double-counted overlap where two bins contain $k$ balls, then adding back the triple overlap, and so forth. Terms with $r > m$ vanish because the binomial coefficient $\binom{m}{r}$ is zero.)
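As a sanity check, the inclusion-exclusion sum can be evaluated exactly. This is a rough sketch under stated assumptions: the $r$-th term carries the sign $(-1)^{r+1}$, the $(s+1)$-th selected bin picks its $k$ balls from the $N - sk$ balls still unassigned, and the sum is cut off at $r \le \min(m, \lfloor N/k \rfloor)$ (or $r \le m$ when $k = 0$). The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def p_at_least_one_bin_with_k(k, m, N):
    """Inclusion-exclusion over the number r of bins forced to hold exactly k balls."""
    total = Fraction(0)
    # r bins need r*k balls, so r <= N // k (for k >= 1), and always r <= m.
    r_max = m if k == 0 else min(m, N // k)
    for r in range(1, r_max + 1):
        ways = comb(m, r)                 # which r bins are forced
        for s in range(r):
            ways *= comb(N - s * k, k)    # the (s+1)-th forced bin picks its k balls
        ways *= (m - r) ** (N - r * k)    # remaining balls go to the other bins
        total += (-1) ** (r + 1) * ways
    return total / m ** N


for k in range(8):
    print(k, p_at_least_one_bin_with_k(k, 3, 7))
```

Python evaluates `0 ** 0` as `1`, which is what the $r = m$, $N = km$ term needs (all bins hold exactly $k$ balls and nothing is left over).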

But even after wandering around in Concrete Mathematics, I can't simplify this enough to answer the maximization question.

Mark Fischler
  • 41,743
  • I don't know how to simplify that expression. At this point, I'd take a step back and try to compute the probability that no bin contains k balls. – nomen Jun 05 '14 at 21:49
  • @Mark Fischler If $N$ balls are independently distributed randomly among $m$ bins, does it imply that any two final configurations, say for example if $m=3$, $\{k,1,n-(k+1)\}$ and $\{k,2,n-(k+2)\}$, are equally probable? – talegari Jun 06 '14 at 03:51

2 Answers

2

I'm not sure if this can be simplified.

For large values of $N,m$, an asymptotic approximation is straightforward (Poissonization): the bin counts are treated as independent Poisson variables with mean $\lambda$, so a given bin holds exactly $k$ balls with probability $e^{-\lambda}\lambda^k/k!$, and $P$ is one minus the probability that no bin does. Letting $\lambda = N/m$, $t = \lambda/k$:

$$P \approx 1- \left(1- \frac{e^{-\lambda} \lambda^k}{k!}\right)^m=\\ =1- \left(1- \frac{k^k}{k!}\, t^k e^{-t k} \right)^m $$

Regarding the last expression as a function of $t$, keeping $k,m$ fixed, we get the maximum at $t=1$, as expected ($N=k m$). At this point, the value of the probability can be further approximated by the Stirling approximation, giving:

$$ P_{max}\approx 1- \left(1- \frac{1}{\sqrt{2 \pi k }}\right)^m$$
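The location of the extremum and the Stirling step can be spelled out as follows. This is only a short sketch; the name $g$ is notation introduced here for the $t$-dependent factor.

\begin{align*}
g(t) &= t^k e^{-tk} = \left(t\,e^{-t}\right)^k, \\
g'(t) &= k\left(t\,e^{-t}\right)^{k-1} e^{-t}\,(1-t) = 0 \iff t = 1 \qquad (t>0), \\
\frac{k^k}{k!}\,g(1) &= \frac{k^k e^{-k}}{k!} \approx \frac{k^k e^{-k}}{\sqrt{2\pi k}\,(k/e)^k} = \frac{1}{\sqrt{2\pi k}} .
\end{align*}

Since $g'(t)>0$ for $t<1$ and $g'(t)<0$ for $t>1$, the point $t=1$ is where $g$ is largest, i.e. where a single bin is most likely to hold exactly $k$ balls.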

leonbloy
  • 63,430
0

We assume the $N$ balls are indistinguishable while the $m$ bins are distinguishable. So, if $m=2$ and $N\geq k$ there are two favorable outcomes: \begin{align*} (k,N-k)\qquad\text{and}\qquad (N-k,k) \end{align*}

With this assumption the probability $P(k,m,N)$ differs somewhat from the OP's formula, and in general it will not be maximized at $N=km$.

We calculate the probability $P(k,m,N)$ by counting the favorable outcomes, i.e. the outcomes in which at least one bin contains precisely $k$ balls, and dividing this number by the number of all possible outcomes.

Special case: $P(k,1,N)$

First we look at the special case $m=1$, i.e. there is only one bin. In this case there is a single outcome, and it is favorable precisely when the number $N$ of balls is equal to $k$.

We obtain \begin{align*} P(k,1,N)=\begin{cases} 1&\qquad N=k\\ 0&\qquad \text{otherwise}\\ \end{cases} \end{align*}

Special case: $1\leq N<k$

If the number $N$ of balls is less than the number $k$ we can't reach any favorable outcome and the probability is zero: \begin{align*} P(k,m,N)=0\qquad\quad 1\leq N<k,m\geq 1 \end{align*}

General case: $m\geq 2, N\geq k$

In the following we assume there is more than one bin available, i.e. $m\geq 2$, and the number $N$ of balls is at least $k$, i.e. $N\geq k$. The probability $P(k,m,N)$ is given by \begin{align*} P(k,m,N)&= \left(\!\!\binom{m}{N}\!\!\right)^{-1} \sum_{r=1}^{\left\lfloor\frac{N}{k}\right\rfloor} (-1)^{r+1}\binom{m}{r}\left(\!\!\binom{m-r}{N-rk}\!\!\right)\qquad N<km\tag{1}\\ \text{and}\\ P(k,m,N)&=\left(\!\!\binom{m}{N}\!\!\right)^{-1}\sum_{r=1}^{m-1} (-1)^{r+1}\binom{m}{r}\left(\!\!\binom{m-r}{N-rk}\!\!\right)\\ &\qquad\qquad+(-1)^{m+1}\delta_{N,km}\left(\!\!\binom{m}{N}\!\!\right)^{-1}\quad\qquad\qquad\qquad N\geq km\tag{2}\\ \end{align*}

The number of all possible outcomes is the number of multisets \begin{align*} \left(\!\!\binom{m}{N}\!\!\right)=\binom{m+N-1}{m-1} \end{align*} i.e. the number of ways to distribute $N$ indistinguishable balls among $m$ distinguishable bins.

If we select one bin containing exactly $k$ balls, there are $m-1$ bins left which contain the remaining $N-k$ balls. Since we can select a bin in $\binom{m}{1}$ different ways, the number of such outcomes is \begin{align*} \binom{m}{1}\left(\!\!\binom{m-1}{N-k}\!\!\right) \end{align*} The expression $\left(\!\!\binom{m-1}{N-k}\!\!\right)$ also counts outcomes in which other bins contain precisely $k$ balls as well, so outcomes with several such bins are counted more than once. To correct for this overcounting we apply the inclusion-exclusion principle, as indicated by the OP.

The general term is \begin{align*} (-1)^{r+1}\binom{m}{r}\left(\!\!\binom{m-r}{N-rk}\!\!\right) =(-1)^{r+1}\binom{m}{r}\binom{m-r+N-rk-1}{m-r-1} \end{align*} with the sign $(-1)^{r+1}$ indicating whether the term adds back a shortage or removes a surplus. The general term accounts for $r$ bins each containing precisely $k$ balls, leaving $N-rk$ balls to distribute among the remaining $m-r$ bins.

This strategy is feasible up to $N=mk-1$ balls in which case the last term of (1) contains $m-1$ bins with precisely $k$ balls and one bin contains the other $N-k(m-1)$ balls.

If the number of balls increases, i.e. $N\geq mk$, the upper limit of the sum in (2) is always $m-1$, since we always have to correct the surplus or shortage coming from up to $m-1$ bins.

In case of $N=km$ balls there is one special favorable outcome, namely all $m$ bins containing precisely $k$ balls. This is possible only if $N=km$. For all greater values of $N$, i.e. $N>km$, no more than $m-1$ bins can contain precisely $k$ balls. This is realized by multiplying the inclusion-exclusion sign $(-1)^{m+1}$ of the $r=m$ term with the Kronecker delta $$\delta_{N,km}=\begin{cases}1&\qquad N=km\\0&\qquad\text{otherwise}\end{cases}$$
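The two cases can be folded into one sum. The sketch below is illustrative rather than definitive: it assumes the convention that $0$ bins can hold $j$ balls in exactly one way if $j=0$ and in no way otherwise, so the $r=m$ term plays the role of the Kronecker delta and enters with the same sign $(-1)^{r+1}$ as every other term. The helper names `multiset` and `p_indistinguishable` are invented here.

```python
from fractions import Fraction
from math import comb


def multiset(n, j):
    """Number of ways to put j indistinguishable balls into n distinguishable bins."""
    if n == 0:
        return 1 if j == 0 else 0   # no bins left: only the empty distribution
    return comb(n + j - 1, j)


def p_indistinguishable(k, m, N):
    """P(k, m, N) when every occupancy vector (n_1, ..., n_m) is equally likely."""
    # r bins with k balls each need r*k <= N balls, and there are only m bins.
    r_max = m if k == 0 else min(m, N // k)
    favourable = sum(
        (-1) ** (r + 1) * comb(m, r) * multiset(m - r, N - r * k)
        for r in range(1, r_max + 1)
    )
    return Fraction(favourable, multiset(m, N))


print(p_indistinguishable(2, 3, 3))   # 3/5
print(p_indistinguishable(2, 3, 6))   # 13/28
```

With this convention the special cases $m=1$ and $N<k$ also fall out of the same function, since the sum is then either a single delta-like term or empty.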

Probabilities with $N=km$:

Calculation of the probability $P(k,m,N)$ for small $k,m$ and $N$ shows that, apart from the trivial case $m=1$, the maximum over $N$ is not reached at $N=km$.

In the following table each pair of rows shows, for small $k$ and $m$, a value of $N$ with higher probability, followed by the probability $P(k,m,km)$.

\begin{array}{rrrlrrrlrrrl} k&m&N&P(k,m,N)\qquad&\qquad k&m&N&P\qquad&\qquad k&m&N&P\\ \hline\\ 2&3&3&0.6\qquad&\qquad3&3&5&0.42857\qquad&\qquad4&3&7&0.33333\\ 2&3&6&0.46429\qquad&\qquad3&3&9&0.34545\qquad&\qquad4&3&12&0.27473\\ \\ 2&4&3&0.6\qquad&\qquad3&4&5&0.42857\qquad&\qquad4&4&7&0.33333\\ 2&4&8&0.51515\qquad&\qquad3&4&12&0.39780\qquad&\qquad4&4&16&0.32301\\ \\ 2&5&7&0.60606\qquad&\qquad3&5&11&0.47253\qquad&\qquad4&5&15&0.38700\\ 2&5&10&0.59041\qquad&\qquad3&5&15&0.46207\qquad&\qquad4&5&20&0.37841\\ \end{array}
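Table entries of this kind can be reproduced without any formula by listing all occupancy vectors directly. This is a brute-force sketch, assuming (as in this answer) that every way of splitting $N$ indistinguishable balls among $m$ distinguishable bins is equally likely; the function names are made up for the illustration.

```python
from fractions import Fraction


def compositions(N, m):
    """Yield every tuple of m non-negative integers summing to N."""
    if m == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in compositions(N - first, m - 1):
            yield (first,) + rest


def p_by_enumeration(k, m, N):
    """Fraction of occupancy vectors in which some bin holds exactly k balls."""
    total = favourable = 0
    for occ in compositions(N, m):
        total += 1
        favourable += k in occ   # True counts as 1
    return Fraction(favourable, total)


for k, m, N in [(2, 3, 3), (2, 3, 6), (4, 5, 15), (4, 5, 20)]:
    p = p_by_enumeration(k, m, N)
    print(k, m, N, p, round(float(p), 5))
```

The number of occupancy vectors is $\binom{m+N-1}{m-1}$, which stays small for the values in the table, so the enumeration finishes quickly.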

The last pair of table entries, with $k=4$ and $m=5$, gives

\begin{align*} P(4,5,15)&=\left(\!\!\binom{5}{15}\!\!\right)^{-1} \sum_{r=1}^{\left\lfloor\frac{15}{4}\right\rfloor} (-1)^{r+1}\binom{5}{r}\left(\!\!\binom{5-r}{15-4r}\!\!\right)\\ &=\binom{19}{4}^{-1}\sum_{r=1}^{3} (-1)^{r+1}\binom{5}{r}\binom{19-5r}{4-r}\\ &=\binom{19}{4}^{-1}\left(\binom{5}{1}\binom{14}{3}-\binom{5}{2}\binom{9}{2}+\binom{5}{3}\binom{4}{1}\right)\\ &=\frac{125}{323}=0.38700\\ P(4,5,20)&=\left(\!\!\binom{5}{20}\!\!\right)^{-1} \left(\sum_{r=1}^{5-1} (-1)^{r+1}\binom{5}{r}\left(\!\!\binom{5-r}{20-4r}\!\!\right)+(-1)^{5+1}\right)\\ &=\binom{24}{4}^{-1}\left(\sum_{r=1}^{4} (-1)^{r+1}\binom{5}{r}\binom{24-5r}{4-r}+1\right)\\ &=\binom{24}{4}^{-1}\left(\binom{5}{1}\binom{19}{3}-\binom{5}{2}\binom{14}{2}+\binom{5}{3}\binom{9}{1}-\binom{5}{4}\binom{4}{0}+1\right)\\ &=\frac{4021}{10626}=0.37841\\ \end{align*}

and we see $P(4,5,20)<P(4,5,15)$.

Markus Scheuer
  • 108,315
  • That shows the probability is not maximized over possible values of $N$ for fixed $m$ and $k$ at $N = km$. The question should have read that I wanted to show that, for fixed $N$ and $m$ such that $N/m = r$, the maximum over $k$ is reached at $k=r$. I will clarify in the question. – Mark Fischler Aug 30 '16 at 21:22