The probability of distributing K balls over N boxes of size M with at least Q boxes empty.

Question

I am struggling with a questions similar to one discussed here before but with a twist: Having identical m empty bins, each can contain up to R balls. k (k<=mR*) identical balls are randomly distributed among the bins. Bins that received R balls are considered full and are not taking more balls in the distribution process.
1. What is the probability that at least q bins are empty? 2. What is the probability that exactly q bins are empty?

OK, here is what I found at http://www.mathpages.com/home/kmath337/kmath337.htm

If this paper is correct, and N[m,R] (k) is the number of combinations to pack k balls in m bins with capacity R expressed as:

$N[m,R](k) = \sum\limits^{m}_{t=0}(-1)^{t}\binom{m}{t}\binom{m+k-t(R+1)-1}{m-1} => m \sum\limits^{m}_{t=0}(-1)^{t}\frac{(m+k-t(R+1)-1)!}{t! (m-t)! (k-t(R+1))!}$

So, if N[m-q,R] (k) is the number of combinations to pack the same amounts of balls in q less bins, the probability I am looking for should be $\frac{N[m-q,R] (k)}{ N[m,R] (k)}$

Anybody comment?

If this is correct, then I need to figure out how to calculate these in Excel with very large numbers....

Welcome to Math Stack Exchange, can you show us your work? You need to make an attempt at solving a question or at least explain how you think the question can be solved. — Agent 0, Jan 28 '18 at 01:49

G Cab · Answer 1 · 2018-03-31T10:19:26.287

If we are speaking of undistinguishable balls into distinguishable bins, as it is the common understanding from the wording of your problem, then yes that article and the formula you cite is fully correct.

However I suggest to rewrite the formula as $$ N_b (s,r,m)\quad \left| {\;0 \leqslant \text{integers }s,m,r} \right.\quad = \sum\limits_{\left( {0\, \leqslant } \right)\,\,k\,\,\left( { \leqslant \,\frac{s} {r}\, \leqslant \,m} \right)} {\left( { - 1} \right)^k \left( \begin{gathered} m \hfill \\ k \hfill \\ \end{gathered} \right)\left( \begin{gathered} s + m - 1 - k\left( {r + 1} \right) \\ s - k\left( {r + 1} \right) \\ \end{gathered} \right)} $$ with

$$N_{\,b} (s,r,m) = \text{No}\text{. of solutions to}\;\left\{ \begin{gathered} 0 \leqslant \text{integer }x_{\,j} \leqslant r \hfill \\ x_{\,1} + x_{\,2} + \cdots + x_{\,m} = s \hfill \\ \end{gathered} \right.$$ or No. of ways to distribute $s$ undist. balls into $m$ dist. bins, each of capacity max $r$ balls.

The advantages of this formulation are extensively described in this related post and in this other one.

Coming now to the number of ways in which at least $q$ bins remain empty, your approach is not correct.
In fact (allow me to keep the symbols above) that number is not given by $ N_{\,b} (s,r,m-q)$, because you shall multiply by the number of ways to insert the $q$ empty bins and this in turn depends on how many empty bins are already accounted by $ N_{\,b} (s,r,m-q)$.

Taking the case of exactly $q$ empty bins, the No. of ways to obtain them will be
No. of ways to choose $q$ bins out of $m\quad \quad \quad \times$
No. of ways to put the $s$ balls into the remaining $m-q$ bins, each filled with at least one ball
as in this formulation the two type of bins (empty/non-empty) are separated.

Since $$ {\rm No}{\rm .}\,{\rm of}\,{\rm sol}{\rm .}\,{\rm to}\;\left\{ \matrix{ {\rm 1} \le {\rm integer}\;x_{\,j} \le r \hfill \cr x_{\,1} + x_{\,2} + \; \cdots \; + x_{\,m} = s \hfill \cr} \right.\quad = \quad {\rm No}{\rm .}\,{\rm of}\,{\rm sol}{\rm .}\,{\rm to}\;\left\{ \matrix{ 0 \le {\rm integer}\;\left( {x_{\,j} - 1} \right) \le r - 1 \hfill \cr \left( {x_{\,1} - 1} \right) + \left( {x_{\,2} - 1} \right) + \; \cdots \; + \left( {x_{\,m} - 1} \right) = s - m \hfill \cr} \right. $$ then clearly it is $$ N_{be} (s,r,m,q) = \left( \matrix{ m \cr q \cr} \right)N_b (s - m + q,r - 1,m - q)\quad \left| \matrix{ \;s < 0\; \vee \;r < 0\; \vee \;m < 0\;\; \Rightarrow \;N_b (s,r,m) = 0 \hfill \cr \;1 \le s \hfill \cr} \right.\quad $$ where the limit cases ($m=0$ etc.) shall be treated properly.

The "sprout" example that you cite in your comment perfectly fits with this model, if sprouting of one seed in one compartment is independent from the sprouting of the other seeds in the same or in other compartment, apart from having then a total of $k$.
Just allow to replace your $k$ with $s$, and we can apply the formula above $$ \eqalign{ & N_{be} (s,r,m,q) = \left( \matrix{ m \cr q \cr} \right)N_b (s - m + q,r - 1,m - q) = \cr & = \left( \matrix{ m \cr q \cr} \right)\sum\limits_{\left( {0\, \le } \right)\,\,k\,\,\left( { \le \,{s \over r}\, \le \,m} \right)} {\left( { - 1} \right)^k \left( \matrix{ m - q \cr k \cr} \right)\left( \matrix{ s - 1 - kr \cr s - m + q - kr \cr} \right)} \cr} $$ To make a small numerical example, with $s=3,\; m=3,\; r=2$ $$ \eqalign{ & N_{be} (3,2,3,q) = \left( \matrix{ 3 \cr q \cr} \right)\sum\limits_{\left( {0\, \le } \right)\,\,k\,\,\left( { \le \,1} \right)} {\left( { - 1} \right)^k \left( \matrix{ 3 - q \cr k \cr} \right)\left( \matrix{ 2 - 2k \cr 0 + q - 2k \cr} \right)} = \cr & = \left( \matrix{ 3 \cr q \cr} \right)\left( {\left( \matrix{ 3 - q \cr 0 \cr} \right)\left( \matrix{ 2 \cr q \cr} \right) - \left( \matrix{ 3 - q \cr 1 \cr} \right)\left( \matrix{ 0 \cr q - 2 \cr} \right)} \right) = \cr & = \left( \matrix{ 3 \cr q \cr} \right)\left( {\left( \matrix{ 2 \cr q \cr} \right) - \left[ {q = 2} \right]} \right) = \underbrace {1,6,0,0}_{q\, = \,0 \cdots 3}\quad \Rightarrow \cr & \Rightarrow \quad \left\{ \matrix{ \underbrace {(1,1,1)}_{{\rm 3}\,{\rm boxes}\,{\rm content}}\quad q = 0 = {\rm no}\,{\rm empty} \hfill \cr {\rm permut}{\rm .}\,{\rm of}\;(0,1,2) = 6\quad q = 1 = {\rm one}\,{\rm empty} \hfill \cr \emptyset \quad q = 2,3 \hfill \cr} \right. \cr} $$

------- notes for computation -------

a) you need first of all a computer program which include the binomial defined as in this wiki article, i.e. $$ \left( \matrix{ x \cr m \cr} \right) = \left\{ {\matrix{ {{{x^{\,\underline {\,m\,} } } \over {m!}} = {1 \over {m!}}\prod\limits_{0\, \le \,k\, \le \,m - 1} {\left( {x - k} \right)} } & {0 \le m \in Z} \cr 0 & {m < 0\; \vee \;m \notin Z} \cr } } \right. $$ because, either you have to adjust the bounds of the sum in a complicated way prone to errors, or you get an error for negative factorial.
If you don't have, it is not difficult to implement a "user defined function" accordingly.
The proposed formulations take advantage of the fact that, in the definition above, the binomial is null for lower term $<0$.

b) Take the formula above, for the number of ways to get exactly $q$ empty bins $$ N_{be} (s,r,m,q) = \left( \matrix{ m \cr q \cr} \right)\sum\limits_{0\, \le \,\,k\,\, \le \,m} {\left( { - 1} \right)^k \left( \matrix{ m - q \cr k \cr} \right)\left( \matrix{ s - 1 - kr \cr s - m + q - kr \cr} \right)} $$ for computation purposes, you can fix the summation bounds to be simply $0 \le k \le m$.
The number of ways to get at least $q$ empty bins will clearly be $$ \sum\limits_{q\, \le \,\,j\,\, \le \,m} {N_{be} (s,r,m,j)} $$

example

For the very few lowest values of the parameters the attached are the values that we get applying the formula for $N_{be}$ given above.
You can verify that they are correct.

You understand the problem correctly and I understand the need to add the selection of q of m however I am not clear on the rest. I will explain in the answer below. — Eyalbc, Feb 08 '18 at 23:39
I will explore the representation of N[m,R] (k) as you suggested.
Adding the selection of choosing q from m is also understood, however, if I follow with calculating the Num. of combination to fit k indistinguishable balls into m-q dist. bins with capacity R out of m bins as $\binom{m}{q} N[m-q,R] (k)$ I get values that are larger than $N[m,R] (k)$ which I fail to understand. I definately do something wrong... — Eyalbc, Feb 09 '18 at 00:06
@Eyalbc: read properly what I wrote in my answer "No. of ways .... at least one ball" — G Cab, Feb 09 '18 at 09:29
@Eyalbc: I completed my answer so that it might be useful to other interested in this topic. — G Cab, Feb 09 '18 at 18:52
Hi G Cab, Thank you, it helped me think.... I still struggle with it and you seems more experienced than I am. Here is the basic question that I am trying to solve: I have $m$ bins with $r$ slots. In each slot let say a seed is placed which may or may not sprout. If we know that only $k$ seeds out of the $m*r$ planted in the slots will sprout, what is the probability that $q$ of the bins will have no sprouted seed at all. Thank you in advance, Eyal — Eyalbc, Feb 21 '18 at 16:32
Thank you again. If I am looking for the probability of having at least $q$ empties, I expect it to be $P(s,r,m,q) = \frac { Nbe(s,r,m,q) } {Nb(s,r,m)} $. — Eyalbc, Feb 22 '18 at 19:36
@Eyalbc: yes, right, but the $N_{be}$ I wrote above is for exactly $q$ bins empty. — G Cab, Feb 22 '18 at 23:46
Hi G. Cab. I read through the two links you directed to. I think it confused me more than clarify why the formula in that state is better. I need to grammatically calculate the figures and I am not sure how to do it in this representation. Also, how I extend it to be at least $q$ empties? Thank you for all your help, I am more rusty than I thought.... — Eyalbc, Mar 02 '18 at 20:21
a) the links are to understand how the $Nb$ formula is derived, could you get it ? however it's well tested and you can rely on that. b) you know that $\binom {n}{m}=\binom{n}{n-m}$: it happens that writing the 2nd binomial in $Nb$ in that way the bounds are greatly simplified; computationally just put $0 \le k \le m$. c) at least $q$ = exactly$q$+exactly$q+1$+... d) I will add in the answer something concerning practical computation. — G Cab, Mar 02 '18 at 22:20
Hi G. Cab. implementing the functions I find a few items that may need to be corrected: The first is the limit on $$ N_b (s,r,m)\quad \left| {;0 \leqslant \text{integers }s,m,r} \right.\quad = \sum\limits_{\left( {0, \leqslant } \right),,k,,\left( { \leqslant ,\frac{s} {r}, \leqslant ,m} \right)} {\left( { - 1} \right)^k \left( \begin{gathered} m \hfill \ k \hfill \ \end{gathered} \right)\left( \begin{gathered} s + m - 1 - k\left( {r + 1} \right) \ s - k\left( {r + 1} \right) \ \end{gathered} \right)} $$ I believe that $$k\leqslant \frac{s}{(r+1)}$$ — Eyalbc, Mar 08 '18 at 01:50
also, in $Nbe$ the $s - m + q - kr$ is negative for simple examples such as $m>s+q$ — Eyalbc, Mar 08 '18 at 01:55
@Eyalbc: pls. see note a) in my answer, and of course also the b) — G Cab, Mar 08 '18 at 02:01
Hi G Cab. First, my name is Eyal. Through implementing the solution I found that my numbers don't align. I assume that $Nbe(s,r,m,q)$ for $q=0$ $ = Nb(s,r,m)$ and it is not.
In case that $q=0$ this formula is not workig: $$ N_{be} (s,r,m,q) = \left( \matrix{ m \cr q \cr} \right)N_b (s - m + q,r - 1,m - q)\quad \left| \matrix{ ;s < 0; \vee ;r < 0; \vee ;m < 0;; \Rightarrow ;N_b (s,r,m) = 0 \hfill \cr ;1 \le s \hfill \cr} \right.\quad $$ For $q=0$ the formula $$N_{be} (s,r,m,q) = \binom{m}{q} * N_b (s+q,r,m-q)$$ does work. Thoughts? — Eyalbc, Mar 30 '18 at 19:33
@Eyalbc: hi Eyal, my name is Giuliano. Of course your numbers do not align: for $q=0$, $N_{be}$ can't be equal to $N_b$. $N_{be}$ is the number of ways to have exactly, repeat exactly , $q$ bins empty, and not at least $q$. To get the last you have to sum $N_{be}(q)+N_{be}(q+1)+\cdots$. I added some computed values for you to check with your computations. — G Cab, Mar 31 '18 at 10:42

The probability of distributing K balls over N boxes of size M with at least Q boxes empty.

1 Answers1

Linked