2

Suppose I have $m$ bins, each containing $k$ balls. I randomly remove balls (picking uniformly random balls, not bins) until one of the bins is empty. How many balls do I have to remove on average before this happens?

This feels like it should be an easy question to answer, but it's been a while since I've done any combinatorics, and I'm struggling to find an elegant solution. I've found one way to do it, but the calculation involved is incredibly messy.

At the $n$th step, we will have $mk-n$ balls partitioned into $m$ bins. Let $N(x,b,r)$ be the number of ways of partitioning $x$ balls into $b$ bins, such that no bin contains more than $r$ balls. This question provides an expression for $N$ in terms of a sum (which, I assume, is the simplest expression we can reasonably expect). It's fairly easy to see that $N(x-b,b,r-1)$ is the number of partitions with no empty bins (add one ball to each bin, which leaves $x-b$ balls to allocate).

So the probability of having an empty bin if you remove $n$ balls is $$ \begin{align*} P(n)&=\frac{N(mk-n,m,k)-N(m(k-1)-n,m,k-1)}{N(mk-n,m,k)} \end{align*} $$

and the expected value is given by $$\sum_{n=1}^{mk}nP(n)\left(1-P(n-1)\right)$$

So in principle, I've found the answer. In practice, this answer is useless. Is there a better way to solve this problem?

hasnohat
  • 3,882
  • 1
  • 18
  • 26
  • Saying "randomly remove balls" is not sufficiently precise. At the $n^{\text{th}}$ step, how do you determine which ball is removed? Do you pick a uniformly random bin? (This makes balls in a full bin less likely to be drawn.) Or pick a uniformly random ball? (This makes bins with many balls more likely to be chosen.) Or do something else? (Your answer doesn't match either of these possibilities.) – Misha Lavrov Nov 10 '17 at 03:15
  • @MishaLavrov I pick uniformly random balls, not bins. – hasnohat Nov 10 '17 at 03:46
  • I would not expect to have a simpler answer. This problem is incredibly complicated, from a combinatorial point of view, as the probability of taking a ball out of bin $i$at step $j$ depends on precisely which balls have been chosen in the past. – Stella Biderman Nov 10 '17 at 04:07
  • 1
    This appears to have appeared at the following MSE link. – Marko Riedel Nov 10 '17 at 23:15

2 Answers2

0

Here is an approximate approach, which should be pretty good when $m,k$ are reasonably large. First calculate the chance that the first bin is empty after you have removed $n$ balls. There are $mk$ balls total, so the number of ways to choose $n$ balls that leave bin $1$ empty is ${(m-1)k \choose n-k}$. The chance bin $1$ is empty is then $\frac {{(m-1)k \choose n-k}}{mk \choose n}$. If there are lots of bins and balls the impact of one bin being empty on the distribution of probabilities in the other bins is small. We are interested in cases where the chance bin $1$ is empty has rather low probability, so the chance of two bins being empty is even lower. The chance some bin is empty is then $\frac {m{(m-1)k \choose n-k}}{mk \choose n}$ The Alpha plot below shows this for $m=50,k=10$. It hits $0.5$ at about $320$ balls, where you will have removed $6.4$ of the $10$ balls in each bin on average.

enter image description here

Ross Millikan
  • 374,822
0

Your problem is clearly equivalent to that of
randomly throwing $n$ balls into $m$ bins (of max capacity $k$), till one bin reaches $k$ balls.

Now the number of configurations balls/bins, one bin with $k$ balls and the other $m-1$ with max $k-1$ balls and $n$ balls in total, can be computed by the sum in the reference you posted, or better as given in this other related post, through $$ m\,N_b (n - k,k - 1,m-1) $$

These are the number of the weak compositions of $n$, into $m$ parts, with only one part being $k$ and the others being less.

However, the process of "randomly picking one ball .." or "randomly throw one ball ..." has a probability space that is different from that of "making composition of.."
as explained in the answer to this post.
In this other acception, you may refer to this additional post.

That premised, you shall decide to which probability space you are referring to.

G Cab
  • 35,272