21

I've seen many variations of this problem but I can't find a good, thorough explanation on how to solve it. I'm not just looking for a solution, but a step-by-step explanation on how to derive the solution.

So the problem at hand is:

You have m balls and n bins. Consider throwing each ball into a bin uniformly and at random.

  • What is the expected number of bins that are empty, in terms of m and n?
  • What is the expected number of bins that contain exactly 1 ball, in terms of m and n?

How would I approach solving this problem?

Thanks!

Hristo
  • 617

3 Answers3

32

You want to use what are called indicator variables. To see how this works, let's take your first problem. Let $Y$ be the number of bins that are empty. You want $E[Y]$. Now define the indicator variables $X_i$ so that $X_i$ is $1$ if bin $i$ is empty and $0$ otherwise. Then we have

$$Y = X_1 + X_2 + \cdots + X_n.$$

By linearity of expectation,

$$E[Y] = E[X_1] + E[X_2] + \cdots + E[X_n].$$

So now the problem reduces to calculating the $E[X_i]$ for each $i$. But this is fairly easy, as $$E[X_i] = 1 P(X_i = 1) + 0 P(X_i = 0) = P(X_i = 1) = P(\text{bin $i$ is empty}) = \left(1 - \frac{1}{n}\right)^m,$$ where the last equality is because balls $1, 2, \ldots, m$ must all go in a bin other than $i$, each with probability $1 - \frac{1}{n}$.

Therefore, $$E[Y] = n \left(1 - \frac{1}{n}\right)^m.$$

Your second problem can be worked in a similar fashion. Let $Y$ be the number of bins that have exactly $1$ ball. Let $X_i$ be $1$ if bin $i$ has exactly one ball and $0$ otherwise. Then all that's left is to figure out $E[X_i]$, which is the probability that a given bin has exactly $1$ ball, and go from there. Since this is homework, I'll let you finish.

Mike Spivey
  • 55,550
  • I didn't quite follow how you got the (1-1/n) term and why you raise it to power m? – Hristo Mar 25 '11 at 03:51
  • So I explained to myself the (1-1/n) term but I'm still confused on why its raised to power m... – Hristo Mar 25 '11 at 03:58
  • 7
    @Hristo: The probability that the first ball goes into bin $i$ is $1/n$, so the probability that the first ball goes into a bin other than $i$ is $1 - 1/n$. The probability that the second ball goes into a bin other than $i$ is also $1 - 1/n$. The same is true for the third, and so forth. In order for bin $i$ to be empty, ball $1$ and ball $2$ and ball $3$ and so forth must all be in bins other than $i$. That means you have to multiply $1-1/n$ by $1-1/n$ by $1-1/n$, etc., with a factor of $1-1/n$ for each of the $m$ balls. That's $(1-1/n)^m$. – Mike Spivey Mar 25 '11 at 03:59
  • Got it. That makes sense. Now why is that whole thing multiplied by n? Is it because this (1-1/n)^m term can happen for each bin? Since there are n bins, then each has (1-1/n)^m probability to be empty, so you get n*(1-1/n)^m? – Hristo Mar 25 '11 at 04:06
  • @Hristo: Yes. Go back to the $E[Y] = E[X_1] + E[X_2] + \cdots E[X_n]$ equation. Since, for each $i$, $E[X_i] = (1-1/n)^m$, we have $E[Y] = (1-1/n)^m + (1-1/n)^m + \cdots + (1-1/n)^m = n (1-1/n)^m$. – Mike Spivey Mar 25 '11 at 04:22
  • Awesome! That makes sense. Now to figure out the second problem... – Hristo Mar 25 '11 at 04:29
  • So I thought about it some more and seems like the answer for my second problem would be the same as the first, except for the power of m... it would have to be to the m-1 power right, since bin i would have exactly 1 ball? So... $$E[Y] = n \left(1 - \frac{1}{n}\right)^{m-1}$$ ... but that seems like its missing something since we can have any one of the m balls be the one in bin i... so should there also be an m multiplicative term in there? – Hristo Mar 25 '11 at 21:54
  • @MikeSpivey the idea of indicator variables in this context was the key part, superb answer... –  Aug 15 '17 at 07:35
  • @MikeSpivey How to approach if it's asked to calculate the variance? –  Aug 15 '17 at 09:23
6

Let's work out the probability there are $k$ balls (out of $m$) in the first bin (out of $n$). This is a simple binomial probability with $p=1/n$ and $1-p = (n-1)/n$ so the probability is ${m \choose k} \dfrac{(n-1)^{m-k}}{n^m}$.

The expected number of times the first bin has $k$ balls is the same, so the expected number of bins with $k$ balls is $n$ times this, i.e.

$${m \choose k} \dfrac{(n-1)^{m-k}}{n^{m-1}}$$

Henry
  • 157,058
  • can you somehow rewrite the equations? they didn't render well – Hristo Mar 25 '11 at 18:41
  • @Hristo: better? – Henry Mar 25 '11 at 18:46
  • Yup... much better! Thanks! Now I need to understand it :) – Hristo Mar 25 '11 at 21:28
  • @Henry... so I've been trying to understand your solution and I'm not quite sure how you're coming up with m-k as the exponent in the numerator and m-1 as the exponent in the denominator? Also, why do you do m choose k? – Hristo Mar 25 '11 at 21:56
  • You have a binomial distribution (with $m$ rather than $n$, and $p=1/n$ and $1-p = (n-1)/n$). $m-k$ is the exponent of $(1-p)$ since $m-k$ balls do not go in the bin. $m-1$ is the exponent of denominator because I multiplied the previous expression by $n$; $m$ had been the denominator exponent because I had $p^n (1-p)^{m-k}$ and $n$ was the denominator of both $p$ and $1-p$. – Henry Mar 25 '11 at 22:36
  • @Henry,

    I expected (n-1)/n to have the exponent m-k. What happened to the denominator n?

    –  Apr 08 '11 at 12:25
  • 1
    $Pr(K=k|m,n)$ $ = {m \choose k} \left(\dfrac{1}{n}\right)^k \left(\dfrac{n-1}{n}\right)^{m-k}$ $ = {m \choose k} \dfrac{1}{n^k} \cdot \dfrac{(n-1)^{m-k}}{n^{m-k}}$ $ = {m \choose k} \dfrac{(n-1)^{m-k}}{n^m}$ – Henry Apr 08 '11 at 16:50
0

To do the second part of the question, we want to 1) first find the number of ways we can fill a particular bin with exactly one ball:

step 1: Choose a particular bin out of n bins to place one of the m balls. There are $\binom{n}{1}$ ways to do that. Note that the balls are considered identical.

Step 2: After step 1 is completed, we don't want to touch that particular bin anymore. There are m-1 balls and n-1 bins left, and for each ball, there are n-1 choices to make in terms of which bin we want to place the ball, we multiply (n-1) by it self (m-1) times to get the number of ways to distribute m-1 balls into n-1 bins: there are $(n-1)^{m-1}$ ways to do so.

By product rule, there are $\binom{n}{1} * (n-1)^{m-1} = n * (n-1)^{m-1}$ ways to complete 1).

Now, divide this by the total number of ways to distribute m balls into n bins to get $p(X_i=1)$, the probability that any bin has exactly one ball. (here, $X_i=1$ if the bin has exactly one ball, $X_i = 0$ if the bin does not have exactly one ball). We get $p(X_i) = \frac{n * (n-1)^{m-1}}{n^m}$. And for any given bin, the expected 'proportion' that has exactly one ball is $E(X_i) = (\frac{n * (n-1)^{m-1}}{n^m})*1 + 0(\frac{n * (n-1)^{m-1}}{n^m})$.

Since we have n bins total, we multiply that by n to get $E(X) = \sum_{i=0}^n E(X_i) = n(\frac{n * (n-1)^{m-1}}{n^m})$.