1

There are $n$ bottomless slots, meaning they can each hold as many balls as needed. I throw balls in a way that each ball is equally likely to land in any of them. How many throws will I need to have at least one ball in all the slots?

Now, let's say the random variable describing the throws needed is $T_m$ when $m$ slots already have balls and $n-m$ are empty. It's easy to see that $T_m$ satisfies the following recurrence:

$$T_m = 1 + I\left(\frac{m}{n}\right)T_m+\left(1-I\left(\frac{m}{n}\right)\right)T_{m+1} \tag{1}$$

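Here, $I(q)$ is a Bernoulli random variable that equals $1$ with probability $q$ and $0$ otherwise, so $I\left(\frac{m}{n}\right)=1$ means the ball landed in a slot that already had a ball.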

This simplifies to:

$$T_m\left(1-I\left(\frac{m}{n}\right)\right)=1+T_{m+1}\left(1-I\left(\frac{m}{n}\right)\right) \tag{1}$$

We are interested in $T_0$ and it's not hard to see $E(T_0)=n\sum_{k=1}^{n}\frac{1}{k}$.
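As a sanity check on this expectation, here is a minimal simulation sketch. The helper names (`throws_to_fill`, `harmonic_expectation`), the seed, and the choice of $n=10$ are illustrative assumptions, not part of the original problem.

```python
import random

def throws_to_fill(n, rng):
    """Throw balls uniformly at random until all n slots hold a ball; return the throw count."""
    occupied = set()
    throws = 0
    while len(occupied) < n:
        occupied.add(rng.randrange(n))  # each slot is equally likely
        throws += 1
    return throws

def harmonic_expectation(n):
    """Closed form n * (1/1 + 1/2 + ... + 1/n)."""
    return n * sum(1.0 / k for k in range(1, n + 1))

rng = random.Random(0)          # fixed seed, chosen arbitrarily
n, trials = 10, 20000           # illustrative values
estimate = sum(throws_to_fill(n, rng) for _ in range(trials)) / trials
print(estimate, harmonic_expectation(n))
```

The two printed numbers should be close to each other, up to Monte Carlo noise.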

But I'm more interested in $T_{n-1}$. This is a geometric random variable with $p=\frac{1}{n}$. So, we get:

$$E(T_{n-1}) = \frac{1}{p} = n \tag{2}$$

Substituting $m=n-1$ in equation (1), taking expectations, and using the fact that $T_n=0$, we get: $$E(T_{n-1})=n$$

and this is the same as equation (2). So far, so good. The problem is when I try to find the variance of $T_{n-1}$. For this, I square equation (1). After some algebra and taking expectations, I get:

$$E(T_m^2) = \frac{n}{n-m}+E(T_{m+1}^2)+2E(T_{m+1})$$

Now substitute $m=n-1$ as before.

$$E(T_{n-1}^2) = n + E(T_n^2) + 2E(T_{n})$$

But we know that $T_n=0$. So this gives us:

$$E(T_{n-1}^2)=n$$

But this can't be right: since $E(T_{n-1})=n$ as well, the variance $E(T_{n-1}^2)-E(T_{n-1})^2$ would equal $n-n^2$, which is negative for $n>1$.
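A quick simulation supports the suspicion that $E(T_{n-1}^2)=n$ is too small. This is only a sketch: the function name, the seed, and $n=5$ are assumptions made for illustration. It draws the number of throws needed to hit the one remaining empty slot and estimates the first two moments.

```python
import random

def throws_for_last_slot(n, rng):
    """Count throws until the single empty slot (labelled 0 here) is hit; each throw hits it w.p. 1/n."""
    throws = 1
    while rng.randrange(n) != 0:
        throws += 1
    return throws

rng = random.Random(1)          # fixed seed, chosen arbitrarily
n, trials = 5, 200000           # illustrative values
samples = [throws_for_last_slot(n, rng) for _ in range(trials)]
mean = sum(samples) / trials
second_moment = sum(s * s for s in samples) / trials
print(mean, second_moment)
```

The estimated mean should sit near $n$, while the estimated second moment should come out far larger than $n$, in line with the standard geometric variance $\frac{1-p}{p^2}$.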

What am I missing here?

Rohit Pandey
  • 6,803
  • 2
    You can't fill a bottomless slot. Do you mean to have a ball in every slot? In which case this is the coupon collector's problem in disguise (a ball in a slot = a coupon collected - how long does it take to obtain a full set). There is a coupon collector tab which may help you to find what you need. – Mark Bennet Sep 27 '19 at 21:48
  • Yes, a ball in every slot. My question is very specific to $E(T_{n-1}^2)$. – Rohit Pandey Sep 27 '19 at 21:49

2 Answers

2

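For simplicity let $S=T_{n-1}$. The correct results are as follows, where $S'$ and $S$ are identically distributed. The notation is shorthand for a mixture: $S$ equals $1$ with probability $\frac{1}{n}$ and $1+S'$ with probability $\frac{n-1}{n}$.

$$S=1\cdot\frac{1}{n}+(1+S')\frac{n-1}{n}$$

$$S^2=1\cdot\frac{1}{n}+(1+S')^2\frac{n-1}{n}$$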

These give $E(S)=n$ and $E(S^2)=2n^2-n$.
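
To spell out the step, here is one way the two moments can be worked out, assuming only that $E(S')=E(S)$ and $E(S'^2)=E(S^2)$. Taking expectations of the first relation:

$$E(S)=\frac{1}{n}+\bigl(1+E(S)\bigr)\frac{n-1}{n}\;\Rightarrow\;\frac{E(S)}{n}=1\;\Rightarrow\;E(S)=n$$

Doing the same for the second relation and substituting $E(S)=n$:

$$E(S^2)=\frac{1}{n}+\bigl(1+2E(S)+E(S^2)\bigr)\frac{n-1}{n}\;\Rightarrow\;\frac{E(S^2)}{n}=2n-1\;\Rightarrow\;E(S^2)=2n^2-n$$

The variance is then $E(S^2)-E(S)^2=n^2-n$, which matches the geometric variance $\frac{1-p}{p^2}$ with $p=\frac{1}{n}$.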

So, what has gone wrong in your derivation? In essence, you cannot square the equation in the way you have done. Just to pick up one problem, the $T_{n-1}$s on each side of the equation are not equal; they are different random variables with the same expectation. Therefore you can only manipulate them in the way you appear to have done once you have taken the expectation.

  • Thanks. The random variables being different is a great catch! But if I have an equation involving random variables on both sides, I can still square both sides, can't I? – Rohit Pandey Sep 27 '19 at 23:28
  • 1
    The simplest answer is to say no, don't do it! For example, suppose X is Y with probability 0.5 and Z with probability 0.5. Then $X^2$ is $Y^2$ with probability 0.5 and $Z^2$ with probability 0.5. Think what these two equations look like and what would happen if you square the first. –  Sep 27 '19 at 23:47
  • However, you will come across different types of equations, e.g. $X=Y+Z$, meaning that whatever values Y and Z have, X is the sum. These you can square. –  Sep 27 '19 at 23:50
  • I'm hesitant about making things seem confused, but the essential difference is that people write X = 0.5Y ... to mean both that X is half the value of Y and that X is equal to Y half the time. –  Sep 27 '19 at 23:56
  • Right, no - this is very helpful. Appreciate these different examples. Need to be vigilant when playing with random numbers! – Rohit Pandey Sep 27 '19 at 23:59
  • 1
    Yes - being vigilant is key. –  Sep 28 '19 at 00:00
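
The distinction drawn in these comments can be seen numerically. The sketch below is only an illustration under assumed distributions: $Y$ and $Z$ are taken to be independent normals with means $1$ and $-1$ (a hypothetical choice, not from the thread). It compares $E(X^2)$ when $X$ is $Y$ half the time and $Z$ half the time against $E(X^2)$ when $X$ is literally the value $0.5Y+0.5Z$.

```python
import random

rng = random.Random(2)          # fixed seed, chosen arbitrarily
trials = 200000

mixture_sq_sum = 0.0   # X equals Y with probability 0.5, else Z
average_sq_sum = 0.0   # X equals the number 0.5*Y + 0.5*Z
for _ in range(trials):
    y = rng.gauss(1.0, 1.0)     # hypothetical Y ~ Normal(mean 1, sd 1)
    z = rng.gauss(-1.0, 1.0)    # hypothetical Z ~ Normal(mean -1, sd 1)
    x_mix = y if rng.random() < 0.5 else z
    x_avg = 0.5 * y + 0.5 * z
    mixture_sq_sum += x_mix ** 2
    average_sq_sum += x_avg ** 2

mixture_second_moment = mixture_sq_sum / trials
average_second_moment = average_sq_sum / trials
print(mixture_second_moment, average_second_moment)
```

Under these assumptions the mixture gives $0.5E(Y^2)+0.5E(Z^2)=2$, while the literal average gives $0.5$, so squaring the "probability-weighted" equation as if it were ordinary algebra mixes up two different objects.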
1

The answer by @S. Dolan was very helpful. But I think it might be possible to square equations involving random variables after all, provided we are careful about which terms are clones (i.i.d.) and which are mirror images (literally the same thing). Take equation (1) from the question. We have:

$$T_{m}-1 = I\left(\frac{m}{n}\right)T_m'+\left(1-I\left(\frac{m}{n}\right)\right)T_{m+1}$$

Here, $T_m$ and $T_{m}'$ are clones (i.i.d. random variables) as S. Dolan pointed out. But the two $I(\frac{m}{n})$ terms are mirror images, completely identical. This means that $I(\frac{m}{n})(1-I(\frac{m}{n}))=0$. Once we take note of this, all cross terms cancel out. Also, squaring a Bernoulli results in the same Bernoulli. This leads us to the correct recurrence:

$$(T_{m}-1)^2 = I\left(\frac{m}{n}\right)T_m'^2+\left(1-I\left(\frac{m}{n}\right)\right)T_{m+1}^2$$
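
One way to check this recurrence is to take expectations and iterate backwards from $T_n=0$. The sketch below assumes the indicator is independent of $T_m'$ and $T_{m+1}$, which gives $E(T_m)=\frac{n}{n-m}+E(T_{m+1})$ and $E(T_m^2)=\frac{n}{n-m}\bigl(2E(T_m)-1\bigr)+E(T_{m+1}^2)$. It compares the result with the moments of a sum of independent geometric waiting times. The function names and the exact-fraction approach are illustrative choices.

```python
from fractions import Fraction

def moments_from_recurrence(n):
    """Return (e1, e2) with e1[m] = E(T_m) and e2[m] = E(T_m^2), iterating down from T_n = 0."""
    e1 = [Fraction(0)] * (n + 1)
    e2 = [Fraction(0)] * (n + 1)
    for m in range(n - 1, -1, -1):
        inv_p = Fraction(n, n - m)   # 1 / P(ball lands in an empty slot)
        e1[m] = inv_p + e1[m + 1]
        e2[m] = inv_p * (2 * e1[m] - 1) + e2[m + 1]
    return e1, e2

def moments_from_geometrics(n, m):
    """Mean and second moment of T_m as a sum of independent geometric waiting times."""
    mean = Fraction(0)
    var = Fraction(0)
    for k in range(m, n):
        p = Fraction(n - k, n)       # success probability with k slots occupied
        mean += 1 / p
        var += (1 - p) / p ** 2
    return mean, var + mean ** 2

n = 6                                # illustrative value
e1, e2 = moments_from_recurrence(n)
print(e1[n - 1], e2[n - 1])          # compare with n and 2n^2 - n
```

For $m=n-1$ the recurrence reproduces $E(T_{n-1}^2)=2n^2-n$, and for the other values of $m$ it agrees with the geometric-sum calculation.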

Rohit Pandey
  • 6,803