
I've been struggling with this for a while:

Info: I have a bot on Discord that posts a random question every 4.5 minutes (on average). It posts the question from a database I prepared, which has 150 unique questions. Questions can repeat.

I want to find the probability that all the questions were posted at least once in $150 + n$ posts (i.e. in $675 + 4.5n$ minutes). I know the probability is:

$$p = \frac{\text{possibilities of getting all questions at least once}}{\text{all possibilities}}$$

For the denominator, it's simply $150^{150+n}$.

For the numerator, for some 150 questions it posts, we have $150!$ ways of getting all questions at least once, and for the remaining $n$ posts, we would have $150^n$ ways, giving us $150! \times 150^n$.

Piecing this together, I should get:

$$p = \frac{150! \times 150^n}{150^{150+n}}$$

$$p = \frac{150! \times 150^n}{150^{150} \times 150^n}$$

$$p = \frac{150!}{150^{150}}$$

This makes no sense. It says that the probability is independent of $n$, which is completely absurd, since with increasing $n$, the probability that you'd see all the questions should increase!
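A small brute-force sketch makes the discrepancy concrete. It assumes each post is drawn uniformly at random, and uses $m = 3$ questions as a stand-in for 150 so every sequence can be enumerated; the helper names `count_covering` and `question_numerator` are hypothetical.

```python
from itertools import product
from math import factorial

def count_covering(m: int, length: int) -> int:
    """Brute force: count sequences of `length` posts over m questions
    in which every question appears at least once."""
    return sum(1 for seq in product(range(m), repeat=length)
               if len(set(seq)) == m)

def question_numerator(m: int, n: int) -> int:
    """The numerator derived above, generalized to m questions: m! * m**n."""
    return factorial(m) * m ** n

m = 3  # small stand-in for the 150 questions
for n in range(4):
    # n = 0: both give 6; n = 1: 36 vs 18; n = 2: 150 vs 54; n = 3: 540 vs 162
    print(n, count_covering(m, m + n), question_numerator(m, n))
```

The formula agrees with brute force for $n = 0$ but falls short of it for every $n \ge 1$.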

What have I done wrong?

joriki
  • Look for the search term "Coupon Collector Problem." – JMoravitz Dec 14 '19 at 15:59
  • As for what you did wrong, you did not account for which times were used in the 150 guaranteed positions for all 150 questions to have been seen. You effectively counted the probability that very specifically the first 150 questions had every question appearing. Correcting this is not as easy as just multiplying the numerator by $\binom{150+n}{150}$ however, as this introduces a large amount of double-counting. You might have better luck with Stirling numbers of the second kind. – JMoravitz Dec 14 '19 at 16:01
  • @JMoravitz Ahh, that makes sense. I don't know what Stirling numbers are, but can I find my answer if I check the number of ways I can swap the guaranteed and unguaranteed ones [$(n+1)!$ ways]? – Pritt Balagopal Dec 14 '19 at 16:04
  • Consider a smaller problem with only three questions and $n=6$. Try to figure out the sequence of steps and answers to steps that lead to counting the sequence $abcabcabc$. This could have been $\color{red}{abc}abcabc$. This could also have been $\color{red}{a}bca\color{red}{b}cab\color{red}{c}$, or it could be many others. We only wanted to count this once, not $27$ times. Compare this now to trying to count the sequence $aaaaaaabc$. We wanted to count this once, not $7$ times. – JMoravitz Dec 14 '19 at 16:08
  • Stirling numbers of the second kind (not to be confused with Stirling numbers of the first kind) count the number of ways that you can partition an $n$-element set of distinct elements into $k$ unlabeled nonempty subsets. – JMoravitz Dec 14 '19 at 16:09
  • @JMoravitz Hmm, so basically I would have to find the number of ways I can rearrange the 150, 149, 148, ... (in the factorial) and the trailing 150, 150, 150 (n times). So, it's basically the question of arranging 150+n objects in which 150 are identical, am I right? – Pritt Balagopal Dec 14 '19 at 16:12
  • No. I encourage you to visit the links in joriki's answer. – JMoravitz Dec 14 '19 at 16:12
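A sketch of the Stirling-number route suggested in the comments, assuming uniform posts. A sequence of $L$ posts that uses all $m$ questions amounts to a partition of the $L$ post positions into $m$ nonempty groups, followed by an assignment of the $m$ questions to those groups, so there are $m!\,S(L, m)$ of them and

$$p = \frac{150! \, S(150+n,\,150)}{150^{150+n}}.$$

The helpers `stirling2` and `p_all_seen` are hypothetical names; the last line reproduces the overcount from multiplying by $\binom{150+n}{150}$ on the 3-question, $n = 6$ example.

```python
from fractions import Fraction
from math import comb, factorial

def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind S(n, k): the number of ways to
    partition an n-element set into k unlabeled nonempty subsets."""
    # row[j] holds S(i, j) for the current i, starting from S(0, 0) = 1.
    row = [1] + [0] * k
    for i in range(1, n + 1):
        # Update right to left so row[j - 1] still holds S(i - 1, j - 1).
        for j in range(min(i, k), 0, -1):
            row[j] = j * row[j] + row[j - 1]
        row[0] = 0  # S(i, 0) = 0 for i >= 1
    return row[k]

def p_all_seen(m: int, n: int) -> Fraction:
    """P(all m questions appear at least once in m + n uniform posts):
    m! * S(m + n, m) covering sequences out of m**(m + n)."""
    length = m + n
    return Fraction(factorial(m) * stirling2(length, m), m ** length)

# Small case from the comments: 3 questions, n = 6, so 9 posts.
print(stirling2(9, 3))    # 3025
print(p_all_seen(3, 6))   # 6050/6561

# Multiplying the question's count by C(9, 3) overcounts: the result
# exceeds even the total number of sequences, 3**9 = 19683.
print(comb(9, 3) * factorial(3) * 3 ** 6)  # 367416
```

The recurrence $S(i, j) = j\,S(i-1, j) + S(i-1, j-1)$ is filled in iteratively rather than recursively, so large $n$ does not hit Python's recursion limit.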

1 Answer


This is the coupon collector's problem. The probability distribution you're looking for is given at Probability distribution in the coupon collector's problem.

Your count is wrong because you're undercounting by not taking into account that different sets of $150$ posts can contain all questions (there are $\binom{150+n}{150}$ such sets).
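A minimal sketch of the coupon collector probability, assuming each post is uniform over the 150 questions. Inclusion-exclusion over the set of questions that never appear gives

$$p = \frac{1}{150^{150+n}} \sum_{k=0}^{150} (-1)^k \binom{150}{k} (150-k)^{150+n}.$$

For $n = 0$ the sum equals $150!$, which reproduces the value $\frac{150!}{150^{150}}$ from the question; for $n > 0$ it grows with $n$, as expected. The function name `prob_all_seen` is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_all_seen(m: int, n: int) -> Fraction:
    """P(every one of m equally likely questions appears at least once in
    m + n independent posts), by inclusion-exclusion over missed questions."""
    length = m + n
    # k counts the questions forced to be missing: C(m, k) ways to pick
    # them, and (m - k)**length sequences that avoid all of them.
    covering = sum((-1) ** k * comb(m, k) * (m - k) ** length
                   for k in range(m + 1))
    return Fraction(covering, m ** length)

# Probability that all 150 questions have been seen after 150 + n posts.
for n in (0, 300, 600, 900):
    print(n, float(prob_all_seen(150, n)))
```

Exact `Fraction` arithmetic avoids the cancellation errors that the alternating sum would suffer in floating point.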

joriki