21

I heard an interesting question recently:

What is the minimum number of people required to make it more likely than not that all 365 possible birthdays are covered?

Monte Carlo simulation suggests 2287 ($\pm 1$, I think). More generally, with $p$ people, what is the probability that for each of the 365 days of the year, there is at least one person in the group with that birthday? (Yes, ignoring the leap-day.)

ManRow
  • 447
Isaac
  • 36,557

3 Answers3

27

As Ross also states, this can be framed as a coupon collector problem where birthdays correspond to coupons and individuals correspond to trials. Then, good bounds and a strong asymptotic result are both known for the time (number of trials) needed to collect at least one coupon of each unique type.

Let $T$ be the number of trials at which all coupons have been collected. Let there be $n$ coupons (so $n = 365$ in our case). For each $n$, and any $c > 0$,

$$ \mathbb{P}(T > n \log n + c n) < e^{-c} $$

and, also,

$$ \mathbb{P}(T < n \log n - c n) < e^{-c} \> . $$

From the first inequality we get that no more than 2407 trials are needed so that there is greater than 50% chance all coupons are collected, and from the second we know that at least 1900 are needed.

For more details, see the recent question and answer here.


The following classical asymptotic result holds for $c \in \mathbb{R}$,

$$ \mathbb{P}(T > n \log n + c n ) \to 1 - e^{-e^{-c}} . $$

See, e.g., Motwani and Raghavan, Randomized Algorithms, pp. 60--63 for a proof.

Note that if you plug in $c = -\log \log 2$ here, one gets 2287.240 which matches very closely to Byron's exact answer and the Monte Carlo estimate reported in the question.

cardinal
  • 7,420
  • 5
    I would be interested to know about the reason for the downvote. I don't post answers very often, preferring to comment where possible, instead. Hence, I try to take great care when answering---even continually revisiting what I've written in an effort to improve. Suggestions for improvement are welcome. – cardinal Mar 14 '11 at 12:19
  • 1
    I dunno who or why about the downvote, but I've upvoted your answer to acknowledge its merit. – Mike Jones Aug 25 '11 at 20:02
27

For the coupon collector's problem with $n$ objects, let $T$ be the number of trials needed to get a complete set. Then we have the formula $$P(T\leq k)=n^{-k}\ n!\ \left\lbrace {k\atop n}\right\rbrace.$$ Here the braces indicate Stirling numbers of the second kind.

With $n=365$, Maple gives me $P(T\leq 2286)=.4994$ while $P(T\leq 2287)=.5003$, so that $2287$ is the smallest number to give a 50% chance to get all 365 birthdays.

  • 3
    Thanks, but I admit that approximations like those you provide are often more useful. –  Mar 13 '11 at 20:40
  • 3
    It's always good to see both. It can provide intuition in both directions. – cardinal Mar 13 '11 at 20:42
  • 1
    Curiously, I produced the same formula for a recent similar question on the probability that with $m$ balls and $n$ boxes each box had a ball but a different formula producing different results was much more popular. – Henry Mar 13 '11 at 21:34
  • 1
    @Henry That is a nice answer! –  Mar 13 '11 at 21:52
  • 6
    @Henry Just in case, I mention you should not worry about the relative popularities of answers on MSE and the like. To ponder on the ways votes accumulate on some answers and not on others is simply a waste of time, believe me. – Did Mar 14 '11 at 06:50
  • @Didier Piau: That is wise advice. It is just that on that problem it seems obvious to me that the most popular answer is clearly wrong (it suggests that with two balls in two boxes, the probability of one ball in each box is 1/3). But you are probably sensible in suggesting I should not worry about it. – Henry Mar 14 '11 at 07:26
  • @Henry Yes, obviously wrong answers that are the most popular ones... These things just happen. One should simply move on, every second wasted on this could be spent on doing or learning new exciting mathematics. Amen. :-) – Did Mar 14 '11 at 07:52
  • Curiously I produced the same formula in answer to a recent similar question here – Henry Mar 14 '11 at 23:10
  • @Henry See, you are not alone... http://meta.math.stackexchange.com/questions/1795 – Did Mar 16 '11 at 13:07
7

This is the Coupon collector problem The expected value to cover them all (not quite what you asked) is $365 \sum_{i=1}^{365}\frac{1}{i}\approx 365n \ln 365 + 365\gamma + \frac{1}{2}$

Ross Millikan
  • 374,822
  • +1 Ahh, right, of course it's the coupon collector problem. $365\sum_{i=1}^{365}\frac{1}{i}\approx 2364.65$, so I'm pretty sure the expected number of people is not the same as the number of people to have a better-than-50% chance, though. – Isaac Mar 13 '11 at 20:23
  • 2
    @Isaac: I would think the expected number is higher than the number to have 50% chance as the tail can go on a long way. The numerics quoted support this. – Ross Millikan Mar 14 '11 at 01:05