
I'm trying to solve the well known Coupon Collector's Problem by explicitly finding the probability distribution (so far all the methods I read involve using some sort of trick). However, I'm not having much luck getting anywhere as combinatorics is not something I'm particularly good at.

The Coupon Collector's Problem is stated as:

There are $m$ different kinds of coupons to be collected from boxes. Assuming each type of coupon is equally likely to be found in any given box, what is the expected number of boxes one has to buy to collect all types of coupons?

What I'm attempting:

Let $N$ be the random variable associated with the number of boxes one has to buy to find all coupons. Then $P(N=n)=\frac{|A_n|}{|\Omega _n|}$, where $A_n$ is the set of all outcomes such that all types of coupons are observed in $n$ buys, and $\Omega _n$ is the set of all the possible outcomes in $n$ buys. I think $|\Omega _n| = m^n$, but I'm not even sure about that anymore, as all my attempts so far led to garbage probabilities that either diverged or didn't sum up to 1.

joriki
  • 238,052
Spine Feast
  • 4,770
  • This is why the tricks. An expression for the probabilities can be found, but it is unattractive, and very difficult to work with for a calculation of the mean. – André Nicolas May 02 '13 at 18:51
  • Still, I'm interested in finding it, even if it's a pointless exercise in combinatorics. Does it involve the inclusion-exclusion principle perhaps? – Spine Feast May 02 '13 at 19:16
  • Yes, Inclusion/Exclusion can be used. – André Nicolas May 02 '13 at 19:21
  • See also (unless you consider GF a trick) http://en.wikipedia.org/wiki/Coupon_collector%27s_problem_(generating_function_approach) – leonbloy May 02 '13 at 19:25
  • I haven't read about generating functions in depth yet (I'm learning probability independently), but from what I understand, they are defined in terms of a probability distribution, $G(z) = \sum_{n} P(N=n) z^n$. I've found this wiki page earlier, but I don't understand where that stuff came from. – Spine Feast May 02 '13 at 19:31
  • After wrestling with this problem for quite some time again, I'm no closer to finding an answer. How would I go about using the inclusion-exclusion formula here? And is the $\Omega_n$ calculated correctly in my first post? – Spine Feast May 02 '13 at 21:25
  • Here's what I've got so far. Fix $n$. Let $A$ denote the desired event and $\Omega$ be the space of all outcomes. Then $|\Omega| = m^n$. Let $A_i$ be the event that the $i$-th coupon type appears at least once; then $A_i '$ is the event that the $i$-th coupon type doesn't appear at all. Then $A = \bigcap_{i=1}^{m}A_i = \Omega \setminus \bigcup_{i=1}^{m} A_i '$. By the inclusion-exclusion principle, $\left| \bigcup_{i=1}^{m} A_i ' \right| = \sum_{k=1}^{m} (-1)^{k+1} \left( \sum_{1 \le i_1 < ... < i_k \le m} |A_{i_1}' \cap ... \cap A_{i_k}' | \right)$. The inner sum, for fixed $k$ – Spine Feast May 03 '13 at 15:22
  • (cont'd) consists of ${m \choose k}$ equal terms, they are equal to $|A_{i_1} ' \cap... \cap A_{i_k} '| = (m-k)^n$. Therefore, $\left| \bigcup_{i=1}^{m} A_i ' \right| = \sum_{k=1}^{m} (-1)^{k+1} \left( {m \choose k} (m-k)^n \right)$ and so $|A| = |\Omega| - \sum_{k=1}^{m} {m \choose k} (-1)^{k+1} (m-k)^n $, leading to $P(N=n)= \frac{|A|}{|\Omega|} = 1-\sum_{k=1}^{m} {m \choose k} (-1)^{k+1} \left( 1- \frac{k}{m}\right) ^n$. Is any of this even remotely close to the truth? – Spine Feast May 03 '13 at 15:24
  • This question has been answered later ("subsequent duplicates"?) for example at http://math.stackexchange.com/questions/693222/combinatorics-of-the-coupon-collectors-problem and http://math.stackexchange.com/questions/963077/cdf-of-probablity-distribution-with-replacement giving the answer $P(N=n)=\dfrac{m!}{m^n}S_2(n-1,m-1)$ where $S_2$ represents Stirling numbers of the second kind. – Henry Oct 08 '14 at 17:27
  • And at http://math.stackexchange.com/questions/669685. – joriki Jun 11 '16 at 03:10

1 Answer


As Henry pointed out in a comment, the probability has been determined elsewhere as

$$ \def\stir#1#2{\left\{#1\atop#2\right\}} P(N=n)=\frac{m!}{m^n}\stir{n-1}{m-1}\;, $$

where

$$\stir nk=\frac1{k!}\sum_{j=0}^k(-1)^{k-j}\binom kjj^n$$

is a Stirling number of the second kind and counts the number of partitions of a set of $n$ labeled objects into $k$ non-empty unlabeled subsets.
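As a quick sanity check of the two formulas above, here is a minimal Python sketch that evaluates them with exact rational arithmetic. The function names `stirling2` and `pmf` are chosen here for illustration and are not part of the original answer.

```python
from fractions import Fraction
from math import comb, factorial


def stirling2(n, k):
    """Stirling number of the second kind, via the explicit alternating sum."""
    # Python evaluates 0 ** 0 as 1, which is the convention the formula needs.
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
    return total // factorial(k)  # the sum is always an exact multiple of k!


def pmf(n, m):
    """P(N = n) = m!/m^n * S(n-1, m-1) for m equally likely coupon types."""
    if n < m:  # fewer than m boxes cannot contain all m types
        return Fraction(0)
    return Fraction(factorial(m) * stirling2(n - 1, m - 1), m ** n)


print(pmf(3, 3))  # 2/9, i.e. 3!/3^3: three boxes, all different
```

Summing `pmf(n, m)` over a long enough range of `n` should get arbitrarily close to 1, which is the property the question's earlier attempts were missing.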

To get the expected value, it's slightly more convenient to work with the probability

$$ P(N\gt n)=1-\frac{m!}{m^n}\stir nm\;, $$

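which can be derived in much the same manner: There are $m^n$ sequences of length $n$; to build one that contains every coupon type, choose one of $\stir nm$ partitions of the $n$ positions into $m$ non-empty subsets and one of $m!$ assignments of the coupon types to the subsets. Hence $P(N\le n)=\frac{m!}{m^n}\stir nm$, and $P(N\gt n)$ is its complement.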
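The counting argument can be checked by brute force for small cases. The sketch below (helper names are illustrative, not from the original answer) compares the alternating-sum value of $m!\stir nm$ with a direct enumeration of all $m^n$ sequences.

```python
from itertools import product
from math import comb


def surjection_count(n, m):
    """m! * S(n, m): sequences of length n that contain all m coupon types."""
    return sum((-1) ** (m - j) * comb(m, j) * j ** n for j in range(m + 1))


def brute_force_count(n, m):
    """Count the same sequences by enumerating all m**n possibilities."""
    return sum(1 for seq in product(range(m), repeat=n) if len(set(seq)) == m)


# Compare both counts on a small grid; enumeration is only feasible for small n, m.
for m in range(1, 5):
    for n in range(0, 7):
        assert surjection_count(n, m) == brute_force_count(n, m)
```

Dividing either count by `m ** n` gives $P(N\le n)$, so $P(N\gt n)$ is one minus that ratio.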

Then

\begin{align}
E[N]={}&\sum_{n=0}^\infty P(N\gt n)\\
={}&\sum_{n=0}^\infty\left(1-\frac{m!}{m^n}\stir nm\right)\\
={}&\sum_{n=0}^\infty\left(1-\frac{m!}{m^n}\frac1{m!}\sum_{j=0}^m(-1)^{m-j}\binom mjj^n\right)\\
={}&\sum_{n=0}^\infty\frac1{m^n}\sum_{j=0}^{m-1}(-1)^{m-j+1}\binom mjj^n\\
={}&\sum_{j=0}^{m-1}(-1)^{m-j+1}\binom mj\sum_{n=0}^\infty\frac{j^n}{m^n}\\
={}&\sum_{j=1}^m(-1)^{j+1}\binom mj\sum_{n=0}^\infty\frac{(m-j)^n}{m^n}\\
={}&\sum_{j=1}^m(-1)^{j+1}\binom mj\frac mj\\
={}&-m\sum_{j=1}^m\int_0^{-1}\mathrm dq'\binom mjq'^{j-1}\\
={}&-m\int_0^{-1}\mathrm dq'\sum_{j=1}^m\binom mjq'^{j-1}\\
={}&-m\int_0^{-1}\mathrm dq'\frac{(1+q')^m-1}{q'}\\
={}&-m\int_0^{-1}\mathrm dq'\frac{(1+q')^m-1}{(1+q')-1}\\
={}&-m\int_0^{-1}\mathrm dq'\sum_{j=0}^{m-1}(1+q')^j\\
={}&-m\sum_{j=0}^{m-1}\int_0^{-1}\mathrm dq'(1+q')^j\\
={}&m\sum_{j=1}^m\frac1j\;.
\end{align}
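The end result can be cross-checked numerically. The sketch below (function names are illustrative) compares the alternating sum from the seventh line of the derivation with the harmonic-number form in the last line, using exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from math import comb


def expectation_alternating(m):
    """Seventh line of the derivation: sum_{j=1}^m (-1)^(j+1) * C(m, j) * m / j."""
    return sum(Fraction((-1) ** (j + 1) * comb(m, j) * m, j) for j in range(1, m + 1))


def expectation_harmonic(m):
    """Last line of the derivation: m * (1 + 1/2 + ... + 1/m)."""
    return m * sum(Fraction(1, j) for j in range(1, m + 1))


# The two expressions should agree exactly for every m.
for m in range(1, 21):
    assert expectation_alternating(m) == expectation_harmonic(m)

print(expectation_harmonic(3))  # 11/2
```

This only confirms that the two closed forms agree for the tested values of $m$; it does not replace the derivation.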

I'll leave it to you to decide whether this counts as "using some sort of trick". :-)

joriki
  • 238,052
  • Thanks, this answer was worth the 2,5 year wait :) – Spine Feast Sep 28 '15 at 18:20
  • This is proving useful to me now, several years later! I am not familiar with the technique used when introducing the integral here. I'm especially clueless as to how you removed the summation on line 10; any suggestions on what to Google or look for when trying to understand this step? – Zack Ashman May 16 '18 at 13:31
  • @ZackAshman: Glad to read it's proving useful :-) Sorry, I haven't been on the site for a very long time and only now saw your question. The step where the integral is introduced uses the elementary definite integral $\int_0^{-1}\mathrm dq'q'^{j-1}=\left[\frac{q'^j}j\right]_0^{-1}=\frac{(-1)^j}j$. The summation is removed in line $10$ using the binomial expansion of $\left(1+q'\right)^m$ after multiplying by $q'$ and adding $1$ for the $j=0$ term. – joriki May 30 '18 at 21:38
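
For readers following the integral step, it may help to see it worked out for the smallest non-trivial case, $m=2$. This is only an illustration of the steps described in the comment above, not part of the original answer:

$$ \sum_{j=1}^2\binom 2jq'^{j-1}=2+q'=\frac{(1+q')^2-1}{q'}\;,\qquad -2\int_0^{-1}\mathrm dq'\,(2+q')=-2\left[2q'+\frac{q'^2}2\right]_0^{-1}=-2\left(-2+\frac12\right)=3\;, $$

which agrees with $m\sum_{j=1}^m\frac1j=2\left(1+\frac12\right)=3$.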