2

I have been given a version of the coupon collector's problem for homework which includes a modification of the standard problem. Mainly, the modification is that there are n possible coupons, and I can get k (k<=n) coupons per box. Additionally, a coupon can be repeated in a box, so some of those k coupons can be the same coupon (for example: in a box with 10 coupons, you can get 6 different coupons while some of them can be found multiple times).

I need to find the probability distribution of this problem for the given Xn,k random variable which represents the number of boxes I need to buy to get all n coupons, and I can't use the traditional formula because of the modification. How could I implement this new rule?

Any help would be appreciated. Thank you in advance.

joriki
  • 238,052

2 Answers2

2

Henning's solution is correct; but since we know not only $P(X_{n,1}=x)$ for the standard problem, but also $P(X_{n,1}\le x)$ (see e.g. Probability distribution in the coupon collector's problem), we can also write this without a sum:

\begin{align} \def\stir#1#2{\left\{#1\atop#2\right\}} P(X_{n,k}\le a)&=P(X_{n,1}\le ak)\\ &=\frac{n!}{n^{ak}}\stir{ak}n\;, \end{align}

and thus

\begin{align} P(X_{n,k}=a)&=P(X_{n,k}\le a)-P(X_{n,k}\le a-1)\\ &=P(X_{n,1}\le ak)-P(X_{n,1}\le(a-1)k)\\ &=\frac{n!}{n^{ak}}\stir{ak}n-\frac{n!}{n^{(a-1)k}}\stir{(a-1)k}n\\ &=\frac{n!}{n^{ak}}\left(\stir{ak}n-n^k\stir{(a-1)k}n\right)\;. \end{align}

joriki
  • 238,052
  • Does the first identity hold for the case where there are no repeatead coupons in each group? If so, what's the logical argument behind it, or how do you prove such relation? – Guilherme takata Feb 28 '20 at 02:02
  • @Guilhermetakata: It does not; that was not the subject of this question. The case without repeated coupons is treated here. – joriki Feb 28 '20 at 07:20
0

If you know how to calculate the distribution for the standard problem $X_{n,1}$, then you can simply say $$ P(X_{n,k}=a) = \sum_{j=0}^{k-1} P(X_{n,1}=ak-j) $$

Things get more involved if you're asked for the expectation instead of the distribution -- because there's a shortcut to computing the expectation for $k=1$ that doesn't seem to generalize as nicely here.