
This is an instance of the "coupon collector's problem" in which not all the "coupons" have the same probability of appearance. Here the "coupons" are boxes of cereal, each having a prize in it. I have been struggling with it for some time.

Suppose there are 7 items (e.g., Snow White's 7 dwarfs), one per box of cereal. Except for Dopey, they are distributed evenly over the boxes of cereal, but the number of Dopeys is 1/2 the number of each of the others. The question is: what is the expected number of boxes of cereal that must be bought to get a complete set of dwarfs? As an example, if they were all evenly distributed, I get that you would need to buy 1+7/6+7/5+7/4+7/3+7/2+7/1 = 18.15 boxes.
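
The equal-probability figure quoted above can be checked with a few lines of Python. This is only a sketch of that arithmetic; the function name `expected_boxes_equal` is made up for illustration and does not come from the question.

```python
from fractions import Fraction


def expected_boxes_equal(n: int) -> Fraction:
    """Expected number of boxes when all n prizes are equally likely.

    With k prizes still missing, a new one shows up with probability k/n,
    so that stage takes n/k boxes on average; the stages are summed.
    """
    return sum(Fraction(n, k) for k in range(1, n + 1))


value = expected_boxes_equal(7)
print(value, float(value))
```

Exact fractions are used so that rounding does not blur the comparison with the unequal-probability case.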

RobPratt
  • 45,619
  • It may help to work backwards. For example, suppose you already have 5 items, and you have two left. What's the expected number of boxes needed to get the remaining two? There's some casework to be done (is one of the remaining two Dopey or not?). Then work backward to situations where there are 3 items left, etc. – angryavian May 28 '21 at 01:07
  • The comment of @angryavian seems like better analysis than what I am about to offer. I am thinking intuitively, with no math to back me up. Therefore, this idea could easily be wrong. Anyway, it seems as if you have to collect (on average) 2 of each of the other coupons for each Dopey coupon. This suggests (perhaps wrongly) that the problem is equivalent to seeking $(15)$ coupons. Perhaps the most productive idea is to pretend [similar to angryavian's idea] that there are only $(2)$ coupons, one of which is Dopey's. Then, pretend that there are only 3 coupons, and so forth. – user2661923 May 28 '21 at 01:54
  • I think I see where angryavian is going. Must do all the cases. Say last piece to get is a Dopey, with a 1/14 probability, then 14/1 = 14 boxes have to be bought to get him. If a Dopey occurs at first, then the chance of getting the next new piece would be 1- 1/14, or 13/14, so that 14/13 boxes would need to be bought. Next one would be 13/14 - 1/7, or 11/14, or 14/11 boxes. And so forth. Then you look at the case where Dopey occurs on the 2nd draw. Etc. One of these cases will yield a largest number of boxes that has to be bought - that would be the answer we need? – Ken Bannister May 28 '21 at 12:32
  • And thank you user2661923 for your suggestion also. Excellent idea - work from the particular to the general. – Ken Bannister May 28 '21 at 12:35
  • Help! Still struggling with this. Can anyone help me solve this problem? – Ken Bannister May 29 '21 at 11:20
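
The "work backwards" suggestion in the comments can be sketched as a small recursion over states. This is one possible reading of that idea, not code from any commenter: the state is assumed to be (whether Dopey is already owned, how many of the six other dwarfs are owned), the probabilities 1/13 and 2/13 follow from the 1:2 ratio in the question, and all names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

P_DOPEY = Fraction(1, 13)   # Dopey is half as common as each other dwarf
P_OTHER = Fraction(2, 13)   # probability of each of the six other dwarfs
N_OTHERS = 6


@lru_cache(maxsize=None)
def expected_remaining(has_dopey: bool, others: int) -> Fraction:
    """Expected further boxes needed from the state (has_dopey, others)."""
    p_new_dopey = Fraction(0) if has_dopey else P_DOPEY
    p_new_other = (N_OTHERS - others) * P_OTHER
    p_new = p_new_dopey + p_new_other
    if p_new == 0:
        return Fraction(0)  # the set is already complete
    # Wait 1/p_new boxes on average for any new dwarf, then continue from
    # whichever state that new dwarf leads to.
    total = 1 / p_new
    if p_new_dopey:
        total += (p_new_dopey / p_new) * expected_remaining(True, others)
    if p_new_other:
        total += (p_new_other / p_new) * expected_remaining(has_dopey, others + 1)
    return total


print(float(expected_remaining(False, 0)))
```

Note that the recursion averages over the cases (weighted by their probabilities) rather than taking the largest one.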

1 Answer


Here is a solution based on exponential probability generating functions. Readers interested in learning about generating functions can find many resources here: How can I learn about generating functions?

Number the "coupon" types from $1$ to $7$, with Dopey as type number $1$, and associated probabilities $p_1, p_2, p_3, \dots , p_7$. Since Dopey appears half as often as each of the others, the relative frequencies are $1:2:2:2:2:2:2$, which sum to $13$, so $p_1 = 1/13$ and $p_i = 2/13$ for $2 \le i \le 7$. Let's say $T$ is the number of the draw on which we first have a complete set of coupons.

If $T \le n$ then we have a complete set of coupons on draw $n$ or earlier. The exponential generating function of $P(T \le n)$ is $$f(x) = \prod_{i=1}^7 (e^{p_i x} - 1) = (e^{p_1 x} -1) (e^{p_2 x} -1)^6$$ since $p_i = p_2$ for $i \ge 2$. Since $P(T > n) = 1 - P(T \le n)$ and $e^x$ is the EGF of the constant sequence $1$, the EGF of $P(T > n)$ is $e^x - f(x)$.

We are interested in $P(T > n)$ because by a well-known theorem, $$E(T) = \sum_{n \ge 0} P(T > n)$$ Since we know the EGF of $P(T > n)$, we can relate this infinite sum to an integral. Because $$\sum_{n=0}^{\infty} P(T>n) \frac{1}{n!} x^n = e^x - f(x)$$ and $$\int_0^{\infty} x^n e^{-x} \;dx = n!$$ multiplying the series by $e^{-x}$ and integrating term by term gives $$\begin{align} E(T) = \sum_{n=0}^{\infty} P(T>n) &= \int_0^{\infty} e^{-x}(e^x - f(x)) \;dx \\ &= \int_0^{\infty} e^{-x}(e^x - (e^{p_1 x} -1) (e^{p_2 x} -1)^6) \;dx \\ & \approx \boxed{20.3579} \end{align}$$ on substituting $p_1 = 1/13$ and $p_2 = 2/13$. I admit to having used a computer algebra system to evaluate the integral, but a paper and pencil computation should not be too difficult.
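
For readers without a computer algebra system, the integral can also be evaluated exactly by expanding the product and integrating each exponential term, which gives the inclusion-exclusion sum $E(T) = \sum_{S \ne \emptyset} (-1)^{|S|+1} / \sum_{i \in S} p_i$ over nonempty subsets $S$ of coupon types. The Python sketch below uses that expanded form rather than the integral itself; the function name `expected_draws` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def expected_draws(probs):
    """Inclusion-exclusion sum over all nonempty subsets of coupon types."""
    total = Fraction(0)
    for size in range(1, len(probs) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(probs, size):
            # Integrating exp(-p_S * x) over [0, inf) contributes 1 / p_S.
            total += sign / sum(subset)
    return total


# Dopey has probability 1/13; each of the other six dwarfs has 2/13.
probs = [Fraction(1, 13)] + [Fraction(2, 13)] * 6
result = expected_draws(probs)
print(result, float(result))
```

With only $2^7 - 1 = 127$ subsets, the brute-force enumeration is cheap, and the exact fraction it prints can be compared against the boxed decimal.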

awkward
  • 14,736
  • This is awesome, Awkward, and my profuse apologies to everyone for posting the same problem again - it was really getting to me. Is your result the final result or do further cases need to be considered? By that I mean, is there another case that might result in more boxes? Your analysis approach is totally new to me and I will have to study it carefully to gain an understanding - also understanding the symbols used. Did you use Mathematica? I have Matlab's symbolic math toolbox, would that be good enough? – Ken Bannister May 29 '21 at 13:44
  • @KenBannister That is the final result, there are no more cases. I used Mathematica. I think Wolfram Alpha would also work, but I haven't tested it. I don't know enough about Matlab to have an opinion. Numerical evaluation of the integral is another possibility, but as I said, paper and pencil should work too. – awkward May 29 '21 at 17:27
  • OK, thanks for the confirmation. KAB – Ken Bannister May 29 '21 at 17:52
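
The numerical evaluation mentioned in the comments could look something like the following sketch, which needs only the Python standard library. It integrates the simplified integrand $1 - (1 - e^{-p_1 x})(1 - e^{-p_2 x})^6$ (the same as $e^{-x}(e^x - f(x))$ because the probabilities sum to $1$) with composite Simpson's rule. The cutoff at $x = 400$ and the step count are assumptions chosen for illustration, not values from the answer.

```python
import math


def integrand(x, p_dopey=1 / 13, p_other=2 / 13, n_others=6):
    """The answer's integrand e^{-x}(e^x - f(x)), rewritten to avoid overflow."""
    return 1.0 - (1.0 - math.exp(-p_dopey * x)) * (1.0 - math.exp(-p_other * x)) ** n_others


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n subintervals (made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# The integrand decays roughly like exp(-x/13), so truncating at 400
# discards only a negligible tail (an assumption of this sketch).
estimate = simpson(integrand, 0.0, 400.0, 4000)
print(round(estimate, 4))
```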