
There are $n$ types of coupons. All types are equally likely to turn up and each "draw" of a coupon is independent of others. If someone collects coupons until they have a complete set of all the $n$ types, what is the expected value of the number of coupons that only appear once in this complete set?
In the book, they give this solution:
Let $X$ be the number of singletons. Let $T_i$ be the $i$th type of coupon collected and $A_i$ the event that there is only one $T_i$ coupon in the set. Then $$\mathbb{E} [X]=\sum_{i=1}^n \mathbb{P}(A_i)$$ This much I understand. What I do not understand is the following:
Now, at the moment when the first type $T_i$ coupon is collected, there remain $n − i$ types that need to be collected to have a complete set. Because, starting at this moment, each of these $n − i + 1$ types (the $n − i$ not yet collected and type $T_i$) is equally likely to be the last of these types to be collected, it follows that the type $T_i$ will be the last of these types (and so will be a singleton) with probability $\frac {1}{ n−i+1}$.
I do not understand this derivation of the probability. If $T_i$ just got collected, how is it that it can be collected last of the $n-i$ not yet collected types?

RobPratt
GuPe

3 Answers


They are looking at whether you draw another $T_i$ before completing the set. You can still collect $T_i$ again at any point until you have found all the other types you are looking for. After you get the first $T_i$, they say: make a list of the first occurrence, from this moment on, of $T_i$ and of each of the types you have not found yet. If $T_i$ is last in that list, you will have only one $T_i$ when you complete your set. If it is not last, you will have a duplicate $T_i$ when you complete your set.

As a specific example, say you have all the coupons except $a,b,c$. Now you draw your first $a$ (this is $T_i$ in the above). They say you should look at the order of the next draw of $a,b,c$. There are $3!$ possible orders, of which $2$ have $a$ after the others, so you now have $\frac 23$ chance of getting a second $a$ before you finish the set and a $\frac 13$ chance you get your complete set while you have only one $a$.
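As a sanity check (my own addition, not part of the original answer), here is a short Python simulation of exactly this scenario: every type except $a,b,c$ is already in hand and the first $a$ has just been drawn. The function name and the default $n$ are arbitrary.

```python
import random

def prob_no_second_a(n=10, trials=100_000):
    """Estimate the chance that the set is completed (b and c both
    found) before a second a is drawn, i.e. that a stays a singleton.
    Types 0, 1, 2 play the roles of a, b, c; draws of the other,
    already-collected types are irrelevant to the outcome."""
    hits = 0
    for _ in range(trials):
        need = {1, 2}                  # still waiting for b and c
        while True:
            c = random.randrange(n)
            if c == 0:                 # a second a: not a singleton
                break
            need.discard(c)
            if not need:               # set complete, a stayed single
                hits += 1
                break
    return hits / trials
```

The estimate comes out near $\frac 13$, matching the $3!$-orderings argument above.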

Ross Millikan

By way of enrichment, here is the expectation computed using Stirling numbers of the second kind. Using the notation from this MSE link, we have $n$ coupons and ask for the expected number of singletons once a complete set of $n$ different coupons has been drawn. We will be using OGFs and EGFs of Stirling numbers, switching between them as needed.

First let us verify that we indeed have a probability distribution here. Writing $T$ for the number of draws needed to complete the set, and classifying the draw sequences according to the number of singletons, we have

$$P[T=m] = \frac{1}{n^m} \times {n\choose n-1} \\ \times \sum_{q=0}^{n-1} {n-1\choose q} {m-1\choose q} q! {m-1-q\brace n-1-q}_{\ge 2} (n-1-q)!.$$

What is happening here is that we first choose the $n-1$ types of coupons that go into the prefix, where the one not selected goes into the suffix, completing the set of coupons. Next we choose $q$ coupons from the ones in the prefix which will be represented by singletons (factor ${n-1\choose q}$). Next we choose the positions from the available slots where the singletons will be placed (factor ${m-1\choose q} q!$). We split the leftover $m-1-q$ slots into sets of at least two elements, one for each of the $n-1-q$ types that have not been instantiated (factor ${m-1-q\brace n-1-q}_{\ge 2} (n-1-q)!$).

This probability simplifies to

$$P[T=m] = \frac{n \times (m-1)!}{n^m} \sum_{q=0}^{n-1} \frac{(n-1)!}{q!} \frac{1}{(m-1-q)!} {m-1-q\brace n-1-q}_{\ge 2} \\ = \frac{n \times (m-1)!}{n^m} \sum_{q=0}^{n-1} \frac{(n-1)!}{q!} [z^{m-1-q}] \frac{(\exp(z)-z-1)^{n-1-q}}{(n-1-q)!} \\ = \frac{n \times (m-1)!}{n^m} \sum_{q=0}^{n-1} {n-1\choose q} [z^{m-1-q}] (\exp(z)-z-1)^{n-1-q} \\ = \frac{n \times (m-1)!}{n^m} \sum_{q=0}^{n-1} {n-1\choose q} [z^{m-1}] z^q (\exp(z)-z-1)^{n-1-q} \\ = \frac{n \times (m-1)!}{n^m} [z^{m-1}] (\exp(z)-1)^{n-1} \\ = \frac{n! \times (m-1)!}{n^m} [z^{m-1}] \frac{(\exp(z)-1)^{n-1}}{(n-1)!}.$$
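Extracting the coefficient in the last line gives the closed form $P[T=m] = n!\,S(m-1,\,n-1)/n^m$, where $S(\cdot,\cdot)$ is the ordinary Stirling number of the second kind. The following Python sketch (my addition; the helper names are made up) checks this against a brute-force count over all $n^m$ draw sequences for small parameters.

```python
from itertools import product
from math import factorial

def stirling2(m, k):
    """Ordinary Stirling numbers of the second kind via the standard
    recurrence S(m,k) = k*S(m-1,k) + S(m-1,k-1)."""
    row = [1] + [0] * k
    for _ in range(m):
        row = [0] + [j * row[j] + row[j - 1] for j in range(1, k + 1)]
    return row[k]

def p_exact(n, m):
    """P[T = m] from the simplified formula: n! * S(m-1, n-1) / n^m."""
    return factorial(n) * stirling2(m - 1, n - 1) / n**m

def p_brute(n, m):
    """Direct count: fraction of the n^m draw sequences in which the
    set of n types is completed exactly at draw m (last draw is new,
    all n types present)."""
    hits = sum(1 for s in product(range(n), repeat=m)
               if len(set(s)) == n and s[-1] not in s[:-1])
    return hits / n**m
```

Both functions agree, and summing `p_exact` over $m$ converges to one, in line with the sanity check below.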

We then get for the sum of the probabilities (observe that the EGF has morphed into an OGF)

$$\sum_{m\ge 1} P[T=m] = \frac{n!}{n} \sum_{m\ge 1} \frac{1}{n^{m-1}} [z^{m-1}] \prod_{q=1}^{n-1} \frac{z}{1-qz} = \frac{n!}{n} \prod_{q=1}^{n-1} \frac{1/n}{1-q/n} \\ = \frac{n!}{n} \prod_{q=1}^{n-1} \frac{1}{n-q} = \frac{n!}{n} \frac{1}{(n-1)!} = 1.$$

The probabilities sum to one and the sanity check goes through.

Continuing with the expected number of singletons we get an extra factor $q$ which yields

$$\frac{n \times (m-1)!}{n^m} \sum_{q=1}^{n-1} q {n-1\choose q} [z^{m-1}] z^q (\exp(z)-z-1)^{n-1-q} \\ = \frac{n(n-1) \times (m-1)!}{n^m} [z^{m-1}] \sum_{q=1}^{n-1} {n-2\choose q-1} z^q (\exp(z)-z-1)^{n-1-q} \\ = \frac{n(n-1) \times (m-1)!}{n^m} \\ \times [z^{m-1}] z \sum_{q=1}^{n-1} {n-2\choose q-1} z^{q-1} (\exp(z)-z-1)^{n-2-(q-1)} \\ = \frac{n(n-1) \times (m-1)!}{n^m} [z^{m-2}] (\exp(z)-1)^{n-2} \\ = \frac{n! \times (m-1)!}{n^m} [z^{m-2}] \frac{(\exp(z)-1)^{n-2}}{(n-2)!}.$$

Now we have

$$\sum_{m\ge 2} w^{m-2} (m-1)! [z^{m-2}] \sum_{q\ge 0} A_q \frac{z^q}{q!} \\ = \sum_{m\ge 2} w^{m-2} (m-1) A_{m-2} = \left.\left(z \sum_{q\ge 0} A_q z^q\right)'\right|_{z=w}.$$

Applying this to the expectation yields

$$\frac{n!}{n^2} \sum_{m\ge 2} \frac{1}{n^{m-2}} [z^{m-2}] \left(\prod_{q=0}^{n-2} \frac{z}{1-qz}\right)' \\ = \frac{n!}{n^2} \sum_{m\ge 2} \frac{1}{n^{m-2}} [z^{m-2}] \prod_{q=0}^{n-2} \frac{z}{1-qz} \sum_{q=0}^{n-2} \frac{1-qz}{z} \frac{1}{(1-qz)^2} \\ = \frac{n!}{n^2} \sum_{m\ge 2} \frac{1}{n^{m-2}} [z^{m-2}] \prod_{q=0}^{n-2} \frac{z}{1-qz} \sum_{q=0}^{n-2} \frac{1/z}{1-qz} \\ = \frac{n!}{n^2} \prod_{q=0}^{n-2} \frac{1/n}{1-q/n} \sum_{q=0}^{n-2} \frac{n}{1-q/n} \\ = n! \prod_{q=0}^{n-2} \frac{1}{n-q} \sum_{q=0}^{n-2} \frac{1}{n-q}.$$

This simplifies to the end result: the product is $\prod_{q=0}^{n-2} \frac{1}{n-q} = \frac{1}{n!}$, so the expression evaluates to $\sum_{q=0}^{n-2} \frac{1}{n-q} = H_n - 1.$ Including an increment of one for the final coupon, which is always a singleton at the moment it completes the set, we obtain

$$\bbox[5px,border:2px solid #00A000]{\mathbb{E}[X] = H_n \sim \log n + \gamma.}$$
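The boxed value can also be confirmed by summing the series term by term (a sketch of my own, not from the derivation): extracting coefficients in the expectation identity above, the contribution of the event $T=m$ is $n!\,(m-1)\,S(m-2,\,n-2)/n^m$, and the truncated sum plus one should reproduce $H_n$.

```python
from fractions import Fraction
from math import factorial

def stirling2(m, k):
    # Stirling numbers of the second kind: S(m,k) = k*S(m-1,k) + S(m-1,k-1)
    row = [1] + [0] * k
    for _ in range(m):
        row = [0] + [j * row[j] + row[j - 1] for j in range(1, k + 1)]
    return row[k]

def expected_singletons(n, mmax=400):
    """E[X] via the derivation: sum over m of n!/n^m * (m-1) * S(m-2, n-2)
    gives H_n - 1; the final coupon, always a singleton when it completes
    the set, contributes the extra one.  Truncated at mmax draws, which
    leaves only a geometrically small tail."""
    total = Fraction(0)
    for m in range(n, mmax + 1):
        total += Fraction(factorial(n) * (m - 1) * stirling2(m - 2, n - 2),
                          n**m)
    return 1 + total

def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))
```

For $n=2$ the sum is exactly $\frac12$, giving $\mathbb{E}[X] = \frac32 = H_2$, and larger $n$ agree with $H_n$ to within the truncation error.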

There is also a beginner-level Perl script available which will confirm this formula to approximately four digits of precision.

#! /usr/bin/perl -w
#

MAIN: {
    my $n = shift || 5;          # number of coupon types
    my $trials = shift || 1000;  # number of simulated collections

    my $data = 0;

    for(my $tind = 0; $tind < $trials; $tind++){
        my $seen = 0; my @dist;

        @dist[1..$n] = (0) x $n;

        # draw until all $n types have been seen
        while($seen < $n){
            my $coupon = 1 + int(rand($n));

            $seen++ if $dist[$coupon] == 0;
            $dist[$coupon]++;
        }

        # count the types that appear exactly once
        my $single = 0;
        for(my $type = 1; $type <= $n; $type++){
            $single++ if $dist[$type] == 1;
        }

        $data += $single;
    }

    print $data/$trials;
    print "\n";

    1;
}

This post made extensive use of the technique of annihilated coefficient extractors (ACE). There are more of these at this MSE link I and at this MSE link II and also here at this MSE link III.

Addendum. We can simplify the above computation somewhat by using the species of ordered set partitions with singletons marked. This is

$$\mathfrak{S}(\mathcal{U}\mathcal{V}\mathcal{Z} + \mathcal{U}\mathfrak{P}_{\ge 2}(\mathcal{Z}))$$

and has generating function

$$G(z,u,v) = \frac{1}{1-u(\exp(z)-z+vz-1)}.$$

We are partitioning $m-1$ slots into $n-1$ sets and we extract

$$(m-1)! [z^{m-1}] [u^{n-1}] G(z,u,v) \\ = (m-1)! [z^{m-1}] (\exp(z)-z+vz-1)^{n-1}.$$

We thus obtain a generating function for the probability which encapsulates its value while also classifying according to the number of singletons, namely

$$P[T = m] = \frac{1}{n^m} \times {n\choose n-1} \times (m-1)! [z^{m-1}] (\exp(z)-z+vz-1)^{n-1} \\ = \frac{n!\times (m-1)!}{n^m} [z^{m-1}] \frac{(\exp(z)-z+vz-1)^{n-1}}{(n-1)!}.$$

I.e. $n^m P[T=m]$ is the OGF of sequences of draws where the last coupon is obtained at draw number $m,$ classified according to the number of singletons represented by the exponent on $v.$ E.g. for four coupons and six draws we obtain

$$360v+240v^2.$$

Divide by four to account for the choice of the last coupon, leaving five draws and three coupons and the term $90v+60v^2$ (the last coupon was a singleton but we did not count it until the very end, see above). For one singleton choose it in three ways and combine with pairs of the remaining two types to get $3\times {5\choose 2,2,1} = 90.$ For two singletons choose the two types in ${3\choose 2}$ ways, the remaining type gets three slots and we have $3\times {5\choose 3,1,1} = 60.$
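The term $360v+240v^2$ can be reproduced by brute force (a check I added; the function name is made up): enumerate all $4^6$ draw sequences, keep those that complete the set exactly at draw six, and classify by the number of singletons among the types other than the final, set-completing one, which is the statistic marked by $v$.

```python
from itertools import product
from collections import Counter

def singleton_tally(n=4, m=6):
    """Classify the n^m draw sequences that complete the set exactly
    at draw m by their number of marked singletons, i.e. singleton
    types other than the final, set-completing coupon."""
    tally = Counter()
    for seq in product(range(n), repeat=m):
        # set completes at draw m: all n types occur, last draw is new
        if len(set(seq)) == n and seq[-1] not in seq[:-1]:
            counts = Counter(seq)
            singles = sum(1 for t, c in counts.items()
                          if c == 1 and t != seq[-1])
            tally[singles] += 1
    return dict(tally)
```

The tally comes out as 360 sequences with one marked singleton and 240 with two, matching the coefficients above.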

Returning to the main thread and setting $v=1$ we remove the marking and obtain the probability

$$\frac{n!\times (m-1)!}{n^m} [z^{m-1}] \frac{(\exp(z)-1)^{n-1}}{(n-1)!}$$

and may continue as before.

We compute the expectation by differentiating with respect to $v$ and setting $v=1$, and obtain

$$\left.\frac{n!\times (m-1)!}{n^m} [z^{m-1}] (n-1) \frac{(\exp(z)-z+vz-1)^{n-2}}{(n-1)!} \times z\right|_{v=1} \\ = \frac{n!\times (m-1)!}{n^m} [z^{m-2}]\frac{(\exp(z)-1)^{n-2}}{(n-2)!}.$$

We then continue once more as in the first version.

Marko Riedel
  • It has taken me a while to read through all of this and it is really interesting indeed, I think I will need some more time to understand it all – GuPe Dec 13 '16 at 06:28

Let $T_i$ be the $i$-th type collected during the process of collecting all $n$ types. For example, $T_1$ is whichever type you get on the first selection. Let $A_i$ be the event that $T_i$ is collected precisely once in the process of collecting all types. The claim is that $$P(A_i)=\frac{1}{n-i+1},$$ or equivalently $$P(A_{n-i+1})=\frac{1}{i}.$$ It should not be surprising that $P(A_n)=1$, as the process stops immediately upon collecting the $n$th type. Let us see why $P(A_{n-1})=1/2$.

At the moment when type $T_{n-1}$ is collected, we need to make sure that type $T_{n-1}$ is never picked up again while we collect type $T_n$. We can achieve this by selecting type $T_n$ on the very next selection, which has probability $1/n$. Or we encounter one of the types $T_{1},\ldots, T_{n-2}$ on the next selection, and then type $T_n$; this has probability $\frac{n-2}{n}\cdot\frac{1}{n}$. Since we can encounter the types $T_{1},\ldots, T_{n-2}$ an arbitrary number of times, the probability that we collect type $T_n$ without touching type $T_{n-1}$ is $$\sum_{k=0}^{\infty}\left(\frac{n-2}{n}\right)^k\cdot\frac{1}{n}=\frac{1}{2}.$$
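A quick simulation (my own sketch, not part of the answer; names and the default $n$ are arbitrary) agrees with $P(A_{n-1})=1/2$:

```python
import random

def prob_second_to_last_singleton(n=6, trials=200_000):
    """Estimate P(A_{n-1}): the second-to-last new type collected
    appears exactly once in the completed set."""
    hits = 0
    for _ in range(trials):
        counts = [0] * n
        order = []               # types in order of first appearance
        while len(order) < n:
            c = random.randrange(n)
            if counts[c] == 0:
                order.append(c)
            counts[c] += 1
        if counts[order[-2]] == 1:   # order[-2] is T_{n-1}
            hits += 1
    return hits / trials
```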

One should be able to calculate the other $P(A_i)$ by this strategy. For example, let us move to the other extreme and show that $P(A_1)=\frac{1}{n}.$

For this, we need to make sure that no type $T_1$ is collected while we collect type $T_2$. Namely, the second selection must immediately produce a new type, which becomes $T_2$; this has probability $\frac{n-1}{n}$. Next, going from type $T_2$ to type $T_3$, we need to collect type $T_3$ without touching type $T_1$; namely, only type $T_2$ may be repeated. This amounts to probability $$\sum_{k=0}^{\infty}\left(\frac{1}{n}\right)^k\cdot\frac{n-2}{n}=\frac{n-2}{n-1}.$$

In general, going from $T_{i-1}$ to $T_i$ where $i\geq 3$, we need to select type $T_{i}$ while no type $T_1$ is selected. This has probability $$\sum_{k=0}^{\infty}\left(\frac{i-2}{n}\right)^k\cdot\frac{n-i+1}{n}=\frac{n-i+1}{n-i+2}.$$ Note $\frac{i-2}{n}$ is the probability of selecting one of the types $T_2,\ldots, T_{i-1}$, and $\frac{n-i+1}{n}$ is the probability of collecting a new type $T_{i}$.

Finally, to make sure type $T_1$ is never collected in the remaining process, we simply require that type $T_1$ is never collected as we go from type $T_{i-1}$ to type $T_i$, for $i=2,\ldots, n$. We calculated that the probability of each such event is $\frac{n-i+1}{n-i+2}$. Since each selection is independent of the previous ones, we can multiply these together to get $$\frac{n-1}{n}\times\frac{n-2}{n-1}\times\cdots\times \frac{1}{2}= \frac{1}{n}$$ as desired.
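The same kind of simulation (again my own sketch) confirms $P(A_1)=1/n$:

```python
import random

def prob_first_type_singleton(n=6, trials=200_000):
    """Estimate P(A_1): the very first type collected appears exactly
    once in the completed set."""
    hits = 0
    for _ in range(trials):
        counts = [0] * n
        first = random.randrange(n)   # the first draw fixes T_1
        counts[first] = 1
        seen = 1
        while seen < n:
            c = random.randrange(n)
            if counts[c] == 0:
                seen += 1
            counts[c] += 1
        if counts[first] == 1:
            hits += 1
    return hits / trials
```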