Expected number of distinct items picked from a set, putting back an item in the box every time it is picked

Question

I have a box with $N$ items. I pick an item $W$ times; each time I label it somehow and put it back. Items that are picked twice or more are labeled only once.

At the end of the process, what is the expected number of items that I have labeled? This seems like quite a simple problem; I would look for it on Google, but I don't know exactly what to search for.

score 1 · Answer 1 · answered Nov 15 '16 at 19:29

Hints:

Find the probability a particular item is not picked the first time
Find the probability a particular item is not picked any time
Find the probability a particular item is picked at least once
Find the expected number of items picked at least once
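
Carried through in order, the four hints can be sketched in a few lines. This is only a minimal sketch: it assumes every pick is uniform over the $N$ items and independent of the other picks, and the function name `expected_distinct` is hypothetical. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def expected_distinct(n: int, w: int) -> Fraction:
    """Expected number of distinct items seen in w uniform picks
    (with replacement) from n items."""
    # Hint 1: probability a particular item is NOT picked on one pick.
    p_miss_once = Fraction(n - 1, n)
    # Hint 2: probability it is not picked on any of the w picks
    # (assumes the picks are independent of each other).
    p_miss_all = p_miss_once ** w
    # Hint 3: probability it is picked at least once.
    p_hit = 1 - p_miss_all
    # Hint 4: add this probability up over the n items.
    return n * p_hit


print(expected_distinct(10, 2))  # 19/10
```

For 10 items and 2 picks this gives $10\bigl(1-(9/10)^2\bigr) = 19/10$.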

Henry

But this assumes that every element has an independent chance of being picked at least once, which is not completely true. For example, if I have 10 elements and pick two, the probability of element 3 being picked at least once, given that elements 1 and 2 have been picked, is zero. Am I correct? – Matteo Monti Nov 16 '16 at 14:02
To get from the third point to the fourth requires linearity of expectation, which does not require independence. So you can just add up the $N$ values, i.e. multiply by $N$. – Henry Nov 16 '16 at 14:58
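
The point about linearity can be checked by brute force on the example from the comment (10 elements, two picks). The sketch below lists every equally likely sequence of picks, averages the number of distinct items, and compares the result with $N$ times the probability that one fixed item is picked. The function names are hypothetical, and the enumeration is only feasible for small $N$ and $W$.

```python
from fractions import Fraction
from itertools import product


def exact_mean_distinct(n: int, w: int) -> Fraction:
    """Average number of distinct items over all n**w equally likely
    sequences of picks (brute-force enumeration)."""
    total = sum(len(set(seq)) for seq in product(range(n), repeat=w))
    return Fraction(total, n ** w)


def by_linearity(n: int, w: int) -> Fraction:
    """n times the probability that one fixed item is picked at least once."""
    return n * (1 - Fraction(n - 1, n) ** w)


print(exact_mean_distinct(10, 2), by_linearity(10, 2))  # 19/10 19/10
```

The two values agree even though the events "item $i$ is picked" are not independent of one another.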

score 1 · Accepted Answer · edited Apr 13 '17 at 12:20

First, if you're just looking around online, you should try searching for the coupon collector's problem. In its canonical form, the coupon collector's problem is to calculate the expected number of picks until all the coupons (or items in your box) are labelled. Your problem has the tables turned: you want to calculate the expected number of coupons (items) labelled in a specified number of picks.

Second, I think it's worth pointing out the bull-headed approach to this problem and why it doesn't work! In the bull-headed approach, the expectation of a discrete random variable $X$ is computed directly as $\sum_m m\cdot\text{prob}(X=m)$. In this problem we need the probability that in $w$ rounds exactly $m$ items get labelled, a computation that requires some work. First, you can choose $m$ items among the entire collection of $n$, and there are $\binom n m$ ways to make this choice. Then you need to count the number of onto functions from the rounds $\overline{w}=\{1,\dots,w\}$ to the selected items $\overline{m}=\{1,\dots,m\}$; these are exactly the sequences of picks that hit all $m$ selected items and nothing else. Using a well-known argument from the inclusion-exclusion principle, it turns out that there are $$\sum_{j=0}^m (-1)^j\binom m j(m-j)^w$$ onto functions from $\overline{w}$ to $\overline{m}$. Finally, you have to compare these with the total number of possible sequences of picks, i.e. functions from $\overline{w}$ to the whole collection $\overline{n}=\{1,\dots,n\}$, which is just $n^w$. Putting all this together, the expected number of items labelled in $w$ rounds is $$\sum_{m=1}^n m\binom n m \sum_{j=0}^m (-1)^j\binom m j(m-j)^w\big/n^w.$$

The problem with this method is that I certainly don't know how to get a closed-form expression for the double sum representing the expectation. (For that matter, neither does Maple.) This impasse highlights the elegance of the solution that Henry suggested. Also, it makes me interested in seeing a cleaned-up version of the "bull-headed" method that produces the same expected number of items that Henry got in his elegant solution, namely $n\bigl(1-(1-\frac1n)^w\bigr)$.
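
Short of a symbolic simplification, the agreement between the two approaches can at least be checked numerically. The sketch below evaluates the direct sum $\sum_m m\cdot\text{prob}(X=m)$ with exact fractions, taking $\text{prob}(X=m)$ to be $\binom n m$ times the number of onto functions divided by $n^w$, the number of equally likely sequences of $w$ picks from all $n$ items. It then compares the result with $n\bigl(1-(1-\frac1n)^w\bigr)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def onto_count(w: int, m: int) -> int:
    """Number of onto functions from a w-element set to an m-element set,
    by inclusion-exclusion."""
    return sum((-1) ** j * comb(m, j) * (m - j) ** w for j in range(m + 1))


def expectation_direct(n: int, w: int) -> Fraction:
    """Sum over m of m * P(X = m), where
    P(X = m) = C(n, m) * onto_count(w, m) / n**w."""
    return sum(
        Fraction(m * comb(n, m) * onto_count(w, m), n ** w)
        for m in range(1, n + 1)
    )


def expectation_closed(n: int, w: int) -> Fraction:
    """Closed form n * (1 - (1 - 1/n)**w)."""
    return n * (1 - Fraction(n - 1, n) ** w)


# Compare the two expressions on a small grid of (n, w) values.
for n in range(1, 8):
    for w in range(1, 8):
        assert expectation_direct(n, w) == expectation_closed(n, w)

print(expectation_direct(10, 2))  # 19/10
```

This is a spot check on small cases, not a proof that the double sum simplifies in general.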
