18

I have a set of $n$ integers $\{1, \ldots, n\}$, and I select three values with replacement. How can I find the expected number of distinct values?

Note each value is chosen uniformly and independently.

Mike Spivey
  • 55,550
Alex
  • 181

3 Answers

14

We first answer the exact question asked, where one draws three items, then we present a more powerful approach which solves the more general case where one draws any number of items.


The probability that the first item was not chosen before is $1$. The probability that the second item was not chosen before is $\frac{n-1}n$. The probability that the third item was not chosen before is $\frac{n-2}n$ if the first two items are different and $\frac{n-1}n$ if the first two items coincide. The first two items are different with probability $\frac{n-1}n$ and they coincide with probability $\frac{1}n$. Hence the expected number of distinct items is $$ 1+\frac{n-1}n+\frac{n-2}n\times\frac{n-1}n+\frac{n-1}n\times\frac1n=\frac{3n^2-3n+1}{n^2} $$
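As a sanity check on the three-draw formula, here is a minimal sketch that enumerates all $n^3$ equally likely ordered triples and averages the number of distinct values. The function names are illustrative and not part of the original answer.

```python
from fractions import Fraction
from itertools import product


def expected_distinct_three(n):
    """Exact mean number of distinct values over all n**3 ordered triples."""
    total = sum(len(set(triple)) for triple in product(range(1, n + 1), repeat=3))
    return Fraction(total, n ** 3)


def closed_form_three(n):
    """The closed form (3n^2 - 3n + 1) / n^2 derived above."""
    return Fraction(3 * n * n - 3 * n + 1, n * n)


# Compare the brute-force average with the closed form for small n.
for n in range(1, 8):
    assert expected_distinct_three(n) == closed_form_three(n)
```

Exact rational arithmetic is used so the comparison is an equality rather than a floating-point approximation.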


More generally, consider the number $N_k$ of different items chosen after $k$ picks. Then $N_0=0$ almost surely and, given $N_k$, the probability of picking a new item at time $k+1$ is $(n-N_k)/n$, hence $$\mathrm E(N_{k+1}\mid N_k)=N_k+\frac{n-N_k}n\qquad\text{almost surely}$$ Taking expectations shows that the expected number of different items after $k$ picks, $\mathrm E(N_k)$, satisfies $\mathrm E(N_0)=0$ and $$\mathrm E(N_{k+1})=\mathrm E(N_k)+\frac{n-\mathrm E(N_k)}n=1+a_n\mathrm E(N_k)$$ for every $k\geqslant0$, with $$a_n=1-\frac1n$$ Thus, for every $k\geqslant0$, $$\mathrm E(N_k)=\frac{1-a_n^k}{1-a_n}$$ or, equivalently, $$ \mathrm E(N_k)=n\,\frac{n^k-(n-1)^k}{n^{k}}=\sum\limits_{i=0}^{k-1}(-1)^{i}{k\choose i+1}\frac1{n^i} $$ For example, the fifth row of Pascal's triangle reads $$1\quad5\quad10\quad10\quad5\quad1$$ hence $$\mathrm E(N_5)=\frac{5n^4-10n^3+10n^2-5n+1}{n^4}$$
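The recursion above is easy to iterate directly. The following is a small sketch (function names are my own) that runs the recurrence $\mathrm E(N_{k+1})=\mathrm E(N_k)+\frac{n-\mathrm E(N_k)}n$ from $\mathrm E(N_0)=0$ in exact arithmetic and compares it with the closed form $n\left(1-\left(1-\frac1n\right)^k\right)$.

```python
from fractions import Fraction


def expected_distinct_recursive(n, k):
    """Iterate E(N_{j+1}) = E(N_j) + (n - E(N_j)) / n starting from E(N_0) = 0."""
    e = Fraction(0)
    for _ in range(k):
        e = e + (n - e) / n
    return e


def expected_distinct_closed(n, k):
    """Closed form n * (1 - ((n - 1) / n) ** k)."""
    return n * (1 - Fraction(n - 1, n) ** k)


# The two agree for a range of small n and k.
for n in range(1, 7):
    for k in range(0, 9):
        assert expected_distinct_recursive(n, k) == expected_distinct_closed(n, k)
```

With $k=3$ this reproduces the three-draw value $\frac{3n^2-3n+1}{n^2}$.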

Did
  • 279,727
  • This was very helpful. I just have one question,

    "if the two first items are different, which happens with probability n−1/n, and n−1/n.." is there a typo there? Should it be n-1/n and n-2/n?

    – Alex Oct 13 '11 at 08:08
  • There was no typo. I rephrased. – Did Oct 13 '11 at 08:56
  • 1
    I like the use of recursion. It makes for a nice derivation. (+1) – robjohn Oct 13 '11 at 17:31
  • 1
    @robjohn, yep, a nice recursion based on conditional expectations is simply the thing to make a probabilist happy for hours... – Did Oct 13 '11 at 22:20
12

@Did's method uses recursion very nicely to arrive at the expected number. I arrived at the same answer using more mundane computations.

Let's generalize by picking $p$ numbers from $1\dots n$ with replacement. Let us compute the probability of choosing $d$ distinct numbers. Choose one of the $\binom{n}{d}$ sets of $d$ distinct numbers. The probability of selecting all $p$ picks from those $d$ distinct numbers is $\left(\frac{d}{n}\right)^p$. However, this also counts cases where some of the $d$ numbers were not chosen. Inclusion-exclusion says that the probability of picking all of those $d$ is $$ \sum_k(-1)^{k}\binom{d}{d-k}\left(\frac{d-k}{n}\right)^p=\sum_k(-1)^{d-k}\binom{d}{k}\left(\frac{k}{n}\right)^p\tag{1} $$ Thus, the probability of picking exactly $d$ distinct items is $\binom{n}{d}$ times $(1)$. The expected value is therefore $$ \begin{align} &\sum_d\sum_k(-1)^{d-k}d\binom{n}{d}\binom{d}{k}\left(\frac{k}{n}\right)^p\\ &=\sum_d\sum_k(-1)^{d-k}d\binom{n}{k}\binom{n-k}{n-d}\left(\frac{k}{n}\right)^p\\ &=n-\sum_d\sum_k(-1)^{d-k}(n-d)\binom{n}{k}\binom{n-k}{n-d}\left(\frac{k}{n}\right)^p\\ &=n-\sum_d\sum_k(-1)^{d-k}(n-k)\binom{n}{k}\binom{n-k-1}{n-d-1}\left(\frac{k}{n}\right)^p\\ &=n-\sum_k(n-k)\binom{n}{k}\delta(n-k-1)\left(\frac{k}{n}\right)^p\\ &=n-n\left(\frac{n-1}{n}\right)^p\\ &=n\left(1-\left(\frac{n-1}{n}\right)^p\right)\tag{2} \end{align} $$
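To see formulas $(1)$ and $(2)$ in action, here is a hedged sketch that computes the probability of exactly $d$ distinct numbers as $\binom{n}{d}$ times $(1)$, then checks that the probabilities sum to $1$ and that the resulting mean matches $(2)$. The function names are illustrative, and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def prob_exactly_distinct(n, p, d):
    """P(exactly d distinct numbers in p picks): C(n, d) times formula (1)."""
    inner = sum(
        (-1) ** (d - k) * comb(d, k) * Fraction(k, n) ** p
        for k in range(d + 1)
    )
    return comb(n, d) * inner


def expected_distinct(n, p):
    """Mean of the distribution, summed directly over d = 0..n."""
    return sum(d * prob_exactly_distinct(n, p, d) for d in range(n + 1))


# Check normalization and agreement with formula (2) for small cases.
for n in range(1, 7):
    for p in range(1, 7):
        assert sum(prob_exactly_distinct(n, p, d) for d in range(n + 1)) == 1
        assert expected_distinct(n, p) == n * (1 - Fraction(n - 1, n) ** p)
```

Summing over all $d$ from $0$ to $n$ is safe because the inclusion-exclusion sum vanishes whenever $d>p$.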

robjohn
  • 345,667
  • 4
    I think that Dinesh's solution with indicator random variables is also quite elegant: http://math.stackexchange.com/questions/5775/how-many-bins-do-random-numbers-fill –  Oct 13 '11 at 17:33
  • @Byron: yes, indeed! The probability that all $p$ picks are not $1$ is $\left(\frac{n-1}{n}\right)^p$ so the expected value of the indicator that $1$ is picked is $1-\left(\frac{n-1}{n}\right)^p$ and so the total expected value is $n\left(1-\left(\frac{n-1}{n}\right)^p\right)$. Very neat! – robjohn Oct 13 '11 at 20:06
  • Can you elaborate more on how you are using the inclusion-exclusion principle? What are your events defined as? Is your summation in equation $(1)$ going from $k = 0$ to $k = d-1$? – Quantum Guy 123 Jun 24 '22 at 18:26
  • explaining using the notation from here: https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle#In_probability is convenient – Quantum Guy 123 Jun 24 '22 at 18:27
  • @QuantumGuy123: We want to compute the number of ways of choosing exactly $d$ items from $n$ after making $p$ picks. Counting $d^p$ will also include choices which are limited to those $d$, but may miss some, so we need to count the number of ways to miss some of the items. Define the sets $S_i$ for $1\le i\le d$ where each $S_i$ contains choices from the $d$, but leaves out item $i$. To compute the size of the intersections of $k$ of the $S_i$, there are $\binom{d}{d-k}$ ways to choose the $k$ sets and each such intersection contains $(d-k)^p$ choices. – robjohn Jun 24 '22 at 19:57
  • Inclusion-Exclusion says that the number of choices from the $d$ that miss something is $$\sum\limits_{k=1}^d(-1)^{k-1}\binom{d}{d-k}(d-k)^p$$ subtracting this from $d^p$ gives us the number of ways to choose exactly $d$ items: $$\sum\limits_{k=0}^d(-1)^k\binom{d}{d-k}(d-k)^p$$ – robjohn Jun 24 '22 at 19:57
  • Ah, I see. It looks like the answer here is doing the same thing: https://math.stackexchange.com/a/2754693/901223 – Quantum Guy 123 Jun 27 '22 at 17:50
  • I wrote a question similar to this one that you might find interesting: https://math.stackexchange.com/questions/4481749/probability-of-getting-k-different-colored-balls-from-an-urn-with-k-different-co – Quantum Guy 123 Jun 28 '22 at 18:14
2

I would like to give a simple argument for the same result.

Suppose we have a simple random sample (SRS) of size $p$ selected with replacement from a population of size $n$. To find the expected number of unique elements in the sample, we make use of the linearity of expectation.

Let $X$ be the number of unique elements in $p$ picks, and let $X_i$ be the indicator that element $i$ appears in the sample.

Then $X = \sum_{i=1}^n X_i$, where $X_i=1$ if $i$ is selected in the $p$ picks and $X_i=0$ otherwise.

Then $$E(X) = \sum_{i=1}^n E(X_i) = \sum_{i=1}^n P(i\text{ is selected in }p\text{ picks}) = \sum_{i=1}^n\bigl(1-P(i\text{ is not selected in }p\text{ picks})\bigr) = \sum_{i=1}^n\left(1-\left(\frac{n-1}{n}\right)^p\right)= n\left(1-\left(\frac{n-1}{n}\right)^p\right)$$

Note: Here, all the picks are independent (simple random sampling with replacement, SRSWR).
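For an empirical check of the indicator argument, here is a rough Monte Carlo sketch. The function names, the number of trials, and the fixed seed are my own choices for illustration; the simulated average should land close to $n\left(1-\left(\frac{n-1}{n}\right)^p\right)$ but will not match it exactly.

```python
import random


def simulate_expected_distinct(n, p, trials=100_000, seed=0):
    """Average number of distinct values seen in p uniform picks from 1..n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    for _ in range(trials):
        total += len({rng.randint(1, n) for _ in range(p)})
    return total / trials


def indicator_formula(n, p):
    """n * (1 - ((n - 1) / n) ** p), from linearity of expectation."""
    return n * (1 - ((n - 1) / n) ** p)


estimate = simulate_expected_distinct(10, 3)
exact = indicator_formula(10, 3)
print(f"simulated: {estimate:.3f}, formula: {exact:.3f}")
```

Each pick uses `rng.randint(1, n)`, which is uniform on $\{1,\ldots,n\}$ with both endpoints included, matching the independence assumption in the note above.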

Trajan
  • 5,194