I have a set of $n$ integers $\{1, . . . , n\}$, and I select three values with replacement. How can I find the expected number of distinct values?
Note each value is chosen uniformly and independently.
I have a set of $n$ integers $\{1, . . . , n\}$, and I select three values with replacement. How can I find the expected number of distinct values?
Note each value is chosen uniformly and independently.
We first answer the exact question asked, where one draws three items, then we present a more powerful approach which solves the more general case where one draws any number of items.
The probability that the first item was not chosen before is $1$. The probability that the second item was not chosen before is $\frac{n-1}n$. The probability that the third item was not chosen before is $\frac{n-2}n$ if the two first items are different and $\frac{n-1}n$ if the two first items coincide. The two first items are different with probability $\frac{n-1}n$ and they coincide with probability $\frac{1}n$. Hence the expected number of distinct items is $$ 1+\frac{n-1}n+\frac{n-2}n\times\frac{n-1}n+\frac{n-1}n\times\frac1n=\frac{3n^2-3n+1}{n^2} $$
More generally, consider the number $N_k$ of different items chosen after $k$ picks. Then $N_0=0$ almost surely and, knowing $N_k$ the probability to pick a new item at time $k+1$ is $(n-N_k)/n$ hence $$\mathrm E(N_{k+1}\mid N_k)=N_k+\frac{n-N_k}n\qquad\text{almost surely}$$ This shows that the expected number of different items after $k$ picks $\mathrm E(N_k)$ is such that $\mathrm E(N_0)=0$ and $$\mathrm E(N_{k+1})=\mathrm E(N_k)+\frac{n-\mathrm E(N_k)}n=1+a_n\mathrm E(N_k)$$ for every $k\geqslant0$, with $$a_n=1-\frac1n$$ Thus, for every $k\geqslant0$, $$\mathrm E(N_k)=\frac{1-a_n^k}{1-a_n}$$ or, equivalently, $$ \mathrm E(N_k)=n\,\frac{n^k-(n-1)^k}{n^{k}}=\sum\limits_{i=0}^{k-1}(-1)^{i}{k\choose i+1}\frac1{n^i} $$ For example, the fifth row of the Pascal triangle reads $$1\quad5\quad10\quad10\quad5\quad1$$ hence $$E(N_5)=\frac{5n^4-10n^3+10n^2-5n+1}{n^5}$$
"if the two first items are different, which happens with probability n−1/n, and n−1/n.." is their a typo there should it be n-1/n and n-2/n?
– Alex Oct 13 '11 at 08:08@Did's method uses recursion very nicely to arrive at the expected number. I arrived at the same answer using more mundane computations.
Let's generalize by picking $p$ numbers from $1\dots n$ with replacement. Let us compute the probability of choosing $d$ distinct numbers. Choose one of the $\binom{n}{d}$ sets of $d$ distinct numbers. The probability of selecting all $p$ picks from those $d$ distinct numbers is $\left(\frac{d}{n}\right)^p$. However, this also counts cases where some of the $d$ numbers were not chosen. Inclusion-exclusion says that the probability of picking all of those $d$ is $$ \sum_k(-1)^{k}\binom{d}{d-k}\left(\frac{d-k}{n}\right)^p=\sum_k(-1)^{d-k}\binom{d}{k}\left(\frac{k}{n}\right)^p\tag{1} $$ Thus, the probability of picking exactly $d$ distinct items is $\binom{n}{d}$ times $(1)$. The expected value is therefore $$ \begin{align} &\sum_d\sum_k(-1)^{d-k}d\binom{n}{d}\binom{d}{k}\left(\frac{k}{n}\right)^p\\ &=\sum_d\sum_k(-1)^{d-k}d\binom{n}{k}\binom{n-k}{n-d}\left(\frac{k}{n}\right)^p\\ &=n-\sum_d\sum_k(-1)^{d-k}(n-d)\binom{n}{k}\binom{n-k}{n-d}\left(\frac{k}{n}\right)^p\\ &=n-\sum_d\sum_k(-1)^{d-k}(n-k)\binom{n}{k}\binom{n-k-1}{n-d-1}\left(\frac{k}{n}\right)^p\\ &=n-\sum_k(n-k)\binom{n}{k}\delta(n-k-1)\left(\frac{k}{n}\right)^p\\ &=n-n\left(\frac{n-1}{n}\right)^p\\ &=n\left(1-\left(\frac{n-1}{n}\right)^p\right)\tag{2} \end{align} $$
I would like to give a simple reasoning for the same.
Suppose we have a SRS of size p selected from a population of size n with replacement. To find the expected no. of unique elements in the sample, we make use of the linearity of expectation.
Let $X$: no. of unique elements in p picks Let $X_i$: Indicator that $i$ is a unique element
Then, $X = \sum X_i$,
Where $X_i=1$, if $i$ is selected in $p$ picks and $X_i=0$, otherwise
Then,$E(X) = \sum E(X_i) = \sum P(\text{i is selected in p picks}) = \sum(1-P(\text{i is not selected in p picks})) = \sum(1-((n-1)/n)^p)= n(1−((n-1)/n)^p)$
Note: Here, all the picks are independent (SRSWR)