
I have a problem that looks quite simple, but I can't manage to formalize it correctly.

Imagine I have $N$ persons, and in each round $t=1,\dots,T$ I randomly sample $k$ different persons; a person sampled in one round can be sampled again in later rounds. I want to know how many different persons $n_{sampled, t}$ I have sampled after $t$ rounds and, in particular, for what $t$ I would typically have sampled all $N$ persons at least once.

At round $t=1$, I know for sure that I have sampled $k$ different persons. But in the next round, I may sample persons that I had already sampled. In the worst case, I would have sampled exactly the same persons, so $n_{sampled}=k$, and in the best case, I would have sampled completely different persons, so $n_{sampled}=2k$.

Thus, starting from $n_{sampled,1}=k$ after the first round, for each later round I need to add the average number of sampled persons who had not already been sampled.

The probability $P$ that a given person is chosen at round $t$ is:

$$ P=\frac{k}{N} $$

And thus the probability $p_t$ that a given person has not already been sampled at round $t$ would be:

$$p_t = (1 - P)^t = \left(1-\frac{k}{N}\right)^t$$

(Is this correct?)
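One way to check an expression like this numerically is a small Monte Carlo estimate. The sketch below is only an illustration under stated assumptions: each round draws $k$ different persons uniformly at random, rounds are independent, and the function name `never_sampled_frequency` is hypothetical. It estimates the frequency with which a fixed person is missed in all of the first $t$ rounds, which is the quantity $(1-\frac{k}{N})^t$ describes. Note that "not yet sampled when round $t$ starts" involves only the $t-1$ earlier rounds, so the two readings differ by one in the exponent.

```python
import random

def never_sampled_frequency(N, k, t, trials=20000, seed=0):
    """Estimate the probability that person 0 is missed in all of the first t rounds.

    Assumption: each round draws k different persons uniformly at random,
    independently of the other rounds.
    """
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        # Person 0 is "missed" if it appears in none of the t samples.
        if all(0 not in rng.sample(range(N), k) for _ in range(t)):
            missed += 1
    return missed / trials

if __name__ == "__main__":
    N, k, t = 10, 2, 3
    print("simulated:", never_sampled_frequency(N, k, t))
    print("(1 - k/N)**t:", (1 - k / N) ** t)
```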

So if we consider a random variable $X_t$ counting the number of persons sampled at round $t$ who had not already been sampled ($0 \leq X_t \leq k$), and assume it follows a binomial distribution with parameters $k$ and $p_t$, we would have: $$ \mathbb{E}(X_t) = k p_t$$

And thus,

$$n_{sampled, T} = k + \sum_{t=2}^{T} \mathbb{E}(X_t)$$

Would this be correct?
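A simulation is one way to test the sum above. The following is a minimal sketch, assuming each round draws $k$ different persons uniformly at random and independently of earlier rounds; all function names are hypothetical. It compares the simulated average number of distinct persons after $T$ rounds with the sum as written above, and with $N\left(1-(1-\frac{k}{N})^T\right)$, which follows from linearity of expectation because each person is missed in all $T$ rounds with probability $(1-\frac{k}{N})^T$.

```python
import random

def simulate_distinct(N, k, T, trials=2000, seed=0):
    """Average number of distinct persons seen after T rounds (Monte Carlo)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen = set()
        for _ in range(T):
            # Assumption: k different persons per round, uniform, independent rounds.
            seen.update(rng.sample(range(N), k))
        total += len(seen)
    return total / trials

def proposed_sum(N, k, T):
    """The sum from the question: k + sum_{t=2}^{T} k * (1 - k/N)**t."""
    q = 1 - k / N
    return k + sum(k * q ** t for t in range(2, T + 1))

def linearity_value(N, k, T):
    """N * (1 - (1 - k/N)**T), obtained by linearity of expectation."""
    return N * (1 - (1 - k / N) ** T)

if __name__ == "__main__":
    N, k, T = 20, 5, 4
    print("simulated      :", simulate_distinct(N, k, T))
    print("proposed sum   :", proposed_sum(N, k, T))
    print("linearity value:", linearity_value(N, k, T))
```

Comparing the three printed numbers for a few choices of $N$, $k$ and $T$ shows whether the exponent in $p_t$ should be $t$ or $t-1$.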

  • You can use the principle of inclusion-exclusion to find the probability that you have sampled everyone after $T$ rounds, and then use that to find the average number of rounds it takes to sample everyone (of course, "average" is not quite the same as "typical"). See this answer. – Mike Earnest Mar 17 '21 at 03:48
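The inclusion-exclusion idea from the comment can be sketched as follows. This is not necessarily the method of the linked answer: it assumes each round draws $k$ different persons uniformly at random, independently across rounds, and the names `miss_prob`, `prob_all_sampled` and `expected_rounds` are hypothetical. Under that assumption, a fixed set of $j$ persons is missed in one round with probability $\binom{N-j}{k}/\binom{N}{k}$, and inclusion-exclusion over the sets of missed persons gives both the probability that everyone has been sampled after $T$ rounds and the average number of rounds needed.

```python
from fractions import Fraction
from math import comb

def miss_prob(N, k, j):
    """Probability that one round of k different draws misses a fixed set of j persons."""
    return Fraction(comb(N - j, k), comb(N, k))

def prob_all_sampled(N, k, T):
    """P(all N persons sampled at least once within T rounds), by inclusion-exclusion."""
    return sum((-1) ** j * comb(N, j) * miss_prob(N, k, j) ** T
               for j in range(N + 1))

def expected_rounds(N, k):
    """Average number of rounds until everyone has been sampled.

    Uses E[tau] = sum over T >= 0 of P(tau > T), with each geometric
    series in T summed in closed form.
    """
    return sum((-1) ** (j + 1) * comb(N, j) / (1 - miss_prob(N, k, j))
               for j in range(1, N + 1))

if __name__ == "__main__":
    print(prob_all_sampled(20, 5, 10))
    print(float(expected_rounds(20, 5)))
```

For example, `expected_rounds(3, 1)` gives $11/2$, the classical coupon-collector value $3(1 + \frac{1}{2} + \frac{1}{3})$ for three items drawn one at a time.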
