A set contains $n$ numbers, exactly $k$ of which have a specific property. The task is to learn how many of the numbers have this property, i.e. the number $k$.
The learning is done by repeated experiments. Each experiment consists of randomly selecting numbers from the set until a number that does not have the property is selected. Within an experiment, each number can be selected only once.
Each new experiment starts from the full set again.
For example, if numbers 1, 2, 3 have the property but the number 4 does not, then "1, 4" and "2, 3, 4" and "4" are all examples of valid experiments, but "4, 1" (which continues after a number without the property) and "3, 3, 4" (which repeats a number) are not.
What is the probability that after $m$ experiments all $k$ numbers with the property have been seen, i.e. the number $k$ is learned?
I tried to reason like this (assuming that learning each number is independent):
- The probability that a specific number with the property is observed in a single experiment is $p = \frac{1}{n - k + 1}$, because it must be drawn before all $n - k$ numbers without the property.
- The probability that this number is observed at least once in $m$ experiments is $1 - (1-p)^m$.
- The probability that all $k$ numbers are observed is $\prod_{i=1}^{k}{(1 - (1-p_i)^m)} = (1 - (1-p)^m)^k$, where $p_i = p$ for every $i$.
However, this does not match the numbers I got from running simulations on specific cases.
For example, for $n=8$ and $k=4$ and number of experiments $m=10$, the formula gives probability 0.6348597233188475, but simulations give around 0.660309.
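
For reference, here is a minimal sketch of how such a simulation might be set up. It is not necessarily the code behind the figure quoted above: the function names, the seed, the number of trials, and the convention that numbers `0..k-1` have the property are all illustrative assumptions, and it assumes $k < n$ so that every experiment terminates at a number without the property.

```python
import random


def run_experiment(n: int, k: int, rng: random.Random) -> set:
    """Run one experiment and return the property numbers observed.

    Assumed labeling: numbers 0..k-1 have the property, k..n-1 do not.
    """
    order = list(range(n))
    rng.shuffle(order)             # random draw order, each number at most once
    seen = set()
    for x in order:
        if x >= k:                 # first number without the property ends the experiment
            break
        seen.add(x)
    return seen


def simulate(n: int, k: int, m: int, trials: int, seed: int = 0) -> float:
    """Estimate the probability that m experiments reveal all k property numbers."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        seen = set()
        for _ in range(m):         # every experiment starts from the full set again
            seen |= run_experiment(n, k, rng)
        if len(seen) == k:
            successes += 1
    return successes / trials


print(simulate(8, 4, 10, trials=50_000))
```

With enough trials the estimate should be stable to about two decimal places, which is enough to distinguish it from the value given by the formula.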