I have a population of N items (e.g. item IDs in some inventory). Suppose I can sample one item at a time, with replacement, where every draw is independent and uniform over the items; for example, I would call some function sample() and it would return any given one of the N items with probability $\frac{1}{N}$.

On average, how many samples would I need to observe all N items at least once? In other words, how many samples would I need to get the entire population? Obviously, I would need at least N samples as a lower bound, but is there a tighter bound?

Thank you for any help. I'm a software engineer, so please be gentle with any math.

Henry
  • 157,058

1 Answer

There is no number of samples that guarantees this. Even if you only have two items, it is possible to sample one million times and still get the same item every single time.

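It's highly unlikely (the probability is $\left(\frac{1}{2}\right)^{999999}$, since the first draw is free and each of the other 999,999 must match it), but not impossible.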

You probably want to ask how many samples are needed to see every item with a probability of at least (say) $99.99\%$.
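
Since you mention being a software engineer, here is a minimal Python sketch of that probability version of the question. It is only an illustration under the uniform, independent sampling you described: the function name `samples_needed` and its parameters are made up for this example. It tracks the probability of having seen exactly j distinct items after each draw and stops as soon as the probability of having seen all of them reaches the target.

```python
def samples_needed(n_items: int, target: float = 0.9999) -> int:
    """Smallest number of uniform, independent draws (with replacement)
    after which all n_items have been seen with probability >= target.

    Hypothetical helper for illustration; assumes each item is drawn
    with probability 1/n_items on every draw.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")

    # dist[j] = probability that exactly j distinct items have been seen so far
    dist = [0.0] * (n_items + 1)
    dist[0] = 1.0
    draws = 0

    while dist[n_items] < target:
        new_dist = [0.0] * (n_items + 1)
        for j, p in enumerate(dist):
            if p == 0.0:
                continue
            # The next draw repeats one of the j items already seen...
            new_dist[j] += p * j / n_items
            # ...or it is one of the (n_items - j) items not seen yet.
            if j < n_items:
                new_dist[j + 1] += p * (n_items - j) / n_items
        dist = new_dist
        draws += 1

    return draws


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, samples_needed(n, 0.9999))
```

For the two-item case the probability of having seen both items after m draws is $1 - 2^{1-m}$, so the loop stops at m = 15. The running time grows roughly with (number of draws) times N, which is fine for inventory-sized N but is not meant as an optimized method.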

Am I misunderstanding?

MPW
  • 43,638