
I have a set of unknown size $N$. I can cheaply query a random item from this set (sampling with replacement). I need to estimate $N$ to within a given confidence range.

I have already set up a structure where I can query an item, determine whether it has occurred before, and update my overlap counter. My question is how to take a sequence of occurrence counts and infer the size of the set they were sampled from.

To give a concrete example:

Consider the set $s := \{1, 2, 3, 4, 5, 6\}$.

Eight random samples yielded $2, 2, 4, 5, 6, 2, 3, 1$. This sequence is processed into this map:

2 -> 3,
4 -> 1,
5 -> 1,
6 -> 1,
3 -> 1,
1 -> 1
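Building this map is a one-liner with Python's `collections.Counter`. Here is a minimal sketch on the example draws. It also records the number of draws $n$ and the number of distinct items $d$. Assuming every draw is uniform over the set, these two numbers are all the map contributes to the likelihood of $N$.

```python
from collections import Counter

# The eight draws from the example above.
samples = [2, 2, 4, 5, 6, 2, 3, 1]

# Map each drawn item to how many times it appeared.
counts = Counter(samples)

n = sum(counts.values())  # total number of draws
d = len(counts)           # number of distinct items seen

print(dict(counts))  # {2: 3, 4: 1, 5: 1, 6: 1, 3: 1, 1: 1}
print(n, d)          # 8 6
```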

How can I infer the set size from this map?
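One standard approach assumes every draw is independent and uniform over the $N$ items. Under that assumption, the probability of seeing $d$ distinct items in $n$ draws is $P(d \mid N, n) = S(n,d)\,\frac{N!}{(N-d)!\,N^n}$, where $S(n,d)$ is a Stirling number of the second kind that does not depend on $N$. A maximum-likelihood estimate can then be found by scanning $N \ge d$. The sketch below does this. The function names are my own, and `n_max` is an arbitrary safety cap on the search.

```python
import math

def log_likelihood(N, n, d):
    # log(N!/(N-d)!) - n*log(N): log P(d | N, n) up to the N-free term log S(n, d)
    if N < d:
        return -math.inf
    return math.lgamma(N + 1) - math.lgamma(N - d + 1) - n * math.log(N)

def mle_set_size(n, d, n_max=10**6):
    """Maximum-likelihood N given n draws containing d distinct items.

    If d == n (no repeats yet) the likelihood increases without bound in N,
    so there is no finite estimate; return None in that case.
    """
    if d == n:
        return None
    # The likelihood rises and then falls in N, so walk upward until it stops rising.
    N = d
    while N < n_max and log_likelihood(N + 1, n, d) > log_likelihood(N, n, d):
        N += 1
    return N

print(mle_set_size(8, 6))  # 11
```

On the example ($n = 8$, $d = 6$) this returns 11, well above the true size of 6. With so few draws the point estimate is very noisy, which is why an interval matters.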

My research turned up this formula for inferring population size from discrete random samples: $$ P\left(N\ |\ {s}_1,{s}_2,o\right)\propto P\left(\ o\ |\ N,{s}_1,{s}_2\right)\times P(N) $$ from this paper. However, the formula involves two separate random samples ($s_1$ and $s_2$, with overlap $o$), and I cannot adapt it to handle a single sample.
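The same Bayes rule can be applied to one sequence by swapping the two-sample overlap likelihood for a single-sample one. Assuming uniform draws with replacement, $P(N \mid n, d) \propto P(d \mid N, n)\,P(N)$, where $n$ is the number of draws, $d$ the number of distinct items, and $P(d \mid N, n) = S(n,d)\,\frac{N!}{(N-d)!\,N^n}$. The $S(n,d)$ factor does not depend on $N$ and cancels. The sketch below assumes a flat prior on $\{d, \dots, n_{\max}\}$. Both the prior and the cap `n_max` are my own choices, not something from the paper. It returns the posterior and an equal-tailed 95% credible interval.

```python
import math

def log_likelihood(N, n, d):
    # log P(d | N, n) up to the N-free constant log S(n, d)
    return math.lgamma(N + 1) - math.lgamma(N - d + 1) - n * math.log(N)

def posterior(n, d, n_max):
    """Posterior over N in [d, n_max] under a flat prior (an assumption)."""
    support = range(d, n_max + 1)
    logs = [log_likelihood(N, n, d) for N in support]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # subtract the max for numerical stability
    total = sum(weights)
    return {N: w / total for N, w in zip(support, weights)}

def credible_interval(post, level=0.95):
    """Equal-tailed interval: cut (1 - level)/2 of the mass from each end."""
    tail = (1 - level) / 2
    cum, lo, hi = 0.0, None, None
    for N in sorted(post):
        cum += post[N]
        if lo is None and cum >= tail:
            lo = N
        if hi is None and cum >= 1 - tail:
            hi = N
    return lo, hi

post = posterior(8, 6, n_max=1000)
print(max(post, key=post.get))  # 11: with a flat prior the mode equals the MLE
print(credible_interval(post))
```

With only two repeats in eight draws, the upper end of this interval is driven largely by `n_max`. It tightens as repeats accumulate.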

I also had difficulty finding answers that give confidence intervals. Since the solution will ultimately be used in an algorithm that determines the total size, I plan to keep sampling until the confidence interval is narrower than a given width.
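Here is a minimal sketch of that stopping rule. It assumes uniform draws with replacement, a flat prior on $\{d, \dots, n_{\max}\}$, and an interval taken from the posterior $P(N \mid n, d) \propto \frac{N!}{(N-d)!\,N^n}$. The names `estimate_size`, `rel_width`, `check_every` and `n_max` are hypothetical, and `n_max` must exceed the true size. The interval is only recomputed every `check_every` draws to keep the cost down. Because sampling stops based on the data, treat the nominal 95% as approximate.

```python
import math
import random

def log_likelihood(N, n, d):
    # log P(d | N, n) up to an N-free constant (uniform draws with replacement)
    return math.lgamma(N + 1) - math.lgamma(N - d + 1) - n * math.log(N)

def interval(n, d, n_max, level=0.95):
    """Equal-tailed posterior interval for N, flat prior on [d, n_max]."""
    support = range(d, n_max + 1)
    logs = [log_likelihood(N, n, d) for N in support]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    tail = (1 - level) / 2
    cum, lo, hi = 0.0, None, None
    for N, w in zip(support, weights):
        cum += w / total
        if lo is None and cum >= tail:
            lo = N
        if hi is None and cum >= 1 - tail:
            hi = N
    return lo, hi

def estimate_size(draw, rel_width=0.2, n_max=20_000, check_every=100):
    """Call draw() until the interval is narrower than rel_width * its midpoint."""
    seen = set()
    n = 0
    while True:
        seen.add(draw())
        n += 1
        # Recompute only periodically, and only once at least one repeat has occurred.
        if n % check_every == 0 and len(seen) < n:
            lo, hi = interval(n, len(seen), n_max)
            if hi - lo <= rel_width * (lo + hi) / 2:
                return lo, hi, n

random.seed(0)
hidden_size = 1000  # hypothetical population size, unknown to estimate_size
lo, hi, n = estimate_size(lambda: random.randrange(hidden_size))
print(f"N in [{lo}, {hi}] after {n} draws")
```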

Any help is appreciated! I'm half sure there is some standard statistical result, which everyone but me knows, that answers this question.

Carson
    German tank problem? – user619894 Oct 06 '21 at 19:19
  • @user619894 I read over the problem, and it appears to be for sampling without replacement, while my issue is related to sampling with replacement (sampled items remain in the set). I will read more about it though; I imagine there are still useful parallels – Carson Oct 06 '21 at 19:34
  • Some related questions https://math.stackexchange.com/questions/615464/how-many-books-are-in-a-library and https://math.stackexchange.com/questions/4197635/guessing-number-of-colors-of-beads-in-an-urn – Henry Oct 06 '21 at 23:33
