Suppose I have a list of $N$ elements $\{e_1,e_2,\ldots, e_N\}$. Some elements may be repeated, so the list has $N^\star\leq N$ unique elements. My question is:

Is there a way to construct an estimator of the fraction of unique elements $f\equiv N^\star/N$ if you are allowed to draw $M$ samples uniformly at random from the list?

For practical purposes, I'm interested in the limit where $M \ll N$, and where $N$ can be "very large".

Just to clarify the setting in a small example, consider the list of integers $[1,2,3,3,4,5,5,6,1,2]$. For this list $N=10$ and $N^\star=6$.
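For reference, the exact quantities in this small example can be computed directly when the whole list is available. This is only a sketch to pin down the definitions; the function name `unique_fraction` is a hypothetical helper, not part of any library.

```python
def unique_fraction(items):
    """Return (N, N_star, f) where f = N_star / N is the fraction of unique values."""
    n = len(items)                 # N: total number of elements
    n_star = len(set(items))       # N*: number of distinct values
    return n, n_star, n_star / n


example = [1, 2, 3, 3, 4, 5, 5, 6, 1, 2]
print(unique_fraction(example))    # (10, 6, 0.6)
```

The estimation problem is to approximate the last value, $f$, without scanning the full list.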

I am also trying to refine this question, and any comment, suggestion or reference is most welcome.

To simplify matters, one can assume that each unique element in the list has the same number of duplicates (if any).
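Under this equal-duplicates assumption, one possible approach is a birthday-style (collision-counting) estimate. If the $M$ draws are made with replacement (an assumption; the question does not fix this), two draws show the same value with probability $1/N^\star$, so the expected number of matching pairs is $\binom{M}{2}/N^\star$. The sketch below inverts that relation and also assumes $N$ is known. The name `estimate_unique_fraction` is hypothetical, and the result is a rough method-of-moments estimate, not an unbiased or optimal one.

```python
import random
from collections import Counter


def estimate_unique_fraction(items, m, rng=None):
    """Collision-based estimate of f = N*/N from m draws with replacement.

    Assumptions (not from the original question): N = len(items) is known,
    draws are with replacement, and every distinct value has the same
    number of copies. Returns None if the sample shows no repeated value.
    """
    rng = rng or random.Random()
    n = len(items)
    sample = [items[rng.randrange(n)] for _ in range(m)]

    # Count unordered pairs of draws that share the same value.
    counts = Counter(sample)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    if colliding_pairs == 0:
        return None  # too few samples to say anything

    # E[colliding_pairs] = C(m, 2) / N*, so invert to estimate N*.
    total_pairs = m * (m - 1) // 2
    n_star_hat = total_pairs / colliding_pairs
    return min(1.0, n_star_hat / n)  # f cannot exceed 1
```

Collisions only start to appear once $M$ is of order $\sqrt{N^\star}$, so for much smaller $M$ the function returns `None`, and the sample alone cannot separate large values of $N^\star$.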

  • Some sort of Bayesian estimator seems straightforward, but for small $M$ the results will depend heavily on the prior distribution over possible values of $N^*$. Does the problem have a real-world context? – Connor Harris Jul 25 '18 at 17:57
  • I don't have a good, easy-to-explain, real-world example, so I distilled it to the core of the problem. However, to make things simple, we can assume that each unique element in the list has the same number of duplicates. That is, no unique element has disproportionately many duplicates compared to other unique elements. – VanillaSpinIce Jul 25 '18 at 18:01
  • I get the impression that by "unique elements" you mean what I'd call "distinct elements". Are you saying that $N^\star$ elements appear exactly once in the list, or rather that the list contains $N^\star$ different elements? I think this should be clarified in the question. Also, are the samples with or without replacement? – joriki Jul 25 '18 at 20:33
  • I will add a simple example to clarify this. – VanillaSpinIce Jul 25 '18 at 20:37
  • Please see this question and this question. Do you agree that this is essentially a duplicate of those questions (if the simplification in the last paragraph in your question is applied)? (If not, please explain the difference.) – joriki Jul 25 '18 at 20:39
  • This is wonderful, the first link especially has a crystal clear formulation of this problem. You may mark it as a duplicate! – VanillaSpinIce Jul 25 '18 at 20:47
