
In this problem I want to estimate the unknown total number of colors of balls in an urn. Say there are N balls in the urn in total, and after m draws (with replacement, if that makes it easier) I have observed k distinct colors. We can assume that all colors are equally represented, and that N >> m and N >> k, which may also simplify things.

Extreme cases: if I draw 5 times and see blue, blue, blue, blue, blue, then it is likely that all the balls are blue and there are no other colors. But if I draw 5 times and see red, orange, yellow, green, blue, then I have no idea how many colors there could be; it could be anything up to N.

My conjecture is that the answer is mk/(m - k) for N >> m, but I'd appreciate a proper derivation.

Rob
  • You can find the probability of there being $q$ colors for every $q>k$. – N74 Oct 13 '16 at 19:41
  • Yes, I know. That would look something like $1 - \sum_{q=k+1}^{\infty} 1/q^m$, where $m$ is the number of samples taken. But I want an estimator of the total number of colors, and also, if possible, an estimated variance for that result. – Rob Oct 14 '16 at 19:21
  • Since this is a real-world problem (not homework!), I've turned to simulation to look for a solution. It looks like $k = K(1 - e^{-m/K})$ (where $m$ = number of draws, $K$ = unknown total number of colors, $k$ = number of different colors observed) is an excellent fit when drawing with replacement, far better than my first conjecture above. – Rob Feb 08 '17 at 22:00
  • Did you manage to find an analytical solution? – Jakub M. Nov 25 '17 at 21:44
  • No. The simulation provided a sufficient answer for my business need but I would still like to see a good statistician find an analytical solution. – Rob Dec 04 '17 at 23:24
  • Hi Rob – not sure whether you're still around and whether you get notified if I close this as a duplicate, so here's a ping – enjoy the solution :-) – joriki Jan 14 '24 at 18:33
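
The fit reported in the comments, k = K(1 - exp(-m/K)), can be checked by simulation and inverted numerically to turn an observed k into an estimate of K. The following is only a minimal sketch under the thread's assumptions (draws with replacement, all colors equally likely); the function names and the use of bisection are illustrative choices, not the method Rob actually used.

```python
import math
import random


def simulate_distinct(K, m, rng):
    """Draw m times with replacement from K equally likely colors
    and return the number of distinct colors observed."""
    return len({rng.randrange(K) for _ in range(m)})


def expected_distinct(K, m):
    """Fit from the thread: k = K * (1 - exp(-m / K)).
    expm1 keeps precision when m / K is very small."""
    return -K * math.expm1(-m / K)


def estimate_colors(k, m, iterations=200):
    """Solve k = K * (1 - exp(-m / K)) for K by bisection.

    The right-hand side increases with K and tends to m as K grows,
    so there is no finite solution when every draw showed a new color.
    """
    if not 0 < k <= m:
        raise ValueError("need 0 < k <= m")
    if k == m:
        return math.inf
    lo = hi = float(k)  # the fitted value at K = k is always below k
    while expected_distinct(hi, m) < k:
        hi *= 2.0  # grow the bracket until it contains the root
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if expected_distinct(mid, m) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    K_true, m = 100, 150  # hypothetical values chosen for illustration
    k_obs = simulate_distinct(K_true, m, rng)
    print("observed k:", k_obs)
    print("estimated K:", estimate_colors(k_obs, m))
```

For the two extreme cases in the question, this sketch returns a value only slightly above 1 for five identical draws and infinity for five all-different draws, which matches the intuition that the latter carries no upper bound on the number of colors.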

0 Answers