Consider a machine like so:
- Every time I activate it, the machine generates a "sufficiently large" pool of plastic balls for me.
- Each ball is made out of colored plastic and painted over with white paint.
- I do not see the balls being created, and the white coat makes it impossible to tell what color the balls are inside.
- The machine chooses red, green or blue plastic for each ball randomly and with equal probability of each.
- I can only activate the machine once per day.
The only way to determine the true color of a ball is a chemical stripping process that takes a whole day. Luckily, I can strip more than one ball in parallel, but each additional ball still adds to the cost (although stripping $2$ balls in one run is still much cheaper than stripping $1$ ball in each of $2$ runs).
My objective is to obtain $1$ ball of every color. To do this, I am using the following strategy:
- Activate the machine, getting a large pool of balls.
- Pick a small number of balls to "screen" (for instance, $5$).
- Take this "screening set", strip them all in one go, take one ball of each color and discard the rest.
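To make the strategy concrete, here is a minimal Monte Carlo sketch of a single screen. It assumes exactly what is described above (colors drawn independently and uniformly, and a pool large enough that draws do not affect each other). The function name `simulate_screen` and its parameters are hypothetical, chosen only for illustration.

```python
import random


def simulate_screen(n_screened, n_colors=3, trials=100_000, seed=0):
    """Estimate the probability that one screen of n_screened balls
    contains at least one ball of every color.

    Assumption: each ball's color is independent and uniform over n_colors.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    successes = 0
    for _ in range(trials):
        # Colors found after stripping this screening set.
        seen = {rng.randrange(n_colors) for _ in range(n_screened)}
        if len(seen) == n_colors:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    for n in (3, 5, 10, 15):
        print(n, simulate_screen(n))
```

For a screen of $5$ balls the estimate should land near the exact value $150/243 \approx 0.62$, so a screen that small would fail more than a third of the time.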
At the end of this, I will probably have one ball of each color, but there is a chance that some color will be missing. If that happens, I will have to go back and screen some more balls, which will cost me an extra day - but time is money and I am very impatient. Therefore, I don't want to screen too few balls at once (the extreme case is screening $1$ ball at a time, which will take forever).
On the other hand, stripping balls costs money, so I don't want to strip more balls than I have to. For example, screening a whole pool of $1000$ balls when there are only $3$ colors will almost guarantee success, but I'm not really getting much more assurance than a smaller screen (e.g. $10$ balls) while it costs me $100$ times more.
Obviously, if I want $100\%$ probability of catching every color in the first screen, I need to screen an infinite number of balls. However, if I can accept a $5\%$ probability of needing more than one screen, then there is a finite and small number that will accomplish this. For $3$ colors with equal probability, the chance of needing a second screen after screening $n$ balls is the chance that at least one color is missing. By inclusion-exclusion (each of the $3$ colors is absent with probability $(2/3)^n$, and each of the $3$ pairs of colors is absent with probability $(1/3)^n$), this is:
$$ 3\cdot\left(\frac{2}{3}\right)^n - 3\cdot\left(\frac{1}{3}\right)^n $$
If I want $95\%$ or more confidence, I set this to:
$$ 3\cdot\left(\frac{2}{3}\right)^n - 3\cdot\left(\frac{1}{3}\right)^n \le 0.05 $$
Then solve for $n$ (in this case it comes out to about $10.1$, so I would screen $11$ balls at a time).
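As a numerical check, here is a small sketch that evaluates the probability of missing at least one color by inclusion-exclusion and searches for the smallest screen size meeting a target confidence. The function names and the `num_colors` parameter are my own illustrative choices, and the code assumes equally likely, independent colors as described above.

```python
from math import comb


def failure_probability(n, num_colors=3):
    """Probability that at least one color is missing among n balls.

    Inclusion-exclusion over the sets of k missing colors; valid for n >= 1.
    Assumption: colors are independent and equally likely.
    """
    q = num_colors
    return sum(
        (-1) ** (k + 1) * comb(q, k) * ((q - k) / q) ** n
        for k in range(1, q)
    )


def smallest_screen(confidence, num_colors=3, n_max=10_000):
    """Smallest n such that P(all colors seen in one screen) >= confidence."""
    # Fewer balls than colors can never succeed, so start at num_colors.
    for n in range(num_colors, n_max + 1):
        if 1 - failure_probability(n, num_colors) >= confidence:
            return n
    raise ValueError("no screen size up to n_max reaches this confidence")


if __name__ == "__main__":
    print(smallest_screen(0.95))  # prints 11 for 3 colors
```

Because the search is a plain linear scan, it is only meant for the small screen sizes relevant here, not as a closed-form answer.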
Is it possible to generalize this? That is, given a desired minimum confidence $p$ of getting all colors in a single screen and $q$ colors of equal probability, can I derive a function of $q$ and $p$ that gives $n$?
If you are curious, I am trying to figure out how many colonies to screen after a single transformation with a pooled sample. The machine producing a number of balls represents colonies you get from one transformation, the balls themselves are strains of bacteria, the color represents the plasmid carried by each strain, and the paint stripping process represents extracting and sequencing the plasmid.