
I am trying to calculate the probability that, when rolling a die $N$ times, every value from $1$ to $6$ appears at least once.
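For the die question itself (independent rolls, i.e. sampling with replacement), the standard inclusion-exclusion count gives $P = \sum_{k=0}^{6}(-1)^k\binom{6}{k}\left(1-\frac{k}{6}\right)^N$. The Python sketch below evaluates it exactly; the function name `prob_all_faces` and its `faces` parameter are illustrative choices, not part of the post.

```python
from fractions import Fraction
from math import comb


def prob_all_faces(n_rolls: int, faces: int = 6) -> Fraction:
    """P(every face 1..faces shows up at least once in n_rolls independent fair rolls)."""
    # Inclusion-exclusion over the k faces that are forced to be missing:
    # each term is (-1)^k * C(faces, k) * ((faces - k) / faces)^n_rolls.
    return sum(
        (-1) ** k * comb(faces, k) * Fraction(faces - k, faces) ** n_rolls
        for k in range(faces + 1)
    )


print(prob_all_faces(6))  # 5/324, i.e. 6!/6^6
```

Exact fractions avoid the cancellation error that the alternating sum can suffer in floating point for large $N$.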

Building on this question, I want to solve a more complex one: I have $N$ nodes partitioned into $C$ clusters, where cluster $j$ has size $c_j$. I want to subsample the data by choosing $M$ of the $N$ nodes, and I need the probability that the subsample contains at least one node from each cluster. In other words, I want to subsample so that, with high probability, every cluster has at least one member in the subsampled set, so I need to measure that probability.

Thanks all.

saulspatz

1 Answer


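Using the technique from the following MSE link, we obtain the probability from first principles (exponential generating functions and generalized Stirling numbers). There are $M!{N\choose M}$ ordered samples of $M$ distinct nodes; in a favorable sample, cluster $j$ contributes $k\ge 1$ nodes, which can be picked in order in $\frac{c_j!}{(c_j-k)!}$ ways, and the exponential generating function interleaves the clusters' sequences. This gives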

$$\frac{1}{M!} {N\choose M}^{-1} \times M! [z^M] \prod_{j=1}^C \sum_{k=1}^{c_j} \frac{c_j!}{(c_j-k)!} \frac{z^k}{k!} = {N\choose M}^{-1} [z^M] \prod_{j=1}^C \sum_{k=1}^{c_j} {c_j\choose k} z^k.$$

This is

$$\bbox[5px,border:2px solid #00A000]{ {N\choose M}^{-1} [z^M] \prod_{j=1}^C (-1+(1+z)^{c_j}).}$$

For the special case where all clusters have the same size $j$ (so that $N = Cj$), we get

$${N\choose M}^{-1} [z^M] (-1+(1+z)^j)^C = {N\choose M}^{-1} [z^M] \sum_{q=0}^C {C\choose q} (-1)^{C-q} (1+z)^{qj}.$$

This is

$$\bbox[5px,border:2px solid #00A000]{ {N\choose M}^{-1} \sum_{q=0}^C {C\choose q} (-1)^{C-q} {qj\choose M}.}$$
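The equal-size closed form needs no polynomial arithmetic at all, since it is a single alternating sum of binomials. A minimal Python sketch, assuming $C$ clusters of common size $j$ (the name `prob_all_equal_clusters` is illustrative):

```python
from fractions import Fraction
from math import comb


def prob_all_equal_clusters(num_clusters: int, size: int, m: int) -> Fraction:
    """P(a uniform M-subset hits all C clusters of common size j), with N = C * j."""
    n = num_clusters * size
    # sum_{q=0}^{C} C(C, q) (-1)^(C-q) C(q*j, M); math.comb returns 0 when M > q*j.
    total = sum(
        comb(num_clusters, q) * (-1) ** (num_clusters - q) * comb(q * size, m)
        for q in range(num_clusters + 1)
    )
    return Fraction(total, comb(n, m))


print(prob_all_equal_clusters(3, 2, 3))  # 2/5: 8 of the 20 three-node subsets hit all pairs
```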

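We can use this to compute the expected number $\mathrm{E}[T]$ of draws (one node at a time, without replacement) until a representative of every cluster has been seen; the first $M$ draws form a uniformly random $M$-subset, so the formula above is $P(T\le M)$. The complementary probability is that of a sample of size $M$ missing at least one cluster, i.e. $P(T>M)$, the probability that more than $M$ draws are needed. Moreover $T\le N-j+1$, because after $N-j+1$ draws only $j-1$ nodes remain, too few to contain a whole cluster. Using $\mathrm{E}[T]=\sum_{M=0}^{N-j} P(T>M)$, we get for the expectation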

$$\bbox[5px,border:2px solid #00A000]{ \mathrm{E}[T] = N-j+1 - \sum_{M=0}^{N-j} {N\choose M}^{-1} \sum_{q=0}^C {C\choose q} (-1)^{C-q} {qj\choose M}.}$$
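The expectation formula can be evaluated exactly with rational arithmetic. This is a Python sketch under the answer's assumptions ($C$ clusters of common size $j$, draws without replacement); `expected_draws` and its parameter names are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_draws(num_clusters: int, size: int) -> Fraction:
    """E[T] = (N - j + 1) - sum_{M=0}^{N-j} P(all C clusters seen in M draws)."""
    c, j = num_clusters, size
    n = c * j

    def p_all(m: int) -> Fraction:
        # Equal-size closed form for P(T <= m).
        s = sum(comb(c, q) * (-1) ** (c - q) * comb(q * j, m) for q in range(c + 1))
        return Fraction(s, comb(n, m))

    return (n - j + 1) - sum(p_all(m) for m in range(n - j + 1))


print(expected_draws(2, 2))  # 7/3
```

For two clusters of size two, the second draw completes the set with probability $2/3$, otherwise the third draw does, so $\mathrm{E}[T]=2\cdot\frac23+3\cdot\frac13=\frac73$, matching the formula.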

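As a sanity check, when $j=1$ every cluster is a single node, so $N=C$, all $C$ nodes must be drawn, and the expectation should be $C.$ We obtain (the inner sum may start at $q=M$ because ${q\choose M}=0$ for $q<M$)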

$$C - \sum_{M=0}^{C-1} {C\choose M}^{-1} \sum_{q=M}^C {C\choose q} (-1)^{C-q} {q\choose M}.$$

Now we have

$${C\choose q} {q\choose M} = \frac{C!}{(C-q)! \times M! \times (q-M)!} = {C\choose M} {C-M\choose C-q}.$$

Substituting, we find

$$C - \sum_{M=0}^{C-1} {C\choose M}^{-1} {C\choose M} \sum_{q=M}^C {C-M\choose C-q} (-1)^{C-q} \\ = C - \sum_{M=0}^{C-1} \sum_{q=0}^{C-M} {C-M\choose C-M-q} (-1)^{C-M-q} = C - \sum_{M=0}^{C-1} \sum_{q=0}^{C-M} {C-M\choose q} (-1)^{q} \\ = C - \sum_{M=0}^{C-1} 0 = C,$$

as claimed. In the last step we used $\sum_{q=0}^{C-M} {C-M\choose q} (-1)^q = (1-1)^{C-M} = 0$, which requires $C-M\ge 1$; this holds because $M\le C-1$.
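Both algebraic steps of the $j=1$ check can be spot-checked numerically. The Python sketch below (hypothetical helper names) verifies the binomial identity and shows that the inner sum $\sum_{q=M}^{C}{C\choose q}(-1)^{C-q}{q\choose M}$ vanishes for every $M\le C-1$ while equaling $1$ at $M=C$, which is why the range of $M$ matters.

```python
from math import comb


def identity_holds(c: int, q: int, m: int) -> bool:
    """C(C,q) C(q,M) == C(C,M) C(C-M, C-q) for 0 <= M <= q <= C."""
    return comb(c, q) * comb(q, m) == comb(c, m) * comb(c - m, c - q)


def inner_sum(c: int, m: int) -> int:
    """sum_{q=M}^{C} C(C,q) (-1)^(C-q) C(q,M), the inner sum of the j = 1 check."""
    return sum(comb(c, q) * (-1) ** (c - q) * comb(q, m) for q in range(m, c + 1))


for c in range(1, 13):
    assert all(identity_holds(c, q, m) for q in range(c + 1) for m in range(q + 1))
    # Vanishes for every M <= C - 1, the range used in the sanity check ...
    assert all(inner_sum(c, m) == 0 for m in range(c))
    # ... but not at M = C, where (1 - 1)^0 = 1.
    assert inner_sum(c, c) == 1
```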

Marko Riedel
  • Thanks, I will review the answer tonight. Can you please take a look at this: https://math.stackexchange.com/questions/3119313/how-many-players-to-play-coupon-collector-without-replacement-so-they-collect-al – ameerosein Feb 21 '19 at 03:31