1

Consider a 12-sided fair die. What is the distribution of the number T of rolls required to roll a 1, a 2, a 3, and a 4?

Taking inspiration from the Coupon Collector's Problem, I believe that the expected number of rolls to achieve the goal would be

$$E[T] = \sum\limits_{i=0}^3 \frac{12}{4-i} = 25$$

Similarly, the variance would be

$$Var[T] = \sum\limits_{i=0}^3 \frac{1-\frac{4-i}{12}}{(\frac{4-i}{12})^2} = 180$$

But applying Chebyshev here does not yield very useful bounds. My question is therefore, how would you compute, for example, $P(T=16)$ or $P(T<30)$?

Ideally this could be generalized to a set of k required numbers, not just 4 as in the example.

sffc
  • 125

3 Answers3

1

You can get this distribution by conditioning on the number $N$ of rolls up to $4$ required:

\begin{align} \textsf{Pr}(T\le t) &=\left(\frac23\right)^t\sum_{n=0}^t\binom tn2^{-n}\textsf{Pr}(N\le n) \\ &=\left(\frac23\right)^t\sum_{n=0}^t\binom tn2^{-n}\frac{4!}{4^n}\left\{n\atop4\right\} \\ &=\left(\frac23\right)^t\sum_{n=0}^t\binom tn8^{-n}\sum_{j=0}^4(-1)^j\binom4jj^n \\ &=\left(\frac23\right)^t\sum_{j=0}^4(-1)^j\binom4j\sum_{n=0}^t\binom tn\left(\frac j8\right)^n \\ &=\left(\frac23\right)^t\sum_{j=0}^4(-1)^j\binom4j\left(1+\frac j8\right)^t \\ &=\sum_{j=0}^4(-1)^j\binom4j\left(\frac23+\frac j{12}\right)^t \\ &=1-4\left(\frac{11}{12}\right)^t+6\left(\frac56\right)^t-4\left(\frac34\right)^t+\left(\frac23\right)^t\;, \end{align}

where $\left\{n\atop4\right\}$ is a Stirling number of the second kind and the distribution $P(N\le n)$ is given at Probability distribution in the coupon collector's problem.

The expectation comes out right as

\begin{align} E[T]&=\sum_{t=0}^\infty\textsf{Pr}(T\gt t)\\ &=\sum_{t=0}^\infty\left(4\left(\frac{11}{12}\right)^t-6\left(\frac56\right)^t+4\left(\frac34\right)^t-\left(\frac23\right)^t\right)\\ &=12\left(4\cdot\frac11-6\cdot\frac12+4\cdot\frac13-1\cdot\frac14\right)\\ &=25\;. \end{align}

Here are plots of the cumulative distribution function and the probability mass function.

In your examples,

$$ \textsf{Pr}(T=16)=\textsf{Pr}(T\gt15)-\textsf{Pr}(T\gt16)=\frac{293289532268461}{7703510787293184}\approx3.8\% $$

and

$$ \textsf{Pr}(T\lt30)=\textsf{Pr}(T\le29)=\frac{292029927548835623394780280045}{412111655902378135987760922624}\approx70.9\%\;. $$

For general $k$ instead of $4$ numbers required and general $m$ instead of $12$ numbers on the die, the result is

$$ \textsf{Pr}(T\le t)=\sum_{j=0}^k(-1)^j\binom kj\left(1-\frac{k-j}m\right)^t\;. $$

joriki
  • 238,052
1

Personally I would call $P(n,b,k,j)$ the probability that after rolling $n$ dice with $b$ sides, $j$ of the target $k$ sides had been found, with the formula

$$P(n+1,b,k,j)= \frac{b-k+j}{b} P(n,b,k,j) + \frac{k+1-j}{b} P(n,b,k,j-1)$$

starting from $P(0,b,k,j)=0$ for $j \not = 0$ and $P(0,b,k,0)=1$.

Then $$\Pr(T \le t) = P(t,b,k,k)$$ and $$\Pr(T = t) = P(t,b,k,k)-P(t-1,b,k,k).$$

So in your example it is not difficult to calculate $\Pr(T = 16) \approx 0.0380722$ and $\Pr(T \le 30) \approx 0.7305294$. Your expected value of $T$ and variance appear to be correct.

Henry
  • 157,058
  • Thanks for the response. A recursive definition is a good idea. I feel though that in your response, in the recurrence relation, one of the P's needs to have a j-1. – sffc Dec 27 '12 at 13:31
  • @vote539: your final point is correct and I will edit – Henry Dec 27 '12 at 13:48
  • Thanks for the clarification! The results yielded by your solution match those yielded by a simulation of the dice rolls. A note to anyone who wants to implement this: the recursion stack grows very quickly, so it is necessary to use dynamic programming; that is, cache the results of the P function and only perform the computation when there is no cached value. – sffc Dec 27 '12 at 19:46
  • Or you could use a spreadsheet – Henry Dec 27 '12 at 23:26
  • Our answers agree on $\textsf{Pr}(T=16)$ and $\textsf{Pr}(T\le30)$. (The example in the question was $\textsf{Pr}(T\lt30)$.) – joriki May 18 '16 at 19:15
  • @joriki: Fair enough, I get $\Pr(T \lt 30)= P(29,12,4,4)\approx 0.708618$ as you do – Henry May 18 '16 at 20:22
0

I think it's the sum of four different geometric distributions.

For example, you start off and you haven't rolled any of the numbers. The probability of rolling one of them is 4/12 so you start a geometric distribution with that parameter. Then you suddenly roll one of them. Now you need to roll one of the other three, each roll has a probability of 3/12 of getting one of them, so it's another geometric distribution.

By this logic I think T ~ Geo(4/12) + Geo(3/12) + Geo(2/12) + Geo(1/12) which of course is probably some hideous ditribution define in terms of convolutions (or you know, could miraculously be nice and of closed-form, I don't know what.)

  • This is sort-of what I was thinking, but what are the next steps to get to the point when you can actually compute the probability densities? – sffc Dec 27 '12 at 10:01