2

I have something that I am trying to compute.

Let's say that a number is uniformly generated 1-4.

What would be the expected number of generations required to get at least 3 1s and every other number was generated at least once.

I know how to calculate at least 3 1s as that would be a negative binomial distribution with p=3/4 and r=3.

I know how to calculate at least every number shows up once, since those are geometric distrubutions (or negative binomial with r=1).

How can I marry these concepts to come up with an expected value?

joriki
  • 238,052
swinch
  • 58

2 Answers2

1

This is a generalized coupon collector's problem. Two approaches you can take to this are used in the two answers at Coupon Collecting Problem: $4$ coupons with $p_1 = p_2 = \frac{1}{8}$ and $p_3 = p_4 = \frac{3}{8}$.

In the spirit of Ross's answer, you can define a state $(j,k)$ with $0\le j,k\le3$, in which you've generated $j$ $1$s and $k$ of the other numbers at least once. Then the expected number $a_{jk}$ of remaining generations satisfies the recurrence

$$ a_{jk}=1+\frac14a_{j+1,k}+\frac k4a_{jk}+\frac{3-k}4a_{j,k+1}\;, $$

where the indices aren't incremented beyond $3$ and the initial value is $a_{33}=0$. The result is

\begin{array}{c|cccc} j\setminus k&0&1&2&3\\\hline 0&\frac{1915}{144}&\frac{349}{27}&\frac{25}2&12\\ 1&\frac{125}{12}&\frac{88}9&9&8\\ 2&\frac{25}3&\frac{22}3&6&4\\ 3&\frac{22}3&6&4&0\\ \end{array}

Thus the expected number of generations is $a_{00}=\frac{1915}{144}\approx13.3$.

In the spirit of my answer, we can apply inclusion–exclusion to the four conditions $A_i$ that you've generated $i$ a sufficient number of times ($3$ for $i=1$ and $1$ otherwise). Let $N_i$ denote the number of generations required until $A_i$ is fulfilled, and let $N=\max_iN_i$ denote the number of generations required until all the conditions $A_i$ are fulfilled. Then

\begin{eqnarray*} \mathsf E[N] &=& \sum_{n=0}^\infty\mathsf P(N\gt n) \\ &=& \sum_{n=0}^\infty\mathsf P\left(\bigvee_{i\in\{1,2,3,4\}}N_i\gt n\right) \\ &=& \sum_{n=0}^\infty\sum_{\emptyset\ne S\subseteq\{1,2,3,4\}}(-1)^{|S|+1}\mathsf P\left(\bigwedge_{i\in S}N_i\gt n\right)\;. \end{eqnarray*}

Now there are two cases. If $1\notin S$, we have

$$ \mathsf P\left(\bigwedge_{i\in S}N_i\gt n\right)=4^{-n}(4-|S|)^n\;. $$

If $1\in S$, we have

$$ \mathsf P\left(\bigwedge_{i\in S}N_i\gt n\right)=\sum_{j=0}^2\binom nj4^{-n}(4-|S|)^{n-j}\;. $$

Splitting the sum over $S$ into these two cases, we obtain

\begin{eqnarray*} \sum_{n=0}^\infty\sum_{\emptyset\ne S\subseteq\{2,3,4\}}(-1)^{|S|+1}\mathsf P\left(\bigwedge_{i\in S}N_i\gt n\right) &=& \sum_{n=0}^\infty\sum_{k=1}^3(-1)^{k+1}\binom3k4^{-n}(4-k)^n \\ &=& \sum_{k=1}^3(-1)^{k+1}\binom3k\frac4k \\ &=& 12-6+\frac43 \\ &=& \frac{22}3 \end{eqnarray*}

and

\begin{eqnarray*} \sum_{n=0}^\infty\sum_{S\subseteq\{2,3,4\}}(-1)^{|S|}\mathsf P\left(\bigwedge_{i\in S\cup\{1\}}N_i\gt n\right) &=& \sum_{n=0}^\infty\sum_{j=0}^2\sum_{k=0}^3(-1)^k\binom3k\binom nj4^{-n}(4-(k+1))^{n-j} \\ &=& 4\sum_{j=0}^2\sum_{k=0}^3(-1)^k\binom3k\left(\frac1{k+1}\right)^{j+1} \\ &=& 12-4\left(\frac32-1+\frac14+\frac34-\frac13+\frac1{16}+\frac38-\frac19+\frac1{64}\right) \\ &=& \frac{859}{144}\;. \end{eqnarray*}

Together, this is

$$ \frac{22}3+\frac{859}{144}=\frac{1915}{144}\approx13.3\;, $$

in agreement with the first result.

joriki
  • 238,052
  • Nicely done... I agree for this question Ross M's approach is easier to digest... for me personally the notation necessary in the second approach takes a lot of mental effort to comprehend, I know you're solving the terms in inclusion exclusion but unfortunately my brain is like a 30 horsepower lawn mower. Question for you... when coming up with the 4 by 4 table with all the states in the Ross M approach do you use a scripting language? If so can you share which one? – HJ_beginner Sep 20 '18 at 17:43
  • @HJ_beginner: Actually, in this case I did it by hand :-). If it had been $4$ instead of $3$, I would have coded it in Java. I often include links to Java code I use in my answers. – joriki Sep 20 '18 at 17:46
  • Ah thanks for sharing. In regards to the second approach this notation was bizarre to me $\emptyset\ne S\subseteq{1,2,3,4}$ but I see on the wiki page you linked it's a compact way to express the inclusion exclusion formula. Great answer with two approaches and I enjoyed OP's question. There's so many variations of coupon collecting it's obscene but they remind me of collecting rare items in video games so very relevant to today. – HJ_beginner Sep 20 '18 at 17:51
  • 2
    This was actually to solve the expected number of ultra jumps I would do on Super Mario RPG. Video games, indeed. Thank you so much for this answer. – swinch Sep 20 '18 at 18:17
  • @swinch That is an excellent game. Good taste. – HJ_beginner Sep 20 '18 at 18:56
  • 2
    I had made a python script that estimated this result and I got basically 13.2985, but I figured there had to be an elegant solution. I really like the recurrence relation way as I think it is very digestible. Here's the relevance of this answer: https://pastebin.com/7ghExJR6 – swinch Sep 20 '18 at 19:03
  • @swinch Thanks for sharing your article and python script. It's been years since I beat the Axem Rangers lol. Also per this particular site if you found joriki's answer "good" you should accept it – HJ_beginner Sep 20 '18 at 20:59
0

Here’s yet another way to obtain the solution, using “Poissonization”. Instead of generating numbers at discrete times, consider $4$ independent Poisson processes with rate $1$, one for each number.

The interarrival times of a Poisson process are independently geometrically distributed, so at time $t$ the probability that a given process has generated its number is $1-\mathrm e^{-t}$ and the probability that the process that generates $1$s has generated three $1$s is given by the cumulative distribution function $1-\mathrm e^{-t}\left(1+t+\frac12t^2\right)$ of an Erlang-$3$ distribution, the distribution of the sum of $3$ independent identically geometrically distributed variables. The sum of the $4$ Poisson processes is itself a Poisson process, with rate $4$. As shown in this answer, the expected number of number generations required is equal to number of generations expected to occur in the expected time required, which in this case yields

$$ 4\int_0^\infty\left(1-\left(1-\mathrm e^{-t}\right)^3\left(1-\mathrm e^{-t}\left(1+t+\frac12t^2\right)\right)\right)\mathrm dt=\frac{1915}{144}\;, $$

in agreement with the other two approaches.

joriki
  • 238,052