Expected number of ball tosses to have at least 5 balls in 4 out of 5 bins (Skyrim application)

Question

I have a bit of an interesting probability question that has an application to Skyrim and the number of quests you need to complete to get an achievement for the Thieves Guild. I can generalize the problem in terms of balls and bins.

Say you have an infinite number of balls available and 5 distinct bins, labeled 1-5. When you toss a ball, it is equally likely to fall into each bin (1/5 chance). What is the expected number of tosses until bins 1-4 each have at least 5 balls in them? Each bin can hold an infinite number of balls, and we don't care about the balls falling into bin 5 (so it is not enough for just any 4 bins to reach 5 balls; it has to be bins 1-4).

I know that if I only cared about 1 bin reaching 5 balls, the expected value would be 5/p where p is the probability (1/5), but I can't continue this logic once one of the bins has 5 balls since the other bins may already have balls in them (the "misses" from trying to fill the first bin) so I have to use some other reasoning.

I wrote some code that I think simulates the rules above and I am getting around 29.7, which is lower than I would expect (the absolute minimum tosses is 20) so I would like to confirm or disprove this result as well as know how to generate a mathematical formula and calculate this without code.

Link to the code: https://github.com/nodnarb22/Skyrim-Thieves-Guild-Radiant-Quest-Simulator/blob/main/thievesguild

Any help or input would be much appreciated!

I would do it recursively, via states. Label a state as $(n_1,n_2,n_3,n_4)$ where $n_i$ is the number of balls in bin $i$ (or rather, the min of that and $5$), and then $E(n_1,n_2,n_3,n_4)$ is the expected number of trials until you win, given that you start in the indicated state. Of course $E(5,5,5,4)=5$ and any permutation of the $\{n_i\}$ gives the same expectation. — lulu, Jun 13 '22 at 23:01
In my opinion, both the direct approach and Inclusion-Exclusion seem ugly. Therefore, I would also put my money on recursion, as suggested by @lulu. — user2661923, Jun 13 '22 at 23:23
The difficulty with Inclusion-Exclusion may be visualized by considering the event that, after $n$ throws, bin 1 fails to have $5$ balls. The problem is that there are $5$ possibilities, because the number of balls in bin 1 might be any element of $\{0,1,2,3,4\}$. So, when setting up the subsets of unsatisfactory possibilities, you have to consider $5$ sub-cases for each subset. This gets ugly fast. — user2661923, Jun 13 '22 at 23:26
Note: since permutations don't change the expected value, you can simplify the computation somewhat by using as states unordered $4$-tuples of non-negative integers bounded above by $5$. That reduces the number of active states a lot, but I expect you'll still need to automate the computation. — lulu, Jun 13 '22 at 23:27
@lulu I'm not sure where to go with the states method. E(5,5,5,4) and all its permutations are equal to 5, and I can figure out E(5,5,5,0) = 25, but I would not know how to get E(5,5,4,4) due to the problem that misses have a probability of landing in one of the "good" bins which are not accounted for if I treat it as a E(5,5,5,4) + E(5,5,4,5) state. — quantumtunnler, Jun 14 '22 at 09:54
Well, look at the transitions, and work with the unordered collections. $\{5,5,4,4\}$ has a $\frac 35$ chance of staying where it is (as three of the five bins are not helpful) and a $\frac 25$ chance of moving to $\{5,5,5,4\}$. Thus $E\{5,5,4,4\}=1+\frac 35\times E\{5,5,4,4\}+\frac 25\times E\{5,5,5,4\}\implies E\{5,5,4,4\}=\frac {15}2$. — lulu, Jun 14 '22 at 11:33
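The recursion described in these comments can be automated. The following is a minimal sketch, not anyone's posted code: the function name `expected_tosses` and the constants are illustrative assumptions. A state is the sorted tuple of ball counts in bins 1-4, each capped at 5. With $m$ bins still short of 5 balls, the relation $E = 1 + \frac{5-m}{5}E + \frac15\sum E(\text{next})$ rearranges to $E = \left(5 + \sum E(\text{next})\right)/m$, which is what the code evaluates with exact fractions.

```python
from fractions import Fraction
from functools import lru_cache

TARGET = 5   # balls required in each of bins 1-4
BINS = 5     # total number of equally likely bins

@lru_cache(maxsize=None)
def expected_tosses(state):
    """Expected further tosses from a sorted tuple of capped ball counts."""
    open_bins = [i for i, n in enumerate(state) if n < TARGET]
    if not open_bins:
        return Fraction(0)          # all four bins are full: nothing left to do
    # Rearranged one-step equation: E = (BINS + sum of E(next state)) / m
    total = Fraction(BINS)
    for i in open_bins:
        nxt = list(state)
        nxt[i] += 1
        total += expected_tosses(tuple(sorted(nxt)))
    return total / len(open_bins)

print(expected_tosses((4, 4, 5, 5)))          # the state {5,5,4,4} from the comments
print(float(expected_tosses((0, 0, 0, 0))))   # the quantity asked for in the question
```

Sorting the tuple before each recursive call is the "unordered collections" simplification from the comments: permuted states share a single cache entry.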

Markus Scheuer · Answer 1 · 2022-06-14T17:19:38.773

Here is a starter. Let $p_k$ denote the probability that the $k$-th toss is the first one after which bins $1$ to $4$ each contain at least five balls. We can write $p_k$ as \begin{align*} \color{blue}{p_k=\frac{4}{5^k}\binom{k-1}{4}\sum_{j_1\geq 5}\binom{k-5}{j_1}\sum_{j_2\geq 5}\binom{k-5-j_1}{j_2} \sum_{j_3\geq 5}\binom{k-5-j_1-j_2}{j_3}}\tag{1} \end{align*} The expression is valid due to the following. We consider the situation that after $k-1$ tosses precisely one of the four bins contains exactly four balls, whereas the other three of these four bins contain at least five balls each. Any number of further balls, which we do not care about, may be in the fifth bin.

With the $k$-th toss the bin with $4$ balls gets one more ball with probability $\frac{1}{5}$.
We assume wlog bin $1$ has four balls after $k-1$ tosses and respect this symmetry with a factor $4$.
There are $\binom{k-1}{4}$ ways that bin $1$ has four balls after $k-1$ tosses,
leaving $\binom{k-1-4}{j_1}=\binom{k-5}{j_1}, j_1\geq 5$ ways that bin $2$ has exactly $j_1$ balls,
leaving $\binom{k-5-j_1}{j_2}, j_2\geq 5$ ways that bin $3$ has exactly $j_2$ balls,
leaving $\binom{k-5-j_1-j_2}{j_3}, j_3\geq 5$ ways that bin $4$ has exactly $j_3$ balls,
and $k-5-j_1-j_2-j_3\geq 0$ balls go to bin $5$.

We can use multinomial coefficients \begin{align*} \binom{k-5}{j_1}&\binom{k-5-j_1}{j_2}\binom{k-5-j_1-j_2}{j_3}\\ &=\frac{(k-5)!}{j_1!(k-5-j_1)!}\,\frac{(k-5-j_1)!}{j_2!(k-5-j_1-j_2)!}\, \frac{(k-5-j_1-j_2)!}{j_3!(k-5-j_1-j_2-j_3)!}\\ &=\binom{k-5,4}{j_1,j_2,j_3,k-5-j_1-j_2-j_3}\tag{2} \end{align*} and derive from (1) and (2) a probability generating function

\begin{align*} \color{blue}{Q(z)=\sum_{k\geq 20}\frac{4}{5^k}\binom{k-1}{4}\sum_{j_1,j_2,j_3\geq 5} \binom{k-5,4}{j_1,j_2,j_3,k-5-j_1-j_2-j_3}z^k} \end{align*} so that the wanted expectation value can be found as \begin{align*} \color{blue}{\mathbb{E}(5,5,5,.)=Q^{\prime}(1)} \end{align*}

Regrettably I don't see a convenient way to write $Q(z)$ as a rational function in $z$, so that the expectation value can be easily derived.
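Even without a closed form for $Q(z)$, the value $Q'(1)=\sum_k k\,p_k$ can be approximated by truncating the series. The sketch below evaluates the nested sums in (1) one bin at a time; the function name `first_passage_probs` and the cutoff $k\le 400$ are assumptions made for illustration, and the neglected tail is tiny because $p_k$ decays geometrically.

```python
from fractions import Fraction
from math import comb

def first_passage_probs(k_max):
    """Return {k: p_k} for 20 <= k <= k_max, following formula (1)."""
    m_max = k_max - 5
    # ways[m] = number of ways to distribute m labelled tosses among the bins
    # handled so far; start with bin 5 alone (any count, so exactly one way).
    ways = [1] * (m_max + 1)
    for _ in range(3):  # add bins 2, 3, 4, each receiving j >= 5 tosses
        ways = [sum(comb(m, j) * ways[m - j] for j in range(5, m + 1))
                for m in range(m_max + 1)]
    # factor 4: which bin is completed last; comb(k-1, 4): positions of its
    # first four balls among the first k-1 tosses
    return {k: Fraction(4 * comb(k - 1, 4) * ways[k - 5], 5 ** k)
            for k in range(20, k_max + 1)}

probs = first_passage_probs(400)
print(float(sum(probs.values())))                       # total mass captured
print(float(sum(k * p for k, p in probs.items())))      # truncated Q'(1)
```

The first printed number shows how much probability mass the truncation captures, which is a quick check that the cutoff is large enough.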

But at least two plausibility checks: We consider the cases $p_{20}$ and $p_{21}$. Denoting with $[z^n]$ the coefficient of $z^n$ of a series we obtain

\begin{align*} \color{blue}{p_{20}}&=[z^{20}]Q(z)\\ &=\frac{4}{5^{20}}\binom{19}{4}\sum_{j_1,j_2,j_3\geq 5}\binom{15,4}{j_1,j_2,j_3,15-j_1-j_2-j_3}\\ &\,\,\color{blue}{=\frac{4}{5^{20}}\binom{19}{4}\binom{15,4}{5,5,5,0}}\\ \\ \color{blue}{p_{21}}&=[z^{21}]Q(z)\\ &=\frac{4}{5^{21}}\binom{20}{4}\sum_{j_1,j_2,j_3\geq 5}\binom{16,4}{j_1,j_2,j_3,16-j_1-j_2-j_3}\\ &\,\,\color{blue}{=\frac{4}{5^{21}} \binom{20}{4}\left(\binom{16,4}{5,5,5,1}+\binom{16,4}{5,5,6,0}+\binom{16,4}{5,6,5,0}+\binom{16,4}{6,5,5,0}\right)} \end{align*}

awkward · Answer 2 · 2022-06-16T13:37:46.380

The expected number of tosses necessary until bins 1-4 all contain at least 5 balls is $37.1378$. This value is consistent with a Monte Carlo simulation I wrote but not with the OP's simulated value of $29.7$. I think this is due to an error in the use of the random.randint function in the linked-to Python code. One should be aware that with NumPy's random module, random.randint(1,5) returns only integers in the range $[1,4]$; it will never return $5$. (The standard library's random.randint(1,5), by contrast, includes both endpoints.)
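The effect of a generator that never returns $5$ can be checked with a small simulation. This is a sketch using only the standard library, not the OP's code: the function name `simulate`, the trial count and the seed are arbitrary assumptions. Drawing from 1..4 mimics an exclusive upper bound, drawing from 1..5 is the intended experiment.

```python
import random

def simulate(n_values, trials=10000, seed=2022):
    """Average number of tosses until bins 1-4 each hold at least 5 balls,
    when each toss is uniform on 1..n_values."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counts = [0] * 6          # index 0 unused; bins are 1..5
        tosses = 0
        while min(counts[1:5]) < 5:
            # the standard-library randint includes both endpoints
            counts[rng.randint(1, n_values)] += 1
            tosses += 1
        total += tosses
    return total / trials

print(simulate(5))   # all five bins reachable
print(simulate(4))   # bin 5 never hit, as with an exclusive upper bound
```

Tosses into bin 5 are pure waste, so the five-bin mean is $\frac54$ times the four-bin mean. With $37.1378$ for five bins this gives $37.1378\cdot\frac45\approx 29.71$ for four, which is close to the $29.7$ reported in the question.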

The following solution uses exponential generating functions. The reader not familiar with generating functions may find many resources in the answers to the question "How can I learn about generating functions?"

Define $T$ to be the number of the first toss in which bins 1-4 all contain at least $5$ balls each (and bin $5$ contains any number of balls whatever), and let $p_n = P(T \le n)$. The EGF of $p_n$ is

$$f(x) = \left( e^{x/5} - 1 - \left( \frac{x}{5} \right) - \frac{1}{2!} \left( \frac{x}{5} \right)^2 - \frac{1}{3!} \left( \frac{x}{5} \right)^3 - \frac{1}{4!} \left( \frac{x}{5} \right)^4 \right)^4 \; e^{x/5} \tag{*}$$

We are interested in $q_n = P(T > n)$. Since $q_n = 1 - p_n$, the EGF of $q_n$ is $e^x - f(x)$. By a well-known theorem, $E(T) = \sum_{n=0}^{\infty} q_n$. Making use of the identity $$n! = \int_0^{\infty} e^{-x} x^n \; dx$$ in combination with the definition of the EGF $$e^x - f(x) = \sum_{n=0}^\infty \frac{q_n}{n!} x^n$$ we have $$\sum_{n=0}^{\infty} q_n= \int_0^{\infty} e^{-x} (e^x - f(x)) \; dx$$ So $$E(T) = \int_0^{\infty} e^{-x} (e^x - f(x)) \; dx$$ where $f(x)$ is given by $(*)$.

Evaluating the integral (I used Mathematica) yields $E(T) = 37.1378$.
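For readers without Mathematica, the integral can also be evaluated exactly with plain Python. This is a sketch under the following assumptions: with $x=5y$ and $F(y)=e^{-y}\left(1+y+\frac{y^2}{2}+\frac{y^3}{6}+\frac{y^4}{24}\right)$ the integrand becomes $1-(1-F)^4=4F-6F^2+4F^3-F^4$, and each term is integrated with $\int_0^\infty y^n e^{-jy}\,dy = n!/j^{n+1}$. The helper names `poly_mul` and `integral_exp_poly` are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

# e4[n] = coefficient of y^n in 1 + y + y^2/2! + y^3/3! + y^4/4!
e4 = [Fraction(1, factorial(n)) for n in range(5)]

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def integral_exp_poly(poly, rate):
    """Integral over [0, inf) of exp(-rate*y) * poly(y), via n!/rate^(n+1)."""
    return sum(c * factorial(n) / Fraction(rate) ** (n + 1)
               for n, c in enumerate(poly))

expected = Fraction(0)
power = [Fraction(1)]
for j in range(1, 5):
    power = poly_mul(power, e4)            # coefficients of e4(y)^j
    sign = 1 if j % 2 else -1              # signs of 4F - 6F^2 + 4F^3 - F^4
    expected += sign * comb(4, j) * integral_exp_poly(power, j)
expected *= 5                              # from the substitution x = 5y

print(expected, float(expected))
```

Working with `Fraction` keeps the result exact, so it can be compared digit for digit with a numerical integration.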

Yes, this is the way to go. It generalizes easily to more bins and other imposed minimal numbers of balls in the bins (like $5,5,5,5,0$ in the five bins in our case). The argument can be also adapted to compute related more complicated parameters like the variance of the random variable $T$... plus one. — dan_fulea, Jun 18 '22 at 10:13

score 2 · Answer 3 · answered Jun 14 '22 at 19:19

Let's start with considering that $$ \begin{array}{l} \left( {x_1 + x_2 + x_3 + x_4 + x_5 } \right)^n = \cdots + x_{k_{\,1} } x_{k_{\,2} } \cdots x_{k_{\,n} } + \cdots \quad \left| {\;k_j \in \left\{ {1,2, \cdots ,5} \right\}} \right.\quad = \\ = \cdots + x_{\,j_{\,1} } ^{r_{\,1} } x_{\,j_{\,2} } ^{r_{\,2} } \cdots x_{\,j_{\,n} } ^{r_{\,n} } + \cdots \quad \left| \begin{array}{l} \;j_i \in \left\{ {1, \ldots ,5} \right\} \\ \;\sum\limits_i {r_i } = n \\ \end{array} \right.\quad = \\ = \sum\limits_{\left\{ {\begin{array}{*{20}c} {0\, \le \,k_{\,j} \,\left( { \le \,n} \right)} \\ {k_{\,1} + k_{\,2} + \, \cdots + k_{\,5} \, = \,n} \\ \end{array}} \right.\;} {\left( \begin{array}{c} n \\ k_{\,1} ,\,k_{\,2} ,\, \cdots ,\,k_{\,5} \\ \end{array} \right)x_{\,1} ^{k_{\,1} } x_{\,2} ^{k_{\,2} } \cdots x_{\,5} ^{k_{\,5} } } \\ \end{array} $$ is enumerating all possible sequences of $n$ tosses ending with $k_j$ balls in box $j$, and $$ \begin{array}{l} \left( {1 + 1 + 1 + 1 + 1} \right)^n = 5^n = \\ = \sum\limits_{\left\{ {\begin{array}{*{20}c} {0\, \le \,k_{\,j} \,\left( { \le \,n} \right)} \\ {k_{\,1} + k_{\,2} + \, \cdots + k_{\,5} \, = \,n} \\ \end{array}} \right.\;} {\left( \begin{array}{c} n \\ k_{\,1} ,\,k_{\,2} ,\, \cdots ,\,k_{\,5} \\ \end{array} \right)} \\ \end{array} $$

Now let's consider the configuration of boxes having respectively $\ge 5, \ge 5,\ge 5,\ge 5, \le 4 $ balls: last box has a different content, it is distinguishable and we have $5$ ways to choose it out of the five.
So the number of sequences that have such a configuration after $n$ tosses is $$ \begin{array}{l} N(n) = 5\sum\limits_{\left\{ {\begin{array}{*{20}c} {5\, \le \,k_{\,1,2,3,4} \,\left( { \le \,n} \right)} \\ {\,0 \le k_{\,5} \le 4} \\ {k_{\,1} + k_{\,2} + \, \cdots + k_{\,5} \, = \,n} \\ \end{array}} \right.\;} {\left( \begin{array}{c} n \\ k_{\,1} ,\,k_{\,2} ,\, \cdots ,\,k_{\,5} \\ \end{array} \right)} = \\ = 5\sum\limits_{\left\{ {\begin{array}{*{20}c} {0\, \le \,j_{\,1,2,3,4} \,\left( { \le \,n - 5} \right)} \\ {\,0 \le k\left( { \le 4} \right)} \\ {j_{\,1} + j_{\,2} + \,j_{\,3} + j_{\,4} \, = \,n - 24 + k} \\ \end{array}} \right.\;} {\left( \begin{array}{c} n \\ 5 + j_{\,1} ,\,5 + j_{\,2} ,\,5 + j_{\,3} ,5 + j_{\,4} ,\,4 - k \\ \end{array} \right)} = \\ = \quad \ldots \\ \end{array} $$ there are many ways to rewrite the multinomial in terms of binomials etc. and I will omit them.

Clearly $$ \begin{array}{l} N(n) = 0\quad \left| {0 \le n \le 19} \right. \\ N(20) = 5\frac{{20!}}{{\left( {5!} \right)^4 0!}} \\ \quad \vdots \\ \end{array} $$

But to answer your question, the above is not of much interest.
We in fact need to find the number of sequences that become "successful" at the $n$-th toss.

The $n-1$ -sequences which can become successful just at the following step $n$ are only of these two types $$ \begin{array}{l} \left\{ { \ge 5,\; \ge 5,\; \ge 5,\; = 4,\; = 4} \right\}, \\ \left\{ { \ge 5,\; \ge 5,\; \ge 5,\; = 4,\; < 4} \right\} \\ \end{array} $$ and since they can be permuted, we have respectively $$ \left( \begin{array}{c} 5 \\ 2 \\ \end{array} \right),\; 2\left( \begin{array}{c} 5 \\ 2 \\ \end{array} \right) $$ ways to arrange them, and thereafter

two ways to place the $n$th ball for the first type,
one way for the second.

Therefore $$ \begin{array}{l} N_{first} (n) = 2\left( \begin{array}{c} 5 \\ 2 \\ \end{array} \right)\left( {\sum\limits_{\left\{ {\begin{array}{*{20}c} {5\, \le \,k_{\,1,2,3} \,\left( { \le \,n - 9} \right)} \\ {k_{\,1} + k_{\,2} + \, \cdots + k_{\,5} \, = \,n} \\ \end{array}} \right.\;} {\left( \begin{array}{c} n - 1 \\ k_{\,1} ,\,k_{\,2} ,k_{\,3} ,4,\,4 \\ \end{array} \right)} + \sum\limits_{\left\{ {\begin{array}{*{20}c} {5\, \le \,k_{\,1,2,3} \,\left( { \le \,n - 5 - j} \right)} \\ {0 \le j \le 3} \\ {k_{\,1} + k_{\,2} + k_{\,3} \, + j\, = \,n - 5} \\ \end{array}} \right.\;} {\left( \begin{array}{c} n - 1 \\ k_{\,1} ,\,k_{\,2} ,k_{\,3} ,4,\,j \\ \end{array} \right)} } \right) = \\ = \quad \cdots \\ \end{array} $$ and for the probability $$ P_{first} (n) = \frac{{N_{first} (n)}}{{5^n }} $$ and then the expected $n$ follows obviously.
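The last step can be spelled out. Assuming $P_{first}(n)$ is indeed the probability that the configuration becomes successful for the first time at toss $n$ (so that it vanishes for $n<20$), the expectation is the usual weighted sum, and the normalization gives a sanity check on $N_{first}(n)$: $$ \mathbb{E}[n] = \sum_{n \ge 20} n\,P_{first}(n), \qquad \sum_{n \ge 20} P_{first}(n) = 1 \ . $$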

score 0 · Answer 4 · answered Jun 18 '22 at 10:05

I will try to give (A) a solution along the lines first suggested by lulu shortly after the question showed up. It uses only elementary steps. We will get an explicit rational number $E$ as expectation after computing a finite sum obtained by simple combinatorial means: $$ \color{blue}{ E = \frac{14127973228249375}{380420285792256} =\frac{5^4 \cdot 11^3 \cdot 16983288629}{2^{31} \cdot 3^{11}} \approx 37.137801941415212874\dots } $$ The formula for the sum given in the sequel and leading to the above fraction explains without computations why the denominator should have that shape: it is a $\Bbb Z$-linear combination of expressions like $ \left(\frac 1p-1\right)^{N_1} \left(\frac 1p-2\right)^{N_2} \left(\frac 1p-3\right)^{N_3} \left(\frac 1p-4\right)^{N_4} $, where $p=\frac15$ is the probability that a ball lands in any given bin, and the powers involved are (specific) natural numbers.

It is nice to have (B) a confirmation of the value of $E$ from a solution based on different ideas. awkward's solution is indeed awkward, and I like it indeed: it also gives a rational number, the same one, after computing an integral on $[0,\infty)$, and the code is simpler because this solution is structural. Finally, a simulation part (C) should convince the experimentally minded reader that the mean is around $37.1$. Computer support is needed along the way to keep the answer short; I will use sage, with hopefully readable code.

(A)

The used modelling probability space $\Omega$ is the space of "paths" which are infinite words $\omega=w_1w_2w_3\dots$ with letters in the alphabet $A=\{1,2,3,4,5\}$. At time $n$ we see only the truncated word $\omega'=w_1w_2w_3\dots w_n$ from $\omega$. Denote by $|\omega'|$ the length of $\omega'$, which is $n$ in this last sample. Denote by $|\omega'|_k$ the number of letters $=k$ in $\omega'$. So $|\omega'|= |\omega'|_1 + |\omega'|_2 + |\omega'|_3 + |\omega'|_4 + |\omega'|_5$. The filtration of $\Omega$ is at time $n$ the $\sigma$-algebra generated by the events $E(\omega')$, $E(\omega')$ being the set of all $\omega$ paths starting with that $\omega'$ word of length $n$.

We consider only the paths in which the first bin is the first to reach $5$ balls, then the second bin, then the third one, then the fourth one. The symmetric group acts on the four bins by permuting the order in which they reach five balls, so we have to multiply by $4!$ at the end.

So we are passing in order through the following "states" (which are events): $$ \boxed{0\ 0\ 0\ 0\ |\ 0} \overset{(1)}\longrightarrow \boxed{5\ a\ b\ c\ |\ \#}\overset{(2)}\longrightarrow \boxed{*\ 5\ d\ f\ |\ \#}\overset{(3)}\longrightarrow \boxed{*\ *\ 5\ g\ |\ \#}\overset{(4)}\longrightarrow \boxed{*\ *\ *\ 5\ |\ \#} \ . $$ They are each specific unions of $E(\omega')$ cylinders. The $k$-th component of a state counts the number of balls in the $k$-th bin, i.e. matches $|\omega'|_k$. The $\#$ stands for any natural number, the $*$ for any natural number $\ge 5$, and the $5$ for exactly five, where we insist that this fifth ball is the last letter of $\omega'$. Explicitly: $$ \begin{aligned}[]{} \boxed{0\ 0\ 0\ 0\ |\ 0} &= E(\text{ empty word })=\Omega\ , \\ \boxed{5\ a\ b\ c\ |\ \#} &= \bigsqcup_{\omega'}E(\omega') ,\ &&|\omega'|_1=5 ,\ |\omega'|_2=a ,\ |\omega'|_3=b ,\ |\omega'|_4=c \ ;\ \omega'\text{ ends in }1\ , \\ \boxed{*\ 5\ d\ f\ |\ \#} &= \bigsqcup_{\omega'}E(\omega') ,\ &&|\omega'|_2=5 ,\ |\omega'|_3=d ,\ |\omega'|_4=f \ ;\ \omega'\text{ ends in }2\ , \\ \boxed{*\ *\ 5\ g\ |\ \#} &= \bigsqcup_{\omega'}E(\omega') ,\ &&|\omega'|_3=5 ,\ |\omega'|_4=g \ ;\ \omega'\text{ ends in }3\ , \\ \boxed{*\ *\ *\ 5\ |\ \#} &= \bigsqcup_{\omega'}E(\omega') ,\ &&|\omega'|_4=5 \ ;\ \omega'\text{ ends in }4\ . \end{aligned} $$

During the passage $(1)$ there are exactly $5$, $a$, $b$, $c$ balls falling respectively in the bins $1,2,3,4$, but there may be some "wasted" balls falling in bin $5$, let $j_1$ be their number. Let $N_1=5+a+b+c$ be the number of "useful" balls for this step.
During the passage $(2)$ there are exactly $a'=5-a$, $b'=d-b$, $c'=f-c$ balls falling respectively in the bins $2,3,4$, but there may be some "wasted" balls falling in bins $1,5$, let $j_2$ be their number. Let $N_2=a'+b'+c'$ be the number of "useful" balls for this step.
During the passage $(3)$ there are exactly $b''=5-d$, $c''=g-f$ balls falling respectively in the bins $3,4$, but there may be some "wasted" balls falling in bins $1,2,5$, let $j_3$ be their number. Let $N_3=b''+c''$ be the number of "useful" balls for this step.
During the passage $(4)$ there are exactly $c'''=5-g$ balls falling in the last needed bin $4$, but there may be some "wasted" balls falling in bins $1,2,3,5$, let $j_4$ be their number. Let $N_4=c'''$ be another alias for the number of "useful" balls for this step.

Then we can write down the formula for the mean value $M$ of steps to get $5$ balls in each bin, in the bin order $1,2,3,4$: $$ \begin{aligned} M &= \sum_{ \substack{a,b,c;d,f;g\\j_1,j_2,j_3,j_4\ge 0} } \binom{N_1-1+j_1}{5-1,a,b,c,j_1}p^{N_1-1}\cdot p^{j_1}\cdot p \\ &\qquad\qquad\qquad\qquad \cdot \binom{N_2-1+j_2}{a'-1,b',c',j_2}p^{N_2-1}\cdot (2p)^{j_2}\cdot p \\ &\qquad\qquad\qquad\qquad\qquad\qquad \cdot \binom{N_3-1+j_3}{b''-1,c'',j_3}p^{N_3-1}\cdot (3p)^{j_3}\cdot p \\ &\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad \cdot \binom{N_4-1+j_4}{c'''-1,j_4}p^{N_4-1}\cdot (4p)^{j_4}\cdot p \\ &\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad \cdot \Big(\underbrace{N_1+N_2+N_3+N_4}_{=20}+j_1+j_2+j_3+j_4\Big) \\ &= \sum_{ \substack{a,b,c;d,f;g\\j_1,j_2,j_3,j_4\ge 0} } \binom{N_1-1}{5-1,a,b,c} \binom{N_2-1}{a'-1,b',c'} \binom{N_3-1}{b''-1,c''}p^{20} \\ &\qquad\qquad \cdot \binom{N_1-1+j_1}{j_1} \binom{N_2-1+j_2}{j_2} \binom{N_3-1+j_3}{j_3} \binom{N_4-1+j_4}{j_4} \cdot p^{j_1}(2p)^{j_2}(3p)^{j_3}(4p)^{j_4} \\ &\qquad\qquad\qquad\qquad \cdot \Big(20+j_1+j_2+j_3+j_4\Big) \\ &= \sum_{a,b,c;d,f;g} \binom{N_1-1}{5-1,a,b,c} \binom{N_2-1}{a'-1,b',c'} \binom{N_3-1}{b''-1,c''}p^{20} \\ &\qquad\qquad \cdot \frac 1{(1-p)^{N_1}} \cdot \frac 1{(1-2p)^{N_2}} \cdot \frac 1{(1-3p)^{N_3}} \cdot \frac 1{(1-4p)^{N_4}} \\ &\qquad\qquad\qquad\qquad \cdot \left(20 + \frac{N_1\cdot p}{1-p} + \frac{N_2\cdot 2p}{1-2p} + \frac{N_3\cdot 3p}{1-3p} + \frac{N_4\cdot 4p}{1-4p} \right) \ . \end{aligned} $$ The sum was split into pieces corresponding to the terms $20$, $j_1$, $j_2$, $j_3$, $j_4$. We have used the formulas $$ \sum_{j\ge 0} \binom{N-1+j}{j}q^j = \frac 1{(1-q)^{N}} \ ,\qquad \sum_{j\ge 0} j\cdot \binom{N-1+j}{j}q^j = \frac {Nq}{(1-q)^{N+1}} \ . $$ The last sum is finite and can be computed. To get the mean $E$ of the number of steps, considered without any restriction on the order of the bins first getting five balls, recalling the action of the symmetric group, we have $E=4!\; M$.

# finite sum over the intermediate counts a,b,c; d,f; g (bins filled in the order 1,2,3,4)
M = 0
p, q1, q2, q3, q4 = 1/5, 4/5, 3/5, 2/5, 1/5    # q_k = 1 - k*p
for a, b, c in cartesian_product([[0..4], [0..4], [0..4]]):
    N1, C1 = (5+a+b+c), multinomial(5-1, a, b, c)
    for d, f in cartesian_product([[b..4], [c..4]]):
        N2, C2 = (5+d+f)-(a+b+c), multinomial(5-a-1, d-b, f-c)
        for g in [f..4]:
            N3, N4, C3 = (5+g)-(d+f), 5-g, multinomial(5-d-1, g-f)
            # one term of the sum per tuple (a,b,c,d,f,g)
            M += C1 * C2 * C3 * p^20 / q1^N1 / q2^N2 / q3^N3 / q4^N4 \
                 * ( 20 + N1*p/q1 + N2*2*p/q2 + N3*3*p/q3 + N4*4*p/q4 )


E = QQ(24*M)
print(f'Computed value of M is:\nM = {M}')
print(f'Answer to the question is E = 24 M:\nE = {E} = {E.factor()}')
print(f'E ~ {E.n(200)}')

And the above code delivers:

Computed value of M is:
M = 14127973228249375/9130086859014144
Answer to the question is E = 24 M:
E = 14127973228249375/380420285792256 = 2^-31 * 3^-11 * 5^4 * 11^3 * 16983288629
E ~ 37.137801941415212874629304030929071265672012509384861160505

Explicitly: $$ \color{blue}{ E = \frac{14127973228249375}{380420285792256} =\frac{5^4 \cdot 11^3 \cdot 16983288629}{2^{31} \cdot 3^{11}} \approx 37.137801941415212874\dots } $$
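For readers without Sage, the finite sum of part (A) can be evaluated in plain Python as well. The following is a sketch of a port, not the author's code: the helper `multinomial` and the dictionary `q` are assumptions introduced to stand in for Sage's built-ins and for $q_k = 1-kp$.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def multinomial(*parts):
    """(sum of parts)! / product of part!, standing in for Sage's multinomial."""
    result = factorial(sum(parts))
    for part in parts:
        result //= factorial(part)   # each intermediate quotient is an integer
    return result

p = Fraction(1, 5)
q = {i: 1 - i * p for i in range(1, 5)}   # q[k] = 1 - k*p

M = Fraction(0)
for a, b, c in product(range(5), repeat=3):
    N1, C1 = 5 + a + b + c, multinomial(4, a, b, c)
    for d, f in product(range(b, 5), range(c, 5)):
        N2, C2 = (5 + d + f) - (a + b + c), multinomial(5 - a - 1, d - b, f - c)
        for g in range(f, 5):
            N3, N4, C3 = (5 + g) - (d + f), 5 - g, multinomial(5 - d - 1, g - f)
            weight = C1 * C2 * C3 * p**20 / (
                q[1]**N1 * q[2]**N2 * q[3]**N3 * q[4]**N4)
            steps = 20 + sum(i * p * n / q[i]
                             for i, n in enumerate((N1, N2, N3, N4), 1))
            M += weight * steps

E = 24 * M   # 4! orders in which bins 1-4 can reach five balls
print(E, float(E))
```

Using `Fraction` throughout keeps the arithmetic exact, so the printed fraction can be compared directly with the Sage output above.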

(B)

As in the answer of awkward,

and as in the book Analytic Combinatorics, Philippe Flajolet, Robert Sedgewick, page 113 (out of more than 800), II.3 Surjections, Set Partitions, and Sets, Example II.9, formulas (21) and (22),

the usage of exponential generating functions (EGF) is a natural approach.

An early form of the book is / may have been Analytic Combinatorics, Symbolic Combinatorics, Philippe Flajolet, Robert Sedgewick, 2002, page 78, Example 8, Random allocations (balls-in-bins model).

For the convenience of the reader I will quote from either book.

Example. Random allocations (balls-in-bins model). Throw at random $n$ distinguishable balls into $m$ distinguishable bins. A particular realization is described by a word of length $n$ (balls are distinguishable, say, as numbers from $1$ to $n$) over an alphabet of cardinality m (representing the bins chosen). Let $\operatorname{Min}$ and $\operatorname{Max}$ represent the size of the least filled and most filled bins, respectively. Then, $$ \tag{$21$} $$ $$ \begin{aligned} \mathbb P\{\ \operatorname{Max} \le b\ \} &= n! \ [z^n]\ e_b\left(\frac zm\right)^m \\ \mathbb P\{\ \operatorname{Min} > b\ \} &= n! \ [z^n]\ \left(\exp\frac zm -e_b\left(\frac zm\right)\right)^m \ . \end{aligned} $$ The justification of this formula relies on the easy identity $$ \tag{$22$} \frac 1{m^n} [z^n] f(z) \equiv [z^n] f\left(\frac zm\right)\ , $$ and on the fact that a probability is determined as the ratio between the number of favorable cases (given by $(19)$) and the total number of cases ($m^n$).

Here, $e_b(z)$ is the truncated version of $\exp z$, the Taylor expansion of $\exp$ around $z=0$ stopping in degree $b$. The operator $[z^n]$ isolates from an analytic function $f$ the coefficient of $z^n$ in its Taylor series around zero. In our problem, we have to change slightly the second line in $(21)$, we use four times the factor $\left(\exp\frac zm -e_b\left(\frac zm\right)\right)$ with $b=4$, $m=5$ (strictly more than $4$ balls in the first four bins) and one more factor $\exp\frac zm=\left(\exp\frac zm -e_{-1}\left(\frac zm\right)\right)$ (strictly more than $-1$ balls in the last bin), which is $(*)$ from awkward's answer: $$ p_n:=\mathbb P[T\le n] =n!\; [z^n]\ \left(\exp\frac z5 -e_4\left(\frac z5\right)\right)^4\exp\frac z5 \ , $$ where $T$ is the random variable whose mean $E=\Bbb E[T]$ is wanted. The coefficients $(p_n)$ fished in running degree $n$ from the above analytic function converge increasingly to one, $\nearrow 1$. To get the mean we build $q_n=1-p_n\searrow 0$, and add them. As in awkward's answer: $$ \begin{aligned} E&=\Bbb E[T]=\sum q_n \\ &= \sum_{n\ge 0} n!\; [z^n]\ \left[ \exp z-\left(\exp\frac z5 -e_4\left(\frac z5\right)\right)^4\exp\frac z5 \right] \\ &= \int_0^\infty \exp (-z)\left[ \exp z-\left(\exp\frac z5 -e_4\left(\frac z5\right)\right)^4\exp\frac z5 \right]\; dz\\ &= \int_0^\infty \left[ 1-\left(1-\exp\left(-\frac z5\right)e_4\left(\frac z5\right)\right)^4 \right]\; dz \\ &= \int_0^\infty \left[ 1-\left(1-e^{-y}\left(1+y+\frac{y^2}2+\frac{y^3}6+\frac{y^4}{24}\right)\right)^4 \right]\; 5\;dy \\ &= 5\int_0^\infty 4\cdot e^{-y}\left(1+y+\frac{y^2}2+\frac{y^3}6+\frac{y^4}{24}\right)\;dy \\ &\qquad - 5\int_0^\infty 6\cdot e^{-2y}\left(1+y+\frac{y^2}2+\frac{y^3}6+\frac{y^4}{24}\right)^2\;dy \\ &\qquad + 5\int_0^\infty 4\cdot e^{-3y}\left(1+y+\frac{y^2}2+\frac{y^3}6+\frac{y^4}{24}\right)^3\;dy \\ &\qquad - 5\int_0^\infty 1\cdot e^{-4y}\left(1+y+\frac{y^2}2+\frac{y^3}6+\frac{y^4}{24}\right)^4\;dy \ . \end{aligned} $$

The last expression can also be computed by hand: expand the powers of the polynomial and replace each $y^n e^{-jy}$ by its integral $\int_0^\infty y^n e^{-jy}\,dy = n!/j^{n+1}$. Using sage:

var('y');
e4 = 1 + y + y^2/2 + y^3/6 + y^4/24
f4 = exp(-y) * e4
E = 5*integrate( 4*f4 - 6*f4^2 + 4*f4^3 - f4^4, y, 0, oo )
E = QQ(E)
print(f'E = {E}\n  = {E.factor()}\n  ~ {E.n()}')

And we get:

E = 14127973228249375/380420285792256
  = 2^-31 * 3^-11 * 5^4 * 11^3 * 16983288629
  ~ 37.1378019414152

(C) Simulation.

import numpy as np
r = np.random.default_rng(int(1234567890))    # randomizer
E = 0.0
N = 10**6    # trials
for trial in range(N):
    a = r.integers(low=1, high=6, size=150)    # random array with entries among 1,2,3,4,5
    try:
        # for each of the bins 1-4: pythonic index of the toss giving it its fifth ball
        T = 1 + max( [ int( np.argwhere(       (a-2)*(a-3)*(a-4)*(a-5) )[4, 0] ),
                       int( np.argwhere( (a-1)*      (a-3)*(a-4)*(a-5) )[4, 0] ),
                       int( np.argwhere( (a-1)*(a-2)*      (a-4)*(a-5) )[4, 0] ),
                       int( np.argwhere( (a-1)*(a-2)*(a-3)*      (a-5) )[4, 0] ), ] )
        E += T / N
    except IndexError:
        pass    # some bin got fewer than five balls within 150 tosses; trial skipped

print(E)

This time I got:

37.13368599999

Here, a is an array of size $150$ with entries among $1,2,3,4,5$, and for instance a - 3 is the array obtained from a by subtracting $3$ from each component. The product (a-2)*(a-3)*(a-4)*(a-5) is also built componentwise, and then np.argwhere fishes the positions with non-zero values, that is, the positions where a has the entry 1. The pythonically fourth, humanly fifth such position is then the index of the humanly fifth occurrence of 1 in the list a. This index is also pythonic, so we need to add one to get its human version. One can do better, but the above lazy code for the simulation is simpler to explain, and should be easier to digest.
