Balls are placed into 3 urns. Expected time until some urn has 100 balls.

Question

We have $3$ urns. At each round a ball is placed is placed into one of them, at random, with uniform probability. The game stops when some urn has $100$ balls.

What is expected duration of the game (number of rounds)?

Results from a simulation:

In coding: you can design a program that randomly increments one of three ints until one of them is $100$. Record the number of increments, and rerun. You should see a convergence towards a particular number given a large enough number of simulations. — Andrew Chin, Sep 24 '19 at 15:32
What is the probability that enemy $i$ is hit if enemy $i$ has been chosen? — callculus42, Sep 24 '19 at 15:33
@AndrewChin sure I wrote this simulation but it is not helping me at all, sure that you can have number through simulations but I would like to get formula (if it's possible) — quester, Sep 24 '19 at 15:40
Does not look trivial, but probably has been studied here. Thinking. Numerically (not simulation) I get 274.918644 — leonbloy, Sep 24 '19 at 15:40
@quester I´ve calculated the sum by using wolfram alpha, here. The result is $\large{252.5}$ — callculus42, Sep 24 '19 at 16:45
@quester You wrote that you don´t need numbers. But you should start to analyze the problem. For instance, my result does not match the graph you´ve posted. My first question is why? — callculus42, Sep 24 '19 at 17:01
@callculus what do you mean? avg of this data is 274.9... which matches result of leonbloy — quester, Sep 24 '19 at 17:30
@quester I don´t know what the mean at the graph is. But it does not clearly matches my result where I used your formula. Do you think your formula is wrong? — callculus42, Sep 24 '19 at 17:32
formula is for calculating expected number of steps... although when I come to think of it I took it from multinomial distribtion and I ignored some vectors so... probably sumation of probabilities will give value $<1$ so there should be magic component $1/prob_{sum}$ — quester, Sep 24 '19 at 17:36
@callculus I wonder where the mistake in your formula is (I would calculate it the exactly like that) — Sudix, Sep 27 '19 at 19:21
@Sudix problem with formula is that there should be $\sum p = 1$, but unfortunately it is $\sum p < 1$ it should be summed up and then multiplied by inverse $\sum p = c => p_{new} = \frac{p}{c}$ then $EX = \sum x p_{new}$ — quester, Sep 28 '19 at 16:21
@quester The idea was to fix that the first bowl always gets 100 pulls, and then sum all tuples (0,0) to (99,99) for the other two bowls up: $$\sum _{j=0}^{99} \left(\sum _{i=0}^{99}(100+i+j) \frac{\binom{i+j+100}{100} \binom{i+j}{i}}{3^{i+j+100}}\right)$$ So, if anything, I expected that without the $(100+i+j)$, the sum would sum up to $1/3$, but instead it sums up to $\approx 0.89$. Also, even if I norm the probabilities of the events so their sum goes to1, I get as expected value $275.5308620$, which is too far off from the other result. Somewhere there is still a hole in my reasoning — Sudix, Sep 28 '19 at 17:20
let's say that I thought about this problem for a long time and I know it's not trivial, althougt very fun! — quester, Sep 28 '19 at 17:24
@sudix Those are the probabilities for the state (100,i,j). However, that does not mean that this 100 was reached in the last placement of the ball. You can reach the state (100,i,j) via (99,i,j) but also via (100,i-1,j) or (100,i,j-1). See in my answer where I compute the case that a an urn has 100 in it in the last ball placement by computing the probability that the urn has 99 in it (and no then you have 1/3 probability to reach 100 in the next step). — Sextus Empiricus, Sep 28 '19 at 21:16

score 4 · Answer 1 · edited Jun 12 '20 at 10:38

Computation of the distribution

Let $n$ be the number of balls to draw. Let $m$ be the number of urns. Let $k$ be the target number of balls when the game stops.

You could express the probability to reach a stop in $n$ balls in terms of the probability that the number of balls in each urn is $k-1$ or less (the cumulative distribution).

The number of ways to put $n$ balls into $m$ urns is $m^n$ (with or without reaching the stop condition).
The number of ways to put $n$ balls into $m$ urns but not having reached the stop condition (that is having at most $k-1$ in each of them) can be found by enumerating over the set $S$ of vectors $\vec{k}$ (the numbers $(k_i)$ depicting the the number of balls in each $i$-th urn) that satisfy the condition $$\sum_i k_i = n \quad \text{and} \quad \forall i:0 \leq k_i < k$$ And for each of the vector $\vec{k}$ (a set of numbers $k_1,k_2,k_3$) that satisfies these conditions we compute the the number of ways to distribute the balls into the urns with those numbers which is $$\text{number of ways to put $k_i$ balls in urn $i$} = \frac{n!}{\prod_i{k_i!}}$$ Then we take the sum over all of this $$P(N \leq n) = \frac{1}{m^n}\sum_{\vec{k} \in S} \frac{n!}{\prod_{k_i\in \vec{k}}{k_i!}} $$ where the sum over $\vec{k} \in S$ means the summation over all vectors with numbers $k_i$ that satisfy the conditions and the product over $k_i \in \vec{k}$ means the product with all $k_i$ in the $\vec{k}$.

See below for an implementation in R code:

# computation
n <- 99
sum <- rep(0,3*n+1)
for (k1 in 0:n) {
  for (k2 in 0:n) {
    for (k3 in 0:n) {
      t = (k1+k2+k3)
      sum[t+1] = sum[t+1]+exp(lfactorial(t)-lfactorial(k1)-lfactorial(k2)-lfactorial(k3))
    }
  }
}
x <- c(0:(3*n))
Xcum <- c(sum/3^x,0)
simulation
set.seed(1)
draw <- function() {
  s <- sample(c(1:3),size = 300, replace=TRUE)
  min(which((cumsum(s==1)==100) | (cumsum(s==2)==100) | (cumsum(s==3)==100)))
}
q <- replicate(10^5,draw())
computation using beta function
drn <- function(n,k) {
  a <- max(0,n-2k+1)
  b <- min(k-1,n-k)
  choose(n-1,k-1)  2^(n-k) / 3^(n-1) *
      ( zipfR::Ibeta(0.5,n-k-b+1,b+1)/beta(n-k-b+1,b+1) - 
        zipfR::Ibeta(0.5,n-(k-1)-(a-1),(a-1)+1)/beta(n-(k-1)-(a-1),(a-1)+1) )
  #choose(n-1,k-1) * 2^(n-k) / 3^(n-1) * (pbinom(b,n-k,0.5)-pbinom(a-1,n-k,0.5))
}
drn <- Vectorize(drn)
#plotting both together
h <- hist(q, breaks=c(0:298)+0.5, xlim=c(200,300),
          xlab = "N", ylab = "probability", freq = FALSE, main="")
lines(1:298,-diff(Xcum),col=2)
lines(c(100:298),drn(c(100:298),100),col=3)

multinomial distribution

You can view this as related to the multinomial distribution which has the pdf

$$\frac {n!}{k_1! k_2! ... k_m!} p_1^{k_1} p_2^{k_2} ... p_m^{k_m} $$

which becomes for equal $p_i = 1/m $ the following

$$\frac {1}{m^n}\frac {n!}{k_1! k_2! ... k_m!} $$

which shows similarity with the expresssion before. Then the probability that for $n$ draws you did not reach 100 yet is equal to the probability that in 100 draws each $k_i<100$. And you can see the computation of you probability density as the computation of the CDF for the multinomial distribution.

Expression in terms of regularized incomplete beta function

For the case of three urns we may write an explicit expression for the probability in terms of the regularized incomplete beta function.

The probability that there are in the $n$-th draw $k$ balls in the first urn and less than $k$ in the others is, is equal to the 1/3 of the probability that there are in the $n-1$ draw $l= k-1$ balls in the first urn and equal or less than $l$ in the others is :

$$\begin{array}{rcrl} P_{k_1=l=k-1,k_2 \leq l,k_3 \leq l \vert n-1} &=& &\sum_{a \leq k_2 \leq b} \frac {1}{3^{n-1}}\frac {(n-1)!}{l! k_2! (n-1-l-k_2)!} \\ & = & \frac{(n-1)!}{l! 3^{n-1}} &\sum_{a \leq k_2 \leq b} \frac {1}{k_2! (n-1-l-k_2)!} \\ & = & {{n-1}\choose{l}} \frac{2^{n-1-l}}{3^{n-1}}& \sum_{a \leq k_2 \leq b} \underbrace{{n-1-l\choose{k_2}} \frac{1}{2^{n-1-l}}}_{\text{this is a binomial distribution}} \\ & = & {{n-1}\choose{k-1}} \frac{2^{n-k}}{3^{n-1}} & \left( I_{1/2}(n-k-b+1,b+1) - I_{1/2}(n-k-a+2,a) \right) \end{array}$$

with $a = max(0,n-2k+1)$ and $b = min(k-1,n-k)$

Computation of the expectation value

In the first part we computed $P(n>k) = 1-P(n\leq k)$. To obtain the mean you can sum over all of these. $\mu = \sum 1-P(n\leq k)$. This will give:

$$\sum_{k_1=0}^{99}\sum_{k_2=0}^{99}\sum_{k_3=0}^{99} \frac{1}{3^{k_1+k_2+k_3}} \frac{(k_1+k_2+k_3)!}{k_1!k_2!k_3!} = 274.9186 $$

nice solution, specially in the first part. But $(k_1,k_2,k_3)$ is not a set, it is a triple (vector) — G Cab, Sep 27 '19 at 18:07

leonbloy · Accepted Answer · 2019-09-27T22:20:01.513

The expected time can be expressed in terms of the incomplete gamma function as follows (inspired by this paper and comments here):

In general: We want the expected value of the time to wait $T$ until one of the $3$ urns contains $n$ ($=100$) balls. Then

$$E_{n}[T] = \sum_{t=1}^\infty P(T\ge t) = \sum_{t=0}^\infty p_{n}(t) \tag1$$

where $p_{n}(t) $ is the probability that after $t$ rounds ($t$ balls) all $3$ urns have less than $n$ balls. But this is equivalent to

$$ \sum_{x=0}^{n-1}\sum_{y=0}^{n-1}\sum_{z=0}^{n-1} \frac{1}{3^{x+y+z}} \frac{(x+y+z)!}{x! \, y! \, z!} \tag2$$

Further, we use a property of the (upper) incomplete gamma function:

$$\begin{align} \left( \frac{\Gamma(n,a)}{\Gamma(n)} \right)^3 &= \left( e^{-a} \sum_{r=0}^{n-1}\frac{a^r}{r!} \right)^3 \\&= e^{-3a} \sum_{x=0}^{n-1}\sum_{y=0}^{n-1}\sum_{z=0}^{n-1}\frac{a^{x+y+z}}{x! \, y! \, z!} \tag3 \end{align}$$

Integrating and using $\int_0^\infty \exp(-3a) a^p da = p!/3^{p+1}$ we get

$$ \int_0^\infty \left( \frac{\Gamma(n,a)}{\Gamma(n)} \right)^3 da= \sum_{x=0}^{n-1}\sum_{y=0}^{n-1}\sum_{z=0}^{n-1} \frac{1}{3^{x+y+z+1}} \frac{(x+y+z)!}{x! \, y! \, z!} \tag4$$

and finally

$$E_{n}[T] = 3 \int_0^\infty \left( \frac{\Gamma(n,a)}{\Gamma(n)} \right)^3 da \tag5$$

Balls are placed into 3 urns. Expected time until some urn has 100 balls.

2 Answers2

Computation of the distribution

simulation

computation using beta function

multinomial distribution

Expression in terms of regularized incomplete beta function

Computation of the expectation value

Linked