Probability: the average times to make all the balls the same color

Question

Suppose a bag contains n balls, each of a different color. In one loop, one takes two balls out of the bag in sequence and replaces them with two balls that both have the color of the first ball. Q: On average, how many loops does it take until all the balls have the same color?

Nice but (for me) difficult problem! From where did you get this question? I can find solutions for $n=2,3$ and maybe $n=4$ but see no pattern that allows me to generalize in a fashionable way. — drhab, Aug 27 '14 at 08:12
@drhab Thanks for your interest. I got this question from a math forum. I verified that the solution is (n-1)^2 for n<=4, but have made no further progress. — Peter, Aug 27 '14 at 08:47
@Travis Thanks for your method; it seems to work. I will verify that. — Peter, Aug 27 '14 at 08:48
@Peter Could you please cite the origin of the problem? Thank you. — Sasha, Aug 28 '14 at 13:35
@Sasha I found this problem on a Chinese forum, where the author said that he got the problem from someone named '郝酒'. The link is http://www.newsmth.net/nForum/#!article/IQDoor/162521 — Peter, Aug 29 '14 at 03:44
@Sasha I'm '郝酒' and posted this problem on fxkz, a Chinese science forum. A friend shared this problem with me. — yibotg, Aug 29 '14 at 14:29

Sasha · Accepted Answer · 2014-08-28T13:34:22.110

Very interesting problem. What follows is a write-up of a computational experiment to explore it.

As has already been pointed out in the comments, the dynamics of the urn can be described by a Markov chain on the integer partitions of $n$. Let $\{m_1, \ldots, m_k \}$ be such a partition, where $m_i$ is the number of balls of color $i$. Suppose $i$ and $j$, with $1 \leqslant i,j\leqslant k$, are the colors of the first and second ball drawn in a loop. If $i=j$, which happens with probability $\frac{m_i}{n} \frac{m_i-1}{n-1}$, the urn remains in the same state. Otherwise, with probability $\frac{m_i}{n} \frac{m_j}{n-1}$, it transitions to a new partition with $m_i$ increased by 1 and $m_j$ decreased by 1.
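As a small worked instance of these rules (a hand check, not part of the computation that follows), take $n=3$. From $\{1,1,1\}$ every draw has $i \neq j$, so the urn moves to $\{2,1\}$ with probability 1. From $\{2,1\}$ the three possible draws give
$$ \underbrace{\tfrac{2}{3}\cdot\tfrac{1}{2}}_{\text{same color twice: stay}} = \tfrac{1}{3}, \qquad \underbrace{\tfrac{1}{3}\cdot\tfrac{2}{2}}_{\text{singleton first: } \{2,1\} \text{ again}} = \tfrac{1}{3}, \qquad \underbrace{\tfrac{2}{3}\cdot\tfrac{1}{2}}_{\text{singleton second: } \{3\}} = \tfrac{1}{3}. $$
So each loop spent in $\{2,1\}$ ends the process with probability $\frac{1}{3}$, which takes 3 loops on average, and the expected total is $1 + 3 = 4$ loops, in agreement with the value $(n-1)^2$ mentioned in the comments.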

Here is Mathematica code that constructs such a finite Markov process:

computeProbabilities[p : {n_}, as_] := {{as[p], as[p]} -> 1}
computeProbabilities[p_List, as_] := 
Module[{n = Length[p], tot = Total[p], bag, pnew},
  bag = {};
  Do[
   If[i == j,
    If[p[[i]] > 1, 
       AppendTo[bag, {as[p], as[p]} -> p[[i]]/tot (p[[i]]-1)/(tot-1)]
    ],
    pnew = DeleteCases[p + UnitVector[n, i] - UnitVector[n, j], 0]; 
    pnew = Sort[pnew, Greater];
    AppendTo[bag, {as[p], as[pnew]} -> p[[i]] p[[j]]/tot/(tot - 1)]
   ], {i, n}, {j, n}];
  Normal[Total /@ GroupBy[bag, First -> Last]]
]

buildMarkovProcess[n_Integer?Positive] := Module[{ip, as, tm, lip},
  ip = IntegerPartitions[n]; 
  as = AssociationThread[ip, Range[lip = Length[ip]]];
  tm = SparseArray[Flatten[computeProbabilities[#, as] & /@ ip], {lip, lip}];
  DiscreteMarkovProcess[lip, tm]
]

The number of loops needed for all the balls to have the same color is the first passage time to the partition $\{n\}$, which has index 1 in this code and is the absorbing state of the Markov chain. The chain starts in the last state, lip, which is the partition $\{1, \ldots, 1\}$ of $n$ distinct colors.

StepsToSameColorDistribution[n_Integer?Positive] := 
 FirstPassageTimeDistribution[buildMarkovProcess[n], 1]

We can now ask for the mean of the number of steps $K_n$ needed to reach the absorbing state:

In[392]:= Table[{n, Mean[StepsToSameColorDistribution[n]]}, {n, 2, 12}]

Out[392]= {{2, 1}, {3, 4}, {4, 9}, {5, 16}, {6, 25}, {7, 36}, {8, 
  49}, {9, 64}, {10, 81}, {11, 100}, {12, 121}}

This conforms to the pattern $\mathbb{E}(K_n) = (n-1)^2$:

In[392]:= FindSequenceFunction[%, n]

Out[392]= (-1 + n)^2
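The exact means above can be cross-checked by simulating the urn directly, without the Markov chain on partitions. The following is only a minimal sketch: `simulateLoops` is a hypothetical helper name, the balls are labelled by color $1, \ldots, n$, and the sample size 20000 is an arbitrary choice.

```mathematica
(* Simulate one run of the urn; returns the number of loops until one color remains *)
simulateLoops[n_Integer /; n >= 2] :=
 Module[{balls = Range[n], steps = 0, first, second},
  While[Min[balls] != Max[balls],
   (* draw two distinct balls in sequence, uniformly at random *)
   {first, second} = RandomSample[Range[n], 2];
   (* the second ball takes the color of the first *)
   balls[[second]] = balls[[first]];
   steps++];
  steps]

(* Monte Carlo estimate of the mean number of loops for n = 2..6 *)
SeedRandom[1];
Table[{n, N[Mean[Table[simulateLoops[n], {20000}]]]}, {n, 2, 6}]
```

The estimates should land close to $(n-1)^2$, up to sampling noise; for $n=2$ the result is exactly 1, since a single loop always finishes.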

The intriguing feature of the $K_n$ random variable is revealed by looking at its probability generating function:

[Image: probability generating function of $K_n$; not preserved]

This reveals that $K_n - (n-1)$ can be represented as a sum of $n-2$ independent geometric random variables with specific, distinct success probabilities. Putting this into code:

toTransformedDistribution[n_Integer] := Module[{pgf, den, z, ps, fgm, xvec},
  fgm = FactorialMomentGeneratingFunction[StepsToSameColorDistribution[n], z];
  pgf = Factor[fgm];
  den = Denominator[pgf];
  If[FreeQ[den, z],
   TransformedDistribution[n - 1, Distributed[x, GeometricDistribution[1/2]]]
   ,
   ps = Part[Rest[FactorList[den]], All, 1];
   ps = Map[1 + Coefficient[#, z, 1]/Coefficient[#, z, 0] &, ps];
   xvec = Array[x, n - 2];
   TransformedDistribution[n - 1 + Total[xvec], 
      Distributed[xvec, ProductDistribution @@ Map[GeometricDistribution, ps]]]
  ]
]
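To see why the parameters can be read off the denominator, recall that under Mathematica's convention $X \sim \mathrm{Geo}(p)$ counts the failures before the first success, so $\mathbb{E}(z^X) = \frac{p}{1-(1-p)z}$. If the decomposition holds with success probabilities $p_1, \ldots, p_{n-2}$ (generic symbols used here for illustration), independence gives
$$ \mathbb{E}\left(z^{K_n}\right) = z^{n-1} \prod_{i=1}^{n-2} \frac{p_i}{1-(1-p_i) z}. $$
Each linear factor $c_0 + c_1 z$ of the denominator is then proportional to $1-(1-p_i)z$, so $1 + c_1/c_0 = p_i$, which is the expression the code maps over the factors to obtain ps.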

We check consistency:

In[424]:= 
Table[FactorialMomentGeneratingFunction[toTransformedDistribution[n], 
    z] == FactorialMomentGeneratingFunction[
    StepsToSameColorDistribution[n], z], {n, 2, 8}] // Simplify

Out[424]= {True, True, True, True, True, True, True}

And now show the decomposition. $K_3 \stackrel{d}{=} 2 + X_1$, where $X_1 \sim \mathrm{Geo}\left(\frac{1}{3}\right)$:

In[431]:= toTransformedDistribution[3]

Out[431]= TransformedDistribution[
 2 + X1, Distributed[X1, GeometricDistribution[1/3]]]

$K_4 \stackrel{d}{=} 3 + X_1 + X_2$, where $X_1 \sim \mathrm{Geo}\left(\frac{1}{2}\right)$ and $X_2 \sim \mathrm{Geo}\left(\frac{1}{6}\right)$, and $X_1$ and $X_2$ are independent:

In[432]:= toTransformedDistribution[4]

Out[432]= TransformedDistribution[
 3 + X1 + X2, {X1, X2} \[Distributed] 
  ProductDistribution[GeometricDistribution[1/2], 
   GeometricDistribution[1/6]]]

Likewise, $K_5 \stackrel{d}{=} 4 + X_1 + X_2 + X_3$, where $X_1 \sim \mathrm{Geo}\left(\frac{3}{5}\right)$, $X_2 \sim \mathrm{Geo}\left(\frac{3}{10}\right)$, $X_3 \sim \mathrm{Geo}\left(\frac{1}{10}\right)$, and $X_1$, $X_2$ and $X_3$ are independent:

In[433]:= toTransformedDistribution[5]

Out[433]= TransformedDistribution[
 4 + X1 + X2 + X3, {X1, X2, X3} \[Distributed] 
  ProductDistribution[GeometricDistribution[3/5], 
   GeometricDistribution[3/10], GeometricDistribution[1/10]]]

Of course, if anyone can offer an insight into why such a decomposition should take place, I would tip my hat to the tune of a bonus.

Added: Further experimental analysis reveals a pattern in the success probabilities of the geometric distributions in the decomposition of $K_n$, specifically
$$ K_n \stackrel{\mathrm{law}}{=} n - 1 + \sum_{i=1}^{n-2} X_i, \quad X_i \sim \mathrm{Geom}\left(\frac{i (i+1)}{n(n-1)}\right) \text{ for } 1 \leqslant i \leqslant n-2. $$
Hence, denoting $p_i = \frac{i(i+1)}{n(n-1)}$,
$$\begin{eqnarray}
\mathbb{E}\left(K_n\right) &=& n-1 + \sum_{i=1}^{n-2} \left(\frac{1-p_i}{p_i}\right) = n - 1 + \sum_{i=1}^{n-2} \left( \frac{n(n-1)}{i} - \frac{n(n-1)}{i+1} - 1 \right) \\
&\stackrel{\mathrm{telesc.}}{=}& (n-1) + \frac{n(n-1)}{1} - \frac{n(n-1)}{n-1} - (n-2) = 1 + n(n-1) - n \\
&=& \fbox{$\left(n-1\right)^2$}
\end{eqnarray}$$
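The telescoping computation can also be checked numerically with exact rational arithmetic. This is a small sketch; `successProb` and `meanFromDecomposition` are hypothetical names introduced here, not part of the code above.

```mathematica
(* Success probability of the i-th geometric summand, as in the formula above *)
successProb[i_, n_] := i (i + 1)/(n (n - 1))

(* Mean of K_n implied by the decomposition: the shift n-1 plus the geometric means *)
meanFromDecomposition[n_Integer /; n >= 2] :=
 n - 1 + Sum[(1 - successProb[i, n])/successProb[i, n], {i, 1, n - 2}]

(* Parameters for n = 5: {1/10, 3/10, 3/5}, the same values found experimentally above *)
Table[successProb[i, 5], {i, 1, 3}]

(* Compare with (n-1)^2 over the same range of n as the earlier table *)
Table[meanFromDecomposition[n] == (n - 1)^2, {n, 2, 12}]
```

Every entry of the last table should be True. For $n=2$ the sum is empty, so the mean reduces to the shift $n-1 = 1$.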

The question remains open as to why $K_n$ can be decomposed into this sum of independent geometric random variables.

A paper that solves this problem can be found at http://www.ma.huji.ac.il/hart/papers/n-colors.pdf — Peter, Apr 05 '15 at 12:46
