3

Assume we have m groups of n balls and the balls in the same group have the same color. So there are m*n balls in total. Now, suppose we randomly choose k>(2*n) balls from the set of m*n balls. How much is the probability that the chosen k balls contain all balls of at least two different colors (entirely all balls of two groups)?

In other words, the unchosen set of balls contains the balls of different m-2 colors at most (instead of m colors).

To grasp better, notice the picture of 3*4 ( = n*m) balls. Each group of 3 balls has the same color. The probability I am looking for is to choose k balls containing the balls of two entire groups. For example, choosing balls 1, 5, 9, 3, 7, 11, 8 (contains all yellow and blue balls).

4 groups of 3 balls with the same colors: n = 3, m = 4

I hope I could explain the problem clearly. I have implemented a simulator for testing different scenarios. Then I tested the simulated results with different combinatorial/binomial solutions. But I get different results every time and now I'm lost.

This is my simulator in python testing different choices many times:

from random import sample
from collections import Counter
m = 4
n = 3
k = 5
it = 100000
balls = range(m*n)
cf = 0
for i in range(it):
    choices = sample(balls, k)
    samecolors = map(lambda x:x%m, choices)
    cnt = Counter(samecolors)
    mc = cnt.most_common(2)
    if (mc[-1][-1] == n): // if the second most common chosen color has *n* balls
        cf += 1
print(float(cf)/float(it))

2 Answers2

3

The overall number of ways to choose $k$ out of $mn$ balls is $\binom{mn}{k}$, where all balls are assumed to be distinguishable. Among them naively there are $$\binom mr\binom{mn-rn}{k-rn}$$ combinations consisting of at least $r$ full sets of balls of the same color. However if $k\ge(r+1)n$ the above expression will double-count all combinations consisting of more than $r$ full sets, which should be accounted for. The correct way for this is the generalised inclusion-exclusion principle: $$ \nu_r=\sum_{j\ge r}(-1)^{j-r}\binom jr\binom mj\binom{mn-jn}{k-jn}, $$ which gives the number of combinations with exactly $r$ full sets.

To obtain the number of combinations with at least $r$ full sets one should sum the above expressions: $$\begin{align} N_r=\sum_{i\ge r}\nu_i&=\sum_{i\ge r}\sum_{j\ge i}(-1)^{j-i}\binom ji\binom mj\binom{mn-jn}{k-jn}\\ &=\sum_{j\ge i}(-1)^j\binom mj\binom{mn-jn}{k-jn}\sum_{i\ge r}(-1)^{i}\binom ji\\ &=\sum_{j\ge r}(-1)^{j-r}\binom{j-1}{r-1}\binom mj\binom{mn-jn}{k-jn}. \end{align}$$

Thus, the probability in question reads (with $r=2$): $$ p_r=\frac {\sum_{j\ge r}(-1)^{j-r}\binom{j-1}{r-1}\binom mj\binom{mn-jn}{k-jn}}{\binom{mn}k}. $$

user
  • 26,272
  • Thanks for your answer! It matches perfectly with the simulation results! – Gongotar May 15 '20 at 05:55
  • @masoodgholami You're welcome! – user May 15 '20 at 07:08
  • I posted a new question with regard to this one (a little bit more general) here: https://math.stackexchange.com/questions/3675906/nnm-distinguishable-balls-with-m-different-colors I would be happy if you could take a look at the question. thanks – Gongotar May 15 '20 at 09:18
1

Using multinomial theorem, the total number of cases are the number of solutions of the equation $$x_1 + x_2 + x_3 + ...+ x_m = k, x_i \in \{0, 1, ... n\}$$ The number of solutions is coefficient of $x^k$ in $(1-x^{n+1})^m \times (1-x)^{-m}$. This will have to be evaluated depending on the value of $k$. Let this number be $A$

The number of required cases are the number of solutions of the equation $$y_1 + y_2 + ... + y_{m-2} = k-2n$$ Note that we have 2 complete sets of $n$ elements selected, so we are finding the number of ways to select the remaining $k-2n$ elements. The number of solutions is coefficient of $x^{k-2n}$ in $(1-x^{n+1})^{m-2} \times (1-x)^{-(m-2)}$. Let this number be $B$.

The net probability is $$P = \frac{B}{A}$$

$B$ and $A$ would have to be calculated based on a numeric value of $k$.

Aniruddha Deb
  • 4,345
  • 11
  • 30
  • I will test this in my simulator, however, I should first find a function which calculates B and A. I will write the results thereafter. In the meantime, could you please explain your approach a bit more? how did you come up with the equations of A and B? – Gongotar May 14 '20 at 17:32
  • "Note that we have 2 complete sets of n elements selected." Are you also counting the ways to select these 2 sets? That is, should the $B$ be multiplied by $\binom{m}{2}$ or am I missing something? – Vepir May 14 '20 at 20:16
  • Note that the number of solutions to $x_1+\dots+x_m=k,x_i\in{0,1,\dots,n}$ is given by: (See extended stars-and-bars problem(where the upper limit of the variable is bounded) answers.)

    $$ \sum_{q=\lceil\frac{k+m}{n+1}\rceil}^m (-1)^{m-q}\binom {m}{q}\binom{q(n+1)-1-k}{m-1} $$

    – Vepir May 14 '20 at 20:27
  • I wrote the eq(k,m,n) function that produces the result of your suggested equation (from the link), then by producing ( eq(k,m,n-1) + m*eq(k-n,m-1,n-1) ) / eq(k,m,n), I intended to get the probability of having no full colors at all in the choices + having one full color in the choices, then by dividing the result to the total number of choices I could have got the probability of zero or one full-colors in the choices, the complement of this number would have delivered me the needed probability in the problem. However, the results don't match the simulations. Am I doing something wrong? – Gongotar May 14 '20 at 21:22
  • To answer you question, yes the counting ways would be important and thus the result should be multiplied by (m 2). – Gongotar May 14 '20 at 21:22