
I am trying to solve the following question:

Suppose $k$ red balls and $l$ black balls are placed uniformly at random in $n$ boxes, where $n$ is much larger than $k$ and $l$. What is the probability that at least one red ball ends up in the same box as a black ball?

My approach was to model the number of red and black balls in box $j \in [n]$ as random variables $R_j$ and $B_j$ respectively, and by the description assume that they are binomially distributed. Then I defined the event

$$ A = \{\omega \in \Omega | \exists j \in [n]: R_j \geq 1, B_j \geq 1 \} = \bigcup_{j=1}^n \{R_j \geq 1, B_j \geq 1 \} $$

and attempted to calculate the probability of $A$ with

$$ \mathbb{P}(A) = \mathbb{P}(\bigcup_{j=1}^n \{R_j \geq 1, B_j \geq 1 \}) $$

I also tried applying De Morgan's laws, i.e.

$$ \mathbb{P}(A) = 1 - \mathbb{P}(\bigcap_{j=1}^n (\{R_j = 0 \} \cup\{B_j = 0 \})) $$

which I couldn't resolve. I tried several ways to make this probability manageable, but I always ran into the same wall: the events in the union are not disjoint, and the random variables are not independent across $j$.

Perhaps the entire modelling approach was flawed. How could I solve this?

lpnorm

2 Answers


For $~x \in \{1,2,\cdots,k\},~$ let $~f(x)~$ denote the probability that the $~k~$ red balls are placed in exactly $~x~$ boxes. That is, $~x~$ boxes each have at least one red ball in them, and none of the other $~(n-x)~$ boxes have a red ball.

Similarly, for $~y \in \{1,2,\cdots,l\},~$ let $~g(y)~$ denote the probability that the $~l~$ black balls are placed in exactly $~y~$ boxes.

Then, the complementary probability that none of the boxes with a red ball intersect any of the boxes with a black ball is

$$\sum_{x=1}^k ~\left[ ~\sum_{y=1}^l ~\left( ~f(x) \times g(y) \times \frac{\binom{n-y}{x}}{\binom{n}{x}} ~\right) ~\right]. \tag1 $$

So, the problem reduces to computing $~f(x)~$ and $~g(y).$
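As a quick sanity check of (1), consider the smallest case $~k = l = 1.~$ A single ball always occupies exactly one box, so $~f(1) = g(1) = 1,~$ and the double sum collapses to one term:

$$f(1) \times g(1) \times \frac{\binom{n-1}{1}}{\binom{n}{1}} = \frac{n-1}{n}.$$

The probability that the red ball shares a box with the black ball is therefore $~1 - \frac{n-1}{n} = \frac{1}{n},~$ as expected.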


To compute $~f(x),~$ consider that of the $~n^k~$ equally likely placements of the $~k~$ red balls, you need to count $~\binom{n}{x}~$ times the number of distributions where each of boxes B-1,B-2,...B-x contains at least one red ball, and all of the other boxes contain no red balls. I would use Inclusion-Exclusion here.

See this article for an introduction to Inclusion-Exclusion. Then, see this answer for an explanation of and justification for the Inclusion-Exclusion formula.

With respect to the distribution of the red balls, let $~S~$ denote the set of distributions that do not use any boxes other than B-1,...B-x, but where some of the boxes B-1,...B-x may or may not be empty. For $~i \in \{1,2,\cdots,x\},~$ let $~S_i~$ denote the subset of $~S~$ where box B-i is empty. Then, you have that

$$f(x) = \binom{n}{x} \times \frac{|S| - |S_1 \cup S_2 \cup \cdots \cup S_x|}{n^k}, \tag2 $$

and $~g(y)~$ may be similarly computed.

Therefore, the entire problem has been reduced to computing
$~\displaystyle |S| - |S_1 \cup S_2 \cup \cdots \cup S_x|.$


$\underline{\text{Inclusion-Exclusion Intro for Problem}}$

Let $~T_0~$ denote $~|S|.~$

Let $~T_1~$ denote $~\sum_{i=1}^x |S_i|.~$

For $~r \in \{2,3,\cdots,x\},~$ let $~T_r~$ denote
$\displaystyle \sum_{1 \leq i_1 < i_2 < \cdots < i_r \leq x} |S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_r}|.$
That is, $~T_r~$ denotes the sum of $~\displaystyle \binom{x}{r}~$ terms.

Then

$$|S| - |S_1 \cup S_2 \cup \cdots \cup S_x| = \sum_{r=0}^x (-1)^{r}T_r. \tag3 $$


$\underline{\text{Computation of} ~T_0}$

Each ball has $~x~$ choices for which of Boxes B-1,B-2,...B-x it goes into.

Therefore, $~\displaystyle T_0 = x^k.$


$\underline{\text{Computation of} ~T_1}$

To compute $~|S_1|,~$ note that each ball has $~(x-1)~$ choices for which of Boxes B-2,...B-x it goes into. Therefore, $~\displaystyle |S_1| = (x-1)^k.$

Further, by considerations of symmetry, $|S_i| = |S_1| ~: ~i \in \{2,3,\cdots,x\}.$

Therefore,

$$T_1 = \binom{x}{1} (x-1)^k. \tag4 $$


$\underline{\text{Computation of} ~T_2}$

To compute $~|S_1 \cap S_2|,~$ note that each ball has $~(x-2)~$ choices for which of Boxes B-3,...B-x it goes into. Therefore, $~\displaystyle |S_1 \cap S_2| = (x-2)^k.$

Further, by considerations of symmetry, $|S_{i_1} \cap S_{i_2}| = |S_1 \cap S_2| ~: i_1,i_2 \in \{1,2,3,\cdots,x\}, i_1 < i_2.$

Therefore,

$$T_2 = \binom{x}{2} (x-2)^k. \tag5 $$


$\underline{\text{Computation of} ~T_r ~: ~r \in \{3,4,\cdots, x\}}$

To compute $~|S_1 \cap S_2 \cap \cdots \cap S_r|,~$ note that each ball has $~(x-r)~$ choices for which of Boxes B-(r+1),...B-x it goes into.

Therefore, $~\displaystyle |S_1 \cap S_2 \cap \cdots \cap S_r| = (x-r)^k.$

Further, by considerations of symmetry,
$|S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_r}| = |S_1 \cap S_2 \cap \cdots \cap S_r|$
where $\displaystyle i_1,i_2,\cdots,i_r \in \{1,2,3,\cdots,x\}, i_1 < i_2 < \cdots < i_r.$

Therefore,

$$T_r = \binom{x}{r} (x-r)^k. \tag6 $$
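The count $~|S| - |S_1 \cup \cdots \cup S_x|~$ built from these $~T_r~$ terms can be checked numerically. The following is a minimal Python sketch, not part of the original derivation: the function names `count_onto` and `count_onto_brute` are made up for illustration, and the alternating sum is taken as $~T_0 - T_1 + T_2 - \cdots.$

```python
from itertools import product
from math import comb


def count_onto(k: int, x: int) -> int:
    """Placements of k distinguishable balls into x labelled boxes
    with no box left empty, computed as sum_r (-1)^r * C(x, r) * (x - r)^k."""
    return sum((-1) ** r * comb(x, r) * (x - r) ** k for r in range(x + 1))


def count_onto_brute(k: int, x: int) -> int:
    """Same count by enumerating all x^k placements (small inputs only)."""
    return sum(
        1
        for placement in product(range(x), repeat=k)
        if len(set(placement)) == x  # every one of the x boxes is used
    )


if __name__ == "__main__":
    for k in range(1, 6):
        for x in range(1, k + 1):
            print(k, x, count_onto(k, x), count_onto_brute(k, x))
```

For small $~k~$ and $~x~$ the two functions should agree, which is a cheap way to catch a sign error in the alternating sum.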


$\underline{\text{Final Summary}}$

The complementary probability that there is no intersection between the boxes containing red balls and the boxes containing black balls is

$$\sum_{x=1}^k ~\left[ ~\sum_{y=1}^l ~\left( ~f(x) \times g(y) \times \frac{\binom{n-y}{x}}{\binom{n}{x}} ~\right) ~\right]. $$

$$f(x) = \binom{n}{x} \times \frac{|S| - |S_1 \cup S_2 \cup \cdots \cup S_x|}{n^k}. $$

$$|S| - |S_1 \cup S_2 \cup \cdots \cup S_x| = \sum_{r=0}^x (-1)^{r}T_r.$$

$$T_r = \binom{x}{r} \times (x-r)^{k} ~: r \in \{0,1,2,\cdots,x\}.$$

Computation of $~g(y)~$ is similar to the computation of $~f(x).$
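The summary can be turned into a short exact computation. What follows is only a sketch under stated assumptions, not part of the original answer: the names `onto`, `occupancy`, `prob_no_shared_box` and `brute_force` are hypothetical, the alternating sum starts with $~+T_0,~$ and the ranges of $~x~$ and $~y~$ are capped at $~n~$ so that $~\binom{n}{x}~$ is never zero.

```python
from fractions import Fraction
from itertools import product
from math import comb


def onto(k: int, x: int) -> int:
    """|S| - |S_1 ∪ ... ∪ S_x|: k balls into x boxes, none empty."""
    return sum((-1) ** r * comb(x, r) * (x - r) ** k for r in range(x + 1))


def occupancy(k: int, x: int, n: int) -> Fraction:
    """f(x): probability that k balls in n boxes occupy exactly x boxes."""
    return Fraction(comb(n, x) * onto(k, x), n ** k)


def prob_no_shared_box(k: int, l: int, n: int) -> Fraction:
    """Complementary probability: no box holds both a red and a black ball."""
    total = Fraction(0)
    for x in range(1, min(k, n) + 1):
        for y in range(1, min(l, n) + 1):
            disjoint = Fraction(comb(n - y, x), comb(n, x))
            total += occupancy(k, x, n) * occupancy(l, y, n) * disjoint
    return total


def brute_force(k: int, l: int, n: int) -> Fraction:
    """Enumerate all n^(k+l) placements (tiny inputs only)."""
    good = 0
    for placement in product(range(n), repeat=k + l):
        red, black = set(placement[:k]), set(placement[k:])
        if not red & black:
            good += 1
    return Fraction(good, n ** (k + l))


if __name__ == "__main__":
    k, l, n = 3, 2, 4
    print(1 - prob_no_shared_box(k, l, n), 1 - brute_force(k, l, n))
```

Using `Fraction` keeps the arithmetic exact, so the formula and the enumeration can be compared with `==` rather than with a floating-point tolerance.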

user2661923
  • Thank you for this detailed answer, I must admit that I struggle a bit to understand it, but what is already a huge relief is that it is apparent that the problem is not easy to solve. I found it in a book where it was left to the reader and I assumed there was a straightforward solution I was just not seeing. – lpnorm Oct 06 '23 at 10:56

$\newcommand{\stirling}[2]{\genfrac\{\}{0pt}{}{#1}{#2}} \newcommand{\mydef}[2]{\genfrac||{0pt}{}{#1}{#2}}$ I would like to give a different approach here, which is somewhat simpler than the one in the previous answer. It starts exactly as the previous answer does, by finding the number of ways to place $k$ (distinguishable) balls in exactly $m$ (distinguishable) boxes, so that each of these $m$ boxes contains at least one ball.

This is a very well-known problem and the solution is $$ \sum_{i\ge0}(-1)^i\binom mi(m-i)^k=m!\stirling km, $$ where $\stirling \bullet\bullet$ means the Stirling number of the second kind.
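For instance, with $k=3$ balls and $m=2$ boxes the alternating sum gives

$$ \sum_{i\ge0}(-1)^i\binom 2i(2-i)^3 = 2^3 - 2\cdot 1^3 + 0 = 6 = 2!\stirling 32, $$

which agrees with $\stirling 32 = 3$: there are three ways to split three balls into two non-empty groups, and $2!$ ways to assign the groups to the two boxes.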

With this at hand the complementary probability in question is $$ \frac1{n^{k+l}}\sum_{i\ge1}\binom ni i!\stirling ki\, (n-i)^l =\frac{n!}{n^{k+l}}\sum_{i=1}^{n-1}\stirling ki\,\frac{(n-i)^l}{(n-i)!}, $$ where $\binom ni$, $i!\stirling ki$, $(n-i)^l$ and $n^{k+l}$ stand for the number of ways to choose the $i$ boxes that contain at least one red ball, to place the $k$ red balls in these $i$ boxes so that none of them is empty, to place the $l$ black balls in the remaining $n-i$ boxes, and for the overall number of ways to distribute the $k+l$ balls in the boxes, respectively. Observe that $\stirling ki=0$ for $i>k$.
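A minimal Python sketch of this formula follows; it is an illustration rather than part of the answer. The helper names `stirling2` and `prob_no_shared_box_stirling` are made up, and the Stirling numbers are computed with the standard recurrence $\stirling ki = i\stirling{k-1}i + \stirling{k-1}{i-1}$, which is an implementation choice not discussed above.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def stirling2(k: int, i: int) -> int:
    """Stirling number of the second kind: partitions of k items into i blocks."""
    if k == i:
        return 1
    if i == 0 or i > k:
        return 0
    return i * stirling2(k - 1, i) + stirling2(k - 1, i - 1)


def prob_no_shared_box_stirling(k: int, l: int, n: int) -> Fraction:
    """Probability that no box holds both a red and a black ball."""
    ways = sum(
        comb(n, i)               # choose the i boxes holding the red balls
        * factorial(i) * stirling2(k, i)  # fill them, none left empty
        * (n - i) ** l           # black balls go in the other n - i boxes
        for i in range(1, min(k, n) + 1)
    )
    return Fraction(ways, n ** (k + l))


if __name__ == "__main__":
    k, l, n = 3, 2, 100
    print(float(1 - prob_no_shared_box_stirling(k, l, n)))
```

A useful check is the case $l=1$: the single black ball is avoided by each red ball independently, so the result should equal $\left(\frac{n-1}{n}\right)^k$.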

user