
I have $k$ distinct balls split into two sets $A$ and $B$, where $A$ has $k_1$ balls, $B$ has $k_2$ balls, and $k_1, k_2 \geq 1$. I have a random label machine which prints numbers from $\{1, 2, \dots, n\}$ ($n > k$) with uniform probability. I label all the balls with this machine. What is the probability that no label appears in both $A$ and $B$, i.e. that every label appears in at most one of the two sets? I have a lower bound, but I'm not very satisfied with it, and I would like to know how the actual probability should be calculated.

I came up with the following lower bound: first label all the balls in $A$. This uses up at most $k_1$ distinct labels. The remaining $k_2$ balls then have to avoid those labels, so the probability is at least $\left(\frac{n-k_1}{n}\right)^{k_2}$. My main gripe with this lower bound is that it's not symmetric: setting $k_1=1, k_2=k-1$ should give the same probability as setting $k_1=k-1, k_2=1$, but this function gives different values. I also have no idea how to calculate the actual probability, since following the same process means keeping track of how many distinct labels were used by the first $k_1$ balls.
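To make the asymmetry concrete, here is a small sketch that evaluates the bound for both orderings. The function name `lower_bound` and the values $n = 10$, $k = 4$ are only illustrative choices, not part of the problem statement.

```python
from fractions import Fraction

def lower_bound(n, k1, k2):
    """Bound from the question, ((n - k1) / n) ** k2, as an exact fraction.

    The name `lower_bound` is hypothetical, chosen for this illustration.
    """
    return Fraction(n - k1, n) ** k2

n = 10  # illustrative value, any n > k1 + k2 works
print(lower_bound(n, 1, 3))  # 729/1000
print(lower_bound(n, 3, 1))  # 7/10
```

Both values bound the same probability from below, so the larger of the two is also a valid (and symmetric) lower bound, although it is still not the exact answer in general.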

1 Answer


Define $X$ to be the number of labels in $\{1,2,\dots ,n\}$ that do not appear on any of the $k_1$ balls in set $A$. We want to find the distribution of $X$. For that purpose, we will use a generalized inclusion-exclusion principle.

Let's say a set of balls has "property $i$" if there is no ball labeled $i$ in the set, for $1 \le i \le n$. Define $S_j$ to be the sum, over all choices of $j$ of the $n$ properties, of the probability that the set has (at least) those $j$ properties, i.e. that those $j$ numbers are all missing among the balls' labels. Then for set $A$, $$S_j = \binom{n}{j} \left( \frac{n-j}{n} \right)^{k_1}$$ since there are $\binom{n}{j}$ ways to choose the $j$ numbers and each of the $k_1$ balls avoids all of them with probability $(n-j)/n$. The generalized inclusion-exclusion principle states that the probability of having exactly $m$ of the properties, i.e. the probability that exactly $m$ numbers are missing among the labels on the balls in set $A$, is $$P(X=m) = \sum_{i=0}^{n-m} (-1)^i \binom{m+i}{i} S_{m+i}$$ for $0 \le m \le n$.
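As a sanity check on the formula for $P(X=m)$, here is a minimal sketch that evaluates it with exact fractions. The helper names `S` and `missing_distribution` are made up for this illustration, and the small case $n=3$, $k_1=2$ is chosen so the result can be verified by hand.

```python
from fractions import Fraction
from math import comb

def S(n, k1, j):
    """S_j = C(n, j) * ((n - j) / n) ** k1 for set A (hypothetical helper)."""
    return comb(n, j) * Fraction(n - j, n) ** k1

def missing_distribution(n, k1):
    """Return [P(X=0), ..., P(X=n)] via generalized inclusion-exclusion."""
    return [
        sum((-1) ** i * comb(m + i, i) * S(n, k1, m + i)
            for i in range(n - m + 1))
        for m in range(n + 1)
    ]

dist = missing_distribution(3, 2)
print([str(p) for p in dist])  # ['0', '2/3', '1/3', '0']
```

With $n=3$ and two balls, the labels either coincide (probability $1/3$, two numbers missing) or differ (probability $2/3$, one number missing), which matches the printed distribution.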

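Finally, given $X = m$, each of the $k_2$ balls in set $B$ independently receives one of the $m$ missing numbers with probability $m/n$. So the probability that all of the numbers on the labels on the $k_2$ balls in set $B$ are among the numbers missing from set $A$ is $$\sum_{m=1}^n P(X=m) \left( \frac{m}{n} \right)^{k_2}$$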
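The final sum can be checked numerically. The sketch below evaluates it with exact fractions and compares it against a brute-force enumeration of all labelings for a small case. The function names `disjoint_probability` and `brute_force` are hypothetical, and the brute force is only feasible for small $n$ and $k$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def disjoint_probability(n, k1, k2):
    """Sum over m of P(X=m) * (m/n)**k2, with P(X=m) from inclusion-exclusion."""
    total = Fraction(0)
    for m in range(1, n + 1):
        # P(X = m): exactly m labels are missing from the k1 balls in A.
        p_m = sum(
            (-1) ** i * comb(m + i, i) * comb(n, m + i)
            * Fraction(n - m - i, n) ** k1
            for i in range(n - m + 1)
        )
        total += p_m * Fraction(m, n) ** k2
    return total

def brute_force(n, k1, k2):
    """Enumerate all n**(k1+k2) labelings and count those with no shared label."""
    labels = range(1, n + 1)
    good = sum(
        1
        for lab in product(labels, repeat=k1 + k2)
        if not set(lab[:k1]) & set(lab[k1:])
    )
    return Fraction(good, n ** (k1 + k2))

print(disjoint_probability(4, 2, 1))  # 9/16
print(disjoint_probability(4, 1, 2))  # 9/16
print(brute_force(4, 2, 1))           # 9/16
```

Unlike the lower bound in the question, the result is the same when $k_1$ and $k_2$ are swapped, as expected.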

You can find proofs and references for the generalized inclusion-exclusion principle in the answers to this question: Generalized inclusion-exclusion principle. One of the references listed there is section IV.3, "The Realization of m Among N Events", in An Introduction to Probability Theory and Its Applications, Volume I, Third Edition by William Feller, p. 106.

awkward