I have two arrays $\vec{x}$ and $\vec{y}$, both of length $N$. They are binary (filled with 1's and 0's). We know that
$\sum_i x_i = N_x$ and
$\sum_i y_i = N_y$
Let $perm(\vec{x})$ denote a random permutation of the elements of an array. Thus define
$\vec{x}' = perm(\vec{x})$ and
$\vec{y}' = perm(\vec{y})$
I am looking for an analytic expression for the distribution $P[C = c]$ of the number of positions at which both permuted arrays hold a 1, namely
$C = \sum_i x_i' y_i'$
If the exact expression does not have a closed form, a good approximation would also be helpful.
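For reference, the definition of $C$ above can be checked numerically. The following is a minimal Monte Carlo sketch, not an analytic answer: it shuffles both arrays independently and tallies the overlap. The function name `simulate_overlap`, the parameter values, and the trial count are illustrative assumptions.

```python
import random
from collections import Counter


def simulate_overlap(n, n_x, n_y, trials=20000, seed=0):
    """Estimate P[C = c] by shuffling two binary arrays independently.

    n, n_x, n_y correspond to N, N_x, N_y in the question; `trials` and
    `seed` are illustrative choices, not part of the problem.
    """
    rng = random.Random(seed)
    x = [1] * n_x + [0] * (n - n_x)
    y = [1] * n_y + [0] * (n - n_y)
    counts = Counter()
    for _ in range(trials):
        rng.shuffle(x)  # x' = perm(x)
        rng.shuffle(y)  # y' = perm(y)
        c = sum(a * b for a, b in zip(x, y))  # C = sum_i x'_i y'_i
        counts[c] += 1
    # Empirical probabilities, keyed by the observed overlap count c
    return {c: k / trials for c, k in sorted(counts.items())}


pmf = simulate_overlap(n=20, n_x=8, n_y=5)
for c, prob in pmf.items():
    print(c, round(prob, 4))
```

Any candidate closed form can be compared against these empirical frequencies before it is used for the hypothesis test.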
This problem comes from optics. I have two multichannel recordings, one before and one after I do something. I want to test whether the number of channels co-active in both situations can be explained by the null hypothesis that the set of channels active at any moment in time is completely random.
My Attempt No 1:
The problem can be reformulated as follows: Assume there are two urns:
- Urn $X$ has $N_x$ white and $N-N_x$ black balls
- Urn $Y$ has $N_y$ white and $N-N_y$ black balls.
We draw one ball from each urn without replacement and check whether both balls are white. We repeat this until all balls are drawn. We are interested in the probability that we draw a pair of white balls exactly $c$ times.
Now, if we relax the problem and allow draws with replacement, it is easy to see that $C \sim Bin(N, p)$, i.e. $P[C=c] = \binom{N}{c} p^c (1-p)^{N-c}$, with $p=\frac{N_x}{N} \cdot \frac{N_y}{N}$. Since the original problem requires us to draw without replacement, it seems that the answer might be some form of hypergeometric distribution. However, the standard hypergeometric distribution deals with only one urn; I need an extension that matches draws from two urns.
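The with-replacement relaxation is easy to evaluate directly. Below is a small sketch of that binomial approximation only; the function name `binomial_relaxation_pmf` and the example values are assumptions for illustration.

```python
from math import comb


def binomial_relaxation_pmf(n, n_x, n_y):
    """PMF of the with-replacement relaxation: C ~ Bin(n, p).

    p = (n_x / n) * (n_y / n) is the chance that a single paired draw
    yields two white balls when both urns are sampled with replacement.
    """
    p = (n_x / n) * (n_y / n)
    return [comb(n, c) * p**c * (1 - p) ** (n - c) for c in range(n + 1)]


approx = binomial_relaxation_pmf(n=20, n_x=8, n_y=5)
approx_mean = sum(c * q for c, q in enumerate(approx))
print(round(approx_mean, 4))  # mean of the relaxation, n * p
```

Note that the relaxation assigns nonzero probability to $c > \min(N_x, N_y)$, which cannot occur when drawing without replacement, so it is only a rough guide in the tails.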
\begin{align}
p(N,N_x,N_y,c) =\hphantom{+}&\frac1{N_x}\frac1{N_y}p(N-1,N_x-1,N_y-1,c-1) \\
+&\frac1{N-N_x}\frac1{N_y}p(N-1,N_x,N_y-1,c) \\
+&\frac1{N_x}\frac1{N-N_y}p(N-1,N_x-1,N_y,c) \\
+&\frac1{N-N_x}\frac1{N-N_y}p(N-1,N_x,N_y,c)
\end{align}
– Fimpellizzeri Jan 10 '20 at 15:21