
Given a set $S$ with $|S| = N$, draw $k$ subsets of $S$ at random, $a_1, a_2, a_3, \ldots, a_k$, with cardinalities $n_1, n_2, n_3, \ldots, n_k$. What is the probability that the intersection of $a_1, a_2, a_3, \ldots, a_k$ has cardinality equal to (or at least) $p$, where $p$ is a parameter?

For the denominator of the probability, I count all possible selections of subsets with the given cardinalities and get:

$$\prod_{i=1}^k\binom{N}{n_i}$$

which I believe is correct. But when I try to count the selections whose intersection has cardinality at least $p$ (to put in the numerator), I can only come up with this: $\binom{N}{p}$, the number of ways to draw the intersection, times $\prod_{i=1}^k\binom{N-p}{n_i-p}$, the number of ways to draw the rest of each subset. Unfortunately this approach is not correct, because the resulting formula:

$$\frac{\binom{N}{p}\prod_{i=1}^k\binom{N-p}{n_i-p}}{\prod_{i=1}^k\binom{N}{n_i}}$$

is greater than 1 for some values of $p = 0, 1, 2, \ldots, N$. I think the reason is that when I first draw the intersection (e.g. {1,2,3}) and then draw the rest of the set (e.g. {4,5}), I get a result ({1,2,3,4,5}) that I count multiple times (e.g. again when I draw the intersection {3,4,5} and the rest {1,2}). How can I correct the numerator of the formula so that every case is counted only once?
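To make the overcounting concrete, here is a minimal Python sketch that evaluates the ratio above on tiny inputs. The function name `naive_ratio` and the example parameters are illustrative assumptions, not part of the original question. With $p$ fixed, the naive numerator counts a selection whose intersection has $q$ elements $\binom{q}{p}$ times, which is why the ratio can exceed 1.

```python
from math import comb, prod

def naive_ratio(N, sizes, p):
    """Ratio from the question: C(N,p) * prod C(N-p, n_i-p) / prod C(N, n_i)."""
    if any(n < p for n in sizes):
        return 0.0  # some subset is too small to contain p common elements
    numerator = comb(N, p) * prod(comb(N - p, n - p) for n in sizes)
    denominator = prod(comb(N, n) for n in sizes)
    return numerator / denominator

# A single subset (k = 1) equal to all of S: there is only one selection,
# but each of the 3 possible one-element "intersections" is counted separately.
print(naive_ratio(3, [3], 1))     # 3.0

# Two 2-element subsets of a 3-element set, p = 1: 12 counted out of 9 selections.
print(naive_ratio(3, [2, 2], 1))
```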

1 Answer


What you've done thus far is fine, but you need to use inclusion/exclusion in your numerator. So instead of:

$$\frac{\binom{N}{p}\prod_{i=1}^k\binom{N-p}{n_i-p}}{\prod_{i=1}^k\binom{N}{n_i}}$$ you need:

$$\frac{\displaystyle\sum_{j=p}^N (-1)^{j-p} \binom{j}{p} \binom{N}{j} \prod_{i=1}^k\binom{N-j}{n_i-j}}{\displaystyle\prod_{i=1}^k\binom{N}{n_i}}$$

with the convention that ${a \choose b}=0$ for all $b < 0$.

Edit:

DANGIT! Realize I made a mistake, but cannot fix it right now. Inclusion/exclusion is the right way to go, but there is a missing factor in the numerator. I will fix later this evening, if another less confused combinatorialist does not fix it first.

Edit:

Shoot, that's what I get for trying to do this too quickly. Apologies. For each element $i$ of the index set $S$, define the set $A_i$ to be the set of all selections such that $i$ is contained in $a_1 \cap a_2 \cap \cdots \cap a_k$. We are seeking the number of selections that are contained in exactly $p$ of these sets $A_i$.

We use a standard generalization of inclusion/exclusion (see for example joriki's excellent description) for this count. Using his notation, fulfilling a condition $A_i$ means having a selection of $a_1, a_2, \ldots, a_k$ in the set $A_i$. The number of selections fulfilling $j$ given conditions is the same regardless of which $j$ conditions are chosen: the $j$ corresponding elements must lie in every $a_i$, so the remaining $n_i - j$ elements of each $a_i$ are chosen from the other $N - j$ elements of $S$, giving $$\prod_{i=1}^k\binom{N-j}{n_i-j}$$
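As a sanity check on this count, here is a small brute-force sketch in Python. The helper name `count_containing` and the parameters $N = 5$, sizes $(2, 3, 4)$ are assumptions chosen only to keep the enumeration tiny. It counts the selections in which every $a_i$ contains a fixed $j$-element set and compares the result with the product of binomials.

```python
from itertools import combinations, product
from math import comb, prod

def count_containing(N, sizes, core):
    """Brute force: selections (a_1, ..., a_k) in which every a_i contains `core`."""
    core = set(core)
    families = [list(combinations(range(N), n)) for n in sizes]
    return sum(
        1 for selection in product(*families)
        if all(core <= set(a) for a in selection)
    )

N, sizes = 5, [2, 3, 4]  # illustrative values, small enough to enumerate
for j in range(N + 1):
    brute = count_containing(N, sizes, range(j))  # fix the elements 0..j-1
    # Convention: the binomial is 0 when n_i - j is negative.
    formula = prod(comb(N - j, n - j) if n >= j else 0 for n in sizes)
    print(j, brute, formula)
```

By symmetry, any other choice of $j$ fixed elements gives the same count, which is what lets the inclusion/exclusion sum use a single factor $\binom{N}{j}$.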

Therefore, we can use the last of joriki's formulae to give the number of ways to have an intersection size of exactly $p$ (note: The $j$ in joriki's formula is our $p$, and his $k$ is our $j$): $$\displaystyle \sum_{j=p}^N (-1)^{j-p} \binom{j}{p} \binom{N}{j} \prod_{i=1}^k\binom{N-j}{n_i-j}$$

This is the numerator of our probability.
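For anyone who wants to evaluate this numerically, here is a minimal Python sketch of the resulting probability. The names `binom` and `prob_exact` are my own, and exact rational arithmetic via `fractions.Fraction` is a design choice to avoid rounding in the alternating sum, not something the derivation requires.

```python
from fractions import Fraction
from math import comb, prod

def binom(a, b):
    """Binomial coefficient with the convention C(a, b) = 0 for b < 0."""
    return comb(a, b) if b >= 0 else 0  # math.comb raises on negative arguments

def prob_exact(N, sizes, p):
    """Probability that the common intersection has exactly p elements."""
    total = prod(comb(N, n) for n in sizes)  # denominator: all selections
    numerator = sum(
        (-1) ** (j - p) * comb(j, p) * comb(N, j)
        * prod(binom(N - j, n - j) for n in sizes)
        for j in range(p, N + 1)
    )
    return Fraction(numerator, total)

# Two 2-element subsets of a 3-element set: they coincide in 3 of 9 selections
# and otherwise share exactly one element.
print(prob_exact(3, [2, 2], 1))  # 2/3
print(prob_exact(3, [2, 2], 2))  # 1/3
print(sum(prob_exact(3, [2, 2], p) for p in range(4)))  # 1
```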

Jeremy Dover
  • I need some explanation – Przemek B Sep 25 '19 at 17:53
  • Will add to answer...just a moment. – Jeremy Dover Sep 25 '19 at 17:55
  • I have two questions. Is this the "equal" case or the "equal or greater" case? That is, is this the probability of the intersection having exactly $p$ elements or at least $p$? Is the inclusion/exclusion rule, which I don't fully understand yet, what handles the "equal" case? Could it get any simpler (with lower computational complexity than $O(n^2)$) for the "equal or greater" case? – Przemek B Sep 26 '19 at 08:09
  • This form of inclusion/exclusion does the "equal" case, i.e., intersection size is exactly $p$. I do not believe for this problem there is a simpler formula for "equal or greater", but there may be some intuition I don't see. – Jeremy Dover Sep 26 '19 at 12:26
  • Using this Python code to test the formula, I get negative probabilities (https://pastebin.com/TmpQRYFQ); for example, for N=37, ni=[21, 27, 24, 7, 31, 21], p=1 I get prob ≈ -1.73 – Przemek B Sep 27 '19 at 10:02
  • I calculated a probability of about 0.41 using Magma, which does arbitrary precision integer arithmetic. Is Python using the Gamma function to approximate the binomial coefficients? If so, the values it produces for binomials with a negative on the bottom may not be 0. – Jeremy Dover Sep 27 '19 at 15:40
  • I made an operator precedence error in my Python implementation: in the numerator I computed $-(1^{j-p})$ instead of $(-1)^{j-p}$. Now the probabilities are in the correct range and sum to 1 for the random batch I generated. I'm really grateful for your help and have already marked your answer as accepted. But could you spare a few more minutes to help me understand it? You already have my gratitude till the day I die, but I offer you endless gratitude*

    *(if afterworld exists)

    – Przemek B Sep 29 '19 at 10:23
  • I'm glad to take a bit more time, but need to know which specific part is confusing. Is it the generalized inclusion/exclusion formula? The specifics of how it is applied to this problem? – Jeremy Dover Sep 29 '19 at 21:30
  • I understand the denominator. I also understand why the cardinality of the union of two (not necessarily disjoint) sets is $|a \cup b| = |a| + |b| - |a \cap b|$. Similarly, I understand why the cardinality of the union of 3 sets is $|a \cup b \cup c| = |a| + |b| +|c| - |a \cap b| - |b \cap c| - |c \cap a| + |a \cap b \cap c|$. I also think I know how this generalizes to a greater number of sets. I still don't understand how to apply this to get our numerator. – Przemek B Sep 30 '19 at 05:46
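Two points from the comments can be sketched in a few lines of Python. First, one simple way to get the "equal or greater" case is to sum the "exactly $q$" counts for $q = p, \ldots, N$; this is only a sketch of that summation, not a claim that no shorter formula exists. Second, the sign factor needs explicit parentheses, because Python parses `-1 ** m` as `-(1 ** m)`. The names `exact_count` and `prob_at_least` are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, prod

def exact_count(N, sizes, p):
    """Selections whose common intersection has exactly p elements."""
    return sum(
        (-1) ** (j - p)  # parentheses matter: -1 ** m would mean -(1 ** m)
        * comb(j, p) * comb(N, j)
        * prod(comb(N - j, n - j) if n >= j else 0 for n in sizes)
        for j in range(p, N + 1)
    )

def prob_at_least(N, sizes, p):
    """Probability that the intersection has at least p elements (sum of exact cases)."""
    total = prod(comb(N, n) for n in sizes)
    favourable = sum(exact_count(N, sizes, q) for q in range(p, N + 1))
    return Fraction(favourable, total)

print(-1 ** 2, (-1) ** 2)           # -1 1
print(prob_at_least(3, [2, 2], 1))  # 1
print(prob_at_least(3, [2, 2], 2))  # 1/3
```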