12

Define $$S_1 = \sum_{i=1}^n P(A_i), \qquad S_2 =\sum_{1 \le i < j \le n} P(A_i \cap A_j),$$ and in general $$S_k =\sum_{1 \le i_1 < \cdots < i_k \le n} P(A_{i_1} \cap \cdots \cap A_{i_k}).$$ Then for odd $k$ in $\{1,\ldots,n\}$, $$P\left(\bigcup_{i=1}^n A_i\right) \le \sum_{j=1}^{k}(-1)^{j-1} S_j,$$ and for even $k$ in $\{2,\ldots,n\}$, $$P\left(\bigcup_{i=1}^n A_i\right) \ge \sum_{j=1}^{k}(-1)^{j-1}S_j.$$
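As a sanity check (not part of the original question), the bounds can be verified by brute force on a small finite sample space with the uniform measure; the events below are arbitrary illustrative choices:

```python
from itertools import combinations

# Small finite sample space with uniform measure; events chosen arbitrarily.
omega = range(10)
P = lambda E: len(E) / len(omega)
A = [{0, 1, 2, 3}, {2, 3, 4, 5}, {4, 5, 6, 7}, {1, 3, 5, 7, 9}]
n = len(A)

union = set().union(*A)

def S(j):
    """S_j = sum of P(A_{i1} ∩ ... ∩ A_{ij}) over all j-subsets of indices."""
    return sum(P(set.intersection(*[A[i] for i in idx]))
               for idx in combinations(range(n), j))

for k in range(1, n + 1):
    partial = sum((-1) ** (j - 1) * S(j) for j in range(1, k + 1))
    if k % 2 == 1:                       # odd k: partial sum is an upper bound
        assert P(union) <= partial + 1e-12
    else:                                # even k: partial sum is a lower bound
        assert P(union) >= partial - 1e-12
print("All Bonferroni bounds hold; P(union) =", P(union))
```

At $k=n$ the partial sum equals $P\left(\bigcup_i A_i\right)$ exactly, which is the inclusion-exclusion principle.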

More details on the Bonferroni inequalities and Boole's inequality can be found here.

  • 2
    isn't this the inclusion-exclusion principle? With an odd number of terms you overcount the probability; with an even number of terms you undercount. – cactus314 Oct 07 '12 at 15:24

4 Answers

6

A proof is there. The main idea is that this is the integrated version of analogous pointwise inequalities and that, for every $k$, $$ S_k=\mathbb E\left({T\choose k}\right),\qquad T=\sum_{i=1}^n\mathbf 1_{A_i}. $$ Hence the result follows from the stronger inequalities asserting that, for every positive integer $N$, $$ \sum_{i=0}^k(-1)^ia_i,\qquad a_i={N\choose i}, $$ is nonnegative when $k$ is even and nonpositive when $k$ is odd. In turn, this fact follows from the properties that the sequence $(a_i)_{0\leqslant i\leqslant N}$ is unimodal and that $\sum\limits_{i=0}^N(-1)^ia_i=0$.
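The key identity $S_k=\mathbb E{T\choose k}$ can be checked numerically on a small example (the events below are arbitrary, not taken from the answer):

```python
from itertools import combinations
from math import comb

# Arbitrary events over an 8-point sample space with uniform measure.
omega = range(8)
A = [{0, 1, 2}, {1, 2, 3, 4}, {2, 4, 5}, {0, 2, 6}]
n = len(A)

# T(w) = number of events A_i containing the point w
T = {w: sum(w in Ai for Ai in A) for w in omega}

for k in range(1, n + 1):
    # S_k computed directly from the definition
    S_k = sum(len(set.intersection(*[A[i] for i in idx])) / len(omega)
              for idx in combinations(range(n), k))
    # E[C(T, k)] under the uniform measure
    E_binom = sum(comb(T[w], k) for w in omega) / len(omega)
    assert abs(S_k - E_binom) < 1e-12
print("S_k = E[C(T, k)] verified for k = 1 ..", n)
```

This is the pointwise statement in the answer: the number of $k$-subsets of indices whose intersection contains $\omega$ is exactly ${T(\omega)\choose k}$.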

Did
  • 279,727
5

The Bonferroni inequalities are closely related to partial sums of alternating binomial coefficients.


Let's consider an element $w$ in the sample space and literally count it on the left-hand side and right-hand side of the inequality. If $w$ belongs to none of $A_1$ to $A_n$, then it is not counted in $\bigcup_{i=1}^n A_i$, and it's not counted in any $A_i$, any $A_i\cap A_j$, ..., or any $A_{i_1}\cap A_{i_2}\cap\cdots\cap A_{i_k}$.


If $w$, however, is contained in $r$ sets from $\{A_1, A_2, \ldots, A_n\}$, let's just say $w$ lies in $A_1, \ldots , A_r$. Then $w$ is counted exactly once in $\bigcup_{i=1}^n A_i$ (LHS), and counted ${r\choose 1}-{r\choose 2}+ \ldots +(-1)^{k-1}{r\choose k}$ times on the right-hand side, with the convention ${r\choose j}=0$ for $j\gt r$. Now, let's compare the counts on both sides.

  1. If $k=r$, using the binomial theorem to expand $(1-1)^r$, we have $1={r\choose 1}-{r\choose 2}+ \ldots +(-1)^{r-1}{r\choose r}$. That is, $w$ is counted the same number of times on both sides.
  2. If $k<r$, again compare $1$ (LHS) and ${r\choose 1}-{r\choose 2}+ \ldots +(-1)^{k-1}{r\choose k}$ (RHS). Specifically, let's examine the difference $f(k) = 1-\left[{r\choose 1}-{r\choose 2}+ \ldots +(-1)^{k-1}{r\choose k}\right]=\sum_{j=0}^k(-1)^j{r\choose j}$ instead. In fact, $f(k)$ is the partial sum of alternating binomial coefficients and has the closed form $(-1)^k {r-1 \choose k}$. This can be easily proved by induction and Pascal's rule, see here. Now, when $k$ is odd, $f(k)$ is negative and hence $w$ is counted more times on the RHS; when $k$ is even, $f(k)$ is positive and hence $w$ is counted fewer times on the RHS.

In summary, for odd $k$, $w$ is counted equally many or more times on the RHS, and hence the sum of the first $k$ terms on the RHS is an upper bound of $P\left(\bigcup_{i=1}^n A_i\right)$; for even $k$, $w$ is counted equally many or fewer times on the RHS, and hence the sum of the first $k$ terms on the RHS is a lower bound of $P\left(\bigcup_{i=1}^n A_i\right)$. The alternating partial sum of binomial coefficients results in the alternating Bonferroni bounds.
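The closed form for the alternating partial sum used above is easy to verify numerically (a standalone sketch, not part of the answer):

```python
from math import comb

# f(k) = sum_{j=0}^{k} (-1)^j * C(r, j); claimed closed form: (-1)^k * C(r-1, k).
for r in range(1, 12):
    for k in range(r + 1):
        partial = sum((-1) ** j * comb(r, j) for j in range(k + 1))
        assert partial == (-1) ** k * comb(r - 1, k)
print("Closed form verified for all r < 12")
```

Note that $f(r)=(-1)^r{r-1\choose r}=0$, recovering case 1 ($k=r$) above.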

Nicholas
  • 363
4

Here is a self-contained proof that expands on @Did's remarks.

The assertion is that $\Delta_k\le0$ when $k$ is odd, and $\Delta_k\ge0$ when $k$ is even, where $$ \Delta_k:=P\left(\bigcup_{i=1}^n A_i\right) +\sum_{j=1}^k(-1)^j S_j.\tag1 $$ To prove this, first observe that $S_j$ is the expected value of $$ \sum_{i_1 < \cdots <i_j} I(A_{i_1}\cap\cdots\cap A_{i_j}) = {T \choose j}\tag2 $$ where $I(\cdot)$ denotes an indicator random variable and $T$ is the integer-valued random variable $T:=\sum_{i=1}^n I(A_i)$. The reason is that $I(A_{i_1}\cap\cdots\cap A_{i_j})(\omega)=1$ if and only if $\omega$ belongs to each of the $j$ sets $A_{i_1},\ldots,A_{i_j}$. Thus the LHS of (2) counts the number of ways to select $j$ different $A$'s to which $\omega$ belongs, and so does the RHS of (2). (We follow the convention ${a \choose b}=0$ when $a<b$, so (2) holds even when $T<j$.)

From (2) we see that $\Delta_k$ is the expected value of $$ I(\cup A_i) +\sum_{j=1}^k (-1)^j {T\choose j} \stackrel{(3a)}=I(\cup A_i)\left[\sum_{j=0}^k(-1)^j {T\choose j}\right] \stackrel{(3b)}=I(\cup A_i)\left[(-1)^k{T-1\choose k}\right].\tag3 $$ To justify equality (3a), consider the cases $\omega\in\cup A_i$ and $\omega\not\in\cup A_i$. For (3b) we apply (pointwise) an identity about the truncated sum of alternating binomial coefficients. From this last expression we conclude that (3) is a non-positive random variable when $k$ is odd, and a non-negative random variable when $k$ is even, which implies the claimed result.

As a bonus, plug $k=n$ in (3). Since $T\le n$, the bracketed quantity will be zero, which implies $\Delta_n=0$, which is the inclusion-exclusion principle.

grand_chat
  • 38,951
1

First, let us prove a related numerical lemma.

Lemma: Let $n\in \mathbb N$, let $x_1,\dots,x_n$ be real numbers between $0$ and $1$, and let $m$ be a positive integer for which $m\le n$. For integers $k,r$ such that $1\le k\le r$, let $e^r_k$ denote the $k^\text{th}$ elementary symmetric polynomial in $r$ variables evaluated at the first $r$ numbers $x_1,\dots,x_r$. That is, $$ e^r_k=\sum_{1\le i_1<i_2<\dots<i_k\le r}x_{i_1}x_{i_2}\cdots x_{i_k}$$ Furthermore, define $e^r_0=1$, and $e^r_{-1}=0$ for any $r\ge 0$. Then $$(-1)^m\prod_{i=1}^n(1-x_i)\le (-1)^m\sum_{k=0}^m (-1)^k e^n_k$$

Proof: We prove this by induction on $n$. In the base case $n=1$ we must have $m=1$, and both sides equal $-(1-x_1)$, so the inequality holds with equality. For the induction step:

\begin{align}(-1)^m\prod_{i=1}^n (1-x_i) &= (-1)^m\prod_{i=1}^{n-1}(1-x_i)+(-1)^{m-1}x_n\prod_{i=1}^{n-1}(1-x_i)\\ &\le (-1)^m\sum_{k=0}^m (-1)^ke_{k}^{n-1}+(-1)^{m-1}x_n \sum_{k=0}^{m-1}(-1)^ke_{k}^{n-1}\\ &= (-1)^m\sum_{k=0}^m (-1)^k\big(e_{k}^{n-1}+x_n e_{k-1}^{n-1}\big)\\ &= (-1)^m\sum_{k=0}^m (-1)^ke_{k}^{n}\end{align} For the second step, we apply the induction hypothesis twice. In the last step, we use the rule $$e^{n}_k=e^{n-1}_k+x_{n}\cdot e^{n-1}_{k-1},$$ which is analogous to Pascal's rule, and is proven in the same way; take the summands defining $e^n_k$, and split them into groups, based on whether they have $x_n$ as a factor.
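The Pascal-like rule used in the last step can be spot-checked numerically; the helper `e` below is an ad hoc implementation of the elementary symmetric polynomials:

```python
from itertools import combinations
from math import prod
import random

def e(k, xs):
    """k-th elementary symmetric polynomial of the numbers xs (e_0 = 1)."""
    if k == 0:
        return 1.0
    return sum(prod(c) for c in combinations(xs, k))

random.seed(0)
xs = [random.random() for _ in range(6)]     # arbitrary x_i in [0, 1]

# Pascal-like rule: e^n_k = e^{n-1}_k + x_n * e^{n-1}_{k-1}
for k in range(1, len(xs) + 1):
    lhs = e(k, xs)
    rhs = e(k, xs[:-1]) + xs[-1] * e(k - 1, xs[:-1])
    assert abs(lhs - rhs) < 1e-12
print("Recurrence verified")
```

The grouping argument in the proof corresponds exactly to the split in `rhs`: monomials without the factor $x_n$ versus monomials containing it.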


With this lemma, the Bonferroni inequalities are easy to derive. Let $X_i={\bf 1}(A_i)$ be the indicator random variable for $A_i$. From the Lemma, $$ (-1)^m\prod_{i=1}^n (1-X_i)\le (-1)^m \sum_{k=0}^m (-1)^k e^n_k(X_1,\dots,X_n) $$ If we negate both sides of this inequality, then add $(-1)^m$ to both sides, we get $$ (-1)^m\left[1-\prod_{i=1}^n (1-X_i)\right]\ge (-1)^m\sum_{k=\color{red}1}^m(-1)^{k-1}e^n_k(X_1,\dots,X_n), $$ since $e^n_0(X_1,\dots,X_n)=1$. Finally, take the expected value of both sides.

  • On the LHS, note that $\left[1-\prod_{i=1}^n (1-X_i)\right]$ is exactly the indicator random variable for $\bigcup_{i=1}^n A_i$.

  • On the RHS, it is easy to see that the expected value of $e^n_k(X_1,\dots,X_n)$ is just $S_k$.

Thus, we have proved that $$ (-1)^mP\left(\bigcup_{i=1}^n A_i\right)\ge (-1)^m \sum_{k=1}^m (-1)^{k-1} S_k $$ For each $m$, this is exactly the $m^\text{th}$ Bonferroni inequality; the effect of the $(-1)^m$ is to switch the direction of the inequality when $m$ is odd.
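The lemma itself can also be spot-checked on random inputs (a sketch only; the helper `e` is an ad hoc implementation of the elementary symmetric polynomials):

```python
from itertools import combinations
from math import prod
import random

def e(k, xs):
    """k-th elementary symmetric polynomial of the numbers xs (e_0 = 1)."""
    return 1.0 if k == 0 else sum(prod(c) for c in combinations(xs, k))

random.seed(1)
for trial in range(200):
    n = random.randint(1, 7)
    xs = [random.random() for _ in range(n)]     # x_i in [0, 1]
    lhs_prod = prod(1 - x for x in xs)
    for m in range(1, n + 1):
        sign = (-1) ** m
        # Lemma: (-1)^m * prod(1 - x_i) <= (-1)^m * sum_{k=0}^{m} (-1)^k e_k
        alt_sum = sum((-1) ** k * e(k, xs) for k in range(m + 1))
        assert sign * lhs_prod <= sign * alt_sum + 1e-12
print("Lemma verified on random inputs")
```

At $m=n$ the two sides agree, since $\prod_{i=1}^n(1-x_i)=\sum_{k=0}^n(-1)^k e^n_k$.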

Mike Earnest
  • 75,930