7

Let $A$ and $B$ be two disjoint events in a probability space and suppose that one of the two events has zero probability. According to the standard definition of independence, this means that $A$ and $B$ are independent. Unfortunately this definition seems very counter-intuitive to me:

If both events are non-empty, then I would instead define them to be dependent, as the occurrence of one event excludes the occurrence of the other. Why has the math community accepted a different definition? Just out of convenience? What would be the consequences if we changed the definition?

ryang
Filippo
  • Independence means that $P(A\cap B)=P(A)\times P(B)$. If one or both of $A,B$ have probability $0$ then this clearly holds. Sure, you could modify the definition to exclude this case... but then you'd have an extra condition to test every time you wanted to show independence. – lulu Feb 07 '23 at 13:28
  • I think it is also about the choice of how events are defined. The idea of "disjoint events" in a "space" is rooted in a certain representation of certain kinds of events. – Alex K Feb 07 '23 at 13:33
  • Yes, empty event is independent from any event by definition of independence. – kludg Feb 07 '23 at 13:48
  • To be sure, excluding this case would make some "intuitive" statements true. Like "No event is independent of itself". Seems obvious, but of course events with probability $0$ are independent of themselves under the standard definition. In practice, I often exclude "degenerate cases" when discussing the intuitive aspects of independence. – lulu Feb 07 '23 at 13:51
  • Non-empty events can be independent if their intersection has the right size; if the intersection is empty, then the events are dependent. – kludg Feb 07 '23 at 13:56
  • I think you should think of independence as a property that concerns probabilities of occurrences and not so much as a property concerning occurrences. – drhab Feb 07 '23 at 14:10
  • @drhab I think if independence can be explained only in terms of probabilities, this is occasional independence. Quite often independence has an intuitive meaning that does not need calculation. – kludg Feb 07 '23 at 14:33
  • @drhab Could you please elaborate? I am not sure that I get your point. – Filippo Feb 08 '23 at 16:00
  • Asking myself: is there independence? I discern two layers. The coarse one focusses on probabilities and the finer on events. If e.g. $\varnothing\neq A\subseteq B$ and $P(A)=0$ and $P(B)<1$ then according to the coarse one there is independence because $P(A\cap B)=0=P(A)\times P(B)$. According to the fine there is no independence because $P(B|A)=1\neq P(B)$. Which one should we go for? I would recommend the coarse then because our common and nice working definition of independence uses straight probabilities, and not conditional ones. – drhab Feb 09 '23 at 08:33
  • I understand that new questions then arise (e.g. what is the definition of $P(B|A)$ in this context? Or: is $P(B|A)$ defined at all?), and to be honest I do not really have a complete answer to that. – drhab Feb 09 '23 at 08:35
  • @drhab Thank you for the comments. AFAIU you have provided a second example where the "fine" definition and the "coarse" definition do not agree. Nice. – Filippo Feb 10 '23 at 20:10
  • I posed a similar question here (Dartboard paradox), and concluded that the intuitive characterisation of independence is premised on not conditioning on a zero-probability event (which need not even be an empty event). This comment merely affirms, rather than actually answers, your question. – ryang Feb 21 '23 at 14:29

3 Answers

2

Why do we define independence in a way that allows for an event of probability zero to be independent of another event?

The case you are referring to can be seen as degenerate. If one event has probability $0$ then there isn't a clear way to interpret the notion of independence intuitively (in order to know if an event occurring has some impact on another event, we first need that event to be possible). The reason that we define independence in the way that we do is simply out of convenience (as you speculate in the question). It is convenient to have one notion of independence so that we can apply lots of results that apply to independent events in as broad a way as possible. If we were to exclude the cases that you refer to, then this would just create additional work to prove that the standard results that follow for independent events also work in the case that you have excluded.
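As a short worked check (a sketch that uses only monotonicity of $P$ and the product definition of independence), suppose $P(A)=0$ and let $B$ be any event. Since $A\cap B\subseteq A$,

$$0 \le P(A\cap B) \le P(A) = 0 \quad\Longrightarrow\quad P(A\cap B) = 0 = 0\cdot P(B) = P(A)\,P(B).$$

Note that this does not use disjointness at all: a zero-probability event is independent of every event under the standard definition.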

What would the consequences be if we modified the definition?

If we were to change the definition now, then there would be some real consequences. For example, in the standard proof of Kolmogorov's Zero-One Law, we use the fact that if the event $A$ is independent of itself then the probability of $A$ must be $0$ or $1$ (in other words: $P(A \cap A) = P(A)P(A)$ if and only if $P(A) = 0$ or $P(A)=1$) - this proof can be found in the book "Measure Theory" by Donald Cohn.
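For reference, the equivalence quoted in parentheses is a one-line computation (this is only the self-independence step, not the full proof of the Zero-One Law):

$$P(A\cap A) = P(A)\,P(A) \iff P(A) = P(A)^2 \iff P(A)\bigl(1-P(A)\bigr) = 0 \iff P(A)\in\{0,1\}.$$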

If we were to modify the definition of independence in the way that you suggest, then this proof breaks down, as the event $A$ can no longer be said to be independent of itself when $P(A)=0$. It is, of course, still possible to modify this proof, but it becomes unnecessarily complicated because we can no longer refer to the notion of $A$ being independent of itself and we can no longer apply any of the standard results that follow from independence either (without additional justification).

This is, of course, one example, but there are many other proofs that use similar ideas to the one above, and so the definition of independence that we currently have lends itself nicely to simplifying these types of proofs. The only downside is (as you also pointed out) that you lose some intuition when you think about these types of events logically. However, as examples involving events of this nature are degenerate, this isn't a big concern in the mathematical community.

Why do some definitions exclude edge cases?

There are, of course, some definitions that do exclude edge cases. However, there are usually much more serious reasons for this.

For example, one could ask why $1$ is not a prime number. We could easily have allowed $1$ to be a prime number, and to some, this might be more intuitive (like in this case).

However, if we did modify the definition of prime numbers to include the number $1$, then we would run into problems. Numbers would no longer have unique prime factorisations:

$$6 = \color{red}{3 \times2} \space = \space \color{blue}{3 \times 2 \times 1}\space = \space \color{green}{3 \times 2 \times 1 \times 1} = \space \space ... $$

This would create a lot more work for mathematicians and would also violate the Fundamental Theorem of Arithmetic (requiring it to be rewritten).

Therefore, in some cases, like this one, it is sensible to exclude an "edge case". However, in the case you describe, there are no serious ramifications of including events of probability $0$. Therefore, from the perspective of convenience, it makes sense to include the degenerate cases to simplify our analyses and proofs.

FD_bfa
  • "The case you are referring to can be seen as degenerate. If one event has probability 0 then there isn't really a clear way to interpret the notion of independence intuitively (as in order to know if an event occurring has some impact on another event, we first need that event to be possible)." - When we consider uncountable probability spaces, then usually all singletons have probability equal to zero, yet a singleton occurs every time we carry out the experiment, doesn't it? – Filippo Feb 22 '23 at 18:23
  • The same applies in this case. So if you have two disjoint events where at least one has probability $0$, then they are independent. It's still a degenerate example, so what you end up with is something that doesn't have a particularly meaningful interpretation in a qualitative discussion - but, as mentioned in the answer, this can sometimes be a helpful tool in proofs @Filippo – FD_bfa Feb 22 '23 at 19:54
  • For example: if a number is selected uniformly between $0$ and $1$, the probability that we select any single number will be $0$. So it is only meaningful to discuss the independence of events such as the event that we select a number less than $0.5$ and the event that we select a number less than $0.2$, since both of these events occur with a probability greater than $0$. So independence isn't totally useless when we have uncountable probability spaces. But when we are looking at single numbers, then there is little we can say, as you rightly point out @Filippo – FD_bfa Feb 22 '23 at 19:57
  • Does this answer your question? @Filippo – FD_bfa Mar 01 '23 at 22:47
  • Your answer is definitely very helpful. I decided to accept it. – Filippo May 31 '23 at 06:57
0

@FD_bfa offers a good answer: a definition should cover edge and degenerate cases with no modification.

Here's another way to look at it. Two events are independent if finding out that one has occurred tells you nothing about the probability of the other. That's clearly the case when one (or both) of the events has probability $0$.

FD_bfa
Ethan Bolker
  • "Two events are independent if finding out that one has occurred tells you nothing about the probability of the other. That's clearly the case when one (or both) of the events has probability $0$." - I don't see why that is clearly the case. If an event $A$ has occurred, then I know for certain that all disjoint events have not occurred. That is true even if $\mathbb P(A)=0$. So I would say that the occurrence of $A$ actually tells me a lot about the probability of all other events. – Filippo Feb 22 '23 at 15:04
  • If event $A$ with positive probability occurs you gain no information about another event $B$ that had probability $0$ before. It still has probability $0$. Of course you do get information about some other events. An event $A$ with $0$ probability never occurs, so the occurrence that never happens can't change the probability of any other event. – Ethan Bolker Feb 22 '23 at 15:41
  • "An event $A$ with $0$ probability never occurs, so the occurrence that never happens can't change the probability of any other event." - If the probability space is uncountable, then usually all singletons have probability zero and yet a singleton occurs every time we carry out the experiment, doesn't it? – Filippo Feb 22 '23 at 17:50
  • Continuous probability densities are more subtle than that. When you do your real experiment it necessarily has some error so the result you get is really a small interval. The real line is only a model for the experiment. Singletons have probability $0$. – Ethan Bolker Feb 22 '23 at 20:14
0

Here is another perspective: In most cases, we don't really care whether events are disjoint or not. More specifically, we don't care that much about the underlying measurable space $(\Omega,\Sigma)$ from which events are taken; we care about the probability distribution itself. To that extent, we don't usually distinguish between events that are truly disjoint ($A\cap B=\emptyset$) and events whose joint probability vanishes ($P(A\cap B)=0$). In other words, whether an event is empty or just almost impossible (zero probability) is not of much consequence for the theory.

Also consider this: Two events $A,B$ being independent intuitively means that $P(A)=P(A|B)$: If $B$ has occurred, the probability of $A$ hasn't changed, and vice versa. But if $B$ has probability $0$, then the conditional probability can't even be calculated sensibly, since $P(A|B):=\frac{P(A\cap B)}{P(B)}$, so we would be dividing $0$ by $0$. So we can't reasonably say that the probability of one event doesn't change upon discovering that the other has occurred, since the probability isn't even well defined now.
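To connect the two formulations (a sketch, assuming the ratio definition of $P(A|B)$ given above): whenever $P(B)>0$,

$$P(A|B) = P(A) \iff \frac{P(A\cap B)}{P(B)} = P(A) \iff P(A\cap B) = P(A)\,P(B).$$

The right-hand equation contains no division, so it still makes sense when $P(B)=0$, which is one way to see why the product form is the one used as the definition.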

Vercassivelaunos
  • Thank you for your answer. Regarding the second paragraph: Why not set $P(A|B):=0$ whenever $A$ and $B$ are disjoint and $B$ is non-empty? – Filippo Mar 04 '23 at 10:48
  • Why do you think 0 would be the right choice? We can consider two cases giving different intuitive "right choices": If $B\subseteq A$, then shouldn't $A$ be a sure thing when $B$ occurs, so $P(A|B)=1$? But if the events are disjoint, then $A$ should be impossible when $B$ occurs, so $P(A|B)=0$ would be the right choice. Now what about any of the cases in-between? Now a "right choice" seems quite impossible, edge cases like finite $B$ excluded. In such a case it's better to just leave it undefined unless some context clearly makes some choice favorable, and then define it in that context only. – Vercassivelaunos Mar 04 '23 at 10:48
  • Also, referring back to the first paragraph: Probability theorists don't like to distinguish between empty and almost impossible events because it makes the theory finicky, and making definitions based upon this distinction goes counter to that. – Vercassivelaunos Mar 04 '23 at 10:52
  • No, you are wrong. If you pick a uniformly random point $(x,y)$ in the unit square $[0,1]\times[0,1]$, then $P(x\in[0,1/2] \mid y=1/3) = 1/2$. Clearly, the conditional probability is completely sensible. It's your definition via division that is not the appropriate definition in general. – user21820 Apr 09 '23 at 04:15
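
The value $1/2$ in the last comment can be recovered with conditional densities. This is a sketch under the assumption that conditioning on $y=1/3$ is understood via the conditional density $f_{X|Y}$, a notion the thread itself does not spell out. For the uniform distribution on the unit square, the joint density is $f(x,y)=1$ and the marginal density is $f_Y(y)=1$ on $[0,1]$, so

$$f_{X|Y}(x\mid y) = \frac{f(x,y)}{f_Y(y)} = 1 \quad (0\le x\le 1), \qquad P\bigl(x\in[0,\tfrac12] \,\big|\, y=\tfrac13\bigr) = \int_0^{1/2} 1\,dx = \tfrac12.$$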