
I've been learning about more probability distributions that are not covered in introductory statistics, and I realized looking at Wikipedia that there are a lot that have been named.

This got me wondering, is there a countably infinite number of probability distributions?

Galen
  • For every $c\in\mathbb R,$ the function $1_{[c,\infty)}(x)$ can be recognized as a CDF. And of course there are lots more. – drhab Sep 28 '16 at 17:07

1 Answer


No: there are uncountably many. We'll show that the set of probability distributions over the reals has cardinality $2^{\aleph_0},$ the same as the cardinality of $\mathbb{R}.$

The cardinality of the set $S$ of probability distributions is the same as the cardinality of the set $T$ of cumulative distribution functions, since there's a clear one-to-one correspondence between these sets.
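
For concreteness, the correspondence can be written out as follows. The symbols $\mu$ (for a probability distribution on the Borel subsets of $\mathbb{R}$) and $F_\mu$ (for its cumulative distribution function) are notation introduced only for this remark:

$$F_\mu(x)=\mu\bigl((-\infty,x]\bigr)\ \text{ for } x\in\mathbb{R},\qquad \mu\bigl((a,b]\bigr)=F_\mu(b)-F_\mu(a)\ \text{ for } a\lt b.$$

The second identity shows that $F_\mu$ determines $\mu$ on all half-open intervals, and these generate the Borel sets, so distinct distributions have distinct cumulative distribution functions. Conversely, every cumulative distribution function arises from some distribution in this way.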

A cumulative distribution function is a function $f\colon \mathbb{R}\to\mathbb{R}$ such that the following four conditions hold:

(a) $\;f$ is monotonically increasing, that is, $x\le y\implies f(x)\le f(y),$

(b) $\;\lim_{x\to-\infty}f(x)=0,$

(c) $\;\lim_{x\to\infty}f(x)=1,$

(d) $\;f$ is right-continuous.

Let $M$ be the set of monotonic functions from $\mathbb{R}$ to $\mathbb{R}.$ We'll show that

$$2^{\aleph_0} \le \operatorname{card}(T) \le \operatorname{card}(M) \le 2^{\aleph_0}.$$

Since $S$ and $T$ have the same cardinality, that's sufficient (by the Cantor–Schröder–Bernstein theorem) to show that $S$ has the cardinality of the continuum.

$$ $$

(1) $2^{\aleph_0} \le \operatorname{card}(T).$

Proof of (1): For each real number $r,$ the step function which maps any $x\lt r$ to $0$ and any $x \ge r$ to $1$ is a cumulative distribution function, and these step functions are all distinct (so the map which takes $r$ to the corresponding step function is one-to-one).
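
To make the distinctness explicit (writing $F_r$ for the step function at $r,$ a name introduced only for this remark):

$$F_r(x)=\begin{cases}0, & x\lt r,\\ 1, & x\ge r,\end{cases}\qquad\text{and for } r\lt s:\quad F_r(r)=1\ne 0=F_s(r).$$

So $F_r$ and $F_s$ differ at the point $r$ whenever $r\lt s.$ Each $F_r$ satisfies conditions (a) through (d): it is monotonically increasing, it is $0$ far to the left and $1$ far to the right, and it is right-continuous at its only jump because $F_r(r)=1.$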

$$ $$

(2) $\operatorname{card}(T) \le \operatorname{card}(M).$

Proof of (2): $T\subseteq M.$

$$ $$

(3) Any monotonic function from $\mathbb{R}$ to $\mathbb{R}$ has only countably many points of discontinuity.

Proof of (3):

Let $f$ be monotonic; without loss of generality, we'll assume that $f$ is monotonically increasing. Let $D$ be the set of points at which $f$ is discontinuous. Since $f$ is monotonic, both one-sided limits exist at every point, so for each $d\in D,$ $\lim_{x\to d^-}f(x)\lt \lim_{x\to d^+}f(x),$ and there is a rational number $q$ such that $\lim_{x\to d^-}f(x)\lt q\lt\lim_{x\to d^+}f(x).$ Using some standard enumeration of $\mathbb{Q},$ let $j$ be the function which maps each $d\in D$ to the first rational number $q$ in the enumeration with the property that $\lim_{x\to d^-}f(x)\lt q\lt\lim_{x\to d^+}f(x).$

We can see that $j$ is one-to-one on $D$ as follows: If $d_1\lt d_2$ are both in $D,$ then, by monotonicity of $f,$ any two points $s\lt t$ in the open interval $(d_1,d_2)$ satisfy $f(s)\le f(t).$ Letting $s\to d_1^{\,+}$ and $t\to d_2^{\,-},$ it follows that $\lim_{x\to d_1^{\,+}}f(x) \le \lim_{x\to d_2^{\,-}}f(x),$ so we have $j(d_1)\lt \lim_{x\to d_1^{\,+}}f(x) \le \lim_{x\to d_2^{\,-}}f(x) \lt j(d_2).$

We now know that $j$ maps $D$ one-to-one into $\mathbb{Q};$ $\mathbb{Q}$ is countable, so $D$ is countable, completing the proof of (3).
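
As a quick sanity check of this argument on a concrete function (the choice $f(x)=\lfloor x\rfloor$ is just an illustration, not part of the proof): here $D=\mathbb{Z},$ and for each integer $n$

$$\lim_{x\to n^-}\lfloor x\rfloor=n-1\lt n=\lim_{x\to n^+}\lfloor x\rfloor,$$

so the rationals available at $n$ are exactly those in the open interval $(n-1,n).$ These intervals are pairwise disjoint for distinct integers, so whichever rational $j$ picks from each of them, $j$ is one-to-one, as the general argument predicts.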

$$ $$

(4) $\operatorname{card}(M) \le 2^{\aleph_0}.$

Proof of (4):

Let $W$ be the set of all countable subsets of $\mathbb{R}\times\mathbb{R}.$ Define a function $K\colon M\to W$ by setting $K(f)=\lbrace \langle x,f(x)\rangle \mid x\in\mathbb{Q}\text{ or }f\text{ is discontinuous at }x\rbrace.$

For any $f\in M,$ $K(f)$ belongs to $W,$ since $\mathbb{Q}$ is countable and, by (3), so is the set of points at which $f$ is discontinuous.

Since $W$ has cardinality $2^{\aleph_0},$ we just need to show that $K$ is one-to-one to conclude that $M$ also has cardinality less than or equal to $2^{\aleph_0}.$ But if $K(f)=K(g),$ then $f$ and $g$ are equal on the set $\mathbb{Q},$ which is dense in $\mathbb{R}.$ It follows that $f(x)=g(x)$ for every $x\in\mathbb{R}$ at which $f$ and $g$ are both continuous. On the other hand, if $f$ or $g$ is discontinuous at $x,$ then some ordered pair in $K(f)=K(g)$ has $x$ as its first coordinate. If $y$ is the second coordinate of this ordered pair, then $f(x)=y=g(x).$ So $f(x)=g(x)$ for all $x\in\mathbb{R}.$ This completes the proof that $K$ is one-to-one and hence that $M$ has cardinality at most $2^{\aleph_0}.$
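
The fact that $W$ has cardinality $2^{\aleph_0}$ is used above without proof. One standard way to check it, sketched here with ordinary cardinal arithmetic, is to note that every nonempty countable subset of $\mathbb{R}\times\mathbb{R}$ is the range of some sequence in $\mathbb{R}\times\mathbb{R},$ so (counting the empty set separately)

$$\operatorname{card}(W)\le 1+\operatorname{card}\bigl((\mathbb{R}\times\mathbb{R})^{\mathbb{N}}\bigr)=1+\bigl(2^{\aleph_0}\cdot 2^{\aleph_0}\bigr)^{\aleph_0}=1+\bigl(2^{\aleph_0}\bigr)^{\aleph_0}=1+2^{\aleph_0\cdot\aleph_0}=2^{\aleph_0}.$$

The reverse inequality holds because the singletons $\lbrace\langle x,0\rangle\rbrace,$ for $x\in\mathbb{R},$ are distinct elements of $W,$ although only the upper bound is needed for (4).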

Mitchell Spector
  • It's late, but aren't you only considering probability distributions over the reals here? – Clement C. Sep 29 '16 at 02:36
  • Doesn't "probability distribution" normally mean over the reals? (This also seems to be what the OP was referring to, based on the references to an introductory stats class and to the Wikipedia article on the subject.) – Mitchell Spector Sep 29 '16 at 02:39
  • Nothing in the definition of a probability distribution states anything about the reals... now, most of the ones with a name are most likely over integers or reals, but that's only because these have received more mainstream attention. – Clement C. Sep 29 '16 at 14:34
  • @ClementC. Sure -- a probability distribution like $P(\text{red ball})=P(\text{blue ball})=0.5$ is a distribution over $\lbrace\text{red ball, blue ball}\rbrace,$ not over $\mathbb{R},$ although it's clearly equivalent to a distribution over the real numbers. I've edited the answer to say that it's counting probability distributions over the reals; it's clearly what the OP meant anyway. – Mitchell Spector Sep 29 '16 at 18:16
  • What if we consider multivariate distributions? How does that affect cardinality? – Szymon Brych Nov 14 '20 at 16:59