Problems with any non-Kolmogorovian (frequentistic, subjective etc.) approaches to probability

Question

When introducing Kolmogorov's axiomatic approach to probability, it is often claimed that it is a way out of the problems associated with the following two interpretations of probability:

• frequentistic approach: probability is the limit of the relative frequency of an event occurring in long strings of repetitions of an experiment

• subjective approach: the (hypothetical) monetary value that I would put on a bet that an event occurs

My question is: what exactly are the problems associated with these interpretations?

[Careful! Long sentence ahead!] Are they merely philosophical (e.g. in the case of the frequentistic approach: we can't associate a probability to all events that we would like to associate probabilities to, since not all events can be formulated within repeatable experiments; e.g. to estimate the probability of a politician being elected, we can't hold an election 1000 times and approximate the probability of election by the fraction of those 1000 times in which he was elected), or are there "hard" mathematical obstacles that arise when one formalizes these other approaches and states them axiomatically (e.g. in the case of the frequentistic approach: we assume that $(\Omega,\mathcal{F})$ is a measurable space, let $A\in\mathcal{F}$ be arbitrary and consider for each $k\in\mathbb{N}$ a finite sequence $(B_{i}^{k})_{i\leq k}\in\mathcal{F}^{k}$ such that $\lim_{k\rightarrow\infty}\frac{N_{k}((B_{i}^{k})_{i\leq k})}{k}$ exists (which then necessarily lies within $[0,1]$), where $N_{k}((B_{i}^{k})_{i\leq k}):=|\{i:B_{i}^{k}=A\}|$; the value of this limit is then called the probability of $A$; for the subjective approach I don't know how to formalize this)?

I can’t cite any quotes right now but if you can ever get a hand on Probability with Martingales by D. Williams, the entire introduction chapter is devoted to examining why Kolmogorov’s measure theoretic approach is needed, via a re-examination of Branching processes and the problems that arise without it. — Nap D. Lover, Aug 27 '20 at 17:48
@NapD.Lover I looked at it! It seems to me that the introduction is actually quite advanced and tries to present advanced concepts in an intuitive manner, which makes it quite hard to read. If someone here could explain the gist of that introduction, and why that makes the measure theoretic approach necessary, that would help a lot. — temo, Aug 28 '20 at 17:55

score 3 · Accepted Answer · edited Sep 04 '20 at 17:41

You can easily dispense with random variables for elementary probability (on finite sample spaces); just look at Feller’s volume 1, which does not formally introduce them until over halfway through the book. However, in the theory of stochastic processes, especially in continuous time, the concept of random variable is indispensable.

In short the first half of this post addresses the fact that the elementary approach fails (somewhat dramatically!) when combinatorics no longer applies. The second half recounts a passage of Cramer essentially commenting that while other axiomatic approaches to probability theory are certainly valuable, none seem to be as amenable to actually doing mathematics as Kolmogorov's.

Allow me to try to digest the introduction I referenced as well as provide some quoted passages from H. Cramer's book on distributions:

First, on the introductory chapter of Probability with Martingales. As often in probability models, we have some event $A$ we are interested in [in the chapter it is the event of extinction of the branching process], and a sequence of events $A_n$ that are associated with $A$ somehow. When, how, and why can we say $\mathbb{P}(A)=\lim_n \mathbb{P}(A_n)$? In the specific case at hand, $A_n=\{Z_n =0\}$ is the event of extinction by the $n$-th generation and $A=\{Z_m = 0 \text{ for some } m\}$ is the event of extinction, ever; we write $\pi_n := \mathbb{P}(Z_n=0)$ and $\pi:=\mathbb{P}(Z_m = 0 \text{ for some } m)$. It is intuitive to posit that this limit holds in this case, but how do we prove it? And even if we do prove it, what if I have a different model now, or some slight variation?

In the elementary theory we can identify the sample space with the set of all outcomes without issue, and combinatorics greatly aids in our work. Now recall that for this branching process, for $n\in \mathbb{Z}^+$ and $r\in \mathbb{N}$, the RV $X_r^{(n+1)}$ represents the number of children in the $(n+1)$-st generation of the $r$-th animal from the $n$-th generation (if there is one). Then the total number in the $(n+1)$-st generation is given by $$Z_{n+1}=X_1^{(n+1)}+\dotsc +X_{Z_n}^{(n+1)}$$
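To make the recursion for $Z_{n+1}$ concrete, here is a minimal Python sketch. The offspring law `OFFSPRING_PMF` and all function names are hypothetical choices made only for illustration; they are not taken from the book. It estimates $\pi_n=\mathbb{P}(Z_n=0)$ by simulation, starting from $Z_0=1$, and compares with the exact value obtained from the standard generating-function recursion $\pi_{k+1}=f(\pi_k)$, $\pi_0=0$, where $f(s)=\mathbb{E}[s^X]$.

```python
import random

# Hypothetical offspring law, chosen only for illustration:
# P(X=0) = 1/4, P(X=1) = 1/4, P(X=2) = 1/2.
OFFSPRING_PMF = [0.25, 0.25, 0.5]


def next_generation(z_n, rng, pmf=OFFSPRING_PMF):
    """Return Z_{n+1} = X_1 + ... + X_{Z_n} for IID offspring counts drawn from pmf."""
    if z_n == 0:
        return 0  # empty sum: an extinct population stays extinct
    # One call draws z_n independent offspring counts.
    return sum(rng.choices(range(len(pmf)), weights=pmf, k=z_n))


def estimate_pi_n(n, trials, seed=0, pmf=OFFSPRING_PMF):
    """Monte Carlo estimate of pi_n = P(Z_n = 0), starting from Z_0 = 1."""
    rng = random.Random(seed)
    extinct = 0
    for _trial in range(trials):
        z = 1
        for _gen in range(n):
            z = next_generation(z, rng, pmf)
            if z == 0:
                break
        extinct += (z == 0)
    return extinct / trials


def exact_pi_n(n, pmf=OFFSPRING_PMF):
    """Exact pi_n via pi_{k+1} = f(pi_k), pi_0 = 0, with f(s) = sum_k pmf[k] * s**k."""
    pi = 0.0
    for _step in range(n):
        pi = sum(p * pi**k for k, p in enumerate(pmf))
    return pi


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, exact_pi_n(n), estimate_pi_n(n, trials=2000))
```

For this hypothetical law the equation $f(s)=s$ has roots $1/2$ and $1$, and the exact values $\pi_n$ increase towards $1/2$. Note that the code only ever computes $\pi_n$ for finite $n$; that the limit of these numbers equals the probability of the event "extinction ever occurs" is precisely the statement that needs a proof.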

The $X$ are non-negative integer valued RVs and the doubly-infinite sequence $X_r^{(s)}$ for $r,s\in \mathbb{N}$ is assumed to be IID. For a single copy of $X$ we might just as well say $\Omega = \{0,1,2,\dotsc\}$. But for this experiment, as long as we want to compute $\pi :=\mathbb{P}(Z_m = 0 \text{ for some } m)$, we need to know the results of the entire sequence $(X_s^{(r)}: r,s \in \mathbb{N})$ as a single outcome of our experiment. We aren't just interested in one generation or one animal, so our sample space needs to reflect that. Thus following the author now,

We could follow elementary theory in taking $\Omega$ to be the set of all outcomes, in other words, taking $\Omega$ to be the Cartesian product $$ \Omega = \prod_{r,s \in \mathbb{N}} \mathbb{Z}^+,$$ the typical element of $\Omega$ being $$\omega = (\omega_s^{(r)} : r,s \in \mathbb{N}),$$ [a doubly-indexed sequence of non-negative integers] and then setting $X_s^{(r)}(\omega)=\omega_s^{(r)}$. Now $\Omega$ is an uncountable set, so that we are outside the "combinatorial" context which makes sense of $\pi_n$, in the elementary theory. Moreover, if one assumes the Axiom of Choice, one can prove that it is impossible to assign to all subsets of $\Omega$ a probability satisfying the "intuitively obvious" axioms and making the $X$'s IIDRVs with the correct common distribution. So, we have to know that the set of $\omega$ corresponding to the event "extinction occurs" is one to which one can uniquely assign a probability (which will then provide a definition of $\pi$). Even then, we have to prove $\pi=\lim \pi_n$ [emphasis mine].

Kolmogorov's measure-theoretic treatment resolves this (or settles on a compromise, depending on your preferences). We know exactly which sets can be measured (those in $\Sigma$ or $\mathcal{F}$), and we have useful limiting properties, like continuity of measures along monotone sequences of events and $\sigma$-additivity, for computations. Further, this answer discusses why finite additivity is not enough, even for countable sample spaces. Finally, you might be interested in Example 3.7, Coin Tossing, on page 32, Chapter 3 of the same book. It seems related to the example at the end of the OP.
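For the extinction example, the step from $\pi_n$ to $\pi$ can be sketched as follows, assuming $\mathbb{P}$ is a countably additive probability measure on $(\Omega,\mathcal{F})$ and each $A_n=\{Z_n=0\}$ lies in $\mathcal{F}$ (the sets $D_n$ are introduced here only for the argument). Since $Z_n=0$ forces $Z_{n+1}=0$ (an empty sum), the events increase, $A_n\subseteq A_{n+1}$, and $A=\bigcup_n A_n$. Put $D_1:=A_1$ and $D_n:=A_n\setminus A_{n-1}$ for $n\geq 2$; these are pairwise disjoint with $\bigcup_{n\leq N} D_n=A_N$ and $\bigcup_n D_n=A$. Then

$$\pi=\mathbb{P}(A)=\sum_{n=1}^{\infty}\mathbb{P}(D_n)=\lim_{N\to\infty}\sum_{n=1}^{N}\mathbb{P}(D_n)=\lim_{N\to\infty}\mathbb{P}(A_N)=\lim_{N\to\infty}\pi_N.$$

The second equality is exactly where $\sigma$-additivity is used; with finite additivity alone one only gets $\mathbb{P}(A)\geq\lim_N \pi_N$.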

Let me also add that, without random variables, it is hard even to formulate results about the existence of continuous modifications of stochastic processes (or about any other sample-path property, for that matter), let alone prove them; see, for example, Kolmogorov’s continuity theorem: en.m.wikipedia.org/wiki/Kolmogorov_continuity_theorem.

Note also the following, with regard to Rogers and Williams, Volume 1, p. 123: the canonical sample space is nice (then the sample path is the outcome of the experiment, as it is in elementary theory), but "probability theory gets most of its depth from being able to construct (certainly non-canonical!) processes from other processes by time transformations, or as solutions of SDEs."
(This book, Volume 1, opens with a survey of results on Brownian motion, exemplifying nearly every property/concept in the general theory (martingale, Gaussian process, Markov process, infinitesimal generator, etc.). The next chapter reviews measure-theoretic probability, stochastic processes (Daniell-Kolmogorov theorem), and discrete-time and continuous-time martingales. After that comes the general theory of Markov processes.)

Now I leave you with a moderately lengthy passage from Cramer's little book on the difficulties with other axiomatic approaches proposed then (first printed 1937, 2nd edition 1962):

The axiomatic basis of a theory may, of course, always be constructed in many different ways, and it is well known that, with respect to the foundations of the Theory of Probability, there has been a great diversity of opinions.

The type of statistical regularity indicated above [previous page contained a discussion on frequentist's perspective] was first observed in connection with ordinary games of chance with cards, dice, etc., and this gave occasion to the origin and early development of the theory. [...] This led to the famous principle of equally possible cases which, after having been more or less tacitly assumed by earlier writers, was explicitly framed by Laplace, as the fundamental principle of the whole theory. Throughout the whole century following the publication of Laplace's classical treatise, a large amount of work has been spent on the discussion of this principle.

During the course of this discussion, it has been maintained by various authors that the validity of the principle of equally possible cases is necessarily restricted to the field of games of chance, so that it is wholly incapable of serving as the basic principle of the theory. Attempts have been made$^1$ to establish the theory on an essentially different basis, the probabilities being directly defined as ideal values of statistical frequencies. The most successful attempt on this line is due to v. Mises who endeavors to reach in this way an axiomatic foundation of the theory in the modern sense.

The fundamental conception of the v. Mises theory is that of a "Kollektiv", by which is meant an unlimited sequence $K$ of similar observations, each furnishing a definite point belonging to an a priori given space $R$ of a finite number of dimensions. The first axiom of v. Mises then postulates the existence of the limit $$\lim v/n = P(S),$$ for every simple sub-set $S\subset R$ while the second axiom requires that the analogous limit should still exist and have the same value $P(S)$ for every sub-sequence $K'$ that can be formed from $K$ according to a rule such that it can always be decided whether the $n$-th observation of $K$ should belong to $K'$ or not, without knowing the result of this particular observation$^2$. [$v$ is the number of observations of the event $S$ out of $n$ trials, described on an earlier page]. It does, however, seem difficult to give a precise mathematical meaning to the condition printed in italics, and the attempts to express the second axiom in a more rigorous way do not, so far, seem to have reached satisfactory and easily applicable results [emphasis mine]. Though fully recognizing the value of a system of axioms based on the properties of statistical frequencies, I think that these difficulties must be considered sufficiently grave to justify, at least for the time being, the choice of a fundamentally different system.

The underlying idea of the system that will be adopted here may be roughly described in the following simple way: The probability of an event is a definite number associated with that event; and our axioms have to express the fundamental rules for operations with such numbers.

1 For history of these attempts, cf. Keynes, chaps VII-VIII

2 The second axiom as given by v. Mises is somewhat more complicated. It can, however, be shown that this is equivalent to the simpler statement given above.

from Random Variables and Probability Distributions 2nd edition by H. Cramer, p3-5.

This is awesome. Immediate +1 one. Let me digest it a bit more. Are you by any chance familiar with D. Knuth's second volume, of Art of Computer Programming, chapter 3.5, where he discusses which sequences of numbers are truly random ($\infty$-distributed)? Somehow, this seems to be tangentially related to what Cramer wrote. — temo, Sep 04 '20 at 17:29
@temo I am not but Knuth's book has certainly been on my to-read list! Kolmogorov's complexity theory is also fascinating though I am a novice in the topic. It would be interesting to get a hand on von Mises's original works where he attempted the theory. If I ever come across it, I'll edit them in the answer. — Nap D. Lover, Sep 04 '20 at 17:33
Yes, please edit it, if you come across it. I have attempted to integrate your comments, which seemed very useful, into your answer, so that we have all the info in one place. Do re-edit, if you feel that I made some errors. — temo, Sep 04 '20 at 17:42
