
Is it possible to come up with a system of axioms that defines probabilities as limits, instead of the traditional Kolmogorov axioms? I know that historically there was an attempt at this, put forward mainly by von Mises, but it somehow didn't reach widespread acceptance (there seem to be some subtle issues, involving the concept of martingales, with what is formalizable in his system of axioms).

Has an improved variant of his axioms perhaps been published somewhere that is really equivalent to the Kolmogorov axioms?

What is the state of the art regarding this approach to probability?

It feels as though the simulation approach to probabilities that is encountered everywhere in computer science is much closer in spirit to von Mises' approach.

temo

2 Answers


Premise. This is more a long comment than an answer, but I felt compelled to post it since, in my opinion, this question deserves at least a possibly bad answer (I am an expert in neither statistics nor probability theory). I therefore apologize in advance if my language (or rather, the concepts I present) is somewhat hazy and mathematically imprecise.
Edit. In view of the comments to this answer, I have tried to improve it following temo's feedback: I hope to have succeeded in producing something at least minimally useful.

The three basic rigorous approaches. Historically, three different rigorous (in the sense of axiomatic) approaches to the theory of probability have been proposed:

  1. The measure-theoretical approach, by Andrei Kolmogorov. In this approach, probability is not defined in a direct way, but as a class of finite measures satisfying a handful of axioms. Thus it provides a means to identify probability distributions, not a direct path for their construction: if you obtain a set function in some unspecified manner, you can then check whether it is a probability distribution or not.
  2. The operational subjectivist approach, by Bruno De Finetti. This approach is constructive in the sense that its axioms aim to describe how to construct a probability. Precisely, in this theory probability is defined as the value an unbiased and informed person would assign to the likelihood that a specific outcome occurs. De Finetti proves the equivalence of his axiomatics to the classical ("Kolmogorov") probability theory in [1], which is unfortunately written in Italian; however, [2] is a nice technical review of a later work. In particular, a characteristic of De Finetti's approach is the use of finitely additive measures, as described in [2].
  3. The frequentist approach, pursued by many scholars, Richard von Mises among them. Frequentist probability theorists define probability by means of a limiting process on random samples, reminiscent of the law of large numbers: von Mises' approach is based on the definition of certain random sequences called Kollektivs (collectives), according to [3], chapter 2. (A minimal simulation sketch of this limiting-frequency idea is given right after this list.)
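
The following is a minimal sketch in Python of that limiting-frequency idea, connecting it with the simulation viewpoint mentioned in the question; it is an illustration only, not taken from any of the references, and the fair die is just an arbitrary example. The empirical relative frequency of "the die shows a six" drifts toward $1/6$ as the number of simulated trials grows, though any finite run can only suggest the limit, never reach it.

```python
import random

random.seed(1)

def relative_frequency(n_trials):
    """Empirical relative frequency of the event 'a fair die shows a six'."""
    hits = sum(1 for _ in range(n_trials) if random.randint(1, 6) == 6)
    return hits / n_trials

for n in (10, 100, 10_000, 1_000_000):
    # The frequency stabilizes around 1/6 ~ 0.1667 as n grows, illustrating
    # the frequentist reading of probability as a limiting relative frequency.
    print(n, relative_frequency(n))
```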

Reference [3], especially chapter 2, is particularly pertinent to the question, since it describes why von Mises' approach has not been extensively pursued: the criticism that Paul Lévy presented at a conference on probability theory held in Geneva in 1937, together with his praise of Kolmogorov's approach, may have discouraged other scholars. On the other hand, [3] also tries to analyze von Mises' contribution in a deeper and less emotive way, so it is perhaps the right source to start with for an analysis of modern ramifications of von Mises' probability axiomatics.

References

[1] Bruno De Finetti, "Sul significato soggettivo della probabilità" ("On the subjective meaning of probability", in Italian), Fundamenta Mathematicae 17, 298-329 (1931), JFM 57.0608.07, Zbl 0003.16303.

[2] D. A. Gillies, "Review: The Subjective Theory of Probability", The British Journal for the Philosophy of Science, Vol. 23, No. 2 (May, 1972), pp. 138-157.

[3] Michiel van Lambalgen, "Random Sequences", Historical Dissertations HDS-08, originally published September 1987, Amsterdam.

  • Hi, thanks for the answer. I was hoping, though, for a more mathematically grounded answer as to why von Mises' axioms are not OK. I was aware of the article you linked, which does not fully answer my question. In particular, there were a number of attempts after the original publication of von Mises' axioms to fix them, because the original ones had problems, and I would like to know whether they have been "fixed" after all, i.e. what the "state of the art" here is. (Also, if you could provide any reference for De Finetti's approach, it would be most welcome.) – temo Sep 06 '20 at 09:54
  • @temo thanks for the feedback. I am working to improve the answer, even if it could hardly be worthy of a bounty, mainly because my knowledge of the topic comes from a lesson in an undergraduate course on applied probability theory. However, I'll do my best to follow your indications where I can. – Daniele Tampieri Sep 06 '20 at 10:55

While von Mises' frequentist approach to probability (essentially, turning the law of large numbers from a theorem into a definition) can be made formally rigorous, it suffers from practical and conceptual difficulties relative to the more common Kolmogorov axiomatization. The Stanford Encyclopedia of Philosophy entry on interpretations of probability summarizes some of the relevant issues for frequentist approaches in general:

Finite frequentism gives an operational definition of probability, and its problems begin there. For example, just as we want to allow that our thermometers could be ill-calibrated, and could thus give misleading measurements of temperature, so we want to allow that our ‘measurements’ of probabilities via frequencies could be misleading, as when a fair coin lands heads 9 out of 10 times. More than that, it seems to be built into the very notion of probability that such misleading results can arise. Indeed, in many cases, misleading results are guaranteed. Starting with a degenerate case: according to the finite frequentist, a coin that is never tossed, and that thus yields no actual outcomes whatsoever, lacks a probability for heads altogether; yet a coin that is never measured does not thereby lack a diameter. Perhaps even more troubling, a coin that is tossed exactly once yields a relative frequency of heads of either 0 or 1, whatever its bias....[this is an instance] of the so-called ‘problem of the single case’. ... The problem of the single case is particularly striking, but we really have a sequence of related problems: ‘the problem of the double case’, ‘the problem of the triple case’ … Every coin that is tossed exactly twice can yield only the relative frequencies $0$, $1/2$ and $1$, whatever its bias… A finite reference class of size $n$, however large $n$ is, can only produce relative frequencies at a certain level of ‘grain’, namely $1/n$. Among other things, this rules out irrational-valued probabilities; yet our best physical theories say otherwise. Furthermore, there is a sense in which any of these problems can be transformed into the problem of the single case. Suppose that we toss a coin a thousand times. We can regard this as a single trial of a thousand-tosses-of-the-coin experiment. Yet we do not want to be committed to saying that that experiment yields its actual result with probability 1.
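
To make the "grain" and single-case points concrete, here is a small numerical sketch in Python (an illustration only, not part of the quoted entry): a reference class of $n$ trials can only produce the relative frequencies $0, 1/n, 2/n, \dots, 1$, so no finite experiment can realize an irrational-valued probability exactly, and a perfectly fair coin still shows 9 heads in 10 tosses about once per hundred such experiments.

```python
from fractions import Fraction
from math import comb

def attainable_frequencies(n):
    """Every relative frequency a reference class of n trials can produce."""
    return [str(Fraction(k, n)) for k in range(n + 1)]

print(attainable_frequencies(1))   # ['0', '1'] -- the 'problem of the single case'
print(attainable_frequencies(2))   # ['0', '1/2', '1'] -- the 'problem of the double case'

# A fair coin really can mislead: the chance of exactly 9 heads in 10 tosses.
print(comb(10, 9) / 2 ** 10)       # 0.009765625, roughly once per hundred experiments
```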

The entry then turns to von Mises' approach in particular:

Some frequentists (notably Venn 1876, Reichenbach 1949, and von Mises 1957 among others), partly in response to some of the problems above, have gone on to consider infinite reference classes, identifying probabilities with limiting relative frequencies of events or attributes therein. Thus, we require an infinite sequence of trials in order to define such probabilities. But what if the actual world does not provide an infinite sequence of trials of a given experiment? Indeed, that appears to be the norm, and perhaps even the rule. In that case, we are to identify probability with a hypothetical or counterfactual limiting relative frequency. ... [T]here are sequences for which the limiting relative frequency of a given attribute does not exist... Von Mises (1957) gives us a ... restriction to what he calls collectives — hypothetical infinite sequences of attributes (possible outcomes) of specified experiments that meet certain requirements. Call a place-selection an effectively specifiable method of selecting indices of members of the sequence, such that the selection or not of the index $i$ depends at most on the first $i-1$ attributes. Von Mises imposes these axioms: 1) Axiom of Convergence: the limiting relative frequency of any attribute exists. 2) Axiom of Randomness: the limiting relative frequency of each attribute in a collective $\omega$ is the same in any infinite subsequence of $\omega$ which is determined by a place selection. The probability of an attribute $A$, relative to a collective $\omega$, is then defined as the limiting relative frequency of $A$ in $\omega$.
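
To see what a place selection looks like concretely, here is a minimal sketch in Python (an illustration only; the particular selection rule "take every toss that immediately follows a head" is made up for the example). For a pseudo-random sequence the relative frequency of heads is about the same along the selected subsequence as in the whole prefix, whereas for a deterministic alternating sequence the same place selection exposes the failure of the Axiom of Randomness.

```python
import random

# A long finite prefix standing in for a collective: i.i.d. fair-coin tosses
# (1 = heads). Any actual computation can of course only inspect an initial
# segment, which is one of the objections quoted further below.
random.seed(0)
omega = [random.randint(0, 1) for _ in range(200_000)]

def select_after_a_head(seq, i):
    """A place selection: choose index i iff the previous toss was a head.
    The decision about index i uses only attributes with index < i,
    as von Mises' definition requires."""
    return i > 0 and seq[i - 1] == 1

def relative_frequency(seq, place_selection=None):
    """Relative frequency of heads, optionally restricted to the
    subsequence determined by a place selection."""
    chosen = [x for i, x in enumerate(seq)
              if place_selection is None or place_selection(seq, i)]
    return sum(chosen) / len(chosen)

print(relative_frequency(omega))                        # close to 0.5
print(relative_frequency(omega, select_after_a_head))   # also close to 0.5

# A deterministic sequence 0,1,0,1,... also has limiting frequency 1/2 overall,
# but the same place selection picks out only zeros: the randomness axiom fails.
periodic = [i % 2 for i in range(200_000)]
print(relative_frequency(periodic, select_after_a_head))  # 0.0
```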

Although von Mises' definition is attractive, in the sense that it matches our intuition of empirical probabilities as "approximations" to the true limiting probability of some event, it has some unwelcome philosophical consequences:

Von Mises .... regards single case probabilities as nonsense: “We can say nothing about the probability of death of an individual even if we know his condition of life and health in detail. The phrase ‘probability of death’, when it refers to a single person, has no meaning at all for us” (11). Some critics believe that rather than solving the problem of the single case, this merely ignores it. And note that von Mises drastically understates the commitments of his theory: by his lights, the phrase ‘probability of death’ also has no meaning at all when it refers to a million people, or a billion, or any finite number — after all, collectives are infinite. More generally, it seems that von Mises’ theory has the unwelcome consequence that probability statements never have meaning in the real world, for apparently all sequences of attributes are finite. He introduced the notion of a collective because he believed that the regularities in the behavior of certain actual sequences of outcomes are best explained by the hypothesis that those sequences are initial segments of collectives. But this is curious: we know for any actual sequence of outcomes that they are not initial segments of collectives, since we know that they are not initial segments of infinite sequences.

Basically, finite frequentism almost always gives the "wrong" answer for a probability, insofar as it supplies one at all (as it cannot in the case where an experiment is not performed):

[F]inite frequentism makes the connection between probabilities and frequencies too tight, as we have already observed. A fair coin that is tossed a million times is very unlikely to land heads exactly half the time; one that is tossed a million and one times is even less likely to do so! Facts about finite relative frequencies should serve as evidence, but not conclusive evidence, for the relevant probability assignments.
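
To put a number on that claim, a quick computation (an illustration only, using just the Python standard library): the chance that a fair coin lands heads exactly half the time in $10^6$ tosses is only about $0.0008$, and with $10^6 + 1$ tosses an exact even split is impossible.

```python
import math

def prob_exactly_half(n):
    """Probability that a fair coin lands heads in exactly half of n tosses.
    For odd n an exact even split is impossible, so the probability is 0."""
    if n % 2:
        return 0.0
    k = n // 2
    # log C(n, k) - n*log(2), via lgamma to avoid astronomically large integers
    log_p = math.lgamma(n + 1) - 2 * math.lgamma(k + 1) - n * math.log(2)
    return math.exp(log_p)

print(prob_exactly_half(1_000_000))   # ~0.000798
print(prob_exactly_half(1_000_001))   # 0.0
```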

von Mises' infinite or hypothetical frequentism, meanwhile, is unable to tell us the probability of any event whatsoever, even were we able to somehow perform an infinite random sequence of experiments!

Hypothetical frequentism fails to connect probabilities with finite frequencies. It connects them with limiting relative frequencies, of course, but again too tightly: for even in infinite sequences, the two can come apart. (A fair coin could land heads forever, even if it is highly unlikely to do so.)

As a result, von Mises' approach to probability is useless practically:

[S]cience has much interest in finite frequencies, and indeed working with them is much of the business of statistics. Whether it has any interest in highly idealized, hypothetical extensions of actual sequences, and relative frequencies therein, is another matter. The applicability to rational beliefs and to rational decisions go much the same way. Such beliefs and decisions are guided by finite frequency information, but they are not guided by information about limits of hypothetical frequencies, since one never has such information.

(Emphases mostly mine throughout.)