7

In several probability textbooks I have found what amounts to the following argument:

Let $A$ be an event in some probabilistic experiment, and let $p=P(A)$ be the probability that it occurs in a single trial. Let $M$ be the fraction of $n$ independent trials in which $A$ occurs:

$M = \frac{X_1+...+X_n}{n}$

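where $X_i$ is 1 if $A$ occurs on the $i$-th trial and 0 otherwise; in particular $E[X_i]=p$, and we write $\sigma^2 = Var[X_i] = p(1-p)$. From simple properties of expectation and variance (the variance step uses independence of the trials):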

$E[M] = \frac{E[X_1+...+X_n]}{n} = \frac{E[X_1]+...+E[X_n]}{n} = \frac{np}{n} = p$

$Var[M] = \frac{Var[X_1+...+X_n]}{n^2} = \frac{Var[X_1]+...+Var[X_n]}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$

So using Chebyshev's inequality:

$P(|M-p|>\epsilon) \le \frac{\sigma^2}{n\epsilon^2}$

And so:

$\lim_{n \to \infty} P(|M-p|>\epsilon) = 0$
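To see concretely what the Chebyshev bound asserts, here is a minimal simulation sketch. It assumes independent trials generated by Python's pseudo-random generator; the function names and the values $p=0.3$, $\epsilon=0.05$ are illustrative choices, not part of the textbook argument. Note that the simulation is itself a probabilistic model, so it illustrates the theorem rather than settling how $p$ should be interpreted.

```python
import random

def empirical_frequency(p, n, rng):
    """Fraction M of n simulated independent trials in which the event occurs."""
    return sum(rng.random() < p for _ in range(n)) / n

def deviation_rate(p, n, eps, repeats, rng):
    """Fraction of repeated n-trial experiments in which |M - p| > eps."""
    hits = sum(abs(empirical_frequency(p, n, rng) - p) > eps
               for _ in range(repeats))
    return hits / repeats

def chebyshev_bound(p, n, eps):
    """Right-hand side sigma^2 / (n eps^2), with sigma^2 = p (1 - p)."""
    return p * (1 - p) / (n * eps ** 2)

rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (100, 1000, 10000):
    # Observed deviation rate next to the bound (which may exceed 1 for small n).
    print(n, deviation_rate(0.3, n, 0.05, 200, rng), chebyshev_bound(0.3, n, 0.05))
```

The observed deviation rate should stay below the bound whenever the bound is informative (below 1), and both shrink as $n$ grows.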

It is often claimed that this derivation links the mathematical theory of probability with the concept of frequency, but I think this is not true: the derivation is either pointless or tautological. Consider the following. If you proceed purely from the mathematical axioms, the result holds in an abstract sense, but there is no logical reason for the particular quantities to have the interpretations we intuitively give them. For example, one cannot interpret $M$ as a frequency of occurrence without adding an axiom specifying what $P(A)$ is. At least this is how it seems to me.

On the other hand, if you choose the frequency interpretation of probability, then the moment you say "let $p=P(A)$ be the probability of $A$" you assume the existence of a single number $p$ that is the limit of the relative frequency of occurrence of the event $A$, which amounts to placing:

$\lim_{n \to \infty} P(|M-p|>\epsilon) = 0$

among the axioms.

Ideally I would like to know what someone familiar with mathematical logic or research in the foundations of mathematics, where such issues are examined, thinks about this. In areas like set theory there are volumes written about issues of this kind. In probability theory there are plenty of philosophical books about various ways of interpreting probability, but I have not found a single work on the mathematical logic of the subject besides Kolmogorov's Grundbegriffe. My questions are the following:

Is my reasoning correct?

Is there any reason I am missing for this derivation to be important or interesting in some sense?

Are there any works that examine probability theory from the standpoint of mathematical logic, where issues of this kind are made more clear?

For reference, textbooks are either very mysterious about this, or altogether avoid motivating or interpreting the result. In Jim Pitman's "Probability", page 101, this is called a "mathematical confirmation of our intuitive idea of probability as a limit of long-run sequences". In Bertsekas and Tsitsiklis, page 270, $M$ is called the empirical frequency, and it is said that "Loosely speaking, this allows us to conclude that empirical frequencies are faithful estimates of p. Alternatively, this is a step towards interpreting the probability p as the frequency of occurrence of A." Mark Kac, in "Probability and related topics in the physical sciences", page 4, writes:

Actually, the theorem says disappointingly little. All it says, in fact, is the following: If the probability of a certain event was calculated in accordance with certain assumptions and rules, then the probability (again calculated, according to the same assumptions and rules) that the frequency with which the event will occur in a large assembly of trials will differ significantly from the calculated probability is low.

In the notes for a probability theory course by Rota and Baclawski, the interpretation seems more similar to what I have written above:

This is essentially just a psychological theorem, for it does not provide the information necessary for concrete applications. The Central Limit Theorem is far more useful and in fact the law of large numbers is a consequence of the Central Limit Theorem. We leave the proof as an exercise.

In any case the law of large numbers is a purely mathematical theorem. In order for it to make sense we must already have the concepts of probability, random variables, means, variances, etc. We cannot use this as a definition of probability. But we cannot even use the law of large numbers as a justification of the frequentist point of view. This point of view says that probabilities represent a physically measurable quantity (at least in principle). But there is no concept of a physical "measurement" corresponding to the mathematical concept of the limit:

$\lim_{n \to \infty} \frac{X_1+...+X_n}{n}$

The relationship between physical experiments and the theory of probability is much more subtle than the frequentist point of view would have one believe.

Finally, Grinstead and Snell write what seems also very reasonable, but not very precise:

The Law of Large Numbers, which is a theorem proved about the mathematical model of probability, shows that this model is consistent with the frequency interpretation of probability.

Nate Eldredge
  • 1
  • Sorry, I am just looking for a satisfying answer, and I thought those were two completely distinct communities. – Jarosław Rzeszótko May 01 '14 at 07:18
  • You can take a look at Jan von Plato, Creating Modern Probability Its Mathematics Physics and Philosophy in Historical Perspective (1994). – Mauro ALLEGRANZA May 01 '14 at 16:01
  • Please delete this question, it was initially framed too vaguely to produce meaningful answers, and I made it more concrete here: http://math.stackexchange.com/questions/777493/do-the-kolmogorovs-axioms-permit-speaking-of-frequencies-of-occurence-in-any-me

    Sorry for a partial repost, but apparently I wasn't able to phrase my issue clearly enough here, and having seen the attempted answers and having thought more, I think I am able to formulate it in a way allowing a proper answer.

    – Jarosław Rzeszótko May 01 '14 at 20:34

6 Answers

4

As the link provided by the previous poster also states, the LLN is a result relating the axiomatic concept of probability to the statistical concept of frequency. Probability theory per se does not deal with the physical meaning of probability.

From a philosophical perspective, you can either be a frequentist, which implies that probability is only meaningful as a frequency derived from repeating your experiment, or you can be a naturalist (aka Bayesian), which implies that probability is a measure of the uncertainty inherent in nature and that frequency is a way to assess that uncertainty. That is the main philosophical contribution of the LLN and the Glivenko-Cantelli lemma.

I use the word naturalist because physicists were thinking about probability this way long before statisticians and their frequentist interpretation muddied the waters. Frequentism denies meaning to statements like the probability of rain tomorrow is 45% and also all of quantum mechanics.

I would suggest Cox's seminal paper "Probability, Frequency, and Reasonable Expectation", American Journal of Physics 14:1-13 (1946).

firdaus
  • 1
    The very point is that I think the LLN cannot relate the axiomatic concept of probability to the statistical concept of frequency: in the purely mathematical framework, while the theorem is true, nothing in the axioms allows one to talk of p or M as frequencies. – Jarosław Rzeszótko Apr 30 '14 at 16:10
  • Could you clarify what you mean by the statement : "in the purely mathematical framework, while the theorem is true, nothing in the axioms allows to talk of p or M as frequencies" ? – firdaus Apr 30 '14 at 17:16
  • Starting with P([X<x]) as axiomatic, we define E[X] purely through measure-theoretic axioms. So far, no appeal to frequency is made. It is only when we get to the LLN (Glivenko-Cantelli) that we relate the limit of the empirical mean (frequency) to the expectation (probability). So I'm not sure why we would want frequency / empirical mean as an axiomatic concept in the definition of P(X)? – firdaus Apr 30 '14 at 17:23
  • 1
    I think what is called the "empirical mean" cannot, from the logical point of view, be thought of as a relative frequency if only the Kolmogorov axioms are assumed. IMO the LLN by itself does not establish any connection at all within the mathematical theory between frequency and probability; it only appears to make such a connection because the concepts have such forceful intuitive interpretations. In other words, people who claim that the LLN connects probability to frequency make a tacit assumption about what the various variables represent, but nothing of the sort follows from the axioms. – Jarosław Rzeszótko Apr 30 '14 at 17:38
  • Think about the meaning of E[X_1+...+X_n] if p is not up front known to be the relative frequency in the limit of infinite number of trials. – Jarosław Rzeszótko Apr 30 '14 at 17:40
  • Basically M is not the number of successes in n trials if p is not already assumed to be a relative frequency. – Jarosław Rzeszótko Apr 30 '14 at 17:56
2

I finally found a particularly clear example in the book "A Treatise on Probability" by Keynes that I think shows beyond any doubt that if p is anything but a number defined a priori to precisely satisfy $\lim_{n \to \infty} P(|M-p|>\epsilon) = 0$, the WLLN ceases to be interpretable as a valid statement about frequencies:

The following example from Czuber will be sufficient for the purpose of illustration. Czuber’s argument is as follows: In the period 1866–1877 there were registered in Austria

m = 4,311,076 male births

n = 4,052,193 female births

s = 8,363,269

for the succeeding period, 1877–1899, we are given only

m' = 6,533,961 male births;

what conclusion can we draw as to the number n of female births? We can conclude, according to Czuber, that the most probable value

n' = nm'/m = 6,141,587

and that there is a probability P = .9999779 that n will lie between the limits 6,118,361 and 6,164,813. It seems in plain opposition to good sense that on such evidence we should be able with practical certainty P = .9999779 = 1 − 1/45250 to estimate the number of female births within such narrow limits. And we see that the conditions laid down in § 11 have been flagrantly neglected. The number of cases, over which the prediction based on Bernoulli’s Theorem is to extend, actually exceeds the number of cases upon which the à priori probability has been based. It may be added that for the period, 1877–1894, the actual value of n did lie between the estimated limits, but that for the period, 1895–1905, it lay outside limits to which the same method had attributed practical certainty.
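The arithmetic in the quoted example can be retraced in a few lines. This is only a sketch: the variable names are mine, and it checks nothing beyond the figures quoted above. A direct evaluation of nm'/m may differ from the quoted value by a few units, presumably because of rounding in the original computation.

```python
# Figures as quoted from Keynes / Czuber above.
m, n = 4_311_076, 4_052_193      # registered male and female births, 1866-1877
m_next = 6_533_961               # male births in the succeeding period

s = m + n                        # total births in the first period
n_next_estimate = n * m_next / m # proportional estimate n' = n m' / m

lower, upper = 6_118_361, 6_164_813
midpoint = (lower + upper) / 2   # centre of the quoted interval
half_width = (upper - lower) / 2 # half-width of the quoted interval

p_quoted = 1 - 1 / 45250         # the "practical certainty" attached to the interval

print(s, round(n_next_estimate), midpoint, half_width, p_quoted)
```

The half-width of the interval is well under half a percent of the estimate, which is the narrowness Keynes objects to.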

Maybe this is obvious in retrospect: additional assumptions might be needed to interpret probabilities, and those assumptions might overlap with theorems of abstract probability theory. Still, I have browsed through literally dozens of textbooks, and they all either provide no interpretation or motivation for this result, or say something false or "not even false", that is, too vague to have any meaning. Armed with this insight, I found the following statement in Kolmogorov's Grundbegriffe, in section 2 of chapter 1, "The Relation to Experimental Data":

We apply the theory of probability to the actual world of experiment in the following manner:

...

4) Under certain conditions, which we shall not discuss here, we may assume that the event A which may or may not occur under conditions S, is assigned a real number P(A) which has the following characteristics:

a) One can be practically certain that if the complex of conditions S is repeated a large number of times, n, then if m be the number of occurrences of event A, the ratio m/n will differ very slightly from P(A).

As far as I understand, this is Kolmogorov essentially saying that the WLLN becomes an additional axiom as soon as one attempts to give an interpretation to his very general theory, at least in the particular interpretation he envisioned. Unless this is done, there is no ground for treating M as a relative frequency: it is a term of unknown interpretation, just as a line is an entity of unknown interpretation from within the axiomatic framework of geometry. The WLLN is the very assumption that allows interpreting M as a relative frequency. This actually contradicts statements of the kind claimed in one of the answers and stated in some textbooks:

LLN is a result relating the axiomatic concept of probability to the statistical concept of frequency.

It is either a theorem that has nothing to do with frequency, or an assumption or additional axiom.

1

I believe my answer here might be what you're looking for. I'm also assuming the question asked there is close to what you're asking here.

Alex R.
  • I saw this thread before asking the question; it is a variation on the same topic. However, I do realize the difference between probability as a mathematical theory and its real world models, and I think my question still stands with that out of the way. – Jarosław Rzeszótko Apr 30 '14 at 15:49
  • @JarosławRzeszótko: I'm not entirely sure what you're asking for then. When you say "$p$ is not defined" this is ambiguous. In probability theory $p$ is very well defined, it's whatever $P(A)$ is. Whereas as a frequentist, you define $p$ through something like the strong law of large numbers. – Alex R. Apr 30 '14 at 16:03
  • @JarosławRzeszótko: so for example, if you encounter a coin, then mathematically you can define it as a random variable with $p=0.50$. In the real world you'd have to confirm that fact somehow, perhaps by flipping it many times and then looking at the average. – Alex R. Apr 30 '14 at 16:04
  • @JarosławRzeszótko: The key is that rather than saying you are sure that $p=0.50$ in the example above, you can reasonably discredit statements such as $p=0.51$ with enough trials. Naturally, there's no infinite accuracy that would allow you to conclude that $p=0.50$, but even with a few hundred throws you can conclude, say, $0.40<p<0.60$. – Alex R. Apr 30 '14 at 16:06
  • "not defined" is sloppy wording on my side, I meant it does not allow for an interpretation as a frequency. My question is about the overall significance of the result shown, and about the confusing comments in the textbooks, as in the comment above, many authors seem to state something to the effect that this law connects probabilities to frequencies, I think that you cannot conclude anything of the sort purely from the mathematical derivation of the law, and would like to have some confirmation on this. – Jarosław Rzeszótko Apr 30 '14 at 16:14
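The remark in the comments above about "a few hundred throws" can be made quantitative with the Chebyshev bound from the question. The following is a minimal sketch, assuming independent throws and the worst-case variance $\sigma^2 = p(1-p) \le 1/4$; the function names and the tolerance values are illustrative. Exact fractions are used so that no floating-point rounding enters the arithmetic.

```python
from fractions import Fraction
import math

def chebyshev_bound(n, eps):
    """Worst-case Chebyshev bound on P(|M - p| > eps), using sigma^2 <= 1/4.

    Pass eps as a string such as "0.1" to keep the arithmetic exact.
    """
    eps = Fraction(eps)
    return min(Fraction(1), Fraction(1, 4) / (n * eps ** 2))

def trials_needed(eps, delta):
    """Smallest n for which the worst-case bound is at most delta."""
    eps, delta = Fraction(eps), Fraction(delta)
    return math.ceil(Fraction(1, 4) / (eps ** 2 * delta))

print(chebyshev_bound(300, "0.1"))   # bound for 300 throws and tolerance 0.1
print(trials_needed("0.1", "0.05"))  # throws needed for a bound of 5%
```

Chebyshev's inequality is crude, so the number of throws it demands is conservative. Like the theorem itself, the calculation also presupposes the probability model rather than justifying it.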
1

This became rather long for a comment.

I think we have here a model which was designed to be a good model of real-life problems. The real decision which is being made when using the ideas of probability is that this is a good model. And if there is reason to believe it is a good model, then we believe that the consequences we derive from the model apply to our real life situation.

Here this is not just that the limit is $p$ but also that the variation in outcomes is insufficient to prevent there being a limit. How would we measure probability in the real world without such a limit existing, you ask. Well a priori it is conceivable that our situation is wholly symmetrical between outcomes, with no reason to believe that one is preferred to another, yet the inherent variability of the process is so great that no limit exists. If our model applies, this cannot happen. And the model does apply, and constrains our ability to think what the world might be like if it didn't.

If the model didn't work so well, we'd be using a different model.

Mark Bennet
0

Take a look at Burdzy's book,

The Search for Certainty: On the Clash of Science and Philosophy of Probability.

  • 1
    The book is titled "The Search for Certainty: On the Clash of Science and Philosophy of Probability" (for further reference) – Julien__ May 01 '18 at 18:55
0

As in my answer to the other question, in a sense you are correct.

You are correct because the LLN or WLLN is obviously derivable from the mathematical axioms (I presume this is what one could call a "weak" tautology). However, it is a separate question whether the converse also holds (that is, whether assuming the LLN or WLLN one can derive the axioms of probability), and this can be answered in the negative, since the simple axioms of probability also hold in cases other than the LLN or WLLN.

Nikos M.