In several probability textbooks I have found what amounts to the following argument:
Let $A$ be an event in some probabilistic experiment, and let $p = P(A)$ be the probability that $A$ occurs on a single trial. Suppose the experiment is repeated independently $n$ times, and let $M$ be the fraction of the $n$ trials on which $A$ occurs:
$M = \frac{X_1+...+X_n}{n}$
where $X_i$ is 1 if $A$ occurs on the $i$-th trial and 0 otherwise; in particular $E[X_i]=p$ and $Var[X_i]=\sigma^2=p(1-p)$. From simple properties of expectation and variance (independence of the trials is used for the variance):

$E[M] = \frac{E[X_1+...+X_n]}{n} = \frac{E[X_1]+...+E[X_n]}{n} = \frac{np}{n} = p$

$Var[M] = \frac{Var[X_1+...+X_n]}{n^2} = \frac{Var[X_1]+...+Var[X_n]}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$
So using Chebyshev's inequality:
$P(|M-p|>\epsilon) \le \frac{\sigma^2}{n\epsilon^2}$
And so:
$\lim_{n \to \infty} P(|M-p|>\epsilon) = 0$
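As a small aside, not part of the textbook argument being discussed: since each $X_i$ is an indicator variable, $\sigma^2 = p(1-p) \le 1/4$, so the Chebyshev bound can be written without reference to the (generally unknown) variance:

```latex
% Var[X_i] = p(1-p) is maximized at p = 1/2, so sigma^2 <= 1/4.
% Substituting into the Chebyshev bound gives a distribution-free estimate:
\[
P(|M - p| > \epsilon)
  \le \frac{p(1-p)}{n\epsilon^2}
  \le \frac{1}{4n\epsilon^2}
  \longrightarrow 0 \quad (n \to \infty).
\]
```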
It is often claimed that this derivation links the mathematical theory of probability with the concept of frequency, but I think this is not true, and that the derivation is either pointless or tautological. Consider the following: if you proceed purely from the mathematical axioms, the result holds in an abstract sense, but there is no logical reason for the quantities involved to have the interpretations we intuitively give them. In particular, one cannot interpret $M$ as a frequency of occurrence without adding an additional axiom specifying what $P(A)$ is; at least this is how it seems to me.
On the other hand, if you adopt the frequency interpretation of probability, then the moment you say "let $p=P(A)$ be the probability of $A$" you are already assuming the existence of a single number $p$ that is the limit of the relative frequency of occurrence of the event $A$, which amounts to placing

$\lim_{n \to \infty} P(|M-p|>\epsilon) = 0$

among the axioms.
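None of this touches the interpretational question, but the model-internal statement itself is easy to check by simulation. Here is a minimal sketch (the parameters $p=0.3$, $\epsilon=0.05$, and the sample sizes are my own choices, not taken from any of the quoted texts): the simulated deviation probabilities should stay below the Chebyshev bound and shrink as $n$ grows.

```python
import random

# Monte Carlo estimate of P(|M - p| > eps) for Bernoulli(p) trials,
# compared with the Chebyshev bound sigma^2 / (n * eps^2).
random.seed(0)

p = 0.3
sigma2 = p * (1 - p)      # Var[X_i] for an indicator variable
eps = 0.05
reps = 500                # Monte Carlo repetitions per sample size

estimates = {}
for n in [100, 1000, 10000]:
    deviations = sum(
        abs(sum(random.random() < p for _ in range(n)) / n - p) > eps
        for _ in range(reps)
    )
    estimates[n] = deviations / reps
    bound = sigma2 / (n * eps ** 2)
    print(f"n={n:6d}  estimated P(|M-p|>eps) = {estimates[n]:.4f}  "
          f"Chebyshev bound = {bound:.4f}")
```

Of course, as argued above, such a simulation only confirms that the model is consistent with itself; it says nothing about physical frequencies.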
Ideally I would like to hear from someone familiar with mathematical logic or with research in the foundations of mathematics, where issues of this kind are examined. While in areas like set theory there are volumes written about such questions, in probability theory I have not found a single work on the mathematical logic of the subject besides Kolmogorov's Grundbegriffe, although there are plenty of philosophical books about the various ways of interpreting probability. My questions are the following:
Is my reasoning correct?
Is there any reason I am missing for which this derivation is important or interesting in some sense?
Are there any works that examine probability theory from the standpoint of mathematical logic, where issues of this kind are made more clear?
For reference, textbooks are either very mysterious about this or avoid motivating or interpreting the result altogether. In Jim Pitman's "Probability", page 101, this is called a "mathematical confirmation of our intuitive idea of probability as a limit of long-run sequences". In Bertsekas and Tsitsiklis, page 270, $M$ is called the empirical frequency, and it is said that "Loosely speaking, this allows us to conclude that empirical frequencies are faithful estimates of p. Alternatively, this is a step towards interpreting the probability p as the frequency of occurrence of A." Mark Kac, in "Probability and Related Topics in Physical Sciences", page 4, writes:
Actually, the theorem says disappointingly little. All it says, in fact, is the following: If the probability of a certain event was calculated in accordance with certain assumptions and rules, then the probability (again calculated, according to the same assumptions and rules) that the frequency with which the event will occur in a large assembly of trials will differ significantly from the calculated probability is low.
In the notes for a probability theory course by Rota and Baclawski, the interpretation seems more similar to what I have written above:
This is essentially just a psychological theorem, for it does not provide the information necessary for concrete applications. The Central Limit Theorem is far more useful and in fact the law of large numbers is a consequence of the Central Limit Theorem. We leave the proof as an exercise.
In any case the law of large numbers is a purely mathematical theorem. In order for it to make sense we must already have the concepts of probability, random variables, means, variances, etc. We cannot use this as a definition of probability. But we cannot even use the law of large numbers as a justification of the frequentist point of view. This point of view says that probabilities represent a physically measurable quantity (at least in principle). But there is no concept of a physical "measurement" corresponding to the mathematical concept of the limit:
$\lim_{n \to \infty} \frac{X_1+...+X_n}{n}$
The relationship between physical experiments and the theory of probability is much more subtle than the frequentist point of view would have one believe.
Finally, Grinstead and Snell write what seems also very reasonable, but not very precise:
The Law of Large Numbers, which is a theorem proved about the mathematical model of probability, shows that this model is consistent with the frequency interpretation of probability.
Sorry for the partial repost, but apparently I was not able to phrase my issue clearly enough there; having seen the attempted answers and having thought about it more, I think I can now formulate it in a way that allows a proper answer.
– Jarosław Rzeszótko May 01 '14 at 20:34