64

Does this reflect the real world and what is the empirical evidence behind this?

[Wikipedia illustration of the law of large numbers]

Layman here so please avoid abstract math in your response.

The Law of Large Numbers states that the average of the results from multiple trials will tend to converge to its expected value (e.g. 0.5 in a coin toss experiment) as the sample size increases. The way I understand it, while the first 10 coin tosses may result in an average closer to 0 or 1 rather than 0.5, after 1000 tosses a statistician would expect the average to be very close to 0.5 and definitely 0.5 with an infinite number of trials.

Given that a coin has no memory and each coin toss is independent, what physical laws would determine that the average of all trials will eventually reach 0.5? More specifically, why does a statistician believe that a random event with 2 possible outcomes will have a close-to-equal amount of both outcomes over, say, 10,000 trials? What prevents the coin from falling 9900 times on heads instead of 5200?

Finally, since gambling and insurance institutions rely on such expectations, are there any experiments that have conclusively shown the validity of the LLN in the real world?

EDIT: I do differentiate between the LLN and the Gambler's fallacy. My question is NOT if or why any specific outcome or series of outcomes become more likely with more trials--that's obviously false--but why the mean of all outcomes tends toward the expected value?

FURTHER EDIT: LLN seems to rely on two assumptions in order to work:

  1. The universe is indifferent towards the result of any one trial, because each outcome is equally likely
  2. The universe is NOT indifferent towards any one particular outcome coming up too frequently and dominating the rest.

Obviously, we as humans would label a 50/50 or similar distribution in a coin toss experiment "random", but if heads or tails turned out to be, say, 60-70% after thousands of trials, we would suspect there is something wrong with the coin and that it isn't fair. Thus, if the universe is truly indifferent towards the average of large samples, there is no way we can have both true randomness and consistent predictions--there will always be a suspicion of bias unless the total distribution is somehow kept in check by something that preserves the relative frequencies.

Why is the universe NOT indifferent towards big samples of coin tosses? What is the objective reason for this phenomenon?

NOTE: A good explanation would not be circular: justifying probability with probabilistic assumptions (e.g. "it's just more likely"). Please check your answers, as most of them fall into this trap.

  • 49
    Empirically and Proven are kind of opposite terms in a sense, aren't they? – barak manos Jan 29 '15 at 15:44
  • Sure, but you know that by "empirical" I mean "well supported by evidence" :) – vantage5353 Jan 29 '15 at 15:45
  • A statistician will believe that a random event with 2 possible outcomes will have a close to equal amount of both outcomes only when both the outcomes have the same probability of happening. – Hayden Jan 29 '15 at 15:47
  • Exactly, Hayden. But why would nature care to even out the amount of both outcomes even if they are equally likely every trial? – vantage5353 Jan 29 '15 at 15:50
  • 2
    "What prevents the coin to fall 9900 times on heads instead of 5200?" If the coin is fair, nothing prevents the 9900 result, it can happen. it's just much less likely to happen if you flipped 10,000 fair coins repeatedly. –  Jan 29 '15 at 15:50
  • trb456, but why would nature do that? Mathematically it's less likely, but why would physics comply? – vantage5353 Jan 29 '15 at 15:54
  • If you have infinitely many number of samples.. can you have? – Seyhmus Güngören Jan 29 '15 at 15:54
  • Seyhmus, statisticians rely on this assumption with sample sizes as small as 1,000 trials. Reference: Wikipedia chart – vantage5353 Jan 29 '15 at 15:56
  • @user1891836 Exactly what trb456 said. It could happen, it could not happen. But as $n\rightarrow \infty$, the average will tend towards the expected value. All it takes is to do such an experiment, like the graphic you include. (At a macroscopic scale, i.e. ignoring quantum mechanics) why do things either happen or don't happen? Logic still seems to apply rather well in everyday life, so why shouldn't we expect probability theory to model that well either? – Hayden Jan 29 '15 at 15:56
  • Hayden, it is precisely my logic that can't substantiate the idea that things somehow balance out in the very long run. Except for Divine command, I don't see how you can support the position that equilibrium is a natural tendency. – vantage5353 Jan 29 '15 at 15:59
  • 27
    @user1891836 Please note that the law of large numbers does not suggest that "the amounts of heads and tails will eventually even out": with 2000 coin tosses, it is very unlikely to see tails 1000 times more than heads, but with 1,000,000 coin tosses it could very well happen. The law of large numbers only says that the deviation grows slower than the number of coin tosses, thus the proportions of heads and tails, not the amounts, will even out in the long run. – JiK Jan 29 '15 at 16:08
  • 6
    There's no tendency for it to "even out" in the sense that if you saw more heads at the beginning, you'd see more tails at the end. Suppose you flipped a fair coin 10 times and it came up heads 10 times. If you flipped 100 more times, then the expected number of heads altogether would be 60, not 55. The coin doesn't secretly know that it came up heads too many times and fixes it up by coming up tails more later. – arsmath Jan 29 '15 at 16:19
  • 24
    Finally, since gambling and insurance institutions rely on such expectations, are there any experiments that have conclusively shown the validity of the LLN in the real world? - Isn't the fact that insurance companies are really solid businesses with billions of dollars of revenue one of the best empirical proofs you can imagine? I mean, try to conduct a study with hundreds of millions of people for 100 years in a laboratory :P – Ant Jan 29 '15 at 16:44
  • @JiK I understand the Gambler's fallacy, and I know the probabilities are the same the 1st trial or the 1001st. I want to know *why the average of all trials tends towards the expected*. ;) – vantage5353 Jan 29 '15 at 17:08
  • 5
    If a visualisation would help, think of a Galton Board: http://i.ytimg.com/vi/oPCcOtQKU8M/hqdefault.jpg

    Each path through the board has an equal probability. But a lot more of the paths end in the centre than at the edge!

    – Ben Aaronson Jan 30 '15 at 15:06
  • @BenAaronson Great illustration. Thus, wouldn't it be fair to say that a ball falling at the edge more times than in other slots is harder than it falling in the center? – vantage5353 Jan 31 '15 at 10:45
  • 1
    You seem to have the intuition of a single coin flip down, so here's a way to think about it. Think of a set of 10,000 coin flips as a trial with two outcomes (just like a coin flip): either the fraction of heads in those 10,000 flips is (i) close to 50% (ii) not close. Unlike a coin flip, these two outcomes do not have the same probability. It is much more probable that the fraction of heads will be close to 50%. So, sure, there's no reason you couldn't flip 10,000 coins and get 60% heads. But you might have to do hundreds of trials of 10,000 coin flips to get just one set with 6,000 heads. – d_b Jan 31 '15 at 11:28
  • @user37496 Sure, but we are trying to get to the physical reason for such probability to objectively exist in the first place. If the universe was truly indifferent towards all trials and all sample sizes, probability theory would be useless, because predicting a certain average would be just as likely as predicting another. What bothers me is that LLN assumes the universe is NOT indifferent towards large samples; that's why we expect the average to conform to our expectations. Thus, you can't have randomness without the universe "protecting" it in the long run. – vantage5353 Jan 31 '15 at 11:39
  • I'm surprised that you accepted a great answer yesterday, but are still arguing against it today. I'm surprised that you asked on math.SE, but keep asking for a physical explanation. The closest I can give you to a physical explanation is the many-worlds interpretation. Imagine a set of 16 universes in which the coin flips go: HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH, HTTT, THHH, THHT, THTH, THTT, TTHH, TTHT, TTTH, TTTT. If you are born into one of these at random, there are 6 in which you get equal heads/tails but only 1 in which you get all heads. – DeveloperInDevelopment Jan 31 '15 at 14:13
  • @imsotiredicantsleep Obviously, my question wasn't well formulated by me and was misunderstood by most. The true problem is not whether the axioms are right, but why the universe would care about a large sample's average. The combinatorial explanation is only half satisfactory because it doesn't address what preserves the overall randomness over lots of trials. There is no need to reiterate the obvious: there are more combinations with less strict criteria. – vantage5353 Jan 31 '15 at 15:01
  • "There is no need to reiterate the obvious" - actually, I think there is. The answers you already have are good, but perhaps they come from a position of familiarity with the material and a mind-set that you don't share. I don't think you will get any different answers, only restatements of answers already given - hopefully one framed within a paradigm you are comfortable with. You have the complete answer between the combinatorial answer (for whole repeated trials) and Erick Wong's second paragraph (for increasing N). If you don't follow these, they need restating. – DeveloperInDevelopment Jan 31 '15 at 16:21
  • 1
    @user1891836: Trying to explain from a different angle: the universe 100% doesn't care about preserving overall randomness. There is no "elastic" pulling the mean back. In fact, imagine a coin-flip game where you score +1 for H and -1 for T. Your expected mean score is 0. Suppose on your first throws you get HHHH, scoring +4. Now your expected average for the whole game is actually +4 - not 0! But the expected average for the rest of the game is still 0. Over the course of, say, 1000 throws, your +4 advantage will contribute 4/1000 of the average, and the rest will count for 996/1000. – psmears Jan 31 '15 at 18:19
  • 1
    To put it another way - the initial +4 imbalance gets reduced to 0 not because the universe somehow contrives to balance it with a -4, but because over a large number of trials it counts for less and less (4/10, 4/1000, 4/1000000...). A large average discrepancy always remains possible, but it becomes less and less likely, since it can only happen with a larger and larger number of heads, the probability of which decreases exponentially. – psmears Jan 31 '15 at 18:19
  • With the coin flip the (near) 50/50 split is a consequence of the physical fact that the outcome of the flip is extremely sensitive to the tiny differences in the initial motion given to the coin by the flipping action. This is unlike the motion of, say, the football thrown by an experienced quarterback. In that case the variations to the motion are small enough so that a receiver can adjust. – Jyrki Lahtonen Jan 31 '15 at 18:25
  • Related: https://math.stackexchange.com/q/777493/96384 – Torsten Schoeneberg Apr 03 '23 at 16:13

17 Answers

70

Reading between the lines, it sounds like you are committing the fallacy of the layman interpretation of the "law of averages": that if a coin comes up heads 10 times in a row, then it needs to come up tails more often from then on, in order to balance out that initial asymmetry.

The real point is that no divine presence needs to take corrective action in order for the average to stabilize. The simple reason is attenuation: once you've tossed the coin another 1000 times, the effect of those initial 10 heads has been diluted to mean almost nothing. What used to look like 100% heads is now a small blip only strong enough to move the needle from 50% to 51%.

Now combine this observation with the easily verified fact that 9900 out of 10000 heads is simply a less common combination than 5000 out of 10000. The reason for that is combinatorial: there is simply less freedom in hitting an extreme target than a moderate one.

To take a tractable example, suppose I ask you to flip a coin 4 times and get 4 heads. If you flip tails even once, you've failed. But if instead I ask you to aim for 2 heads, you still have options (albeit slimmer) no matter how the first two flips turn out. Numerically we can see that 2 out of 4 can be achieved in 6 ways: HHTT, HTHT, HTTH, THHT, THTH, TTHH. But the 4 out of 4 goal can be achieved in only one way: HHHH. If you work out the numbers for 9900 out of 10000 versus 5000 out of 10000 (or any specific number in that neighbourhood), that disparity becomes truly immense.
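If it helps to see the counting argument concretely, here is a small Python sketch (my addition, not part of the original answer) that enumerates the 16 four-flip sequences and then compares the counts for 5000-of-10000 versus 9900-of-10000 using binomial coefficients:

```python
from itertools import product
from math import comb, log10

# All 2^4 = 16 sequences of four flips, grouped by how many heads they contain.
by_heads = {}
for seq in product("HT", repeat=4):
    by_heads.setdefault(seq.count("H"), []).append("".join(seq))

print(by_heads[4])   # ['HHHH']       -- exactly one way to hit 4 heads out of 4
print(by_heads[2])   # six sequences  -- six ways to hit 2 heads out of 4

# The same disparity at n = 10,000, measured in orders of magnitude:
n = 10_000
gap = log10(comb(n, 5_000)) - log10(comb(n, 9_900))
print(round(gap))    # roughly 2766: ~10^2766 times as many ways to hit 5000 heads as 9900
```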

To summarize: it takes no conscious effort to get an empirical average to tend towards its expected value. In fact it would be fair to think in the exact opposite terms: the effect that requires conscious effort is forcing the empirical average to stray from its expectation.

Erick Wong
  • I'm aware of the Gambler's fallacy--that's why I discuss the mean of all tosses rather than any specific toss or series of tosses within the sample. I understand that you can't expect any one or series of tosses to correct, but the total average has a peculiar tendency towards the expected value as N increases. Your combinatorial example makes sense, and is something I didn't consider. Could you elaborate on the physics of "less freedom in hitting an extreme target than a moderate one"? – vantage5353 Jan 29 '15 at 16:35
  • 6
    @user1891836 You don't need physics for this result, just combinatorics. In this context "freedom in hitting a target" is just a count of how many ways you can possibly succeed in hitting that target. – Keen Jan 29 '15 at 16:39
  • 1
    @user1891836 To be clear, the physics do determine how each coin-flip plays out in the real world, yes, but we can abstract the physics away once we have a higher-level idea about how each coin-flip behaves. – Keen Jan 29 '15 at 16:42
  • OK. But why do more options necessarily make an outcome more likely in the real world? If every trial is independent of previous ones, why would the mean of any sample -- no matter how big -- tend towards the expected value? – vantage5353 Jan 29 '15 at 16:49
  • 3
    Essentially, because for very large numbers of coin flips, the probability of seeing anything far away from the mean is very, very small. – arsmath Jan 29 '15 at 17:06
  • 4
    @user1891836 The scenario you described includes the assumption that each coin-flip has an equal probability of producing the outcome heads as the outcome tails. That's for a single trial. When you look at two independent trials, you combine the two outcomes of two single trials to get a total of four possible outcomes: HH, HT, TH, TT. Each outcome has equal probability because the single-trial outcomes had equal probability and the trials are independent. Notice that half of the two-trial outcomes are balanced, even though zero of the single-trial outcomes are. Then imagine more trials. – Keen Jan 29 '15 at 17:57
  • @Erick Wong Still trying to take it all in, but if I understood correctly, it simply takes more effort to make things happen non-probabilistically than the other way round. As the range of possible combinations decreases, it becomes harder for a physical system to conform to the unexpected. Is this correct? – vantage5353 Jan 30 '15 at 10:29
  • @Cory Still trying to take it all in, but if I understood correctly, it simply takes more effort to make things happen non-probabilistically than the other way round. As the range of possible combinations decreases, it becomes harder for a physical system to conform to the unexpected. Is this correct? – vantage5353 Jan 30 '15 at 10:47
  • 1
    @user1891836 It's a little hard to translate that final question into concrete terms. Who's making this "effort", and what does it mean for it to be "harder"? Do you understand (intellectually and intuitively) the example with 4 coin flips? – Ben Aaronson Jan 30 '15 at 13:37
  • 3
    @user1891836 There's no effort involved. There is no such thing as harder or easier. The single basic fact is that each coin-flip is fair and independent, because you chose to ask about fair, independent coin-flips. If there are 1000 possible outcomes, and they're all equally probable, then every outcome has a 0.1% probability. If you group together 97 outcomes, the probability of getting an outcome in that group is 9.7%, the sum of the individual probabilities. Now suppose that 500 out of 1000 outcomes are balanced. What is the probability of a selected outcome being in this balanced group? – Keen Jan 30 '15 at 19:52
  • @BenAaronson I understand that multiple options increase the likelihood of the event, but this assumes the universe is not only indifferent towards what the outcome of a single trial is going to be, but also that is NOT indifferent towards any particular outcome happening too frequently relative to others. If LLN is valid, you can't go around this. – vantage5353 Jan 31 '15 at 10:51
  • @user1891836 If I ask you to flip a coin three times, there are the outcomes HHH,HHT,HTH,HTT,THH,THT,TTH,TTT. How many outcomes are there of each type? There are simply more for a closer H-T split. – Answer Jan 31 '15 at 13:14
  • 1
    @user1891836 The universe is completely indifferent towards any particular outcome - the whole idea relies on the assumption that two coin tosses are independent. If you have 100 coins, there are about 10^30 different ways to get between 45 and 55 heads. On the other hand there are only ~10^13 different throws that end up giving you 90-100 heads. Every single one of those sequences of coin tosses is equally likely, but there are just way more that lead to the first result than the latter. – Voo Jan 31 '15 at 15:12
  • 1
    @Voo, Yes, but more options don't guarantee an improbable event should not happen or should happen less frequently. And since LLN is so pervasively used, it assumes most of the time, most trials would result in the expected, which leads to the probabilist assumption that the universe somehow "prefers" expected averages more than unexpected. Otherwise, we'd say there is a bias or the phenomenon is not truly random. I.e. you can't say the universe never cares, yet believe in LLN. You can't have it both ways. – vantage5353 Jan 31 '15 at 15:17
  • 3
    @user1891836 The universe doesn't have to care. Let's try another way: Imagine a bag full of red and blue balls: A blue ball represents one way how you could get between 4 and 6 tails when tossing 10 coins, a red ball represents one way how you could get between 8 and 10 tails. In this rather large bag there will be 728 balls, but if you randomly grab one ball without looking your chances to get a blue ball are much bigger since there are 672 blue balls but only 56 red ones. – Voo Jan 31 '15 at 15:31
  • @Voo But if the universe doesn't care about somehow preserving the randomness in the long run, you won't have LLN! One time the average of 10,000 trials could be 50.67, yet another 73, yet another 64. This would usually lead one to think there is a bias, but in a truly indifferent universe, you wouldn't have a consistent average no matter how big the sample. There is no need to reiterate the obvious: there are more possibilities with less strict criteria. This shouldn't guarantee the actual results, however. – vantage5353 Jan 31 '15 at 15:38
  • 3
    @user1891836 If it's obvious that it's billion, billion times more likely to get a result close to the average than an edge case, what are you surprised about then? Your chances for getting an average of 73 is infinitesimally small compared to getting something close to 50 which is why you are unlikely to ever see it - it could happen though and for such a still relatively small number if you started now until the end of the universe it'd be even relatively likely to observe such a result. – Voo Jan 31 '15 at 15:47
20

Nice question! In the real world, we don't get to let $n \to \infty$, so the question of why the LLN should be of any comfort is important.

The short answer to your question is that we cannot empirically verify the LLN, since we can never perform an infinite number of experiments. It's a theoretical idea that is very well founded, but, like all applied mathematics, the question of whether or not a particular model or theory holds is a perennial concern.

A more useful law from a statistical standpoint is the Central Limit Theorem, together with the various probability inequalities (Chebyshev, Markov, Chernoff, etc.). These allow us to place bounds on, or approximate, the probability of our sample average being far from the true value for a finite sample.

As for an actual experiment to test the LLN, one can hardly do better than John Kerrich's 10,000-toss coin flip experiment: he got 50.67% heads!
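For anyone who would rather not flip a coin 10,000 times by hand, a quick simulation in the same spirit (my own sketch, not Kerrich's data) is easy to run:

```python
import random

random.seed(42)   # any seed; the exact count varies from run to run
n = 10_000
heads = sum(random.random() < 0.5 for _ in range(n))
print(heads, heads / n)   # typically lands within about a percentage point of 0.5
```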

So, in general, I would say LLN is empirically well supported by the fact that scientists from all fields rely upon sample averages to estimate models, and this approach has been largely successful, so the sample averages appear to be converging nicely for finite, and feasible, sample sizes.

There are "pathological" cases that one can construct (I'll spare you the details) where one needs astronomical sample sizes to get a reasonable probability of being close to the true mean. This is apparent if you are using the Central Limit Theorem, but the LLN is simply not informative enough to give me much comfort in day-to-day practice.

The physical basis for probability

It seems you still have an issue with why long-run averages exist in the real world, apart from what probability theory says about the behavior of these averages once we assume they exist. Let me state a fact that may help you:

Fact: Neither probability theory nor the existence of a long-run average requires randomness!

The determinism vs. indeterminism debate is for philosophers, not mathematicians. The notion of probability as a physical observable comes from ignorance or absence of the detailed dynamics of what you are observing. You could just as easily apply probability theory to a boring ol' pendulum as to the stock market or coin flips...it's just that with pendulums we have a nice, detailed theory that allows us to make precise estimates of future observations. I have no doubt that a full physical analysis of a coin flip would allow us to predict which face would come up...but in reality, we will never know this!

This isn't an issue though. We don't need to assume a guiding hand nor true indeterminism to apply probability theory. Let's say that coin flips are truly deterministic; then we can still apply probability theory meaningfully if we assume a couple of basic things:

  1. The underlying process is ergodic...okay, this is a bit technical, but it basically means that the process dynamics are stable over the long term (e.g., we are not flipping coins in a hurricane, or where tornadoes pop in and out of the vicinity!). Note that I said nothing about randomness...this could be a totally deterministic, albeit very complex, process...all we need is that the dynamics are stable (i.e., we could write down a series of equations with specific parameters for the coin flips and they wouldn't change from flip to flip).
  2. The values the process can take on at any time are "well behaved". Basically, like I said earlier with respect to the Cauchy (see my other answer below), the system should not produce values that consistently exceed roughly $n$ times the sum of all previous observations. It may happen once in a while, but it should become very rare, very fast (the precise definition is somewhat technical).

With these two assumptions, we now have the physical basis for the existence of a long-run average of a physical process. Now, if its complicated, then instead of using physics to model it exactly, we can apply probability theory to describe the statistical properties of this process (i.e., aggregated over many observations).

Note that the above is independent of whether or not we have selected the correct probability model. Models are made to match reality...reality does not conform itself to our models. Therefore, it is the job of the modeler, not nature or divine providence, to ensure that the results of the model match the observed outcomes.

Hope this helps clarify when and how probability applies to the real world.

  • John Kerrich's experiment is a fascinating example. If the LLN is valid for 10,000+ coin tosses, there is an obvious link between n and the mean value, which is in total conflict with the unpredictability of a single toss or a small number of tosses. – vantage5353 Jan 29 '15 at 16:18
  • 2
    @user1891836 I'm sorry, I don't follow your reasoning. The LLN is always mathematically valid, and yes, the observed average is quite close to that of a fair coin (assuming the coin John Kerrich was using was indeed fair). There's a bit of a chicken-or-the-egg issue here...what are we assuming and what is being tested? –  Jan 29 '15 at 16:21
  • 2
    @user1891836 you appear mystified that some physical processes exhibit stability over the long term. If you interpret a probability as a frequency of occurrence of an event, then any process that exhibits periodic stability can be assigned meaningful probability statements. There is a more technical notion of "Ergodic" processes that extends this to non-periodic processes. I won't get into it, but I think taking a look at Chaos theory will help show you why we can use probability and why it works. –  Jan 29 '15 at 16:25
  • I'm interested in the physical evidence. For practical purposes, let's assume 10,000 tosses have a mean of 0.5067 as Kerrich empirically verified. Why would a statistician expect a similar result in another such trial. It could turn out that the mean is 0.58 or 0.67. In other words, what is the physical law that attracts the mean toward 0.5? – vantage5353 Jan 29 '15 at 16:27
  • 2
    @user1891836 simple example: you are observing a pendulum; then the probability that it forms an angle $\theta$ with respect to the vertical is equal to the probability that it forms an angle $-\theta$. A process does not have to be unpredictable to have a frequentist probability. –  Jan 29 '15 at 16:27
  • @user1891836 as others here have stated, nothing is "attracting it". Considering 10,000 tosses - imagine getting an average >0.7 when it's a fair coin...this would require getting substantially more heads than normal, so it won't occur as often. It still does, but, if you conducted many experiments of tossing 10,000 coins, most would be near the true probability of heads. How many trials we need is partly what keeps statisticians in business ;-) –  Jan 29 '15 at 16:32
  • 1
    I see what you are getting at. I guess I have a hard time swallowing that the mean is empirically close to expected value. :) – vantage5353 Jan 29 '15 at 16:39
  • @user1891836 no worries...probability is not intuitive. Also, the empirical justification for using statistics (or any applied mathematics), is that it gives good results! If it didn't, we'd have stopped writing equations and taking sample averages a long time ago. Sometimes you just have to try something out and see if it tends to work, then hope that whaetver was driving the dynamics does not significantly change. –  Jan 29 '15 at 16:42
  • @user1891836: One point I think you may not have understood. Just because John Kerrich got near 50% does not mean it was attracted there. Even according to our probability theory model of the real world, it is just one out of many possible outcomes. What one person gets does not in the slightest prove or disprove empirically the theory. If he had gotten 90%, it would have meant nothing for the theory, except that he ought to get suspicious of the coin even though it is not provably biased based on just the coin flips! – user21820 Jan 30 '15 at 02:57
  • @user1891836 I've updated this post too. Seems you still have some concerns. –  Jan 31 '15 at 19:35
16

This isn't an answer, but I thought this group would appreciate it. Just to show that the behavior in the graph above is not universal, I plotted the sequence of sample averages for a standard Cauchy distribution for $n = 1, \ldots, 10^6$. Note how, even at extremely large sample sizes, the sample average jumps around.

If my computer weren't so darn slow, I could increase this by another order of magnitude and you'd not see any difference. The sample average for a Cauchy Distribution behaves nothing like that for coin flips, so one needs to be careful about invoking LLN. The expected value of your underlying process needs to exist first!

[Figure: running sample average of $10^6$ standard Cauchy draws]

Response to OP concerns

I did not bring this example up to further concern you, but merely to point out that "averaging" does not always reduce the variability of an estimate. The vast majority of the time, we are dealing with phenomena that possess an expected value (e.g., coin tosses of a fair coin). However, the Cauchy is pathological in this regard, since it does not possess an expected value...so there is no number for your sample averages to converge to.

Now, many moons ago when I first encountered this fact, it blew my mind...and shook my confidence in statistics for a short time! However, I've come to be comfortable with this fact. At the intuitive level (and as many of the posters here have pointed out) what the LLN relies upon is the fact that no single outcome can consistently dominate the sample average...sure, in the first few tosses the outcomes do have a large influence, but after you've accumulated $10^6$ tosses, you would not expect the next toss to change your sample average from, say, 0.1 to 0.9, right? It's just not mathematically possible.

Now enter the Cauchy distribution...it has the peculiar property that, no matter how many values you are currently averaging over, the absolute value of the next observation has a good (i.e., not vanishingly small - this part is somewhat technical, so maybe just accept this point) chance of being larger (much larger, in fact) than $n$ times the sum of all previous values observed...take a moment to think about this; it means that at any moment, your sample average can be converging to some number, then WHAM, it gets shot off in a different direction. This will happen infinitely often, so your sample average will never settle down like it does with processes that possess an expected value (e.g., coin tosses, normally distributed variables, Poisson, etc.). Thus, you will never have an observed sum and an $n$ large enough to swamp the next observation.

I've asked @sonystarmap if he/she would mind calculating the sequence of medians, as opposed to the sequence of averages, in their post (similar to my post above, but for 100x more samples!). What you should see is that the median of a sequence of Cauchy random variables does converge in LLN fashion. This is because the Cauchy, like all random variables, does possess a median. This is one of the many reasons I like using medians in my work, where Normality is almost surely (sorry, couldn't help myself) false and there are extreme fluctuations. Not to mention that the sample median minimizes the average absolute deviation, even when the mean does not exist.
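If you want to see both behaviors side by side, here is a minimal simulation sketch (mine, in Python rather than the R/Excel and Matlab workflows mentioned elsewhere in this thread) of the running mean and running median of standard Cauchy draws:

```python
import math
import random
import statistics

random.seed(1)

def cauchy_draw():
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is a standard Cauchy variate.
    return math.tan(math.pi * (random.random() - 0.5))

draws = []
for n in (10**3, 10**4, 10**5, 10**6):
    draws.extend(cauchy_draw() for _ in range(n - len(draws)))
    print(n, round(statistics.fmean(draws), 3), round(statistics.median(draws), 3))

# The running mean typically jumps around erratically from row to row (there is no
# expected value for it to converge to), while the running median settles down near 0.
```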

Second Addition: Cauchy DOES have a Median

To add another detail (read: wrinkle) to this story, the Cauchy does have a median, and so the sequence of medians does converge to the true median (i.e., $0$ for the standard Cauchy). To show this, I took the exact same sequence of standard Cauchy variates I used to make my first graph of the sample averages, and then took the first 20,000 and broke it up into four intervals of 5000 observations each (you'll see why in a moment). I then plotted the sequence of sample medians as the sample size approaches 5000 for each of the four independent sequences. Note the dramatic difference in convergence properties!

This is another application of the law of large numbers, but to the sample median. Details can be seen here.

[Figure: running sample medians for four independent blocks of 5000 standard Cauchy draws]

  • 1
    Good point. You can't very well compare your average result with $\mu$ if $\mu$ is undefined. – KSmarts Jan 29 '15 at 22:32
  • What did you use to produce this graph? – detly Jan 30 '15 at 01:15
  • @detly I used R to produce the Cauchy variates, then did the averages and graph itself in Excel. –  Jan 30 '15 at 01:35
  • 1
    @Eupraxis1981 Of what use is LLN with such an example?! – vantage5353 Jan 30 '15 at 09:48
  • 3
    @user1891836 Are you questioning (1) The general usefulness of LLN given that such an example exists, (2) how one would use the LLN in this example...if at all, or (3) the relevance of my example to your concerns? –  Jan 30 '15 at 13:23
  • All three questions are interesting :) – Piotr Dobrogost Jan 31 '15 at 10:28
  • @Eupraxis1981 I am questioning how LLN can be valid with such an inconsistent average. I am also seriously bothered by what makes randomness random--i.e. what universal law protects the balance of outcomes in the long run. – vantage5353 Jan 31 '15 at 11:12
  • There is no universal law that balances the process. The process (e.g. coin flipping) balances itself, since every coin flip is independent of all prior flips; it doesn't matter if the first 100 flips were heads and not a single one was tails--in the long run these 100 flips are irrelevant. – Thomas Jan 31 '15 at 11:46
  • Just to support what @Eupraxis1981 stated. Here is the mean of a Cauchy distribution for $N=1e8$. Apparently Matlab does the computation very fast; I can't go to $1e9$ due to insufficient memory. – Thomas Jan 31 '15 at 11:57
  • @sonystarmap Very nice! Am I correct in saying that you used a single draw of $10^{8}$ Cauchy variates and then calculated the sequence of sample averages? This is an impressive number of trials. Would you mind making a similar graph, but this time plotting the sequence of medians (i.e., take the same sequence of Cauchy draws you used to produce the average plot and calculate the successive medians, e.g. $m_{25}$ would be the median of the first 25 draws in your sequence of Cauchy draws, $m_{26}$ would use the first $26$). –  Jan 31 '15 at 18:13
  • @user1891836 given that you've accepted Erick Wong's excellent answer, I am concerned that you are still concerned about this. The LLN only applies when the underlying process actually has an average value...the Cauchy doesn't; its fluctuations are too extreme, so the "dilution" effect does not happen. I'll add a little more about this in my post, below the graph. –  Jan 31 '15 at 18:29
  • @Eupraxis1981 I'm not sure I understand what you mean. I once sampled $N$ random uniform numbers and used Inverse Sampling to create $N$ Cauchy random numbers. The plot is then $\frac{1}{n}\sum_{i=1}^n x_i$ for $i=1...N$. If I understand your question correct, you want the average replaced by the median? – Thomas Jan 31 '15 at 19:01
  • @sonystarmap Ok, you did what I thought you did. Nice :) Yes, simply use the median instead of the average: $M_n=\operatorname{Med}\{x_i \mid i = 1,\ldots,n\}$ –  Jan 31 '15 at 19:26
  • @user1891836 I've updated this post. –  Jan 31 '15 at 19:34
  • @Eupraxis1981 "What the LLN relies upon is the fact that no single outcome can consistently dominate the sample average" -- I think we're going in circles around this. If you reread my original question edits, I question this precise assumption, because it naturally follows that something prevents an outcome from dominating the others. If the universe were truly indifferent, you'd have huge swings across averages. One average would be 0.51, the next could be .33 or .85, regardless of the sample size. The LLN assumes this isn't the case for coin tosses, because large samples tend towards the expected value. – vantage5353 Feb 01 '15 at 09:02
  • @user1891836 see the edits to my other post regarding physical basis for probability... It's the reason why LLN applies to certain physical processes –  Feb 01 '15 at 13:44
  • @user1891836 Also, given the large number of great responses you've received from folks here at Math.SE, it appears that you are not really concerned with the LLN, per se, but actually a more basic concern: why does the universe have laws at all? Your hypothesis that indifference $\implies$ anything goes says as much. Also, as I said in my other post...it's not nature that decides to follow the LLN, but mathematicians who decide that a given physical process satisfies the requirements for the LLN...it's an important distinction. –  Feb 01 '15 at 18:26
  • @Eupraxis1981, I have to clarify. I started a chat if you care for a discussion – vantage5353 Feb 01 '15 at 19:36
10

Based on your remarks, I think you are actually asking

"Do we observe the physical world behaving in a mathematically predictable way?"

"Why should it do so?"

Leading to:

"Will it continue to do so?"

See, for example, this Philosophy Stack Exchange question.

My take on the answer is that, "Yes", for some reason the physical universe seems to be a machine obeying fixed laws, and this is what allows science to use mathematics to predict behaviour.

So, if the coin is unbiased and the world behaves consistently, then the number of heads will vary in a predictable way.

But please note that it is not expected to converge to exactly half. In fact, the typical excess or deficit of heads grows like $\sqrt N$, which actually increases with $N$. It is the proportion of the excess relative to the total number of trials $N$ which goes to zero.

However, no one can ever prove in principle whether, for example, the universe actually has a God who decides how the coin will fall. I recall that in Peter Bernstein's book about risk, the story is told that the Romans (who did not know probability as a concept) had rules for knucklebone-based games that effectively assumed this.

Finally, if you ask which state of affairs is "well supported by evidence", the evidence available would include at least all of science and the finance industry. That's enough for most of us.

Keith
  • 1
    I agree. I might have made a mistake posting the question here, but probability theory is not a descriptive theory, yet we have a hard time disproving it in the real world. It seems, when it comes to chaotic, practically unpredictable events, the mean of all trials tends to go to the expected value, even though there is no way you can predict any one trial or series of trials. I.e. we can reasonably predict the mean of large samples, but not of small ones. That's puzzling. – vantage5353 Jan 30 '15 at 10:17
9

One has to distinguish between the mathematical model of coin tossing and factual coin tossing in the real world.

The mathematical model has been set up in such a way that it behaves provably according to the rules of probability theory. These rules do not come out of thin air: They encode and describe in the most economical way what we observe when we toss real coins.

The deep problem is: Why do real coins behave the way they do? I'd say this is a question for physicists. An important point is symmetry. If there is a clear cut "probability" for heads, symmetry demands that it should be ${1\over2}$. Concerning independence: There are so many physical influences determining the outcome of the next toss that the face the coin showed when we picked it up from the table seems negligible. And on, and on. This is really a matter of philosophy of physics, and I'm sure there are dozens of books dealing with exactly this question.

  • Would you care to elaborate on why we observe such tendency towards even amounts of the outcomes of a fair coin toss in the long run? – vantage5353 Jan 29 '15 at 16:13
  • @user1891836 Because as suggested by others, even with its inherent imperfections, the flipping of a physical coin is (typically) a very good approximation for the idealized coin-flip that mathematics defines and shows to have equal probabilities in heads vs. tails. – Coffee_Table Jan 29 '15 at 18:06
  • If you are interested in real coins, search YouTube for Persi Diaconis's talks about tossing real coins... He is the real expert on that! – kjetil b halvorsen May 29 '17 at 01:07
5

One has to distinguish between the mathematical model of coin tossing and the human intuition of it.

It is worthwhile to consider the following experiment.

A teacher divides his class into two groups. Then he gives a coin to each member of one group. Each member of this group will flip his coin, say, 100 times. Everybody will jot down the results. The members of the other group will not have coins. They will simulate the coin flipping experiment by writing down imaginary results. Then everybody puts a secret mark on his paper. Finally the papers get shuffled and the children hand the stack over to the teacher. Surprisingly, the teacher will be able to tell, with quite high certainty, who flipped coins and who just imagined the experiments. How? The runs of consecutive heads (or tails) in the real experiments are, on average, much longer than in the imaginary ones.
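To get a feel for what "real" sequences look like, here is a small sketch of mine (not part of the original answer) that estimates the typical longest run of identical outcomes in 100 fair flips, the kind of streak that people writing imaginary results tend to avoid:

```python
import random
import statistics

random.seed(0)

def longest_run(flips):
    # Length of the longest streak of identical outcomes in a sequence of flips.
    best = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

samples = [longest_run([random.random() < 0.5 for _ in range(100)]) for _ in range(5_000)]
print(round(statistics.fmean(samples), 2))   # typically around 7
```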

This demonstration, among other interesting examples, illustrates that the instinctive human understanding of random phenomena is quite unreliable.

So not only is it true that probability theory has nothing to do with reality, it does not have anything to do with human intuition either. However, falsifying the predictions of probability theory turns out to be a tiresome business, time after time. (Validating them is impossible, of course; regarding this latter feature, probability theory is not special.)

zoli
  • 1
    How would you define a "truly random" event, and if it is objectively unpredictable, why should probability theory be able to predict the average of 1,000 such events? – vantage5353 Jan 29 '15 at 18:21
  • Please, explain why you added this comment to my "answer." – zoli Jan 29 '15 at 22:07
  • 1
    You stated that humans are not very good at dealing with random phenomena, which makes it necessary to define what is really random. Second, even though probability theory is just a mathematical model, it is pervasively used to predict all sorts of events (from future car accidents to roulette spins) in the aggregate in order to make decisions. It's just weird that large samples can be predicted but small ones can't. – vantage5353 Jan 30 '15 at 09:40
  • I agree. My axiomatic answer to the first claim is: If $A$ is the set of things that humans cannot handle well then $Pr\in A$. – zoli Jan 30 '15 at 16:54
4

It looks like most of the answers are addressing the apparent (but maybe not actual) misunderstanding behind your question. I will try to give a more direct mathematical explanation. I know you said to "avoid abstract math," so I will try to explain what I'm doing.

Suppose we have a random variable $X$. Basically, this is an abstraction of a random or unpredictable event. It has multiple possible values, each with a probability that it is the result. We calculate the expected value of $X$, or $E(X)$, by multiplying each possible result by its probability and adding them together. This is also called the mean of the distribution, $\mu$.

We can also determine how "spread out" the possible values are, by calculating the variance. The variance, $\sigma^2$, is the expected value of the square of the deviation from the mean, which is how far the random variable is from its expected value. That is, the deviation is $X-\mu$, and the variance is $\sigma^2=E\left((X-\mu)^2\right)$. We also have standard deviation $\sigma$, which is the square root of the variance.

Intuitively, we can say that "most" of the time, the result of a random test will be "close" to the expected value. If we know the random variable's variance, we can define "close" in terms of the variance or standard deviation and make this a mathematical statement. In particular, $$P(|X-\mu|\ge k\sigma)\le\frac{1}{k^2}$$ This is Chebyshev's Inequality, and it says that the probability that a random variable is $k$ or more standard deviations from the mean is less than or equal to $1/k^2$. While this exact result might not be obvious, the idea should be clear: if there were more likely outcomes farther away, then the variance would be higher. From this, we can prove the (weak) Law of Large Numbers.

Let us take $n$ independent random variables $X_1,X_2,\ldots,X_n$ with the same distribution, with finite mean $\mu$ and finite variance $\sigma^2$, and define their average as $\overline{X}_n=\frac{1}{n}(X_1+\ldots+X_n)$. Then $E(\overline{X}_n)=\mu$, and $Var(\overline{X}_n)=Var(\frac1n(X_1+\ldots+X_n))=\frac{\sigma^2}{n}$

Obviously, for any positive real number $\epsilon$, $|\overline{X}_n-\mu|$ is either greater than, less than, or equal to $\epsilon$. There are no other possibilities, so \begin{equation} P(|\overline{X}_n-\mu|<\epsilon)+P(|\overline{X}_n-\mu|\ge\epsilon)=1\\ P(|\overline{X}_n-\mu|<\epsilon)=1-P(|\overline{X}_n-\mu|\ge\epsilon) \end{equation} Then, applying Chebyshev's Inequality to $\overline{X}_n$, whose standard deviation is $\sigma/\sqrt{n}$ (substituting $k=\frac{\epsilon\sqrt{n}}{\sigma}$), gives $$P(|\overline{X}_n-\mu|<\epsilon)\ge1-\frac{\sigma^2}{n\epsilon^2}$$ So as we take more trials, that is, as $n\to\infty$, this lower bound approaches $1$. And since probabilities cannot be greater than $1$, we have $$\lim_{n\to\infty}P(|\overline{X}_n-\mu|<\epsilon)=1$$ Or, equivalently, $$\lim_{n\to\infty}P(|\overline{X}_n-\mu|\ge\epsilon)=0$$ This is the Weak Law of Large Numbers.
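To see how quickly this bound kicks in, here is a small numerical check (my own sketch, assuming a fair coin so that $\mu=0.5$ and $\sigma^2=0.25$) comparing the Chebyshev lower bound with a simulation:

```python
import random

random.seed(0)
mu, sigma2, eps, reps = 0.5, 0.25, 0.02, 1_000   # fair coin: sigma^2 = p(1 - p) = 0.25

for n in (100, 1_000, 10_000):
    bound = 1 - sigma2 / (n * eps**2)   # Chebyshev lower bound on P(|mean - mu| < eps)
    hits = sum(
        abs(sum(random.random() < 0.5 for _ in range(n)) / n - mu) < eps
        for _ in range(reps)
    )
    # For n = 100 the bound is negative (vacuous); by n = 10,000 it is 0.9375,
    # and the simulated frequency sits above it, close to 1.
    print(n, round(bound, 4), hits / reps)
```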

It is important for me to point out, for general understanding, that having a probability of $0$ is not quite the same thing as being literally impossible. What it means is that almost all tests (in a mathematical sense) will fail. In this case, there are an uncountably infinite number of infinite sets of random variables, but there are only countably many sets whose average differs from the expected value.

KSmarts
  • +1. I think you point out a very important aspect, namely that a probability of 0 does not mean an event can't happen. – Thomas Jan 31 '15 at 12:03
  • 1
    @KSmarts Minor quibble: measure $0$ isn't the same as countable. In this case here the set of outcomes whose average deviates from the expectation is still uncountably infinite. – Erick Wong Jan 31 '15 at 16:35
3

The physical assumptions are that in each trial of tossing the coin, the coin is identical, and the laws of physics are identical, and the coin in no way "remembers" what it did before. With those assumptions, you can then say that there is some number between $0$ and $1$ that represents the probability of any given toss coming up heads.

Warning: that probability need not be $\frac{1}{2}$. In fact, a standard US penny will land on tails about 51% of the time.

Once you have that number, which we could call $p$, then it is meaningful to talk about the expected value of the number of heads arising in $1$ toss, which is that same $p$, and the expected value of the fraction of heads arising in $N$ tosses (the average result of $N$ trials), which is also $p$ because the tosses are identically distributed and independent.

Then the practical effect of the LLN is to know that the likelihood of the fraction of heads in an actual set of $N$ trials being "far" from $p$ becomes vanishingly small, provided that by "far" you mean more than a few times $\sqrt{1/N}$. And since for very large $N$, $\sqrt{1/N}$ becomes very small, we can say that with probability almost 1 the average of $N$ trials will lie in a small range about its in-principle value of $p$.
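A rough way to see the $\sqrt{1/N}$ scale in action (a simulation sketch of mine, not part of the original answer):

```python
import random
import statistics

random.seed(0)
reps = 300
for n in (100, 1_000, 10_000):
    fractions = [sum(random.random() < 0.5 for _ in range(n)) / n for _ in range(reps)]
    observed_spread = statistics.pstdev(fractions)
    print(n, round(observed_spread, 4), round(0.5 * (1 / n) ** 0.5, 4))

# The observed spread of the fraction of heads tracks 0.5 * sqrt(1/N):
# roughly 0.05, 0.016, and 0.005 for these three sample sizes.
```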

Mark Fischler
  • But why would the average of N tend toward p? Why would the universe obey? :) – vantage5353 Jan 29 '15 at 17:03
  • 1
    Because errors get washed out. This is fundamental. The definition of probability is not abstract nor arbitrary. It is chosen so that the law of large numbers holds. – Joshua Jan 30 '15 at 00:18
  • 1
    I presume you are referring to the Diaconis, Holmes & Montgomery result. Even if you follow their line of argument (which is amusing) you need to be careful about their conclusion which is that 'Any coin that is tossed vigorously and high, and caught in midair has about a 51 % chance of landing with the same face up that it started with.' It does not conclude that tails are more probable, in fact their analysis doesn't distinguish either side (other than labeling). – copper.hat Jan 31 '15 at 06:44
3

I think it is very helpful to redefine the Law of Large Numbers:

Wikipedia gives it as follows:

According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

However, it's important to note that the law isn't describing a physical law so much as a mathematical one. It would be better stated as:

As more trials are performed, the probability that the running average deviates from the expected value by any given amount gets smaller and smaller.

In other words, within the mathematical framework of probability, a larger sample has a higher probability of having its average close to the mathematical expected value.

What seems to be bothering the OP is why any probabilities have a bearing on the physical world. As commented above, frequentist probability is just a description of possible outcomes and their ratios - it never explains why, or what physical law keeps the world in sync with such a law. The OP's question is more a physics/philosophy question (one that has bothered me for ages). It reminds me of the Is-ought problem.

As an example, given an infinite number of universes, there will be one universe where all random events follow the most unlikely outcomes. The poor fellow living in such a universe would be best off always taking the worst odds. Why should we assume that we are in a universe that happens to be one which will follow the most probable outcome? (Of course one will argue that according to probabilities, we should find ourselves in a universe that is closer to the mean. I just mean this as an example to bring out the problem of there being no physical necessity for the Law of Large Numbers to be true in real life.)

This is not the same as "why does physics stay the same" - even if we accept that the speed of light is constant, and that mass creates gravity, it's a much bigger stretch to say that there is a general physical law that minds probabilities in the real world, always keeping them in sync with mathematics. The difference is that the other laws apply in any given physical situation - mass will always create gravity, etc. - whereas probability by definition allows for variation, just claiming that the mean will eventually add up. (As I argued before, it really doesn't even claim this.) (From studying quantum physics and uncertainty, it really does seem as if the universe corrects itself over large samples of purely random events to match the mean.)

Edit: I've found that the problem described - the empirical/logical meaning of probabilities - has already been addressed by David Hume in An Enquiry Concerning Human Understanding, Section VI: Of Probability, and at length by Henri Poincaré in Science and Hypothesis. (An additional resource, though in Hebrew, is Sha'arei Yosher 3.2.3.)

  • This is the weak law. The Wikipedia article describes the strong law. – copper.hat Jan 30 '15 at 08:39
  • @afuna according to probabilities, we should find ourselves in the universe that is closer to the mean. I just mean this as an example to bring out the problem of there being no physical necessity for the Law of Large Numbers to be true in real life. This is in tune with Erick Wong's answer. If I understand correctly, it takes more effort to make things happen non-probabilistically, but since most events follow the path of least resistance, probability predictions win in the long run--as they are more easily achieved (due to combinatorics). – vantage5353 Jan 30 '15 at 10:24
  • @copper.hat. I'm not sure why you say that. The article explains the law in general (and also has a subsection about the strong law - which applies over an infinite sample) – Reinstate Monica Jan 30 '15 at 12:47
  • @user1891836: I personally don't understand why one should make any (probabilistic) assumptions about our universe - Erick Wong's answer hasn't convinced me about why the universe does what it does. If the universe is determinate, it is what it is and all the mathematical propositioning won't convince it otherwise. If it is truly random, there's no reason why it can't often turn out to favor the lesser probability - to argue that it probably will follow probability, is circular reasoning. – Reinstate Monica Jan 30 '15 at 12:52
  • @afuna: You wrote that 'It would be better stated as...'; I was just pointing out that this is a weaker statement than the Wiki statement that preceded it. – copper.hat Jan 30 '15 at 16:39
  • @afuna: "As an example, given an infinite number of universes, there will be one universe where all random events follow the most unlikely outcomes." - This is not necessarily true, as an infinite set does not necessarily equal a set with everything. For example, the sequence 1, 3, 5, ... is infinite but doesn't contain any negative or even numbers. – SylvainL Jan 31 '15 at 18:29
2

Suppose you've tossed a fair coin ten times, and it has been heads nine times out of ten, for an observed $\frac{\mathrm{heads}}{\mathrm{flips}} = 0.9$. There is a 50% chance that the next toss will be heads, making 10/11 heads, and a 50% chance that the next toss will be tails, making 9/11 heads. The expected fraction of heads after the next toss is then $0.5 \frac{10}{11} + 0.5 \frac{9}{11} = \frac{19}{22} \approx 0.864$, which is closer to 0.5 than 0.9 is.

It's pure math. Given a fair coin with no memory, if the fraction of heads up until now is 0.5, then the expected fraction of heads after one more toss will remain 0.5. Otherwise, the expected fraction of heads after one more toss will become closer to 0.5. It doesn't take any physical effect, just the fact that every flip increases the denominator of your fraction, but only half of the flips (in expectation) will reinforce any "excess" number of heads or tails.
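The same arithmetic can be pushed further (a small sketch of mine, not part of the original answer): start from 9 heads in 10 flips and watch the expected overall fraction get diluted toward 0.5 as more fair flips are added, with no bias toward tails anywhere:

```python
heads, flips = 9, 10
for extra in (1, 10, 100, 1_000, 10_000):
    # Each additional fair flip contributes 0.5 heads in expectation.
    expected_fraction = (heads + 0.5 * extra) / (flips + extra)
    print(extra, round(expected_fraction, 4))

# extra = 1 reproduces 19/22 ~ 0.8636; by extra = 10,000 the expected fraction is ~ 0.5004.
```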

hobbs
2

There are plenty of correct answers here. Let me see if I can make the correct answer dead-simple.

The Gambler's Fallacy is the belief that a past trend in random events will tend to be balanced by an opposite trend in future random events. "If the last 10 coin flips have been heads, the next coin flip is more likely to be tails."

The Law of Large Numbers is the observation that, regardless of the nature or pattern of the variation, as your sample size gets larger, the significance of the variation (whether positive or negative) gets smaller. "If the last 10 coin flips have all been heads, that has a significant impact on the average of a sample of 50, but an insignificant impact on the average of a sample of 50,000."

  • Sure, this is the statistical explanation, but in order for the LLN to work over large samples, it requires that some objective law keeps randomness random. If you had a 70/30 distribution, you'd suspect the coin isn't fair, yet if the universe is truly indifferent towards any one outcome or series of outcomes, there's nothing unlikely about this result. Still, when we speak of random phenomena like tosses, statisticians expect them to be close to 50/50 in the long run--in line with probability theory. This necessitates that something keeps the total average in check, however. Else, you'd suspect a bias. – vantage5353 Jan 31 '15 at 11:18
  • @vantage5353 ,did you find any satisfactory answer since then, I'm having the same question? – Kashmiri Jan 06 '22 at 06:45
2

It seems to me that the core of your question has nothing to do with the Law of Large Numbers and everything to do with why the physical universe behaves in the ways that mathematics predicts.

You might as well ask this: Whenever I have two of something in my left hand and three of something in my right hand, I find that I have five of that something altogether. I understand that mathematics predicts this, but why should the Universe obey?

Or: Mathematics tells me that for any numbers x and y, if I have x piles of stones with y stones in each pile, and you have y piles of stones with x in each pile, then we'll each have the same number of stones. What's the empirical evidence for this law? Why should we expect the Universe to behave this way just because mathematics says it should?

I don't know what answers to these questions you'd consider satisfactory, but I think you'll gain some insight if you concentrate on these much simpler questions, where the fundamental issues are exactly the same as in the question you're asking.

WillO
  • 3,141
1

There is no physical law in play here, just probabilities.

Assume that either result (heads or tails) is equally likely. For any number of flips in a trial, N, it is easy to compute the probability of getting H heads.

For N = 2, H(0) = 0.25, H(1) = 0.5, H(2) = 0.25 (four possible outcomes, two of which, HT and TH, have exactly one head).

For N = 6, H(0) = 0.016, H(1) = 0.094, H(2) = 0.234, H(3) = 0.313, H(4) = 0.234, H(5) = 0.094, H(6) = 0.016 (64 possible outcomes, 50 of which have 2, 3, or 4 heads).

Notice that for 6 flips, the chance you will see 2, 3, or 4 heads is 78%. As N gets bigger, the probability of getting a number of heads in the vicinity of the halfway mark becomes very great, and the likelihood of seeing very many or very few heads becomes very small.

There is no force pushing toward the mean; it's just that the probability of seeing one of the very unlikely outcomes is very, very small. But even so, you might see one someday.

Note that this is just a restatement of Erick Wong's answer.

Imagine that there are 2^N tables in a vast room, each with N coins laid out on the table in a unique combination. Each table has a chair and you are dropped from the ceiling into the room and land in a chair at one of the tables. That is the "trial" you just ran. Chances are that that table will have approximately N/2 heads. Remember that out of 2^N tables (e.g. for 1000 coins, there will be over 10^301 tables), there is only one with no heads.
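
Continuing the computation above for larger N, here is a sketch (Python, using exact binomial coefficients from the standard library's `math.comb`; the 40%-60% band is an arbitrary choice of "vicinity of the halfway mark"). It also tells you what fraction of the 2^N tables in the room have a roughly even split:

```python
from math import comb, floor, ceil

for n in (6, 100, 1_000, 10_000):
    total = 2 ** n                          # number of equally likely "tables"
    lo, hi = floor(0.4 * n), ceil(0.6 * n)  # heads counts between 40% and 60% of n (arbitrary band)
    near_half = sum(comb(n, h) for h in range(lo, hi + 1))
    print(f"N = {n:>6}: P(40%-60% heads) = {near_half / total:.6f}")
```

For N = 6 this reproduces the 78% above; by N = 10,000 the probability of landing outside the 40%-60% band is astronomically small, even though every single one of the 2^N tables remains possible.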

Ray Henry
  • 111
  • I appreciate the mathematical model. It's just hard for me to reconcile it with the laws of the physical world. After all, we use probability to make practical decisions, yet why should a 50% mathematical probability of tails correspond to a mean of 50% tails over 1,000,000 real-world trials? I see the reasoning behind the math, but not in the real world. Erick Wong hinted at it becoming physically harder to achieve a particular random result as options decrease, but I am still trying to wrap my head around this. – vantage5353 Jan 29 '15 at 20:24
  • It's very unlikely you will get exactly 500,000 tails in 1,000,000 trials. But all the numbers around 500,000 are much more likely in aggregate than the extreme values. – Ray Henry Jan 29 '15 at 20:34
  • 1
    Remember we only know what we see. We've never seen a million trials where all were heads, but it doesn't mean it can't happen. Your question about the laws of the physical world reminds me of those that ask why nature created a universe that is hospitable for humans, when the fact that we are here and this is the only universe we can observe turns the question on its head. – Ray Henry Jan 29 '15 at 20:42
  • @user1891836 (3 comments up) in the frequentist interpretation, that is more or less the definition of probability. – David Z Jan 30 '15 at 01:45
1

Consider coin tosses. The strong law of large numbers says that if the coin tosses are independent and identically distributed (iid.), then for almost any experiment, the averages converge to the probability of a head.

The degree to which the result is applicable in the 'real' world depends on the degree to which the assumptions are valid.

Both independence and identical distribution are impossible to verify for real systems; the best we can do is to convince ourselves empirically by many observations, symmetry in the underlying physics, etc. (As a slightly related aside, sometimes serious mistakes are made; for example, read the LTCM story.)

The iid. assumption ensures that no experiment is favoured. For example, in a sequence of $n$ coin tosses, there are $2^n$ experiments and each is 'equi-probable'. It is not hard to convince yourself that for large $n$ the percentage of experiments whose average is far from the mean becomes very small. There is no magic here.
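
As a sketch of that last claim (Python; the 1,000 simulated experiments and the 0.02 deviation threshold are arbitrary choices, and this is a Monte Carlo estimate rather than an exact count), one can simulate many independent experiments of n fair tosses each and record how often the average lands far from 0.5:

```python
import random

random.seed(0)
experiments, eps = 1_000, 0.02  # arbitrary sketch parameters

for n in (100, 1_000, 10_000):
    far = 0
    for _ in range(experiments):
        avg = sum(random.random() < 0.5 for _ in range(n)) / n
        far += abs(avg - 0.5) > eps
    print(f"n = {n:>6}: share of experiments with |average - 0.5| > {eps}: {far / experiments:.3f}")
```

The share of "far" experiments shrinks rapidly as n grows, which is all the law asserts; no individual toss is ever constrained.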

I think a combination of the central limit theorem and the observed prevalence of normal distributions in the 'real' world provides stronger empirical 'evidence' that the iid. assumption is often a reasonable one.

copper.hat
  • 172,524
  • 1
    It is not hard to convince yourself that for large n the percentage of experiments whose average is far from the mean becomes very small. There is no magic here. Sure, but why does the universe tend to fall in line with LLN? What is physically the reason for the total average to get closer to expected value? – vantage5353 Jan 30 '15 at 09:52
  • When you make any binary valued observation, if the underlying process is iid. (or a reasonable approximation thereof) then the law of large numbers & central limit theorem apply. So, your question is, why do many measurements of aspects of the universe seem to be iid? I don't know the answer, but would suppose that independence arises out of a lack of apparent 'communication' (for example, the coin has no state which it carries from one toss to the next) and identical distribution arises from symmetry (why would a head be preferred over a tail?), or similar dynamics. – copper.hat Jan 30 '15 at 16:36
1

Please also consider this: most human games are flawed. The outcome of heads or tails depends on the coin and on the way it is thrown. One man throwing the same coin will probably get something far from 50-50, be it because he's a cheater or because he always puts the same force on the same side, making the coin flip the same number of times in the air.

But if you now consider different people with different hands, you'll very likely get near 50-50 quite quickly.

When playing the lottery, some people think they should play numbers that haven't come up as often as others, because the LLN will "have" to make them appear more often now to compensate. This is wrong twice over.

  1. As others have already said, the law should not be understood as a magic hand that compensates for early inequities. Each try still has a 50% chance, and early deviations simply "dilute" into the total. There is no statistical reason to look at previous throws; they don't impact future ones.

  2. The practical case is even worse: since the coin (or the set of lottery balls) is not perfect, this imperfection will likely play the same role every time, making the same results more probable. So the real lottery trick is to play precisely the numbers that have already won! (See the sketch below.)

Of course, knowing that, the lottery operators change the balls now and then...
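
A quick sketch of point 2 (Python; the 52% bias is an invented figure, purely for illustration): a slightly imperfect coin still obeys the law of large numbers, but its long-run average settles on its own frequency rather than on 0.5, which is exactly why a long record of past results can reveal a physical bias.

```python
import random

random.seed(42)
bias = 0.52  # hypothetical imperfection: heads comes up 52% of the time

heads = 0
for flips in range(1, 100_001):
    heads += random.random() < bias
    if flips in (100, 1_000, 10_000, 100_000):
        print(f"after {flips:>6} flips: fraction of heads = {heads / flips:.4f}")
```

Early on the observed fraction can sit almost anywhere, but after 100,000 flips it will almost certainly be very close to 0.52, not 0.50.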

1

Perhaps a better way to understand the concept is to compute the probability of many trials coming out balanced. For example, if we flip a coin 10 times, the probability that the numbers of heads and tails will be within 10% of each other is only 24.6%. However, as we flip the coin more times, the probability that the numbers of heads and tails will be close to each other (within 10%) increases:

100 trials: 38.3%

1000 trials: 80.5%

10,000 trials: 99.99%

Thus, there is no need to stipulate a "law"; we can simply compute the probability of balance occurring and see that it increases as we do more trials. Note that there is always a chance of imbalance. For example, after 10,000 coin flips there is a 0.007% chance that the number of heads will not be within 10% of the count of tails.
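
Those figures can be recomputed directly (Python, using exact binomial coefficients via `math.comb`; reading "within 10% of each other" as the two counts differing by at most 10% of the smaller one is my assumption, and it reproduces the 10- and 100-flip figures, with the larger samples showing the same trend towards near-certainty):

```python
from math import comb

def prob_balanced(n, tol=0.10):
    """P(heads and tails counts differ by at most tol * the smaller count) for a fair coin.
    Reading 'within 10%' as this ratio condition is an assumption."""
    favourable = sum(comb(n, h) for h in range(n + 1)
                     if abs(h - (n - h)) <= tol * min(h, n - h))
    return favourable / 2 ** n

for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} trials: {prob_balanced(n):.2%}")
```

Under this reading the 10-flip probability is 24.61% and the 100-flip probability is about 38.3%, while the chance of imbalance after 10,000 flips is a tiny fraction of a percent.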

1

A strong mathematical explanation.

First, I present another experiment which, I believe, will be of interest to you.

Let $x_1,x_2, \cdots$ be an infinite sample obtained by observing independent, normally distributed real-valued random variables with parameters $(\theta,1)$, where $\theta$ is an unknown mean and the variance is equal to $1$. Using this infinite sample we want to estimate the unknown mean. If we denote by $\mu_{\theta}$ the Gaussian measure on ${\bf R}$ with probability density $\frac{1}{\sqrt{2\pi}}e^{-\frac{(x-\theta)^2}{2}}$, then the family of triplets $$({\bf R}^N,\mathcal{B}({\bf R}^N),\mu_{\theta}^N)_{\theta \in {\bf R}}$$ is a statistical structure describing our experiment, where ${\bf R}^N$ is the Polish topological vector space of all infinite samples equipped with the Tychonoff metric and $\mathcal{B}({\bf R}^N)$ is the $\sigma$-algebra of Borel subsets of ${\bf R}^N$. By virtue of the Strong Law of Large Numbers we have $$ \mu_{\theta}^N(\{(x_k)_{k \in N}: (x_k)_{k \in N}\in {\bf R}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=\theta\})=1 $$ for each $\theta \in {\bf R}$, where $\mu_{\theta}^N=\mu_\theta \times \mu_\theta \times \cdots$.

We would expect that, from our infinite sample $(x_k)_{k \in N}$ and the consistent estimator $\overline{X}_n= \frac{\sum_{k=1}^nx_k}{n}$ as $n$ tends to $\infty$, we get a "good" estimate of the unknown parameter $\theta$. But let us look at the set $$ S=\{ (x_k)_{k \in N}: (x_k)_{k \in N}\in {\bf R}^N~\&~\mbox{there exists a finite limit } \lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}\}. $$ It is a proper vector subspace of ${\bf R}^N$ and hence is "small" (more precisely, it is a Haar null set in the sense of Christensen (1973)). This means that our "good" statistic is not defined on the complement of $S$, which is a "big" set (more precisely, prevalent in the sense of Christensen (1973)).

This means that for "almost every" sample (in the sense of Christensen), our "good" statistic, the sample average $\overline{X}_n$, has no limit.


Now let $x_1,x_2, \cdots$ be an infinite sample obtained by coin tosses. Then the statistical structure describing this experiment has the form $$ \{(\{0,1\}^N,\mathcal{B}(\{0,1\}^N),\mu_{\theta}^N): \theta \in (0,1)\}, $$ where $\mu_{\theta}(\{1\})=\theta$ and $\mu_{\theta}(\{0\})=1-\theta$. By virtue of the Strong Law of Large Numbers we have $$ \mu_{\theta}^N(\{(x_k)_{k \in N}: (x_k)_{k \in N}\in \{0,1\}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=\theta\})=1 $$ for each $\theta \in (0,1)$. Note that $G:=\{0,1\}^N$ can be considered as a compact group. Since the measure $\mu_{0.5}^N$ coincides with the probability Haar measure $\lambda$ on the group $G$, we deduce that the set $A(0.5)=\{(x_k)_{k \in N}: (x_k)_{k \in N}\in \{0,1\}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=0.5\}$ is prevalent. Since $A(\theta) \subset G \setminus A(0.5)$ for each $\theta \in (0,1)\setminus \{1/2\}$, where $$A(\theta)=\{(x_k)_{k \in N}: (x_k)_{k \in N}\in \{0,1\}^N~\&~\lim_{n \to \infty}\frac{\sum_{k=1}^nx_k}{n}=\theta\},$$ we deduce that these sets are all Haar null.

My answer to the question "Why is the universe NOT indifferent towards big samples of coin tosses? What is the objective reason for this phenomenon?" is the following: the set of infinite samples $(x_k)_{k \in N}\in G:=\{0,1\}^N$ for which the limit of the sample average $\overline{X}_n$ exists as $n$ tends to $\infty$ and equals $0.5$ is prevalent in the sense of Christensen (1973), or equivalently, has full Haar measure $\lambda$. Hence, the Strong Law of Large Numbers is not empirically proven.

Gogi Pantsulaia
  • 511
  • 2
  • 10