
For coin tosses the probability of heads and tails is the same, and the flips are independent. So the probability of HHHH, or TTTT, or any particular combination of H and T should be the same, i.e. $2^{-n}$. But running a Monte Carlo simulation shows that events like "all heads" or "all tails" almost never happen for sufficiently large $n$.
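For concreteness, a minimal simulation along these lines might look like the following (Python sketch; the number of flips, the trial count, and the 5-point window are arbitrary choices):

```python
import random

def run_trials(n_flips=100, n_trials=100_000, seed=0):
    """Simulate fair-coin runs and tally 'all heads' vs. 'roughly half heads'."""
    rng = random.Random(seed)
    all_heads = 0
    near_half = 0
    for _ in range(n_trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads == n_flips:
            all_heads += 1
        if abs(heads - n_flips / 2) <= 0.05 * n_flips:  # within 5 points of 50/50
            near_half += 1
    return all_heads / n_trials, near_half / n_trials

# For n = 100: a specific sequence such as "all heads" has probability 2**-100,
# so it never shows up in 100,000 runs, while roughly 73% of runs land within
# 5 percentage points of an even split.
print(run_trials())
```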

Explanations I have heard so far

  1. I have heard of the dilution effect of the law of large numbers. But I would say that's just a recursive explanation: why are we assuming that the next batch of events will have a distribution closer to the natural distribution?

  2. Another explanation I found: if we do not consider the positions of heads and tails, there are more ways to get a ~50% distribution. For example, for two coin flips both $HT$ and $TH$ lead to the natural distribution, so the probability of getting the natural distribution is 50%, while the probability of $HH$ or $TT$ is 25% each. But with this knowledge we could infer that, if we have observed more $H$ than $T$ so far, then we are more likely to be on a sequence that starts out with more $H$ and later adjusts with $T$s.
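This counting is easy to verify by brute force for small $n$ (a minimal Python sketch; $n=4$ is an arbitrary choice):

```python
from collections import Counter
from itertools import product

n = 4
# Enumerate all 2**n equally likely sequences and group them by number of heads.
counts = Counter(seq.count('H') for seq in product('HT', repeat=n))
for heads, ways in sorted(counts.items()):
    print(f"{heads} heads: {ways} sequences, total probability {ways / 2**n}")
# Every individual sequence has probability 1/16, but there are 6 sequences
# with exactly two heads versus only 1 with four heads.
```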

Can we say that the KL divergence between the observed distribution and the expected distribution approaches zero as the number of trials tends to infinity? If so, what is this invisible hand? The second law of thermodynamics? Is there a way to measure its push?
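To make the KL question concrete, here is a rough way to watch the divergence between the empirical heads/tails frequencies and the fair $(1/2, 1/2)$ distribution shrink as $n$ grows (a sketch; the particular values of $n$ are arbitrary):

```python
import math
import random

def empirical_kl(n_flips, seed=0):
    """D(empirical || fair) in nats for one simulated run of n_flips fair tosses."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    p_hat = {'H': heads / n_flips, 'T': 1 - heads / n_flips}
    fair = {'H': 0.5, 'T': 0.5}
    return sum(p * math.log(p / fair[s]) for s, p in p_hat.items() if p > 0)

# The divergence is random for any finite n, but it shrinks (on average) as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, empirical_kl(n))
```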


  • Is this really a mathematics question? If it were, I'd expect precise statements of what you mean by "law of large numbers" and "second law of thermodynamics". – kimchi lover Jun 01 '19 at 15:39
  • "Explanations I have heard so far": What is it that needs to be explained here? If $n$ is large then $2^{-n}$ is small -- you'd expect an event with small probability to happen very rarely... – David C. Ullrich Jun 01 '19 at 15:40
  • I feel like what would clarify this is some relationship between E[(sample mean - 1/2)^2] and the entropy of the distribution of the sequence of coin flips. I suspect that if the expected distance from 1/2 is large, that puts a limit on how high the entropy can be, which means that for the entropy to surpass that limit, the sample mean has to become concentrated near 1/2. I don't know such a relationship off the top of my head, though. – user54038 Jun 02 '19 at 19:26

2 Answers


For the (fair) coin experiment with $n$ independent tosses, the following holds: $$ \tag{1} \mathbb{P}(\text{a specific sequence is observed}) = 2^{-n}, $$ which tends to zero for large $n$. Note that this holds for any sequence (and not only for "HHH...H" and "TTT...T"). That is, any sequence, irrespective of the order (and number) of heads and tails, is extremely unlikely to be observed in the large-$n$ limit. This fact has nothing to do with the law of large numbers.

The law of large numbers suggests the following (heuristically stated):

$$ \mathbb{P}(\text{a sequence is observed with number of heads} \approx n/2)\approx 1. $$ Note that the event considered here does not concern a sequence with a specific order of heads and tails (as was the case for the event considered in (1)), and it does not contradict (1), as you seem to suggest in your question. The law of large numbers only states that, out of all the (highly unlikely) sequences, the one that is actually observed will have (with high probability) approximately $n/2$ heads. However, this "insight" cannot help you increase your chance of guessing the outcome of the experiment, since there is an extremely large number of sequences that fall into the category "a sequence with number of heads approximately equal to $n/2$" (roughly $2^{n}$ such sequences, up to factors subexponential in $n$).
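As a quick numerical illustration of the two statements (a sketch; $n=100$ and the 45–55 band are arbitrary choices):

```python
from math import comb

n = 100
# Probability of any one specific sequence of n fair tosses, as in (1).
p_specific = 2 ** -n

# Probability that the number of heads lands in a band around n/2,
# and how many distinct sequences live in that band.
band = range(45, 56)  # 45..55 heads
p_band = sum(comb(n, k) for k in band) / 2 ** n
sequences_in_band = sum(comb(n, k) for k in band)

print(f"P(one specific sequence) = {p_specific:.3e}")        # ~7.9e-31
print(f"P(45 <= heads <= 55)     = {p_band:.3f}")            # ~0.73
print(f"# sequences in that band = {sequences_in_band:.3e}")  # ~9e29
```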

With the same arguments, you can see that if the coin is biased (say, the probability of heads is $p>1/2$), the most probable sequence is the "all heads" sequence ($n$ heads), whereas the law of large numbers indicates that the observed sequence will most likely have approximately $np<n$ heads. Again, there is no contradiction here, since the probability of observing any specific sequence (even the most likely one) is extremely small in the large-$n$ limit. Chances are that a sequence that is not the most likely one will be observed, and the law of large numbers suggests that it will have approximately $np$ heads.
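The biased case can be checked the same way (a sketch with $p=0.6$ and a band around $np=60$, both arbitrary choices):

```python
from math import comb

n, p = 100, 0.6
# The single most probable sequence is "all heads", yet its probability is tiny.
p_all_heads = p ** n

# Probability that the observed number of heads falls near np = 60.
p_near_np = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(55, 66))

print(f"P(all heads)         = {p_all_heads:.3e}")  # ~6.5e-23
print(f"P(55 <= heads <= 65) = {p_near_np:.3f}")    # ~0.74
```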

Stelios
  • I am not trying to guess the exact sequence, but the distribution of the remaining portion of the sequence. If this distribution has more tails, I can infer that tails are more likely than heads in the next toss. – Souradeep Nanda Jun 02 '19 at 01:45

If $X$ and $Y$ are independent and identically distributed, then $(X+Y)/\sqrt{2}$ has entropy at least as high as that of $X$, but the same mean and variance. Keep repeating this: essentially, keep doubling the length of the sequence and computing the sum divided by the square root of the length. The entropy will keep going up, and the mean and variance will stay the same.

The normal distribution is the maximum entropy distribution with a given mean and variance.

The central limit theorem, then, says: "the entropy will increase all the way up to its maximum." That is, it won't reach a horizontal asymptote below the maximum.

However, all the second law of thermodynamics says is that entropy goes up. It doesn't tell you where the horizontal asymptote is. So, to me, the central limit theorem is a stronger statement than just saying that entropy increases.
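Here is a rough numerical illustration of that picture (a sketch: differential entropy is estimated with a simple histogram, which is crude but enough to show the trend, and the variance-1 uniform starting distribution is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_estimate(samples, bins=200):
    """Crude histogram estimate of differential entropy, in nats."""
    hist, edges = np.histogram(samples, bins=bins, density=True)
    widths = np.diff(edges)
    mask = hist > 0
    return -np.sum(hist[mask] * np.log(hist[mask]) * widths[mask])

# Uniform on [-sqrt(3), sqrt(3)] has mean 0, variance 1, and entropy
# ln(2*sqrt(3)) ~ 1.242; the variance-1 maximum is 0.5*ln(2*pi*e) ~ 1.419.
gaussian_max = 0.5 * np.log(2 * np.pi * np.e)

for k in range(6):
    m = 2 ** k  # length of the summed sequence: 1, 2, 4, ...
    sums = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(200_000, m)).sum(axis=1) / np.sqrt(m)
    print(f"m = {m:2d}: estimated entropy {entropy_estimate(sums):.3f} (max {gaussian_max:.3f})")
```

The estimates climb from roughly 1.24 toward the variance-1 Gaussian ceiling of roughly 1.42 as the summed sequences get longer, which is the "entropy increases up to its maximum" statement in miniature.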

Incidentally, in the paper Solution of Shannon's problem on the monotonicity of entropy, the authors write "The main point of the result is that one can now see clearly that convergence in the central limit theorem is driven by an analogue of the second law of thermodynamics." So this is not a crazy way of thinking about it. (The authors' result is that entropy increases not just when you double the length of the sequence, but whenever you extend it.)

EDIT: I just realized how badly I misread the question, and that you're asking about the law of large numbers, not the central limit theorem.

user54038
  • I think the second law of thermodynamics within the setting of statistical thermodynamics (as opposed to classical thermodynamics) makes the stronger statement that the equilibrium distribution in the thermodynamic limit maximizes the entropy under the constraints in question. This is how we come to things like the Boltzmann distribution, for example. – Ian Jun 01 '19 at 18:04