For coin tosses the probability of heads or tails is the same, and the flips are said to be independent. So the probability of HHHH…H, TTTT…T, or any other specific sequence of H and T should be the same, i.e. $2^{-n}$. But running a Monte Carlo simulation shows that events like all heads or all tails almost never happen for sufficiently large $n$.
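For concreteness, here is a minimal version of the kind of simulation I mean (the flip lengths, run count, and 5% window are arbitrary choices): for $n = 30$ a fixed sequence has probability $2^{-30} \approx 10^{-9}$, so all heads essentially never appears in $10^5$ runs, while a near-50/50 split appears constantly.

```python
import random

def simulate(n_flips: int, n_runs: int, seed: int = 0) -> None:
    """Count how often n_flips fair tosses come up all heads, versus
    how often the head proportion lands within 5 points of 50%."""
    rng = random.Random(seed)
    all_heads = near_half = 0
    for _ in range(n_runs):
        heads = sum(rng.getrandbits(1) for _ in range(n_flips))  # 1 = heads
        all_heads += heads == n_flips
        near_half += abs(heads / n_flips - 0.5) <= 0.05
    print(f"n_flips={n_flips:4d}: all heads in {all_heads}/{n_runs} runs, "
          f"45-55% heads in {near_half}/{n_runs} runs")

for n in (10, 30, 100):
    simulate(n, n_runs=100_000)
```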
Explanations I have heard so far
I have heard the dilution argument for the law of large numbers: an early excess of heads is never corrected, just swamped by later flips (spelled out below). But I would say that's a recursive explanation. Why are we assuming that the next batch of flips will have a distribution closer to the natural one?
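As I understand it, the argument goes: even in the extreme case where the first $m$ flips are all heads, if the remaining $n - m$ flips average out to half heads, the running proportion is

$$\frac{m + \tfrac{1}{2}(n - m)}{n} = \frac{1}{2} + \frac{m}{2n} \;\longrightarrow\; \frac{1}{2} \quad \text{as } n \to \infty,$$

so the early deviation fades without any later flip "compensating" for it. My objection is that the $\tfrac{1}{2}(n - m)$ term already assumes the very behavior we are trying to explain.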
Another explanation I found: if we ignore the positions of heads and tails, there are more ways to get a ~50% split. E.g., for two coin flips, both $HT$ and $TH$ give the natural distribution, so its probability is 50%, while $HH$ and $TT$ have probability 25% each. But with this knowledge it seems we could infer that if we have observed more $H$ than $T$ so far, we are more likely to be on a sequence that starts with more $H$ and later adjusts with $T$s, which sounds like the gambler's fallacy.
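The counting argument can be checked exactly: the number of length-$n$ sequences whose head fraction is within 5% of one half, divided by $2^n$ (again, the 5% window is an arbitrary choice):

```python
from math import comb

def prob_near_half(n: int, eps: float = 0.05) -> float:
    """Exact probability that the fraction of heads in n fair flips
    is within eps of 1/2: sum of C(n, k) / 2^n over qualifying k."""
    count = sum(comb(n, k) for k in range(n + 1)
                if abs(k / n - 0.5) <= eps)
    return count / 2 ** n  # exact integer arithmetic until this division

for n in (10, 100, 1000, 10000):
    print(f"n={n:6d}: P(|heads/n - 1/2| <= 0.05) = {prob_near_half(n):.6f}, "
          f"vs 2^-{n} for any one fixed sequence")
```

This only restates the combinatorics, though; it doesn't answer the inference worry above.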
Can we say that the KL divergence between the observed distribution and the expected distribution approaches zero as the number of trials tends to infinity? If so, what is this invisible hand? The second law of thermodynamics? Is there a way to measure its push?
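To make "measure its push" concrete, here is what I have in mind: track $D_{\mathrm{KL}}(\hat p \,\|\, \tfrac{1}{2})$ for the empirical head frequency $\hat p$ as flips accumulate (the checkpoints and seed are arbitrary):

```python
import math
import random

def kl_vs_fair(p_hat: float) -> float:
    """KL divergence D(Bernoulli(p_hat) || Bernoulli(1/2)) in nats.
    Terms with p_hat in {0, 1} contribute 0 by the convention 0*log 0 = 0."""
    d = 0.0
    for q in (p_hat, 1.0 - p_hat):
        if q > 0.0:
            d += q * math.log(2.0 * q)
    return d

rng = random.Random(42)
heads = flips = 0
for n in (10, 100, 1_000, 10_000, 100_000):
    while flips < n:              # extend the same flip sequence
        heads += rng.getrandbits(1)
        flips += 1
    p_hat = heads / flips
    print(f"n={n:6d}: p_hat={p_hat:.4f}, KL={kl_vs_fair(p_hat):.6f} nats")
```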
Other questions I have read so far
- Why don't previous events affect the probability of (say) a coin showing tails?
- Law of large numbers - almost sure convergence
- Is the Law of Large Numbers empirically proven?
- The Law of Large Numbers and the Probability of Bizarre Outcomes
- Gambler's fallacy and the Law of large numbers
- Betting: Gambler's Fallacy vs. Law of Large Numbers
- Bernoulli Trials: Law of Large Numbers vs Gambler's Fallacy, the N paradox