This question relates to the YouTube video "How Lucky is Too Lucky?" by Matt Parker. In it, he poses the following question. Parker has published a list of $100$ tosses of a fair coin. A "malicious actor" wants to claim that in fact, Parker made the tosses up. He searches through the published tosses and finds $12$ consecutive tosses comprising $2$ tails and $10$ heads. The probability of $2$ or fewer tails in $12$ tosses of a fair coin is less than $2\%$, so the actor claims that the tosses cannot possibly be legitimate.
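For reference, the "less than $2\%$" figure can be checked with exact binomial arithmetic. The sketch below uses Python's `math.comb` and `fractions.Fraction`; the helper name `prob_at_most` is hypothetical and not from the video.

```python
from fractions import Fraction
from math import comb

def prob_at_most(k: int, n: int) -> Fraction:
    """Exact P(X <= k) for X ~ Binomial(n, 1/2): k or fewer tails in n fair tosses."""
    return Fraction(sum(comb(n, j) for j in range(k + 1)), 2**n)

p = prob_at_most(2, 12)
print(p, float(p))  # 79/4096 0.019287109375
```

So the probability is $(1 + 12 + 66)/2^{12} = 79/4096 \approx 0.0193$.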
Parker then addresses the probability that such an anomalous string occurs. There are $5050=\binom{100}2+100$ sub-runs of length $1$ to $100$. What is the probability that at least one of them is unlikely? If we have $n$ tosses comprising $t$ tails and $h$ heads, then the run is unlikely if the probability of getting $t$ or fewer tails is $\le 0.019$ or the probability of getting $h$ or fewer heads is $\le 0.019$.
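The count of sub-runs and the "unlikely" rule can be made concrete. The sketch below assumes the $0.019$ threshold from the question and computes, for each length $n$, a critical value $c(n)$: the largest $c$ with $P(c \text{ or fewer tails}) \le 0.019$, or $-1$ if none exists. A run of length $n$ is then unlikely exactly when $\min(t, h) \le c(n)$. The name `critical_value` is hypothetical; exact fractions are used (an illustrative choice) so that no floating-point rounding interferes near the threshold.

```python
from fractions import Fraction
from math import comb

N = 100                        # number of published tosses
EPSILON = Fraction(19, 1000)   # the 0.019 threshold from the question

# Windows of length m can start at N - m + 1 positions.
num_runs = sum(N - m + 1 for m in range(1, N + 1))

def critical_value(n: int, eps: Fraction = EPSILON) -> int:
    """Largest c with P(c or fewer tails in n fair tosses) <= eps; -1 if none."""
    cumulative = Fraction(0)
    for c in range(n + 1):
        cumulative += Fraction(comb(n, c), 2**n)
        if cumulative > eps:
            return c - 1
    return n  # not reached for eps < 1, since the cumulative sum ends at 1

print(num_runs)                                      # 5050
print({n: critical_value(n) for n in (5, 6, 12)})    # {5: -1, 6: 0, 12: 1}
```

Runs shorter than $6$ can never be unlikely under this threshold, because even $0$ tails in $5$ tosses has probability $1/32 > 0.019$.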
I should stress that the malicious actor does not choose $n$ in advance. He looks for an unlikely string of any length.
At about 16:25 in the video, Parker says that the exact probability is $88.3\%$, and gives no indication of how this number is arrived at. The problem, of course, is that substrings overlap, so we don't have independent events.
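To see concretely why the overlap matters, one can compute what the answer would be if every window were independent, $1 - \prod_{\text{windows}} (1 - p_{\text{window}})$, and compare it with an exact count on a tiny case. The sketch below does this with hypothetical helper names and the $0.019$ threshold assumed from the question. For $N = 7$ tosses it gives $4993/65536 \approx 0.076$. Direct enumeration of the $128$ sequences gives $6/128 \approx 0.047$: the only unlikely windows are constant runs of length $6$ or $7$.

```python
from fractions import Fraction
from math import comb

def critical_value(n, eps=Fraction(19, 1000)):
    """Largest c with P(c or fewer tails in n fair tosses) <= eps; -1 if none."""
    cumulative = Fraction(0)
    for c in range(n + 1):
        cumulative += Fraction(comb(n, c), 2**n)
        if cumulative > eps:
            return c - 1
    return n

def window_probability(m):
    """P(one window of length m is unlikely): too few tails OR too few heads.

    The two events are disjoint because the critical value is below m/2,
    so the probability is twice the one-sided tail.
    """
    c = critical_value(m)
    if c < 0:
        return Fraction(0)
    return 2 * Fraction(sum(comb(m, j) for j in range(c + 1)), 2**m)

def naive_independent_estimate(N):
    """1 - prod(1 - p_window) over all windows, pretending they are independent."""
    survive = Fraction(1)
    for m in range(1, N + 1):
        survive *= (1 - window_probability(m)) ** (N - m + 1)
    return 1 - survive

print(naive_independent_estimate(7))  # 4993/65536
```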
Parker says throughout that he won't go into the technical details of the math, but I haven't figured out how this number was arrived at. It's easy to confirm by simulation, but I don't think Parker would have used the phrase "exact value" without a theoretical calculation to back it up.
$2^{100} \approx 1.27 \times 10^{30}$, so generating and counting the admissible runs is infeasible. I've thought about trying to write a recurrence relation, but it seems hopeless. Usually, in this sort of problem, if $a_n$ is the number of admissible strings of length $n$, we have to break the recurrence up depending on the last characters of an admissible string of length $n-1$, $n-2$, and so on. Here there seem to be far too many cases.
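Although enumerating $2^{100}$ sequences is out of reach, the same exhaustive count is easy for a small number of tosses $N$. That gives exact reference values against which a simulation, or any cleverer method, can be checked. The sketch below is a brute force, not Parker's method. It assumes the $0.019$ threshold, checks every window of length $1$ to $N$ using prefix sums of tails, and uses hypothetical function names. In pure Python it will likely become slow somewhere around $N \approx 20$.

```python
from fractions import Fraction
from math import comb

def critical_value(n, eps=Fraction(19, 1000)):
    """Largest c with P(c or fewer tails in n fair tosses) <= eps; -1 if none."""
    cumulative = Fraction(0)
    for c in range(n + 1):
        cumulative += Fraction(comb(n, c), 2**n)
        if cumulative > eps:
            return c - 1
    return n

def has_unlikely_run(prefix, crit, N):
    """True if some window has min(tails, heads) <= its critical value."""
    for m in range(1, N + 1):
        for s in range(N - m + 1):
            t = prefix[s + m] - prefix[s]      # tails in tosses s .. s+m-1
            if min(t, m - t) <= crit[m]:
                return True
    return False

def exact_probability(N):
    """Exact P(N fair tosses contain an unlikely run), by enumerating all 2**N sequences."""
    crit = {m: critical_value(m) for m in range(1, N + 1)}
    count = 0
    for x in range(2**N):                      # bit i of x is toss i (1 = tails)
        prefix = [0]
        for i in range(N):
            prefix.append(prefix[-1] + ((x >> i) & 1))
        count += has_unlikely_run(prefix, crit, N)
    return Fraction(count, 2**N)

print(exact_probability(7))  # 3/64
```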
An approximate calculation with a small error bound would be fine. Can you point me in the right direction?
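If a simulation estimate is acceptable, its error bar can at least be quantified. The sketch below uses the standard normal approximation for a binomial proportion: the 95% half-width is $1.96\sqrt{\hat p(1-\hat p)/n}$. It plugs in the video's $88.3\%$ only as an illustrative value. The function names are hypothetical.

```python
from math import ceil, sqrt

Z95 = 1.96  # two-sided 95% standard normal quantile

def half_width(p_hat: float, trials: int, z: float = Z95) -> float:
    """Normal-approximation CI half-width for a proportion estimated from `trials` runs."""
    return z * sqrt(p_hat * (1 - p_hat) / trials)

def trials_needed(p_guess: float, target: float, z: float = Z95) -> int:
    """Smallest number of trials whose half-width is at most `target` at p = p_guess."""
    return ceil((z / target) ** 2 * p_guess * (1 - p_guess))

print(half_width(0.883, 100_000))   # roughly 0.002
print(trials_needed(0.883, 0.001))  # roughly 4 * 10**5
```

So about $10^5$ simulated sequences pin the probability down to within about $\pm 0.2$ percentage points at 95% confidence, and about $4 \times 10^5$ are needed for $\pm 0.1$.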
Just in case I haven't described the problem comprehensibly, I append my simulation script:
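```python
from math import factorial
from random import choices

def choose(n, m):
    # Binomial coefficient C(n, m)
    return factorial(n) // (factorial(m) * factorial(n - m))

epsilon = .019

# critical[n] = largest number of tails c such that
# P(c or fewer tails in n fair tosses) <= epsilon.
# For n < 6 even 0 tails has probability > epsilon, so those lengths are skipped.
critical = {}
for n in range(6, 101):
    prob = 0
    mu = 2**(-n)                 # probability of any one particular sequence of n tosses
    for m in range(n + 1):
        prob += choose(n, m) * mu
        if prob > epsilon:
            critical[n] = m - 1
            break

def test(trials):
    # Fraction of simulated 100-toss sequences that contain an anomalous run
    success = 0
    for _ in range(trials):
        flips = choices(range(2), k=100)   # 0 = tails, 1 = heads
        success += anomalous(flips)
    return success / trials

def anomalous(flips):
    # True if some run of length m has too few tails or too few heads
    for m in range(6, 101):
        for s in range(101 - m):
            run = flips[s:s + m]
            tails = run.count(0)
            if min(tails, m - tails) <= critical[m]:
                return True
    return False
```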
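The script's inner loop slices and counts every window, which costs $O(m)$ per window. A variant that performs the same test with one subtraction per window uses prefix sums of tails. Below is a sketch under these assumptions: the names are hypothetical, and the thresholds are computed with exact fractions rather than floats. It should agree with the float thresholds unless a cumulative probability lands within rounding distance of $0.019$.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb
from random import choices

EPSILON = Fraction(19, 1000)
N = 100

def critical_value(n, eps=EPSILON):
    """Largest c with P(c or fewer tails in n fair tosses) <= eps; -1 if none."""
    cumulative = Fraction(0)
    for c in range(n + 1):
        cumulative += Fraction(comb(n, c), 2**n)
        if cumulative > eps:
            return c - 1
    return n

CRITICAL = {m: critical_value(m) for m in range(1, N + 1)}

def anomalous_prefix(flips):
    """Same test as anomalous(), counting tails per window via prefix sums.

    Assumes len(flips) <= N, since CRITICAL only covers lengths 1..N.
    """
    n = len(flips)
    # prefix[i] = number of tails (zeros) among flips[:i]
    prefix = list(accumulate((1 - f for f in flips), initial=0))
    for m in range(1, n + 1):
        c = CRITICAL[m]
        if c < 0:          # no window of this length can be unlikely
            continue
        for s in range(n - m + 1):
            tails = prefix[s + m] - prefix[s]
            if min(tails, m - tails) <= c:
                return True
    return False

def estimate(trials):
    """Monte Carlo estimate of P(100 fair tosses contain an anomalous run)."""
    hits = sum(anomalous_prefix(choices(range(2), k=N)) for _ in range(trials))
    return hits / trials
```

Usage is the same as the original `test`, e.g. `estimate(10_000)`.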