7

I am trying to compute the probability of having 4 (or more) consecutive heads in 10 coin tosses.

I tried using recursion, but it led to a complicated expression, so I think I did not quite manage.

I saw similar questions asked here that were solved with difficult approaches, but this problem looks like it could be solved in a couple of lines so I must be doing something wrong.

If anyone could help me understand it or propose a different approach to solving the problem, I would be very grateful.

Have a nice day!

Ruben

2 Answers

9

In general, this question has been answered here several times, e.g. here.

For these particular small numbers, you might group the total count $N$ of favourable sequences by the starting position of the first successful run (a `*` stands for a toss that may be either H or T):

So $N = N_1 + N_2 + \cdots + N_7$

$N_1 = 2^6$ [HHHH******]

$N_2 = 2^5$ [THHHH*****]

$N_3 = 2^5$ [*THHHH****]

$N_4 = 2^5$ [**THHHH***]

$N_5 = 2^5$ [***THHHH**]

$N_6 = 2^5 -2$ [****THHHH*] minus [HHHHTHHHH*]

$N_7 = 2^5 -3$ [*****THHHH] minus ([HHHHTTHHHH] , [THHHHTHHHH] , [HHHHHTHHHH])

Which gives $N=64 + 6 \times 32 - 5 = 251$

So the probability is $N/2^{10} = 251/1024$
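As a sanity check on the case analysis above, here is a minimal brute-force sketch that enumerates all $2^{10}$ sequences and groups them by the starting position of the first run of four heads. The helper name `first_run_start` and the dictionary `counts` are illustrative choices, not part of the answer.

```python
from itertools import product

def first_run_start(tosses, run=4):
    """Return the 1-based start of the first run of `run` heads, or None."""
    idx = "".join(tosses).find("H" * run)
    return None if idx < 0 else idx + 1

# counts[k] plays the role of N_k in the answer above.
counts = {}
for tosses in product("HT", repeat=10):
    start = first_run_start(tosses)
    if start is not None:
        counts[start] = counts.get(start, 0) + 1

for start in sorted(counts):
    print(f"N_{start} = {counts[start]}")

total = sum(counts.values())
print("N =", total, "probability =", total / 2**10)
```

Each sequence is counted exactly once because it is filed under its first run only, which is why a longer run such as HHHHH causes no double counting.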

leonbloy
  • Hey @leonbloy, are we accounting for the fact that HHHHH has 2 possible 4-head sequences here? Because we are supposed to do that. Anyway, why are you subtracting only in N6 and N7? Is it because otherwise we would be overcounting the combinations already counted in N1 and N2? – Mining Sep 23 '21 at 13:54
6

Here we are looking for the number $f_N$ of binary strings of length $N$ which do not contain the substring $HHHH$. The probability $p$ is then $$p=1-\frac{f_{10}}{2^{10}}$$

The so-called Goulden-Jackson Cluster Method is a convenient technique to derive a generating function for problems of this kind.

We consider words of length $N\geq 0$ built from an alphabet $$\mathcal{V}=\{H,T\}$$ and the set $\mathcal{B}=\{HHHH\}$ of bad words which are not allowed to be part of the words we are looking for.

We derive a function $F(x)$ with the coefficient of $x^N$ being the number of wanted words of length $N$. According to the paper (p.7) the generating function $F(x)$ is \begin{align*} F(x)=\frac{1}{1-dx-\text{weight}(\mathcal{C})} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet and with the weight-numerator $\mathcal{C}$ with \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[HHHH]) \end{align*}

We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[HHHH])&=-x^4-\text{weight}(\mathcal{C}[HHHH])\left(x+x^2+x^3\right) \end{align*}
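The relation above is linear in $\text{weight}(\mathcal{C}[HHHH])$, so it can be solved directly. Collecting the weight terms on the left-hand side gives \begin{align*} \text{weight}(\mathcal{C}[HHHH])\left(1+x+x^2+x^3\right)&=-x^4\\ \text{weight}(\mathcal{C}[HHHH])&=-\frac{x^4}{1+x+x^2+x^3} \end{align*}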

It follows: A generating function $F(x)$ for the number of words built from $\{H,T\}$ which do not contain the subword $HHHH$ is \begin{align*} F(x)&=\frac{1}{1-dx-\text{weight}(\mathcal{C})}\\ &=\frac{1}{1-2x+\frac{x^4}{1+x+x^2+x^3}}\\ &=\frac{1+x+x^2+x^3}{1-x-x^2-x^3-x^4} \end{align*}
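The denominator $1-x-x^2-x^3-x^4$ corresponds to the recurrence $f_N=f_{N-1}+f_{N-2}+f_{N-3}+f_{N-4}$ for $N\geq 4$, and the numerator $1+x+x^2+x^3$ fixes the initial values $f_0,\dots,f_3=1,2,4,8$. A minimal sketch of that recurrence follows; the function name `count_avoiding` is an illustrative choice.

```python
def count_avoiding(n, run=4):
    """Number of H/T strings of length n with no `run` consecutive heads."""
    # For lengths shorter than the run, every string is allowed: 2**k of them.
    f = [2**k for k in range(run)]
    # Each further term is the sum of the previous `run` terms.
    for _ in range(run, n + 1):
        f.append(sum(f[-run:]))
    return f[n]

f10 = count_avoiding(10)
print(f10, 1 - f10 / 2**10)
```

This gives $f_{10}=773$, hence $2^{10}-773=251$ strings that do contain $HHHH$.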

The generating function counting the number $2^N$ of all binary strings of length $N$ is \begin{align*} \frac{1}{1-2x}=1+2x+4x^2+\cdots \end{align*}

Hence a generating function for the number of binary strings of length $N$ which contain the string $HHHH$ is

\begin{align*} \frac{1}{1-2x}-F(x)&=\frac{1}{1-2x}-\frac{1+x+x^2+x^3}{1-x-x^2-x^3-x^4}\\ &=\frac{x^4}{(1-2x)(1-x-x^2-x^3-x^4)}\\ &=x^4+3x^5+8x^6+\color{green}{20}x^7+48x^8+111x^9+\color{blue}{251}x^{10}\\ &\qquad +558x^{11}+1224x^{12}+2656x^{13}+5713x^{14}+12199x^{15}+\cdots\tag{1} \end{align*}

The last line was calculated with the help of Wolfram Alpha and we see there are $\color{blue}{251}$ strings of length $10$ which contain the subword $HHHH$.
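The expansion can also be reproduced without a computer algebra system. The sketch below expands $x^4/\big((1-2x)(1-x-x^2-x^3-x^4)\big)$ by the usual power-series division rule $c_n = p_n - \sum_{k\geq 1} q_k c_{n-k}$, assuming integer coefficients and a denominator with constant term $1$. The helper names `poly_mul` and `series` are illustrative.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def series(num, den, terms):
    """First `terms` power-series coefficients of num(x)/den(x); needs den[0] == 1."""
    coeffs = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        for k in range(1, min(n, len(den) - 1) + 1):
            c -= den[k] * coeffs[n - k]
        coeffs.append(c)
    return coeffs

num = [0, 0, 0, 0, 1]                            # x^4
den = poly_mul([1, -2], [1, -1, -1, -1, -1])     # (1-2x)(1-x-x^2-x^3-x^4)
coeffs = series(num, den, 16)
print(coeffs[4:])
```

The coefficient at index $10$ is the count of length-$10$ strings containing $HHHH$.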

For example the $20$ strings of length $7$ containing the substring $HHHH$ are

\begin{array}{lllll} \color{green}{HHHH}HHH\quad&\quad \color{green}{HHHH}HHT\quad&\quad \color{green}{HHHH}HTH\quad&\quad \color{green}{HHHH}HTT\\ \color{green}{HHHH}THH\quad&\quad \color{green}{HHHH}THT\quad&\quad \color{green}{HHHH}TTH\quad&\quad \color{green}{HHHH}TTT\\ HHT\color{green}{HHHH}\quad&\quad HT\color{green}{HHHH}H\quad&\quad HT\color{green}{HHHH}T\quad&\quad HTT\color{green}{HHHH}\\ T\color{green}{HHHH}HH\quad&\quad T\color{green}{HHHH}HT\quad&\quad T\color{green}{HHHH}TH\quad&\quad T\color{green}{HHHH}TT\\ THT\color{green}{HHHH}\quad&\quad TTH\color{green}{HHHH}\quad&\quad TT\color{green}{HHHH}T\quad&\quad TTT\color{green}{HHHH}\\ \end{array}

We finally conclude from (1): The probability of at least $4$ consecutive heads in $10$ coin tosses is $$\frac{251}{2^{10}}\doteq 0.2451$$

Markus Scheuer