probability question related to pattern in coin tossing

Question

If I toss a fair coin $n$ times, calculate the probability that the pattern HHTHTHH does not occur.

leonbloy · Accepted Answer · 2011-01-20T00:03:59.620

4

I ended up doing something very similar to Moron.

Define a Markov chain with states i = 0..7 as follows. Given a sequence of n coin tosses in which the pattern has not yet appeared, find the maximum length of a suffix (the last tosses) that coincides with the beginning of the target pattern, so that

   i =  length of matching suffix , if pattern has not appeared
   i = 7 otherwise

For example: HTTHTTTTHT gives i=0, and HTTTTHHT gives i=3.
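
The state definition above can be sketched in a few lines of Python. This is only an illustration; the function name `state` and its signature are my own choices, not part of the answer.

```python
PATTERN = "HHTHTHH"

def state(tosses: str, pattern: str = PATTERN) -> int:
    """Return the chain state i for a toss sequence, as defined above."""
    m = len(pattern)
    if pattern in tosses:
        return m  # pattern already appeared: absorbing state
    # Longest suffix of the tosses that is also a prefix of the pattern.
    for k in range(min(m - 1, len(tosses)), 0, -1):
        if tosses.endswith(pattern[:k]):
            return k
    return 0

print(state("HTTHTTTTHT"))  # 0
print(state("HTTTTHHT"))    # 3
```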

It's easy to write down the transition probabilities (i=7 is an absorbing state) and, from them, the probability of being in state 7 at step n.

So your answer would be $P(n) = 1 - p_7[n]$, where $p[n] = p[0] M^n$ is the row vector of state probabilities after $n$ tosses, $M$ is the 8x8 transition matrix (each row sums to 1) and $p[0] = (1,0,0,0,0,0,0,0)$ is the initial state. I doubt you'll get a simpler closed form answer.
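
As a rough sketch of this computation in Python with exact fractions: the helper names `next_state` and `prob_no_pattern` are hypothetical, and the transitions are derived from the pattern by suffix matching rather than typed in by hand.

```python
from fractions import Fraction

PATTERN = "HHTHTHH"

def next_state(i: int, toss: str, pattern: str = PATTERN) -> int:
    """State reached from state i after one more toss ('H' or 'T')."""
    m = len(pattern)
    if i == m:
        return m  # absorbing: the pattern has already appeared
    s = pattern[:i] + toss
    # Longest suffix of s that is a prefix of the pattern.
    for k in range(min(m, len(s)), 0, -1):
        if s.endswith(pattern[:k]):
            return k
    return 0

def prob_no_pattern(n: int, pattern: str = PATTERN) -> Fraction:
    """Exact probability that the pattern does not occur in n fair tosses."""
    m = len(pattern)
    p = [Fraction(0)] * (m + 1)
    p[0] = Fraction(1)  # start in state 0
    for _ in range(n):
        q = [Fraction(0)] * (m + 1)
        for i, mass in enumerate(p):
            if mass:
                for toss in "HT":
                    q[next_state(i, toss, pattern)] += mass / 2
        p = q
    return 1 - p[m]

print(prob_no_pattern(7))  # 127/128
```

Exact fractions avoid rounding questions for moderate n; for very large n, floating point or a matrix power would be the more practical choice.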

One could get a very coarse approximate solution by assuming that the occurrences of the pattern starting at each position are independent (obviously a false assumption), which gives $P(n) \approx (1-2^{-7})^{n-6}$ for $n \ge 7$.
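
A minimal sketch of that approximation, assuming a fair coin and a pattern of length m (the function name `approx_no_pattern` is made up for illustration). There are n - m + 1 possible starting positions, each missing the pattern with probability 1 - 2^-m:

```python
def approx_no_pattern(n: int, m: int = 7) -> float:
    """Independence approximation (1 - 2^-m)^(n - m + 1) for a length-m pattern."""
    if n < m:
        return 1.0  # the pattern cannot fit yet
    return (1.0 - 2.0 ** -m) ** (n - m + 1)

for n in (7, 8, 9, 200):
    print(n, round(approx_no_pattern(n), 5))
```

For n = 7 there is a single starting position, so the approximation coincides with the exact value 127/128.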

UPDATE: some Octave/Matlab code to test the approximation (it seems to work better than I'd expected)

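N=200;
% approximation assuming independence: (1-2^-7)^(n-6)
pa = (1-1/128).^((1:N)-6);
pa(pa>1)=1;   % for n<6 the formula exceeds 1, so cap it
% transition matrix (row i+1 is state i; each row sums to 1):
M = [ 1,1,0,0,0,0,0,0 ;
      1,0,1,0,0,0,0,0 ;
      0,0,1,1,0,0,0,0 ;
      1,0,0,0,1,0,0,0 ;
      0,0,1,0,0,1,0,0 ;
      1,0,0,0,0,0,1,0 ;
      1,0,0,0,0,0,0,1 ;
      0,0,0,0,0,0,0,2 ]/2;
p=[1,0,0,0,0,0,0,0];   % start in state 0
p7=zeros(1,N);         % p7(n) = 1 - p(8) = P(pattern not seen in n tosses)
for n = 1:N
 p = p * M;            % row vector times transition matrix
 p7(n) = 1 - p(8);
end
plot(7:N,p7(7:N),7:N,pa(7:N));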

>>> pa(5:9)
ans =

   1.00000   1.00000   0.99219   0.98444   0.97675

>>> p7(5:9)
ans =

   1.00000   1.00000   0.99219   0.98438   0.97656

>>> pa(N-5:N)
ans =

   0.22710   0.22533   0.22357   0.22182   0.22009   0.21837

>>> p7(N-5:N)
ans =

   0.22717   0.22540   0.22364   0.22189   0.22016   0.21844

edited Jan 20 '11 at 00:03

answered Jan 19 '11 at 12:49

leonbloy

@leonbloy: Is it possible for you to use this method to get a recurrence formula for the probability in question? No need to explicitly solve for a closed-form solution; a recurrence relation is enough. – Qiang Li Jan 19 '11 at 17:14
The recurrence you get is for the full probability vector: $p[n] = p[n-1] M$. It doesn't give (I don't think it's possible) a recurrence in terms of the single probability in question ($p_7$ or $1-p_7$). – leonbloy Jan 19 '11 at 17:29
@leonbloy: the $P(n) \approx (1-2^7)^{n-7} (n>6)$ does not look correct. Can you explain what should it look like and why? Many thanks! – Qiang Li Jan 19 '11 at 22:51
I made some small corrections. Does it look good to you now? At least it's exact for n=7 :-) – leonbloy Jan 19 '11 at 23:13
Actually I have just tested the approximation and it seems to approximate the exact result very well – leonbloy Jan 19 '11 at 23:53
To understand the approximation, call q(i) the probability that, after tossing coin number i, the pattern has just been completed (in positions i-6..i), not necessarily for the first time. This is 0 for i<7 and 2^-7 = 1/128 otherwise. The prob. of the pattern NOT having appeared after N tosses is then prod(1-q(i))... if the events were independent. – leonbloy Jan 20 '11 at 00:23
I got an even closer asymptotic solution for $P(n) \approx 1.0490350800928752*0.9920630084175037^n$ – Qiang Li Jan 20 '11 at 04:09
@leonbloy: Your asymptotic approximation does not seem correct because it seems to work for every pattern of length 7. Clearly different patterns have different asymptotic probabilities. – Qiang Li Jan 20 '11 at 04:20
@Qiang Li: My approximation is not correct in the sense that it's just that: an approximation. The error introduced by the assumption of independence (which is, of course, false) will be different for different patterns. For this pattern, it gives a good approximation, especially in the range N=100-200. BTW, does your approximation have some theoretical basis or is it just a numerical fit? – leonbloy Jan 20 '11 at 11:16
@leonbloy: but yours does not seem to be a valid approximation based on your reasoning. Can you write your argument above more clearly, since I do not see its validity? Can you also provide a similar argument for the pattern HHHHHHH if the expression is different from $P(n)\approx(1−2^{-7})^{n−6}$ when $n>6$ – Qiang Li Jan 20 '11 at 20:37
What part of my reasoning don't you understand? The (1-1/128) or the (n-6)? It seems pretty basic to me. My argument does not depend on the pattern. It will just work better for some patterns (when the independence assumption is more reasonable). – leonbloy Jan 20 '11 at 21:49
You might ask yourself: I throw 100 coins. What is the probability of finding the pattern (say) HHTHT starting in position (say) 30? It's 1/2^5, no? (Independent of the pattern, right?) What is the probability of NOT finding it there? 1-1/2^5. And of NOT finding it in position 1? The same. And of NOT finding it in the full sequence? Well, that's the prob. of not finding it in position 1, AND not finding it in position 2 AND... in position 94, 95, 96. (In position 97 it can't be.) Then the probability (ASSUMING INDEPENDENCE, WHICH IS ONLY AN APPROXIMATION) is (1-1/2^5)^96. – leonbloy Jan 21 '11 at 01:39
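
The point debated in these comments (one formula for every pattern of a given length, but a pattern-dependent error) can be checked by brute force for short sequences. The sketch below is an illustration only: the function names are invented, and it uses length-5 patterns and n = 12 so that enumerating all 2^n sequences stays cheap.

```python
from itertools import product

def exact_no_pattern(pattern: str, n: int) -> float:
    """Brute force: fraction of the 2^n toss sequences that avoid the pattern."""
    avoid = sum(1 for seq in product("HT", repeat=n)
                if pattern not in "".join(seq))
    return avoid / 2 ** n

def approx_no_pattern(pattern: str, n: int) -> float:
    """Independence approximation; depends only on the pattern length."""
    m = len(pattern)
    return (1 - 2.0 ** -m) ** max(n - m + 1, 0)

n = 12
for pattern in ("HHTHT", "HHHHH"):
    print(pattern, exact_no_pattern(pattern, n), approx_no_pattern(pattern, n))
```

The approximate column is identical for both patterns while the exact column is not, which is the sense in which the approximation fits some patterns better than others.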

Aryabhata · Answer 2 · 2011-01-19T11:13:55.213

Construct a deterministic finite automaton which detects the regular expression $.*(HHTHTHH).*$ such that the sole accepting state $A$ always transitions to a state $B$, which always transitions to itself. The starting state is $S$.

This is basically a directed graph and we can find the transition matrix $P$, each row of which gives the probabilities of getting from one state (i.e. vertex) corresponding to that row, to the next states (one vertex corresponding to each column).

You basically need the entries corresponding to starting state $S$ and ending state $A$, from each of $P, P^2, P^3, \dots , P^n$ and add them up to get the probability that the pattern $HHTHTHH$ occurs (or more simply the $(S,B)$ entry of $P^{n+1}$ will do). Subtract that from $1$ and you are done.

The $(S,A)$ entry of $P^k$ gives the probability that the pattern is completed exactly at the end of $k$ tosses, and not at any earlier toss.
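
A small Python sketch of this construction, with the caveat that the state numbering (0..6 for partial matches, 7 for $A$, 8 for $B$) and the table `NEXT` are my own encoding of the automaton, not something given in the answer. It computes both the sum of the $(S,A)$ entries and the $(S,B)$ entry of $P^{n+1}$ so the two can be compared.

```python
from fractions import Fraction

# Next state on (H, T) for partial-match states 0..6 of the pattern HHTHTHH.
# Index 7 is the accepting state A and index 8 is the trap state B.
NEXT = [(1, 0), (2, 0), (2, 3), (4, 0), (2, 5), (6, 0), (7, 0)]
S, A, B = 0, 7, 8

def transition_matrix():
    P = [[Fraction(0)] * 9 for _ in range(9)]
    for i, (on_h, on_t) in enumerate(NEXT):
        P[i][on_h] += Fraction(1, 2)
        P[i][on_t] += Fraction(1, 2)
    P[A][B] = Fraction(1)  # A always moves to B
    P[B][B] = Fraction(1)  # B always stays in B
    return P

def matmul(X, Y):
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def prob_pattern_occurs(n):
    """Return (sum of (S,A) entries of P^1..P^n, (S,B) entry of P^(n+1)), n >= 1."""
    P = transition_matrix()
    power = P
    total = power[S][A]
    for _ in range(n - 1):
        power = matmul(power, P)
        total += power[S][A]
    via_trap = matmul(power, P)[S][B]
    return total, via_trap

total, via_trap = prob_pattern_occurs(9)
print(1 - total, 1 - via_trap)  # probability that the pattern does not occur
```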

I do not completely get your solution. Is it possible for you to use this method to get a recurrence formula for the probability in question? — Qiang Li, Jan 19 '11 at 17:13
@Qiang: It gives you multiple recurrences (one for each state). You can diagonalize the matrix (if that is possible) to compute the powers easily. — Aryabhata, Jan 19 '11 at 18:43

score 1 · Answer 3 · answered Jan 20 '11 at 12:39

I don't know of an exact solution, but a numerical approximation is $P(n)=1-e^{-\frac{n}{125}}$.

Matlab/Octave code

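function test_script_1()
% Monte Carlo estimate of P(pattern occurs within n tosses), plus an exponential fit.
x=7:30:600;
y=tsh([],5000,x);
% least-squares fit of p(1)+p(2)*exp(p(3)*x/500)
er=@(p)sum(((p(1)+p(2)*exp(p(3)*x/500))-y).^2);
an=fminsearch(er,[0,0,0]);
anr=round(an);
disp(anr);
% the fit was done in x/500, so the plotted curve must use the same scaling
plot(x,y,x,anr(1)+anr(2)*exp(anr(3)*x/500));
legend('calculated','1-e^{-x/125}','Location','southeast')
end
function result=tsh(a,b,n)
% Fraction of b random sequences of n tosses that contain the pattern a.
% HHTHTHH (H -> 1 , T -> 0)
if isempty(a)
    a=logical([1 1 0 1 0 1 1]);
end
if isempty(b)
    b=1000;
end
if isempty(n)
    n=100;
end
if any(size(n)>1)
    % vector of lengths: evaluate each one separately
    result=zeros(size(n));
    for pa=1:length(n)
        result(pa)=tsh(a,b,n(pa));
    end
    return
end
la=length(a);
t=false(1,b);
for i=1:b
    v=raspr(0,1,n);
    % stop at n-la+1 so the window v(j:j+la-1) stays inside v
    for j=1:n-la+1
        if all(a==v(j:j+la-1))
            t(i)=true;
            break
        end
    end
end
result=nnz(t)/b;
end
function resu=raspr(a1,b1,s)
% s uniform random integers in a1..b1
resu=round((b1-a1+1)*rand(1,s)+a1-0.5);
end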

Does anybody know why that is?

answered Jan 20 '11 at 12:39

Alexander

Because you can treat the problem as a rare event problem, i.e. the probability of encountering the sequence HHTHTHH is small. Therefore the probability distribution of the waiting time until the sequence is encountered will be approximately exponential. So you only need to find the expected value of the waiting time to have a nice exponential approximation. I guess that can be found by a simple recursion relation. – Raskolnikov Jan 20 '11 at 13:11
this is wrong, as $n \to \infty$, $P(n)\to 0$. Yours goes to 1. – Qiang Li Jan 20 '11 at 20:34
Yes, the law is expected to be exponential. In fact, if you see my approximation, it's practically the same (127.5 instead of 125, and a -6 shift that is unimportant for large n). – leonbloy Jan 20 '11 at 21:45
@Qiang Li - you are right, I read the question wrong. But it is still correct for the inverse problem – Alexander Jan 20 '11 at 22:17
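
The two quantities mentioned in these comments, the expected waiting time and the time constant of the $(1-2^{-7})^n$ approximation, can be estimated with a short script. This is a sketch under assumptions: the table `NEXT` is my own encoding of the partial-match states of HHTHTHH, and the sum is truncated after 5000 tosses, by which point the remaining tail is negligible.

```python
import math

# Next state on (H, T) for partial-match states 0..6 of HHTHTHH; 7 means "pattern seen".
NEXT = [(1, 0), (2, 0), (2, 3), (4, 0), (2, 5), (6, 0), (7, 0)]

def mean_waiting_time(steps: int = 5000) -> float:
    """Approximate E[T] as the sum over n >= 0 of P(pattern not seen in n tosses)."""
    p = [1.0] + [0.0] * 6      # distribution over the transient states 0..6
    mean = 0.0
    for _ in range(steps):
        mean += sum(p)         # P(pattern not seen yet) for the current n
        q = [0.0] * 7
        for i, mass in enumerate(p):
            for j in NEXT[i]:
                if j < 7:      # mass reaching state 7 drops out of the sum
                    q[j] += mass / 2
        p = q
    return mean

print(mean_waiting_time())
print(-1 / math.log(1 - 2 ** -7))  # time constant of (1 - 2^-7)^n, about 127.5
```

The identity E[T] = sum of P(T > n) over n >= 0 is standard for waiting times on the non-negative integers; comparing the printed mean with the time constant shows how close the two exponential scales are.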
