I am using a set of four letters, say {A,B,C,D}, from which a random string is output one letter at a time, each letter having a 1 in 4 chance at every position. I want to calculate the expected output length until the word ABCD is obtained; that is, until the letters A B C D appear consecutively in that order.
I have referred to this question (Expected Number of Coin Tosses to Get Five Consecutive Heads), but there is a complication in my case: when we obtain, say, ABA, we can't say that the chain resets, since the final A has already started the next potentially successful chain.
I have tried the approach below, but am not sure if it is completely correct.
I would be grateful for confirmation that this approach is OK, as well as for any alternative methods to approach this problem.
Let e be the expected number of output letters needed to get the target string ABCD, starting from nothing. Also, let f be the expected number of output letters needed to get ABCD given that the letter A has just been obtained, counting that A as the first letter.
The table for expected length and probability for e would be
| Case                     | Expected length | Probability |
|--------------------------|-----------------|-------------|
| if first letter is [BCD] | e+1             | 3/4         |
| if A then [CD]           | e+2             | 1/8         |
| if A then A              | f+1             | 1/16        |
| if AB then [BD]          | e+3             | 1/32        |
| if AB then A             | f+2             | 1/64        |
| if ABC then [BC]         | e+4             | 1/128       |
| if ABC then A            | f+3             | 1/256       |
| if ABCD                  | 4               | 1/256       |
and a similar table for f after we obtained the letter A would be
| Case (after the initial A) | Expected length | Probability |
|----------------------------|-----------------|-------------|
| if first letter is [CD]    | e+2             | 1/2         |
| if first letter is A       | f+1             | 1/4         |
| if B then [BD]             | e+3             | 1/8         |
| if B then A                | f+2             | 1/16        |
| if BC then [BC]            | e+4             | 1/32        |
| if BC then A               | f+3             | 1/64        |
| if BCD                     | 4               | 1/64        |
The expected length e is equal to the sum of the products (probability) × (expected length) over the rows of the first table, giving
$$
\begin{aligned}
e &= \frac{3}{4}(e+1) + \frac{1}{8}(e+2) + \frac{1}{16}(f+1) + \frac{1}{32}(e+3) \\
  &\quad + \frac{1}{64}(f+2) + \frac{1}{128}(e+4) + \frac{1}{256}(f+3) + \frac{1}{256}(4) \\
e &= \frac{117}{128}e + \frac{21}{256}f + \frac{319}{256} \\
22e &= 21f + 319 \qquad (1) \\
44e &= 42f + 638 \qquad (1')
\end{aligned}
$$
A similar approach for f yields
$$
\begin{aligned}
f &= \frac{1}{2}(e+2) + \frac{1}{4}(f+1) + \frac{1}{8}(e+3) + \frac{1}{16}(f+2) \\
  &\quad + \frac{1}{32}(e+4) + \frac{1}{64}(f+3) + \frac{1}{64}(4) \\
f &= \frac{21}{32}e + \frac{21}{64}f + \frac{127}{64} \\
43f &= 42e + 127 \qquad (2)
\end{aligned}
$$
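As a check on the bookkeeping, the coefficients can be collected mechanically from the two tables. This is only a sketch: the row encoding (probability, unknown to restart from, letters already spent) and the helper name `collect` are my own choices for illustration.

```python
from fractions import Fraction as F

# Each row: (probability, unknown the branch restarts from, letters already spent).
# "e" = restart from nothing, "f" = restart holding an A, None = ABCD completed.
table_e = [
    (F(3, 4), "e", 1), (F(1, 8), "e", 2), (F(1, 16), "f", 1),
    (F(1, 32), "e", 3), (F(1, 64), "f", 2), (F(1, 128), "e", 4),
    (F(1, 256), "f", 3), (F(1, 256), None, 4),
]
table_f = [
    (F(1, 2), "e", 2), (F(1, 4), "f", 1), (F(1, 8), "e", 3),
    (F(1, 16), "f", 2), (F(1, 32), "e", 4), (F(1, 64), "f", 3),
    (F(1, 64), None, 4),
]

def collect(rows):
    """Return (coefficient of e, coefficient of f, constant term) for one table."""
    assert sum(p for p, _, _ in rows) == 1  # the cases must be exhaustive
    coef_e = sum(p for p, var, _ in rows if var == "e")
    coef_f = sum(p for p, var, _ in rows if var == "f")
    const = sum(p * spent for p, _, spent in rows)
    return coef_e, coef_f, const

print(collect(table_e))
print(collect(table_f))
```

The two triples should agree with the reduced equations above, namely 117/128, 21/256, 319/256 for e and 21/32, 21/64, 127/64 for f.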
Combining these, we obtain
$$
\begin{aligned}
(2)+(1') &\Rightarrow 43f + 44e = 42e + 42f + 765 \Rightarrow f = -2e + 765 \qquad (3) \\
(3)\rightarrow(1) &\Rightarrow 22e = 21(-2e+765) + 319 \Rightarrow 64e = 16384 \\
e &= 256 \\
f &= 253
\end{aligned}
$$
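To rule out a slip in the elimination, equations (1) and (2) can also be solved exactly by Cramer's rule. A minimal sketch, with variable names of my own choosing:

```python
from fractions import Fraction

# (1):  22e - 21f = 319
# (2): -42e + 43f = 127
a11, a12, b1 = 22, -21, 319
a21, a22, b2 = -42, 43, 127

det = a11 * a22 - a12 * a21             # determinant of the 2x2 system
e = Fraction(b1 * a22 - a12 * b2, det)  # Cramer's rule for e
f = Fraction(a11 * b2 - b1 * a21, det)  # Cramer's rule for f
print(e, f)  # 256 253
```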
So the expected length comes out as 256 letters.
I notice that this is exactly what the naive approach would give: each letter has a 1 in 4 chance of appearing at each position, so any four consecutive letters spell ABCD with probability $\left(\frac{1}{4}\right)^4 = \frac{1}{256}$. This is slightly worrying, because in the question about five consecutive heads the corresponding probability is 1/32, yet the expected length there is 62 rather than 32.
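One way to compare the two cases on equal footing is to mechanize the same first-step idea with one unknown per number of letters currently matched, and solve the resulting linear system exactly. The sketch below is my own generalization, not something taken from the linked question; `next_state` and `expected_length` are hypothetical helper names, and letters are assumed independent and uniform over the given alphabet.

```python
from fractions import Fraction

def next_state(pattern, k, letter):
    """Longest prefix of `pattern` that is a suffix of pattern[:k] + letter."""
    text = pattern[:k] + letter
    for length in range(min(len(text), len(pattern)), 0, -1):
        if text.endswith(pattern[:length]):
            return length
    return 0

def expected_length(pattern, alphabet):
    """Expected number of uniform letters until `pattern` first appears."""
    n, p = len(pattern), Fraction(1, len(alphabet))
    # Unknown x[k] = expected further letters when k letters are matched (x[n] = 0).
    # Row k encodes  x[k] - p * sum over letters of x[next_state] = 1.
    rows = []
    for k in range(n):
        row = [Fraction(0)] * (n + 1)  # last column holds the right-hand side
        row[k] += 1
        row[n] = Fraction(1)
        for letter in alphabet:
            j = next_state(pattern, k, letter)
            if j < n:                  # j == n means the pattern is complete
                row[j] -= p
        rows.append(row)
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return rows[0][n]

print(expected_length("ABCD", "ABCD"))  # 256
print(expected_length("HHHHH", "HT"))   # 62
```

The only structural difference between the two runs is where a failed letter sends the chain: a wrong letter after a partial ABCD can at best leave a single A matched, while HHHHH overlaps with itself at every length.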
2014/09/16 addition:
After the above, I also calculated the expected length until either of TWO target strings is obtained; I used ABCD and CDBA as my targets, if it matters. The result, by a method similar to the one above, was not the intuitive 128 but 136.
Using the answers provided, I will also try to check this result using new tactics proposed in the answers.
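Independently of any algebra, a brute-force simulation gives a rough cross-check of both the single-target and the two-target figures. This is a sketch only: the function names, the trial count and the fixed seed are my own choices, and the estimates carry sampling noise of a few letters at this trial count.

```python
import random

def letters_until_any(targets, alphabet, rng):
    """Draw uniform letters until the output ends with one of `targets`; return the count."""
    longest = max(len(t) for t in targets)
    tail, count = "", 0
    while True:
        # Only the last `longest` letters can matter for a match.
        tail = (tail + rng.choice(alphabet))[-longest:]
        count += 1
        if any(tail.endswith(t) for t in targets):
            return count

def estimate(targets, alphabet="ABCD", trials=5000, seed=0):
    """Average stopping length over `trials` independent runs."""
    rng = random.Random(seed)  # fixed seed only so that reruns are repeatable
    total = sum(letters_until_any(targets, alphabet, rng) for _ in range(trials))
    return total / trials

print(estimate(["ABCD"]))          # single target, to compare with e
print(estimate(["ABCD", "CDBA"]))  # two targets, to compare with the figure above
```

Raising `trials` narrows the noise at the cost of run time.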