2

A random process has three possible outcomes: $A$, $B$, and $C$. At each step, the outcome is decided randomly, and is uncorrelated with previous outcomes. The outcomes occur with probabilities $p_A$, $p_B$ and $p_C$ (of course, $p_A + p_B + p_C = 1$).

In a sequence of length $N$ generated using this process, what is the probability that an unbroken run of two of the variables, of length $k$ or more (e.g. $ABABABA$ for $k=7$), occurs somewhere in the sequence at least once?

What should be the length of the sequence such that the probability of the run occurring is greater than or equal to $\frac{1}{2}$?

(Is this even a tractable problem? I am a beginner in the area, but from what I have seen, things start to get rather nasty quite quickly, even for runs in binary processes.)

MGA
  • 9,636
  • Hint: Do you know how to solve the problem when there are only two results E, F? The relevant question in that case is what is the probability of a run of a single variable. If you know this solution you simply call a pair A, B by a single name C, and the probability of either one coming up is $p_A+p_B$. There is a bit of cumbersome combinatorics when we consider that AAA can also be considered as part of the event of A, B repetitions as well as part of the event of A C repetitions. Are you concerned about the general problem of probability for runs, or more about the combinatorics? – F. Solis Jan 30 '13 at 19:40
  • @F.Solis Thanks for your comment. The idea of combining A and B as one event makes a lot of sense (the only limitation is that the run has to be of even length, but I can live with that.) I have come across a recursive relationship for the binary problem here link, but while the relationship seems correct, I can't make heads or tails of where it came from. – MGA Jan 30 '13 at 19:49
  • Just to clarify, do you want a repeating pair, or do you accept AABBAAA, any run that uses only 2 of the three variables? – F. Solis Jan 30 '13 at 20:14
  • I am specifically after the pattern ABABABAB... – MGA Jan 30 '13 at 20:15

2 Answers

0

The probability of $n$ repetitions of the pattern AB, followed by a breaking character B or C, is $p_A^np_B^n(p_B+p_C)$. We sum over all $n \geq 4$: $$\sum_{n=4}^{\infty} p_A^np_B^n(p_B+p_C)=p_A^4 p_B^4 (p_B+p_C)\sum_{n=0}^{\infty} p_A^np_B^n=\frac{p_A^4 p_B^4 (p_B+p_C)}{1-p_Ap_B}.$$ For the odd case, where the run is broken at A, followed by an A or C, we have $$\frac{p_A^4 p_B^3 (p_A+p_C)}{1-p_Ap_B}.$$ The result is the sum of these two cases.

F. Solis
  • 168
  • Hmmm... Not sure this is correct. You seem to be computing the probability of the event that one starts by the word ABABABA. – Did Feb 01 '13 at 10:29
0

Call $X_n$ the length of the longest prefix of the word you are interested in which ends the sequence at time $n$. For example, the word ABABABA and the sequence ABCACBCAB yield $(X_n)_{0\leqslant n\leqslant9}$ equal to $0120100012$.
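As a small illustration of this definition, here is a minimal Python sketch that recomputes $(X_n)$ for a given word and sequence. The function name `prefix_lengths` is hypothetical and the search is deliberately naive; it is only meant to make the definition concrete.

```python
def prefix_lengths(word, sequence):
    """Return [X_0, ..., X_n], where X_i is the length of the longest
    prefix of `word` that is a suffix of the first i letters of `sequence`."""
    xs = [0]  # X_0 = 0: nothing has been seen yet
    for i in range(1, len(sequence) + 1):
        seen = sequence[:i]
        # Try the longest candidate prefix first and shrink until it matches.
        k = min(len(word), i)
        while k > 0 and not seen.endswith(word[:k]):
            k -= 1
        xs.append(k)
    return xs


# The example from the text: word ABABABA, sequence ABCACBCAB.
print("".join(str(x) for x in prefix_lengths("ABABABA", "ABCACBCAB")))
# prints 0120100012
```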

Let us continue with this specific word ABABABA in mind. The question is to estimate $\mathbb P(T\leqslant n)$ for some fixed $n$, where $T=\inf\{n\geqslant0\mid X_n=7\}$. The key fact is that $(X_n)_{n\geqslant 0}$ is a Markov chain and that one can compute the generating function $\mathbb E(s^T)$.

Call $t_k(s)=\mathbb E(s^T\mid X_0=k)$ for $0\leqslant k\leqslant 7$. Then $t_7(s)=1$ and we look for $t_0(s)$. Examining the next letter, one sees that, if $X_n=k$, then $X_{n+1}$ is $k+1$ or $1$ or $0$. For example, $X_n=3$ means one just saw ABA. Thus $X_{n+1}$ is $4$, $1$, or $0$ when the next letter is B, A, or C respectively. As a consequence,

$$ t_3(s)=s(p_Bt_4(s)+p_At_1(s)+p_Ct_0(s)). $$

Proceeding like this for each $0\leqslant k\leqslant 6$, one gets for every fixed $s$ a Cramer system which $(t_k(s))_{0\leqslant k\leqslant 7}$ solves, with boundary condition $t_7(s)=1$. Solving this system yields $t_0(s)$ as a rational function of $s$, which one can then expand into a series $t_0(s)=\sum\limits_{k\geqslant7}a_ks^k$. This yields

$$ \mathbb P(T\leqslant n)=\sum_{k\leqslant n}a_k. $$

Note finally that the behaviour of $\mathbb P(T\geqslant n)$ when $n\to\infty$ is ruled by the radius of convergence of the series $t_0(s)$, which can be read directly from the original Cramer system, without solving it. More precisely, the smallest positive root $\sigma$ of the determinant of this system is always at least $1$ and, when $n\to\infty$,

$$ \mathbb P(T\geqslant n)=\sigma^{-n+o(n)}. $$
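For numerical values, one possible shortcut is to skip the generating function and simply push the distribution of $X_n$ forward one letter at a time, treating the state $k$ (word completed) as absorbing. The sketch below is not the method of this answer but a direct iteration of the same Markov chain; the names `step`, `hit_probability` and `smallest_length`, and the cap `max_n`, are hypothetical choices made for illustration. It assumes the target word is the alternating word ABAB... of length `k`.

```python
def step(dist, p_a, p_b, p_c):
    """Advance the distribution of X_n by one letter.
    dist[j] = P(X_n = j, word not yet completed) for j < k; dist[k] is absorbing."""
    k = len(dist) - 1
    new = [0.0] * (k + 1)
    new[k] = dist[k]  # once the word has appeared, stay there
    for j in range(k):
        mass = dist[j]
        if j % 2 == 0:
            # Current match is empty or ends with B: A extends it, B or C reset to 0.
            new[j + 1] += mass * p_a
            new[0] += mass * (p_b + p_c)
        else:
            # Current match ends with A: B extends it, A restarts at 1, C resets to 0.
            new[j + 1] += mass * p_b
            new[1] += mass * p_a
            new[0] += mass * p_c
    return new


def hit_probability(n, p_a, p_b, p_c, k=7):
    """P(T <= n): the alternating word of length k occurs within the first n letters."""
    dist = [1.0] + [0.0] * k
    for _ in range(n):
        dist = step(dist, p_a, p_b, p_c)
    return dist[k]


def smallest_length(p_a, p_b, p_c, k=7, target=0.5, max_n=10**7):
    """Smallest n with P(T <= n) >= target, or None if not reached by max_n."""
    dist = [1.0] + [0.0] * k
    for n in range(1, max_n + 1):
        dist = step(dist, p_a, p_b, p_c)
        if dist[k] >= target:
            return n
    return None
```

For instance, `hit_probability(7, p_a, p_b, p_c)` should reduce to $p_A^4p_B^3$, the probability that the first seven letters are exactly ABABABA, and `smallest_length(p_a, p_b, p_c)` addresses the second part of the question for $k=7$.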

Did
  • 279,727
  • Thanks a lot for your answer, this is a wonderful branch of mathematics where seemingly simple problems lead to an intricate analysis. Just one small point, and I may very well be wrong: Could there be a mistake when you define $t_k(s) = \mathbb{E}(s^T|X_0=k)$? Is $X_0$ not the count of how many (uninterrupted) characters of the word in consideration you have seen in the first step, and is that by definition not always zero? – MGA Feb 05 '13 at 23:49
  • The definition might be a little sloppy but there is no mistake. A more rigorous definition of $t_3(s)$, for example, is that $t_3(s)$ is the expectation of $s^{T-3}$ conditionally on the event that the 3 first outcomes are ABA (that is, the beginning of the word to reach). Likewise for every $t_k(s)$. – Did Feb 06 '13 at 07:28