5

Consider sequences of numbers 0, 1, 2 with length n. There are $3^n$ such sequences.

I want to know how many sequences there are that contain a k-run of 1's followed by 2. As a regular expression:

(^|.*[^1])[1]{k}[2].*
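For small $n$ the count can be checked by brute force. The sketch below is only an illustration: the function name `count_with_run` is made up here, and it applies the regular expression above literally, so the run of 1's must be delimited on the left by the start of the sequence or by a symbol other than 1.

```python
import re
from itertools import product


def count_with_run(n: int, k: int) -> int:
    """Count ternary strings of length n that match (^|.*[^1])1{k}2.*"""
    # Doubled braces produce a literal {k} quantifier inside the f-string.
    pattern = re.compile(rf"(^|.*[^1])1{{{k}}}2.*")
    return sum(
        1
        for digits in product("012", repeat=n)
        if pattern.match("".join(digits))
    )


if __name__ == "__main__":
    # Exhaustive enumeration costs 3**n steps, so keep n small.
    for k in range(3):
        print(k, [count_with_run(n, k) for n in range(1, 7)])
```

This only scales to small $n$, but it gives reference values against which any proposed closed form can be compared.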

Even better would be to know the number of sequences that contain a maximal k-run of 1's followed by 2 i.e. that contain no other K-run with K > k.

Let $\#(k)$ be the number of sequences that contain a maximal k-run of 1's followed by 2.

These are obvious conditions that $\#(k)$ must fulfil:

  1. $\#(n-1) = 1$

  2. $\sum_{k = 0..n-1} \#(k)= 3^n$

I am looking for a closed form for $\#(k)$. If this is too hard to achieve, I would be happy with a closed form for the number of sequences that contain an arbitrary (not necessarily maximal) k-run of 1's followed by 2.

Abr001am

3 Answers

3

As an answer to questions of the type "how many words of a particular length avoid/contain such-and-such patterns", there is a general method called the Goulden-Jackson cluster method that efficiently produces a generating function for the aforementioned sequence. The main result of the method is that if $a_n$ is the number of words over an alphabet of $m$ letters that avoid a list of specified patterns, then the generating function $f(z)=\sum a_n z^n$ has the form $$f(z)=\frac1{1-mz-C},$$ where $C$ is the weight of the "clusters" (words formed from overlapping sequences of the specified patterns). The weights of clusters ending in the various patterns satisfy a system of linear equations that depend on how the patterns overlap. (The reference is a classic paper by Zeilberger and Noonan and is highly recommended reading.)

In this problem, there is a single pattern, a run of $k$ ones followed by a two. This pattern cannot overlap itself, so the only cluster is the pattern itself, and it's immediate from the G-J method that $C=-z^{k+1}$. So, if we fix $a_n$ to be the number of words over the alphabet $0,1,2$ of length $n$ that avoid this single pattern, then $$f(z)=\sum_na_nz^n=\frac1{1-3z+z^{k+1}}.$$

You can expand this generating function first like a geometric series and then use the binomial theorem to obtain an exact expression for the coefficients $a_n$. $$\begin{eqnarray*} f(z)&=&\frac1{1-3z(1-\frac{z^k}3)}\\ &=&\sum_n3^nz^n(1-\frac{z^k}3)^n\\ &=&\sum_n3^nz^n\sum_j\binom nj(-\frac13)^jz^{kj}. \end{eqnarray*} $$ Collecting the coefficient of $z^n$ gives $a_n=\sum_j (-1)^j3^{n-(k+1)j}\binom{n-kj}j$. So, the number of sequences of length $n$ that contain a $k$-run of ones followed by a two is $$3^n-\sum_j (-1)^j3^{n-(k+1)j}\binom{n-kj}j.$$
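The coefficient formula can be sanity-checked numerically. The following is a minimal sketch, with the hypothetical helper names `avoid_closed_form` and `avoid_brute_force`; it assumes that "avoiding the pattern" means the string $1^k2$ does not occur as a contiguous substring.

```python
from itertools import product
from math import comb


def avoid_closed_form(n: int, k: int) -> int:
    """Coefficient of z^n in 1/(1 - 3z + z^(k+1)), via the binomial sum."""
    total = 0
    j = 0
    # Terms vanish once (k + 1) * j > n, because the binomial becomes zero.
    while (k + 1) * j <= n:
        total += (-1) ** j * 3 ** (n - (k + 1) * j) * comb(n - k * j, j)
        j += 1
    return total


def avoid_brute_force(n: int, k: int) -> int:
    """Count ternary strings of length n without the substring 1^k 2."""
    pattern = "1" * k + "2"
    return sum(
        1 for w in product("012", repeat=n) if pattern not in "".join(w)
    )


if __name__ == "__main__":
    for k in range(4):
        for n in range(8):
            assert avoid_closed_form(n, k) == avoid_brute_force(n, k)
    print("closed form agrees with enumeration for n < 8, k < 4")
```

The number of strings that do contain the pattern is then `3**n - avoid_closed_form(n, k)`.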

This is consistent with @Giovanni Resta's result for $k=2$.

Edit: As to your second question about $\#(k)= $ the number of strings of length $n$ with $k$-runs but no $K$-runs for $K>k$, this is just a matter of subtraction, i.e., it coincides with the number of strings that avoid $k+1$ runs minus the number of strings that avoid $k$ runs. So, $$\#(k)=\sum_j (-1)^j3^{n-(k+2)j}\binom{n-(k+1)j}j-\sum_j (-1)^j3^{n-(k+1)j}\binom{n-kj}j.$$
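The subtraction can be checked the same way. This is a sketch under the reading that $\#(k)$ counts strings containing $1^k2$ as a substring but not $1^{k+1}2$; the function names are illustrative only.

```python
from itertools import product
from math import comb


def avoiding(n: int, k: int) -> int:
    """Number of ternary strings of length n that avoid 1^k 2."""
    return sum(
        (-1) ** j * 3 ** (n - (k + 1) * j) * comb(n - k * j, j)
        for j in range(n // (k + 1) + 1)
    )


def max_run_count(n: int, k: int) -> int:
    """Strings that contain 1^k 2 but avoid 1^(k+1) 2, by subtraction."""
    return avoiding(n, k + 1) - avoiding(n, k)


def max_run_brute_force(n: int, k: int) -> int:
    """Same count by exhaustive enumeration (only feasible for small n)."""
    wanted = "1" * k + "2"
    forbidden = "1" * (k + 1) + "2"
    return sum(
        1
        for w in map("".join, product("012", repeat=n))
        if wanted in w and forbidden not in w
    )


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, [max_run_count(n, k) for k in range(n)])
```

In particular `max_run_count(n, n - 1)` evaluates to 1, in line with the first condition stated in the question.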

Rus May
2

This is just a partial answer, but there was not enough space in a comment.

The problem seems not so easy. It is easy to see that the number of sequences of length $n$ that do contain "$12$" is $$ 3^n - F_{2n+2}\, $$ where $F_k$ denotes the $k$-th Fibonacci number.

The resulting sequence $0, 1, 6, 26, 99, 352, 1200, 3977,\dots$ is in the OEIS as Sequence A186314, i.e. Number of ternary strings of length n which contain 01. You can follow the link for more details.
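As a quick check of the Fibonacci expression, here is a small sketch (the helper names `fib` and `contain_12` are chosen for illustration, with the convention $F_0=0$, $F_1=1$):

```python
def fib(n: int) -> int:
    """n-th Fibonacci number with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def contain_12(n: int) -> int:
    """Number of ternary strings of length n containing "12"."""
    return 3 ** n - fib(2 * n + 2)


print([contain_12(n) for n in range(1, 9)])
# [0, 1, 6, 26, 99, 352, 1200, 3977]
```

The printed values start at $n=1$ and reproduce the terms quoted above.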

Looking for $112$, things escalate quickly. Indeed, the resulting sequence is not in the OEIS, but the complement sequence ($3^n-a(n)$) is there, as Sequence A076264 aka Number of ternary (0,1,2) sequences without a consecutive '012'.

This sequence can be described easily with a recurrence, but can also be described with a sum. Putting together the OEIS info, we get that the number of sequences of length $n$ that contain "$112$" is $$ 3^n-\sum_{k=0}^{\lfloor n/3\rfloor}(-1)^k {n-2k\choose k} 3^{n-3k} $$
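As a small worked instance of the sum, take $n=4$, where only $k=0$ and $k=1$ contribute:

\begin{align*} 3^4-\left(\binom{4}{0}3^{4}-\binom{2}{1}3^{1}\right)=81-(81-6)=6, \end{align*}

which agrees with listing the words directly: $1120$, $1121$, $1122$, $0112$, $1112$, $2112$.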

Maybe you or somebody else can generalize upon these partial results. I don't know if looking for maximal subsequences makes the problem easier or harder.

2

Added 2016-03-13: This answer is just a starter to the second part of the OP's question, namely finding words over the ternary alphabet $\{0,1,2\}$ having maximum run length $k$ and containing the string $1^k2$ but no other strings with $1$ and run length $k$.

A detailed answer based upon a two-step language decomposition \begin{align*} \left(\varepsilon+\mathcal{H}_0+\mathcal{H}_2\right)&\left(11^{<k-1}\left(\mathcal{H}_0+\mathcal{H}_2\right) +1^k\mathcal{H}_2\right)^*1^{<k}\\ &\qquad\qquad\text{with}\\ &\mathcal{H}_0=00^{<k}\left(22^{<k}00^{<k}\right)^*2^{<{k+1}}\\ &\mathcal{H}_2=22^{<k}\left(00^{<k}22^{<k}\right)^*0^{<{k+1}}\\ \end{align*}

and resulting in a generating function \begin{align*} A_k(z)&=(1+2H(z))\sum_{q=0}^\infty\left(z\frac{1-z^{k-1}}{1-z}2H(z)+z^kH(z)\right)^q\frac{1-z^k}{1-z}\\ &=\frac{\left(1-z^k\right)\left(1-z^{k+1}\right)}{1-3z+2z^{k+1}+2z^{k+2}-z^{2k+1}-z^{2k+2}} \end{align*}

is stated in this MSE question.

Note: We derive the number of words built from $\{0,1,2\}$ having a maximal run of $1$ of length $k$ and no other maximal runs of length $\geq k$. We do not consider the additional constraint that the maximal run of $1$ has to be followed by $2$.

This approach is based upon example III.24 Smirnov words from Analytic Combinatorics by Philippe Flajolet and Robert Sedgewick.

Smirnov Words: Words without runs

Smirnov Words are words having no consecutive equal letters. They can be related to unconstrained words and vice versa in the following way:

Given an unconstrained word, we can collapse each run of consecutive equal letters into a single letter, which associates a Smirnov word with it. Conversely, starting from a Smirnov word and substituting each letter by a sequence of length $\geq 1$ of this letter, we obtain all unconstrained words.

Arbitrary words are derived from Smirnov words by a simultaneous substitution:

\begin{align*} W(v_1,\ldots,v_c)=S\left(\frac{v_1}{1-v_1},\ldots,\frac{v_c}{1-v_c}\right).\tag{1} \end{align*}

Relation (1) determines the generating function $S$ for Smirnov words implicitly. Since the inverse function of $\frac{v}{1-v}$ is $\frac{v}{1+v}$, we find:

\begin{align*} S(v_1,\ldots,v_c)=W\left(\frac{v_1}{1+v_1},\ldots,\frac{v_c}{1+v_c}\right) =\left(1-\sum_{j=1}^{c}\frac{v_j}{1+v_j}\right)^{-1} \end{align*}
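As a quick illustration for the ternary alphabet, set $c=3$ and $v_1=v_2=v_3=z$:

\begin{align*} S(z,z,z)=\left(1-\frac{3z}{1+z}\right)^{-1}=\frac{1+z}{1-2z}=1+3z+6z^2+12z^3+\cdots \end{align*}

The coefficient of $z^n$ is $3\cdot 2^{n-1}$ for $n\geq 1$, which matches a direct count: three choices for the first letter and two choices for each subsequent letter, since it only has to differ from its predecessor.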

Hint: A more detailed explanation of Smirnov words is given in this MSE answer

Words with runs of length $< k$

Let's consider the alphabet $\{v_0,v_1,v_2\}$. Starting with a Smirnov word we can build from it words with runs of length $<k$ by substituting

$$v_j\mapsto v_j+\cdots+v_j^{k-1}=v_j\frac{1-v_j^{k-1}}{1-v_j}$$

Replacing $v_j$ with $z$ we obtain a generating function describing all words built from $\{0,1,2\}$ in which every run has length $<k$ \begin{align*} G^{<(k,k,k)}(z)&=\left(1-3z\frac{1-z^{k-1}}{1-z^k}\right)^{-1}=\frac{1-z^k}{1-3z+2z^k} \end{align*}
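The rational function can be expanded and compared against enumeration. The sketch below is only illustrative: `series`, `g_all_runs_below` and `brute_force` are hypothetical helper names, and the expansion uses plain power-series division with integer coefficients.

```python
from itertools import groupby, product


def series(num, den, terms):
    """First `terms` coefficients of num(z)/den(z); requires den[0] == 1."""
    coeffs = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        for i in range(1, min(n, len(den) - 1) + 1):
            c -= den[i] * coeffs[n - i]
        coeffs.append(c)
    return coeffs


def g_all_runs_below(k, terms):
    """Coefficients of (1 - z^k) / (1 - 3z + 2z^k)."""
    num = [0] * (k + 1)
    num[0] += 1
    num[k] -= 1
    den = [0] * (k + 1)
    den[0] += 1
    den[1] -= 3
    den[k] += 2
    return series(num, den, terms)


def brute_force(k, n):
    """Ternary words of length n in which every run is shorter than k."""
    return sum(
        1
        for w in product("012", repeat=n)
        if all(len(list(group)) < k for _, group in groupby(w))
    )


print(g_all_runs_below(3, 6))  # [1, 3, 9, 24, 66, 180]
```

For $k=3$ the coefficients agree with `brute_force(3, n)` for $n=0,\ldots,5$.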

In order to find the number of words with a maximal run of $1$ of length $k$ and no other runs with a length $\geq k$ we use \begin{align*} G^{<(k,k+1,k)}(z)&=\left(1-2z\frac{1-z^{k-1}}{1-z^k}-z\frac{1-z^{k}}{1-z^{k+1}}\right)^{-1}\\ &=\frac{1-z^k-z^{k+1}+z^{2k+1}}{1-3z+z^k+z^{k+1}+2z^{k+2}-2z^{2k+1}} \end{align*}

We conclude: The number of words of length $n$ from the alphabet $\{0,1,2\}$ with a maximum run of $1$ of length $k$ and with maximum runs of $0,2$ of length $<k$ is the coefficient of $z^n$ of

\begin{align*} G^{<(k,k+1,k)}(z)-G^{<(k,k,k)}(z) \end{align*}


Example $k=3$

With the help of Wolfram Alpha we obtain

\begin{align*} G^{<(3,3,3)}(z)&=\frac{1-z^3}{1-3z+2z^3}\\ &=1+3z+9z^2+24z^3+66z^4+180z^5+\mathcal{O}(z^6)\\ G^{<(3,4,3)}(z)&=\frac{1-z^3-z^4+z^7}{1-3z+z^3+z^4+2z^5-2z^7}\\ &=1+3z+9z^2+25z^3+70z^4+196z^5+\mathcal{O}(z^6)\\ \end{align*}

The difference of the generating functions is \begin{align*} G^{<(3,4,3)}(z)-G^{<(3,3,3)}(z)=z^3+4z^4+16z^5+\mathcal{O}(z^6) \end{align*}

and the words with runs of $1$ of length $k=3$ are

\begin{align*} z^3\quad&\quad111\\ z^4\quad&\quad0111\quad2111\quad1110\quad1112\\ z^5\quad&\quad00111\quad02111\quad10111\quad12111\quad20111\quad22111\\ &\quad01112\quad21110\quad01110\quad21112\\ &\quad11100\quad11101\quad11102\quad11120\quad11121\quad11122 \end{align*}
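The list can be reproduced by enumeration. This is a minimal sketch with an illustrative function name, counting words whose longest run of $1$ has length exactly $k$ while every run of $0$ or $2$ is shorter than $k$:

```python
from itertools import groupby, product


def count_words(n: int, k: int) -> int:
    """Words of length n: longest 1-run is exactly k, 0/2-runs shorter than k."""
    count = 0
    for w in product("012", repeat=n):
        runs = [(ch, len(list(group))) for ch, group in groupby(w)]
        longest_ones = max((ln for ch, ln in runs if ch == "1"), default=0)
        longest_other = max((ln for ch, ln in runs if ch != "1"), default=0)
        if longest_ones == k and longest_other < k:
            count += 1
    return count


print([count_words(n, 3) for n in range(3, 6)])  # [1, 4, 16]
```

The values 1, 4, 16 match the coefficients of $z^3$, $z^4$, $z^5$ in the difference of the generating functions.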

Markus Scheuer