How many binary strings of length N contain the substring '11011'?

Question

I've been trying this for a long time. An example is:

N = 5. Then the only such string is 11011, so the answer is 1.

But if N = 9,

We can have many binary strings of the form

11011xxxx 2^4 combinations

x11011xxx 2^4 combinations

xx11011xx 2^4 combinations

xxx11011x 2^4 combinations

xxxx11011 2^4 combinations

But this overcounts. For example, 110110110 is one string, yet it is counted twice (once as 11011xxxx and once as xxx11011x).

How do we avoid this?

Note: this question came in the 2016 ZIO.

ZIO2016

You can construct an automata for it: Here you have a construction of a generating function of your numbers.(section 1.4) http://algo.inria.fr/flajolet/Publications/book.pdf — Phicar, Sep 09 '16 at 15:36
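The automaton idea in this comment can be sketched in a few lines. This is only an illustrative sketch, not the construction from the linked book: it assumes a KMP-style matching automaton whose state is the length of the longest prefix of 11011 matched so far, and the name `count_with_automaton` is hypothetical.

```python
def count_with_automaton(n, pattern="11011"):
    """Count binary strings of length n containing `pattern` (assumed helper)."""
    m = len(pattern)

    def step(state, ch):
        # New state: longest prefix of `pattern` that is a suffix of
        # (already matched prefix + next character).
        text = pattern[:state] + ch
        for length in range(min(m, len(text)), -1, -1):
            if text.endswith(pattern[:length]):
                return length

    # counts[s] = number of strings read so far that avoid the pattern
    # and currently sit in state s.
    counts = [0] * m
    counts[0] = 1
    for _ in range(n):
        nxt = [0] * m
        for s, c in enumerate(counts):
            for ch in "01":
                t = step(s, ch)
                if t < m:  # strings that complete the pattern are dropped
                    nxt[t] += c
        counts = nxt
    # Everything that was dropped contains the pattern.
    return 2**n - sum(counts)


print(count_with_automaton(9))  # 75
```

For N = 9 this gives 75 rather than the naive 5 · 2^4 = 80, because each string is counted once no matter how many times 11011 occurs in it.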

Markus Scheuer · Accepted Answer · 2016-10-05T14:28:50.190

We first count the binary strings of length $N$ which do not contain the substring $11011$. The number we want is then $2^N$ minus this count.

The so-called Goulden-Jackson Cluster Method is a convenient technique to derive a generating function for problems of this kind.

We consider words of length $N\geq 0$ built from an alphabet $\mathcal{V}=\{0,1\}$ and the set $\mathcal{B}=\{11011\}$ of bad words which are not allowed to be part of the words we are looking for.

We derive a generating function $F(x)$ whose coefficient of $x^N$ is the number of wanted words of length $N$. According to the paper (p.7) the generating function $F(x)$ is \begin{align*} F(x)=\frac{1}{1-dx-\text{weight}(\mathcal{C})} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet, and with the weight-numerator $\mathcal{C}$ given by \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[11011]) \end{align*}

We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[11011])&=-x^5-\text{weight}(\mathcal{C}[11011])\left(x^3+x^4\right) \end{align*}
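For readers without the paper at hand, here is a sketch of the intermediate step. It assumes the usual cluster-method convention that a self-overlap of length $r$ of the bad word contributes a term $x^{5-r}$. The word $11011$ overlaps itself in exactly two ways: the suffix $1$ equals the prefix $1$ ($r=1$, giving $x^4$) and the suffix $11$ equals the prefix $11$ ($r=2$, giving $x^3$). Solving the linear equation for the weight then gives \begin{align*} \text{weight}(\mathcal{C}[11011])\left(1+x^3+x^4\right)&=-x^5\\ \text{weight}(\mathcal{C}[11011])&=-\frac{x^5}{1+x^3+x^4} \end{align*}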

It follows: A generating function $F(x)$ for the number of words built from $\{0,1\}$ which do not contain the subword $11011$ is \begin{align*} F(x)&=\frac{1}{1-dx-\text{weight}(\mathcal{C})}\\ &=\frac{1}{1-2x+\frac{x^5}{1+x^3+x^4}}\\ &=\frac{1+x^3+x^4}{1-2x+x^3-x^4-x^5} \end{align*}

The generating function counting all $2^N$ binary strings of length $N$ is \begin{align*} \frac{1}{1-2x}=1+2x+4x^2+\cdots \end{align*}

We conclude: a generating function for the number of binary strings of length $N$ which contain the string $11011$ is

\begin{align*} \frac{1}{1-2x}-F(x)&=\frac{1}{1-2x}-\frac{1+x^3+x^4}{1-2x+x^3-x^4-x^5}\\ &=\frac{x^5}{(1-2x)(1-2x+x^3-x^4-x^5)}\\ &=x^5+4x^6+12x^7+31x^8+75x^9+175x^{10}\\ &\qquad +399x^{11}+894x^{12}+1975x^{13}+4313x^{14}+9330x^{15}+\cdots\tag{1} \end{align*}

The last line (1) was calculated with the help of Wolfram Alpha. Its coefficients give the number of binary strings containing $11011$ for each length up to $N=15$.
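The expansion can also be reproduced without a computer algebra system. The following is a minimal sketch, assuming plain power-series division of the numerator of $F(x)$ by its denominator; the helper name `series_quotient` is hypothetical and not taken from the answer.

```python
def series_quotient(num, den, terms):
    """First `terms` power-series coefficients of num(x)/den(x).

    num and den are coefficient lists (index = exponent); den[0] must be 1.
    """
    coeffs = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        # den(x) * series(x) = num(x), so solve for the n-th coefficient.
        for k in range(1, min(n, len(den) - 1) + 1):
            c -= den[k] * coeffs[n - k]
        coeffs.append(c)
    return coeffs


# F(x) = (1 + x^3 + x^4) / (1 - 2x + x^3 - x^4 - x^5): strings avoiding 11011
avoiding = series_quotient([1, 0, 0, 1, 1], [1, -2, 0, 1, -1, -1], 16)
# Strings containing 11011: all 2^N strings minus the avoiding ones.
containing = [2**n - a for n, a in enumerate(avoiding)]
print(containing[5:])
# [1, 4, 12, 31, 75, 175, 399, 894, 1975, 4313, 9330]
```

The denominator translates into the recurrence $a_N=2a_{N-1}-a_{N-3}+a_{N-4}+a_{N-5}$ for the number $a_N$ of avoiding strings once $N\geq 5$, which is what the inner loop evaluates.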

For example the $12$ strings of length $7$ containing the substring $11011$ are

\begin{array}{cccc} \color{blue}{00}11011\quad&\quad\color{blue}{0}11011\color{blue}{0}\quad&\quad11011\color{blue}{00}\\ \color{blue}{01}11011\quad&\quad\color{blue}{0}11011\color{blue}{1}\quad&\quad11011\color{blue}{01}\\ \color{blue}{10}11011\quad&\quad\color{blue}{1}11011\color{blue}{0}\quad&\quad11011\color{blue}{10}\\ \color{blue}{11}11011\quad&\quad\color{blue}{1}11011\color{blue}{1}\quad&\quad11011\color{blue}{11}\\ \end{array}
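The list above can be checked by brute force. This is a small sketch that simply enumerates all $2^N$ strings, so it is only practical for small $N$; the function name `strings_containing` is an assumption made for illustration.

```python
from itertools import product


def strings_containing(n, pattern="11011"):
    """List all binary strings of length n that contain `pattern`."""
    candidates = ("".join(bits) for bits in product("01", repeat=n))
    return [s for s in candidates if pattern in s]


hits = strings_containing(7)
print(len(hits))  # 12
print(hits)
```

Because each string is tested once with `pattern in s`, a string such as 110110110 contributes a single count even though 11011 occurs in it twice.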

An exhaustive answer and of wide interest! always nice to learn from your posts, Markus, thanks — G Cab, Sep 09 '16 at 18:14
(+1). This is very nice work. You'll be happy to know that the DFA-method yields the same answer with the command factor(1/(1-2*z)-GFNC([[1,1,0,1,1]], 2, true));. — Marko Riedel, Sep 09 '16 at 20:11
@MarkoRiedel: Thanks a lot, Marko! I've also checked it with a little piece of R code. But, not that elegant as your part is. :-) — Markus Scheuer, Sep 09 '16 at 20:15
Code can be an instructive enrichment of mathematical posts, permitting effective communication of simple ideas that would otherwise require a great deal of notation. — Marko Riedel, Sep 09 '16 at 20:17
@MarkoRiedel: I fully agree! In fact I'm used to check my calculation activities since I'm working in a safety related business. Regards, — Markus Scheuer, Sep 09 '16 at 20:23
@MarkusScheuer There is a user at this MSE link who is asking for a proof of a theorem without using complex analysis, maybe you could have a look as you have presented a number of formal power series proofs lately. — Marko Riedel, Sep 15 '16 at 15:03
@MarkoRiedel: Many thanks for the hint, Marko. I will take a look at it. — Markus Scheuer, Sep 15 '16 at 15:21

user84413 · Answer 2 · 2016-09-11T20:56:56.603

Let $k$ be the number of blocks of 11011, let $j$ be the number of single overlaps of the blocks (giving $110111011$), and let $l$ be the number of double overlaps (giving $11011011$).

Then there are $\binom{k-1}{j}$ ways to choose the single overlaps, $\binom{k-1-j}{l}$ ways to choose the double overlaps, $\binom{n-4k+l}{k-j-l}$ ways to choose the positions of the blocks (since there are $k-j-l$ dividers and $n-5k+j+2l$ remaining digits), and $2^{n-5k+j+2l}$ ways to choose the other digits.

Using Inclusion-Exclusion, the number of binary strings of length $n$ containing $11011$ is therefore $$\sum_{k=1}^{\lfloor\frac{n-2}{3}\rfloor}(-1)^{k+1}\sum_{j=0}^{k-1}\sum_{l=0}^{k-1-j}\binom{k-1}{j}\binom{k-1-j}{l}\binom{n-4k+l}{k-j-l}2^{n-5k+j+2l}.$$
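The triple sum is easy to evaluate directly. The sketch below is one possible transcription, under the assumption that terms with a negative number of remaining digits are treated as zero (the blocks do not fit); the name `count_containing` is hypothetical.

```python
from math import comb


def count_containing(n):
    """Evaluate the inclusion-exclusion sum for strings of length n."""
    total = 0
    for k in range(1, (n - 2) // 3 + 1):      # k blocks of 11011
        for j in range(k):                    # j single overlaps
            for l in range(k - j):            # l double overlaps
                free = n - 5 * k + j + 2 * l  # digits outside the blocks
                if free < 0:
                    continue                  # blocks do not fit: term is zero
                ways = (comb(k - 1, j)
                        * comb(k - 1 - j, l)
                        * comb(n - 4 * k + l, k - j - l))
                total += (-1) ** (k + 1) * ways * 2 ** free
    return total


print([count_containing(n) for n in range(5, 12)])
# [1, 4, 12, 31, 75, 175, 399]
```

For $n=9$ the $k=1$ term contributes $5\cdot 2^4=80$, the naive count from the question, and the $k=2$ terms subtract the $5$ arrangements counted twice, leaving $75$.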
