I am thinking the total number of possible strings is $2^8$ and the number of strings with $100$ at the beginning would be $2^8 - 2^3 = 2^5$. Now "$100$" can shift across the string $5$ times going to the right. Is the answer then $2^8 - 2^5 \times 5$?
-
You are overcounting the bad strings. Specifically, you count the strings $100100xx$ twice. – lulu Aug 13 '16 at 18:11
-
@lulu So we minus 2^2? – Dre_Dre Aug 13 '16 at 18:14
-
No. It's much worse than that...you have to add back every double counted string (since $8<3\times3$ you can't have triply counted strings). Thus you have to enumerate all the patterns in which $100$ appears twice. Perfectly possible, because $8$ is quite small. Hard to generalize this approach to longer strings though. – lulu Aug 13 '16 at 18:16
-
$2^8 - 2^3$ is not $2^5$; the first expression is $248$, the latter is $32$. – miracle173 Aug 13 '16 at 21:28
6 Answers
As discussed in the comments, the straightforward approach proposed in the question won't work because it counts the bad strings in which $100$ appears more than once multiple times (indeed, it counts each bad string once for every appearance of $100$).
For short strings (like length $8$) a more careful count via the principle of Inclusion/Exclusion isn't impossible but it's not exactly easy and, as the length increases, this method gets harder and harder. I think it's easier to attack the problem recursively. Toward that end, define some sub-types of the "good" strings of length $n$. Specifically, let $A_n$ denote those good strings that end in $1$ and let $B_n$ denote those that end in $10$. Note that the total $T_n$ is then given by $$T_n=A_n+B_n+1$$ where the $1$ comes from the good string $0^n$ which ends in neither $1$ nor $10$.
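Recursively, we note that $$A_n=A_{n-1}+B_{n-1}+1=T_{n-1}$$ since you get a good string of length $n$ ending in $1$ by appending a $1$ to any good string of length $n-1$. Similarly $$B_n=A_{n-1}=T_{n-2}$$ since a good string ending in $10$ arises by appending a $0$ to a good string ending in $1$ (appending it to one ending in $10$ would create $100$). Thus $$T_n=T_{n-1}+T_{n-2}+1$$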
It is easy to see that $A_1=1$, $A_2=2$, $B_1=0$, $B_2=1$ whence $$\{T_n\}=\{2,4,7,12,20,33,54,88,\cdots\}$$
Consistency Check: Let's count $T_4,\;T_5,\;T_6$ directly. There are $16$ strings of length $4$ and the bad ones are $x100$ and $100x$, thus there are $4$ bad strings so $T_4=16-4=12$ as desired. Similarly the bad strings of length $5$ are $100xx$, $x100x$, $xx100$ so $T_5=32-12=20$ as desired. To count the bad strings of length $6$ we have to be a little careful...the patterns are $100xxx$, $x100xx$, $xx100x$, $xxx100$ but we have to add back $1$ for the double counted string $100100$. Thus $T_6=64-8\times 4+1=33$ as desired.
Induction shows that, in fact, $T_n=F_{n+3}-1$ where $F_i$ denotes the Fibonacci numbers $\{F_i\}_{i=1}^{\infty}=\{1,1,2,3,5,8,13,21,\cdots\}$
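The recurrence is easy to check numerically. Here is a minimal Python sketch (the function names `good_counts`, `brute_force` and `fib` are my own, not part of the answer) that computes $T_n$ from $T_n=T_{n-1}+T_{n-2}+1$ and compares it with a brute-force enumeration and with $F_{n+3}-1$:

```python
from itertools import product


def good_counts(n_max):
    """Return [T_1, ..., T_n_max] using T_n = T_{n-1} + T_{n-2} + 1."""
    t = [2, 4]  # T_1 = 2, T_2 = 4: no string shorter than 3 can contain 100
    while len(t) < n_max:
        t.append(t[-1] + t[-2] + 1)
    return t[:n_max]


def brute_force(n):
    """Count binary strings of length n that do not contain '100'."""
    return sum("100" not in "".join(bits) for bits in product("01", repeat=n))


def fib(i):
    """Fibonacci numbers with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(i - 1):
        a, b = b, a + b
    return a


# Compare the recurrence with direct enumeration and with F_{n+3} - 1.
for n, t_n in enumerate(good_counts(8), start=1):
    assert t_n == brute_force(n) == fib(n + 3) - 1
print(good_counts(8))  # [2, 4, 7, 12, 20, 33, 54, 88]
```

The brute-force check is exponential in $n$, so it is only meant as a sanity test for small lengths.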

Using generating functions with $z$ for zero and $w$ for ones we get the generating function
$$(1+z+z^2+\cdots) \left(\sum_{q\ge 0} ((w+w^2+w^3+\cdots) z)^q\right) (1+w+w^2+\cdots).$$
This yields
$$\frac{1}{1-z}\left(\sum_{q\ge 0} z^q \frac{w^q}{(1-w)^q}\right) \frac{1}{1-w} \\ = \frac{1}{1-z}\frac{1}{1-w} \frac{1}{1-wz/(1-w)} \\ = \frac{1}{1-z} \frac{1}{1-w-wz}.$$
As we are only interested in the count we may drop the distinction between zeros and ones, getting
$$\frac{1}{1-z} \frac{1}{1-z-z^2} = \frac{2+z}{1-z-z^2} - \frac{1}{1-z}.$$
Extracting coefficients from this yields, in terms of Fibonacci numbers,
$$2F_{n+1} + F_n - 1 = F_{n+1} + F_{n+2} - 1 = F_{n+3} - 1.$$
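As a quick numerical illustration of this step (a sketch only; the helper name `fibonacci_series` is my own), note that $1/(1-z-z^2)$ has coefficients $F_{n+1}$ and that multiplying a power series by $1/(1-z)$ replaces its coefficients by their partial sums:

```python
from itertools import accumulate


def fibonacci_series(n_terms):
    """Coefficients of 1/(1 - z - z^2): F_1, F_2, ... with F_1 = F_2 = 1."""
    coeffs, a, b = [], 1, 1
    for _ in range(n_terms):
        coeffs.append(a)
        a, b = b, a + b
    return coeffs


# Multiplying by 1/(1 - z) turns the coefficients into their partial sums.
counts = list(accumulate(fibonacci_series(9)))
print(counts)  # [1, 2, 4, 7, 12, 20, 33, 54, 88]
```

The entry at index $n$ is the coefficient of $z^n$, so the last value printed is the count for $n=8$.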
We can confirm these results using the DFA method which yields
    > GFNC([[1,0,0]], 2,true);
    [[1, 0, 0]]
    Q[], 0, Q[]
    Q[], 1, Q[1]
    Q[1], 0, Q[1, 0]
    Q[1], 1, Q[1]
    Q[1, 0], 0, Q[1, 0, 0]
    Q[1, 0], 1, Q[1]
    Q[1, 0, 0], 0, Q[1, 0, 0]
    Q[1, 0, 0], 1, Q[1, 0, 0]

                      1
            --------------------
                      2
            (z - 1) (z  + z - 1)
This link includes an explanation of the Goulden-Jackson cluster method by @MarkusScheuer.
Using inclusion-exclusion we have for the location of the forbidden pattern when $n=8$ the possibilities $(1),(2),(3),\ldots,(6)$ and $(1,4),(2,5),(3,6)$ and $(1,5),(2,6)$ and $(1,6).$ We thus obtain
$$2^8 - 6\times 2^5 + 6\times 2^2 = 88.$$
We can generalize the inclusion-exclusion argument. Suppose we have $q$ instances of the pattern where $q\le\lfloor n/3\rfloor.$ This leaves $n-3q$ free slots that must be distributed in the $q+1$ spaces between / surrounding the patterns. By stars and bars this can be done in the following number of ways:
$${n-3q+q\choose q} = {n-2q\choose q}.$$
We thus obtain by inclusion-exclusion the closed form
$$\sum_{q=0}^{\lfloor n/3\rfloor} {n-2q\choose q} (-1)^q 2^{n-3q}.$$
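The closed form can be evaluated directly. A small Python sketch (the function name `closed_form` is my own; `math.comb` needs Python 3.8 or later):

```python
from math import comb


def closed_form(n):
    """Inclusion-exclusion count of length-n binary strings avoiding 100."""
    return sum(
        (-1) ** q * comb(n - 2 * q, q) * 2 ** (n - 3 * q)
        for q in range(n // 3 + 1)
    )


print([closed_form(n) for n in range(1, 9)])  # [2, 4, 7, 12, 20, 33, 54, 88]
```

For $n=8$ the three terms are $256$, $-192$ and $24$, matching the count above.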
We can evaluate this with the Egorychev method. Introduce
$${n-2q\choose q} = {n-2q\choose n-3q} = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n-3q+1}} (1+z)^{n-2q} \; dz.$$
Observe that this vanishes when $3q\gt n$ so we may extend the range of $q$ to infinity, getting for the sum
$$\frac{2^n}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} (1+z)^{n} \sum_{q\ge 0} \frac{z^{3q}}{(1+z)^{2q}} (-1)^q 2^{-3q} \; dz \\ = \frac{2^n}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} (1+z)^{n} \frac{1}{1+2^{-3}z^3/(1+z)^2} \; dz \\ = \frac{2^n}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} (1+z)^{n+2} \frac{1}{1+2z+z^2+2^{-3}z^3} \; dz.$$
Now put $z/(1+z) = w$ so that $z = w/(1-w)$ and $1+z = 1/(1-w)$ and $dz = 1/(1-w)^2 \; dw$ and
$$1+2z+z^2+2^{-3}z^3 = \frac{1}{8} \frac{(w-2)(w^2+2w-4)}{(1-w)^3}$$
which yields for the integral
$$\frac{2^n}{2\pi i} \int_{|w|=\gamma} \frac{1}{w^{n+1}} \frac{1}{1-w} \frac{8(1-w)^3}{(w-2)(w^2+2w-4)} \frac{1}{(1-w)^2} \; dw \\ = \frac{2^n}{2\pi i} \int_{|w|=\gamma} \frac{1}{w^{n+1}} \frac{8}{(w-2)(w^2+2w-4)} \; dw.$$
This is
$$2^n [w^n] \frac{8}{(w-2)(w^2+2w-4)} = [w^n] \frac{8}{(2w-2)(4w^2+4w-4)} \\ = [w^n] \frac{1}{(w-1)(w^2+w-1)} \\ = [w^n] \frac{1}{(1-w)(1-w-w^2)}.$$
This is the same generating function as what we obtained earlier and the argument is concluded.
Addendum. Wilf also succeeds here. We have the generating function
$$\sum_{n\ge 0} z^n 2^n \sum_{q=0}^{\lfloor n/3\rfloor} {n-2q\choose n-3q} (-1)^q 2^{-3q} = \sum_{q\ge 0} 2^{-3q} (-1)^q \sum_{n\ge 3q} z^n 2^n {n-2q\choose n-3q} \\ = \sum_{q\ge 0} 2^{-3q} (-1)^q \sum_{n\ge 0} z^{n+3q} 2^{n+3q} {n+q\choose n} = \sum_{q\ge 0} z^{3q} (-1)^q \sum_{n\ge 0} z^{n} 2^{n} {n+q\choose n} \\ = \sum_{q\ge 0} z^{3q} (-1)^q \frac{1}{(1-2z)^{q+1}} = \frac{1}{1-2z} \frac{1}{1+z^3/(1-2z)} \\ = \frac{1}{1-2z+z^3}.$$
Since $1-2z+z^3=(1-z)(1-z-z^2)$, this is the same generating function as before, done.

-
-
This is more interesting than it might at first appear because it is an example of a binomial sum where coefficient extraction of formal power series alone does not suffice for evaluation and we need a substitution in the integral including the differential. – Marko Riedel Aug 14 '16 at 21:20
-
Yes, it's really somewhat more challenging than I thought! Thanks for pointing to it. – Markus Scheuer Aug 14 '16 at 21:28
-
A nice technique is the so-called Goulden-Jackson Cluster Method which is a convenient method to derive a generating function for problems of this kind.
We consider words of length $n\geq 0$ built from an alphabet $$\mathcal{V}=\{0,1\}$$ and the set $\mathcal{B}=\{100\}$ of bad words which are not allowed to be part of the words we are looking for.
We derive a function $F(x)$ with the coefficient of $x^n$ being the number of wanted words of length $n$. According to the paper (p.7) the generating function $F(x)$ is \begin{align*} F(x)=\frac{1}{1-dx-\text{weight}(\mathcal{C})} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet and with the weight-numerator $\mathcal{C}$ with \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[100]) \end{align*}
We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[100])&=-x^3 \end{align*}
It follows:
A generating function $F(x)$ for the number of words built from $\{0,1\}$ which do not contain the subword $100$ is \begin{align*} F(x)&=\frac{1}{1-dx-\text{weight}(\mathcal{C})}\\ &=\frac{1}{1-2x+x^3}\\ &=1+2x+4x^2+7x^3+12x^4+20x^5\\ &\qquad+33x^6+54x^7+88x^8+143x^9+232x^{10}+\cdots\tag{1} \end{align*}
The last line (1) was calculated with Wolfram Alpha and we see the coefficient of $x^8$ is $88$.
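If Wolfram Alpha is not at hand, the expansion can be reproduced with a few lines of Python. This is only a sketch (the function name `expand` is my own): multiplying out $F(x)\,(1-2x+x^3)=1$ gives $a_0=1$ and $a_n=2a_{n-1}-a_{n-3}$ for $n\ge 1$, with coefficients of negative index taken as $0$.

```python
def expand(n_terms):
    """First n_terms coefficients of 1/(1 - 2x + x^3)."""
    a = []
    for n in range(n_terms):
        if n == 0:
            a.append(1)
        else:
            # a_n = 2*a_{n-1} - a_{n-3}; the second term is absent for n < 3.
            a.append(2 * a[n - 1] - (a[n - 3] if n >= 3 else 0))
    return a


print(expand(11))  # [1, 2, 4, 7, 12, 20, 33, 54, 88, 143, 232]
```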
We conclude: out of $2^8=256$ binary strings of length $8$ there are precisely $88$ words which do not contain the substring $100$.
Of course, we can also calculate the result by hand by expanding the generating function as geometric series and extracting the coefficient of $x^8$.
In order to do so it's convenient to use the coefficient-of operator $[x^j]$ to denote the coefficient of $x^j$ of a series.
We obtain \begin{align*} [x^8]\frac{1}{1-2x+x^3}&=[x^8]\sum_{n=0}^\infty(2x-x^3)^n\tag{2}\\ &=[x^8]\sum_{n=0}^\infty x^n\sum_{j=0}^n\binom{n}{j}(-x^2)^j2^{n-j}\tag{3}\\ &=\sum_{n=0}^8[x^{8-n}]\sum_{j=0}^n\binom{n}{j}(-1)^j2^{n-j}x^{2j}\tag{4}\\ &=\binom{4}{2}(-1)^22^{4-2}+\binom{6}{1}(-1)^12^{6-1}+\binom{8}{0}(-1)^02^{8-0}\tag{5}\\ &=6\cdot 4-6\cdot 32+1\cdot 256\\ &=88 \end{align*} and the claim follows.
Comment:
In (2) we expand the geometric series.
In (3) we factor out $x^n$ and expand the binomial using the formula $$(a-b)^n=\sum_{j=0}^n\binom{n}{j}(-b)^ja^{n-j}$$
In (4) we use the linearity of the coefficient-of operator and apply the formula $$[x^p]x^qA(x)=[x^{p-q}]A(x)$$ Since the exponent of $x^{8-n}$ must be non-negative, we restrict the upper limit of the sum to $8$.
In (5) we select the coefficients of $x^{8-n}$. Since $0\leq j\leq n$ and the exponent of $x^{2j}$ is even, we need only to consider $n\in\{4,6,8\}$.

-
(+1). Nice to see a presentation of the best method, which really completes the page. Why is it that you reversed the forbidden pattern? (Will delete comment if it is obvious. The word $0100$ contains the forbidden pattern as per the OP but it does not contain $001$, and yet it should be forbidden.) – Marko Riedel Aug 14 '16 at 21:10
-
Consider 4 mutually exclusive states a string can be in:
- (A) The string contains 100
- (B) The string ends in 10, but doesn't contain 100
- (C) The string ends in 1, but doesn't contain 100
- (D) None of the Above
Now consider a matrix representing transitions from the 4 states. For example, if a string is in state (B), and the next bit is a 0, then the next state of the string is (A). The transitions if the next bit is 0 are given by:
$$M_0 = \begin{array} {c|cccc} & A & B & C & D \\ \hline A & 1 & 0 & 0 & 0 \\ B & 1 & 0 & 0 & 0 \\ C & 0 & 1 & 0 & 0 \\ D & 0 & 0 & 0 & 1 \\ \end{array}$$
And the transitions if the next bit is a 1 :
$$M_1 = \begin{array} {c|cccc} & A & B & C & D \\ \hline A & 1 & 0 & 0 & 0 \\ B & 0 & 0 & 1 & 0 \\ C & 0 & 0 & 1 & 0 \\ D & 0 & 0 & 1 & 0 \\ \end{array}$$
And initially the string is empty, so it is in state (D):
$$V = \begin{array} {cccc} A & B & C & D \\ \hline 0 & 0 & 0 & 1 \end{array}$$
The number of strings of length $n$ in each state is given by:
$$V(M_0 + M_1)^n$$
So for example, the strings of length 8 will have states:
$$V(M_0 + M_1)^8 = \begin{array} {cccc} A & B & C & D \\ \hline 168 & 33 & 54 & 1 \end{array}$$
168 strings will contain 100, 33 will end in 10 but not contain 100, 54 will end in 1 but not contain 100, and there will be 1 more string (the string containing all zeroes). So there are $2^8 - 168 = 33 + 54 + 1 = 88$ strings not containing 100.
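The matrix computation can be reproduced in plain Python. This is a minimal sketch under the conventions above (rows are the current state, columns the next state, in the order A, B, C, D); the helper names `step` and `state_counts` are my own.

```python
# Transition matrices: rows = current state, columns = next state (A, B, C, D).
M0 = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]  # next bit is 0
M1 = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]]  # next bit is 1
M = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(M0, M1)]


def step(v, m):
    """Multiply the row vector v by the matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def state_counts(n):
    """Number of length-n strings in each of the states A, B, C, D."""
    v = [0, 0, 0, 1]  # the empty string is in state D
    for _ in range(n):
        v = step(v, M)
    return v


print(state_counts(8))  # [168, 33, 54, 1]
```

Repeated vector-matrix multiplication avoids forming $(M_0+M_1)^8$ explicitly; the last three entries sum to the number of strings not containing $100$.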

No formal mathematics, but I thought I'd post this because it may help you or someone else anyway.
This is a short function in the programming language Python that gives the number of `length`-bit binary strings not containing the binary substring `substring`.
    def notContaining(length, substring):
        """
        Gives the number of 'length'-bit strings not containing 'substring'.
        """
        n = 2**length  # number of 'length'-bit strings
        for i in range(0, 2**length):  # for every number having (at most) 'length' bits
            bits = format(i, '0{}b'.format(length))  # i as a zero-padded 'length'-bit binary string
            if str(substring) in bits:  # if the substring occurs in the i-th 'length'-bit binary string
                n -= 1
        return n
For `length = 8` and `substring = '100'` it returns `88`:

    >>> notContaining(8, '100')
    88
If anyone notices a mistake here, I'd be grateful to know.

Let $C_n$ be the number of bit strings of length $n$ that Contain $100$, and let $D_n$ be the number that Don't contain it. Clearly $C_n+D_n=2^n$ and $C_1=C_2=0$. For $n\ge3$, we have the following recursion:
$$C_n=\sum_{k=0}^{n-3}2^{n-3-k}D_k$$
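where we define $D_0=1$. The proof of the recursion is that if a string Contains $100$, then the first appearance of $100$ will be preceded by a string of some length $k\le n-3$ that Doesn't contain $100$ and followed by an additional $n-3-k$ bits that can be anything. (Conversely, because $100$ begins with $1$ and ends with $0$, appending it to a string that Doesn't contain $100$ cannot create an earlier appearance.)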
We can now dispose of the $C_n$'s, writing $C_n=2^n-D_n$, so that
$$D_n=2^n-\sum_{k=0}^{n-3}2^{n-3-k}D_k$$
When $n$ gets larger, this becomes a cumbersome way to proceed, but for small values it's fairly efficient. In particular,
$$D_8=2^8-(2^5D_0+2^4D_1+2^3D_2+2^2D_3+2D_4+D_5)$$
We already have $D_0=1$, $D_1=2$, and $D_2=4$, and it's easy to see that $D_3=7$, so it suffices to compute
$$D_4=2^4-(2D_0+D_1)=16-(2+2)=12$$ and $$D_5=2^5-(4D_0+2D_1+D_2)=32-(4+4+4)=20$$ so that
$$D_8=256-(32+16\cdot2+8\cdot4+4\cdot7+2\cdot12+20)=88$$
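For larger $n$ the bookkeeping is easily delegated to a computer. A minimal Python sketch of the recursion (the function name `count_without_100` is my own):

```python
def count_without_100(n_max):
    """Return [D_0, ..., D_n_max] from D_n = 2^n - sum_{k=0}^{n-3} 2^(n-3-k) D_k."""
    d = []
    for n in range(n_max + 1):
        # For n < 3 the range below is empty, so D_n = 2^n.
        containing = sum(2 ** (n - 3 - k) * d[k] for k in range(n - 2))
        d.append(2 ** n - containing)
    return d


print(count_without_100(8))  # [1, 2, 4, 7, 12, 20, 33, 54, 88]
```

Each $D_n$ needs a sum over all earlier values, so this takes on the order of $n^2$ arithmetic operations in total, which is fine for moderate $n$.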
