
I was trying to solve this problem: count the number of bit strings of length 8 with 3 consecutive zeros or 4 consecutive ones. I thought I could use binomial coefficients.

I figured since 3 consecutive 0s could be treated as one, I could find the number of strings via:

$C(6,1) + C(6,2) + C(6,3) + C(6,4) + C(6,5) + C(6,6)$

$$\frac{6!}{1!5!}+\frac{6!}{2!4!}+ \frac{6!}{3!3!}+ \frac{6!}{4!2!}+\frac{6!}{5!1!}+\frac{6!}{6!0!}$$ $$6 + 15+20+15+6+1 = 63$$

But I was expecting this to be 107?

  • Keep in mind that there are two choices for each element outside the block(s) of consecutive zeros or consecutive ones. Thus, there are $2^5$ bit strings of length $8$ that begin with $000$. – N. F. Taussig Mar 17 '19 at 11:07

2 Answers


Note that if three consecutive $0$'s are collapsed to one $0$, the resulting string is not always obtained from a unique string of length $8$. For instance \begin{align*} 00010101,\qquad 01000101,\qquad 01010001 \quad\rightarrow\quad 010101 \end{align*} which explains why $63$ does not account for all wanted strings.
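The non-uniqueness can be checked by brute force. The following is a minimal Python sketch, not part of the original answer; the helper name `collapse_first` and the convention of collapsing only the first occurrence of `000` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def collapse_first(s):
    """Replace the first occurrence of '000' in s by a single '0'."""
    return s.replace("000", "0", 1)


# Group all 8-bit strings containing '000' by their collapsed 6-bit image.
preimages = defaultdict(list)
for bits in product("01", repeat=8):
    s = "".join(bits)
    if "000" in s:
        preimages[collapse_first(s)].append(s)

num_images = len(preimages)                               # distinct collapsed strings
num_strings = sum(len(v) for v in preimages.values())     # strings containing '000'

print(num_images, num_strings)
print(preimages["010101"])
```

Under these assumptions the sketch finds $63$ distinct collapsed strings but $107$ strings of length $8$ containing $000$, so several collapsed strings have more than one preimage.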

The following answer is based upon the Goulden-Jackson Cluster Method. We consider the set of words of length $n\geq 0$ built from an alphabet $$\mathcal{V}=\{0,1\}$$ and the set $B=\{000,1111\}$ of bad words, which are not allowed to be part of the words we are looking for. We derive a generating function $f(s)$ with the coefficient of $s^n$ being the number of wanted words of length $n$.

According to the paper (p.7) the generating function $f(s)$ is \begin{align*} f(s)=\frac{1}{1-ds-\text{weight}(\mathcal{C})}\tag{1} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet and $\mathcal{C}$ the weight-numerator of bad words with \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[000])+\text{weight}(\mathcal{C}[1111])\tag{2} \end{align*}

We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[000])&=-s^3-(s+s^2)\text{weight}(\mathcal{C}[000])\\ \end{align*} and get \begin{align*} \text{weight}(\mathcal{C}[000])&=-\frac{s^3}{1+s+s^2}\\ &=-\frac{s^3(1-s)}{1-s^3}\\ \text{weight}(\mathcal{C}[1111])&=-\frac{s^4(1-s)}{1-s^4} \end{align*}

and get according to (2) \begin{align*} \text{weight}(\mathcal{C})&=\text{weight}(\mathcal{C}[000])+\text{weight}(\mathcal{C}[1111])\\ &=-\frac{s^3(1-s)}{1-s^3}-\frac{s^4(1-s)}{1-s^4} \end{align*}

It follows from (1)

\begin{align*} f(s)&=\frac{1}{1-ds-\text{weight}(\mathcal{C})}\\ &=\frac{1}{1-2s+\frac{s^3(1-s)}{1-s^3}+\frac{s^4(1-s)}{1-s^4}}\\ &=1 + 2 s + 4 s^2 + 7 s^3 + 12 s^4 + 21 s^5\\ &\qquad + 36 s^6 + 63 s^7 + \color{blue}{109} s^8 + 189 s^9 +\cdots \end{align*}

The last line was calculated with the help of Wolfram Alpha. The coefficient of $s^{8}$ shows there are $109$ words of length $8$ which contain neither $000$ nor $1111$. Since we want the number of words which contain $000$ or $1111$ (or both), we finally conclude the number of valid words is \begin{align*} 2^8-109=256-109=\color{blue}{147} \end{align*}
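The series expansion can also be reproduced without a computer algebra system. Below is a minimal Python sketch, not taken from the answer, which clears the denominators of $f(s)$ by multiplying numerator and denominator with $(1-s^3)(1-s^4)$ and then performs a power series division with integer coefficients. The helper names `poly_mul`, `poly_add` and `series_div` are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(*polys):
    """Add polynomials of possibly different lengths."""
    n = max(len(p) for p in polys)
    return [sum(p[i] for p in polys if i < len(p)) for i in range(n)]


def series_div(num, den, terms):
    """First `terms` coefficients of num/den as a power series; assumes den[0] == 1."""
    coeffs = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        for k in range(1, min(n, len(den) - 1) + 1):
            c -= den[k] * coeffs[n - k]
        coeffs.append(c)
    return coeffs


one_minus_s = [1, -1]
one_minus_2s = [1, -2]
one_minus_s3 = [1, 0, 0, -1]
one_minus_s4 = [1, 0, 0, 0, -1]
s3 = [0, 0, 0, 1]
s4 = [0, 0, 0, 0, 1]

# f(s) = num/den after multiplying through by (1 - s^3)(1 - s^4)
num = poly_mul(one_minus_s3, one_minus_s4)
den = poly_add(
    poly_mul(one_minus_2s, num),
    poly_mul(poly_mul(s3, one_minus_s), one_minus_s4),
    poly_mul(poly_mul(s4, one_minus_s), one_minus_s3),
)

coeffs = series_div(num, den, 10)
print(coeffs)
print(2**8 - coeffs[8])
```

The list `coeffs` holds the coefficients of $s^0,\dots,s^9$, so `coeffs[8]` is the number of words of length $8$ avoiding both bad words.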

The $\color{blue}{147}$ valid words are: $$ \begin{array}{cccccc} 00000000&00000001&00000010&00000011&00000100&00000101\\ 00000110&00000111&00001000&00001001&00001010&00001011\\ 00001100&00001101&00001110&00001111&00010000&00010001\\ 00010010&00010011&00010100&00010101&00010110&00010111\\ 00011000&00011001&00011010&00011011&00011100&00011101\\ 00011110&00011111&00100000&00100001&00100010&00100011\\ 00101000&00101111&00110000&00110001&00111000&00111100\\ 00111101&00111110&00111111&01000000&01000001&01000010\\ 01000011&01000100&01000101&01000110&01000111&01001000\\ 01001111&01010000&01010001&01011000&01011110&01011111\\ 01100000&01100001&01100010&01100011&01101000&01101111\\ 01110000&01110001&01111000&01111001&01111010&01111011\\ 01111100&01111101&01111110&01111111&10000000&10000001\\ 10000010&10000011&10000100&10000101&10000110&10000111\\ 10001000&10001001&10001010&10001011&10001100&10001101\\ 10001110&10001111&10010000&10010001&10011000&10011110\\ 10011111&10100000&10100001&10100010&10100011&10101000\\ 10101111&10110000&10110001&10111000&10111100&10111101\\ 10111110&10111111&11000000&11000001&11000010&11000011\\ 11000100&11000101&11000110&11000111&11001000&11001111\\ 11010000&11010001&11011000&11011110&11011111&11100000\\ 11100001&11100010&11100011&11101000&11101111&11110000\\ 11110001&11110010&11110011&11110100&11110101&11110110\\ 11110111&11111000&11111001&11111010&11111011&11111100\\ 11111101&11111110&11111111\\ \end{array} $$

Markus Scheuer

Here is a counting method that stays as close as possible to the one in the OP. We first treat only the strings with three consecutive zeros and aim for the claimed $107$. (The question is not stated precisely; I suppose this is what is meant.)

So let us be more precise in the counting.

For each bit string with at least one occurrence of $000$, called "good" for short in the sequel, we replace the first occurrence of $000$ by a red zero, and we regard it as $\color{red}0\sim 000$.

So a good string generates a $6$-bit string with exactly one red zero, subject to the following constraints:

  • $0\color{red}0$ is not a substring.
  • $000*\color{red}0$ is not a substring. (Here $*$ is a placeholder for a word with letters $0,1$.)

Then the sum in the OP corresponds to the good strings in which the first zero is colored red. But there are also further solutions! We collect them. In every further solution the red zero is not the first zero; since $0\color{red}0$ is forbidden, it must be preceded by a $1$, i.e. it is placed like this: $\color{blue}1\color{red}0\sim \color{blue}1000$. So this string must occur as a substring in all further solutions. We obtain the sum: $$ \begin{aligned} \sum_{1\le k\le 6}&\ (C(6,k)+(k-1)C(5,k)) \\ \qquad =&\ (C(6,1)+0\cdot C(5,1))\\ \qquad +&\ (C(6,2)+1\cdot C(5,2))\\ \qquad +&\ (C(6,3)+2\cdot C(5,3))\\ \qquad +&\ (C(6,4)+3\cdot C(5,4))\\ \qquad +&\ (C(6,5)+4\cdot C(5,5))\\ \qquad +&\ (C(6,6)+5\cdot C(5,6))\\ \qquad =&\ 63 + 49 = 112\ . \end{aligned} $$ Here $C(6,k)$ counts the strings with $k$ zeros in which the red zero is the first of the $k$ zeros. The term $(k-1)C(5,k)$ counts those in which one of the other $k-1$ zeros is red: we place the $k$ zeros among only $5$ positions, mark one of the $k-1$ zeros not in the first position red, and then insert a blue $1$ in front of it, which brings the length to $6$.

The above sum exceeds $107$ because we have neglected the second condition. The $5$ superfluous strings have the shape $*000*\color{blue}1\color{red}0*$ and length six, so exactly one star is a single letter and the other two are empty. The corresponding words are:

  • $0000\color{blue}1\color{red}0$,
  • $1000\color{blue}1\color{red}0$,
  • $0001\color{blue}1\color{red}0$,
  • $000\color{blue}1\color{red}00$,
  • $000\color{blue}1\color{red}01$.
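The arithmetic of the sum and its correction can be verified with a few lines of Python. This is only a sketch of the bookkeeping above, using the standard library function `math.comb`; the variable names are chosen here for illustration.

```python
from math import comb

# Red zero is the first of the k zeros: choose k zero positions among 6.
first_zero_red = sum(comb(6, k) for k in range(1, 7))

# Red zero is one of the other k-1 zeros, preceded by an inserted blue 1.
# math.comb(5, 6) is 0, so the k = 6 term vanishes.
later_zero_red = sum((k - 1) * comb(5, k) for k in range(1, 7))

uncorrected = first_zero_red + later_zero_red
superfluous = 5  # the five words listed above that violate the second condition
count_000 = uncorrected - superfluous

print(first_zero_red, later_zero_red, uncorrected, count_000)
```

The corrected value agrees with the $107$ expected in the question.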

With the same argument, there are $$ \sum_{1\le k\le 5}(C(5,k)+(k-1)C(4,k)) = 48 $$ possibilities for strings containing $1111$.
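As a hedged check of this value, the sum can be split into its two parts (the split is bookkeeping added here, not part of the original argument): $$ \begin{aligned} \sum_{1\le k\le 5} C(5,k) &= 5+10+10+5+1 = 31\ ,\\ \sum_{1\le k\le 5}(k-1)\,C(4,k) &= 0\cdot 4+1\cdot 6+2\cdot 4+3\cdot 1+4\cdot 0 = 17\ ,\\ 31+17 &= 48\ . \end{aligned} $$ No correction term appears to be needed in this case: a superfluous collapsed string would have to contain $1111$, a blue $0$ and a red $1$, which takes at least six letters, while the collapsed strings here have length $5$.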

The number of $8$-bit strings containing both $000$ and $1111$, i.e. strings of the shape $*000*1111*$ or of the reversed shape $*1111*000*$, is $8$. For the pattern $*000*1111*$ we have only the following four explicit possibilities, and reversing each of them gives the four strings of the other shape:

  • $00001111$,
  • $10001111$,
  • $00011111$,
  • $00011110$.
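The list can be confirmed by enumeration. The following Python sketch is an illustration added here, with hypothetical variable names; it collects all $8$-bit strings containing both patterns and separates those in which $000$ comes first.

```python
from itertools import product

# All 8-bit strings that contain both '000' and '1111'.
both = [
    s
    for s in ("".join(bits) for bits in product("01", repeat=8))
    if "000" in s and "1111" in s
]

# Strings where the first '000' starts before the first '1111'.
zeros_first = [s for s in both if s.find("000") < s.find("1111")]
ones_first = [s for s in both if s.find("1111") < s.find("000")]

print(len(both))
print(zeros_first)
print(ones_first)
```

The strings in `ones_first` are exactly the reversals of those in `zeros_first`.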

Putting everything together, there are $107+48-8 = 147$ possibilities for a bit string of length $8$ to contain either $000$ or $1111$ (or both) as a substring.


Computer check, here in Sage:

def match000(v):
    # True if the bit list v contains three consecutive zeros
    for k in range(len(v)-2):
        if (0, 0, 0) == (v[k], v[k+1], v[k+2]):
            return True
    return False

def match1111(v):
    # True if the bit list v contains four consecutive ones
    for k in range(len(v)-3):
        if (1, 1, 1, 1) == (v[k], v[k+1], v[k+2], v[k+3]):
            return True
    return False

count000 = 0
count1111 = 0
count = 0

for number in [0..2^8-1]:
    # binary digits of number, padded to 8 bits (digit order does not affect the counts)
    v = number.digits(2, padto=8)
    has000, has1111 = match000(v), match1111(v)
    if has000:
        count000 += 1
    if has1111:
        count1111 += 1
    if has000 and has1111:
        count += 1

print("Matches for 000  :: %s" % count000)
print("Matches for 1111 :: %s" % count1111)
print("Matches for both :: %s" % count)

which delivers

Matches for 000  :: 107
Matches for 1111 :: 48
Matches for both :: 8
dan_fulea