Regular Expression and NFA: each block of five consecutive symbols contains at least two 0's

Question

Consider the alphabet {0, 1}. How do I find the regular expression for the set of all strings such that each block of five consecutive symbols contains at least two 0's?

Here, by block I mean any five consecutive symbols of the string, that is, a contiguous substring of length 5. For example, in the string "0011010", the first five characters are "00110", which satisfies the above condition. The next block drops the first character and takes the following five, giving "01101", which again satisfies the condition.
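To make the condition concrete, here is a minimal sketch of a direct membership test that slides a window of five symbols over the string. The function name `in_language` and its parameters are only illustrative choices, not part of the question.

```python
def in_language(w: str, k: int = 5, need: int = 2) -> bool:
    """True if every block of k consecutive symbols of w has at least `need` 0's.

    Strings shorter than k have no such block, so they pass vacuously.
    """
    return all(w[i:i + k].count("0") >= need for i in range(len(w) - k + 1))


print(in_language("0011010"))  # blocks: 00110, 01101, 11010
print(in_language("1111"))     # shorter than 5, so there is no block to check
print(in_language("10111"))    # its single block has only one 0
```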

I also need help in converting the regular expression into an NFA.

https://cs.stackexchange.com/q/1331/755 – D.W. Mar 24 '21 at 17:54
https://cs.stackexchange.com/q/137101/755 – D.W. Mar 25 '21 at 19:09

Steven · Answer 1 · 2021-03-24T15:40:57.463

score 2

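The states of your DFA are pairs of integers $(i,j)$ with $1 \le i \le j \le 5$, plus a $\mathrm{REJECT}$ state. Intuitively, state $(i,j)$ means that the most recent $0$ was the $i$-th to last character read and the second most recent $0$ was the $j$-th to last. Before two $0$s have been read, the missing $0$s are treated as if they sat immediately before the start of the word, so states with $i = j$ occur only before the first $0$ is read.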

The transition function $\delta$ satisfies $\delta(\mathrm{REJECT}, x)=\mathrm{REJECT}$ for $x \in \{0,1\}$ while, for any other state $(i,j)$, $\delta$ is defined as follows: $$ \delta( (i,j), x) = \begin{cases} (1, i+1) & \mbox{if } x=0 \mbox{ and } i<5\\ \mathrm{REJECT} & \mbox{if } x=0 \mbox{ and } i=5\\ (i+1, j+1) & \mbox{if } x=1 \mbox{ and } j<5\\ \mathrm{REJECT} & \mbox{if } x=1 \mbox{ and }j=5. \end{cases} $$

The initial state is $(1,1)$. All states, except for $\mathrm{REJECT}$, are accepting states.
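As a sanity check, the DFA can be simulated and compared with the definition on all short words. The following is a sketch only: the names `delta`, `dfa_accepts`, `in_language` and `agrees_up_to` are illustrative, and the bound of 12 is an arbitrary choice.

```python
from itertools import product

REJECT = "REJECT"
START = (1, 1)


def delta(state, x):
    """Transition function as defined above; states are (i, j) pairs or REJECT."""
    if state == REJECT:
        return REJECT
    i, j = state
    if x == "0":
        return (1, i + 1) if i < 5 else REJECT
    return (i + 1, j + 1) if j < 5 else REJECT


def dfa_accepts(w):
    """Run the DFA on w; every state except REJECT is accepting."""
    state = START
    for x in w:
        state = delta(state, x)
    return state != REJECT


def in_language(w):
    """Direct definition: every block of five symbols has at least two 0's."""
    return all(w[k:k + 5].count("0") >= 2 for k in range(len(w) - 4))


def agrees_up_to(max_len):
    """Compare the DFA with the definition on all words of length <= max_len."""
    return all(
        dfa_accepts(w) == in_language(w)
        for n in range(max_len + 1)
        for w in map("".join, product("01", repeat=n))
    )


print(agrees_up_to(12))
```

There are 15 pairs $(i,j)$ with $1 \le i \le j \le 5$, so the automaton has 16 states including $\mathrm{REJECT}$. Since every DFA is also an NFA, it can serve directly as the NFA asked for in the question.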

edited Mar 24 '21 at 15:40

answered Mar 23 '21 at 17:33


What would be the regular expression in such a case? – Ramon Zarate Mar 24 '21 at 05:52
There is an algorithm for converting DFAs to equivalent regular expressions. – Yuval Filmus Mar 24 '21 at 08:49
@YuvalFilmus I tried it, but I am not able to solve it for multiple accepting states. Thanks for suggesting! – Ramon Zarate Mar 24 '21 at 10:22
@Steven upon reviewing your DFA, I found 1111 to be going to the REJECT state! Why is that so? – Ramon Zarate Mar 24 '21 at 11:39
Oh, you are right about that. In general if you encounter 1111 in a word of length $\ge 5$ you can immediately reject since there is no way to have $2$ zeros in a window of $5$ symbols ending at the last $1$. However, if the word is exactly 1111 then you should still accept (since there are no blocks of 5 consecutive symbols). I'll edit my answer. – Steven Mar 24 '21 at 12:07

score 1 · Accepted Answer · answered Mar 24 '21 at 09:36

Let's try to understand the structure of words in your language.

Consider any word in the language whose length is at least $5$ (shorter words can be enumerated separately). We can write it as $$ 1^{i_1} 0 1^{i_2} 0 \cdots 0 1^{i_m}, $$ where each $i_j \geq 0$. Our assumption on the length implies that $m \geq 2$, since otherwise the word is in $1^*$ and so has to be shorter than $5$ letters. You can check that a word of this form is in the language iff $i_j + i_{j+1} \leq 3$ for all $1 \leq j < m$.
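The run-length condition can be checked against the definition by brute force. This is a minimal sketch: `runs_ok`, `in_language` and `check` are illustrative names, and splitting on `"0"` is just one convenient way to read off $i_1, \dots, i_m$.

```python
from itertools import product


def in_language(w):
    """Direct definition: every block of five symbols has at least two 0's."""
    return all(w[k:k + 5].count("0") >= 2 for k in range(len(w) - 4))


def runs_ok(w):
    """Check i_j + i_{j+1} <= 3 for all adjacent runs of 1's.

    w.split("0") yields the maximal runs of 1's, including empty ones,
    so a word with z zeros gives m = z + 1 runs.
    """
    runs = [len(block) for block in w.split("0")]
    return all(a + b <= 3 for a, b in zip(runs, runs[1:]))


def check(max_len):
    """Compare both tests on all words of length 5..max_len."""
    for n in range(5, max_len + 1):
        for w in map("".join, product("01", repeat=n)):
            if "0" in w:            # m >= 2: the condition should be exact
                if runs_ok(w) != in_language(w):
                    return False
            elif in_language(w):    # m = 1: all 1's, must not be in the language
                return False
    return True


print(check(12))
```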

Let $r$ be a regular expression for all words of the form above (but without the requirements that the length be at least $5$ and that $m \geq 2$) satisfying additionally the condition $i_j \leq 2$. Then a regular expression for your language is $$ (0+1+\epsilon)^4 + (11100+\epsilon)r(0011100r)^*(00111+\epsilon). $$ Let $s$ be a regular expression for all words of the form above satisfying additionally the condition $i_j \leq 1$. A regular expression for $r$ is $$ r = 11 + (110+\epsilon)s(0110s)^*(011+\epsilon). $$ Finally, a regular expression for $s$ is $$ s = (0+10)^*(1+\epsilon). $$
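An expression like this is easy to test by translating it to Python's `re` syntax and comparing it with the definition on all short words. The sketch below does that for the expression exactly as written above. In that form it does not match $01110$, whose only block has two $0$s, so the outer expression appears to need more cases. The pattern `CANDIDATE` is a hypothetical repair and an assumption of this sketch, not part of the answer: it lets a $111$ run sit next to a single leading or trailing $0$, and lets two $111$ runs be separated by just $00$ or $000$. All names here are illustrative.

```python
import re
from itertools import product

# s: no two adjacent 1's.  r: runs of 1's of length <= 2, never two 2-runs in a row.
S = r"(?:0|10)*1?"
R = rf"(?:11|(?:110)?{S}(?:0110{S})*(?:011)?)"

# The expression as written above; (0+1+eps)^4 becomes [01]{0,4}.
AS_WRITTEN = re.compile(rf"[01]{{0,4}}|(?:11100)?{R}(?:0011100{R})*(?:00111)?")

# Hypothetical repair (an assumption of this sketch, not from the answer).
CANDIDATE = re.compile(
    rf"[01]{{0,4}}|{R}"
    rf"|(?:0|{R}00)?111(?:(?:00|000|00{R}00)111)*(?:0|00{R})?"
)


def in_language(w):
    """Direct definition: every block of five symbols has at least two 0's."""
    return all(w[k:k + 5].count("0") >= 2 for k in range(len(w) - 4))


def mismatches(pattern, max_len):
    """Words of length <= max_len on which `pattern` and the definition disagree."""
    return [
        w
        for n in range(max_len + 1)
        for w in map("".join, product("01", repeat=n))
        if bool(pattern.fullmatch(w)) != in_language(w)
    ]


print(mismatches(AS_WRITTEN, 5))   # includes 01110
print(mismatches(CANDIDATE, 10))   # words where the repair still disagrees, if any
```

A check of this kind only covers words up to the chosen length, so it can expose errors but does not prove an expression correct.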

There are algorithms for converting regular expressions to NFAs. But it's much easier to take the DFA in the other answer. — Yuval Filmus, Mar 24 '21 at 10:28
another question. According to the answer mentioned above, the strings 11, 111 and 1111 are supposed to be accepted, but that's not the case in your regular expression. Why is that the case? — Ramon Zarate, Mar 24 '21 at 10:50
