Consider the alphabet {0, 1}. How do I find the regular expression for the set of all strings such that each block of five consecutive symbols contains at least two 0's?

Here, by block I mean any substring of five consecutive symbols. For example, for the string "0011010", the first five characters are "00110", which satisfies the above condition. The next block drops the first character and takes the next five, giving "01101", which again satisfies the condition.

I also need help in converting the regular expression into an NFA.

Ramon Zarate

2 Answers

The states of your DFA are pairs of integers $(i,j)$ with $1 \le i \le j \le 5$, plus a $\mathrm{REJECT}$ state. Intuitively, state $(i,j)$ means that the most recent $0$ was the $i$-th to last character read and the second most recent $0$ was the $j$-th to last character read.

The transition function $\delta$ satisfies $\delta(\mathrm{REJECT}, x)=\mathrm{REJECT}$ for $x \in \{0,1\}$ while, for any other state $(i,j)$, $\delta$ is defined as follows: $$ \delta( (i,j), x) = \begin{cases} (1, i+1) & \mbox{if } x=0 \mbox{ and } i<5\\ \mathrm{REJECT} & \mbox{if } x=0 \mbox{ and } i=5\\ (i+1, j+1) & \mbox{if } x=1 \mbox{ and } j<5\\ \mathrm{REJECT} & \mbox{if } x=1 \mbox{ and }j=5. \end{cases} $$

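The initial state is $(1,1)$, which behaves as if two $0$s had been read just before the word starts, so no word of length less than $5$ is rejected. All states, except for $\mathrm{REJECT}$, are accepting states.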
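The construction can be sanity-checked by simulating it against a direct sliding-window test. The following is only a sketch: the names `delta`, `dfa_accepts`, `brute_force` and `mismatches` are made up for illustration, and states are encoded as Python tuples with the string `"REJECT"` standing for the reject state.

```python
from itertools import product

REJECT = "REJECT"  # illustrative encoding of the REJECT state

def delta(state, x):
    """Transition function as defined above; a state is a pair (i, j) or REJECT."""
    if state == REJECT:
        return REJECT
    i, j = state
    if x == "0":
        return (1, i + 1) if i < 5 else REJECT
    return (i + 1, j + 1) if j < 5 else REJECT

def dfa_accepts(word):
    """Run the DFA from the initial state (1, 1); every non-REJECT state accepts."""
    state = (1, 1)
    for x in word:
        state = delta(state, x)
    return state != REJECT

def brute_force(word):
    """Direct definition: every window of five symbols has at least two 0s."""
    return all(word[k:k + 5].count("0") >= 2 for k in range(len(word) - 4))

def mismatches(max_len):
    """All binary words up to max_len on which the DFA and the definition disagree."""
    return [w
            for n in range(max_len + 1)
            for w in map("".join, product("01", repeat=n))
            if dfa_accepts(w) != brute_force(w)]
```

Calling `mismatches(10)` compares the two on every binary word of length at most 10. In particular, `dfa_accepts("1111")` holds while `dfa_accepts("11111")` does not, which is the case raised in the comments.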

Steven
  • What would be the regular expression in such a case? – Ramon Zarate Mar 24 '21 at 05:52
  • There is an algorithm for converting DFAs to equivalent regular expressions. – Yuval Filmus Mar 24 '21 at 08:49
  • @YuvalFilmus I tried it, but I am not able to solve it for multiple accepting states. Thanks for suggesting! – Ramon Zarate Mar 24 '21 at 10:22
  • @Steven upon reviewing your DFA, I found 1111 to be going to the REJECT state! Why is that so? – Ramon Zarate Mar 24 '21 at 11:39
  • Oh, you are right about that. In general if you encounter 1111 in a word of length $\ge 5$ you can immediately reject since there is no way to have $2$ zeros in a window of $5$ symbols ending at the last $1$. However, if the word is exactly 1111 then you should still accept (since there are no blocks of 5 consecutive symbols). I'll edit my answer. – Steven Mar 24 '21 at 12:07

Let's try to understand the structure of words in your language.

Consider any word in the language whose length is at least $5$ (we can enumerate over shorter words separately). We can write it as $$ 1^{i_1} 0 1^{i_2} 0 \cdots 0 1^{i_m}, $$ where each $i_j \geq 0$. Our assumption on the length implies that $m \geq 2$, since otherwise the word is in $1^*$, so has to be shorter than $5$ letters. You can check that a word of this form with $m \geq 2$ is in the language iff $i_j + i_{j+1} \leq 3$ for all $1 \leq j < m$: a block of five symbols with at most one $0$ fits inside some $1^{i_j} 0 1^{i_{j+1}}$, which happens exactly when $i_j + i_{j+1} \geq 4$.
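If you want to convince yourself of this characterization, it is easy to test exhaustively on short words. The sketch below is only an illustration (the function names are not from any library): it reads off the exponents $i_1, \dots, i_m$ by splitting on `0` and compares the condition with the sliding-window definition.

```python
from itertools import product

def run_lengths(word):
    """Exponents i_1, ..., i_m: lengths of the runs of 1s between consecutive 0s."""
    return [len(run) for run in word.split("0")]

def block_condition(word):
    """m >= 2 and i_j + i_{j+1} <= 3 for every pair of adjacent runs."""
    runs = run_lengths(word)
    return len(runs) >= 2 and all(a + b <= 3 for a, b in zip(runs, runs[1:]))

def brute_force(word):
    """Direct definition: every window of five symbols has at least two 0s."""
    return all(word[k:k + 5].count("0") >= 2 for k in range(len(word) - 4))

def counterexamples(max_len):
    """Words of length 5..max_len on which the two tests disagree."""
    return [w
            for n in range(5, max_len + 1)
            for w in map("".join, product("01", repeat=n))
            if block_condition(w) != brute_force(w)]
```

For instance, `run_lengths("0011010")` gives `[0, 0, 2, 1, 0]`, and all adjacent sums are at most $3$.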

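Let $r$ be a regular expression for all words of the form above (but without the requirements that the length be at least $5$ and that $m \geq 2$) satisfying additionally the condition $i_j \leq 2$. A block $1^3$ forces both of its neighbouring blocks to be empty, so a word containing such blocks can be split at them: before the first one there is nothing, a single $0$, or a word of $r$ followed by $00$; between two consecutive ones there is $00$, $000$, or $00r00$; and the end is symmetric to the start. Then a regular expression for your language is $$ (0+1+\epsilon)^4 + r + (\epsilon+0+r00)\,111\,\bigl((00+000+00r00)\,111\bigr)^*(\epsilon+0+00r). $$ Let $s$ be a regular expression for all words of the form above satisfying additionally the condition $i_j \leq 1$. A regular expression for $r$ is $$ r = 11 + (110+\epsilon)s(0110s)^*(011+\epsilon). $$ Finally, a regular expression for $s$ is $$ s = (0+10)^*(1+\epsilon). $$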
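The inner expressions $s$ and $r$ can be checked mechanically against their definitions. The sketch below assumes a direct translation into Python's `re` syntax (`+` becomes `|`, and an optional $\epsilon$ alternative becomes `?`); the names `S`, `R`, `s_def`, `r_def` and `disagreements` are illustrative only.

```python
import re
from itertools import product

# s = (0+10)^*(1+eps) and r = 11 + (110+eps) s (0110 s)^* (011+eps), in re syntax.
S = r"(?:0|10)*1?"
R = rf"(?:11|(?:110)?{S}(?:0110{S})*(?:011)?)"

def run_lengths(word):
    """Exponents i_1, ..., i_m: lengths of the runs of 1s between consecutive 0s."""
    return [len(run) for run in word.split("0")]

def adjacent_ok(runs):
    """i_j + i_{j+1} <= 3 for every pair of adjacent runs."""
    return all(a + b <= 3 for a, b in zip(runs, runs[1:]))

def s_def(word):
    """Definition of s: the block condition with every i_j <= 1."""
    runs = run_lengths(word)
    return max(runs) <= 1 and adjacent_ok(runs)

def r_def(word):
    """Definition of r: the block condition with every i_j <= 2."""
    runs = run_lengths(word)
    return max(runs) <= 2 and adjacent_ok(runs)

def disagreements(pattern, predicate, max_len):
    """Binary words up to max_len where the pattern and the predicate differ."""
    return [w
            for n in range(max_len + 1)
            for w in map("".join, product("01", repeat=n))
            if (re.fullmatch(pattern, w) is not None) != predicate(w)]
```

Running `disagreements(S, s_def, 10)` and `disagreements(R, r_def, 10)` compares each expression with its definition on all words of length at most 10. The same helper can be pointed at a translation of the top-level expression, with the five-symbol window test as the predicate.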

Yuval Filmus