I have a regular language consisting of those sequences in {0,1}^* in which every contiguous subsequence of length 5 has at least two 0's in it (which also means every sequence of length <5 is 'good'). I need to find a DFA which accepts this language.

I can't say I tried everything, but each time I ran into the same problem: the automaton always ended up accepting a subsequence of the form 11111... (I tried incorporating each subsequence of length 5 with two 0's into my DFA so that the last element returned to the beginning of the automaton, but this resulted in accepting 'bad' sequences. For instance, 00111 and 11100 were 'good', but 0011111100 wasn't.)

Is there any way to do it?

Raphael
Jules
  • Hint: use the states of the automaton to "remember" the last five input characters you saw. – David Richerby Mar 01 '15 at 20:07
  • @DavidRicherby Actually the states need to remember only the last four characters you saw. The fifth character will decide just to go on or to fail. – babou Mar 01 '15 at 20:16
  • @babou Good point. But computer science wouldn't be computer science without off-by-one errors. :-) – David Richerby Mar 01 '15 at 20:23
  • Off by 1 is not an error, only a typo. This exercise seems a bit smarter than the usual ones, with its sliding memory. That is a change. – babou Mar 01 '15 at 20:27
  • Hint: write the language as regular expression, transform. – Raphael Mar 02 '15 at 00:01
  • @Raphael I did not get it – babou Mar 02 '15 at 10:02
  • @babou Well, the hint may not point towards the easiest way (which is a counting automaton, a usual pattern). But if the OP has trouble with automata but is fluent in regular expressions, it's a valid approach. – Raphael Mar 02 '15 at 11:27
  • I still have absolutely no idea how to do it - I tried once again, but my solution didn't take 1111 into consideration... – Jules Mar 03 '15 at 05:58
  • @Jules My questions to you: (1) Can you draw a DFA where each subsequence of length 5 contains at least one 0? If not, try that first; if so, (2) tell me the ten smallest strings that belong to the language in your question. – Grijesh Chauhan Mar 03 '15 at 12:48
  • I'm trying, I'm trying – Jules Mar 03 '15 at 16:40
  • @Jules OK, I will post an answer to demonstrate how to solve such problems. – Grijesh Chauhan Mar 04 '15 at 06:33
  • I'd be very grateful. I feel as though I was still missing something... (namely the proper solution) – Jules Mar 04 '15 at 16:47
  • Thanks for the upvote, Jules. Would it be okay with you if I edited your question so that it matched the example I gave? Doing so would make life a touch easier for subsequent visitors with the same question. If you want to keep your post unchanged, that'll be fine too. – Rick Decker Jun 20 '15 at 14:30
  • Sure, my post is yours to edit ;) – Jules Jun 20 '15 at 18:22

1 Answer

There is a standard technique for problems like this, where the language can be specified in the form

All words over some alphabet $\Sigma$ where [some condition] is satisfied by all contiguous substrings of a fixed length, $k$.

The key idea here is to define a collection of states $s_i$ where $0\le i < |\Sigma|^k$, one for each possible string of length $k$. Basically, you are using the states to "remember" the most recently-seen $k$ characters. These states will be part of the finite automaton we'll build for our language.
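Before building any automaton, it can help to have a brute-force membership test for languages of this form, so that a candidate FA can be checked against it. The following is only a sketch; the names `in_language` and `at_least_two_zeros` are mine, and the sample words are the ones from your question (window length 5).

```python
def in_language(word, k, condition):
    """True when every contiguous length-k substring of word satisfies condition."""
    # For words shorter than k the range is empty, so the word is accepted.
    return all(condition(word[i:i + k]) for i in range(len(word) - k + 1))


def at_least_two_zeros(window):
    return window.count("0") >= 2


print(in_language("00111", 5, at_least_two_zeros))       # True
print(in_language("0011111100", 5, at_least_two_zeros))  # False (window 01111)
print(in_language("1111", 5, at_least_two_zeros))        # True (shorter than 5)
```

This checker looks at the whole word at once, which an FA cannot do; the construction here replaces that with a fixed amount of memory.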

In what follows, I'll use a simpler language than yours, so that the FA will be sufficiently small to fit here and still be readable. Let $$ L=\{w\in\{0, 1\}^*\mid \text{every length-3 substring of $w$ contains at least 2 zeros}\} $$ We'll define states $s_0, s_1, \dotsc, s_7$ to represent the three most recently-seen characters. It's convenient to do this representation in lexicographic order, so $s_0=\mathtt{000}$, $s_1=\mathtt{001}$, $s_2=\mathtt{010}$, $s_3=\mathtt{011}$, $s_4=\mathtt{100}$, $s_5=\mathtt{101}$, $s_6=\mathtt{110}$, and $s_7=\mathtt{111}$. In other words, $s_N$ will correspond to the 3-bit binary representation of $N$.

Now we'll add some more states to get from the start state to the $s$'s. All it takes is to construct a complete tree (binary in this example) having the $s$ states as leaves, like this:

[Figure: a complete binary tree whose internal nodes are the states $p_i$ and whose leaves are the states $s_0, \dotsc, s_7$]

All of the states in the FA above will be accepting states, except for the darkened ones. The $p_i$ states are all accepting, since we haven't yet seen a length-3 string and so can't have violated the condition that defines the language $L$. Among the $s$ states, only the patterns $\mathtt{011}$, $\mathtt{101}$, $\mathtt{110}$, and $\mathtt{111}$ violate the condition; these correspond to states $s_3, s_5, s_6, s_7$. Once we've entered one of these states, we reject the input word, so we might as well merge these into a single "dead" state, $d$, from which there will be no exit.
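The list of violating patterns can be checked mechanically. Here is a small sketch for the length-3 example; the names `K`, `MIN_ZEROS`, `patterns`, and `bad_states` are illustrative and not part of the construction itself.

```python
from itertools import product

K = 3          # window length in the example language L
MIN_ZEROS = 2  # every window must contain at least this many zeros

# Enumerate the 3-bit patterns in lexicographic order, so the list index
# of a pattern is exactly the N of the state s_N.
patterns = ["".join(bits) for bits in product("01", repeat=K)]

# A state is bad when its pattern has fewer than MIN_ZEROS zeros.
bad_states = [n for n, p in enumerate(patterns) if p.count("0") < MIN_ZEROS]

print(bad_states)                         # [3, 5, 6, 7]
print([patterns[n] for n in bad_states])  # ['011', '101', '110', '111']
```

Changing `K` to 5 gives the corresponding list for the language in your question.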

We're almost done; all that remains is to fill in the transitions among the $s_i$. Because of the way we chose the representations of the states, that's an easy task. If we're in state $s_i$, having just seen characters $b_1b_2b_3$, on input $b$ we'll pass to the state $s_j$, corresponding to the new pattern $b_2b_3b$. If you think about it for a moment, you'll see that $\delta(s_i, 0) = s_j$ where $j\equiv 2i\pmod8$ and $\delta(s_i, 1) = s_j$ where $j\equiv 2i+1\pmod8$. Thus, we'll have the following $s$ transitions (recall that we merged states $s_3, s_5, s_6, s_7$ into a single dead state, $d$): $$\begin{array}{c|cc} & 0 & 1 \\ \hline s_0 & s_0 & s_1 \\ s_1 & s_2 & d \\ s_2 & s_4 & d \\ s_4 & s_0 & s_1 \\ d & d & d \end{array}$$

completing the construction. The nice feature of this technique is that it's almost completely mechanical: the only hard part is making the $s$ transitions. The downside is that it produces a pretty big FA. Even in this simple example, we wound up with a 12-state FA (seven $p$ states, the four surviving $s$ states, and $d$), and if we had needed to look at the five most recent characters, we'd have an FA with 31 states just for the $p$'s. In fact, if we apply the standard technique for DFA minimization, we find that the minimal-state DFA for this example language requires only 7 states, with 6 of them being final.
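The state counts can be double-checked with a short program. The sketch below rests on a few assumptions of mine: a state is named by the string of characters it remembers (so the $p$ states are the strings shorter than 3, and $s_1$ is `"001"`), `"d"` is the dead state, and the minimization step is plain partition refinement (Moore's algorithm), which is one common choice rather than necessarily the one you were taught. Dropping the oldest character and appending the new one is the string form of $j\equiv 2i+b\pmod 8$.

```python
def build_dfa(k=3, min_zeros=2):
    """Tree-plus-window DFA; a state is the string of the last <= k characters."""
    dead = "d"
    states = [""]                   # start state: nothing read yet
    delta = {}
    for q in states:                # the list grows while we walk it
        for b in "01":
            nxt = (q + b)[-k:]      # drop the oldest character, append b
            if len(nxt) == k and nxt.count("0") < min_zeros:
                nxt = dead          # this window violates the condition
            delta[q, b] = nxt
            if nxt != dead and nxt not in states:
                states.append(nxt)
    states.append(dead)
    delta[dead, "0"] = delta[dead, "1"] = dead
    return states, delta, set(states) - {dead}


def minimize(states, delta, accepting):
    """Moore-style refinement; assumes every state is reachable (true here)."""
    block = {q: int(q in accepting) for q in states}
    while True:
        # Two states stay together only if they agree on their own block
        # and on the blocks of both successors.
        sig = {q: (block[q], block[delta[q, "0"]], block[delta[q, "1"]])
               for q in states}
        ids = {}
        new_block = {q: ids.setdefault(sig[q], len(ids)) for q in states}
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block        # no block was split: partition is stable
        block = new_block


states, delta, accepting = build_dfa()
block = minimize(states, delta, accepting)

print(len(states))                         # 12
print(len(set(block.values())))            # 7
print(len({block[q] for q in accepting}))  # 6
```

Calling `build_dfa(k=5)` applies the same construction to the language in your question.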

Rick Decker