
I have been following the book Introduction to Automata Theory, Languages, and Computation by John E. Hopcroft and Jeffrey D. Ullman.

I came across the section titled A Bad Case for the Subset Construction (2.3.6). I cannot follow the example given there, about an NFA $N$ that accepts exactly the strings whose $n$-th symbol from the end is 1, and the claim that no DFA equivalent to that NFA has fewer than $2^n$ states.

They argue that the DFA $D$ must be able to remember the last $n$ symbols it has read. Since any of the $2^n$ subsets of the last $n$ symbols could have been 1, if $D$ had fewer than $2^n$ states, then there would be some state $q$ such that $D$ can be in state $q$ after reading two different sequences of $n$ bits, say $a_1a_2\dots a_n$ and $b_1b_2\dots b_n$.

Here is an extract from the book itself:

A Bad Case for the Subset Construction

I have been trying to comprehend the proof, including this line and the subsequent paragraph that follows, but I have not been able to.

Can someone please explain the approach?

nerdier.js
  • What specifically are you confused about? What's the first part that you find confusing? When you mention "this line", which line are you referring to? We prefer you ask a question about a specific aspect of the writing, not just ask "can you explain the whole thing to me?" -- if you didn't understand the explanation in the book, I worry we could write up another long explanation and you might just say "sorry, I didn't understand that either". The more you give us to work with, the more likely we can give you useful answers. – D.W. Mar 23 '16 at 22:39

2 Answers


The proper way to do this proof is using Myhill–Nerode theory, which I will now explain. For a regular language $L$, say that $x,y$ are inequivalent if there exists a word $z$ such that $xz \in L$ and $yz \notin L$, or such that $xz \notin L$ and $yz \in L$. In every DFA accepting $L$, the state that is reached after reading $x$ is different from the state that is reached after reading $y$ (exercise). Hence, if we find a collection $x_1,\ldots,x_M$ of words such that $x_i,x_j$ are inequivalent for all $i \neq j$, then every DFA for $L$ must contain at least $M$ states (exercise).

For your language $L = \Sigma^* 1 \Sigma^{n-1}$, you can take as your collection of words the set of all $2^n$ words of length $n$. If $x,y$ are two different words of length $n$ then $x_i \neq y_i$ for some $i$; say $x_i = 0$ and $y_i = 1$. Then $x 0^{i-1} \notin L$ (since the $n$th from last bit is $x_i = 0$), whereas $y 0^{i-1} \in L$ (since the $n$th from last bit is $y_i = 1$). This completes the proof.
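The argument above can be checked by brute force for small $n$. The following is only a sketch: the helper names `in_language`, `distinguishing_suffix`, and `all_pairs_distinguished` are made up for illustration, and the alphabet is assumed to be $\{0,1\}$.

```python
from itertools import product


def in_language(word: str, n: int) -> bool:
    """True if the n-th symbol from the end of `word` is '1'."""
    return len(word) >= n and word[-n] == "1"


def distinguishing_suffix(x: str, y: str) -> str:
    """Return 0^(i-1), where i is the first 1-based position where x and y differ.

    Assumes x and y have the same length and are not equal.
    """
    i = next(k for k, (a, b) in enumerate(zip(x, y), start=1) if a != b)
    return "0" * (i - 1)


def all_pairs_distinguished(n: int) -> bool:
    """Check that every two distinct words of length n are inequivalent."""
    words = ["".join(bits) for bits in product("01", repeat=n)]
    for x in words:
        for y in words:
            if x != y:
                z = distinguishing_suffix(x, y)
                # After appending z, position i becomes the n-th symbol from the end,
                # so exactly one of xz, yz should be in the language.
                if in_language(x + z, n) == in_language(y + z, n):
                    return False
    return True
```

Calling `all_pairs_distinguished(n)` for a few small values of `n` should return `True`, which matches the claim that the $2^n$ words of length $n$ are pairwise inequivalent.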

Yuval Filmus
  • Thanks for this precise answer. The book writes so much because it does not use the term Myhill-Nerode anywhere in the entire text. Even in Section 4.4, Equivalence and Minimization of Automata, they use the term table-filling algorithm but simply avoid the term Myhill-Nerode. Theorem 4.20: If two states are not distinguished by the table-filling algorithm, then the states are equivalent. This is what they say... – Abhishek Ghosh Mar 11 '21 at 13:28

The text is arguing as Yuval does; they're just hiding any mention of Myhill-Nerode. Here's an unpacking of their argument:

There are clearly $2^n$ possible length-$n$ input strings over $\{0,1\}$. Assume for the sake of argument that there is a DFA $D$ that accepts the "$n$-th from the end is $1$" language and that $D$ has $m<2^n$ states. We don't have enough states to assign a unique one to every possible length-$n$ input, so by the pigeonhole principle there must be some state $q$ where we wind up after reading two different strings $a_1a_2\dotsm a_n$ and $b_1b_2\dotsm b_n$.

Since $a_1a_2\dotsm a_n\ne b_1b_2\dotsm b_n$, they must differ in at least one place, say $a_i\ne b_i$. Suppose without loss of generality that $a_i=1$ and $b_i=0$.

If $i=1$, then since $1a_2a_3\dotsm a_n$ has its $n$-th character from the end equal to $1$, state $q$ must be an accept state. But since $0b_2b_3\dotsm b_n$ has its $n$-th character from the end not equal to $1$, state $q$ must be a non-accept state, a contradiction.

If $i=2$, we're in state $q$ after reading $a_11a_3\dotsm a_n$ and also after reading $b_10b_3\dotsm b_n$. Suppose that from state $q$ on input $0$ we pass to state $p$. Then we'll be in state $p$ after reading $a_11a_3\dotsm a_n0$ and also after reading $b_10b_3\dotsm b_n0$. The string $a_11a_3\dotsm a_n0$ has its $n$-th character from the end equal to $1$, so $p$ must be an accept state, but $b_10b_3\dotsm b_n0$ does not, so $p$ must not be an accept state, again a contradiction.

Continuing, it's easy to see that no matter where the two inputs differ, by appending $i-1$ zeros (or any other $i-1$ symbols) to the two strings we'll find ourselves in a state which must be both accepting and non-accepting. So our original assumption, that $D$ has fewer than $2^n$ states, must have been false, which is just what we wanted.
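To see the matching upper bound concretely, here is a rough sketch that runs the subset construction on one NFA for this language and counts the reachable subsets. The NFA layout is an assumption chosen for illustration: state $0$ loops on both symbols and moves to state $1$ on a $1$, states $1,\dots,n-1$ advance on any symbol, and state $n$ is the accepting state with no outgoing transitions. The function names are hypothetical.

```python
def nfa_step(states: frozenset, symbol: str, n: int) -> frozenset:
    """One step of the assumed NFA on a set of current states."""
    nxt = set()
    for q in states:
        if q == 0:
            nxt.add(0)            # state 0 loops on every symbol
            if symbol == "1":
                nxt.add(1)        # guess that this 1 is the n-th symbol from the end
        elif q < n:
            nxt.add(q + 1)        # states 1..n-1 advance on any symbol
        # state n has no outgoing transitions
    return frozenset(nxt)


def reachable_subsets(n: int) -> set:
    """Subset construction: collect every subset reachable from {0}."""
    start = frozenset({0})
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for symbol in "01":
            nxt = nfa_step(current, symbol, n)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen
```

For small `n`, `len(reachable_subsets(n))` comes out to `2 ** n`: each reachable subset contains state $0$ together with some subset of $\{1,\dots,n\}$ recording which of the last $n$ symbols were $1$. Combined with the lower bound above, this suggests the subset construction is already optimal for this language.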

Rick Decker