
Let's say our alphabet is 0 and 1.

How would you approach writing a regex that generates the language of words that do NOT contain 101?

The only regex operators allowed are concatenation, star (*), and OR (|).

I am stuck now and I want to know if anybody has an idea.

yoyo_fun
  • Make a finite state automaton for the language, and from that derive an expression. – Hendrik Jan Nov 26 '16 at 16:36
  • @HendrikJan Isn't there a simpler line of reasoning that could be applied? I am supposed to do this without knowing what a finite state automaton is. We have not reached that part yet. – yoyo_fun Nov 26 '16 at 16:44
  • This is already answered at our reference question on the subject, http://cs.stackexchange.com/q/1331/755. See specifically http://cs.stackexchange.com/a/44075/755, http://cs.stackexchange.com/a/11051/755, http://cs.stackexchange.com/a/10984/755. Also, very similar questions have been asked and answered before: http://cs.stackexchange.com/q/29933/755, http://cs.stackexchange.com/q/49047/755, http://cs.stackexchange.com/q/44181/755, http://cs.stackexchange.com/q/16735/755, http://cs.stackexchange.com/q/14837/755, http://cs.stackexchange.com/q/46988/755. Please search before asking. – D.W. Nov 26 '16 at 18:49

1 Answer

If a string does not contain $101$, then every occurrence of $10$ either ends the string or is followed by $0$. So if $r$ is a regular expression for all strings not containing $10$, then a regular expression for all strings not containing $101$ is $$ (r100)^*r(10+\epsilon), $$ where $+$ denotes union (the OR $|$ of the question) and $\epsilon$ is the empty string. You take it from here. Note that there are also other solutions.
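As a sanity check, the expression can be tested by brute force. The sketch below assumes $r = 0^*1^*$ (one possible expression for strings not containing $10$; this choice is not stated in the answer) and writes $(10+\epsilon)$ as an optional group. The helper name `matches_exactly_no_101` is made up for illustration.

```python
import re
from itertools import product

# Assumption: r = 0*1* describes the strings that do not contain "10".
R = "0*1*"
# (r100)* r (10 + epsilon), with union written as an optional group.
NO_101 = re.compile(f"(?:{R}100)*{R}(?:10)?")

def matches_exactly_no_101(max_len):
    """Compare the regex with the definition on all binary words up to max_len."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            word = "".join(bits)
            accepted = NO_101.fullmatch(word) is not None
            if accepted != ("101" not in word):
                return False
    return True
```

Calling `matches_exactly_no_101(12)` compares the two on every binary word of length at most 12. This is evidence, not a proof; the argument in the paragraph above is what shows correctness for all lengths.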


Complementing a regular expression can be a very costly operation. For example, the set of strings over $\{1,\ldots,n\}$ not containing all symbols has a regular expression of length $O(n^2)$ (exercise), but its complement requires a regular expression of exponential size $\Omega(2^n)$. This follows from the "fooling set" lower bound on NFAs, and the fact that regular expressions can be converted into NFAs with constant multiplicative overhead.
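One possible way to reach the $O(n^2)$ bound (a sketch of a solution to the exercise, not necessarily the intended one) is to take the union, over each symbol $i$, of the star of all the other symbols. The code below builds that expression for single-digit alphabets ($2 \le n \le 9$, an assumption made so that each symbol is one character) and checks it against the definition. The function names are hypothetical.

```python
import re
from itertools import product

def missing_symbol_regex(n):
    """Union over each symbol i of (all symbols except i)*; assumes 2 <= n <= 9."""
    symbols = [str(i) for i in range(1, n + 1)]
    branches = []
    for skipped in symbols:
        rest = [s for s in symbols if s != skipped]
        branches.append("(?:" + "|".join(rest) + ")*")
    return "|".join(branches)

def agrees_with_definition(n, max_len):
    """Check the regex against 'some symbol is missing' on all words up to max_len."""
    pattern = re.compile(missing_symbol_regex(n))
    symbols = [str(i) for i in range(1, n + 1)]
    for length in range(max_len + 1):
        for letters in product(symbols, repeat=length):
            word = "".join(letters)
            expected = len(set(word)) < n
            if (pattern.fullmatch(word) is not None) != expected:
                return False
    return True
```

The expression has $n$ branches with $n-1$ symbol occurrences each, so $n(n-1)$ symbol occurrences in total, which is where the quadratic length comes from.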

A fooling set for a language $L$ is a collection of pairs $(x_i,y_i)$ such that $x_iy_i \in L$ for every $i$, but for $i \neq j$, at least one of $x_iy_j$ and $x_jy_i$ is not in $L$. If $L$ has a fooling set of size $m$ then every NFA for $L$ has at least $m$ states (exercise). In the case of all strings over $\{1,\ldots,n\}$ containing all symbols, there is a fooling set of size $2^n$: $\{(S,\overline{S}) : S \subseteq \{1,\ldots,n\}\}$, where we identify a set of symbols with the word containing these symbols in order.

Yuval Filmus