Regular expression for all strings not containing $aba$

Question

This is my first post here. We are currently studying regular expressions, and I have been tasked with writing a regular expression for the language of all words that do not contain the substring $aba$, over the alphabet $\Sigma=\{a,b\}$.

We were first asked to write a regular expression for all words that do contain the substring $aba$, and I came up with:

$$(a+b)^*aba(a+b)^*$$

However, I don't know how to write the second one, because I can't think of a way to express within a regex that something must not occur.

https://cs.stackexchange.com/q/45570/755 – D.W. May 30 '22 at 19:00

score 2 · Answer 1 · answered Apr 14 '22 at 16:19

A word doesn't contain $aba$ iff after every $ab$, the word either terminates or continues with $b$. Imagine that you start reading your word from left to right. Denoting by $\newcommand{\eos}{\#}\eos$ the "end of string" symbol, one of the following must be a prefix of your string:

$$
\begin{array}{l}
\eos \\
a\eos, aa\eos, aaa\eos, \ldots \\
ab\eos, aab\eos, aaab\eos, \ldots \\
abb, aabb, aaabb, \ldots \\
b
\end{array}
$$

Furthermore, each of these prefixes $p$ not ending with $\eos$ satisfies the following: a word $w$ doesn't contain $aba$ iff $pw$ doesn't contain $aba$. This leads to the following unambiguous regular expression:

$$ (a^+bb + b)^*(\epsilon + a^+ + a^+b) $$

You can simplify it further if you're fine with ambiguous regular expressions; I leave such simplifications for you to ponder, if you are so inclined.
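The expression above can be sanity-checked by brute force. The following is a minimal sketch, not part of the original answer: it assumes Python's `re` syntax (union `+` written as `|`, and the final factor $\epsilon + a^+ + a^+b$ written as the equivalent `(?:a+b?)?`), and the names `PATTERN`, `avoids_aba` and `mismatches` are made up for illustration.

```python
import re
from itertools import product

# (a^+bb + b)^* (epsilon + a^+ + a^+b), transcribed into Python regex syntax.
# Assumption: the last factor is written as (?:a+b?)?, which matches
# exactly the empty word, a^+ and a^+b.
PATTERN = re.compile(r"(?:a+bb|b)*(?:a+b?)?")


def avoids_aba(word):
    """Reference definition: the word has no substring 'aba'."""
    return "aba" not in word


def mismatches(max_len):
    """Words over {a, b} up to max_len on which the regex and the
    reference definition disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            matched = PATTERN.fullmatch(word) is not None
            if matched != avoids_aba(word):
                bad.append(word)
    return bad


print(mismatches(12))
```

An empty list means the two definitions agree on every word up to the chosen length; this is evidence, not a proof, and the prefix argument above is what justifies the expression in general.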

score 1 · Answer 2 · Zulfiqar Chaudhry · 2022-07-25T13:38:53.797

Think of all the possible combinations you can make which do not contain $aba$:

Whenever we get $ab$, we must either end the string or be forced to add a $b$: $a^+bb$
If we are starting from $b$, then we can append as many $a$'s as we want before the closing $bb$: $b^+a^*bb$
Joining both together: $(a^+bb + b^+a^*bb)^*a^*b^*$
The $a^*b^*$ at the end is for the edge cases where we have all $a$'s or where the word ends with $ab$.
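A brute-force comparison is a cheap way to test an expression like this one against the definition. The sketch below is an illustration only, assuming Python's `re` syntax for the expression above; the helper names `words`, `missed` and `wrongly_matched` are hypothetical. Under that transcription the expression never produces a word containing $aba$, but it does not produce every $aba$-free word: for instance $ba$ is left out, because every starred block has length at least 3 and the tail $a^*b^*$ cannot place an $a$ after a $b$.

```python
import re
from itertools import product

# (a^+bb + b^+a^*bb)^* a^*b^*, transcribed into Python regex syntax.
PATTERN = re.compile(r"(?:a+bb|b+a*bb)*a*b*")


def words(max_len):
    """All words over {a, b} of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)


def missed(max_len):
    """aba-free words that the expression fails to generate."""
    return [w for w in words(max_len)
            if "aba" not in w and PATTERN.fullmatch(w) is None]


def wrongly_matched(max_len):
    """Words containing aba that the expression nevertheless generates."""
    return [w for w in words(max_len)
            if "aba" in w and PATTERN.fullmatch(w) is not None]


print(missed(3))
print(wrongly_matched(8))
```

Listing the shortest missed words first makes it easier to see which case the expression overlooks, here a $b$ that is followed by $a$'s without a later $bb$.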

edited Jul 25 '22 at 13:38

answered Jul 25 '22 at 12:16


score 1 · Answer 3 · answered Feb 01 '24 at 10:43

In such a word, whenever a $b$ occurs in the middle of the word (between two $a$'s), it is part of a run of at least two consecutive $b$'s; a single $b$ can only occur at the very beginning or the very end of the word. We thus replace the language of all words made of some number of $a$'s or $b$'s, represented by $(a^\ast + b^\ast)^\ast$, with the language of words made of $a$'s or runs of at least two $b$'s, which is

$$(a^\ast + bb b^\ast)^\ast$$

However, we can optionally have a single $b$ at the beginning or the end, so we add that as

$$ (\varepsilon+b) \cdot (a^\ast+ bb b^\ast)^\ast \cdot (\varepsilon+b)$$
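As a small worked illustration (the word is chosen here as an example and is not from the answer), the $aba$-free word $baabbbab$ factors according to the expression as

$$
\underbrace{b}_{\varepsilon+b}\cdot\underbrace{aa}_{a^\ast}\cdot\underbrace{bbb}_{bbb^\ast}\cdot\underbrace{a}_{a^\ast}\cdot\underbrace{b}_{\varepsilon+b}
$$

By contrast, $aba$ itself cannot be produced: the starred middle part only emits runs of at least two $b$'s, and the two optional single $b$'s can only sit at the very beginning and the very end of the word, never between two $a$'s.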
