Find regular expression for a given language

Question

I need to find a regular expression for the following language:
$$\Sigma = \{a,b,c\}$$
Define $L$ to be the language of all words over $\Sigma$ that contain the substring $aba$ an odd number of times.

Any help is welcome; I would also like a tip on how to even start a question like this.
Thanks!
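One systematic way to start a problem like this (a standard technique, not the route taken in the answer below) is to first build a deterministic finite automaton for $L$, and only then convert it to a regular expression, for example by state elimination. A state only needs to remember how much of $aba$ has just been read (nothing, $a$, or $ab$) and whether the number of occurrences seen so far is odd, which gives $3 \times 2 = 6$ states. The Python sketch below simulates that automaton. The names `step` and `in_L` are hypothetical, and it assumes overlapping occurrences count (so $ababa$ contains $aba$ twice), with a flag for the other reading.

```python
from __future__ import annotations


def step(k: int, ch: str, overlapping: bool = True) -> tuple[int, bool]:
    """Read one symbol in state k; return (new state, whether an "aba" just ended).

    k is the length of the longest suffix read so far that is a prefix of "aba":
    0 (nothing useful), 1 ("a") or 2 ("ab").
    """
    if k == 2 and ch == "a":
        # "aba" completed; under the overlapping reading its last "a" may start the next one.
        return (1 if overlapping else 0), True
    if ch == "a":
        return 1, False  # in any other state, an "a" is the start of a possible match
    if k == 1 and ch == "b":
        return 2, False
    return 0, False  # any other "b" or "c" destroys the partial match


def in_L(word: str, overlapping: bool = True) -> bool:
    """Decide membership in L: an odd number of occurrences of "aba"."""
    k, odd = 0, False  # 6 states in total: k in {0, 1, 2} times the parity bit
    for ch in word:
        k, ended = step(k, ch, overlapping)
        if ended:
            odd = not odd
    return odd


print(in_L("aba"), in_L("ababa"), in_L("ababa", overlapping=False))  # True False True
```

The accepting states are exactly the three with odd parity; eliminating states from this 6-state automaton yields a (long) regular expression that is correct by construction.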

I tried to start like this:
We need at least one $aba$. Before it, we can have any expression that does not contain $aba$ and does not end with $ba$. If I call this expression $R$, the solution should be something like:
$$R^*aba[R^*abaR^*aba]^*R^*$$

Is this correct? If so, I can't think of what this $R$ should be.
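A practical way to answer "is this correct?" for any candidate is to compare it by brute force against the definition of $L$ on all short words. The Python sketch below is illustrative only: the helper names `count_aba` and `first_mismatch` are hypothetical, it assumes overlapping occurrences count, and candidates must be written in Python's `re` syntax (write $x+y$ as `x|y`, an optional alternative $x+\lambda$ as `(?:x|)`, and $R$ as a non-capturing group `(?:...)`).

```python
import itertools
import re
from typing import Optional


def count_aba(word: str) -> int:
    """Count occurrences of "aba", overlapping ones included (the lookahead matches at each start)."""
    return len(re.findall(r"(?=aba)", word))


def first_mismatch(pattern: str, max_len: int) -> Optional[str]:
    """Return the first word (shortest first) on which `pattern` and L disagree, or None."""
    rx = re.compile(pattern)
    for n in range(max_len + 1):
        for letters in itertools.product("abc", repeat=n):
            word = "".join(letters)
            matched = rx.fullmatch(word) is not None
            if matched != (count_aba(word) % 2 == 1):
                return word
    return None


# Plugging a candidate R into the template R*aba(R*abaR*aba)*R*:
R = "(?:a|c)"  # hypothetical candidate, deliberately too small
print(first_mismatch(f"{R}*aba(?:{R}*aba{R}*aba)*{R}*", 6))  # abab
```

Here the reported word is $abab$: it is in $L$, but the candidate $R=(a+c)$ cannot produce the trailing $b$. A `None` result only means no counterexample exists up to `max_len`; it is evidence, not a proof.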

Does the string $ababa$ contain $aba$ once or twice? – sdcvvc Mar 29 '15 at 17:52
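The two readings give different languages, so it helps to fix one before writing any expression. A minimal Python illustration of the difference: `str.count` counts non-overlapping occurrences, while a zero-width lookahead counts every starting position.

```python
import re

word = "ababa"
non_overlapping = word.count("aba")              # resumes the search after each match
overlapping = len(re.findall(r"(?=aba)", word))  # zero-width match at every start position
print(non_overlapping, overlapping)  # 1 2
```

Under the first reading $ababa \in L$; under the second it is not.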

Brian M. Scott · Answer 1 · 2015-04-01T16:33:18.420


Revised.

Your idea almost works.

The symbol $b$ is the key to writing a regular expression that does not contain $aba$: $b$ can only appear initially, finally, and in the environments $bb$, $abc$, $cba$, and $cbc$. That is, each $b$ must be the first symbol of the word, the last symbol of the word, adjacent to another $b$, or part of a block $abc$, $cba$, or $cbc$. Let’s ignore the first two possibilities for a moment. Then each of these blocks can be preceded by any string of $a$s and $c$s, and the last one can be followed by any string of $a$s and $c$s. Thus, still ignoring the possibility of a single initial or final $b$, we have

$$(a+c+bbb^*+bc+cb)^*\;.$$

(I use $bbb^*$ instead of just $bb$ in order to get strings of odd numbers of $b$s.) Call this regular expression $R_0$. It can be preceded by $b$ or followed by $ab$, so

$$R_1=(b+\lambda)R_0(ab+\lambda)$$

covers all possibilities: anything else that ends in $b$ is already covered by $$(b+\lambda)(a+c+bbb^*+bc+cb)^*\;.$$ (I use $\lambda$ for the empty word.)
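One way to test the claims about $R_0$ and $R_1$ before building on them is a brute-force comparison over all short words. The Python sketch below (the helper names `words` and `check` are hypothetical) transcribes the expressions into Python's `re` syntax and returns two lists: words matched by $R_1$ that contain $aba$, which should be empty, and $aba$-free words that $R_1$ fails to match, which should also be empty if $R_1$ really covers every $aba$-free word.

```python
import itertools
import re

# R_0 and R_1 from the answer, transcribed into Python's re syntax:
# "+" becomes "|", and an optional (x + λ) becomes "(?:x|)".
R0 = r"(?:a|c|bbb*|bc|cb)*"
R1 = rf"(?:b|){R0}(?:ab|)"


def words(max_len: int):
    """Yield every word over {a, b, c} of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in itertools.product("abc", repeat=n):
            yield "".join(letters)


def check(max_len: int):
    """Return (unsound, missing) for R_1 on all words up to max_len."""
    r1 = re.compile(R1)
    # Matched by R_1 but containing "aba" (R_0 is the case where both optional parts are empty).
    unsound = [w for w in words(max_len) if r1.fullmatch(w) and "aba" in w]
    # aba-free, but not matched by R_1.
    missing = [w for w in words(max_len) if "aba" not in w and not r1.fullmatch(w)]
    return unsound, missing


unsound, missing = check(6)
print(unsound, "abcba" in missing)  # [] True
```

The first list is empty, but `missing` contains words such as $abcba$: its single $c$ sits between two $b$s, and $R_0$ would need that one $c$ for both a $bc$ block and a $cb$ block.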

To get a word that contains $aba$ an odd number of times, we can certainly start with an $R_1$ word (one that contains $aba$ zero times), and this has to be followed by $\color{brown}{aba}$. Now an $R_1$ word could end in $ab$, but this isn’t actually a problem if $ab\color{brown}{aba}$ counts as only one copy of $aba$. However, we can’t simply follow this by another $R_1$ word, because it might start with $ba$, and in that case we could get $ab\color{brown}{aba}ba$, with two copies of $aba$. To avoid this, let

$$R=ababaR_0+aba(b+\lambda)R_0\;,$$

and let’s try

$$(b+\lambda)R_0R(RR)^*\;.$$

Each $R$ generates at least one $aba$, and since $R_0$ cannot generate anything beginning $ba$ or $aba$, each $R$ generates only one $aba$. The first term of $R$ allows for $R_1$ strings that end in $ab$.
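The final expression can be tested the same brute-force way against the definition of $L$. In the Python sketch below, `FULL`, `in_L`, and `mismatches` are hypothetical names, and `in_L` takes a flag because the comments never settle whether $ababa$ contains $aba$ once or twice.

```python
import itertools
import re

# The answer's final expression (b+λ) R_0 R (RR)^*, transcribed into Python's re syntax.
R0 = r"(?:a|c|bbb*|bc|cb)*"
R = rf"(?:ababa{R0}|aba(?:b|){R0})"
FULL = re.compile(rf"(?:b|){R0}{R}(?:{R}{R})*")


def in_L(word: str, overlapping: bool = True) -> bool:
    """True when the word contains "aba" an odd number of times, under the chosen way of counting."""
    n = len(re.findall(r"(?=aba)", word)) if overlapping else word.count("aba")
    return n % 2 == 1


def mismatches(max_len: int, overlapping: bool = True) -> list:
    """All words of length <= max_len on which FULL and L disagree."""
    out = []
    for n in range(max_len + 1):
        for letters in itertools.product("abc", repeat=n):
            w = "".join(letters)
            if (FULL.fullmatch(w) is not None) != in_L(w, overlapping):
                out.append(w)
    return out


print(len(mismatches(8)), len(mismatches(8, overlapping=False)))
```

An empty list for a given `max_len` is evidence, not proof, that the expression is right; any word it does list is a concrete counterexample to check by hand.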

edited Apr 01 '15 at 16:33

answered Mar 30 '15 at 11:46

Brian M. Scott


Hi Brian, thanks! I can't figure out how the string $bbabbaba$ is accepted here. – Genadi Mar 31 '15 at 12:48
@Genadi: You’re welcome! It’s covered by $R_0aba$, with $R_0$ covering the $bbabb$ part: no instance of $a+c$, then $bb$ from $bbb^*$, then $a$ from $(a+c)^*$, then $bb$ from $bbb^*$ again. – Brian M. Scott Apr 01 '15 at 03:15
Oh, that's right! I am new to regular expressions. One last question: if you prevent $R$ from beginning with $b$ by using $R_1$, how will you accept a string like $ababc$? – Genadi Apr 01 '15 at 14:31
@Genadi: Very good catch: it wasn’t accepted with what I had. I’ve now made a small change to accept it, by putting $R$ at the end of the string instead of $R_1$. Now we have $$(b+\color{brown}{\lambda})R_1\color{brown}{aba}(R_1abaR_1aba)^* (\color{brown}b+\lambda)\color{brown}{R_0}(b+\color{brown}{\lambda})\;,$$ where the brown parts are used to match $ababc$. I had overlooked the fact that the part before the final $R$ must end in an $aba$ that gets counted, which means that what follows it can start $ba$: the $a$ before that $b$ is already in use, as in the string $ababa$. – Brian M. Scott Apr 01 '15 at 15:04
I think there is still a problem with $R_1$: for example, the string $ababcababcaba$ is not accepted. – Genadi Apr 01 '15 at 15:43
I have come up with an idea based on your way of thinking; what do you think about it? $R=b^*(a+c+bbb^*+bc+cb)^*$, and the final regex will be $Raba(RabaRaba)^*R$. My idea was to prevent $R$ from ending with $ab$. – Genadi Apr 01 '15 at 15:59
@Genadi: I was also thinking along those lines, and I believe that I’ve finally ironed out the details. The problem is avoiding accidentally generating $abababa$ strings, since $aba$-free strings can end in $ab$ and can also begin with $ba$. There may be an easier way than mine to do this, but I think that it works now. – Brian M. Scott Apr 01 '15 at 16:35
