Build a regular grammar for a regular language

Question

The language considered is the infinite set of all chains that meet the following conditions.

Conditions:

1) They consist of symbols from the set {1,a,b}. 
2) They always start with the subchain '1a'.
3) They always include at least one subchain 'aa'.

For example:

1aa, 1abaa, 1aaab, 1aab1a, ... etc.

One regular expression for this language seems to be like this: $1a ((1+b)^* a)^* (a (1+b)^* )^* a (1+b+a)^*$

How can I find a regular grammar for this language?

I've thought of many approaches, but the problem seems too complex for me. I tried the following as a solution, but I suspect it is not correct.

G ({1,a,b}, {A,S}, P, S)
P:
S -> 1S|bS|aA
A -> 1A|bA|1a

The language chain is defined by the conditions above: every chain starts from a subchain '1a' only, every chain always contains at least 1 subchain 'aa' and is formed from symbols belonging to the set of {1,a,b} — Happy Torturer, May 15 '14 at 13:04
Because it won't allow the shortest chain '1aa'. (aa)* can be a void 'lambda', not? That means that the shortest chain for your regexp will be 1a. — Happy Torturer, May 15 '14 at 13:28
Sorry, my bad. I deleted my comment before I thought you'd see it. — Ratul Saha, May 15 '14 at 13:29
No problem. Maybe you have some ideas on how to build a regular grammar? – Happy Torturer, May 15 '14 at 13:32
Your regular expression doesn't include $1aa$, which is in the language. — Rick Decker, May 15 '14 at 15:48
Why? My regexp is: 1a ((1+b)* a)* (a (1+b))* a (1+b+a)* ==> 1a lambda lambda a lambda ==> 1aa. — Happy Torturer, May 15 '14 at 15:57
No, I didn't. Look at the last parens at the end of that part of my expression in the main post. The last parens is shown with another font (the star just disappeared): a kind of technical problem of this site. — Happy Torturer, May 15 '14 at 16:16
1a ((1+b)* a)* (a (1+b)) a (1+b+a)* is currently written, in contrast to 1a ((1+b)* a)* (a (1+b))* a (1+b+a)* which has a star in the 3rd term — Luis Masuelli, May 16 '14 at 14:40
I've just put a space in front of that parenthesis, and now the star can be seen clearly: 1a ((1+b)* a)* (a (1+b)<space_here>)* a (1+b+a)*. If you could rather help me with building that regular grammar... – Happy Torturer, May 16 '14 at 14:45
I edited your question (I should edit the first too, but it is very similar). You should look at the differences. It is not simply an issue of English, but also of logic. A language is usually defined as the set of all strings meeting some conditions. There are nearly always several regular expressions for a given regular language, so that you should say "a regular expression is ..." rather than "the regular expression is ...". The same is true for grammars or automata. — babou, May 16 '14 at 20:29
"There are nearly always several regular expressions for a given regular language" - that's great, it feels like fun on implementing some recursive function on Lisp in many different ways. But regular expressions are a bit harder to build. — Happy Torturer, May 16 '14 at 20:45
The accepted answer of this question is also applicable here. — FrankW, May 17 '14 at 07:54
FrankW, I didn't ask how to build a context-free grammar out of regular expression. I asked how to build a regular grammar out of regular expression. — Happy Torturer, May 17 '14 at 08:11
D.W., why should I prove the language is regular? I wanted to build a regular grammar. Your remark is also insignificant in this topic. — Happy Torturer, May 17 '14 at 08:15

score 4 · Answer 1 · edited Apr 13 '17 at 12:19

Every string in the language starts with 1a. There are two cases: the third character of the string is either a or not. If it is a, the condition 'contains at least one aa' is already satisfied and the rest of the string is arbitrary. Otherwise, aa must occur somewhere later in the string. Thus a regular expression is:

    1aa(1+a+b)*+1a(1+a+b)*aa(1+a+b)* .
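As a sanity check, the expression can be compared against the three conditions by brute force. The following is only a sketch: it assumes the union `+` is written as `|` for Python's `re` module, and the helper names `in_language` and `mismatches` are made up for this illustration.

```python
import re
from itertools import product

ALPHABET = "1ab"

# The expression above, with union '+' written as '|' and (1+a+b) as [1ab].
PATTERN = re.compile(r"1aa[1ab]*|1a[1ab]*aa[1ab]*")

def in_language(s):
    """Conditions 2 and 3; condition 1 holds because we only build strings over ALPHABET."""
    return s.startswith("1a") and "aa" in s

def mismatches(max_len):
    """Strings of length <= max_len on which the regex and the conditions disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            s = "".join(chars)
            if bool(PATTERN.fullmatch(s)) != in_language(s):
                bad.append(s)
    return bad

print(mismatches(8))  # an empty list means no disagreement up to length 8
```

An empty result is not a proof, only evidence that the two descriptions agree on all short strings.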

The conversion from regular expression to automata to regular grammar is standard. One possible algorithm is given here: https://math.stackexchange.com/questions/574571/build-regular-grammar-from-regular-expression

Here is an intuitive way to convert a regular expression to a right linear grammar:

1) Convert the regular expression to an NFA. This is standard (see page 102 of Introduction to Automata Theory, Languages, and Computation, 3rd ed.).
2) To convert the NFA to a right linear grammar, take the states as the non-terminals and the alphabet symbols as the terminals. For a transition from state S to state S' on symbol a, add the production rule S -> aS'. Note that final states and epsilon transitions need separate treatment: a final state F also gets the rule F -> epsilon, and epsilon transitions should be eliminated first.
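The conversion step can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: the NFA below (states S, T, X, Y, Z, final state Z) is a hypothetical one for this language, epsilon transitions are assumed to have been removed already, and the function name `nfa_to_right_linear` is invented for the example.

```python
def nfa_to_right_linear(transitions, finals):
    """Turn an epsilon-free NFA into a right linear grammar.

    transitions: iterable of (state, symbol, next_state)
    finals: set of final states
    Returns a dict mapping each non-terminal to its list of right-hand sides.
    """
    grammar = {}
    for state, symbol, nxt in transitions:
        # A transition state --symbol--> nxt becomes the production state -> symbol nxt.
        grammar.setdefault(state, []).append(symbol + nxt)
        grammar.setdefault(nxt, [])
    for state in finals:
        # A final state may stop the derivation: state -> empty string.
        grammar.setdefault(state, []).append("")
    return grammar

# Hypothetical NFA for the language (state names are assumptions):
# X loops until it guesses the 'aa', Z loops after 'aa' has been seen.
NFA = [
    ("S", "1", "T"),
    ("T", "a", "X"), ("T", "a", "Y"),
    ("X", "1", "X"), ("X", "a", "X"), ("X", "b", "X"), ("X", "a", "Y"),
    ("Y", "a", "Z"),
    ("Z", "1", "Z"), ("Z", "a", "Z"), ("Z", "b", "Z"),
]

grammar = nfa_to_right_linear(NFA, finals={"Z"})
for lhs, rhss in grammar.items():
    print(lhs, "->", " | ".join(r if r else "epsilon" for r in rhss))
```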

You can also convert the NFA to a DFA (again, standard subset construction method, see the aforementioned book) and then the method will be simpler.

answered May 15 '14 at 13:32 by Ratul Saha; edited Apr 13 '17 at 12:19 by Community

Your regexp is not correct. It will not allow 1abaa, for example. Still not clear for me how this stuff might pass through that algorithm that wasn't given in our theory. There has to be a clearer way to do it. – Happy Torturer May 15 '14 at 13:48
The 'dirty' way is to convert the regular expression to an automaton (maybe with a round of NFA->DFA) and then convert the DFA to a right linear grammar. I am sure any textbook in Theory of Computation includes all these steps. – Ratul Saha May 15 '14 at 13:51
So you don't know how to resolve the problem, I guess. – Happy Torturer May 15 '14 at 14:11
@user3470412 The regular expression does accept $1abaa$ (use the right-hand alternative, with the first $(1+a+b)^*$ matching the $b$, $(aa)^*$ (as written in Ratul's answer) matching $aa$ and the second $(1+a+b)^*$ matching nothing). However, "$(aa)^*$" should have been $(aa)$, with no Kleene star, since the star allows zero repetitions, which lets in strings that do not contain $aa$. I've edited the answer to fix this. – David Richerby May 15 '14 at 15:17
Ok, but this still doesn't resolve my question. I have my own regexp which I suppose to be correct, but still I have no answer to my question. – Happy Torturer May 15 '14 at 15:29
@DavidRicherby, thanks for the edit. @ OP, I updated the answer to reflect your question. – Ratul Saha May 15 '14 at 15:38
@user3470412. Actually, you do have an answer to your question. It's found in the link that Ratul provided. – Rick Decker May 15 '14 at 18:45

score 2 · Answer 2 · answered May 16 '14 at 15:49

I suggest you approach this problem completely differently. The easiest way to get a regular grammar is to start from a DFA, not a regular expression. Can we make a DFA? Easily:

q    s    q'
---  ---  ---
q0   1    q1
q0   a    q2 // dead
q0   b    q2 // dead
q1   1    q2 // dead
q1   a    q3
q1   b    q2 // dead
q2   1    q2 // dead
q2   a    q2 // dead
q2   b    q2 // dead
q3   1    q4
q3   a    q5 // accepting
q3   b    q4
q4   1    q4
q4   a    q6
q4   b    q4
q5   1    q5 // accepting
q5   a    q5 // accepting
q5   b    q5 // accepting
q6   1    q4
q6   a    q5 // accepting
q6   b    q4

Basically, in pseudocode:

read the next symbol
if a '1', then
    read the next symbol
    if an 'a', then
        read the next symbol
        if an 'a', then
            read all remaining symbols
            accept
        otherwise, then
            while there are still symbols, do
               read the next symbol
               if an 'a', then
                   read the next symbol
                   if an 'a', then
                       read all remaining symbols
                       accept
                   endif
               endif
            loop
        endif
    endif
endif
reject
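The transition table can also be run directly instead of hand-coding the nested ifs. The sketch below copies the rows of the table into a dictionary and compares the DFA with the three conditions on all short strings; the names `DELTA`, `accepts` and `in_language` are chosen for this illustration only.

```python
from itertools import product

# Rows (q, s, q') copied from the table above; q2 is the dead state.
ROWS = """
q0 1 q1  q0 a q2  q0 b q2
q1 1 q2  q1 a q3  q1 b q2
q2 1 q2  q2 a q2  q2 b q2
q3 1 q4  q3 a q5  q3 b q4
q4 1 q4  q4 a q6  q4 b q4
q5 1 q5  q5 a q5  q5 b q5
q6 1 q4  q6 a q5  q6 b q4
""".split()
DELTA = {(ROWS[i], ROWS[i + 1]): ROWS[i + 2] for i in range(0, len(ROWS), 3)}
START, ACCEPTING = "q0", {"q5"}

def accepts(s):
    """Run the DFA on s and report whether it ends in an accepting state."""
    state = START
    for ch in s:
        state = DELTA[(state, ch)]
    return state in ACCEPTING

def in_language(s):
    """Direct check: starts with '1a' and contains 'aa'."""
    return s.startswith("1a") and "aa" in s

# Compare the DFA with the definition on every string of length <= 8.
disagreements = [
    "".join(chars)
    for n in range(9)
    for chars in product("1ab", repeat=n)
    if accepts("".join(chars)) != in_language("".join(chars))
]
print(accepts("1abaa"), accepts("1aba"), disagreements)
```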

Once you have a DFA, getting the regular grammar is easy:

1) One non-terminal symbol for each state in the DFA, with the start state q0 as the start symbol.
2) One production of the form q := sq' for each row of the table above (where q, s and q' are as in the table).
3) One production of the form qA := <empty> for every accepting state qA.

If your definition of regular grammars doesn't allow productions leading to the empty string, replace rule 3 with:

One production of the form q := s whenever you have, from the second rule, a production of the form q := sq' where q' is an accepting state.
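These rules are mechanical enough to sketch in code. The function below is a hypothetical helper (`dfa_to_grammar` and its `allow_empty` flag are names invented here) that emits the productions from the transition table, either with the empty production for the accepting state or with the replacement rule.

```python
# Rows (q, s, q') copied from the table above; q5 is the only accepting state.
ROWS = """
q0 1 q1  q0 a q2  q0 b q2
q1 1 q2  q1 a q3  q1 b q2
q2 1 q2  q2 a q2  q2 b q2
q3 1 q4  q3 a q5  q3 b q4
q4 1 q4  q4 a q6  q4 b q4
q5 1 q5  q5 a q5  q5 b q5
q6 1 q4  q6 a q5  q6 b q4
""".split()
DELTA = {(ROWS[i], ROWS[i + 1]): ROWS[i + 2] for i in range(0, len(ROWS), 3)}

def dfa_to_grammar(delta, accepting, allow_empty=True):
    """Return productions as (left-hand side, right-hand side) pairs."""
    productions = []
    for (q, s), q_next in sorted(delta.items()):
        productions.append((q, s + " " + q_next))   # q := s q'
        if not allow_empty and q_next in accepting:
            productions.append((q, s))              # replacement rule: q := s
    if allow_empty:
        for q in sorted(accepting):
            productions.append((q, "<empty>"))      # qA := <empty>
    return productions

for lhs, rhs in dfa_to_grammar(DELTA, {"q5"}, allow_empty=False):
    print(lhs, ":=", rhs)
```

The productions that mention the dead state q2 can never lead to a terminal string, so they can be dropped from the output without changing the language.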

The one-to-one nature of regular grammars and DFAs has another implication:

Finding a regular grammar from a NFA (resp. regular expression) must be at least as hard as finding a DFA from a NFA (resp. regular expression).

So you lose nothing by going for the DFA first.

score 2 · Accepted Answer · edited Apr 13 '17 at 12:48

This language was already considered by the same OP in another question, "Build a regular expression to define a regular language". But then, posters are told not to ask two questions in the same post.

I already answered that a simple method is to consider the regular languages defined by conditions 1 and 2, $1a(1+a+b)^*$, and by conditions 1 and 3, $(1+a+b)^*aa(1+a+b)^*$.

Take the FA for each of them, both having 3 states (4 if you count a dead state), and construct the FA for their intersection. This is quite easy and gives a five-state automaton (6 states if you count a dead state).

From that FA you can get a regular expression for the language, which is $1(a+a(1+a+b)^*a)a(1+a+b)^*$.

But you can also get a regular grammar:

$S \rightarrow 1T$
$T \rightarrow aX \mid aY$
$X \rightarrow 1X \mid aX \mid bX \mid aY$
$Y \rightarrow a \mid aZ$
$Z \rightarrow 1 \mid a \mid b \mid 1Z \mid aZ \mid bZ$

which derives directly from the automaton.
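One way to gain confidence in this grammar is to enumerate everything it derives up to some length and compare with the three conditions. The sketch below does that; the dictionary encoding and the function names `generated` and `expected` are assumptions made for this illustration.

```python
from itertools import product

# Productions of the grammar above. Every right-hand side is one terminal,
# optionally followed by one non-terminal (an upper-case letter).
GRAMMAR = {
    "S": ["1T"],
    "T": ["aX", "aY"],
    "X": ["1X", "aX", "bX", "aY"],
    "Y": ["a", "aZ"],
    "Z": ["1", "a", "b", "1Z", "aZ", "bZ"],
}

def generated(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    done, frontier = set(), {"S"}
    while frontier:
        next_frontier = set()
        for form in frontier:
            prefix, nonterminal = form[:-1], form[-1]
            for rhs in GRAMMAR[nonterminal]:
                new = prefix + rhs
                if len(new) > max_len:
                    continue  # any string derived from it would be too long
                if new[-1].isupper():
                    next_frontier.add(new)   # still has a non-terminal to expand
                else:
                    done.add(new)            # fully terminal string
        frontier = next_frontier
    return done

def expected(max_len):
    """All strings over {1,a,b} of length <= max_len meeting conditions 2 and 3."""
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product("1ab", repeat=n)
        if "".join(chars).startswith("1a") and "aa" in "".join(chars)
    }

print(generated(7) == expected(7))
```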

In more detail: why do it as I did above?

Proving that a regular expression defines the language specified by a set of such conditions can be long and tedious. The same is true for a grammar or an automaton. This is why the proper way of answering such a question is to find a systematic way of building the desired result from elementary components for which things are obvious to prove.

Here, as I had already answered, elementary components are the regular languages defined respectively

by conditions 1 and 2: $1a(1+a+b)^*$ ;
by conditions 1 and 3: $(1+a+b)^*aa(1+a+b)^*$ .

For each of these 2 languages, it is easy to find the regular expression (above) or to give a regular grammar or a finite-state automaton (FA). Each has a 3-state FA that recognizes it (4 states including a dead state).

To get an FA for the language meeting all three conditions, you only have to take the intersection of these two languages. There is a standard cross-product construction for building an FA recognizing the intersection of two regular languages. This construction is easy to apply by hand in our example. It produces a 5-state FA for the language defined by all three conditions (or a 6-state FA, including a dead state).
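The cross-product construction can be sketched as follows. The two component automata are written down by hand with hypothetical state names (P0..P2 for the prefix condition, C0..C2 for the 'contains aa' condition), and dead states are left implicit: a missing transition means the string is rejected. This is an illustration under those assumptions, not the exact automaton used in the answer.

```python
ALPHABET = "1ab"

# FA for 1a(1+a+b)*: P2 is reached after the prefix '1a'.
PREFIX = ("P0", {"P2"}, {
    ("P0", "1"): "P1", ("P1", "a"): "P2",
    ("P2", "1"): "P2", ("P2", "a"): "P2", ("P2", "b"): "P2",
})

# FA for (1+a+b)*aa(1+a+b)*: C1 means 'just read one a', C2 means 'aa seen'.
CONTAINS = ("C0", {"C2"}, {
    ("C0", "1"): "C0", ("C0", "b"): "C0", ("C0", "a"): "C1",
    ("C1", "1"): "C0", ("C1", "b"): "C0", ("C1", "a"): "C2",
    ("C2", "1"): "C2", ("C2", "a"): "C2", ("C2", "b"): "C2",
})

def product_fa(fa1, fa2):
    """Reachable part of the cross product; accepts iff both components accept."""
    (s1, f1, t1), (s2, f2, t2) = fa1, fa2
    start = (s1, s2)
    states, delta, todo = {start}, {}, [start]
    while todo:
        p, q = todo.pop()
        for sym in ALPHABET:
            if (p, sym) in t1 and (q, sym) in t2:
                target = (t1[(p, sym)], t2[(q, sym)])
                delta[((p, q), sym)] = target
                if target not in states:
                    states.add(target)
                    todo.append(target)
    accepting = {(p, q) for (p, q) in states if p in f1 and q in f2}
    return start, accepting, delta, states

def accepts(fa, s):
    """Run a partial FA; a missing transition rejects."""
    state, accepting, delta = fa[0], fa[1], fa[2]
    for ch in s:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in accepting

BOTH = product_fa(PREFIX, CONTAINS)
print(len(BOTH[3]), sorted(BOTH[1]), accepts(BOTH, "1abaa"), accepts(BOTH, "1aba"))
```

With dead states left implicit, the reachable product has five states, which agrees with the count given above.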

From this last FA, it is easy to build either the regular expression or the regular grammar given above.

Since the answer was built in a systematic way using well known techniques (that have been proved correct long ago), there is no need to prove that the resulting regular expression, grammar or FA are correct. Barring of course mistakes in the use of the constructions ... but then, you can also make mistakes in proofs.

I added the remarks about dead states so that no one gets confused. Except for some constructions, though, it is often simpler to omit dead states and the transitions leading to them.

Thank you, this is great! Just checked it using software I implemented and all the generated chains have '1a' in the beginning and include 'aa' subchain. It's correct! — Happy Torturer, May 16 '14 at 20:16
Thank you for the complete answer and interest to my question! — Happy Torturer, May 17 '14 at 08:35
