I was wondering how to determine (with proof) whether the context-free language generated by the following context-free grammar $G$ is regular, where $S$ is the start variable, $A$ and $B$ are the other variables, and $a$, $b$ are the terminals.

$G:\ S\to aABb,\ A\to BAS \mid\epsilon,\ B\to a \mid b.$

It doesn't seem easy to determine exactly what set of strings the grammar generates (e.g. there isn't an obvious description like $\{a^i b^i : i\ge 0\}$, or the complement of some easily described set, that avoids explicitly listing the rules of a CFG).

The strings derived in $G$ seem to always end in some number of $Bb$'s and start with some number of $A$'s. Obviously every generated string must end in a $b$. There might be some relationship between the numbers of $a$'s and $b$'s in the language (e.g. maybe if a $b$ appears, then a certain number of $a$'s must follow it; the intuition is that the grammar doesn't generate strings with too many $b$'s).

I think the language might be non-regular; the rules don't look like they'll translate to a regular expression easily. Two ways of showing that a language is nonregular are to show that it fails the pumping lemma or that it has infinitely many equivalence classes. Suppose $n$ is a pumping length for the language. The string $a(a^{3n} b^n)ab$ seems to be in the language for every $n\ge 0$ (it certainly is for $n=0$ and $n=1$). It seems hard to verify that a string is not generated by the grammar in cases other than the obvious ones (e.g. when the string does not end in $b$).
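Since checking membership by hand is the sticking point here, a small recursive membership test can help with experiments. The following is only a sketch that follows the three rules of $G$ directly; the function names `derives_S` and `derives_A` and the constant `TERMINALS` are made up for this illustration.

```python
from functools import lru_cache

TERMINALS = ("a", "b")  # B -> a | b, so B derives exactly one terminal


@lru_cache(maxsize=None)
def derives_A(w: str) -> bool:
    """True if A derives w, using A -> B A S | epsilon."""
    if w == "":
        return True  # A -> epsilon
    if w[0] not in TERMINALS:
        return False
    rest = w[1:]  # the leading B consumed the first symbol
    # Try every split of the remainder into an A-part and an S-part.
    return any(
        derives_A(rest[:i]) and derives_S(rest[i:])
        for i in range(len(rest) + 1)
    )


@lru_cache(maxsize=None)
def derives_S(w: str) -> bool:
    """True if S derives w, using S -> a A B b."""
    if len(w) < 3 or w[0] != "a" or w[-1] != "b":
        return False
    if w[-2] not in TERMINALS:  # the symbol produced by B
        return False
    return derives_A(w[1:-2])


for w in ["aab", "aaaabab", "ab", "aaa"]:
    print(w, derives_S(w))
# aab True
# aaaabab True
# ab False
# aaa False
```

Every recursive call is on a strictly shorter string, so the recursion terminates, and the cache keeps the running time polynomial in the length of the input.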

John L.
Fred Jefferson

1 Answer

As you suspected, $L(G)$ is nonregular.

A common strategy to prove that a language $X$ is nonregular is to find a regular language $Y$ such that $X\cap Y$ is nonregular; since regular languages are closed under intersection, $X$ then cannot be regular. When $X$ is context-free, this strategy becomes more attractive because $X\cap Y$ is still context-free, which suggests that $X\cap Y$ might be easy to guess, understand, and verify. We can try some simple regular languages as $Y$, hoping that $X\cap Y$ is nonregular.

Check this question for a couple of examples.
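As a toy illustration of the strategy (a textbook example chosen here, not one taken from the linked question), let $X$ be the set of strings over $\{a,b\}$ with equally many $a$'s and $b$'s and let $Y=L(a^*b^*)$. Then $X\cap Y=\{a^nb^n\mid n\ge0\}$, which is a well-known nonregular language. The sketch below just lists the short strings in $X\cap Y$; the helper names are hypothetical.

```python
import re
from itertools import product

Y = re.compile(r"a*b*")  # the regular language used as a filter


def in_X(w: str) -> bool:
    """X: strings with equally many a's and b's."""
    return w.count("a") == w.count("b")


def in_intersection(w: str) -> bool:
    return in_X(w) and Y.fullmatch(w) is not None


MAX_LEN = 6
# All strings over {a, b} of length <= MAX_LEN, enumerated by increasing length.
candidates = ("".join(p) for n in range(MAX_LEN + 1) for p in product("ab", repeat=n))
found = [w for w in candidates if in_intersection(w)]
print(found)  # ['', 'ab', 'aabb', 'aaabbb']
```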


Claim: $L(G)\cap L(a^*b^*)=\{a^{2n-1}b^{2n}\mid n\ge1\}\cup\{a^{2n}b^{2n-1}\mid n\ge1\}$.

Proof: Substituting $BAS \mid \epsilon$ for $A$ in rule $S\to aABb$, we obtain a context-free grammar $G_1$ that is equivalent to $G$, $$G_1: \ S\to aBASBb \mid aBb, \ A\to BAS \mid \epsilon, \ B\to a \mid b.$$

Let us investigate $G_1$.
Suppose a derivation that starts with $S\to aBASBb$ ends in a string in $L(a^*b^*)$. In any string in $L(a^*b^*)$, every symbol before an $a$ must be an $a$ and every symbol after a $b$ must be a $b$. Note that $S$ in $G_1$ only generates strings that start with $a$ and end with $b$. Look at the right-hand side of $S\to aBASBb$.

  • $aBA$, which comes before $S$, can only generate $a$'s, because the string generated by $S$ starts with $a$. That means
    • the $B$ here can only use the rule $B\to a$.
    • the $A$ here cannot derive $BAS$; otherwise, $aBA$ would generate at least one $b$, since $S$ generates a string that ends with $b$. So the $A$ here can only derive $\epsilon$.
  • the last $B$, which comes after $S$, can only generate $b$'s, because the string generated by $S$ ends with $b$. That means the last $B$ can only derive $b$.

The analysis above means that $L(G_1)\cap L(a^*b^*)=L(G_2)$, where $$G_2: \ S\to aaSbb \mid aBb, \ A\to BAS \mid \epsilon, \ B\to a \mid b.$$ Substituting $a\mid b$ for $B$ in rule $S\to aBb$, we obtain grammar $G_3$ that is equivalent to $G_2$, $$G_3: \ S\to aaSbb \mid aab \mid abb, \ A\to BAS \mid \epsilon, \ B\to a \mid b.$$ Since all useful rules in $G_3$ are $S\to aaSbb \mid aab \mid abb$, it is easy to see $L(G_3)=\{a^{2n-1}b^{2n}\mid n\ge1\}\cup\{a^{2n}b^{2n-1}\mid n\ge1\}.$ Note that $$L(G)\cap L(a^*b^*)=L(G_1)\cap L(a^*b^*)=L(G_2)=L(G_3).$$
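The claim can also be sanity-checked by brute force for short strings. The sketch below is not part of the proof: it enumerates every string of $L(G)$ up to a length bound by iterating the rules of $G$ to a fixed point, and then compares the strings of the form $a^*b^*$ with the claimed set. The function name `strings_up_to` and the bound `MAX_LEN` are choices made for this illustration.

```python
import re


def strings_up_to(max_len: int) -> set:
    """All strings of length <= max_len derivable from S in G."""
    B = ("a", "b")       # B -> a | b
    A, S = {""}, set()   # A -> epsilon seeds the iteration
    while True:
        # S -> a A B b
        new_S = {"a" + x + y + "b" for x in A for y in B
                 if len(x) + 3 <= max_len}
        # A -> B A S
        new_A = {y + x + s for y in B for x in A for s in S
                 if 1 + len(x) + len(s) <= max_len}
        if new_S <= S and new_A <= A:
            return S     # fixed point reached: nothing new was derived
        S |= new_S
        A |= new_A


MAX_LEN = 15
generated = {w for w in strings_up_to(MAX_LEN) if re.fullmatch(r"a*b*", w)}

# Claimed set, restricted to the same length bound (both forms have length 4n - 1).
claimed = set()
n = 1
while 4 * n - 1 <= MAX_LEN:
    claimed.add("a" * (2 * n - 1) + "b" * (2 * n))
    claimed.add("a" * (2 * n) + "b" * (2 * n - 1))
    n += 1

print(generated == claimed)
```

Agreement up to a bound does not prove anything by itself, but it is a cheap way to catch a mistake in a grammar transformation.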


Corollary: $L(G)$ is not regular.

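Proof: $$L(G)\cap L(a^*\left(b^2\right)^*)=(L(G)\cap L(a^*b^*))\cap L(a^*\left(b^2\right)^*)=\{a^{2n-1}b^{2n}\mid n\ge1\},$$ since intersecting with $L(a^*\left(b^2\right)^*)$ keeps exactly the strings with an even number of $b$'s. Regular languages are closed under intersection and $L(a^*\left(b^2\right)^*)$ is regular, so if $L(G)$ were regular, then $\{a^{2n-1}b^{2n}\mid n\ge1\}$ would be regular as well. It is not, so $L(G)$ is not regular.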
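
For completeness, here is one standard way to see that $M=\{a^{2n-1}b^{2n}\mid n\ge1\}$ is not regular, sketched with the pumping lemma; the names $M$, $p$, $w$, $x$, $y$, $z$, and $k$ are introduced only for this sketch. Suppose $M$ were regular with pumping length $p\ge1$, and take $w=a^{2p-1}b^{2p}\in M$. In any decomposition $w=xyz$ with $|xy|\le p$ and $|y|=k\ge1$, the part $xy$ lies inside the leading block of $a$'s because $p\le 2p-1$, so $y=a^k$ and $$xy^2z=a^{2p-1+k}\,b^{2p}.$$ Every string in $M$ has exactly one more $b$ than $a$, but here the number of $b$'s is $2p$ while the number of $a$'s plus one is $2p+k>2p$. Hence $xy^2z\notin M$, which contradicts the pumping lemma.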

John L.