Is this language $L$ context free?

Question

$L$ is a language over the alphabet $\{\texttt{a}, \texttt{b}, \texttt{c}\}$ given by:

$$ L = \{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } v \neq w \}.$$

I tried to prove that it is not context-free using the pumping lemma with the strings $\texttt{a}^{n}\texttt{c}\texttt{b}^{n}$ or $\texttt{a}^{n}\texttt{bcb}\texttt{a}^{n}$, but it didn't work. My intuition is that it should be context-free, since $$L'=\{ vw \mid v \neq w \}$$ is context-free, but I still cannot find a context-free grammar that generates it. Can anyone kindly give me some ideas?

@UmbQbify-Key20-: $a,b$, and $c$ are simply the members of the alphabet over which $L$ is defined. — Brian M. Scott, Jul 21 '20 at 04:28
@Tawcher Bro I just noticed that you reposted this question. I'm not sure if it was closed because of the formatting. In any case, you can find a MathJax tutorial here: https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference — Kevin Aquino, Jul 21 '20 at 04:34
@Kevin López Aquino I am not sure why the first one was closed, maybe the title is too complicated? But thanks for your advice! — Tawcher Bro, Jul 21 '20 at 05:02
Consider using L' for the second grammar, since it is not equal. — , Jul 21 '20 at 08:52
@JCAA besides someone citing the paper which proves the OP's claim about $L'$, I don't see progress. Rain1's approach is not working IMO. If you have answer, please share it. — Ingix, Jul 21 '20 at 16:09
This is not a trivial question. I do not have an answer. The only suggestion I have is post the question to a comp. sci. site. — markvs, Jul 21 '20 at 21:25

score 3 · Answer 1 · answered Jul 22 '20 at 14:36

I think I've found a grammar $G_L$ that produces the $L$ from the problem (apologies to rain1, your approach does seem to lead to a solution):

S::=E|U
E::=AbM|BaM
A::=ZAZ|aMc
B::=ZBZ|bMc
U::=ZUZ|MZc|cZM
M::=epsilon|MZ
Z::=a|b
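
The claim $L(G_L)=L$ can be sanity-checked by brute force on short words before reading the proof. The following is only a sketch and not part of the original argument: the helper names `words_up_to` and `target_language` and the length bound are assumptions made for illustration, and the grammar is encoded with one character per symbol (uppercase for non-terminals, `""` for epsilon).

```python
from itertools import product

# Grammar G_L from the answer. Uppercase = non-terminal, lowercase = terminal,
# and the empty string "" stands for epsilon. (Encoding chosen for this sketch.)
GRAMMAR = {
    "S": ["E", "U"],
    "E": ["AbM", "BaM"],
    "A": ["ZAZ", "aMc"],
    "B": ["ZBZ", "bMc"],
    "U": ["ZUZ", "MZc", "cZM"],
    "M": ["", "MZ"],
    "Z": ["a", "b"],
}


def words_up_to(grammar, start, max_len):
    """Return all terminal words of length <= max_len derivable from `start`.

    Fixpoint iteration: keep adding words to each non-terminal's set until
    no rule produces anything new.
    """
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    # A terminal contributes itself, a non-terminal its known words.
                    choices = lang[sym] if sym in grammar else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= max_len}
                if not partial <= lang[nt]:
                    lang[nt] |= partial
                    changed = True
    return lang[start]


def target_language(max_len):
    """All words v c w with v, w over {a, b}, v != w, of length <= max_len."""
    words = set()
    for lv in range(max_len):
        for lw in range(max_len - lv):  # ensures lv + 1 + lw <= max_len
            for v in product("ab", repeat=lv):
                for w in product("ab", repeat=lw):
                    if v != w:
                        words.add("".join(v) + "c" + "".join(w))
    return words


if __name__ == "__main__":
    n = 6  # length bound, chosen small so the check runs quickly
    print(words_up_to(GRAMMAR, "S", n) == target_language(n))
```

Such a check only covers words up to the chosen length, so it can expose a mistake in the grammar but cannot replace the proof below.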

Note that I'm not an expert in language theory, so I may have made an error.

A global overview:

Note that any terminal word created by $G_L$ contains exactly one $\texttt{c}$. That's because after S is expanded to U, or to E and E is then expanded, exactly one of the non-terminals $A$, $B$ and $U$ is present in the word; the expansion rules never increase that number, and each such non-terminal is eventually consumed by an expansion that introduces exactly one $\texttt{c}$.

E and U stand for Equal and Unequal number of symbols left and right of $\texttt{c}$. While the words created from U will encompass exactly the words with one $\texttt{c}$ and an unequal number of symbols to the left and right of that $\texttt{c}$, words from E will encompass all words $\{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } v \neq w \text{ and }|v|=|w|\}$, and some more that are also generated from U.

Note that Z will evaluate to exactly $1$ terminal symbol, so even during intermediate steps, when we talk about the number of symbols, that number will not change further when the Z's are expanded. Also, M just generates $\{ \texttt{a, b} \}^*$.

Lemma 1: The terminal words created from U ($L_U$) are exactly $\{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } |v|\neq|w|\}$.

Proof: As long as U is expanded as ZUZ, the numbers of symbols to the left and to the right of U remain equal. If U is then expanded as MZ$\texttt{c}$, there is now an additional Z on the left of $\texttt{c}$, and the number of symbols on the right of $\texttt{c}$ cannot increase further (there are only Z's or their terminal expansions there). So no matter how further expansions happen, there will always be more symbols on the left of $\texttt{c}$ than on the right of $\texttt{c}$.

Expanding U as $\texttt{c}$ZM works exactly the same, but now there will always be more symbols on the right of $\texttt{c}$ than on the left. Both cases together show that $L_U \subseteq \{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } |v|\neq|w|\}$.

Let now $v\texttt{c}w \in \{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } |v|\neq|w|\}$, let $l=|v|, r=|w|, m=\min(l,r)$.

Start with $U$ and expand it $m$ times as ZUZ ($m=0$ is no problem). If $m=l$, expand U as $\texttt{c}$ZM , if $m=r$, expand it as MZ$\texttt{c}$. The proof will be totally the same/symmetric for $m=r$, so I'll only show the case $m=l$. Up to now, the intermediate word looks like this:

$$\underbrace{Z\ldots Z}_{l \text{ times}}cZM\underbrace{Z\ldots Z}_{l \text{ times}}$$

Now expand M $(r-l-1)$ times as MZ (possible, since $l$ is the minimum of $l$ and $r$, and $l\neq r$, so $r-l-1 \ge 0$), then finally expand M as epsilon. The resulting word is

$$\underbrace{Z\ldots Z}_{l \text{ times}}c\overbrace{Z\ldots Z}^{1 + (r-l-1) +l \text{ times}} = \underbrace{Z\ldots Z}_{l \text{ times}}c\underbrace{Z\ldots Z}_{r \text{ times}}$$

You can now expand each $Z$ to get exactly $v$ to the left of $\texttt{c}$ and $w$ to the right of it. That shows $L_U \supseteq \{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } |v|\neq|w|\}$ and concludes the proof of Lemma 1.
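
As a concrete instance of this construction (the word is chosen here purely for illustration), take $v=\texttt{a}$ and $w=\texttt{bab}$, so $l=1$, $r=3$ and $m=l=1$. Expanding U once as ZUZ, then as $\texttt{c}$ZM, then M once ($r-l-1=1$) as MZ and finally as epsilon gives

$$U \Rightarrow ZUZ \Rightarrow Z\texttt{c}ZMZ \Rightarrow Z\texttt{c}ZMZZ \Rightarrow Z\texttt{c}ZZZ \Rightarrow^* \texttt{acbab}.$$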

Now let's look at the expansions of E. Again there is a symmetry here. The expansion rules for M and Z remain the same if we exchange $\texttt{a}$ and $\texttt{b}$. If we also exchange A and B, the expansion rules of A and B turn into each other. Finally, those exchanges swap the two possible expansions of E.

Lemma 2: For each terminal word generated from A$\texttt{b}$M, there exists a natural index $k \ge 1$ such that the $k$-th symbol from the beginning is $\texttt{a}$ and the $k$-th symbol after $\texttt{c}$ is $\texttt{b}$.

Proof: We'll prove that the mentioned $\texttt{b}$ in Lemma 2 can be the original $\texttt{b}$ from A$\texttt{b}$M. That means we don't care about the expansions of M. So the only thing to do is to expand A, we do that $t$ times as ZAZ ($t \ge 0$) and then once finally as $\texttt{a}$M$\texttt{c}$, resulting in the word

$$\underbrace{Z\ldots Z}_{t \text{ times}}aMc\underbrace{Z\ldots Z}_{t \text{ times}}b\ldots,$$

where the three dots indicate whatever may have happened to the original M from A$\texttt{b}$M. As noted earlier, expanding any Z doesn't change the number of symbols.

In addition, the only M's that can remain in the above word (each of which can become zero, one or many terminal symbols) are the one between $\texttt{a}$ and $\texttt{c}$ and possibly the one to the right of $\texttt{b}$, if it has not been fully expanded yet.

But that means any further substitutions will still have the $\texttt{a}$ as the $(t+1)$-st symbol from the start and $\texttt{b}$ as the $(t+1)$-st symbol after $\texttt{c}$. That proves Lemma 2, with $k=t+1$.

By the symmetry mentioned before the proof of Lemma 2, the next Lemma follows immediately:

Lemma 3: For each terminal word generated from B$\texttt{a}$M, there exists a natural index $k \ge 1$ such that the $k$-th symbol from the beginning is $\texttt{b}$ and the $k$-th symbol after $\texttt{c}$ is $\texttt{a}$.

Now Lemmata 1,2 and 3 show one part of what we need to show, namely that

$$L(G_L) \subseteq \{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } v \neq w\} \tag1 \label{sub}.$$

Indeed, starting from S the expansions immediately lead to U, A$\texttt{b}$M or B$\texttt{a}$M. Lemma 1 shows that words derived from U can't even have the same number of symbols before and after $\texttt{c}$, so certainly $v \neq w$.

Lemma 2 and 3 show that any words derived from A$\texttt{b}$M and B$\texttt{a}$M, resp., can't have the same word before and after $\texttt{c}$ either, as there is some $k$ such that their $k$-th symbol is different.

What remains is to prove the other direction of the inclusion. The following lemma helps with that:

Lemma 4: Any word in $\{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } v \neq w \text{ and }|v|=|w|\}$ can be derived from E.

Proof: Let

$$v\texttt{c}w \in \{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } v \neq w \text{ and }|v|=|w|\}.$$

$v$ and $w$ can't both be the empty word epsilon, as that would imply $v=w$. Since $v$ and $w$ have the same length, being unequal then means there must be a natural index $k$ with $1\le k \le |v|$ where $v$ and $w$ differ.

If the $k$-th symbol is $\texttt{a}$ in $v$ and $\texttt{b}$ in $w$, we'll show that $v\texttt{c}w$ can be derived from A$\texttt{b}$M. If it's the other way around, it can be derived from B$\texttt{a}$M in an exactly analogous way.

Expand the A in A$\texttt{b}$M $(k-1)$ times as ZAZ, then expand it as $\texttt{a}$M$\texttt{c}$, resulting in the word

$$\underbrace{Z\ldots Z}_{k-1 \text{ times}}aMc\underbrace{Z\ldots Z}_{k-1 \text{ times}}bM.$$

Setting $l=|v|=|w|$, expand the M between $\texttt{a}$ and $\texttt{c}$ $(l-k)$ times as MZ, then finally as epsilon, to get the word

$$\underbrace{Z\ldots Z}_{k-1 \text{ times}}a\overbrace{Z\ldots Z}^{l-k \text{ times}}c\underbrace{Z\ldots Z}_{k-1 \text{ times}}bM.$$

Since $k \le l$, this is possible. Do exactly the same for the M at the end after the $\texttt{b}$ and we have derived the word

$$\underbrace{Z\ldots Z}_{k-1 \text{ times}}a\overbrace{Z\ldots Z}^{l-k \text{ times}}c\underbrace{Z\ldots Z}_{k-1 \text{ times}}b\overbrace{Z\ldots Z}^{l-k \text{ times}}.$$

We now have exactly $l$ symbols before and after the $\texttt{c}$. We can substitute the Z's to arrive at $v$ and $w$ respectively: their $k$-th symbols are already in place, and all other symbols can be chosen freely when expanding the Z's. This concludes the proof of Lemma 4.
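
For illustration (the word is an example chosen here, not taken from the question), let $v=\texttt{aab}$ and $w=\texttt{abb}$, so $l=3$ and the words first differ at $k=2$, with $\texttt{a}$ in $v$ and $\texttt{b}$ in $w$. Following the steps above, with $k-1=1$ expansion as ZAZ and $l-k=1$ expansion as MZ for each M:

$$E \Rightarrow A\texttt{b}M \Rightarrow ZAZ\texttt{b}M \Rightarrow Z\texttt{a}M\texttt{c}Z\texttt{b}M \Rightarrow Z\texttt{a}MZ\texttt{c}Z\texttt{b}M \Rightarrow Z\texttt{a}Z\texttt{c}Z\texttt{b}M \Rightarrow Z\texttt{a}Z\texttt{c}Z\texttt{b}MZ \Rightarrow Z\texttt{a}Z\texttt{c}Z\texttt{b}Z \Rightarrow^* \texttt{aabcabb}.$$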

Now Lemmata 1 and 4 mean we have proved the other inclusion

$$L(G_L) \supseteq \{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } v \neq w\} \tag2 \label{sup}.$$

because

$$\{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } v \neq w\} = \{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } |v|\neq|w|\} \cup \{ v\texttt{c}w \mid v, w \in \{ \texttt{a, b} \}^*\text{ and } v \neq w \text{ and }|v|=|w|\},$$

and Lemma 1 proves that the first subset is contained in $L(G_L)$, while Lemma 4 proves it for the second subset.

\eqref{sub} and \eqref{sup} together prove what I stated at the beginning:

$L(G_L)=L.$

score 2 · Answer 2 · answered Jul 21 '20 at 08:54


A grammar for $L'$ is given in the paper [1]:

S ::= E|U|epsilon
E ::= AB|BA
A ::= ZAZ|a
B ::= ZBZ|b
U ::= ZUZ|Z
Z ::= a|b

Perhaps it could be modified for $L$ as follows:

E ::= AcB|BcA
U ::= ZUZ|Zc|cZ

Edit: Actually U is a bit harder to modify, it is the case where we have an odd length string. c needs to be able to appear anywhere in the string.

[1] https://pdfs.semanticscholar.org/a8dd/2ef009df7601cdbc90332765a56a24c7821c.pdf


But, how would this grammar generate strings like 'aacbb' or 'aaaacbbbb' ? It seems to ignore the case when both w and v are even length strings. – Tawcher Bro Jul 21 '20 at 12:36
I think the problem is backwards. Your new rules don't seem to be able to create $aacab$, for example, mostly because $A$ and $B$ will always create an odd number of terminal symbols. That wasn't a problem in the original paper, as that meant the resulting string was even, and no $c$ symbol needed to be in the middle. Generally, the "even/odd" distinction doesn't really work here, as $accb$ has an even length, but still needs to be rejected, as it contains 2 $c$'s. – Ingix Jul 21 '20 at 12:51
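
The objections in these two comments can be checked mechanically. The sketch below is an illustration only: the helper name `words_up_to`, the one-character-per-symbol encoding and the length bound are assumptions, and the grammar is taken to be the one from the paper with only the E and U rules replaced as proposed in this answer.

```python
# Grammar from the paper with the two proposed replacement rules for E and U.
# Uppercase = non-terminal, lowercase = terminal, "" = epsilon.
MODIFIED = {
    "S": ["E", "U", ""],
    "E": ["AcB", "BcA"],
    "A": ["ZAZ", "a"],
    "B": ["ZBZ", "b"],
    "U": ["ZUZ", "Zc", "cZ"],
    "Z": ["a", "b"],
}


def words_up_to(grammar, start, max_len):
    """Return all terminal words of length <= max_len derivable from `start`."""
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no rule adds a new word
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    # A terminal contributes itself, a non-terminal its known words.
                    choices = lang[sym] if sym in grammar else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= max_len}
                if not partial <= lang[nt]:
                    lang[nt] |= partial
                    changed = True
    return lang[start]


if __name__ == "__main__":
    words = words_up_to(MODIFIED, "S", 5)
    # The first two words belong to L; membership in `words` shows whether
    # the modified grammar can derive them.
    for word in ("aacbb", "aacab", "acb"):
        print(word, word in words)
```

Since $A$ and $B$ only derive odd-length strings and U only derives words of even total length, the two five-letter words from the comments are not derivable from these rules.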
