5

As I am currently teaching regular languages and pumping lemma, I was searching for nice examples of languages, regular or not, for exercises.

  • $L_1 = \{vv\mid v\in \Sigma^*\}$ is a classic example, as it can be proven to be non regular for $|\Sigma|\geqslant 2$ using the pumping lemma;
  • $L_2 = \{ uvv\mid u,v\in\Sigma^*\}$ looks like $L_1$, however it is a bit of a trap: since $L_2 = \Sigma^*$, it is regular;
  • $L_3 = \{uvv\mid u\in\Sigma^*, v\in\Sigma^+\}$ looks like $L_2$, however it is not regular if $|\Sigma| \geqslant 2$. This can be proven using the pumping lemma: suppose it is regular, and let $n$ be the pumping length. Let $u=a^nba^nb\in L_3$ and $u = xyz$ a decomposition such that $|xy|\leqslant n$ and $|y| > 0$. Then $xz = a^kba^nb$ with $k< n$ and does not end with a (non empty) square, therefore is not in $L_3$ which is not regular;

Things became a bit complicated with $L_4 = \{uuv\mid u\in \Sigma^+, v\in \Sigma^*\}$: after struggling a bit to find a counter-example word to the pumping lemma, I proved $L_4$ to be non regular if $|\Sigma|\geqslant 2$, using the fact that $L_3 = L_4^R$ and that regular languages are closed under mirror image.

However, if $|\Sigma| = 2$, I found out that $L_4$ verifies the conclusion of the pumping lemma, with a pumping length of $5$! Indeed, since any word of length $\geqslant 4$ contains a square, any word $u\in L_4$ with $|u| \geqslant 5$ can be written $u = vwwu'$ with $|vww|\leqslant 5$ and $|v|,|w| > 0$. Therefore $u = xyz$, with $x = \varepsilon$, $y = v$ and $z = wwu'$ verifying:

  • $|xy| = |v| < |vww| \leqslant 5$;
  • $|y| = |v| > 0$;
  • for all $k \geqslant 0$, $xy^kz\in L_4$. Indeed, $xz = z = wwu' \in L_4$ (because $|w| > 0$), $xyz = u \in L_4$ (hypothesis) and if $k>1$, $xy^kz = yyy^{k-2}z \in L_4$ (because $|y| > 0$).

I then tried to find a counter-example to the pumping lemma for $L_4$ when $|\Sigma|>2$, but without success. I found out that in order to not be able to apply the previous reasoning, in any counter-example, the first $n$ (pumping length) letters need to not contain a square, which is possible, but couldn't go further.

My questions are all related to this problem:

  • I don't think I did, but maybe, did I make any mistake?
  • Is there a counter-example word to the pumping lemma for $L_4$ when $|\Sigma| \geqslant 3$?
  • Do you know of any simpler language that is a counter-example to the converse of the pumping lemma (not regular but verifying the conclusion of the pumping lemma)? vonbrand proposed this one which is a tad bit complicated. I also know of $\{ab^iab^jab^j\mid i,j\geqslant 0\}\cup \{uaav\mid u,v\in\Sigma^*\}$ which is not better.
Nathaniel
  • 15,071
  • 2
  • 27
  • 52
  • 1
    As for your last bullet point: Every language of the form $$L\cup {$^k\mid k\ne 1}\cdot \Sigma^$ for $L\subseteq \Sigma^$, $$\notin\Sigma$, satisfies the regular pumping lemma. https://cs.stackexchange.com/q/9181/4287 – Hendrik Jan Jan 08 '22 at 01:18
  • Thanks for the idea. What I found interesting with the language ${uuv\mid u\in \Sigma^+, v\in\Sigma^*}$ is that it is not defined as the union of two other languages (which is the case for your language and the two I gave, and most of the counter-example I found). – Nathaniel Jan 08 '22 at 07:11

1 Answers1

4

When the alphabet is of size $4$ or more, it is easy to prove that $L_4$ is not regular using the pumping lemma. Given a pumping length $n$, let $s$ be a square-free word over $\{0,1,2\}$ of length $n$; such a word exists since there is an infinite square-free word over $\{0,1,2\}$. We choose $w = 3s33s3 \in L_4$. Suppose that we could write $v = xyz$ so that $|xy| \leq n$, $y \neq \epsilon$, and $xy^iz \in L_4$ for all $i \geq 0$. In particular, $xz \in L_4$. We consider two cases:

  1. $x \neq \epsilon$. In this case, $xz = 3t 33s3$, where $t$ is a word over $\{0,1,2\}$ satisfying $0 < |t| < s$. If $xz$ started with a square $u^2$, then $u$ starts with a $3$, and so the second occurrence of $u$ must also start with a $3$. The second letter cannot be a $3$ (since $t$ doesn't start with $3$), hence $u = 3t3$, but since $|t| < |s|$, the word $u^2$ is not a prefix of $xz$. Hence this case is impossible.

  2. $x = \epsilon$. In this case, $xz = t33s3$, where $t$ is a non-empty suffix of $s$. Since $xz$ contains only three $3$s, if $xz$ starts with a square $u^2$, then either $u$ contains no $3$ or it contains a single $3$. It $u$ contains no $3$ then $u^2$ is a prefix of $t$, which is impossible since $s$ is square-free. If $u$ contains a single $3$ then $u = t3$, which is impossible since $t$ doesn't start with a $3$ whereas the second occurrence of $u$ does. Hence this case is also impossible.


In the ternary case, it suffices to find a collection or arbitrarily long square-free ternary words $w$ such that for any decomposition $w = xyz$ with $y \neq \epsilon$, we have $xy^i z \notin L_4$ for some $i$. I conjecture that prefixes of the Leech word satisfy this condition (even if we restrict $i$ to be either $0$ or $2$).

Suppose that arbitrarily long words exist satisfying the above constraints. Given a pumping constant $n$, choose such a word $w_k$ of length at least $n+2$ which starts and ends with the same letter $\sigma$, and let $w = w_k \sigma w_k \sigma \in L_4$. (Note that since each word satisfying the constraints is square-free, every subword of length 5 contains all different letters; hence we can always truncate a long enough word to one which ends with the same letter with which it starts.)

Suppose that $w = xyz'$, where $y \neq \epsilon$ and $|xy| \leq n \leq |w_k| - 2$. Then we can write $z' = z\sigma w_k \sigma$, where $w_k = xyz$; note that $|z| \geq 2$ and $z$ ends with $\sigma$.

According to the conjecture above, we can find $i \neq 1$ such that $xy^i z$ does not have a square prefix. If $xy^i z'$ does have a square prefix $u^2$, then $u^2$ must reach the first $\sigma$; in particular, $|u| \geq 2$.

If the first copy of $u$ doesn't reach the first $\sigma$ then then $u$ doesn't contain $\sigma^2$, but then its second copy contains both $\sigma$ and either the last letter of $z$ or the first letter of $w_k$, and so does contain $\sigma^2$. Thus the first copy of $u$ does reach the first $\sigma$.

If the first copy of $u$ doesn't end at the first $\sigma$ then it contains $\sigma^3$. Since this is the unique copy of $\sigma^3$ in $w$, this is impossible. Hence $u = xy^iz\sigma$, and so $u$ is a prefix of $w_k \sigma$. In particular, $i = 0$, since otherwise $|u| > |w_k \sigma|$. But then $u$ is a proper prefix of $w_k \sigma$, implying that $w_k$ contains a copy of $\sigma^2$, which is false.

Yuval Filmus
  • 276,994
  • 27
  • 311
  • 503