9

I would like to prove that any context free language over a 1 letter alphabet is regular. I understand there is Parikh's theorem but I want to prove this using the work I have done so far:

Let $L$ be a context free language. So, $L$ satisfies the pumping lemma. Let $p$ be $L$'s pumping constant. Let $L = L_1 \cup L_2$ where $L_1$ consists of $w$ where $|w| < p$ and and $L_2$ consists of $w$ where $|w| \ge p$. We have a single letter alphabet and since $L_1$ has a restriction on the length of its words, $L_1$ is a finite language. Finite languages are regular so $L_1$ is regular. If I can show that $L_2$ is regular, I can use the fact that the union of regular languages is regular. But I am struggling on showing that $L_2$ is regular. I know that since $w \in L_2$ has to satisfy $|w| \ge p$, by the pumping lemma, $w$ can be written as $w = uvxyz$ where $|vxy| \le p$, $|vy| > 0$ and $\forall t \ge 0$, $uv^txy^tz \in L$. Since we have a single letter alphabet (say the letter is $a$), $uv^txy^tz = uxz(vy)^t = uxz(a^{|vy|})^t \in L$. Now what?

Yuval Filmus
  • 276,994
  • 27
  • 311
  • 503
ilikecats
  • 229
  • 2
  • 6

3 Answers3

3

Let $L$ be a unary context-free language. According to the pumping lemma, there is a constant $p$ such that if $a^n \in L$ then either $n < p$ or there exists $q \in \{1,\ldots,p\}$ such that $a^{n+tq} \in L$ for all $t \geq 0$ (actually, the pumping lemma gives $t \geq -1$). Let $L_0$ be the set of words in $L$ of length smaller than $p$, and for $q_0 \in \{1,\ldots,p\}$, let $L_{q_0}$ be the set of words in $L$ of length at least $p$ for which the pumping lemma gives $q = q_0$. I claim that $$ L = L_0 \cup \bigcup_{q=1}^p \{ a^{n+tq} : a^n \in L_q, t \geq 0\}. $$

Proof of $ \subseteq $: If $x \in L$ then either $|x| < p$, in which case $x \in L_0$, or $|x| \geq p$, in which case $x \in L_q$ for some $q$, and so $x$ belongs to the $q$'th summand on the right (choosing $t = 0$).

Proof of $ \supseteq $: This follows directly by definition of $L_0$ and $L_q$.

Since $L_0$ is finite, it is regular. Hence it suffices to prove that for each $q \in \{1,\ldots,p\}$, the following language is regular: $$ R_q := \{ a^{n+tq} : a^n \in L_q, t \geq 0\}. $$ Let us write, for $0 \leq r \leq q-1$, $$ R_{q,r} := \{ a^{n+tq} : a^n \in L_q, t \geq 0, n \equiv r \pmod q \}, $$ so that $R_q = \bigcup_{r=0}^{q-1} R_{q,r}$. Thus it suffices to prove that each $R_{q,r}$ is regular.

If no $a^n \in L_q$ satisfies $n \equiv r \pmod q$, then $R_{q,r} = \emptyset$ is clearly regular. Otherwise, let $n_{q,r}$ be the minimal $n \equiv r \pmod q$ such that $a^n \in L_q$. I claim that $$ R_{q,r} = \{a^{n_{q,r} + tq} : t \geq 0\} = a^{n_{q,r}} (a^q)^*, $$ and so $R_{q,r}$ is regular.

Indeed, clearly $R_{q,r}$ contains the right-hand side. On the other hand, suppose that $a^m \in R_{q,r}$. Then $m = n+tq$ for some $n \equiv r \pmod q$ and $t \geq 0$ such that $a^n \in L$. But then $m = n_{q,r} + (t+(n-n_{q,r})/q)q$ (our choice of $n_{q,r}$ guarantees that $n \geq n_{q,r}$ and that $n-n_{q,r}$ is divisible by $q$), and so $a^m$ is an element of the right-hand side.

Mati
  • 63
  • 5
Yuval Filmus
  • 276,994
  • 27
  • 311
  • 503
2

You have shown that for any given $w = uvxyz = a^{|uxz|+|vy|}$ there is a regular language $L_w = a^{|uxz|+|vy|\cdot t} \subset L_2$. Now you need to show that $\bigcup L_w = L_2$ and that this can be a finite union. Note that $|vy| \leq p$ and that $a^{(|uxz|+|vy|)+|vy|\cdot t} \subset a^{|uxz|+|vy|\cdot t}$ so not all values of $|uxz|$ matter.

Karolis JuodelÄ—
  • 3,667
  • 15
  • 18
2

This is a special case of Parikh's theorem. You can find a proof in lecture notes of Lindqvist (linked in the Wikipedia article). In fact, Parikh's theorem is a bit more general: it states that the histograms of all words in a context free language are "regular" (for an exact statement, see the link).

Yuval Filmus
  • 276,994
  • 27
  • 311
  • 503