Define $FH(L) = \{x \in \Sigma^* : \exists y \in \Sigma^* \text{ with } |x| = |y| \text{ such that } xy \in L\}$. In other words, $FH(L)$ is the set of first halves of even length strings in $L$. Given this, if $L$ is context-free, must $FH(L)$ be context-free?
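
To make the definition concrete, here is a minimal Python sketch that computes $FH(L)$ for a *finite* sample of strings. The helper name `first_halves` and the sample set are only illustrative assumptions; a real CFL is usually infinite, so this only shows what the operator does to individual words.

```python
def first_halves(language):
    """Return FH(L) for a finite set of strings L.

    A word w contributes its first half exactly when |w| is even,
    because then w = xy with |x| = |y|.
    """
    return {w[: len(w) // 2] for w in language if len(w) % 2 == 0}


# Hypothetical sample: "abc" has odd length and contributes nothing.
sample = {"", "ab", "abc", "aabb"}
print(sorted(first_halves(sample)))  # ['', 'a', 'aa']
```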

Here's my attempt at a proof:

Since $L$ is a CFL, there exists a non-deterministic PDA recognizing $L$, $M = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$, where $\Sigma$ is the input alphabet, $\Gamma$ is the stack alphabet, and $Z_0$ is the symbol representing the initial stack contents. Construct PDA $M'$ from $M$, with $M' = (Q', \Sigma, \Gamma, \delta', q_0', Z_0, F')$, defined as follows:

$Q' = \{q_0'\} \cup (Q \times \Gamma_{\varepsilon}) \times (Q \times \Gamma_{\varepsilon}) \times (Q \times \Gamma_{\varepsilon})$.

$F' = \{[(q,X),(q,X),(p,Y)] : X,Y \in \Gamma_\varepsilon \text{ and } p \in F\}$

$\delta'(q'_0, \varepsilon, \varepsilon) = \{([(q,X), (q_0,Y), (q,X)], \varepsilon) : q \in Q \text{ and } X,Y \in \Gamma_\varepsilon \} $

$\delta'([(q,X),(p,Y),(r,Z)], a, \varepsilon) = \{([(q,X),\delta(p,a,Y), \delta(r,b,Z)], \varepsilon): X,Y,Z \in \Gamma_\varepsilon\ \text{ and } b \in \Sigma\} $

The first component of a state in $Q'$ records the guessed state $q$ and does not change once it is initially recorded. The second component records what state we are in after having processed some prefix of the input $x$, starting from state $q_0$, and the third component records what state we are in after having processed some prefix of the guessed $y$, starting from $q$.

I am not sure if this proof works, because I am a bit confused as to what to do with the stack for $M'$.

David Smith
  • What do you think? – Yuval Filmus Nov 03 '14 at 01:13
  • Well, I suspect that an automaton recognizing FH(L) wouldn't require any more memory than one recognizing L. – David Smith Nov 03 '14 at 01:14
  • Have you tried to prove your suspicion? Where did you get stuck? – Yuval Filmus Nov 03 '14 at 01:15
  • I'll write my proof attempt as an answer, so you can comment. – David Smith Nov 03 '14 at 01:19
  • "I am not sure if this proof works, because I am a bit confused as to what to do with the stack for $M'$" – that's the crux of the problem, and leads me to believe that there is some counterexample, though I couldn't think of any. The problem is that you need to simulate $M$ while at the same time counting how many symbols you have seen and how many more to go. – Yuval Filmus Nov 03 '14 at 03:35
  • Maybe it will help to specify invariants: properties that hold for the configuration before and after each parallel step of M and M'. If the construction works, you should be able to find an invariant from which the intended relationship between the accepted word of M and that of M' readily follows. – reinierpost Nov 03 '14 at 10:21
  • What if I were to just ignore the stack of $M'$, pushing nothing onto it and popping nothing from it for all transitions of $M'$? Then, I could just exploit the nondeterminism of $M'$ to guess the next (state,stack) configuration for $M$ on processing symbol $a \in \Sigma$ and arbitrary top-of-stack symbol $X \in \Gamma_\varepsilon$ among all the configurations that M could assume upon processing $(a,X)$? – David Smith Nov 03 '14 at 19:03

1 Answer

The intuition developed in the comments is right. The answer is NO: there is a counterexample, a CFL whose set of first halves is not context-free.

$L = \{ a^m b^n c^n a^{3m} \mid m,n\ge 1 \}$, over the alphabet $\{a,b,c\}$, from the answer on our sister site.

Proof by Pumping lemma: pick $a^p b^p c^p \in \mathrm{FH}(L)$; pumping either destroys the "$b^n c^n$"- or the "first half"-property.
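
For readers who want to experiment with candidate pumped strings, here is a small brute-force membership test for $\mathrm{FH}(L)$. It is only a sketch: the function names `words_of_L` and `in_FH_L` are made up for illustration, and it relies on the fact that $L$ contains at most one word for each pair $(m,n)$.

```python
def words_of_L(length):
    """All words a^m b^n c^n a^(3m) with m, n >= 1 of the given length."""
    out = []
    for m in range(1, length // 4 + 1):
        rest = length - 4 * m          # must equal 2n
        if rest >= 2 and rest % 2 == 0:
            n = rest // 2
            out.append("a" * m + "b" * n + "c" * n + "a" * (3 * m))
    return out


def in_FH_L(x):
    """x is in FH(L) iff some word of L of length 2|x| starts with x."""
    return any(w.startswith(x) for w in words_of_L(2 * len(x)))


print(in_FH_L("aabbcc"))   # True:  first half of aabbcc aaaaaa
print(in_FH_L("aabcc"))    # False: one b removed
print(in_FH_L("aabbbcc"))  # True:  first half of aabbbccc aaaaaa
```

The last call suggests that $\mathrm{FH}(L)$ contains more than just the strings $a^n b^n c^n$, so in a full pumping argument the direction of pumping has to be chosen with care.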

A slight adaptation of that language is $K = \{ a^m b^n c^n \#\# a^{3m} \mid m,n\ge 1 \}$, over the alphabet $\{a,b,c,\#\}$. We can now "force" the position at which the first half ends, which gives another proof technique.

Let $H = FH(K) \cap a^*b^*c^*\#$. This means we only consider first halves whose midpoint falls exactly between the two $\#$-symbols. Thus $m+2n+1=1+3m$, or $m=n$, and so $H=\{a^nb^nc^n \mid n\ge 1\}\#$. Now if $FH(K)$ were context-free, then $H$ would be context-free as well (by closure under intersection with regular languages). This language is close to the standard non-context-free example $\{a^nb^nc^n \mid n\ge 1\}$, which in turn can be obtained from $H$ by right quotient with $\#$, an operation that also preserves context-freeness. Hence $FH(K)$ is not context-free.
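
The description of $H$ can be sanity-checked on a bounded sample. The following Python sketch enumerates the words of $K$ with $m,n \le 6$ (an arbitrary bound chosen for illustration), takes their first halves, and keeps those matching the regular expression $a^*b^*c^*\#$. The names `words_of_K`, `first_half` and `bound` are assumptions of this sketch, and a finite check is of course not a proof.

```python
import re


def words_of_K(max_m, max_n):
    """Words a^m b^n c^n ## a^(3m) with 1 <= m <= max_m, 1 <= n <= max_n."""
    return {
        "a" * m + "b" * n + "c" * n + "##" + "a" * (3 * m)
        for m in range(1, max_m + 1)
        for n in range(1, max_n + 1)
    }


def first_half(w):
    """Every word of K has even length 4m + 2n + 2, so this is well defined."""
    return w[: len(w) // 2]


shape = re.compile(r"a*b*c*#")  # the regular language a*b*c*#
bound = 6

H_sample = {
    first_half(w)
    for w in words_of_K(bound, bound)
    if shape.fullmatch(first_half(w))
}
expected = {"a" * n + "b" * n + "c" * n + "#" for n in range(1, bound + 1)}

print(H_sample == expected)  # True
```

Only the words with $m = n$ survive the intersection: for $m < n$ the first half contains no $\#$ at all, and for $m > n$ it contains both.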

Hendrik Jan
  • I added the proof idea (taken from the source linked at [math.SE]) so that the value of this answer does not depend on these resources' availability. – Raphael Nov 04 '14 at 06:53
  • @Raphael Thanks. I added some text to explain the specific use of the additional $\#$-character that I introduced. – Hendrik Jan Nov 04 '14 at 12:50
  • Ah, now I see where you were going with that. Nice! – Raphael Nov 04 '14 at 16:31