
(Practice exam question in computational models)

Definition: A word $w\in \{0,1\}^*$ is called balanced if it contains the same number of $0$s as $1$s.

Let $L = \{w\in \{0,1\}^*\mid |w|$ is even and the first half of $w$ is unbalanced$\}$. Determine whether or not $L$ is context-free and prove your answer. You may do so by drawing an NPDA which recognizes $L$, using the closure properties of CFLs, or the relevant pumping lemma.
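
For experimenting with small cases, the definition above can be turned into a brute-force membership test. This is only a sketch; the function names `is_balanced` and `in_L` are illustrative and not part of the exam question.

```python
def is_balanced(w: str) -> bool:
    """A word over {0,1} is balanced if it has as many 0s as 1s."""
    return w.count("0") == w.count("1")


def in_L(w: str) -> bool:
    """Membership in L: even length and an unbalanced first half."""
    if len(w) % 2 != 0:
        return False
    first_half = w[: len(w) // 2]
    return not is_balanced(first_half)


if __name__ == "__main__":
    for word in ["01", "0011", "0111", ""]:
        print(repr(word), in_L(word))
```

Note that the empty word is not in $L$, because its (empty) first half is balanced.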

This question has been bugging me for a while; my gut tells me it isn't context-free since any PDA that recognizes it would have to check the balance of the string read thus far while simultaneously measuring its length and non-deterministically choosing an unbalanced point to validate as the middle of the word. I also haven't been able to express it as a union or concatenation of two CFLs or find a CFG which generates it.

On the other hand, I haven't been able to either find a word in the language that can't be pumped or prove that every word can be pumped.

Does anyone have any ideas on how to proceed?

Raphael
  • 2
    Every word can be trivially pumped using pumping lemma for CFL. Just set v the first half of the word, and x the second half of the word. However this is not sufficient to claim it is CFL. – Marcelo Fornet Jun 20 '20 at 21:39
  • 2
    @MarceloFornet While your sentiment is shared, the length of $vx$ is greater than any fixed pumping length most of the time. A stronger claim would be that $L$ satisfies the pumping lemma. – John L. Jun 20 '20 at 22:34
  • @JohnL. Thanks for pointing that out. I was also thinking of the problem with the first half balanced. – Marcelo Fornet Jun 20 '20 at 23:07
  • 2
    Here is a possible approach. Try proving the following more general conjecture. Given a context-free language $H$, let $D=\{fb : f\in H, b\in\Sigma^*, |f|=|b|\}$. If $D$ is context-free, then $D$ is regular. – John L. Jun 22 '20 at 01:25
  • @JohnL. interesting idea, I’ll give it a shot. Tomorrow the TA for the course is holding a virtual office hour, too. I’ll ask her then and update here if she happens to have a solution. – Or Bairey-Sehayek Jun 22 '20 at 22:53
  • A common but informal idea is that a context-free grammar cannot maintain two dependent unbounded conditions. My conjecture, motivated by this question, is a very small attempt to express that idea formally. – John L. Jun 23 '20 at 00:05
  • Another way to pump every word, hopefully this time a correct one. Let $c$ be the most frequent character in the first half (it exists since the first half is unbalanced). Let $l$ be the last position of $c$ in the first half. The pumping length will be 2. Use as $v$ the word consisting of the one character at position $l$ and as $x$ the word consisting of the one character at position $l + 1$. Notice that $l+1$ can be in the second half only if $l = \frac{n}{2}$. Every new word will keep $c$ as the most frequent character in the first half, and will be of even size. – Marcelo Fornet Jun 23 '20 at 04:13
  • Have you tried any stronger properties to at least formulate a hypothesis? – Raphael Jun 23 '20 at 07:53
  • Smells like this one to me. You're supposed to think it's not context free, but then it is. (Lesson: intuition is broken) – Raphael Jun 23 '20 at 07:56
  • @JohnL. Perhaps you should also require that $H$ is non regular, since $H=a^*$ leads to $D=a^nw$ s.t. $|w|=n$ which is context free but not regular. – Ariel Jun 23 '20 at 08:20
  • @Ariel Thanks. I was blind to my typo. It should have been "..., then $H$ is regular". Here is the conjecture again, "Given a context-free language $H$, let $D=\{fb : f\in H, b\in\Sigma^*, |f|=|b|\}$. If $D$ is context-free, then $H$ is regular." The intuition is that since the context-free grammar of $D$ should maintain $|f|=|b|$ as well as $H$ being context-free, one of these two conditions must not require unbounded memory, which implies $H$ being regular. – John L. Jun 23 '20 at 12:51
  • Here is a more concise and more general conjecture. "Given a language $H$ over alphabet $\Sigma$, if $\{fb : f\in H, b\in\Sigma^*, |f|=|b|\}$ is context-free, then $H$ is regular." – John L. Jun 23 '20 at 13:09
  • The standard CFL necessary conditions don't seem to help. For example, the (Parikh theorem) letter frequencies are the same as for the language of all nonempty even-length strings (which you can see by starting with $0^{2n} \in L$ and changing characters to 1 from left to right---except when that would make the first half balanced in which case you change the rightmost 0 instead.) – user326210 Apr 23 '22 at 11:12
  • 1
    @MarceloFornet: That's not exactly right, because with a string like 010aaa, that would produce 01aa when $n = 0$. I think that problem is fixable, but it's a bit tricky; I think that a few different cases need to be handled separately. – ruakh May 07 '22 at 00:36
  • My general conjecture above is wrong. Consider $H=\{0^n 1 (0|1)^n : n>0\}$. – John L. May 23 '22 at 05:30

2 Answers

1

(Note: this answer doesn't fully answer the question, since I don't know whether the language is context-free. It merely addresses whether the language satisfies the pumping lemma, which was raised in the question and discussed in the comments, and which is obviously relevant because a language that fails the pumping lemma cannot be context-free.)

On the other hand, I haven't been able to either find a word in the language that can't be pumped or prove that every word can be pumped.

Every word of length ≥ 4 can be pumped.

To see how, we distinguish three cases:

  1. If $|w|$ is not a multiple of four, then we can choose any substring of length 4, break it up however-we-want into $v$ and $x$, and pump it. This works because pumping will preserve the invariant that $|w|$ is never a multiple of four, which in turn means that the first half can never be balanced (because the first half will never have an even number of characters), so the result is necessarily in $L$.
  2. If $|w|$ is a multiple of four, and there are more 0s than 1s in the first half, then the first half must contain some occurrence of the substring '00'. (The first half has even length $2k$ and contains at least $k+1$ 0s, so two of them must be adjacent; an alternating string like '010101...0' with more 0s than 1s has odd length, which would mean $|w|$ isn't a multiple of four.) We can take that substring, break it up into $v$ and $x$ however we want, and pump it.
    • When $n = 0$, the length of the result is not a multiple of four, which as we saw above means it's necessarily in $L$, as desired.
    • With $n > 0$, the first half of the result will continue to have more 0s than 1s, because each increment of $n$ adds two 0s and causes at most one 0 to be "lost" to the second half, so the number of 0s in the first half strictly increases and the number of 1s in the first half never increases. So the result is in $L$, as desired.
  3. If $|w|$ is a multiple of four, and there are more 1s than 0s in the first half, then . . . well, no need to write it all out, it's perfectly symmetric to case #2.
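
The three cases above can be sanity-checked by brute force on short words. The sketch below is not a proof; it fixes one particular way of splitting the chosen substring into $v$ and $x$ (the answer allows any split), and the helper names `split_for_pumping`, `pump` and `find_counterexample` are made up for this illustration.

```python
from itertools import product


def in_L(w: str) -> bool:
    """Even length and a first half with unequal numbers of 0s and 1s."""
    if len(w) % 2 != 0:
        return False
    first = w[: len(w) // 2]
    return first.count("0") != first.count("1")


def split_for_pumping(w: str):
    """Return (prefix, v, x, suffix) with w == prefix + v + x + suffix.

    Assumes w is in L and len(w) >= 4.
    """
    if len(w) % 4 != 0:
        # Case 1: any substring of length 4 works; take the first one.
        return "", w[:2], w[2:4], w[4:]
    # Cases 2 and 3: pump a doubled majority letter inside the first half.
    first = w[: len(w) // 2]
    majority = "0" if first.count("0") > first.count("1") else "1"
    i = first.find(majority * 2)
    return w[:i], majority, majority, w[i + 2:]


def pump(w: str, n: int) -> str:
    prefix, v, x, suffix = split_for_pumping(w)
    return prefix + v * n + x * n + suffix


def find_counterexample(max_len: int = 12, max_n: int = 5):
    """Return a word of L whose pumped versions leave L, or None."""
    for length in range(4, max_len + 1, 2):
        for letters in product("01", repeat=length):
            w = "".join(letters)
            if in_L(w) and not all(in_L(pump(w, n)) for n in range(max_n + 1)):
                return w
    return None


if __name__ == "__main__":
    print(find_counterexample())
```

With the default bounds the search covers every word of $L$ of length 4 to 12 and every exponent from 0 to 5; a result of `None` means no pumped word left the language within those bounds.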

So the language does satisfy the pumping lemma, but as I'm sure you already know, that doesn't prove that the language is context-free. :-(

ruakh
  • I am going to tag the question "open problem" once I have verified this answer. Oh no. The reason why I have not done that is I have not checked Vor's answer thoroughly. – John L. May 07 '22 at 00:51
0

Perhaps it can be proved using Ogden's Lemma and its generalization by Bader and Moura. What follows is a rather informal sketch of the proof.

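First restrict $L$ to strings of length $4n$ and apply to it the following mapping $h$, which rewrites each block of two symbols over $\Sigma = \{ 0,1 \}$ as one symbol of $\Sigma' = \{ a, b, c\}$ (strictly speaking a finite-state transduction rather than a homomorphism on $\Sigma$):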

$h(11) = a$
$h(00) = b$
$h(01) = c$
$h(10) = c$

If $L$ is CF then also the new language $L'$ obtained is CF by closure properties.

Informally, $L'$ contains the strings with an unequal number of $a$s and $b$s in the first half; the number of occurrences of $c$ doesn't matter.

Further restrict $L'$ by intersecting it with the regular language $R = \{ a^* (c^* b^*)^* \}$; let $L'' = R \cap L'$

For example the string

$a a c b | cccc \in L''$ corresponds to $11\;11\;10\;00\; |\; 10\;10\;10\;10 \in L$ ($|$ is used to mark the half of the string for better readability)

$a b c c | cccc \notin L''$ corresponds to $11\;00\;10\;10\;|\;10\;10\;10\;10 \notin L$

Suppose that $L''$ is CF, and $p$ is its pumping length. Build $w \in L''$ concatenating these four parts:

  1. $(\;a^p\;)$ leading $a$'s

  2. $(\;c^j\;)$ a sequence of $c$s, we'll fix $j$ below

  3. $(\;c^{p} \;b\;)$ repeated $p + p!$ times

If $n$ is the constant of the Bader-Moura condition, then we pick $j$ large enough to exclude all the symbols in parts 1 and 3: $j \geq n^{p+(p+1)(p+p!)+1}$

  4. $(c^k)$ where $k$ is large enough to be pumped excluding all previous symbols: $k \geq n^{p + j + (p+1)(p+p!)+1}$

$w = a^{p} \; c^j \; (c^{p} \;b )^{p+p!} \; c^{k} $

Now we mark the first $a$ sequence as distinguished, the string $vx$ must contain $0 < q \leq p$ distinguished positions ($\#a_{vx} = q$) by Ogden's lemma; $vx$ can also contain one $b$ (not more than one because the $b$s are spaced with more than $p$ symbols $c$) and $0 \leq r < p$ symbols $c$ ($\#c_{vx} = r$).

  1. if $vx$ is such that $\#a_{vx}=q$, $\#b_{vx}=0$, $\#c_{vx}=r$:

then we can pump $i = p! / q $ times and we obtain the same number of $a$s and $b$s; if after the pump some $b$s fall after the half of the string, we can pump the final $c^k$ independently from the rest of the string and we can "push" all $a$s and $b$s back in the first half (and $\#a_{w'} = \#b_{w'} = p + p!$), so the pumped string $w'$ is not in $L''$

  2. if $vx$ is such that $\#a_{vx}=q$, $\#b_{vx}=1$, $\#c_{vx}=r$:

then each time we pump we increase the number of $a$s and $b$s, but we cannot guarantee that we reach the same number (e.g. in the case $\#a_{vx} = q = 1 = \#b_{vx}$). But in this case the derivation tree "isolates" the $c^j$ part of the string from the final part $c^k$, so we can pump them independently.

We can pump $c^j$ as many times as needed to "push" $p!$ symbols $b$ to the second half of the string. Suppose that the pumping length of $c^j$ is $s$ (which must be even); then the midpoint of the string is shifted towards the $b$s by $s/2$. We have $s \leq p$, so after each pump at most one $b$ is "pushed" to the second half, because the "distance" between $b$s is $p$. So in this case, too, we get a string $w'$ not in $L''$.

Vor