22

Is there an algorithm/systematic procedure to test whether a language is context-free?

In other words, given a language specified in algebraic form (think of something like $L=\{a^n b^n a^n : n \in \mathbb{N}\}$), test whether the language is context-free or not. Imagine we are writing a web service to help students with all their homework; you specify the language, and the web service outputs "context-free" or "not context-free". Is there any good approach to automating this?

There are of course techniques for manual proof, such as the pumping lemma, Ogden's lemma, Parikh's lemma, the Interchange lemma, and more here. However, they each require manual insight at some point, so it's not clear how to turn any of them into something algorithmic.
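To make the difficulty concrete, here is a minimal sketch of the mechanical part of a pumping-lemma argument. It assumes a membership oracle for the language and a hand-picked word `w` and pumping length `p`; the function names are hypothetical. It only checks one word for one value of `p` with a bounded range of exponents, so a `True` result is evidence rather than a proof: the lemma quantifies over every `p`, and choosing a suitable family of words is exactly the step that needs insight.

```python
def in_anbncn(w: str) -> bool:
    """Membership test for {a^n b^n c^n : n >= 0} (illustrative example language)."""
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n


def in_anbn(w: str) -> bool:
    """Membership test for {a^n b^n : n >= 0} (illustrative example language)."""
    n = len(w) // 2
    return w == "a" * n + "b" * n


def pumping_fails(w, p, member, max_i=3):
    """Return True if no split w = u v x y z with |vxy| <= p and |vy| >= 1
    keeps u v^i x y^i z in the language for every i in 0..max_i."""
    n = len(w)
    for start in range(n + 1):                           # v starts here
        for end in range(start, min(start + p, n) + 1):  # y ends here (exclusive)
            for a in range(start, end + 1):              # v = w[start:a]
                for b in range(a, end + 1):              # x = w[a:b], y = w[b:end]
                    v, x, y = w[start:a], w[a:b], w[b:end]
                    if not v and not y:
                        continue                         # the lemma requires |vy| >= 1
                    u, z = w[:start], w[end:]
                    if all(member(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        return False                     # this split pumps fine
    return True
```

For instance, `pumping_fails("aabbcc", 2, in_anbncn)` holds, while `pumping_fails("aabb", 2, in_anbn)` does not, because the split $v = a$, $y = b$ around the middle pumps without leaving the language.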

I see Kaveh has written elsewhere that the set of non-context-free languages is not recursively enumerable, so it seems there is no hope for any algorithm to work on all possible languages. Therefore, I suppose the web service would need to be able to output "context-free", "not context-free", or "I can't tell". Is there any algorithm that would often be able to provide an answer other than "I can't tell", on many of the languages one is likely to see in textbooks? How would you build such a web service?


To make this question well-posed, we need to decide how the user will specify the language. I'm open to suggestions, but I'm thinking something like this:

$$L = \{E : S\}$$

where $E$ is a word-expression and $S$ is a system of linear inequalities over the length-variables, with the following definitions:

  • Each of $x,y,z,\dots$ is a word-expression. (These represent variables that can hold any word in $\Sigma^*$.)

  • Each of $a,b,c,\dots$ is a word-expression. (Implicitly, $\Sigma=\{a,b,c,\dots\}$, so $a,b,c,\dots$ represent a single symbol in the underlying alphabet.)

  • Each of $a^\eta,b^\eta,c^\eta,\dots$ is a word-expression, if $\eta$ is a length-variable.

  • The concatenation of word-expressions is a word-expression.

  • Each of $m,n,p,q,\dots$ is a length-variable. (These represent variables that can hold any natural number.)

  • Each of $|x|,|y|,|z|,\dots$ is a length-variable. (These represent the length of a corresponding word.)

This seems broad enough to handle many of the cases we see in textbook exercises. Of course, you can substitute any other textual method of specifying a language in algebraic form, if you like.
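As a rough illustration of how such a specification could be represented in a program, here is a sketch that enumerates the short words of $\{E : S\}$ by brute force. The encoding is an assumption made for this example only: $E$ is a list of tuples `("sym", a)`, `("pow", a, n)` or `("var", x)`, and each inequality in $S$ is a pair `(coeffs, const)` meaning $\sum_v \text{coeffs}[v] \cdot v + \text{const} \ge 0$, with `"|x|"` naming the length of word variable `x`. All variables are capped at `max_len`, so the result can miss words when a variable that occurs only in $S$ would need a larger value.

```python
from itertools import product


def words_up_to(expr, ineqs, alphabet, max_len):
    """Enumerate the words of {E : S} of length <= max_len by brute force."""
    # Collect natural-number variables and word variables from E and S.
    nat_vars = {t[2] for t in expr if t[0] == "pow"}
    word_vars = {t[1] for t in expr if t[0] == "var"}
    for coeffs, _ in ineqs:
        for v in coeffs:
            if v.startswith("|"):
                word_vars.add(v[1:-1])
            else:
                nat_vars.add(v)
    nat_vars, word_vars = sorted(nat_vars), sorted(word_vars)

    # Every word over the alphabet of length at most max_len.
    candidates = ["".join(p) for k in range(max_len + 1)
                  for p in product(alphabet, repeat=k)]
    found = set()
    for nats in product(range(max_len + 1), repeat=len(nat_vars)):
        for ws in product(candidates, repeat=len(word_vars)):
            env = dict(zip(nat_vars, nats))
            words = dict(zip(word_vars, ws))
            env.update({f"|{x}|": len(w) for x, w in words.items()})
            # Keep only assignments that satisfy every inequality in S.
            if not all(sum(c * env[v] for v, c in coeffs.items()) + const >= 0
                       for coeffs, const in ineqs):
                continue
            out = "".join(
                t[1] if t[0] == "sym" else
                t[1] * env[t[2]] if t[0] == "pow" else
                words[t[1]]
                for t in expr)
            if len(out) <= max_len:
                found.add(out)
    return found


# L = {a^n b^n a^n : n in N}, words of length at most 6
anbnan = [("pow", "a", "n"), ("pow", "b", "n"), ("pow", "a", "n")]
print(sorted(words_up_to(anbnan, [], "ab", 6), key=len))
```

Such an enumerator does not decide anything by itself, but it gives a membership oracle on short words that heuristic tests could be built on.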

reinierpost
D.W.
  • Wouldn't it be easier to start with regularity of languages? – Yuval Filmus Nov 11 '13 at 20:05
  • @YuvalFilmus, sure would! Now that you mention it, that's a great idea. Do you think the problem is feasible for regular languages? I'd be happy to ask a corresponding question about regular languages, if you think that might be valuable. – D.W. Nov 11 '13 at 20:46
  • 2
    It would certainly be easier for regular languages. By the way, the general non-decidability doesn't necessarily apply to languages of the form you mention. – Yuval Filmus Nov 11 '13 at 23:50
  • 5
    I'm afraid this problem is probably open, at least a specific case is: http://cstheory.stackexchange.com/questions/17976. There might be a way to get undecidability for your more general problem, but I don't see it. – sdcvvc Nov 12 '13 at 14:55
  • it would be helpful to give some example words in the language. suggest further research/ collaboration in [chat] – vzn Sep 18 '14 at 19:10
  • It doesn't handle cases like $\{a^{n^2} \colon n \ge 1\}$ or other non-context-free examples. – vonbrand Aug 16 '19 at 17:12

6 Answers

1

The way I think we could handle this problem is to devise a language which is context-free if and only if a given word is in a recursively enumerable language, that is, if a Turing machine halts on a given input. If we can do this, we can reduce the halting problem to the problem $L \in CFL$ and conclude that the latter is undecidable.

Let $L$ be a recursively enumerable language and let $M$ be a Turing machine such that $\mathcal{L}(M) = L$. Let $$ L_{M} = \{a^n c^k b^n \mid n > 0 \land (k = n \iff M(n)\uparrow) \land (k = 0 \iff M(n)\downarrow)\} $$

We have that $L_{M}$ is context-free if and only if $k = 0$ for each $n$, that is, $$ L_{M} \in CFL \iff \forall n ~M(n)\downarrow $$ which is undecidable. Therefore, there can't be an algorithm which decides, in general, whether $L \in CFL$.

ecmm
0

By Rice's theorem, deciding whether the language accepted by a Turing machine has any non-trivial property (here: being context-free) is undecidable. So you would have to restrict the power of your recognizing machinery (or description) so that it is not Turing-complete to hope for an answer.

For some language descriptions the answer is trivial: if the language is given by a regular expression, it is regular, thus context-free. If it is given by a context-free grammar, ditto.

vonbrand
  • 5
    I agree with all of your comments, but I'm not sure I see how this answers the question or how to use this to answer the question. I'm aware of all of those facts. I describe a particular way of specifying languages. Are you suggesting it is Turing-complete? It doesn't look likely to be Turing-complete to me. A system of linear inequalities is not Turing-complete, so I suspect/speculate I have already restricted it enough to be not Turing-complete. Also, for the method I gave for specifying languages, it's not trivial, as it's not a regular expression and not a context-free grammar. – D.W. Aug 16 '19 at 16:32
  • I've changed the title to reflect this. This is an answer to the question previously stated in the title. – reinierpost Mar 17 '22 at 17:24
-1

There are several methods to solve this. Let me discuss them one by one.

  1. Try to write a context-free grammar, and then check whether the left-hand side of every production is exactly one non-terminal symbol; if so, the language is context-free.
    Ex: if $aA\rightarrow Bc$ is in the context-free grammar, then it is not a context-free language.

  2. Take a stack and push the elements of a word of the language onto it, popping an element from the stack when a new kind of element arrives. For example, in the language
    $L = \{a^n b^n c^n : n \in \mathbb{N}\}$
    push all $a$'s onto the stack and pop one $a$ for each $b$ that arrives. Then nothing is left on the stack to match against the remaining $n$ $c$'s. So this language is not context-free. Remember that in a context-free language only one comparison is allowed, but here there are two comparisons. It is a context-sensitive language (CSL), which allows two comparisons.

  3. The next approach is not straightforward: it can only be used to show that a language is not a context-free language (CFL). It is called the Pumping Lemma.
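The syntactic check from item 1 can be sketched as follows. The encoding is an assumption for illustration: a production is a pair `(lhs, rhs)` of tuples of symbols, and the function name is hypothetical. Note that this only inspects the form of the grammar; a grammar that fails the check may still generate a context-free language, so a `False` result says nothing conclusive about the language itself.

```python
def is_context_free_grammar(productions, nonterminals):
    """Return True if every left-hand side is exactly one non-terminal symbol."""
    return all(len(lhs) == 1 and lhs[0] in nonterminals
               for lhs, _rhs in productions)


# S -> a S b | epsilon  (generates {a^n b^n : n >= 0})
cfg = [(("S",), ("a", "S", "b")), (("S",), ())]

# Adding aA -> Bc gives a grammar that is not context-free in form.
non_cfg = cfg + [(("a", "A"), ("B", "c"))]

print(is_context_free_grammar(cfg, {"S"}))
print(is_context_free_grammar(non_cfg, {"S", "A", "B"}))
```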

greybeard
  • I don't follow the example in item 1.: the production shown cannot be part of a CF grammar. I don't remember it to be conclusive for the language of that grammar. – greybeard Jul 25 '20 at 08:50
-2

Yes, there is the concept of the Pumping Lemma, which is a negative test.

  • 1
    The question already mentioned the pumping lemma, and explained why this is not sufficient as a complete algorithm. If you disagree with the dismissal of the pumping lemma as in the question, please explain how you think it can be used for a complete algorithm. – Discrete lizard Oct 11 '21 at 19:13
-3

Try the JFLAP software if you just want to check a CFG. You could perhaps even ask the JFLAP developers for the code or the algorithm behind the software. You can get JFLAP from http://www.jflap.org/jflaptmp/; it is free, but it requires a JDK or JRE. Alternatively, you could try other similar software and ask its developers.

  • 1
    I'm not sure this answers the question. JFLAP has no feature that accepts a language in mathematical notation and tells you whether it is context-free or not. – Yuval Filmus Nov 02 '18 at 16:08
  • Theorem 2.20 in Sipser's book: a language is context-free if and only if some pushdown automaton recognizes it. And you can build a PDA in JFLAP from a grammar. – Haseeb Hassan Asif Nov 02 '18 at 16:39
  • You may be right that mathematical notation can't be entered in JFLAP, but you can still enter all the rules of a grammar, and it will either convert it into a PDA or report that it is not a CFG or give some other error. – Haseeb Hassan Asif Nov 02 '18 at 16:41
  • How would you represent $\{a^nb^nc^n : n \in \mathbb{N}\}$ as a grammar? Also, it is probably undecidable to tell whether an unrestricted grammar generates a context-free language, so I doubt JFLAP can do that. – Yuval Filmus Nov 02 '18 at 16:45
  • 2
    I imagine that JFLAP can convert a context-free grammar to an equivalent PDA, but this is of absolutely no help here. – Yuval Filmus Nov 02 '18 at 16:45
  • @YuvalFilmus Then the only thing that remains is to check with pencil and paper whether a grammar is context-free by trying to create a PDA for it. Otherwise, there is no algorithm yet to prove whether an unrestricted grammar is context-free or not. – Haseeb Hassan Asif Nov 02 '18 at 17:47
  • But there might be an algorithm that works for the special case discussed in the question. This is what the question is asking. – Yuval Filmus Nov 02 '18 at 17:57
  • The answer to this question https://cs.stackexchange.com/questions/18010/algorithm-to-test-whether-a-language-is-regular could be pretty much the answer to our above question. – Haseeb Hassan Asif Nov 02 '18 at 17:57
  • @Yuval Filmus, unrestricted grammars generate the recursively enumerable sets, i.e., they are equivalent to Turing machines in expressive power. – vonbrand Sep 01 '19 at 01:15
-4

Any language that is accepted by a pushdown automaton is a CFL. Here is a detailed breakdown of how to determine whether a language is a CFL or not: check if language is CFL or not

SiluPanda
  • This is not an algorithm. – xskxzr Apr 20 '19 at 18:35
  • I don't see how this answers the question. I am aware that a language is context-free iff it is accepted by a PDA, but that doesn't seem to help with finding an algorithm of the form requested in the question. The geeksforgeeks article you link to does not provide a complete algorithm for this problem; it just lists non-exhaustive special cases that are easier (and it's not a great reference, as some of its statements are a bit sketchy/dubious). – D.W. Apr 20 '19 at 18:37
  • AFAIK, there is no well structured algorithm for that yet. (correct me if I am wrong). The best we can do is to check for cases. – SiluPanda Apr 22 '19 at 11:40