Solving the emptiness problem for a CFG in Chomsky normal form (linear)

Question

Given a CFG in Chomsky normal form, is there an algorithm that solves the emptiness problem in linear time? I thought about using depth-first search here, but I think that would take a little more than linear time.

score 8 · Accepted Answer · answered May 25 '18 at 15:50

Yup, it can be done. For each nonterminal $A$, introduce a boolean variable $x_A$, with the intended meaning that $x_A$ is true if and only if $L(A)$ (the set of terminal strings derivable from $A$) is non-empty. Then you can convert each production into a corresponding Horn clause:

$A \to BC$ becomes $(x_B \land x_C) \implies x_A$
$A \to a$ becomes $x_A$
$S \to \varepsilon$ becomes $x_S$

Let $\varphi$ denote the conjunction of these Horn clauses. Find the minimal satisfying assignment for $\varphi$, i.e., the one that sets a variable to true only when the clauses force it; this can be done in linear time. If this assignment makes $x_S$ true, the language is non-empty; otherwise it is empty.
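A minimal Python sketch of this reduction, assuming a hypothetical representation in which each production is a pair `(lhs, rhs)` and `rhs` is a 2-tuple of nonterminals for $A \to BC$, a 1-tuple holding a terminal for $A \to a$, and `()` for $S \to \varepsilon$. The names `horn_minimal_model` and `cnf_is_nonempty` are made up for illustration. The minimal assignment is computed with counter-based unit propagation: each clause tracks how many of its body variables are not yet true, and its head becomes true when that count reaches zero.

```python
from collections import defaultdict, deque


def horn_minimal_model(clauses):
    """Least model of definite Horn clauses given as (body, head) pairs.

    A clause (body, head) means (AND of body) => head; an empty body is a fact.
    Each variable is processed once and each clause's counter is decremented
    once per distinct body variable, so this is linear in the total clause
    size (assuming O(1) dict/set operations).
    """
    clauses = [(frozenset(body), head) for body, head in clauses]
    remaining = [len(body) for body, _ in clauses]  # body vars not yet true
    watching = defaultdict(list)  # var -> indices of clauses with var in body
    for i, (body, _) in enumerate(clauses):
        for v in body:
            watching[v].append(i)

    true_vars = set()
    queue = deque()
    for body, head in clauses:  # facts are true unconditionally
        if not body and head not in true_vars:
            true_vars.add(head)
            queue.append(head)

    while queue:
        v = queue.popleft()
        for i in watching[v]:
            remaining[i] -= 1
            head = clauses[i][1]
            if remaining[i] == 0 and head not in true_vars:
                true_vars.add(head)
                queue.append(head)
    return true_vars


def cnf_is_nonempty(productions, start):
    """Encode each CNF production as a Horn clause and test x_start."""
    clauses = []
    for lhs, rhs in productions:
        if len(rhs) == 2:
            clauses.append((rhs, lhs))  # A -> BC  becomes  (x_B and x_C) => x_A
        else:
            clauses.append(((), lhs))   # A -> a or S -> epsilon  becomes  x_A
    return start in horn_minimal_model(clauses)
```

For example, the grammar $S \to AB$, $A \to a$, $B \to BB$ has an empty language (nothing ever forces $x_B$ to be true), so `cnf_is_nonempty` returns `False` for it; adding $B \to b$ makes it return `True`.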

Alternatively, if you prefer a more direct algorithm, here is a standard one that you might see in textbooks.

Start with all nonterminals unmarked. For every rule $A \to a$, mark $A$; if there is a rule $S \to \varepsilon$, mark $S$. Whenever you mark a nonterminal, check every rule of the form $A \to BC$ in which it appears on the right-hand side; if both $B$ and $C$ are now marked and $A$ is not, mark $A$. Repeat until no new nonterminal gets marked. At that point, the marked nonterminals are exactly those that generate a non-empty language, so the language is non-empty iff $S$ is marked.

This also runs in linear time. It takes a little more work to see why, but it's true. Each nonterminal is marked at most once, and each rule of the form $A \to BC$ is checked at most twice (once when $B$ is marked, once when $C$ is marked), so the total work is $O(1)$ per nonterminal plus $O(1)$ per rule, which is linear in the size of the grammar. This does require an index that maps each nonterminal to the list of all rules containing it on the right-hand side, but that index can also be built in advance in linear time.
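The marking procedure can be sketched in Python roughly as follows. This is a sketch under assumptions, not a reference implementation: productions are hypothetical `(lhs, rhs)` pairs where `rhs` is `(B, C)` for $A \to BC$, `(a,)` for $A \to a$, and `()` for $S \to \varepsilon$, and the names `generating_nonterminals` and `is_nonempty` are illustrative. A FIFO worklist stands in for "repeat until no new nonterminal gets marked", and `occurs_in` is the index from each nonterminal to the binary rules that mention it on the right-hand side.

```python
from collections import defaultdict, deque


def generating_nonterminals(productions):
    """Return the set of nonterminals that derive at least one terminal string."""
    occurs_in = defaultdict(list)  # nonterminal -> binary rules (A, B, C) with it on the RHS
    marked = set()
    worklist = deque()  # marked nonterminals whose rules have not been checked yet

    def mark(x):
        if x not in marked:  # each nonterminal is marked (and queued) at most once
            marked.add(x)
            worklist.append(x)

    # Build the index and mark the base cases in one linear pass.
    for lhs, rhs in productions:
        if len(rhs) == 2:
            rule = (lhs, rhs[0], rhs[1])
            occurs_in[rhs[0]].append(rule)
            if rhs[1] != rhs[0]:  # A -> BB: index the rule only once under B
                occurs_in[rhs[1]].append(rule)
        else:
            mark(lhs)  # A -> a, or S -> epsilon

    # Each binary rule is inspected at most once per distinct RHS symbol.
    while worklist:
        x = worklist.popleft()
        for a, b, c in occurs_in[x]:
            if b in marked and c in marked:
                mark(a)
    return marked


def is_nonempty(productions, start):
    return start in generating_nonterminals(productions)
```

Indexing a rule like $A \to BB$ only once under $B$ is a small design choice; indexing it twice would still keep the number of checks per rule at two, so the linear bound holds either way.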

Thanks! Could you explain further why $O(1)$ per rule means that it is linear in the size of the grammar? — Julian, May 27 '18 at 12:29
@Julian, the size of the grammar is the number of rules (up to a constant factor, since each CNF rule has at most three symbols); call that $n$. Such a grammar has at most $n$ non-terminals (every useful non-terminal must appear on the left-hand side of at least one rule). So $O(1)$ per nonterminal, times $n$ nonterminals, plus $O(1)$ per rule, times $n$ rules, leads to $O(1) \times n + O(1) \times n = O(n)+O(n) = O(n)$. — D.W., May 27 '18 at 14:35
