Given a CFG in Chomsky normal form, is there an algorithm that solves the emptiness problem in linear time? I thought about using depth-first search here, but I think its runtime is slightly more than linear.
1 Answer
Yup, it can be done. For each nonterminal $A$, introduce a boolean variable $x_A$, with the intent that if $x_A$ is true, that means $L(A)$ is non-empty. Then you can convert each production into a corresponding Horn clause:
- $A \to BC$ becomes $(x_B \land x_C) \implies x_A$
- $A \to a$ becomes $x_A$
- $S \to \varepsilon$ becomes $x_S$
Let $\varphi$ denote the conjunction of these Horn clauses. Find the minimal satisfying assignment for $\varphi$; that can be done in linear time. If this assignment makes $x_S$ true, then the language is non-empty, otherwise it is empty.
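To make the reduction concrete, here is a minimal Python sketch, not a definitive implementation. It assumes rules are given as `(lhs, rhs)` pairs with `rhs` a tuple of symbols (empty for $\varepsilon$), and it computes the minimal model by forward chaining with a per-clause counter, which is one common way to get linear time. The names `grammar_to_horn` and `horn_minimal_model` and the input encoding are illustrative assumptions.

```python
from collections import defaultdict, deque

def grammar_to_horn(rules, nonterminals):
    """Turn each rule (lhs, rhs) into a Horn clause (body, head).
    Terminals are dropped from the body, so A -> a and S -> epsilon
    both become facts with an empty body."""
    return [(tuple(s for s in rhs if s in nonterminals), lhs)
            for lhs, rhs in rules]

def horn_minimal_model(clauses):
    """Return the set of variables that are true in the minimal
    satisfying assignment of a conjunction of definite Horn clauses."""
    remaining = []             # per clause: body occurrences not yet known true
    watch = defaultdict(list)  # variable -> clause indices, one per body occurrence
    true_vars, queue = set(), deque()

    def derive(v):
        if v not in true_vars:
            true_vars.add(v)
            queue.append(v)

    for i, (body, head) in enumerate(clauses):
        remaining.append(len(body))
        for v in body:
            watch[v].append(i)
        if not body:           # a fact: its head is forced to be true
            derive(head)

    while queue:               # each variable is dequeued at most once
        v = queue.popleft()
        for i in watch[v]:
            remaining[i] -= 1
            if remaining[i] == 0:
                derive(clauses[i][1])
    return true_vars

# Hypothetical example grammar: S -> A B, A -> a, B -> B B
nts = {"S", "A", "B"}
rules = [("S", ("A", "B")), ("A", ("a",)), ("B", ("B", "B"))]
model = horn_minimal_model(grammar_to_horn(rules, nts))
language_is_empty = "S" not in model
```

In this example $B$ never derives a terminal string, so $x_S$ stays false and the language is empty; adding a rule $B \to b$ would make $x_S$ true.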
Alternatively, if you prefer a more direct algorithm, here is a standard one that you might see in textbooks.
Start out with all nonterminals unmarked. If you see a rule $A \to a$, mark $A$. If you see a rule $S \to \varepsilon$, mark $S$. Whenever you mark a nonterminal, check all rules of the form $A \to BC$ where it appears on the right-hand side; if both $B$ and $C$ are marked, mark $A$. Repeat until convergence. At that point, all marked nonterminals correspond to nonterminals that generate a non-empty language, so the language is non-empty iff $S$ is marked.
This also runs in linear time. It takes a little more work to see why, but it's true. In particular, each nonterminal can only be marked once, and each rule of the form $A \to BC$ will only be checked at most twice (once when $B$ is marked, once when $C$ is marked), so the amount of work you do is $O(1)$ per nonterminal plus $O(1)$ per rule, which is linear in the size of the grammar. It does require suitable data structures that map from each nonterminal to a list of all rules containing it on the right-hand side, but that can be built in advance in linear time as well.
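The marking procedure can be sketched in Python as follows. This is a sketch under assumptions rather than a reference implementation: the function name `cnf_language_is_empty` and the input encoding (binary rules as triples `(A, B, C)`, terminal rules as pairs `(A, a)`, a flag for $S \to \varepsilon$) are hypothetical choices made for illustration.

```python
from collections import defaultdict, deque

def cnf_language_is_empty(start, binary_rules, terminal_rules, start_epsilon=False):
    """binary_rules: list of (A, B, C) for A -> B C.
    terminal_rules: list of (A, a) for A -> a.
    start_epsilon: True if the grammar contains S -> epsilon."""
    # Built once, in linear time: nonterminal -> binary rules that
    # contain it on the right-hand side.
    occurs = defaultdict(list)
    for rule in binary_rules:
        _, b, c = rule
        occurs[b].append(rule)
        if c != b:             # avoid listing A -> B B twice under B
            occurs[c].append(rule)

    marked, worklist = set(), deque()

    def mark(x):
        if x not in marked:    # each nonterminal is marked at most once
            marked.add(x)
            worklist.append(x)

    for a, _ in terminal_rules:
        mark(a)
    if start_epsilon:
        mark(start)

    while worklist:
        x = worklist.popleft()
        # Each binary rule is examined at most twice overall:
        # once when B is marked and once when C is marked.
        for a, b, c in occurs[x]:
            if b in marked and c in marked:
                mark(a)

    return start not in marked

# Hypothetical example: S -> A B, B -> B B, A -> a
empty_case = cnf_language_is_empty("S", [("S", "A", "B"), ("B", "B", "B")], [("A", "a")])
```

The worklist replaces "repeat until convergence": a nonterminal enters it only when it is first marked, so the loop ends once no new nonterminal can be marked.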

Thanks! Could you explain further why $O(1)$ per rule means that it is linear in the size of the grammar? – Julian May 27 '18 at 12:29
@Julian, the size of the grammar is the number of rules; call that $n$. (In Chomsky normal form every rule has constant length, so this matches the total length of the grammar up to a constant factor.) Such a grammar has at most $n$ non-terminals (every non-terminal must appear on the left-hand side of at least one rule). So $O(1)$ per nonterminal, times $n$ nonterminals, plus $O(1)$ per rule, times $n$ rules, leads to $O(1) \times n + O(1) \times n = O(n)+O(n) = O(n)$. – D.W. May 27 '18 at 14:35