Language theoretic comparison of LL and LR grammars

Question

People often say that LR(k) parsers are more powerful than LL(k) parsers. These statements are vague most of the time; in particular, should we compare the classes for a fixed $k$ or the union over all $k$? So how is the situation really? In particular, I am interested in how LL(*) fits in.

As far as I know, the respective sets of grammars LL and LR parsers accept are orthogonal, so let us talk about the languages generated by the respective sets of grammars. Let $LR(k)$ denote the class of languages generated by grammars that can be parsed by an $LR(k)$ parser, and similar for other classes.

I am interested in the following relations:

$LL(k) \overset{?}{\subseteq} LR(k)$
$\bigcup_{i=1}^{\infty} LL(k) \overset{?}{\subseteq} \bigcup_{i=1}^{\infty} LR(k)$
$\bigcup_{i=1}^{\infty} LL(k) \overset{?}{=} LL(*)$
$LL(*) \overset{?}{\circ} \bigcup_{i=1}^{\infty} LR(k)$

Some of these are probably easy; my goal is to collect a "complete" comparison. References are appreciated.

@AndreaTucci: Yes, but that only covers the grammars, not the generated languages. — Raphael, Jul 11 '12 at 18:06

score 69 · Accepted Answer · edited Jun 16 '20 at 10:30

69

There are numerous containments known. Let $\subseteq$ denote containment and $\subset$ proper containment. Let $\times$ denote incomparability.

Let $LL = \bigcup_k LL(k)$, $LR = \bigcup_k LR(k)$.

Grammar level

For LL

$LL(0) \subset LL(1) \subset LL(2) \subset LL(2) \subset \cdots \subset LL(k) \subset \cdots \subset LL \subset LL(*)$
$SLL(1) = LL(1), SLL(k) \subset LL(k), SLL(k+1) \times LL(k)$

Most of these are proven in Properties of deterministic top down grammars by Rosenkrantz and Stearns. $SLL(k+1) \times LL(k)$ is a rather trivial exercise. This presentation by Terence Parr places $LL(*)$ on slide 13. The paper LL-regular grammars by Jarzabek and Krawczyk show $LL \subset LLR$, and their proof trivially extends to $LL \subset LL(*)$

For LR

$LR(0) \subset SLR(1) \subset LALR(1) \subset LR(1)$
$SLR(k) \subset LALR(k) \subset LR(k)$
$SLR(1) \subset SLR(2) \subset \cdots \subset SLR(k)$
$LALR(1) \subset LALR(2) \subset \cdots \subset LALR(k)$
$LR(0) \subset LR(1) \subset LR(2) \subset \cdots \subset LR(k) \subset \cdots \subset LR$

These are all simple exercises.

LL versus LR

$LL(k) \subset LR(k)$ (Properties of deterministic top down grammars plus any left recursive grammar)
$LL(k) \times SLR(k), LALR(k), LR(k-1)$ (simple exercise)
$LL \subset LR$ (any left recursive grammar)
$LL(*) \times LR$ (left recursion versus arbitrary lookahead)

Language level

For LL

$LL(0) \subset LL(1) \subset LL(2) \subset \cdots \subset LL(k) \subset \cdots \subset LL \subset LL(*)$
$SLL(k) = LL(k)$

Most of these are proven in Properties of deterministic top down grammars. The equivalence problem for LL- and LR-regular grammars by Nijholt makes references to papers showing $LL(k) \subset LL(*)$. The paper LL-regular grammars by Jarzabek and Krawczyk show $LL \subset LLR$, and their proof trivially extends to $LL \subset LL(*)$

For LR

$LR(0) \subset SLR(1) = LALR(1) = LR(1) = SLR(k) = LALR(k) = LR(k) = LR$

Some of these were proven by Knuth in his paper On the Translation of Languages from Left to Right in which he introduced LR(k), the rest are proven in Transforming LR(k) Grammars to LR(1), SLR(1), and (1,1) Bounded Right-Context Grammars by Mickunas et al.

LL versus LR

$LL \subset LR(1)$ (containment follows from the above, $\{ a^i b^j | i \geq j \}$ is the canonical example for strict containment)
$LL(*) \times LR$ (the language $\{ a^i b^j | i \geq j \}$ shows half the claim, and the introduction of The equivalence problem for LL- and LR-regular grammars by Nijholt makes references to papers showing the other half)
$LR(1) = DCFL$ (see e.g. reference here).

edited Jun 16 '20 at 10:30

Community

1

answered Mar 07 '12 at 01:17

Alex ten Brink

9,066
3
35
63

Excellent answer though, I had already upvoted. I would have thought Frank deRemer proved LALR <= LR in his original LALR paper? (1969?) – user207421 Apr 11 '12 at 18:43
@EJP: $LALR(k) \subseteq LR(k)$ is immediate if you define $LALR$ by collapsing $LR$ states. deRemer only proved that his construction created the same parser. The grammar classes are not equal, which is a nice exercise (or even a nice question for this site). The language class equality is a lot more tricky: read the paper if you really want the details ;) – Alex ten Brink Apr 11 '12 at 22:58
1

@AlextenBrink I did read the paper, and was taught by Frank de Remer, but it's 30+ years ago ;-) Thanks for all the detail. – user207421 Apr 12 '12 at 00:13
It might be nice to collect example grammars for each inequality. – o11c May 31 '15 at 05:51
1

@o11c I think that would overburden a single answer. My impression is that Alex gave good references where necessary; he states "easy exercise" for some. I guess if a reader can not come up with a grammar, they can post a new question asking for that specific case. – Raphael Jun 18 '15 at 10:33
Surprised at how languages have equality when grammars have containment. For grammars, we have $LR(0)\subset SLR(1)\subset LALR(1)\subset LR(1)$. For languages, we have $LR(0)\subset SLR(1) = LALR(1)=LR(1)$. Why is this so? – RajS Dec 16 '17 at 07:12
1

What does it mean "$SLL(k+1) \times LL(k)$ is a rather trivial exercise."? Which statement is trivial? What does "$\times$" mean here? If it is the Cartesian product operation, or any other operation, the sentence does not make sense. – Alexey Nov 03 '18 at 12:09
2

@alexey: $\times$ in this answer means "no containment relationship". In other words, $A\times B$ means that both $A-B$ and $B-A$ are non-empty. – rici Feb 15 '19 at 15:39
1

@rici, where does this use of "times" symbol come from? This is not what it normally means. In fact, i've never heard of using "times" for a relation instead of an operation... – Alexey Feb 15 '19 at 19:04
1

@alexey: it should be a big X. (As in Not, or crossed-out). There is probably something more X-like than the times sign, but I've seen it used that way before. – rici Feb 15 '19 at 21:29
1

@rici, the symbol used here is times, but even for X, where does this usage come from? I doubt this to be coming from mathematics... – Alexey Feb 15 '19 at 21:35
1

@akexey I guess it comes from X mark often used in opposition to check mark. So it has a negative connotation in common usage. There are some very clunky symbols for "incomparable"; according to Wikipedia, it is $\mid\mid$ but I wonder if that would have been obvious to a casual reader. Indeed, writing the two bars crossed feels more incomparable to me :-) – rici Feb 15 '19 at 22:48
1

According to wiki, "It was also proven that there exist LR(1) languages that are not LALR." Is this a contradiction to what you wrote in the answer? – hengxin Dec 02 '21 at 14:14

Language theoretic comparison of LL and LR grammars

1 Answers1

Grammar level

For LL

For LR

LL versus LR

Language level

For LL

For LR

LL versus LR

Linked

Related