Why can't a LR(0) parser have GOTO and Reduce Items in the same canonical set?

Question

I am talking in reference to this question from this compiler test. I think it is actually true that a GOTO and REDUCE can't be present in the same set, but am having a hard time coming up with an example or counterexample.

Please provide an example and an explanation to help me understand.

https://gateoverflow.in/299820/Go-compiler1-parsing-5

rici · Accepted Answer · 2022-12-30T13:43:29.583

This isn't what you asked, but let me start by saying that I suspect the answer they're looking for in the question you link is (D), "A LR(0) parser can parse any regular grammar".

That statement is clearly false; an LR(0) parser can only recognize languages with the "prefix property", which holds if no sentence in the language is a proper prefix of another sentence in the language. Many regular languages do not have the prefix property, including, for example, $\{a, aa\}$. (You can create an augmented language by adding an explicit end-of-input marker, conventionally written $\$$, to the end of every sentence. The language thus augmented is regular and has the prefix property. That still doesn't let you parse "any regular grammar" with LR(0); it just means that every augmented regular language has an LR(0) parser, which is quite a different statement.)

An LR(0) automaton has no lookahead; that's what the $(0)$ means. So if a language includes $u$ and $uv$, where $|v|\ge 1$ (in other words, it does not have the prefix property), then once the parser reads $u$, it cannot decide whether to accept or continue.
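For a finite language, the prefix property can be checked directly. The following is only a minimal sketch; the function name `has_prefix_property` is made up for illustration, and `"$"` stands in for the end-of-input marker.

```python
def has_prefix_property(language):
    """Return True if no sentence is a proper prefix of another sentence."""
    sentences = set(language)
    return not any(
        u != w and w.startswith(u)
        for u in sentences
        for w in sentences
    )


print(has_prefix_property({"a", "aa"}))    # False: "a" is a proper prefix of "aa"
print(has_prefix_property({"a$", "aa$"}))  # True: the end marker removes the overlap
```

With $u = a$ and $v = a$, the first language is exactly the situation described above: after reading $u$, a parser without lookahead cannot tell whether the input has ended.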

Another consequence of not having any lookahead is that if a REDUCE action is available in a state, it must be unconditional. No SHIFT action is possible for that state (that would be a shift/reduce conflict) and no other REDUCE action is possible (that would be a reduce/reduce conflict). Neither of those restrictions immediately implies the absence of GOTO actions, though.
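Those two restrictions can be written as a check on a single item set. This is a sketch under assumptions: the item representation `(lhs, rhs, dot)` and the function name `lr0_conflicts` are invented for illustration, and both example item sets are written out by hand rather than generated.

```python
def lr0_conflicts(items, terminals):
    """List the LR(0) conflicts in one item set.

    Each item is (lhs, rhs, dot): rhs is a tuple of symbols and dot is the
    position of the marker within rhs.
    """
    # Items with the dot at the end of the production call for a REDUCE.
    complete = [it for it in items if it[2] == len(it[1])]
    # Items with the dot before a terminal call for a SHIFT.
    shifts = [
        it for it in items
        if it[2] < len(it[1]) and it[1][it[2]] in terminals
    ]
    conflicts = []
    if complete and shifts:
        conflicts.append("shift/reduce")
    if len(complete) > 1:
        conflicts.append("reduce/reduce")
    return conflicts


# Item set reached after reading "a" with productions S -> a and S -> a a.
after_a = [("S", ("a",), 1), ("S", ("a", "a"), 1)]
# Item set with an empty production next to items whose dot precedes a nonterminal.
state0 = [("S'", ("S",), 0), ("S", ("M", "A"), 0), ("M", (), 0)]

print(lr0_conflicts(after_a, {"a"}))  # ['shift/reduce']
print(lr0_conflicts(state0, {"a"}))   # []
```

Note that the check never looks at items whose dot precedes a nonterminal, which is why a complete item can sit next to them without any conflict.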

It's pretty common to see two claims about LR(0) grammars:

"If a reduce item is present in a LR(0) itemset it cannot have any other item." (That's option C from the question you linked to, with a small vocabulary change.)
"No grammar with an empty production can be LR(0)." (That's somewhere in the discussion of the question.)

In fact, neither of the above claims is strictly true. Consider the following simple grammar: $$\begin{align}S&\to M A\\ M&\to\\ A&\to a \end{align}$$

That grammar recognizes the language $\{a\}$, which is not a particularly interesting language, other than as an example. Of course, that language is LR(0); any language whose sentences are all the same length is LR(0). That doesn't mean that all grammars for that language are LR(0), but the above grammar certainly is; here's the LR(0) state machine generated by the grammophone tool, which happily classifies the grammar as LR(0):

[Image: LR(0) state machine for the grammar above, as generated by grammophone]

Looking at that state machine, it's evident that state 0 has a REDUCE action ($M\to$) and two GOTO actions, one on $M$ and the other on $S$. (Of course, it has no SHIFT actions. In an LR(0) grammar, the generated parser can't have both REDUCE and SHIFT actions in the same state.)

(Perhaps it's worth noting that you can tell what types of actions are possible for a state by looking at the symbol following each • in the itemset: if the • is at the end of a production, there is a REDUCE action; if the • comes before a terminal, there is a SHIFT action; and if the • comes before a non-terminal, there is a GOTO action.)
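The rule in that parenthesis is easy to try out on the grammar above. What follows is a rough sketch of the textbook LR(0) itemset construction, not grammophone's implementation; the augmented start production `S' -> S` is added by the sketch, all names (`GRAMMAR`, `closure`, `goto`, `build_states`, `actions`) are hypothetical, and apart from state 0 the state numbers need not match any tool's numbering.

```python
GRAMMAR = [
    ("S'", ("S",)),     # augmented start production (added by this sketch)
    ("S", ("M", "A")),
    ("M", ()),          # the empty "marker" production
    ("A", ("a",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add (production, 0) items for every nonterminal right after a dot."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)


def goto(items, symbol):
    """Move the dot over `symbol` in every item where that is possible."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def build_states():
    start = closure({(0, 0)})
    states, index, transitions = [start], {start: 0}, {}
    for state in states:  # the list grows while we iterate over it
        symbols = sorted({GRAMMAR[p][1][d] for p, d in state
                          if d < len(GRAMMAR[p][1])})
        for sym in symbols:
            target = goto(state, sym)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            transitions[(index[state], sym)] = index[target]
    return states, transitions


def actions(state):
    """Classify a state by the symbol following each dot."""
    kinds = set()
    for p, d in state:
        rhs = GRAMMAR[p][1]
        if d == len(rhs):
            kinds.add("ACCEPT" if p == 0 else "REDUCE")
        elif rhs[d] in NONTERMINALS:
            kinds.add("GOTO")
        else:
            kinds.add("SHIFT")
    return kinds


states, transitions = build_states()
print(sorted(actions(states[0])))  # ['GOTO', 'REDUCE']
```

State 0 ends up with the items $S'\to\bullet S$, $S\to\bullet M A$ and $M\to\bullet$, so it has a REDUCE action together with GOTO actions on $M$ and $S$, and no state mixes SHIFT with REDUCE.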

Certainly, the "marker" non-terminal $M$ contributes nothing to the recognition of sentences. We could simply remove $M$ from the grammar without doing anything more than deleting every reference to it in every production, and if we did so, there would no longer be a reduction action for it.

The point of marker non-terminals is to attach some syntax-directed semantic action to a production which is evaluated before the production is reduced. (Parser generators which provide this facility automatically call these "mid-rule actions".) Marker non-terminals are always ε-productions, and they don't necessarily invalidate the LR(0) property. But they are not essential for recognition, and so possibly there is some way of rephrasing the "no ε-production" criterion which more precisely characterises LR(0) grammars. I haven't seen one, though, and my inclination is to think that the GATE question is not really correct (and it wouldn't be the first GATE question I've seen which has led me to think that).

LR(0) means that the grammar can be parsed bottom-up, producing a rightmost derivation in reverse, using zero symbols of lookahead. "LR-ness" is a property of the grammar, not the language. So LR(0) can indeed parse any regular grammar (which is a specific kind of grammar; essentially a DFA represented in BNF), even though it can't deterministically parse all grammars that represent regular languages. — Pseudonym, Dec 21 '21 at 01:29
@Pseudonym: LR(0) applies (differently) to both grammars and languages, but independent of that point, only languages with the prefix property can be parsed left-to-right without lookahead, for the reason I noted in the answer (once you see the prefix, you need to know whether or not there is another input symbol in order to decide what to do, which requires lookahead). And the language I used as an example does not have the prefix property, so there is no LR(0) grammar which can be written for it. — rici, Dec 21 '21 at 01:34
The usual workaround is to add an end-of-input symbol; the augmented language where every sentence ends in $\$$, or whatever you used to mark end-of-input, does have the prefix property, so it is true that augmented regular languages are LR(0). The problem with GATE is that it is impossible to know precisely what they mean by their questions, since the details of certain definitions depend on what I assume is the national curriculum (or perhaps the vagaries of the understanding of whoever set the exam question), to which I'm not privy. — rici, Dec 21 '21 at 01:46
Also see https://cs.stackexchange.com/a/2715/4416. If you have some good reason to believe that GATE is talking only about augmented regular languages, share it and I'll be happy to modify the answer. — rici, Dec 21 '21 at 01:47
When I read (D) in the link, "regular grammar", to me, means this: https://en.wikipedia.org/wiki/Regular_grammar — Pseudonym, Dec 21 '21 at 04:35
@Pseudonym: that definition allows $A\to\epsilon$, and many grammars with $\epsilon$ productions are not LR(0). ($S\to a A; A\to a; A\to \epsilon$) — rici, Dec 21 '21 at 05:02
Although the regular grammar $S\to a; S\to a A; A\to a$ is also not LR(0). Neither is $S\to a; S\to a a$. — rici, Dec 21 '21 at 05:13
None of those examples are regular grammars. First example: $S \rightarrow a A; A \rightarrow \epsilon \,|\, a B; B \rightarrow \epsilon$ accepts the same language and is a regular grammar. It is LR(0) if you introduce a terminal symbol: $S \rightarrow a A; A \rightarrow \$ \,|\, a B; B \rightarrow \$$. — Pseudonym, Dec 21 '21 at 05:21
@pseudonym: by what definition are those examples not regular? The definition you cited says "all rules have at most one non-terminal" and either all non-terminals are at the end of the rule or all non-terminals are at the beginning of the rule. That's it, according to WP. — rici, Dec 21 '21 at 05:27
Oh, I see. Sorry. It uses a different definition than I was taught. In the definition I was taught, $S \rightarrow a$ is not allowed, and you have to write $S \rightarrow aA; A \rightarrow \epsilon$, because that is how you would write the DFA. — Pseudonym, Dec 21 '21 at 05:31
@Pseudonym: that's how you create a regular grammar to match a DFA. But it's not the definition of what a regular grammar is, and I think you'll have a hard time finding a text which provides a definition different from the WP one. Read Chomsky if you want an "official" version. :-) — rici, Dec 21 '21 at 05:37
@rici, this is not actually a GATE exam question... It is just a mock exam question to be attempted by GATE aspirants before taking the actual GATE exam, just for practice... — Abhishek Ghosh, Jan 17 '22 at 07:56
@rici in the para "perhaps it's worth noting…": do you mean that if the • comes before a non-terminal then it's a GOTO, and if the • comes before a terminal then it's a SHIFT? — e_noether, Dec 30 '22 at 10:20
