To factor a grammar you need to check the alternatives of productions (those separated by |
) whether they have some common symbols in sequence. For example, whether they have a common prefix (that is, whether they start with the same symbol). Join together as many subsequet common symbols (common factors) as you can, and put the rest into parentheses. For example, look at the productions for item
:
item --> NAME price | NAME quantity price
You can rewrite all alternatives one below the other to better see the common prefix:
item --> NAME price
item --> NAME quantity price
Now you can see that they both start with NAME
, and they both end with price
. The only difference is the presence of the middle term quantity
, which seems to be optional:
item --> NAME () price
item --> NAME (quantity) price
So you can rewrite it like this, making a new production for the contents of the parenthesis:
item --> NAME optionalQuantity price
and the new symbol optionalQuantity
you've just introduced can be equal to what? The quantity
or just nothing. So you write it down this way:
optionalQuantity --> quantity
optionalQuantity -->
As you can see, the last production for optionalQuantity
is empty. This form is sometimes called erasure, because it can "erase" its own symbol from other productions (could be replaced by nothing at all). If you're allowed to use EBNF, you can write all the above three productions as such:
item --> NAME quantity? price
The question mark after quantity
means that it could be omitted (it's optional). Then you don't need a separate productions for optionalQuantity
.
You can apply the same technique for items
production:
items --> item | item items
When you rewrite the alternatives separately, you can see, they both start with item
, and then there's an optional items
again:
items --> item
items --> item items
or with parentheses around symbols which are not common:
items --> item ()
items --> item (items)
so in EBNF you could write it just like:
items --> item items?
or in plain context-free grammar as:
items --> item optItems
optItems --> items
optItems -->
And then the syntax is factored. But notice what the above syntax really says: It says that items
is just one item
, which optionally could be followed by more items. In EBNF you could express it shorter like this:
items --> item item*
where star after item
means that it could be repeated zero or more times. But it could be rewritten even further, because one followed by zero or more is the same as one or more. So you can express it more shortly as:
items --> item+
where +
means "one or more".