Should a lexer (tokenizer) handle unknown operators?

Question

I have a list of supported operators. My question is whether the lexer should simply yield a token for any operator it encounters, or raise a syntax error when that operator (say "?") is not in the operator list.

For example, take the operator list [+, -]. For the expression "1 ? 2", should the output be [number:'1', operator:'?', number:'2'], or should the lexer raise a syntax error?

Or should the parser handle it instead of the tokenizer?

I think this would be a syntax error. What would you do if you got "@ 1 $ $ $ 2" instead? Which one is the "operator" here? That would be a syntax error, and hence you don't want to deal with it. — nir shahar, Oct 30 '21 at 18:18
Thanks Nir. I will simply raise an error when I reach an unknown token — Jonathan1609, Oct 30 '21 at 18:42
Unless you specify this token as "known" and still don't want to compile it. This can be useful, for example, when you want to reserve keywords that haven't been implemented yet in a programming language. — nir shahar, Oct 30 '21 at 18:44
Often the best strategy is for the lexer to package an unrecognized character as an ordinary token (of some unused type) and pass it through to the parser, where it will trigger a syntax error. This allows error processing (reporting and recovery) to be centralised in one place. If you're using the classic (f)lex API, you can just use the fallback rule . { return *yytext; }, and then if you later add a ? token to your language, you don't need to change the lexer at all. — rici, Oct 31 '21 at 02:53
"Should" sounds like a matter of opinion, and the answer may depend on your requirements or your particular situation. — D.W., Oct 31 '21 at 23:13

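The pass-through strategy from rici's comment can be sketched in Python roughly as follows. This is only an illustration, not a definitive implementation: the names Token, tokenize, parse and the "unknown" token kind are hypothetical, and the grammar is assumed to be just number (operator number)* over the [+, -] list from the question. The lexer never raises; the parser is the single place where the syntax error is reported.

```python
from dataclasses import dataclass

OPERATORS = {"+", "-"}   # the supported operators from the question
DIGITS = "0123456789"


@dataclass(frozen=True)
class Token:
    kind: str   # "number", "operator", or "unknown" (hypothetical kinds)
    text: str
    pos: int    # index of the token's first character in the source


def tokenize(source):
    """Split source into tokens; unrecognized characters become "unknown" tokens."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch in DIGITS:
            start = i
            while i < len(source) and source[i] in DIGITS:
                i += 1
            tokens.append(Token("number", source[start:i], start))
        elif ch in OPERATORS:
            tokens.append(Token("operator", ch, i))
            i += 1
        else:
            # Fallback rule: do not raise here, just pass the character along.
            tokens.append(Token("unknown", ch, i))
            i += 1
    return tokens


def parse(tokens):
    """Evaluate number (operator number)* left to right; all errors are raised here."""
    def expect(index, kind):
        if index >= len(tokens):
            raise SyntaxError(f"expected {kind}, got end of input")
        tok = tokens[index]
        if tok.kind != kind:
            raise SyntaxError(
                f"expected {kind}, got {tok.text!r} at position {tok.pos}"
            )
        return tok

    value = int(expect(0, "number").text)
    i = 1
    while i < len(tokens):
        op = expect(i, "operator")
        rhs = int(expect(i + 1, "number").text)
        value = value + rhs if op.text == "+" else value - rhs
        i += 2
    return value
```

With this split, tokenize("1 ? 2") succeeds and produces an "unknown" token for "?", while parse rejects it with a SyntaxError that names the offending character and its position. Supporting "?" later would only mean adding it to OPERATORS and teaching parse what it does.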