Questions tagged [parsing]

Analyzing (un)structured data to convert it into a structured, normalized format.

297 questions
27
votes
5 answers

Name for this type of parser, OR why it doesn't exist

Conventional parsers consume their entire input and produce a single parse tree. I'm looking for one that consumes a continuous stream and produces a parse forest [edit: see discussion in comments regarding why this use of that term may be…
9
votes
5 answers

What is the responsibility or benefit of a Tokenizer?

Suppose I had a grammar like: object → { members }; members → pair; pair → string : value; value → number | string; string → " chars "; chars → char | char chars; number → digit | digit number. I could parse the following…
Johannes
  • 336
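
A minimal sketch of what a tokenizer could do for the grammar in the question above, assuming a "char" is any character other than a double quote and that whitespace between tokens is insignificant (neither assumption is stated in the excerpt). The token names and the `tokenize` function are hypothetical, chosen for illustration only.

```python
import re

# Token kinds (LBRACE, STRING, ...) are illustrative names, not from the question.
TOKEN_RE = re.compile(r'''
    (?P<LBRACE>\{)
  | (?P<RBRACE>\})
  | (?P<COLON>:)
  | (?P<STRING>"[^"]+")     # " chars " with at least one char (assumed: no embedded quotes)
  | (?P<NUMBER>[0-9]+)      # digit | digit number
  | (?P<WS>\s+)             # assumed insignificant, so it is dropped below
''', re.VERBOSE)


def tokenize(text):
    """Turn raw characters into (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize('{ "age" : 42 }'):
        print(token)
```

With this split, the parser only needs rules over `STRING`, `NUMBER`, and punctuation tokens; the `chars` and `digit` rules are handled entirely at the character level.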
7
votes
3 answers

Proper separation between lexing and parsing

I am currently writing a parser which, given a source file, turns it into an AST of some language, respecting the idiomatic process of lexing and then parsing using well-known parser generators (think lex and yacc). However, I am unsure as to how to…
ThreeFx
  • 199
5
votes
1 answer

How to add precedence to LALR parser like in YACC?

Please note, I am asking about writing an LALR parser, not writing rules for an LALR parser. What I need is... ...to mimic YACC precedence definitions. I don't know how it is implemented, and below I describe what I've done and read so far. For now I…
greenoldman
  • 1,536
  • 1
  • 14
  • 27
3
votes
3 answers

Generating an AST directly vs. converting from a CST

As I understand it, some parsers generate an abstract syntax tree on the fly, while others first generate a concrete syntax tree and then convert it. What are the tradeoffs between the two? Is there some way to tell what will be easier given a…
Jess
  • 173
3
votes
1 answer

How to parse different number types with LALR(1)

Consider an LALR(1) parser for a file format that allows integer numbers and floating point numbers. As usual, something like 42 shall be a valid integer and a valid float (with some automagic conversion in the background). There might be parsing…
Martin
  • 476
3
votes
2 answers

How to get lookahead symbol when constructing LR(1) NFA for parser?

I am reading an explanation (the awesome "Parsing Techniques" by D. Grune and C. J. H. Jacobs; p. 292 in the 2nd edition) about how to construct an LR(1) parser, and I am at the stage of building the initial NFA. What I don't understand is how to get/compute…
greenoldman
  • 1,536
  • 1
  • 14
  • 27
2
votes
2 answers

Is recursive-descent parsing a panacea for DoS threats posed by 'Evil' regexes? Or does evilness stem from the grammar?

ReDoS attacks exploit characteristics of some (otherwise useful) regular expressions ... essentially causing an explosion of possible paths through the graph defined by the NFA. So does using a recursive-descent parser (such as one generated by ANTLR) necessarily…
1
vote
2 answers

Extracting useful information from free text

We filter and analyse seats for events. Apparently writing a domain query language for the floor people isn't an option. I'm using C# 4.0 & .NET 4.0, and have relatively free rein to use whatever open-source tools are available. …
1
vote
1 answer

How to read reduce/shift conflicts in LR(1) DFA?

I am reading an explanation (the awesome "Parsing Techniques" by D. Grune and C. J. H. Jacobs; p. 293 in the 2nd edition) and I moved forward from my last question: How to get lookahead symbol when constructing LR(1) NFA for parser? Now I have such…
greenoldman
  • 1,536
  • 1
  • 14
  • 27
0
votes
1 answer

What algorithm to use to parse expressions in this simple language?

I don't want to include the entirety of my language, but I think the relevant part is: ::= | | | | | | …
0
votes
0 answers

Parsing nested structure from facebook pfff - how to decide on parser

For a while now I've been playing around with building a small open-source project based on the output of Facebook's pfff tool, which can parse multiple programming languages and output a kind of unified AST. For example, given a simple PHP file like…
BMBM
  • 337
0
votes
1 answer

Rule order for Parsing Lists with LALR(1)

When creating the grammar for parsing a list (something like "ITEM*") with an LALR(1) parser, this can basically be done in two ways: list : list ITEM | ; or list : ITEM list | ; What are the pros and cons of these two…
Martin
  • 476
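
One commonly cited difference between the two rule orders in the question above is parse-stack growth in a shift-reduce parser. The sketch below hand-simulates the shift and reduce steps for each form; it is not output from any parser generator, and the function names are hypothetical.

```python
def max_stack_left(n_items):
    """Simulate `list : list ITEM | ;` (left recursion) on n_items ITEM tokens."""
    stack = ["list"]              # the empty rule is reduced before any input is read
    deepest = len(stack)
    for _ in range(n_items):
        stack.append("ITEM")      # shift ITEM
        deepest = max(deepest, len(stack))
        del stack[-2:]            # reduce `list ITEM` ...
        stack.append("list")      # ... to `list` immediately
    return deepest


def max_stack_right(n_items):
    """Simulate `list : ITEM list | ;` (right recursion) on n_items ITEM tokens."""
    stack = []
    deepest = 0
    for _ in range(n_items):
        stack.append("ITEM")      # every ITEM must be shifted first
        deepest = max(deepest, len(stack))
    stack.append("list")          # the empty rule is reduced only at end of input
    deepest = max(deepest, len(stack))
    while len(stack) > 1:
        del stack[-2:]            # reduce `ITEM list` ...
        stack.append("list")      # ... to `list`, unwinding from the right
    return deepest


if __name__ == "__main__":
    for n in (0, 10, 1000):
        print(n, max_stack_left(n), max_stack_right(n))
```

In this simulation the left-recursive form keeps the stack at a constant depth, while the right-recursive form needs a stack proportional to the list length before its first non-empty reduction.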
0
votes
1 answer

How lookaheads are propagated in "channel" method of building LALR parser?

The method is described in the Dragon Book; however, I read about it in "Parsing Techniques" by D. Grune and C. J. H. Jacobs. I start from my understanding of building channels for the NFA: channels are built once, they are like water channels with…
greenoldman
  • 1,536
  • 1
  • 14
  • 27