Analyzing (un)structured data to convert it into a structured, normalized format.
Questions tagged [parsing]
297 questions
27
votes
5 answers
Name for this type of parser, OR why it doesn't exist
Conventional parsers consume their entire input and produce a single parse tree. I'm looking for one that consumes a continuous stream and produces a parse forest [edit: see discussion in comments regarding why this use of that term may be…

Kevin Krumwiede
- 2,596
9
votes
5 answers
What is the responsibility or benefit of a Tokenizer?
Suppose I had a grammar like:
object
    { members }
members
    pair
pair
    string : value
value
    number
    string
string
    " chars "
chars
    char
    char chars
number
    digit
    digit number
I could parse the following…

Johannes
- 336
7
votes
3 answers
Proper separation between lexing and parsing
I am currently writing a parser which, given a source file, turns it into an AST of some language, respecting the idiomatic process of lexing and then parsing using well-known parser generators (think lex and yacc). However, I am unsure as to how to…

ThreeFx
- 199
5
votes
1 answer
How to add precedence to LALR parser like in YACC?
Please note, I am asking about writing an LALR parser, not writing rules for an LALR parser.
What I need is...
...to mimic YACC precedence definitions. I don't know how it is implemented, and below I describe what I've done and read so far.
For now I…

greenoldman
- 1,536
3
votes
3 answers
Generating an AST directly vs. converting from a CST
As I understand it, some parsers generate an abstract syntax tree on the fly, while others first generate a concrete syntax tree and then convert it. What are the tradeoffs between the two? Is there some way to tell what will be easier given a…

Jess
- 173
3
votes
1 answer
How to parse different number types with LALR(1)
Consider an LALR(1) parser for a file format that allows integer numbers and floating point numbers.
As usual, something like 42 shall be a valid integer and a valid float (with some automagic conversion in the background).
There might be parsing…

Martin
- 476
3
votes
2 answers
How to get lookahead symbol when constructing LR(1) NFA for parser?
I am reading an explanation (the awesome "Parsing Techniques" by D. Grune and C. J. H. Jacobs; p. 292 in the 2nd edition) about how to construct an LR(1) parser, and I am at the stage of building the initial NFA. What I don't understand is how to get/compute…

greenoldman
- 1,536
2
votes
2 answers
Is recursive-descent parsing a panacea for DoS threats posed by 'Evil' regexes? Or does evilness stem from the grammar?
ReDoS attacks exploit characteristics of some (otherwise useful) regular expressions ... essentially causing an explosion of possible paths through the graph defined by the NFA.
So does using a recursive-descent parser (such as ANTLR) necessarily…

David Bullock
- 189
1
vote
2 answers
Extracting useful information from free text
We filter and analyse seats for events. Apparently writing a domain query language for the floor people isn't an option. I'm using C# 4.0 & .NET 4.0, and have relatively free rein to use whatever open-source tools are available. …

Bryan Boettcher
- 2,774
1
vote
1 answer
How to read reduce/shift conflicts in LR(1) DFA?
I am reading an explanation (the awesome "Parsing Techniques" by D. Grune and C. J. H. Jacobs; p. 293 in the 2nd edition) and I have moved forward from my last question: How to get lookahead symbol when constructing LR(1) NFA for parser?
Now I have such…

greenoldman
- 1,536
0
votes
1 answer
What algorithm to use to parse expressions in this simple language?
I don't want to include the entirety of my language, but I think the relevant part is:
::=
|
|
|
|
|
| …

NeomerArcana
- 361
0
votes
0 answers
Parsing nested structure from facebook pfff - how to decide on parser
I've been playing around for a while with building a little open-source project based on the output of Facebook's pfff tool, which can parse multiple programming languages and output some kind of unified AST.
For example, given a simple PHP file like…

BMBM
- 337
0
votes
1 answer
Rule order for Parsing Lists with LALR(1)
When creating the grammar for parsing a list (something like “ITEM*”) with an LALR(1) parser, this can basically be done in two ways:
list
    : list ITEM
    |
    ;
or
list
    : ITEM list
    |
    ;
What are the pros and cons of these two…

Martin
- 476
0
votes
1 answer
How lookaheads are propagated in "channel" method of building LALR parser?
The method is described in the Dragon Book; however, I read about it in "Parsing Techniques" by D. Grune and C. J. H. Jacobs.
I start from my understanding of building channels for NFA:
channels are built once, they are like water channels with…

greenoldman
- 1,536