I'm trying to implement a matcher for regular expressions containing backreferences, such as:
([a-c])x\1
which matches axa, bxb and cxc, but nothing else.
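For comparison, the naive approach is a backtracking matcher. Below is a minimal sketch of one, assuming the pattern is already parsed into a tiny AST. There is no parser, and the names `Lit`, `Cls`, `Star`, `Group`, `Backref` and `fullmatch` are hypothetical. Captures are passed along through continuations, so a branch that fails automatically discards the captures it recorded. The worst case is exponential. That is expected, because matching with backreferences is NP-complete in general.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical AST node types for a tiny regex subset (no parser included).
@dataclass
class Lit:
    ch: str  # literal text

@dataclass
class Cls:
    chars: str  # allowed characters, e.g. "abc" for [a-c]

@dataclass
class Star:
    body: "Node"  # greedy Kleene star over a single node

@dataclass
class Group:
    index: int  # capture group number, as used by \1, \2, ...
    body: List["Node"]

@dataclass
class Backref:
    index: int  # refers to a previously captured Group

Node = Union[Lit, Cls, Star, Group, Backref]
Caps = Dict[int, str]
Cont = Callable[[int, Caps], bool]  # "match the rest from here" continuation


def match_seq(nodes: List[Node], s: str, pos: int, caps: Caps, k: Cont) -> bool:
    """Match nodes in order, then call k; returning False triggers backtracking."""
    if not nodes:
        return k(pos, caps)
    rest = nodes[1:]
    return match_node(nodes[0], s, pos, caps,
                      lambda p, c: match_seq(rest, s, p, c, k))


def match_node(node: Node, s: str, pos: int, caps: Caps, k: Cont) -> bool:
    if isinstance(node, Lit):
        return s.startswith(node.ch, pos) and k(pos + len(node.ch), caps)
    if isinstance(node, Cls):
        return pos < len(s) and s[pos] in node.chars and k(pos + 1, caps)
    if isinstance(node, Star):
        # Greedy: try one more iteration first, fall back to stopping here.
        def again(p: int, c: Caps) -> bool:
            # Refuse empty iterations so the star cannot loop forever.
            return p > pos and match_node(node, s, p, c, k)
        return match_node(node.body, s, pos, caps, again) or k(pos, caps)
    if isinstance(node, Group):
        start = pos
        # When the group body ends, record its text in a fresh dict and go on.
        def close(p: int, c: Caps) -> bool:
            return k(p, {**c, node.index: s[start:p]})
        return match_seq(node.body, s, pos, caps, close)
    if isinstance(node, Backref):
        text = caps.get(node.index)
        if text is None:
            return False  # unset group: treated as failure here (engines differ)
        # The backreference must match exactly the text the group captured.
        return s.startswith(text, pos) and k(pos + len(text), caps)
    raise TypeError(f"unknown node {node!r}")


def fullmatch(pattern: List[Node], s: str) -> bool:
    """True if the pattern matches the whole string s."""
    return match_seq(pattern, s, 0, {}, lambda p, c: p == len(s))


# ([a-c])x\1
PATTERN = [Group(1, [Cls("abc")]), Lit("x"), Backref(1)]
# (a*)b\1, where the star forces real backtracking
PATTERN_STAR = [Group(1, [Star(Lit("a"))]), Lit("b"), Backref(1)]
```

With this sketch, `fullmatch(PATTERN, "bxb")` is true and `fullmatch(PATTERN, "axb")` is false. For `PATTERN_STAR`, the group first grabs as many `a`s as it can and gives them back one at a time when the rest of the pattern fails.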
I've seen a number of posts on the theory of which class of languages these regexes describe. However, I couldn't find concrete implementation details beyond the fact that the matchers for context-sensitive languages are linear bounded automata.
Could you point me to some resources on implementing such matchers?
UPDATE: My best reference so far is "Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions".
PS: Most modern programming languages (Perl, Java, C#, etc.) support these regex features in their standard libraries. I'd rather not start there, though, because I believe those implementations are quite terse and hard to follow.