
It is well known that a "regex" with back-references is no longer a regular expression in the formal sense. For instance, (.*)\1 matches any string repeated twice, and that language is not regular. However, is it possible to construct a DFA for a regex with lookaround?

I found this question and particularly this answer, which says

Look-ahead and look-behind are nothing special in the world of finite automata as we only match whole inputs here. Therefore, the special semantic of "just check but don't consume" is meaningless; you just concatenate and/or intersect checking and consuming expressions and use the resulting automata.

(emphasis mine)

However, I'm not quite sure how to intersect the checking and matching automata. For simple cases such as foo(?=bar), I can just concatenate them (foobar). But this won't work for more complex cases such as foo(?=bar)(?=baz): since lookahead will not consume the input, what follows foo must match both bar and baz.

This answer suggests using an alternating finite automaton, but after reading through the tremendously unhelpful Wikipedia page, I am still confused about how this helps turn lookarounds into a plain old DFA.

user12986714
    None of your sources contains a definition of lookahead or lookbehind, making it difficult to answer the question. In particular, I'm not sure how to interpret more than one of each. – Yuval Filmus Dec 25 '21 at 09:18

1 Answer


Lookaround assertions do not extend the expressive power of regular expressions: the languages they describe are still regular languages. See [1] for a proof. There is also a reference to the paper [2], where this is proved by conversion to a DFA.
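To make the "intersect" step from the question concrete, here is a minimal sketch of the standard product construction in Python. It is not the construction from [1] or [2]; it only covers the simple case where several lookaheads sit at the same position, so that each lookahead (?=w) can be read as the language w.* on the remaining input. The names ALPHABET, prefix_dfa, intersect and accepts are hypothetical helpers invented for this illustration, and the four-letter alphabet is an assumption chosen to keep the automata small.

```python
import re

ALPHABET = "abrz"  # assumed toy alphabet; other characters are always rejected


def prefix_dfa(word, alphabet=ALPHABET):
    """DFA for word.* : every string that starts with `word`.
    Missing transitions stand for an implicit dead (rejecting) state."""
    n = len(word)
    delta = {(i, word[i]): i + 1 for i in range(n)}  # walk along the prefix
    for ch in alphabet:
        delta[(n, ch)] = n                           # then loop on anything
    return {"start": 0, "accept": {n}, "delta": delta}


def intersect(a, b, alphabet=ALPHABET):
    """Product construction: run both DFAs in lockstep on pairs of states.
    A pair accepts only if both components accept."""
    start = (a["start"], b["start"])
    delta, accept, todo, seen = {}, set(), [start], {start}
    while todo:
        state = todo.pop()
        p, q = state
        if p in a["accept"] and q in b["accept"]:
            accept.add(state)
        for ch in alphabet:
            if (p, ch) in a["delta"] and (q, ch) in b["delta"]:
                nxt = (a["delta"][(p, ch)], b["delta"][(q, ch)])
                delta[(state, ch)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return {"start": start, "accept": accept, "delta": delta}


def accepts(dfa, text):
    """Run the DFA on the whole input (whole-input matching)."""
    state = dfa["start"]
    for ch in text:
        if (state, ch) not in dfa["delta"]:
            return False
        state = dfa["delta"][(state, ch)]
    return state in dfa["accept"]


# What may follow foo in foo(?=bar)(?=baz): nothing, the product has no
# reachable accepting state.
both = intersect(prefix_dfa("bar"), prefix_dfa("baz"))

# A satisfiable pair of lookaheads, cross-checked against Python's re module.
ok = intersect(prefix_dfa("ba"), prefix_dfa("bar"))
for s in ["", "ba", "bar", "baz", "barz", "bara"]:
    assert accepts(ok, s) == bool(re.fullmatch(r"(?=ba)(?=bar).*", s))
```

For the question's example, both["accept"] is empty, which matches the observation that nothing can start with both bar and baz. To handle foo(?=X)(?=Y)Z under these assumptions, one would intersect the DFAs for X.*, Y.* and Z and concatenate the result after a DFA for foo. Lookaheads nested inside stars or alternations need the more general treatment in the cited papers.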

[1] Berglund M, van der Merwe B, van Litsenborgh S (2021) Regular Expressions with Lookahead. JUCS - Journal of Universal Computer Science 27(4): 324-340. https://doi.org/10.3897/jucs.66330

[2] Takayuki Miyazaki, Yasuhiko Minamide, Derivatives of Regular Expressions with Lookahead, Journal of Information Processing, 2019, Volume 27, Pages 422-430, Released on J-STAGE June 15, 2019, Online ISSN 1882-6652, https://doi.org/10.2197/ipsjjip.27.422