14

Given two arbitrary regular expressions, is there an "efficient" algorithm to determine whether they match the same set of strings?

More generally, can we compute the size of the intersection of the two match sets?

What algorithms are there to do this, and what complexity class do they live in?

If we disallow the Kleene star, does that alter the picture at all?

Juho
  • What do you mean by the "size of the intersection"? In most interesting cases, it will be infinitely large; are you interested in sizes w.r.t. $\Sigma^n$? – Raphael May 28 '13 at 07:14
  • @Raphael My understanding is that eliminating the Kleene star forces the size of the set to be finite. – MathematicalOrchid May 28 '13 at 17:22
  • Depends. What other operators are allowed? If you allow complementation, what you say is not true. Also, you ask for the situation with Kleene star, too, so you need to clarify anyway. – Raphael May 28 '13 at 17:43
  • See also http://cs.stackexchange.com/q/12624/755 – D.W. Dec 18 '15 at 04:04

2 Answers

15

Hendrik Jan gives a good answer for complexity class, but not an algorithm itself.

The simplest algorithm to do this that I know of is to convert the regular expression to a DFA. There are known techniques for converting a regular expression to an NFA, and an NFA to a DFA.
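As a rough illustration of those two standard steps, here is a minimal sketch in Python. It assumes a toy regex syntax with only single-character literals, `|`, `*`, and parentheses, and it assumes the pattern is well formed. The function names (`regex_to_nfa`, `nfa_to_dfa`, `dfa_accepts`) and the tuple layout of the automata are made up for this example, not taken from any library. The first function is a Thompson-style construction, the second is the subset construction.

```python
from itertools import count

def regex_to_nfa(pattern):
    """Thompson-style construction. Returns (start, accept, eps, delta)."""
    fresh = count()
    eps = {}    # state -> set of states reachable by one epsilon move
    delta = {}  # (state, symbol) -> set of successor states
    pos = 0     # current read position in the pattern

    def new_state():
        s = next(fresh)
        eps[s] = set()
        return s

    def alternation():
        nonlocal pos
        start, end = concatenation()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            s2, e2 = concatenation()
            s, e = new_state(), new_state()
            eps[s] |= {start, s2}
            eps[end].add(e)
            eps[e2].add(e)
            start, end = s, e
        return start, end

    def concatenation():
        start = end = new_state()  # empty fragment matches the empty string
        while pos < len(pattern) and pattern[pos] not in "|)":
            s2, e2 = starred()
            eps[end].add(s2)
            end = e2
        return start, end

    def starred():
        nonlocal pos
        start, end = atom()
        while pos < len(pattern) and pattern[pos] == "*":
            pos += 1
            s, e = new_state(), new_state()
            eps[s] |= {start, e}      # enter the loop or skip it
            eps[end] |= {start, e}    # repeat the loop or leave it
            start, end = s, e
        return start, end

    def atom():
        nonlocal pos
        ch = pattern[pos]
        pos += 1
        if ch == "(":
            frag = alternation()
            pos += 1  # skip the closing ")"
            return frag
        s, e = new_state(), new_state()
        delta.setdefault((s, ch), set()).add(e)
        return s, e

    start, accept = alternation()
    return start, accept, eps, delta

def nfa_to_dfa(nfa, alphabet):
    """Subset construction. Returns (start, accepting, trans), a total DFA."""
    start, accept, eps, delta = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in eps[stack.pop()]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return frozenset(seen)

    d_start = closure({start})
    trans, todo, seen = {}, [d_start], {d_start}
    while todo:
        S = todo.pop()
        for a in alphabet:
            moved = set()
            for s in S:
                moved |= delta.get((s, a), set())
            T = closure(moved)  # the empty frozenset acts as the dead state
            trans[S, a] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    accepting = {S for S in seen if accept in S}
    return d_start, accepting, trans

def dfa_accepts(dfa, word):
    """Run the DFA on a word whose symbols all come from the alphabet."""
    state, accepting, trans = dfa
    for ch in word:
        state = trans[state, ch]
    return state in accepting

dfa = nfa_to_dfa(regex_to_nfa("(a|b)*abb"), "ab")
print(dfa_accepts(dfa, "aababb"), dfa_accepts(dfa, "ab"))
```

Each DFA state here is a frozenset of NFA states, which is why the result can have up to $2^n$ states for an NFA with $n$ states.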

Once you have two DFAs, testing them for equivalence is efficient (polynomial in their size): minimize both and check that the results are isomorphic, since the minimal DFA for a language is unique up to isomorphism.
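For illustration, here is a small Python sketch of an equivalence test on two DFAs. Note that it does not minimize anything: it swaps in a different but also standard technique, exploring pairs of states of the two automata in lockstep and looking for a reachable pair where exactly one side accepts. The DFA encoding `(start, accepting, trans)` and the name `distinguishing_word` are assumptions made for this example, and both DFAs are assumed to be total over the same alphabet.

```python
from collections import deque

def distinguishing_word(dfa1, dfa2, alphabet):
    """Return a shortest word accepted by exactly one DFA, or None if equivalent.

    Each DFA is (start, accepting_set, trans) with trans[(state, symbol)]
    defined for every state and every symbol of the alphabet.
    """
    start1, acc1, trans1 = dfa1
    start2, acc2, trans2 = dfa2
    queue = deque([(start1, start2, "")])
    seen = {(start1, start2)}
    while queue:
        p, q, word = queue.popleft()
        if (p in acc1) != (q in acc2):
            return word  # this word separates the two languages
        for a in alphabet:
            nxt = (trans1[p, a], trans2[q, a])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((*nxt, word + a))
    return None  # no reachable pair disagrees, so the languages are equal

def counter_dfa(modulus, accepting):
    """Hypothetical helper: DFA over {a, b} counting a's modulo `modulus`."""
    trans = {}
    for i in range(modulus):
        trans[i, "a"] = (i + 1) % modulus
        trans[i, "b"] = i
    return 0, set(accepting), trans

even = counter_dfa(2, {0})     # even number of a's
mod4 = counter_dfa(4, {0, 2})  # same language, non-minimal DFA
mod3 = counter_dfa(3, {0})     # number of a's divisible by 3

print(distinguishing_word(even, mod4, "ab"))  # None
print(distinguishing_word(even, mod3, "ab"))  # aa
```

The search visits at most one pair per combination of states, so it runs in time proportional to the product of the two DFA sizes times the alphabet size. A side benefit over comparing minimal forms is that it yields a concrete counterexample when the languages differ.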

However, constructing these DFAs from NFAs can take a long time and produce extremely large DFAs: exponentially large in the worst case.

Joey Eremondi
  • I wish you had actually explained how to do it and what the process and result of converting a regex to a DFA/NFA looks like. –  Nov 17 '21 at 09:53
  • @Boris Fair enough, but converting a regex to an NFA is a standard algorithm in automata theory, that an introductory text should explain, as is converting an NFA to a DFA. It seemed orthogonal to the question at hand. – Joey Eremondi Nov 17 '21 at 17:35
14

Equivalence of regular expressions is known to be PSPACE-complete, which is rather bad. The paper "Complexity of Decision Problems for Simple Regular Expressions" lists several subclasses of regular expressions with their respective complexities.

Hendrik Jan