Determining a minimal-state regular language $A$ such that $A \cap B = C$ given regular languages $B$ and $C$

Question

Suppose I have a regular language $B$ and a regular language $C$ such that $C \subseteq B$.

How do I find a regular language $A$ such that $A \cap B = C$, where $A$ is represented by a DFA with as few states as possible?

I don't really know where to start looking for research on this, so even pointers to some work in this area would be appreciated.

The motivating case for this general question is building state machines to recognize certain regular expressions within UTF-8 streams, where I can rely on the UTF-8 stream being well-formed. (Or rather, my engine need not return correct values for malformed UTF-8 streams.)

I'm quite familiar with the named theorem; however, I don't see how to apply it here. Did you have something in particular in mind? — Daniel Martin, Oct 05 '16 at 01:06

score 5 · Accepted Answer · edited Apr 13 '17 at 12:48

This question is essentially answered here. Since $C \subseteq B$, the condition $A \cap B = C$ holds exactly when $A$ contains all words from $C$ and no words from $B \setminus C$, so you are asking for the smallest DFA whose language has that property. This is called a minimal separating DFA (for $C$ and $B \setminus C$) in the first answer to the linked question, which also gives a link to a paper with an algorithm. The other answer in that thread gives a reference to a paper showing that the problem is NP-hard. Since the size of a minimal separating automaton is bounded (by the product of the DFA sizes for $B$ and $C$), the decision version of the problem is in NP and therefore NP-complete.
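To make the condition concrete, here is a minimal brute-force sketch in Python. It is not the algorithm from the linked paper. It enumerates candidate DFAs by increasing state count and checks $A \cap B = C$ on the reachable part of the triple product automaton. The DFA encoding (a transition table plus a set of accepting states, start state 0) and the names `separates` and `smallest_separator` are assumptions made for illustration. The search stops at the size of the DFA for $C$, since $A = C$ always satisfies the condition when $C \subseteq B$.

```python
from itertools import product

# Assumed encoding: a DFA is (delta, accepting), where delta[state][symbol]
# is the next state, symbols are indices 0..k-1, and the start state is 0.

def separates(a, b, c, k):
    """Return True iff L(a) & L(b) == L(c), by exploring the product automaton."""
    (da, fa), (db, fb), (dc, fc) = a, b, c
    seen = {(0, 0, 0)}
    stack = [(0, 0, 0)]
    while stack:
        p, q, r = stack.pop()
        # Some word reaches this triple; it must be in A and B exactly when it is in C.
        if ((p in fa) and (q in fb)) != (r in fc):
            return False
        for s in range(k):
            nxt = (da[p][s], db[q][s], dc[r][s])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True

def smallest_separator(b, c, k):
    """Exhaustively search for a DFA A with the fewest states such that A & B == C.

    Assumes L(c) is a subset of L(b). Exponential time, so only for tiny inputs.
    """
    for n in range(1, len(c[0]) + 1):
        for flat in product(range(n), repeat=n * k):
            delta = [flat[i * k:(i + 1) * k] for i in range(n)]
            for mask in range(2 ** n):
                acc = {i for i in range(n) if mask >> i & 1}
                if separates((delta, acc), b, c, k):
                    return delta, acc
    raise AssertionError("unreachable when L(c) is a subset of L(b)")

# Example over the alphabet {a, b}, encoded as a = 0 and b = 1.
# B: words containing only a's (state 1 is a dead state).
B = ([(0, 1), (1, 1)], {0})
# C: words containing only a's, of even length (state 2 is a dead state).
C = ([(1, 2), (0, 2), (2, 2)], {0})

delta, acc = smallest_separator(B, C, 2)
print(len(delta), "states:", delta, "accepting:", sorted(acc))
```

In this example the minimal DFA for $C$ needs three states, but two states suffice for $A$ (for instance, "even number of a's"), because $A$ is free to behave arbitrarily on words outside $B$. The exhaustive enumeration is only practical for very small automata, which is consistent with the hardness result mentioned above.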
