Suppose I have a regular language $B$ and a regular language $C$ such that $C \subseteq B$.
How do I find a regular language $A$ such that $A \cap B = C$ and the minimal DFA for $A$ has as few states as possible?
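To make the constraint concrete: $A \cap B = C$ holds exactly when $A$ accepts every word of $C$, rejects every word of $B \setminus C$, and is free to do anything on words outside $B$. Below is a minimal Python sketch of that observation, not a solution to the minimization itself. It assumes complete DFAs encoded as plain dicts; the encoding, the function name `classify_product_states`, and the toy languages are illustrative choices of mine, not taken from any library.

```python
from collections import deque


def classify_product_states(alphabet, dfa_b, dfa_c):
    """Label each reachable state of the product of the DFAs for B and C.

    "accept"    : words reaching this state are in C, so A must accept them
    "reject"    : words reaching this state are in B but not C, so A must reject
    "dont_care" : words reaching this state are outside B, so A is unconstrained
    """
    start = (dfa_b["start"], dfa_c["start"])
    labels = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in labels:
            continue  # already visited
        qb, qc = state
        in_b = qb in dfa_b["accepting"]
        in_c = qc in dfa_c["accepting"]
        if in_b and in_c:
            labels[state] = "accept"
        elif in_b:
            labels[state] = "reject"
        else:
            # Since C is a subset of B, a word outside B is also outside C.
            labels[state] = "dont_care"
        for sym in alphabet:
            queue.append((dfa_b["delta"][qb, sym], dfa_c["delta"][qc, sym]))
    return labels


# Toy example (hypothetical): B = a*, C = even-length strings of a's.
ALPHABET = ("a", "b")
DFA_B = {
    "start": "ok",
    "accepting": {"ok"},
    "delta": {
        ("ok", "a"): "ok", ("ok", "b"): "dead",
        ("dead", "a"): "dead", ("dead", "b"): "dead",
    },
}
DFA_C = {
    "start": "even",
    "accepting": {"even"},
    "delta": {
        ("even", "a"): "odd", ("even", "b"): "dead",
        ("odd", "a"): "even", ("odd", "b"): "dead",
        ("dead", "a"): "dead", ("dead", "b"): "dead",
    },
}

LABELS = classify_product_states(ALPHABET, DFA_B, DFA_C)
```

In this toy case the minimal complete DFA for $C$ has three states (even, odd, dead), but the only product state that needs the dead state is a don't-care. Taking $A$ to be "all strings of even length" over $\{a, b\}$ gives a two-state DFA with $A \cap B = C$, which is the kind of saving I am after in general.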
I don't really know where to start looking for research on this, so even pointers to some work in this area would be appreciated.
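The motivating case for this general question is building state machines that recognize certain regular expressions within UTF-8 streams, where I can rely on the stream being well-formed. (More precisely, my engine need not return correct results for malformed UTF-8 streams.) In the notation above, $B$ would be the set of well-formed UTF-8 byte sequences and $C$ the subset of those that match the expression.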