
I've come across this problem in my studies, and I've abstracted it to the more general case here.

Given a finite alphabet, what is a regular expression that matches all strings over the alphabet, except one particular finite substring?

As an example:

Given $\Sigma = \{a, b, c\}$

What is a regular expression that matches all strings over $\Sigma$ except the substring $ba$?

What I really want is simply $\Sigma^* - ba$.

Raphael
user2666425
  • In your question you state "except one particular finite sub string", where "sub" suggests that you are interested in $\Sigma^* - \Sigma^* ba \Sigma^*$ instead? – Hendrik Jan Sep 12 '13 at 13:21
  • @HendrikJan Yes, that is what I rather meant to say, sorry about that. – user2666425 Sep 12 '13 at 21:12
  • What have you tried? Have you looked at any comparable example, e.g. via the regular-expressions tag? Can you give an NFA and convert it? – Raphael Sep 16 '13 at 07:18

2 Answers


Draw the minimal complete DFA accepting your string, swap its final and non-final states to get the complement, and then convert this new DFA to a regular expression.

In your case, you will get $\mathcal{A} = (Q, A, \cdot, 1, F)$ with $Q = \{0, 1, 2, 3\}$, $A = \{a, b, c \}$, $F = \{3\}$ and $1 \cdot b = 2$, $2 \cdot a = 3$ and $q \cdot x = 0$ for all other transitions. Thus the automaton for the complement is $\mathcal{A}' = (Q, A, \cdot, 1, F')$ with $F' = \{0, 1, 2\}$.
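As a minimal sketch in Python (the helper names `step` and `accepts` are hypothetical), $\mathcal{A}$ and $\mathcal{A}'$ can be simulated directly. Swapping final and non-final states yields the complement only because $\mathcal{A}$ is complete, which is what the sink state $0$ guarantees. The loop checks both automata by brute force on every word of length at most 5, a bound chosen arbitrarily for illustration.

```python
from itertools import product

ALPHABET = "abc"

def step(q: int, x: str) -> int:
    """Transition of A: 1·b = 2, 2·a = 3, every other transition goes to sink 0."""
    if q == 1 and x == "b":
        return 2
    if q == 2 and x == "a":
        return 3
    return 0

def accepts(final: set[int], word: str) -> bool:
    """Run the DFA from initial state 1; accept if it stops in a final state."""
    q = 1
    for x in word:
        q = step(q, x)
    return q in final

Q = {0, 1, 2, 3}
F = {3}              # A accepts only "ba"
F_complement = Q - F  # A' swaps final and non-final states: {0, 1, 2}

# Brute-force check on all words of length <= 5 (illustrative bound only).
for n in range(6):
    for letters in product(ALPHABET, repeat=n):
        w = "".join(letters)
        assert accepts(F, w) == (w == "ba")
        assert accepts(F_complement, w) == (w != "ba")
```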

Converting $\mathcal{A}'$ to a regular expression gives $$ 1 + b + (a + c + bb + bc + baA)A^* $$
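One hedged way to sanity-check the expression $1 + b + (a + c + bb + bc + baA)A^*$ read off from $\mathcal{A}'$ is to translate it into Python's `re` syntax and compare it by brute force with the definition $\Sigma^* - \{ba\}$. The translation is an assumption made for this sketch: $1$ becomes an empty alternative, $+$ becomes `|`, and $A$ becomes the class `[abc]`. Words beginning with $a$, $c$, $bb$ or $bc$ fall into the sink state $0$, which is final in $\mathcal{A}'$, so they must be matched.

```python
import re
from itertools import product

# Assumed translation of 1 + b + (a + c + bb + bc + baA)A* into re syntax:
# empty alternative for 1, "|" for +, [abc] for A.
COMPLEMENT_OF_BA = re.compile(r"|b|(?:a|c|bb|bc|ba[abc])[abc]*")

def in_language(word: str) -> bool:
    # fullmatch anchors the pattern to the whole word
    return COMPLEMENT_OF_BA.fullmatch(word) is not None

# Compare against the definition on all words of length <= 6.
for n in range(7):
    for letters in product("abc", repeat=n):
        w = "".join(letters)
        assert in_language(w) == (w != "ba"), w
```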

J.-E. Pin
  • I have not seen this notation of $1$ in a regular expression. And is $+$ equivalent to the union operation? – user2666425 Sep 12 '13 at 21:14
  • This is the (quite common) algebraic notation for regular expressions. Union is denoted by + and 1 denotes the empty word, which is the neutral element for the concatenation product: for all words u, 1u = u = u1. – J.-E. Pin Sep 12 '13 at 23:09
  • The regular expression you gave contains $baA$. So doesn't the language it represents obviously contain strings with $ba$ as a substring? Furthermore, with $A^*$, won't you get $ba$, $bba$, and so on? – user2666425 Sep 12 '13 at 23:43
  • By definition, $A = a + b + c$ and hence $baA = baa + bab + bac$. Thus $baAA^*$ is the set of all words of the form $baau$, $babu$ or $bacu$ for some word $u$. None of these words is equal to $ba$. – J.-E. Pin Sep 13 '13 at 07:02
  • But they contain $ba$ as a substring. I rather meant to imply in my question that I want a regular expression for all strings over $\Sigma$ except those containing $ba$ as a substring. – user2666425 Sep 13 '13 at 17:10
  • The last sentence of your question clearly states "What I really want is simply $\Sigma^* - ba$" and I tried to answer this question. If the question you had in mind was different, it would be better to ask another question. – J.-E. Pin Sep 13 '13 at 20:26
  • You did answer that question. Sorry for the misunderstanding! – user2666425 Sep 14 '13 at 00:25

One way of finding such a regular expression is $$ \epsilon+a+b+c+aa+ab+ac+bb+bc+ca+cb+cc+(a+b+c)(a+b+c)(a+b+c)(a+b+c)^*. $$ There might be more succinct solutions, but this approach always works when the excluded set is any finite language.

Yuval Filmus
  • I think the OP means $\Sigma^* - \Sigma^* ba \Sigma^*$, as Hendrik mentioned in his comment; otherwise, as you write, excluding that string is enough. –  Sep 12 '13 at 13:30
  • I don't see a "way" here, only a final result. – Raphael Sep 16 '13 at 07:18
  • @Raphael I'm sure you can generalize this. If the longest omitted word has length $\ell$, I list all other words of length at most $\ell$ and add $\Sigma^{\ell+1} \Sigma^*$. – Yuval Filmus Sep 16 '13 at 15:01
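The generalization in the last comment can be sketched as a small generator. This assumes Python's `re` syntax and a nonempty alphabet of single characters; `complement_of_finite` is a hypothetical name. It lists every word of length at most $\ell$ that is not excluded, where $\ell$ is the length of the longest excluded word, and appends $\Sigma^{\ell+1}\Sigma^*$.

```python
import re
from itertools import product

def complement_of_finite(alphabet: str, excluded: set[str]) -> str:
    """Hypothetical helper: re pattern for Sigma* minus a finite set of words.

    Assumes a nonempty alphabet of single characters.
    """
    ell = max((len(w) for w in excluded), default=-1)  # longest excluded word
    sigma = "[" + "".join(re.escape(x) for x in alphabet) + "]"
    alternatives = []
    # Every word of length <= ell that is not excluded (the empty word
    # becomes an empty alternative).
    for n in range(ell + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if w not in excluded:
                alternatives.append(re.escape(w))
    # Sigma^(ell+1) Sigma*: every word longer than ell.
    alternatives.append(sigma * (ell + 1) + sigma + "*")
    return "|".join(alternatives)

# Usage: the example from the answer above.
pattern = complement_of_finite("abc", {"ba"})
# pattern == "|a|b|c|aa|ab|ac|bb|bc|ca|cb|cc|[abc][abc][abc][abc]*"
compiled = re.compile(pattern)
for n in range(6):
    for letters in product("abc", repeat=n):
        w = "".join(letters)
        assert (compiled.fullmatch(w) is not None) == (w != "ba")
```

The number of alternatives grows roughly like $|\Sigma|^{\ell}$, so this is only practical for short excluded words.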