
I'm studying for my exam and came across the following question from last year's exam. The only way I know to solve it is to build a regex that accounts for all six possible orderings of the letters. For example, to recognize a string in which the letters a, b and c occur in that order:

$(a+b+c)^*a(a+b+c)^*b(a+b+c)^*c$
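As a quick sanity check, here is a minimal Python sketch (the language is my choice; nothing in the question fixes one) that builds the six terms in the style above, one per ordering of a, b, c, and compares their union against the definition on every string up to length 7. Python's `re` writes union as `|` and $(a+b+c)^*$ as `[abc]*`. The length bound 7 and the helper names are arbitrary assumptions, so this is a finite check, not a proof.

```python
import itertools
import re

ALPHABET = "abc"
ANY = f"[{ALPHABET}]*"  # Python spelling of (a+b+c)*


def op_style_union():
    """One term per ordering x, y, z of the letters, written like the question's
    (a+b+c)* x (a+b+c)* y (a+b+c)* z, i.e. with no trailing (a+b+c)*."""
    terms = [ANY + ANY.join(p) for p in itertools.permutations(ALPHABET)]
    return "|".join(f"(?:{t})" for t in terms)  # '|' plays the role of '+'


def contains_all(w):
    """The target language: w contains each letter of the alphabet at least once."""
    return set(ALPHABET) <= set(w)


def all_strings(max_len):
    """Every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=n):
            yield "".join(chars)


pattern = re.compile(op_style_union())
mismatches = [w for w in all_strings(7) if bool(pattern.fullmatch(w)) != contains_all(w)]
print(len(mismatches))  # 0
```

Leaving off the trailing $(a+b+c)^*$ is harmless for the union: whatever letter a string ends with can play the role of the last letter of some term, and the first occurrences of the other two letters necessarily come before it.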

The question: Give a regular expression r over the alphabet A = {a, b, c} such that the language determined by r consists of all strings that contain at least one occurrence of each symbol in A. Briefly explain your answer.

  • I think an alternation of six terms of the form $a(a)^*b(a+b)^*c(a+b+c)^*$ is easier to explain. BTW: What has been your question? – greybeard Dec 28 '19 at 16:44
  • @greybeard: Yeah, I belatedly got that. I deleted my comment while thinking about how to prove that formulation in a way that justifies "easier to explain", since it seems to me that OP's formulation requires very little explanation at all. – rici Dec 30 '19 at 07:17
  • @greybeard: The big advantage of yours, it seems to me, is not ease of explanation but rather the fact that it leads to a deterministic grammar. OP's expression is a classic example of exponential state blowup of the standard regex->NFA->DFA algorithm. – rici Dec 30 '19 at 07:32
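One way to see why the alternative from the comments behaves deterministically: read each term, e.g. $a(a)^*b(a+b)^*c(a+b+c)^*$, as fixing the order in which the letters occur for the first time. Then every string in the language should match exactly one of the six terms, and every other string none. The Python sketch below checks that property by brute force up to length 7; the helper names and the length bound are my own assumptions, not part of the comments.

```python
import itertools
import re

ALPHABET = "abc"


def first_occurrence_term(p):
    """Term for the order p of first occurrences, e.g. p = ('a', 'b', 'c') gives
    a[a]*b[ab]*c[abc]*, the Python spelling of a(a)*b(a+b)*c(a+b+c)*."""
    parts = []
    for i, letter in enumerate(p):
        allowed = "".join(p[: i + 1])  # only letters already seen may follow
        parts.append(f"{letter}[{allowed}]*")
    return "".join(parts)


TERMS = [re.compile(first_occurrence_term(p)) for p in itertools.permutations(ALPHABET)]


def matching_terms(w):
    """How many of the six terms match the whole string w."""
    return sum(1 for t in TERMS if t.fullmatch(w))


def all_strings(max_len):
    for n in range(max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=n):
            yield "".join(chars)


# Strings containing a, b and c should match exactly one term; all others none.
ok = all(
    matching_terms(w) == (1 if set(ALPHABET) <= set(w) else 0)
    for w in all_strings(7)
)
print(ok)  # True
```

Because the terms are mutually exclusive, a matcher never has to guess which alternative to pursue: the letters read so far already determine it.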

2 Answers


Your solution looks good to me, and it is probably what they expect of you.

It is interesting to consider the more general question: how large must a regular expression for this language be, as a function of the size of the alphabet? Denoting the size of the alphabet by $n$, Theorem 9 here shows a lower bound of $\Omega(c^n)$ for some (explicit) $c > 1$. (The theorem is for context-free grammars, but a regular expression can be translated into a context-free grammar.) Your construction has size $O(n\cdot n!) = 2^{O(n\log n)}$, so there is some gap here.
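To get a feel for the growth, here is a Python sketch that writes out the naive union over all orderings of the first $n$ letters, in the notation used above, and reports the number of terms and the character length. This is only the straightforward, unfactored construction (my assumption of how one would generalize it): it has $n!$ terms, each with $n+1$ copies of a starred union of $n$ letters, so its character length is $n!\cdot\Theta(n^2)$, which is still $2^{O(n\log n)}$.

```python
import itertools
import math
import string


def union_regex(n):
    """Naive union-of-orderings regex, in the question's notation (+ for union),
    for the first n lowercase letters: one term per ordering x1, ..., xn of the
    form S* x1 S* x2 ... S* xn S*, where S* is (a+b+...)*."""
    letters = string.ascii_lowercase[:n]
    sigma_star = "(" + "+".join(letters) + ")*"  # e.g. (a+b+c)*
    terms = [sigma_star + sigma_star.join(p) + sigma_star
             for p in itertools.permutations(letters)]
    return " + ".join(terms)


# n, number of terms (n!), length of the regex in characters
for n in range(1, 7):
    print(n, math.factorial(n), len(union_regex(n)))
```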

Yuval Filmus

Spelling out the six alternatives in their most symmetrical form:
(a+b+c)* a (a+b+c)* b (a+b+c)* c (a+b+c)* +
(a+b+c)* a (a+b+c)* c (a+b+c)* b (a+b+c)* +
(a+b+c)* b (a+b+c)* a (a+b+c)* c (a+b+c)* +
(a+b+c)* b (a+b+c)* c (a+b+c)* a (a+b+c)* +
(a+b+c)* c (a+b+c)* a (a+b+c)* b (a+b+c)* +
(a+b+c)* c (a+b+c)* b (a+b+c)* a (a+b+c)*
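A short justification for the six-term form (this reasoning is my addition): every term contains a, b and c literally, so anything it matches is in the language. Conversely, for a string that contains all three letters, list the letters in the order of their first occurrence; the term for that ordering matches the string. The Python sketch below checks both directions on all strings up to length 7, using the first-occurrence order as an explicit witness; the helper names are hypothetical and the bound is arbitrary, so it complements rather than replaces the argument.

```python
import itertools
import re

ALPHABET = "abc"
ANY = f"[{ALPHABET}]*"  # Python spelling of (a+b+c)*


def symmetric_term(p):
    """(a+b+c)* x (a+b+c)* y (a+b+c)* z (a+b+c)* for the ordering p = (x, y, z)."""
    return ANY + ANY.join(p) + ANY


def first_occurrence_order(w):
    """Distinct letters of w in order of first occurrence, e.g. 'cabca' -> ('c', 'a', 'b')."""
    return tuple(dict.fromkeys(w))  # dicts keep insertion order


def all_strings(max_len):
    for n in range(max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=n):
            yield "".join(chars)


for w in all_strings(7):
    if set(ALPHABET) <= set(w):
        # Witness: the term for w's own first-occurrence order matches w.
        assert re.fullmatch(symmetric_term(first_occurrence_order(w)), w)
    else:
        # Every term contains each letter literally, so no term can match.
        assert not any(re.fullmatch(symmetric_term(p), w)
                       for p in itertools.permutations(ALPHABET))
print("all checks passed")
```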

greybeard
  • Thanks for answering. However, we're not looking for answers that consist solely of an equation or regular expression; we'd like you to support your answer with an explanation of how you got it, or a rationale or justification that it is correct. Can you edit your answer accordingly? Thank you! – D.W. Dec 29 '19 at 20:30