Regular Expression Notation

Question

I'm doing a theory of computation course and can't for the life of me find any good resource that will tell me how a regular expression such as (a+b)* converts to set form. I've thought of a binary one that I might be able to answer if I find this information:

(1+ (01)*)10

Would this simply be the set of strings that start with 1 and end with 10, with any string over {0, 1} in the middle?

user11977 · Accepted Answer · 2014-01-15T03:24:54.020

In regex syntax as used in programming tools (Perl, Python and so on), + means 'one or more', while * means 'zero or more'.

So in words your expression is: one or more 1s, followed by zero or more 01s, followed by 10.

For example, 1111101010110 and 110 are both valid, but 01010110 and 111010 are not.
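Python's re module follows the same practical-regex convention for + and *, so the four examples above can be checked mechanically. This is a minimal sketch that assumes the one-or-more reading (not the union reading raised in the comments). It uses fullmatch so the pattern must cover the whole string, and the names PATTERN and is_valid are just illustrative.

```python
import re

# Practical-regex reading: "1+" = one or more 1s, "(01)*" = zero or more 01s,
# then a literal "10". fullmatch() requires the entire string to match.
PATTERN = re.compile(r"1+(01)*10")

def is_valid(s: str) -> bool:
    return PATTERN.fullmatch(s) is not None

for s in ["1111101010110", "110", "01010110", "111010"]:
    print(s, is_valid(s))
```

This prints True for the first two strings and False for the last two, matching the examples in the answer.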

The one-or-more reading doesn't quite line up with your earlier example, (a+b)*, though, where the * applies to an expression that has a + in it. Read that way, it means zero or more repetitions of (one or more a's, followed by a b).

So take any strings made of one or more a's followed by a b, such as aaab, ab and aab, and put them together: aaababaab is valid. So is the empty string, since the star allows zero such blocks. (In this specific case, a simpler wording is: the string is either empty or ends in b, and every b is immediately preceded by an a.)
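A small brute-force check, again in Python and again assuming the practical-regex reading of +, compares the pattern (a+b)* with the plain-English characterization "the string is empty, or it ends in b and every b is immediately preceded by an a" over every string of a's and b's up to length 8. The helper names are hypothetical, and a passing loop is evidence for the characterization on short strings, not a proof.

```python
import re
from itertools import product

# Practical-regex reading: each block is one or more a's followed by a b,
# and the outer * allows zero or more such blocks.
PATTERN = re.compile(r"(a+b)*")

def matches(s: str) -> bool:
    return PATTERN.fullmatch(s) is not None

def simple_description(s: str) -> bool:
    # Empty, or: ends in b, and every b has an a directly before it.
    if s == "":
        return True
    if not s.endswith("b"):
        return False
    return all(i > 0 and s[i - 1] == "a" for i, c in enumerate(s) if c == "b")

# Compare the two on every string over {a, b} of length 0..8.
for n in range(9):
    for chars in product("ab", repeat=n):
        s = "".join(chars)
        assert matches(s) == simple_description(s), s

print(matches("aaababaab"), matches(""), matches("abb"))
```

The string abb shows why "immediately" matters: its second b has an a somewhere before it, but not directly before it, and the regex rejects it.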

answered Jan 15 '14 at 03:19

Excellent answer, this explained everything perfectly. Thanks very much. – user3130467 Jan 15 '14 at 03:26
Notations vary. X+Y here might mean that X is repeated one or more times, or it might mean the union of X and Y. In a theory of computation course, it is much more likely to be the latter. – MJD Jan 15 '14 at 03:30

MJD · Answer 2 · 2014-01-15T03:34:26.990

$01$ means the string 01 and nothing else: exactly two symbols long, with a 0 and then a 1.

$(01)^\ast$ means zero or more repetitions of 01. So it could be any of $\epsilon$ (the empty string), 01, 0101, 010101, and so on.

In a theory-of-computation course, the $A+B$ notation almost certainly means the union of the two expressions $A$ and $B$. That is, any string that is in the set $A$ or that is in the set $B$. It almost certainly does not mean that $A$ is repeated one or more times, unless it is superscripted, like this: $A^+B$.
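In Python's re syntax the union is written |, so the textbook expression $(a+b)^\ast$ corresponds to (a|b)*, which matches every string over {a, b}, including the empty string. The sketch below assumes only the standard re and itertools modules and contrasts that with Python's own one-or-more reading of (a+b)*; all_strings is a hypothetical helper for enumerating short strings.

```python
import re
from itertools import product

# Textbook union A+B is written A|B in Python's re syntax.
UNION_STAR = re.compile(r"(a|b)*")    # textbook (a+b)*: union, then star
REPEAT_STAR = re.compile(r"(a+b)*")   # Python's reading: + means "one or more"

def all_strings(alphabet: str, max_len: int):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# Under the union reading, (a+b)* matches every string of a's and b's.
assert all(UNION_STAR.fullmatch(s) for s in all_strings("ab", 6))

# The two readings disagree on strings such as "aa" and "ba".
for s in ["aa", "ba", "ab"]:
    print(s, bool(UNION_STAR.fullmatch(s)), bool(REPEAT_STAR.fullmatch(s)))
```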

Supposing that the $+$ means union, $(1+(01)^\ast)$ means either the string 1 or something represented by $(01)^\ast$ as described above. So it is one of $\epsilon$ (the empty string), 1, 01, 0101, 010101, and so on: the same set as $(01)^\ast$, plus the string 1.

$(1+ (01)^\ast)10$ means something of the form $(1+ (01)^\ast)$ as in the previous paragraph, followed by 10. So it is one of 10, 110, 0110, 010110, 01010110, and so on: each string from the previous paragraph with 10 appended.
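To see the "set form" directly, one can model each regular expression as a set of strings and build it up the same way the paragraphs above do: a literal, then star, then union, then concatenation. This Python sketch assumes the union reading of + and, because $(01)^\ast$ is infinite, keeps only strings up to a hypothetical cutoff MAX_LEN; the function names are illustrative, not from any library.

```python
MAX_LEN = 8  # assumption: only strings up to this length are kept

def lit(s: str) -> set[str]:
    """The language containing just the string s."""
    return {s}

def union(a: set[str], b: set[str]) -> set[str]:
    """A + B in textbook notation: strings in A or in B."""
    return a | b

def concat(a: set[str], b: set[str]) -> set[str]:
    """AB: a string from A followed by a string from B (length-capped)."""
    return {x + y for x in a for y in b if len(x + y) <= MAX_LEN}

def star(a: set[str]) -> set[str]:
    """A*: zero or more strings from A, capped at MAX_LEN."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more piece from A; keep only new ones.
        frontier = concat(frontier, a) - result
        result |= frontier
    return result

def shortlex(lang: set[str]) -> list[str]:
    """Sort by length, then alphabetically."""
    return sorted(lang, key=lambda s: (len(s), s))

# (1 + (01)*)10, built step by step.
expr = concat(union(lit("1"), star(lit("01"))), lit("10"))
print(shortlex(expr))
```

With MAX_LEN = 8 it prints ['10', '110', '0110', '010110', '01010110'], the same strings listed above; raising MAX_LEN adds the longer members.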

The book Higher-Order Perl, available free online, contains complete Perl code for a program that takes a regular expression and generates all the strings represented by that expression, in order. If you like Perl, it might be worth a look. If not, you'll probably get bogged down in the details of the code.
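The following is not the Higher-Order Perl program; it is a much cruder Python stand-in that tries every string over a given alphabet in order of length (alphabetical within a length) and keeps the ones re.fullmatch accepts. It assumes union is written | as in Python syntax, and it takes exponential time in max_len, so it is only useful for small examples. The name generate is hypothetical.

```python
import re
from itertools import product

def generate(pattern: str, alphabet: str, max_len: int):
    """Yield strings matching `pattern`, shortest first, alphabetical within a length.

    Brute force: tries every string over `alphabet` up to `max_len`,
    so it is only practical for small alphabets and lengths.
    """
    regex = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product(sorted(alphabet), repeat=n):
            s = "".join(chars)
            if regex.fullmatch(s):
                yield s

# Textbook (1+(01)*)10, with union written as | in Python syntax.
print(list(generate(r"(1|(01)*)10", "01", 8)))
```

For (1|(01)*)10 over the alphabet 01 with max_len = 8, it yields 10, 110, 0110, 010110 and 01010110.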

I don't understand why the textbook for your theory of computation course doesn't explain this. Did you look in the index?
