
For an alphabet $A = \{ a_1, a_2, \dots, a_n \}$, the set $L_r$ of regular languages on $A$ is defined recursively as closed under union, concatenation, and the Kleene star operator. I understand that languages (subsets of $A^*$) and regular languages (a particular class of such subsets) are different. Why do we need the Kleene star? Isn't concatenation enough for this definition?

Very simple "proof" that should be obviously wrong:

If $X \in L_r$ is a regular language on $A$ and $E \in X^*$ is a word (if I'm right, $X^* \in L_r$ as well), then we can write $E$ as $e_1 e_2 \dots e_n = e_1 \cdot e_2 \cdots e_{n-1} \cdot e_n$, with each $e_i \in X$. Then $E$ is explicitly constructible by concatenation.

I forgot $\epsilon$, but we can just add a simple rule that allows $\epsilon$. My intuition says that it has something to do with infinity: the Kleene star allows infinite-length chains whereas concatenation doesn't. Is that it?

Yuval Filmus
rafoo
  • 2
    For the same reason we need while-loops in programming languages. If we can't iterate, all programs will stop within a fixed number of steps. – Hendrik Jan Feb 03 '21 at 17:50
  • 1
    @HendrikJan Something something recursion :) Which actually has an analogue in this context as well - CFGs allow recursion, regular expressions do not. – orlp Feb 03 '21 at 19:13
  • @orlp Agreed. I was comparing regular expressions to "sequence, selection, iteration" programs. – Hendrik Jan Feb 04 '21 at 13:27

1 Answer


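Regular expressions without Kleene star define finite languages. You can prove this by induction on the structure of the regular expression: the base cases ($\emptyset$, $\epsilon$, and single symbols) define finite languages, and the union or concatenation of two finite languages is again finite. In contrast, $a^*$ is a regular expression which defines an infinite language.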
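The structural induction can be mirrored by a small recursive evaluator. The following is only a sketch under stated assumptions: the tuple encoding of expressions and the function name `language` are hypothetical, chosen for illustration. Since every case returns a finite Python set, the recursion always terminates with a finite language.

```python
from itertools import product

# Hypothetical tuple encoding of star-free regular expressions (an assumption
# made for this sketch, not a standard format):
#   ("empty",)        the empty language
#   ("eps",)          the language containing only the empty word
#   ("sym", "a")      the language {"a"}
#   ("union", r, s)   L(r) united with L(s)
#   ("concat", r, s)  all words uv with u in L(r) and v in L(s)

def language(expr):
    """Return the language of a star-free expression as a finite set of strings."""
    tag = expr[0]
    # Base cases: each one is a finite language.
    if tag == "empty":
        return set()
    if tag == "eps":
        return {""}
    if tag == "sym":
        return {expr[1]}
    # Inductive cases: both operations preserve finiteness.
    if tag == "union":
        return language(expr[1]) | language(expr[2])
    if tag == "concat":
        left, right = language(expr[1]), language(expr[2])
        return {u + v for u, v in product(left, right)}
    raise ValueError(f"unknown node: {tag!r}")

# The expression (a + b)(eps + a)
example = ("concat",
           ("union", ("sym", "a"), ("sym", "b")),
           ("union", ("eps",), ("sym", "a")))
words = language(example)
```

Here `words` contains exactly `a`, `b`, `aa`, and `ba`. A union has at most $|L(r)| + |L(s)|$ words and a concatenation at most $|L(r)| \cdot |L(s)|$, which is the size bound behind the induction.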

We could try to define $a^*$ using concatenation and union: $$ a^* = \epsilon + a + a^2 + a^3 + \cdots $$ Unfortunately, the required regular expression is infinite, which we do not allow (infinite expressions do make sense in some contexts, for example in infinitary logic, but not here).
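To make the obstacle concrete, here is a minimal sketch showing that every finite truncation $\epsilon + a + \cdots + a^k$ of that union misses some word of $a^*$. The helper names `truncated_union`, `in_a_star`, and `witness` are hypothetical and exist only for this illustration.

```python
def truncated_union(k):
    """Language of eps + a + a^2 + ... + a^k, as a finite set of strings."""
    return {"a" * i for i in range(k + 1)}

def in_a_star(word):
    """Membership test for a^*: every character of the word is 'a'."""
    return all(ch == "a" for ch in word)

def witness(k):
    """A word of a^* that the k-th truncation does not contain."""
    return "a" * (k + 1)

# Whatever cutoff k we pick, a^(k+1) is in a^* but not in the truncation.
for k in range(5):
    w = witness(k)
    assert in_a_star(w) and w not in truncated_union(k)
```

The loop only checks a few values of `k`, but the argument is the same for every cutoff: a finite expression built from union and concatenation fixes some maximal word length, and $a^*$ contains longer words.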

Yuval Filmus
  • 1
    +1. And not only don't we allow infinite unions in the definition of regular expressions, we actually can't do so (or at least, that definition wouldn't be equivalent to the standard one), because that would also allow languages such as $\{a^n b^n : n \ge 0\} = \epsilon + ab + aabb + aaabbb + aaaabbbb + \cdots$ that are known to be non-regular according to the standard definition. – ruakh Feb 04 '21 at 04:53
  • 2
    In fact, infinite unions could represent all languages. – Yuval Filmus Feb 04 '21 at 06:20