
What I mean by a "symbolic regular expression" (if there is already a different name for this, I'm not aware of it) is a regular expression whose exponents may be symbolic arithmetic expressions.

Example 1: $a^k|b^*$ means "either $k$ copies of $a$ or zero or more copies of $b$".
Example 2: $a^{k+1}|a^k$ means "either $k$ or $k+1$ copies of $a$".
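
A symbolic expression can be read as a family of ordinary regular expressions, one per value of the exponent. The sketch below uses Python's standard `re` module; the helper names (`example1`, `example2`, `matching_alternatives`) are hypothetical. It instantiates both examples for a concrete $k$ and counts how many top-level alternatives of a union match a word. More than one match means that union is ambiguous on that word. It only inspects the top-level `|`, not ambiguity nested inside an alternative.

```python
import re

# Hypothetical encoding: a symbolic expression is a function from a concrete
# exponent value to an ordinary Python regex string.
def example1(k):
    return f"a{{{k}}}|b*"            # a^k | b^*

def example2(k):
    return f"a{{{k + 1}}}|a{{{k}}}"  # a^{k+1} | a^k

def matches(pattern, word):
    """True if the whole word is in the language of `pattern`."""
    return re.fullmatch(pattern, word) is not None

def matching_alternatives(alternatives, word):
    """Indices of the top-level alternatives that match `word`.
    More than one index means the union is ambiguous on that word."""
    return [i for i, alt in enumerate(alternatives) if matches(alt, word)]
```

For instance, `matching_alternatives(["a{3}", "a{3}"], "aaa")` (that is, $a^k|a^j$ with $k=j=3$) reports both alternatives. For `example2(2).split("|")` on the same word it reports only one, since $k+1 \ne k$.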

What I'd like to do is disambiguate such regular expressions. I know that to disambiguate a normal regular expression, you can convert it to an NFA, then a DFA, then back to a regular expression.
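
The determinization step of that pipeline is the standard subset construction. Below is a minimal sketch. It assumes an NFA without ε-moves, encoded as a dict from `(state, symbol)` to a set of successor states (a hypothetical encoding), and applies it to a hand-built NFA for $a^k|a^j$ with $k=j=2$. Converting the DFA back to a regular expression (for example by state elimination) is not shown.

```python
from collections import deque

def determinize(transitions, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon-moves.

    `transitions` maps (state, symbol) to a set of successor states.
    Returns (dfa_states, dfa_transitions, dfa_start, dfa_accepting); each DFA
    state is a frozenset of NFA states, and the empty (dead) set is omitted.
    """
    dfa_start = frozenset([start])
    dfa_states = {dfa_start}
    dfa_transitions = {}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            successor = frozenset(
                q for s in current for q in transitions.get((s, symbol), ())
            )
            if not successor:
                continue  # no move on this symbol: implicit dead state
            dfa_transitions[(current, symbol)] = successor
            if successor not in dfa_states:
                dfa_states.add(successor)
                queue.append(successor)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {S for S in dfa_states if S & accepting}
    return dfa_states, dfa_transitions, dfa_start, dfa_accepting

# Hand-built NFA for a^2|a^2 (the case k = j = 2): one branch per alternative,
# 0 -a-> 1 -a-> 2 and 0 -a-> 3 -a-> 4, with 2 and 4 accepting.
nfa = {(0, "a"): {1, 3}, (1, "a"): {2}, (3, "a"): {4}}
states, delta, q0, final = determinize(nfa, 0, {2, 4}, ["a"])
```

The two branches collapse into one chain of DFA states, {0} → {1,3} → {2,4}, so every accepted word has exactly one run.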

The problem is not completely straightforward. For example, $a^k|a^j$ is ambiguous if $j=k$ and unambiguous otherwise. Thus, the appropriate output would be, for example, $$a^k \text{ if } k=j, \qquad a^k|a^j \text{ otherwise.}$$
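
In the simplest setting mentioned in the comments, where exponents are plain symbols, a union of fixed powers $a^{e_1}|\dots|a^{e_n}$ can be disambiguated by case analysis over which exponents are equal. The sketch below enumerates every partition of the symbols into equality classes and merges duplicate terms within each case. The function names are hypothetical, and the sketch does not handle exponents such as $k+1$.

```python
from itertools import combinations

def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def disambiguate_union(exponents):
    """Case-split a^{e1}|...|a^{en}, where every e_i is a symbol name.

    Each case fixes which symbols are equal (same block) and which differ
    (different blocks). Within a case, duplicate terms are merged, so the
    remaining terms have pairwise distinct lengths.
    Returns a list of (condition, expression) pairs.
    """
    symbols = list(dict.fromkeys(exponents))  # distinct, in order of appearance
    cases = []
    for partition in set_partitions(symbols):
        representative = {s: block[0] for block in partition for s in block}
        equalities = [f"{block[0]}={s}" for block in partition for s in block[1:]]
        inequalities = [f"{b[0]}≠{c[0]}" for b, c in combinations(partition, 2)]
        terms = list(dict.fromkeys(f"a^{representative[e]}" for e in exponents))
        condition = ", ".join(equalities + inequalities) or "always"
        cases.append((condition, "|".join(terms)))
    return cases
```

For `["k", "j"]` it returns `[("k=j", "a^k"), ("k≠j", "a^k|a^j")]`, the same case split as above. The number of cases is the Bell number of the number of distinct symbols, so it grows quickly.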

Does anyone know if there has been anything written about this problem?

Andrew
  • Hmmm. So, you might want to convert an expression such as $\bigl[(a^3)^\ast \mathrel| (a^5)^\ast\bigr]$ to $\bigl[\varepsilon \mathrel| a^3 \mathrel| a^5 \mathrel| a^6 \mathrel| a^8a^\ast\bigr]$, that is separating out all possibilities exhaustively into disjoint collections (a sort of sum-of-products)? – Niel de Beaudrap Aug 10 '12 at 00:15
  • @NieldeBeaudrap Yes, I would want to do that (except instead of 3 and 5 the exponents would be symbolic expressions). – Andrew Aug 10 '12 at 00:50
  • Nota bene: there is hope, as there is no inherently ambiguous regular language. This paper may provide an answer in a special case. – Raphael Aug 10 '12 at 07:16
  • There is a problem: if you require a case analysis for all possible values of the symbolic exponents, an example such as I gave yields either an indeterminate number of disjuncts in the formula (it is not an augmented-regular-expression, but a schema for such expressions), or an infinite number of cases. If we consider $\bigl[(a^j)^\ast \mathrel| (a^k)^\ast\bigr]$, the number of disjuncts in the result doesn't just depend on whether $j=k$. For example, when $k=nj+r$ (for $n,r\geqslant0$), the result has at least $\operatorname{lcm}(j,k)+1 = \frac{nj^2+rj}{\gcd(j,r)}+1$ disjuncts, which is unbounded for arbitrary $n,r,j$. – Niel de Beaudrap Aug 10 '12 at 09:45
  • I said the exponents may be "symbolic arithmetic expressions"; there are different classes of expression that could be allowed, e.g. just symbols or integers, polynomials, polynomials and exponentials, ... I am interested in even the simplest case, such as in Niel's example above. – Andrew Aug 15 '12 at 15:05
  • @AndrewMacFie: So, would you admit the answer $a^{\{uj + vk \,\mathrel|\, u,v \in \mathbb N\}}$? It's certainly unambiguous, and I'm not entirely sure how to address the example that I gave without also admitting something like that as a possible answer. – Niel de Beaudrap Aug 15 '12 at 15:09
  • @NieldeBeaudrap I'm not sure whether I would call that unambiguous or not. Something "simpler" than that would be ideal. That's a good question, though. I'll think more about it. – Andrew Aug 15 '12 at 18:32
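
The last step of the lcm formula in Niel's comment (assuming $j \geqslant 1$) uses $\gcd(j,\,nj+r) = \gcd(j,r)$, which is one step of the Euclidean algorithm: $$\operatorname{lcm}(j,k) = \frac{jk}{\gcd(j,k)} = \frac{j(nj+r)}{\gcd(j,\,nj+r)} = \frac{nj^2+rj}{\gcd(j,r)}.$$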
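
For concrete positive $j$ and $k$, the exponent set $\{uj+vk \mid u,v\in\mathbb N\}$ can be turned back into an ordinary unambiguous sum-of-products expression. The result is a finite disjunction of fixed powers plus one periodic tail. The sketch relies on the classical Frobenius (Sylvester) result: with $g=\gcd(j,k)$, $j'=j/g$, $k'=k/g$, every $m > j'k'-j'-k'$ is a combination $uj'+vk'$. The helper names are hypothetical.

```python
from math import gcd

def representable(m, j, k):
    """True if m = u*j + v*k for some natural numbers u, v (j, k >= 1)."""
    return any((m - u * j) % k == 0 for u in range(m // j + 1))

def power(e):
    """Render a^e, writing ε for e = 0."""
    return "ε" if e == 0 else f"a^{e}"

def expand_two_generators(j, k):
    """Unambiguous ordinary expression for a^{ {u*j + v*k | u, v in N} }."""
    if j < 1 or k < 1:
        raise ValueError("j and k must be positive")
    g = gcd(j, k)
    jp, kp = j // g, k // g
    # (jp-1)*(kp-1) = Frobenius number + 1: the smallest m such that every
    # m' >= m is representable by jp and kp (0 if jp or kp is 1).
    cutoff = (jp - 1) * (kp - 1)
    finite = [m * g for m in range(cutoff) if representable(m, jp, kp)]
    star = "a^*" if g == 1 else f"(a^{g})^*"
    # The tail covers all multiples of g from cutoff*g on, disjoint from `finite`.
    tail = star if cutoff == 0 else power(cutoff * g) + star
    return "|".join([power(e) for e in finite] + [tail])
```

For $j=3$, $k=5$ this produces `ε|a^3|a^5|a^6|a^8a^*`. For $j=4$, $k=6$ it produces `ε|a^4(a^2)^*`, because only even lengths occur and 2 is missing.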
