Why is $L=\{w \mid ~|w|\bmod3=\#_a(w)\bmod3\}$ a regular language?

$\#_a(w)$ is the number of $a$'s in $w$.

So far every language that I saw containing modulo was a regular language. Can you give me an example of a non-regular language with a modulo operator in it?

John L.
Math4me
  • You asked why $L$ is a regular language, but you didn't define its alphabet. I suggest reading https://cs.stackexchange.com/questions/1331/how-to-prove-a-language-is-regular on how to prove that a language is regular – Pietro Jun 20 '22 at 08:50
  • There exists a finite state automaton accepting $L$, that is the reason we call it regular. – Hendrik Jan Jun 20 '22 at 23:15

4 Answers

$\newcommand{\m}{\operatorname{\%}}$ Let $d(w)=(|w|-\#_a(w))\m3$, where $n\m 3$ is the remainder of dividing $n$ by $3$ as defined in almost every programming language. Note $L=\{ w\mid d(w)=0\}$.

The simple and critical observation is that there are only 3 possible values for $d(w)$: $0, 1, 2$. A deterministic finite automaton (DFA) can remember and update that value for the input read so far. Accordingly, it is easy to construct a DFA that accepts $L$.

Specifically, let the DFA $D$ have three states, $q_0$, $q_1$ and $q_2$. The transitions of $D$ are chosen so that $D$ is in state $q_i$ exactly after reading a word $w$ with $d(w)=i$.

  • $\delta(q_i, a)= q_i$, since reading an $a$ will increase both $|\cdot|$ and $\#_a(\cdot)$ by 1.
  • $\delta(q_0, x)= q_1$, $\delta(q_1, x)= q_2$, $\delta(q_2, x)= q_0$, where $x$ is any symbol that is not $a$, since reading an $x$ will increase $|\cdot|$ by 1 but leave $\#_a(\cdot)$ unchanged.

Let $q_0$ be the start state and the only final state. Then $L$ is accepted by $D$. So $L$ is regular.
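
As a sanity check, the construction can be simulated and compared against the definition of $L$ by brute force. This is only a minimal sketch: the alphabet `{a, b}` is an assumption (the question does not fix one), and the function names are made up for illustration.

```python
from itertools import product

ALPHABET = "ab"  # assumed alphabet; the question does not specify one


def dfa_accepts(word: str) -> bool:
    """Simulate D: the index of the current state is d(w) = (|w| - #_a(w)) % 3."""
    state = 0  # q_0 is the start state
    for symbol in word:
        if symbol != "a":
            # any symbol other than 'a' moves q_i to q_{(i + 1) mod 3}
            state = (state + 1) % 3
        # an 'a' leaves the state unchanged
    return state == 0  # q_0 is the only final state


def in_L(word: str) -> bool:
    """Direct definition: |w| mod 3 == #_a(w) mod 3."""
    return len(word) % 3 == word.count("a") % 3


def agree_up_to(max_len: int) -> bool:
    """Compare the DFA with the definition on every word up to max_len."""
    return all(
        dfa_accepts("".join(letters)) == in_L("".join(letters))
        for n in range(max_len + 1)
        for letters in product(ALPHABET, repeat=n)
    )
```

Calling `agree_up_to(8)` checks all 511 words of length at most 8 over the assumed alphabet. A finite check like this is of course no proof, but it catches mistakes in the transition table quickly.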


"Every language that I saw containing $\text{modulo}$ was a regular language." This is a nice observation. The reason is that there are only finitely many values of $n\m d$ for an integer $n$ and a fixed integer $d$. DFAs are powerful enough to remember and update a finite amount of information.

However, if there are other conditions involved to define the language, it may not be regular any more.

For example, consider $N= \{a^nb^n \mid n \bmod 2 = 0\}$. $N$ is non-regular since $a^0, a^2, a^4, \cdots$ are pairwise distinguishable. For two different even numbers $i$ and $j$, $a^i$ and $a^j$ are distinguished by $b^i$, as $a^ib^i\in N$ but $a^jb^i\not\in N$.
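
The distinguishing suffixes can be checked mechanically for small exponents. The following is a rough sketch with hypothetical helper names; it only tests finitely many pairs, so it illustrates the argument rather than replacing it.

```python
def in_N(word: str) -> bool:
    """Membership in N = { a^n b^n : n even }."""
    if len(word) % 2 != 0:
        return False
    n = len(word) // 2
    return n % 2 == 0 and word == "a" * n + "b" * n


def distinguished_by_b_i(i: int, j: int) -> bool:
    """True if the suffix b^i separates a^i from a^j with respect to N."""
    return in_N("a" * i + "b" * i) and not in_N("a" * j + "b" * i)


def all_even_pairs_distinguished(limit: int) -> bool:
    """Check every ordered pair of distinct even exponents below limit."""
    evens = range(0, limit, 2)
    return all(distinguished_by_b_i(i, j) for i in evens for j in evens if i != j)
```

Since infinitely many prefixes are pairwise distinguishable, no DFA with finitely many states can keep them apart, which is the Myhill–Nerode argument used above.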

John L.

The following language is not regular: $L = \{a^n b^m c^n \mid m = n \bmod 2\}$.

To see that $L$ is not regular, suppose towards a contradiction that $L$ is regular and let $p$ be its pumping length. Then, $a^{2p}c^{2p} \in L$ and there is some $1 \le k \le p$ such that $a^{2p + ik} c^{2p} \in L$ for every integer $i \ge -1$. Choosing $i=-1$ yields $a^{2p-k} c^{2p} \in L$, which is a contradiction: every word in $L$ has as many $a$'s as $c$'s, but $2p-k \neq 2p$.
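
To make the last step concrete, here is a small sketch that tests membership directly and confirms that pumping down always leaves the language. The helper names are hypothetical, and `p` is just a sample value standing in for the unknown pumping length.

```python
import re


def in_abc_language(word: str) -> bool:
    """Membership in { a^n b^m c^n : m = n mod 2 }."""
    match = re.fullmatch(r"(a*)(b*)(c*)", word)
    if match is None:
        return False
    n_a, m, n_c = (len(group) for group in match.groups())
    return n_a == n_c and m == n_a % 2


def pumping_down_fails(p: int) -> bool:
    """a^{2p} c^{2p} is in the language, but removing k a's (1 <= k <= p) never is."""
    base = "a" * (2 * p) + "c" * (2 * p)
    return in_abc_language(base) and all(
        not in_abc_language("a" * (2 * p - k) + "c" * (2 * p))
        for k in range(1, p + 1)
    )
```

For every sample `p`, `pumping_down_fails(p)` should hold, matching the contradiction derived above.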

Steven

Luckily, your case is quite easy. The language is defined by the rule "total number of letters, modulo 3, equals total number of a's, modulo 3". This is equivalent to "number of letters that are not a's is divisible by 3". Same language, but a lot easier to work with.

You build a state machine with three states $S_0$, $S_1$, and $S_2$. Being in state $S_i$ means "the number of letters that are not a's is i modulo 3". You start in state $S_0$, which is also the only accepting state. An a stays in the same state; anything that is not an a goes from $S_0$ to $S_1$, from $S_1$ to $S_2$, and from $S_2$ to $S_0$.

If you had only two symbols a and b, then a regular expression for the language would be (a* b a* b a* b)* a*. For more symbols, say a, b, c, d, you would replace b with (b | c | d).
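
If you want to convince yourself that this regular expression matches the original rule, a brute-force comparison is easy to write. This is just a sketch for the two-symbol case, using Python's `re` syntax for the same expression; the function names are made up.

```python
import re
from itertools import product

# (a* b a* b a* b)* a*  written without the spaces
PATTERN = re.compile(r"(a*ba*ba*b)*a*")


def regex_accepts(word: str) -> bool:
    """True if the whole word matches the regular expression."""
    return PATTERN.fullmatch(word) is not None


def by_definition(word: str) -> bool:
    """Original rule: |w| mod 3 == #_a(w) mod 3."""
    return len(word) % 3 == word.count("a") % 3


def agree_up_to(max_len: int) -> bool:
    """Compare both tests on every word over {a, b} up to max_len."""
    return all(
        regex_accepts("".join(letters)) == by_definition("".join(letters))
        for n in range(max_len + 1)
        for letters in product("ab", repeat=n)
    )
```

For the four-symbol variant you would swap each `b` in the pattern for `[bcd]` and extend the alphabet in `agree_up_to` accordingly.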

gnasher729

"So far every language that I saw containing modulo was a regular language."

As John L. notes, that's a very good observation. Indeed, any language where the only constraint on words is that some number modulo $n$ (which we can update letter by letter as we parse the input) belongs to some set of numbers modulo $n$ must be regular. This is because such a language can be parsed by a DFA with $n$ states, where each state encodes one of the $n$ possible values of the number, and where the state transitions define how the number changes when a new input letter is appended to the end of the input word.

More generally, we can even prove the following theorem:

Theorem: A language $L$ is regular if (and only if!) it can be represented in the following form: $$L = \{w \mid s(w) \in A\}$$ where $s(w)$ can only take a finite number of possible values and where, for any word $w$ and any letter $c$, knowing $s(w)$ and $c$ is sufficient to determine $s(wc)$.

Again, the proof is basically trivial, given that we know that a language is regular if and only if it is accepted by some DFA: Given a language $L$ defined as above, define a DFA $D$ with one state for each possible value of $s(w)$, with transitions encoding the map $(s(w), c) \mapsto s(wc)$, and let its accepting states be those that correspond to values of $s(w)$ that are in $A$. Clearly $D$ accepts exactly the language $L$. (Conversely, given a DFA $D$ accepting a language $L$, let $s(w)$ be the current state of $D$ after reading the input word $w$ and let $A$ be the set of accepting states of $D$. Then $L$ will satisfy the definition above.)
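
The forward direction of the proof is constructive, so it can be sketched in code. The version below is only an illustration under stated assumptions: `build_dfa` and `run` are hypothetical names, the alphabet `{a, b}` is assumed, and the loop terminates only because $s(w)$ takes finitely many values.

```python
def build_dfa(initial, update, alphabet):
    """Collect every value of s(w) reachable from s(empty word), plus the transitions."""
    states = {initial}
    transitions = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        for letter in alphabet:
            target = update(state, letter)  # the map (s(w), c) -> s(wc)
            transitions[(state, letter)] = target
            if target not in states:
                states.add(target)
                pending.append(target)
    return states, transitions


def run(transitions, initial, accepting, word):
    """Run the DFA on word; accept if the final state lies in the set A."""
    state = initial
    for letter in word:
        state = transitions[(state, letter)]
    return state in accepting


def update_d(value, letter):
    """Example summary: s(w) = (|w| - #_a(w)) mod 3."""
    return value if letter == "a" else (value + 1) % 3


STATES, TRANSITIONS = build_dfa(0, update_d, "ab")
```

With this example summary, `STATES` is `{0, 1, 2}` and `run(TRANSITIONS, 0, {0}, "abbab")` is `True`, since the word contains three letters that are not `a`. Any other finite-valued summary can be plugged in by swapping `update_d` and the accepting set.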


Note that $s(w)$ doesn't necessarily have to be a single number modulo $n$. For instance, for the language in your question, we could naturally define $s(w)$ to be a pair of numbers modulo $3$: $$s(w) = (|w| \bmod 3, \#_a(w) \bmod 3).$$ As defined, $s(w)$ can still clearly take only a finite number of possible values (nine, to be exact) and it should be easy to see how to update each of the numbers in the pair whenever a new input letter is read.

Of course, in this particular case a more compact encoding, such as $$s(w) = (|w| - \#_a(w)) \bmod 3,$$ is also possible. But the "naive" encoding method is more general, and can be used (with trivial modifications) to show e.g. that the following language is also regular: $$L = \{w \mid |w| \bmod 3 = \#_a(w) \bmod 4\}$$
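
For this last language, the pair encoding $s(w) = (|w| \bmod 3, \#_a(w) \bmod 4)$ has at most $3 \cdot 4 = 12$ possible values. Here is a minimal sketch of the corresponding state update, again assuming the alphabet `{a, b}` and using made-up function names:

```python
from itertools import product

ALPHABET = "ab"  # assumed alphabet


def step(state, symbol):
    """Update the pair (|w| mod 3, #_a(w) mod 4) after reading one symbol."""
    length_mod3, a_count_mod4 = state
    length_mod3 = (length_mod3 + 1) % 3  # every symbol increases the length
    if symbol == "a":
        a_count_mod4 = (a_count_mod4 + 1) % 4  # only 'a' increases the a-count
    return (length_mod3, a_count_mod4)


def accepts(word: str) -> bool:
    """Accept when the two components of the final pair are equal."""
    state = (0, 0)  # s(empty word)
    for symbol in word:
        state = step(state, symbol)
    return state[0] == state[1]


def by_definition(word: str) -> bool:
    """Direct definition: |w| mod 3 == #_a(w) mod 4."""
    return len(word) % 3 == word.count("a") % 4


def agree_up_to(max_len: int) -> bool:
    """Compare both tests on every word up to max_len."""
    return all(
        accepts("".join(letters)) == by_definition("".join(letters))
        for n in range(max_len + 1)
        for letters in product(ALPHABET, repeat=n)
    )
```

The accepting set $A$ consists of the pairs whose two components are equal, namely $(0,0)$, $(1,1)$ and $(2,2)$; a pair such as $(0,3)$ is rejected.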

Ilmari Karonen