
Consider the language:

$L=$binary strings that contain a substring of the form $ww$, where $w \in (0+1)(0+1)^*$.

I am convinced this language is not regular, since $w$ can have arbitrary length due to the $(0+1)^*$ term, and whatever is generated by this term must be remembered for the second occurrence of $w$ immediately thereafter. So a DFA cannot accept this language. Is this correct? Perhaps I could show this more formally with the pumping lemma, but I just want to make sure the intuition is correct.

Now, do you agree that the way I have defined the language above implies that the first occurrence of $w$ must be the same as the second occurrence of $w$? For example, $w$ cannot equal $01$ in its first instance, but $011$ in its second instance.

Now, consider the language:

$T=$binary strings that contain a substring of the form $wy$, where $w,y \in (0+1)(0+1)^*$.

$T$ is regular. Because $(0+1)^*(0+1)=(0+1)(0+1)^*$, the regex representing $T$ can be written as $(0+1)^*(00+01+10+11)(0+1)^*$. Is this correct?
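A quick brute-force sanity check for this claim about $T$, as a sketch only: the helper names `in_T` and `T_REGEX` are made up for illustration, the formal `+` is written as `|` in Python's `re` syntax, and testing strings up to a fixed length is evidence rather than a proof.

```python
import re
from itertools import product

# Python translation of (0+1)^*(00+01+10+11)(0+1)^* (assumed equivalent syntax).
T_REGEX = re.compile(r"[01]*(?:00|01|10|11)[01]*")


def in_T(s: str) -> bool:
    """Literal reading of the definition: some substring s[i:j] splits
    into non-empty w = s[i:m] and non-empty y = s[m:j]."""
    n = len(s)
    return any(
        True
        for i in range(n)
        for m in range(i + 1, n)
        for j in range(m + 1, n + 1)
    )


def regex_agrees_with_T(max_len: int) -> bool:
    """Compare the definition and the regex on all binary strings up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            word = "".join(bits)
            if in_T(word) != (T_REGEX.fullmatch(word) is not None):
                return False
    return True


print(regex_agrees_with_T(8))
```

Since $w$ and $y$ are unrelated, membership in $T$ only requires length at least $2$, which is why the check is so simple.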

Now, back to language $L$. This was defined in some exercises I was doing exactly as it is above. The solution said that $L$ is regular and can be represented by the regular expression $(0+1)^*(00+11+0101+1010)(0+1)^*$.

Lastly, if my thinking about $L$ is incorrect for some reason, does the regular expression $(0+1)^*(0+1)(0+1)^*(0+1)(0+1)^*(0+1)^*$ equal $L$?

I hope I have communicated my thoughts clearly on this matter. Please help me understand.

  • any word containing the substring $00$ or $11$ satisfies your condition, can you take it up from here? – Ariel Mar 22 '21 at 09:17
  • Please don't delete your question after receiving an answer. It can potentially be considered impolite to do so. Part of our mission is to build up an archive of high-quality questions and answers, and answerers might be responding on this basis, not only to help you, but also to help others with a similar question in the future. – D.W. Mar 23 '21 at 06:42

1 Answer


Your language is regular.

Let's see which words do not belong to $L$. If a word contains $00$ or $11$ as a substring, then it belongs to $L$ (take $w = 0$ or $w = 1$). Hence any word not in $L$ must consist of alternating $0$s and $1$s. If such a word has length at least $4$, then it must start with $0101$ or $1010$, and so again it is in $L$ (take $w = 01$ or $w = 10$). We are left with the following words, which comprise $\overline{L}$:

$$ \epsilon, 0, 1, 01, 10, 010, 101. $$

Since $\overline{L}$ is finite, it is regular, hence so is its complement $L$.
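The list of words outside $L$ can be double-checked by brute force. The following is a minimal sketch, not part of the original argument: the function names `has_square` and `square_free_words` are chosen here for illustration, and enumerating up to a fixed length only supports the proof rather than replacing it.

```python
from itertools import product


def has_square(s: str) -> bool:
    """Return True if s contains a substring ww with w non-empty."""
    n = len(s)
    for i in range(n):
        # k is the length of the candidate w starting at position i.
        for k in range(1, (n - i) // 2 + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                return True
    return False


def square_free_words(max_len: int) -> list[str]:
    """List all binary words of length <= max_len that are NOT in L."""
    words = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            word = "".join(bits)
            if not has_square(word):
                words.append(word)
    return words


print(square_free_words(8))
# → ['', '0', '1', '01', '10', '010', '101']
```

No word of length $4$ or more appears, in line with the argument that every longer alternating word begins with $0101$ or $1010$.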

What this argument shows is that any word in $L$ contains one of $00,11,0101,1010$ as a substring. This explains the given regular expression.
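As a sanity check, the regular expression can be compared against the definition of $L$ on all short strings. This is a sketch under stated assumptions: `L_REGEX`, `in_L`, and `regex_agrees_with_L` are illustrative names, the formal `+` is translated to `|` for Python's `re` module, and agreement up to a bounded length is supporting evidence only.

```python
import re
from itertools import product

# Python translation of (0+1)^*(00+11+0101+1010)(0+1)^* (assumed equivalent syntax).
L_REGEX = re.compile(r"[01]*(?:00|11|0101|1010)[01]*")


def in_L(s: str) -> bool:
    """Definition of L: s contains ww for some non-empty w."""
    n = len(s)
    return any(
        s[i:i + k] == s[i + k:i + 2 * k]
        for i in range(n)
        for k in range(1, (n - i) // 2 + 1)
    )


def regex_agrees_with_L(max_len: int) -> bool:
    """Compare the definition and the regex on all binary strings up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            word = "".join(bits)
            if in_L(word) != (L_REGEX.fullmatch(word) is not None):
                return False
    return True


print(regex_agrees_with_L(10))
```

A longer square such as $w = 011$ in $011011$ is still caught by the regex, because the word already contains $11$.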

Yuval Filmus