
I saw this question yesterday, and it has two beautiful answers: "Number of $n$-length words that can be formed using the letters $a$, $b$, $c$, $d$ such that $a$ and $b$ never come together."

The question asks for the following:

The number of words of length $n$ that can be formed using the letters $a$, $b$, $c$, $d$ such that $a$ and $b$ never come together.

I tried to solve this question, but I reached a wrong answer because of overcounting. When I tried the Goulden-Jackson cluster method, I made a big mistake: I should have counted the words of length $n$ that contain neither $ab$ nor $ba$. Here is my wrong answer to the question:

"Lets use recursion.

Suppose a string of length $n$ ends with $a$ and never has $a$ and $b$ together; there are $a_{n-1}$ such strings.

Suppose a string of length $n$ ends with $b$ and never has $a$ and $b$ together; there are $a_{n-1}$ such strings.

Suppose a string of length $n$ ends with $c$ and never has $a$ and $b$ together; there are $a_{n-1}$ such strings.

Suppose a string of length $n$ ends with $d$ and never has $a$ and $b$ together; there are $a_{n-1}$ such strings.

$\color{blue}{However}$, when the string ends with $a$ or $b$, the letter before the last one may be $b$ (before $a$) or $a$ (before $b$), giving $.....ab$ or $......ba$. Hence, we must $\color{red}{subtract}$ those strings. Their number is given by $a_{n-2}$.

Then $a_n=4a_{n-1}-2a_{n-2}$, where $a_0=1$, $a_1=4$, $a_2=14$, $a_3=48$.

MORE EXPLANATION: Let us assume that a string ends with $\color{red}{a}$ and has no substring containing $a$ and $b$ together. Because it ends with $\color{red}{a}$, the preceding part, which does not contain $a$ and $b$ together, has length $n-1$ and can be chosen in $a_{n-1}$ ways. However, this part might end with $\color{blue}{b}$, as in $(....\color{blue}{b}-\color{red}{a})$.

Then we obtain strings in which $a$ and $b$ are together at the end but nowhere else in the string. So there are $a_{n-2}$ such strings, and we must subtract them from the $4a_{n-1}$ strings. The same situation occurs for strings that end with $\color{blue}{b}$. Hence there are $2a_{n-2}$ such strings.

Moreover, when I checked the result with the Goulden-Jackson cluster method, I reached the same answer."

Now I am looking for the correct solution to this question using the $\color{red}{Goulden - Jackson Cluster}$ method.

At first, I thought I should find the generating function for words that never contain $ab$, then find it for words that never contain $ba$, and finally add the two and subtract the generating function for words that contain neither $ab$ nor $ba$.

However, it did not work.

Thanks in advance...

2 Answers


$ \def\w{\text{weight}} \def\C{\mathcal{C}} $ Here is the solution using the cluster method. The generating function is $$ f(s)=\frac1{1-4s-\w(\C)}, $$ so we need to find $\w(\C)$. The set of bad words is $\{ab,ba\}$. Proceeding as in the paper, we get $$ \begin{aligned} \w(\C[ab])&=-s^2-s\cdot \w(\C[ba])\\ \w(\C[ba]) &= -s^2-s\cdot \w(\C[ab]) \end{aligned} $$ Furthermore, $\w(\C)=\w(\C[ab])+\w(\C[ba])$. Adding the above two equations, you get $\w(\C)=-2s^2-s\cdot \w(\C)$, so that $$ \w(\C)=\frac{-2s^2}{1+s} $$ Finally, $$ f(s)=\frac1{1-4s-\frac{-2s^2}{1+s}}=\frac{1+s}{1-3s-2s^2} $$ You then read off the recurrence $a_n-3a_{n-1}-2a_{n-2}=0$ from the denominator of the generating function.
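As a sanity check on the generating function above, here is a minimal Python sketch that expands $\frac{1+s}{1-3s-2s^2}$ as a power series and compares the coefficients with a brute-force count. The function names `brute_force` and `series_coefficients` are made up for this illustration and do not come from the answer or from the Goulden-Jackson paper.

```python
from itertools import product


def brute_force(n):
    """Count words of length n over {a, b, c, d} with no 'ab' or 'ba' substring."""
    count = 0
    for letters in product("abcd", repeat=n):
        word = "".join(letters)
        if "ab" not in word and "ba" not in word:
            count += 1
    return count


def series_coefficients(num, den, terms):
    """First `terms` power-series coefficients of num(s)/den(s).

    num and den are coefficient lists in increasing powers of s,
    and den[0] is assumed to be 1.
    """
    coeffs = []
    for n in range(terms):
        value = num[n] if n < len(num) else 0
        # Move the known lower-order terms of den * series to the right side.
        for k in range(1, min(n, len(den) - 1) + 1):
            value -= den[k] * coeffs[n - k]
        coeffs.append(value)
    return coeffs


# f(s) = (1 + s) / (1 - 3s - 2s^2)
from_gf = series_coefficients([1, 1], [1, -3, -2], 8)
from_brute = [brute_force(n) for n in range(8)]
print(from_gf)
print(from_brute)
```

If the two printed lists agree, the series matches the direct count at least for $n\le 7$; this is only a numerical check, not a proof.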

Mike Earnest
  • Mike, thanks for your work, but I want to ask something: when I calculated the overlap, I found it to be the empty set. However, you found it to be $1$. Can you explain it, please? – Not a Salmon Fish Jun 11 '21 at 08:00
  • For example, for $AB$, the head is $A$ and the tail is $B,AB$ – Not a Salmon Fish Jun 11 '21 at 08:03
  • The head of AB is A, and the tail of BA is also A. Therefore, overlap(AB, BA) = {A}. When the words are longer, you can have multiple heads and tails. For example, $$ \begin{aligned} \text{head}(ANAGRAM)&=\{A,AN,ANA,ANAG,ANAGR,ANAGRA\}\\ \text{tail}(BANANA)&=\{A,NA,ANA,NANA,ANANA\}\\ \text{overlap}(ANAGRAM,BANANA)&=\{A,ANA\} \end{aligned} $$ In both of these examples, we had words overlapping with other words. It is also possible to have words overlap with themselves. E.g., overlap(CHURCH, CHURCH) = {CH}. @Bulbasaur – Mike Earnest Jun 11 '21 at 14:01
  • Excellent work! By the way, if you like the Goulden-Jackson method, could you suggest any other clever technique you know in combinatorics? I am an undergraduate student and I need to improve my combinatorics skills, so I always ask other people for recommendations. – Not a Salmon Fish Jun 11 '21 at 15:40
  • @Bulbasaur I was glad to learn about the Goulden-Jackson method. One of my favorites is the reflection principle. This is most famously used to compute the Catalan numbers. You can also use this to find the probability distribution for hitting times of random walks. Even more advanced: https://math.stackexchange.com/questions/3149930/the-number-of-monotone-lattice-paths-in-the-vicinity-of-the-diagonal/ – Mike Earnest Jun 11 '21 at 18:19
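The head, tail, and overlap sets described in the comments above can be sketched in a few lines of Python. This assumes, as in the comment's examples, that heads are the proper nonempty prefixes, tails are the proper nonempty suffixes, and overlap(u, v) is the set of heads of u that are also tails of v; the function names are chosen only for this illustration.

```python
def heads(word):
    """Proper nonempty prefixes of word."""
    return {word[:i] for i in range(1, len(word))}


def tails(word):
    """Proper nonempty suffixes of word."""
    return {word[i:] for i in range(1, len(word))}


def overlap(u, v):
    """Heads of u that are also tails of v."""
    return heads(u) & tails(v)


print(overlap("AB", "BA"))
print(overlap("ANAGRAM", "BANANA"))
print(overlap("CHURCH", "CHURCH"))
```

Under these assumptions, the single letter A counts as both a head of AB and a tail of BA, which is why that overlap is not empty.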

I do not know about the Goulden-Jackson cluster method, but I can explain why $a_n\neq 4a_{n-1}-2a_{n-2}$.

Let us call the strings counted by $4a_{n-1}$ type I, and the strings counted by $2a_{n-2}$ type II. That is,

  • Type I: (valid string of length $n-1$) + (any single letter)

  • Type II: (valid string of length $n-2$) + ($ab$ or $ba$)

The subtraction $4a_{n-1}-2a_{n-2}$ is only valid if all of the type II strings are also type I strings. However, this is not the case.

For this example, $n=5$. Consider $$ dcaba $$ This string is of type II, since it is of the form $dca+ba$, where $dca$ is a valid string of length $n-2$.
However, this string is not of type I since $dcab$ is not a valid string of length $n-1$.

In summary, when you do $4a_{n-1}-2a_{n-2}$, you are subtracting away too many strings.
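The type I / type II argument above can be checked by enumeration. The following Python sketch builds both sets for $n=5$ and lists how many type II strings are not of type I; the helper names `is_valid` and `valid_words` are hypothetical and exist only for this example.

```python
from itertools import product


def is_valid(word):
    """True when a and b are never adjacent in word."""
    return "ab" not in word and "ba" not in word


def valid_words(n):
    """All valid words of length n over the alphabet {a, b, c, d}."""
    words = ("".join(p) for p in product("abcd", repeat=n))
    return [w for w in words if is_valid(w)]


n = 5
# Type I: (valid string of length n-1) + (any single letter)
type_one = {w + x for w in valid_words(n - 1) for x in "abcd"}
# Type II: (valid string of length n-2) + (ab or ba)
type_two = {w + t for w in valid_words(n - 2) for t in ("ab", "ba")}

not_covered = type_two - type_one
print("dcaba" in type_two, "dcaba" in type_one)
print(len(type_one), len(type_two), len(not_covered))
```

Every string in `not_covered` is subtracted by $2a_{n-2}$ even though $4a_{n-1}$ never counted it, which is the source of the undercount.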


However, your method can be fixed. As I said before, you are subtracting away too many strings. The strings which are being unnecessarily subtracted all look like (valid string of length $n-3$)+($aba$ or $bab$). Therefore, it seems we could fix your argument by adding strings of this type back in, resulting in $a_n \stackrel{?}=4a_{n-1}-2a_{n-2}+2a_{n-3}$.

However, this has a similar problem; now, strings which look like (valid string of length $n-4$)+($abab$ or $baba$) have been added back in when they shouldn't have been, so these need to be subtracted out. If you carry out this argument all the way to the bottom, you get the following correct recurrence: $$ a_n=4a_{n-1}-2a_{n-2}+2a_{n-3}-2a_{n-4}+\dots \pm 2a_{0}\tag{1} $$

Of course, we would like to make this a recurrence with finite order. Here is a trick to do this. $(1)$ implies $$ a_{n-1}=4a_{n-2}-2a_{n-3}+2a_{n-4}-\dots \mp 2a_0\tag2 $$ Adding $(1)$ to $(2)$, all terms from $a_{n-3}$ downward cancel, and you get $$ a_{n}+a_{n-1}=4a_{n-1}+2a_{n-2} $$ This simplifies to the correct recurrence, $a_n=3a_{n-1}+2a_{n-2}$!
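The alternating recurrence $(1)$ can be tested numerically against a direct count. The sketch below is a small Python check for $n\le 7$; `brute_force` and `alternating_rhs` are names invented for this illustration.

```python
from itertools import product


def brute_force(n):
    """Count words of length n over {a, b, c, d} with no 'ab' or 'ba' substring."""
    words = ("".join(p) for p in product("abcd", repeat=n))
    return sum(1 for w in words if "ab" not in w and "ba" not in w)


def alternating_rhs(a, n):
    """Right-hand side of (1): 4a_{n-1} - 2a_{n-2} + 2a_{n-3} - ... down to a_0."""
    total = 4 * a[n - 1]
    sign = -1
    for k in range(2, n + 1):
        total += sign * 2 * a[n - k]
        sign = -sign
    return total


a = [brute_force(n) for n in range(8)]

# Check the infinite-order recurrence (1) and the finite one obtained from it.
alternating_ok = all(alternating_rhs(a, n) == a[n] for n in range(1, 8))
finite_ok = all(a[n] == 3 * a[n - 1] + 2 * a[n - 2] for n in range(2, 8))
print(a)
print(alternating_ok, finite_ok)
```

Agreement on these small cases supports the argument but does not replace it; the cancellation obtained by adding $(1)$ and $(2)$ is what establishes the finite recurrence in general.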

Mike Earnest