
Consider the words of length $7$ over the alphabet $\{a,b,c,d,e\}$. How many of these words contain neither "de" nor "abd"?

A reasonable approach (I think) is to let $t_n$ be the number of words of length $n$ that contain neither "de" nor "abd", find a recurrence relation for $t_n$, and then use it to compute $t_7$.

What is that recurrence relation? Is there another solution that avoids a recurrence? Thanks!
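One way to get the kind of recurrence the question asks for (an illustrative sketch, not taken from the answers below) is a small state machine. The only suffixes that can still grow into a bad word are `a`, `ab` and `d`, so it suffices to count words by which of these they end in, or none. The state names `c0`, `ca`, `cab`, `cd` are hypothetical labels. Appending a letter moves a word between states; `e` after `d` and `d` after `ab` are forbidden.

```python
def count_avoiding(n):
    """Count length-n words over {a,b,c,d,e} containing neither "de" nor "abd".

    State counts (hypothetical names):
      c0  - words not ending in "a", "ab" or "d"
      ca  - words ending in "a"
      cab - words ending in "ab"
      cd  - words ending in "d"
    """
    c0, ca, cab, cd = 1, 0, 0, 0  # start from the empty word
    for _ in range(n):
        c0, ca, cab, cd = (
            3 * c0 + 2 * ca + 3 * cab + 2 * cd,  # append b/c/e, c/e, b/c/e, b/c
            c0 + ca + cab + cd,                  # appending "a" always leads to state "a"
            ca,                                  # "a" followed by "b" gives state "ab"
            c0 + ca + cd,                        # "d" is allowed except right after "ab"
        )
    return c0 + ca + cab + cd


print([count_avoiding(n) for n in range(8)])
# [1, 5, 24, 114, 542, 2577, 12253, 58260]
```

These values agree with the coefficients of $f(s)$ in the first answer below, so $t_7=58\,260$.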

Ben Grossmann
boaz

3 Answers


The following answer is based upon the Goulden-Jackson Cluster Method. We consider the set of words of length $n\geq 0$ built from an alphabet $$\mathcal{V}=\{a,b,c,d,e\}$$ and the set $B=\{abd,de\}$ of bad words, which are not allowed to be part of the words we are looking for. We derive a generating function $f(s)$ with the coefficient of $s^n$ being the number of wanted words of length $n$.

According to the paper (p.7), the generating function $f(s)$ is \begin{align*} f(s)=\frac{1}{1-ds-\text{weight}(\mathcal{C})}\tag{1} \end{align*} where $d=|\mathcal{V}|=5$ is the size of the alphabet and $\text{weight}(\mathcal{C})$ is the weight-enumerator of the clusters formed from bad words, with \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[abd])+\text{weight}(\mathcal{C}[de])\tag{2} \end{align*}

We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[abd])&=-s^3\\ \text{weight}(\mathcal{C}[de])&=-s^2-s\cdot\text{weight}(\mathcal{C}[abd])\tag{3}\\ \end{align*} so that \begin{align*} \text{weight}(\mathcal{C})=-s^3+\left(-s^2-s\cdot\left(-s^3\right)\right)=-s^2-s^3+s^4 \end{align*} The additional term on the right-hand side of (3) takes account of the overlapping of $ab\color{blue}{d}$ with $\color{blue}{d}e$.
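The cluster weights in (3) can be reproduced mechanically. The sketch below is an illustration in pure Python, not code from the paper: power series are truncated coefficient lists (the truncation degree `N` and the helpers `overlaps` and `cluster_weights` are hypothetical), and the equation $\text{weight}(\mathcal{C}[v])=-s^{|v|}-\sum_u\sum_k s^{|v|-k}\,\text{weight}(\mathcal{C}[u])$ is iterated, where $k$ runs over lengths of proper suffixes of $u$ that equal proper prefixes of $v$. For $u=abd$, $v=de$ this is the single overlap $d$, giving the term $-s\cdot\text{weight}(\mathcal{C}[abd])$ in (3).

```python
# Truncation degree for the power series (an assumption; weight(C) here has degree 4).
N = 10
BAD = ["abd", "de"]


def overlaps(u, v):
    """Lengths k of nonempty proper suffixes of u that equal proper prefixes of v."""
    return [k for k in range(1, min(len(u), len(v))) if u[-k:] == v[:k]]


def cluster_weights(bad, n):
    """Truncated series (coefficient lists up to s^n) for weight(C[v]), v in bad.

    Iterates weight(C[v]) = -s^|v| - sum_u sum_k s^(|v|-k) * weight(C[u])
    starting from zero; every shift is at least 1, so n + 1 passes fix all
    coefficients up to s^n.
    """
    w = {v: [0] * (n + 1) for v in bad}
    for _ in range(n + 1):
        new = {}
        for v in bad:
            series = [0] * (n + 1)
            if len(v) <= n:
                series[len(v)] -= 1
            for u in bad:
                for k in overlaps(u, v):
                    shift = len(v) - k
                    for i in range(n + 1 - shift):
                        series[i + shift] -= w[u][i]
            new[v] = series
        w = new
    return w


weights = cluster_weights(BAD, N)
total = [sum(weights[v][i] for v in BAD) for i in range(N + 1)]
print(weights["de"])  # [0, 0, -1, 0, 1, 0, 0, 0, 0, 0, 0]
print(total)          # [0, 0, -1, -1, 1, 0, 0, 0, 0, 0, 0]
```

The list `total` encodes $-s^2-s^3+s^4$, matching $\text{weight}(\mathcal{C})$ above.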

We obtain according to (1) and (3) \begin{align*} f(s)&=\frac{1}{1-ds-\text{weight}(\mathcal{C})}\\ &=\frac{1}{1-5s+s^2+s^3-s^4}\\ &=1 + 5 s + 24 s^2 + 114 s^3 + 542 s^4 + 2577 s^5\\ &\qquad + 12\,253 s^6 + \color{blue}{58\,260} s^7 + 277\,012 s^8 + 1\,317\,124 s^9 +\cdots \end{align*} where the last line was calculated with the help of Wolfram Alpha.
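The expansion does not require Wolfram Alpha: multiplying $f(s)$ by its denominator gives $1$, so the coefficients satisfy $a_n=5a_{n-1}-a_{n-2}-a_{n-3}+a_{n-4}$ for $n\geq 1$, with $a_0=1$ and $a_m=0$ for $m<0$. A minimal sketch of this power-series inversion (the function name `series_inverse` is hypothetical):

```python
def series_inverse(den, n_terms):
    """Coefficients of 1/den(s) up to s^(n_terms - 1), assuming den[0] == 1.

    From den(s) * f(s) = 1:  a_n = -sum_{k>=1} den[k] * a_{n-k}  for n >= 1.
    """
    if den[0] != 1:
        raise ValueError("constant term of the denominator must be 1")
    a = [1]
    for n in range(1, n_terms):
        a.append(-sum(den[k] * a[n - k] for k in range(1, min(n, len(den) - 1) + 1)))
    return a


# Denominator 1 - 5s + s^2 + s^3 - s^4 as a coefficient list.
den = [1, -5, 1, 1, -1]
coeffs = series_inverse(den, 10)
print(coeffs)
# [1, 5, 24, 114, 542, 2577, 12253, 58260, 277012, 1317124]
```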

Result: The blue marked coefficient of $s^{7}$ shows there are $\color{blue}{58\,260}$ words of length $7$ over the alphabet $\mathcal{V}$ which do not contain $abd$ or $de$.

Markus Scheuer
  • May I ask where you studied? Do you know of any good books for self-study? A book written in simple English would be ideal, since I am Arabic and my English is poor. A few months ago you solved my problem "words counting and ranking" amazingly, and you solved this problem amazingly, too. – Hussain-Alqatari Sep 08 '19 at 19:47
  • @Hussain-Alqatari: I studied at the University of Vienna many years ago. You might start with H. Wilf's Generatingfunctionology. Please consult this post for more information. – Markus Scheuer Sep 08 '19 at 20:21
  • Nice solution! I would just like to point out that we don't really need Wolfram Alpha. From the GF equation we have $(1-5s+s^2+s^3-s^4) f(s) = 1$ so if $f(s) = \sum_{n=0}^{\infty} a_n s^n$ then $a_n - 5 a_{n-1}+a_{n-2}+a_{n-3}-a_{n-4}=0$ with $a_1=1$, so we have a recursion from which we can easily calculate $a_2,a_3,a_4$, etc. – awkward Sep 09 '19 at 13:33
  • @awkward: Yes, you're right of course, thanks. I often use WA as a verification tool, and I typically also write a code snippet that generates the valid words (for small exponents). This way I can check the result with different approaches. – Markus Scheuer Sep 09 '19 at 13:44
  • Just a correction to my previous comment, which I can't edit now: I should have written $a_0=1$, not $a_1=1$. – awkward Sep 09 '19 at 13:58
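In the spirit of the verification snippet mentioned in the comments, here is a minimal brute-force sketch (the names `ALPHABET`, `BAD` and `valid_words` are hypothetical) that generates every valid word for small lengths and counts them. It enumerates all $5^n$ words, so it is only practical for small $n$.

```python
from itertools import product

ALPHABET = "abcde"
BAD = ("abd", "de")


def valid_words(n):
    """Yield every length-n word over ALPHABET that contains no word from BAD."""
    for letters in product(ALPHABET, repeat=n):
        word = "".join(letters)
        if not any(bad in word for bad in BAD):
            yield word


counts = [sum(1 for _ in valid_words(n)) for n in range(8)]
print(counts)
# [1, 5, 24, 114, 542, 2577, 12253, 58260]
```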

Note: This answer is incorrect; see comments below for discussion


We can justify a recurrence as follows: as soon as a word begins with a string that "fails to be" either de or abd, we can freely append any string that contains neither de nor abd.

We can fail to produce a forbidden string in the following ways:

  • Begin with b,c, or e. (3 ways, 1 slot used),
  • Begin with d, followed by any letter besides e (4 ways, 2 slots used),
  • Begin with a, followed by any letter besides b (4 ways, 2 slots used),
  • Begin with ab, followed by any letter besides d (4 ways, 3 slots used).

This leads to the following recurrence: $$ t_n = 3t_{n-1} + 8t_{n-2} + 4t_{n-3}; n \geq 3. $$ The initial conditions are $$ t_0 = 1, \quad t_1 = 5, \quad t_2 = 24. $$
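The discrepancy raised in the comments can be checked numerically. This sketch (function names hypothetical) evaluates the proposed recurrence and compares it with a brute-force count; the first disagreement appears at $n=3$.

```python
from itertools import product


def proposed(n_max):
    """t_n = 3 t_{n-1} + 8 t_{n-2} + 4 t_{n-3} with t_0 = 1, t_1 = 5, t_2 = 24."""
    t = [1, 5, 24]
    for n in range(3, n_max + 1):
        t.append(3 * t[n - 1] + 8 * t[n - 2] + 4 * t[n - 3])
    return t


def brute(n):
    """Direct count of length-n words over {a,...,e} avoiding "de" and "abd"."""
    total = 0
    for letters in product("abcde", repeat=n):
        word = "".join(letters)
        if "de" not in word and "abd" not in word:
            total += 1
    return total


t = proposed(6)
actual = [brute(n) for n in range(7)]
mismatches = [(n, t[n], actual[n]) for n in range(7) if t[n] != actual[n]]
print(mismatches[0])  # (3, 116, 114)
```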

Ben Grossmann
  • Thanks! I noticed that your relation gives $t_3=116$. Shouldn't it be $t_3=5^3-11=114$ (since we remove "abd", "de$\star$" and "$\star$de")? – boaz Sep 08 '19 at 19:41
  • @boaz I agree with your observation, and honestly I'm not sure where I went wrong. If setting $t_0 = 1$ is problematic then maybe the same recurrence using $t_1,t_2,t_3$ will give you the correct sequence, but I can't see why setting $t_0 = 1$ would lead to any issues here. – Ben Grossmann Sep 08 '19 at 19:50
  • I think I see the problem: my recurrence is invalid since, for example, ad and dd cannot be freely followed by a valid string.
  • This is troubling; I really see no quick fix here. Sorry about that. I'll leave it up in case it's useful. – Ben Grossmann Sep 08 '19 at 20:05

Here we use the inclusion-exclusion principle (PIE) to count the valid $7$-letter words over the alphabet $\{a,b,c,d,e\}$, i.e. those containing neither of the bad words $abd$ and $de$.

In order to do the job some kind of bookkeeping is helpful. We consider \begin{align*} .\ .\ .\ .\ .\ .\ . &-\left(abd\ .\ .\ .\ .|de\ .\ .\ .\ .\ .\right)\tag{1}\\ &+\left(abd\ abd\ .|abd\ de\ .\ .|abde\ .\ .\ .|de\ de\ .\ .\ .\right)\tag{2}\\ &-\left(abde\ abd|abde\ de\ .|abd\, de\, de|de\ de\ de\ .\right)\tag{3} \end{align*}

Comment:

  • In (1) we count all $7$-letter words indicated by seven dots which gives $5^7$. Then we subtract all words which contain at least one bad word.

    Since $abd$ consumes three characters, four are left for free assignment; treating $abd$ as a single block among $5$ objects, we count $\binom{5}{1}5^4$ words of this kind, and similarly $\binom{6}{1}5^5$ words containing the bad word $de$.

  • In (2) we add back words containing two bad words, as compensation for those we subtracted twice in (1). Note that we also have to consider the overlap of $abd$ with $de$, which gives $\color{blue}{abde}$.

  • In (3) we finally subtract words containing three bad words which were added twice in (2). For instance $abde\ abd$ occurs in $\color{blue}{abd\ abd\ .}$ as well as in $\color{blue}{abde\ .\ .\ .}$

  • No more cases are left to consider, since words containing four or more bad words have length $>7$.
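The bookkeeping above can be automated by applying inclusion-exclusion to individual occurrences. This sketch is an illustration, not the answer's own procedure: each pair (start position, bad word) is an event, and every set of $k$ events whose forced letters are consistent contributes $(-1)^k 5^{\#\text{free positions}}$. Overlaps such as $abde$ are handled automatically because two occurrences may force the same letter. The names `occurrences` and `forced_letters` are hypothetical.

```python
from itertools import combinations

N = 7
ALPHABET_SIZE = 5
BAD = ("abd", "de")

# Every possible occurrence of a bad word: (start position, word).
occurrences = [(i, w) for w in BAD for i in range(N - len(w) + 1)]


def forced_letters(chosen):
    """Merge the letters forced by a set of occurrences; None if they clash."""
    fixed = {}
    for start, word in chosen:
        for offset, letter in enumerate(word):
            if fixed.setdefault(start + offset, letter) != letter:
                return None
    return fixed


total = 0
for k in range(len(occurrences) + 1):
    for chosen in combinations(occurrences, k):
        fixed = forced_letters(chosen)
        if fixed is not None:
            # (-1)^k times the number of ways to fill the unconstrained positions
            total += (-1) ** k * ALPHABET_SIZE ** (N - len(fixed))

print(total)  # 58260
```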

We obtain according to (1) to (3): \begin{align*} 5^7&-\left(\binom{5}{1}5^4+\binom{6}{1}5^5\right)\\ &\quad+\left(\binom{3}{2}5^1+2\binom{4}{2}5^2+\binom{4}{1}5^3+\binom{5}{2}5^3\right)\\ &\quad-\left(2\binom{2}{2}5^0+2\binom{3}{2}5^1+\binom{3}{2}5^0+\binom{4}{3}5^1\right)\\ &=78\,125-(3\,125+18\,750)+(15+300+500+1\,250)-(2+30+3+20)\\ &=78\,125-21\,875+2\,065-55\\ &\,\,\color{blue}{=58\,260} \end{align*}
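The arithmetic can be double-checked term by term. The sketch below simply transcribes the three bracketed sums (the variable names are hypothetical) using `math.comb`, available in Python 3.8 and later.

```python
from math import comb

one_bad = comb(5, 1) * 5**4 + comb(6, 1) * 5**5
two_bad = comb(3, 2) * 5 + 2 * comb(4, 2) * 5**2 + comb(4, 1) * 5**3 + comb(5, 2) * 5**3
three_bad = 2 * comb(2, 2) + 2 * comb(3, 2) * 5 + comb(3, 2) + comb(4, 3) * 5

result = 5**7 - one_bad + two_bad - three_bad
print(one_bad, two_bad, three_bad, result)  # 21875 2065 55 58260
```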

Note: The Goulden-Jackson Cluster method used in another post is based upon the PIE approach and conveniently hides all this bookkeeping from us.

Markus Scheuer