7

There are 400 houses in a row. Each house is to be coloured with any of the following five colours: red, blue, green, yellow, and white. The colour of each house is to be chosen randomly from among these five colours and independent of any other selection. Now, what is the probability that at least 8 houses in a row will have the same colour?

My idea is to first calculate the number of ways in which there are at least 8 red-coloured houses in a row. Then do the same for green, then blue, yellow and white (${R}\to{G}\to{B}\to{Y}\to{W}$). Suppose $n(R)$ denotes the number of ways in which there are at least 8 red-coloured houses in a row.

So, $$n(R) = (\text{number of ways to choose one row of 8 houses from 400 houses})\times(\text{number of ways the rest of the houses can be painted}) = 393\cdot 5^{392}$$

Now, $$ n(R) = n(G) = n(B) = n(Y) = n(W)$$

Initially I carelessly concluded that the required probability would be

$$ p = \frac{n(R) + n(G) + n(B) + n(Y) + n(W)}{5^{400}}$$

But a few moments later I realised that there is a lot of double-counting in this solution. (For example, when calculating $n(R)$, you can take the first 8 houses to be red, with the remaining 392 houses coloured at random. This includes the case where the last 8 houses in the row are also red. So, when you then choose the last 8 houses to be red, the case where the first 8 houses are red is counted again. There are many other ways in which the same colouring is counted more than once.) What should be done?
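To see the size of the double-counting on a case small enough to enumerate, here is a minimal Python sketch. The toy parameters (6 houses, 2 colours, runs of 3) and the function names are illustrative assumptions, not part of the question; `naive_count` mirrors the $393\cdot 5^{392}$ formula above for a single colour.

```python
from itertools import product

def naive_count(n, m, k):
    """Naive count for one fixed colour: (positions of the block) * (free colourings of the rest)."""
    return (n - k + 1) * m ** (n - k)

def exact_count(n, m, k, colour=0):
    """Brute-force count of colourings with at least k consecutive houses of `colour`."""
    target = (colour,) * k
    total = 0
    for row in product(range(m), repeat=n):
        # a row qualifies if some window of k consecutive houses is all `colour`
        if any(row[i:i + k] == target for i in range(n - k + 1)):
            total += 1
    return total

if __name__ == "__main__":
    n, m, k = 6, 2, 3  # toy sizes, chosen only so that enumeration is feasible
    print("naive:", naive_count(n, m, k))
    print("exact:", exact_count(n, m, k))
```

In this toy case the naive count is 32, while only 20 colourings actually contain the run, so the naive formula overcounts even for one colour.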

5 Answers

6

Consider a random process that starts by painting the first house with a random color. At each step it paints the next house with a random color out of $m=5$ colors. We are looking for the event that there are $k=8$ same-colored houses in a row.

Let's say that we are in state $s \in \{1,2,\ldots,k-1\}$ if the last $s$ houses but not $s+1$ were of the same color (no matter which color). Initially, with one house, we are in state $1$. Add another state $k$ that indicates that somewhere in the sequence there are $k$ consecutive houses of the same color.

This is a Markov chain with the following transition probabilities: For $s=1,\ldots,k-1$ we have

  • $P_{s,s+1} = \frac{1}{m}$ (next house has the same color: now one more house)
  • $P_{s,1} = 1-\frac{1}{m}$ (next house has different color: back to one house)

State $k$ is absorbing ($P_{k,k}=1$) because if ever we find the $k$ same-colored houses, we are happy and don't care what happens later. All other transition probabilities are zeros. So the transition matrix with $m=5$ colors and $k=8$ is $$ P_{8 \times 8} = \begin{bmatrix} 4/5 & 1/5 & 0 & \ldots & 0\\ 4/5 & 0 & 1/5 & \ldots & 0\\ \vdots &&& \ddots & \vdots \\ 4/5 & 0 & 0 & \ldots & 1/5\\ 0 & 0 & 0 & \ldots & 1 \end{bmatrix} $$

Start with the initial state vector $S=[1,0,\ldots,0]$, multiply it $n-1$ times by the transition matrix from the right, and you get the state vector when we have $n$ houses. The last entry is the probability that we are in the absorbing state.

With $n=400$, $m=5$, $k=8$ we get $0.004\,019\,088\,139$ (to 12 decimals; of course we can also do the matrix multiplication in $\mathbb Q$ and we get the exact probability as a rational, but its denominator has more than 200 digits).

Matlab code:

n=400; m=5; k=8;
s = [1 zeros(1,k-1)];       % Initial state
U = ones(k-1, 1);
P = [U*(1-1/m) diag(U/m);
    zeros(1,k-1) 1];        % transition matrix
s = s * P^(n-1);            % apply transition n-1 times
p = s(k)                    % probability of state k
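The exact rational computation mentioned above can be sketched in Python with the standard `fractions` module. This is only a sketch under the same model: the function name `run_probability` is an assumption of this example, and it updates the state vector step by step instead of forming a matrix power.

```python
from fractions import Fraction

def run_probability(n, m, k):
    """Exact probability that n houses, each one of m colours uniformly at random,
    contain k consecutive houses of the same colour (assumes n >= 1, k >= 2)."""
    same = Fraction(1, m)          # next house repeats the current colour
    diff = 1 - same                # next house changes colour
    state = [Fraction(0)] * k      # state[s-1] = P(state s); state[k-1] is absorbing
    state[0] = Fraction(1)         # one house painted: run length 1
    for _ in range(n - 1):
        new = [Fraction(0)] * k
        new[0] = diff * sum(state[:k - 1])      # any non-absorbed state falls back to 1
        for s in range(k - 1):
            new[s + 1] += same * state[s]       # run grows by one house
        new[k - 1] += state[k - 1]              # absorbing state keeps its mass
        state = new
    return state[k - 1]

if __name__ == "__main__":
    p = run_probability(400, 5, 8)
    print(float(p))
    print(len(str(p.denominator)), "digits in the denominator")
```

For small cases the result can be checked by hand, for example `run_probability(8, 5, 8)` equals $5\cdot(1/5)^8 = 1/5^7$.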
3

$\mathbf{NOTE=}$ This answer needs Wolfram Alpha or some other software to carry out the tortuous exponential generating function calculations.

By using P.I.E , we calculate at least $8$ red or at least $8$ blue or at least $8$ green or at least $8$ yellow or at least $8$ white

The exponential generating function for at least $8$ red : $$\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)$$

Each of the other colours is represented by $e^x$

Then , when there are at least $8$ red house , find the coefficient of $x^{400}$ in the expansion of $$\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg) \times e^{4x}$$ and multiply it by $400!$

By P.I.E ,we know that it will be applied for the other colors , so find $$5 \times [x^{400}] \bigg[\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg) \times e^{4x}\bigg]$$

When two color have at least $8$ house ,then $$\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^2 \times e^{3x}$$

We know that we can select any of two by $C(5,2)=10$ ways , so $$10 \times [x^{400}] \bigg[\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^2 \times e^{3x}\bigg]$$

When three colors have at least $8$ houses, then $$\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^3 \times e^{2x}$$

We know that we can select any of three by $C(5,3)=10$ ways , so $$10 \times [x^{400}] \bigg[\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^3 \times e^{2x}\bigg]$$

When four color have at least $8$ house ,then $$\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^4 \times e^{x}$$

We know that we can select any of four by $C(5,4)=5$ ways , so $$5 \times [x^{400}] \bigg[\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^4 \times e^{x}\bigg]$$

When five colors have at least $8$ house ,then $$\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^5 $$

We know that we can select all five in $C(5,5)=1$ way, so $$1 \times [x^{400}] \bigg[\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^5 \bigg]$$

By P.I.E $$5 \times [x^{400}] \bigg[\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg) \times e^{4x}\bigg] - 10 \times [x^{400}] \bigg[\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^2 \times e^{3x}\bigg] + 10 \times [x^{400}] \bigg[\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^3 \times e^{2x}\bigg] -5 \times [x^{400}] \bigg[\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^4 \times e^{x}\bigg] +[x^{400}] \bigg[\bigg(\frac{x^8}{8!}+ \frac{x^9}{9!}+ \frac{x^{10}}{10!}+\frac{x^{11}}{11!}..\bigg)^5 \bigg]$$

Unfortunately, you need Wolfram Alpha or some other software to calculate it. After that, divide the result by $5^{400}$.

$\mathbf{EDITION=}$ Thanks to @Mike Earnest, I realize that there is another condition: the houses must be consecutive. I had missed it. My answer does not cover the restriction about being consecutive.

$\mathbf{EDITION-2=}$ I spent a little time looking for an easier way, but I could not find one. Hence, I am writing this cumbersome solution. It needs some knowledge from enumerative combinatorics. The method is called the Goulden-Jackson cluster method. Following @lulu's comment, we will find the number of strings which do not contain single-character blocks of length $\geq 8$

Here is an article about Goulden-Jackson: https://arxiv.org/abs/math/9806036

I will solve only for length $8$; the rest is left to you, but I highly recommend using software for it.

The number of strings that do not contain any single block of length $8$:

Our alphabet $V= \{R,B,G,Y,W\} \rightarrow |V|=5$ . Moreover our bad words are $\{RRRRRRRR,BBBBBBBB,GGGGGGGG,YYYYYYYY,WWWWWWWW \}$

$$A(x)=\frac{1}{1-dx-\text{weight}(\mathcal{C})}$$

$$\text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[RRRRRRRR] +\mathcal{C}[BBBBBBBB] +\mathcal{C}[GGGGGGGG] +\mathcal{C}[YYYYYYYY] +\mathcal{C}[WWWWWWWW])$$

$\text{weight}(\mathcal{C}[RRRRRRRR])= -x^8-(x+x^2+x^3+x^4 +x^5 +x^6 +x^7)\text{weight}(\mathcal{C}[RRRRRRRR])$

So , $$\text{weight}(\mathcal{C}[RRRRRRRR])= \frac{-x^8}{(1+x+x^2+x^3+x^4 +x^5 +x^6 +x^7)}$$

The weights for the other colours are the same, so $$\text{weight}(\mathcal{C})= \frac{-5x^8}{(1+x+x^2+x^3+x^4 +x^5 +x^6 +x^7)}$$

So , $$A(x)=\frac{1}{1-5x-\frac{-5x^8}{(1+x+x^2+x^3+x^4 +x^5 +x^6 +x^7)}}$$

Once you expand this fraction as a power series, find the coefficient $[x^{400}]$. It gives the number of strings with no block of $8$ consecutive houses of one colour; subtracting it from $5^{400}$ gives the number of strings that do contain such a block.
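For the coefficient extraction it may help to clear the nested fraction first. A short worked simplification, writing $S(x)=x+x^2+\cdots+x^7$ (a shorthand introduced here, not used in the answer above):

$$A(x)=\frac{1}{1-5x+\frac{5x^8}{1+S(x)}}=\frac{1+S(x)}{(1-5x)\,(1+S(x))+5x^8}=\frac{1+S(x)}{1-4S(x)},$$

since $(1-5x)(1+S(x)) = 1+S(x)-5S(x)-5x^8$. Comparing coefficients in $(1-4S(x))\,A(x)=1+S(x)$ then suggests the recurrence

$$a_n=5^n \quad (0\le n\le 7), \qquad a_n=4\,(a_{n-1}+a_{n-2}+\cdots+a_{n-7}) \quad (n\ge 8),$$

where $a_n=[x^n]A(x)$, so $a_{400}$ can be obtained by iterating with exact integers.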

$\mathbf{WARNING=}$ I know that I did not go into the calculation details in "edition $2$", because it would take a long time to explain the whole Goulden-Jackson process, so I put a link above. You can ask any question if you do not understand the process in the link. This edition is just meant to give an approach.

3

We interpret $8$ houses in a row as runs of $8$ consecutive houses. The following answer is based upon the Goulden-Jackson Cluster Method. We consider the set of words of length $n\geq 0$ built from an alphabet $$\mathcal{V}=\{R,G,B,Y,E\}$$ and the set $B=\{R^8,G^8,B^8,Y^8,E^8\}$ of bad words, which are not allowed to be part of the words we are looking for. We derive a generating function $A(z)=\sum_{n=0}^\infty a_nz^n$ with the coefficient of $z^n$ being the number of words of length $n$ that contain no bad word. Since we want to count the number of words which do have runs of length $8$, the resulting generating function is \begin{align*} \frac{1}{1-5z}-A(z)&=1+5z+5^2z^2+5^3z^3+\cdots-A(z)\\ &=\sum_{n=0}^\infty \left(5^n-a_n\right)z^n\tag{1} \end{align*} The wanted number for $400$ houses is the coefficient $[z^{400}]$ of the series in (1).

According to the paper (p.7) the generating function $A(z)$ is \begin{align*} A(z)=\frac{1}{1-dz-\text{weight}(\mathcal{C})}\tag{2.1} \end{align*} with $d=|\mathcal{V}|=5$, the size of the alphabet and $\mathcal{C}$ is the weight-numerator of bad words with \begin{align*} \text{weight}(\mathcal{C})=\sum_{t\in\{R,G,B,Y,E\}}\text{weight}(\mathcal{C}[t^8])\tag{2.2} \end{align*}

We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[t^8])&=-z^8-\left(z+z^2+\cdots+z^7\right)\text{weight}(\mathcal{C}[t^8])\tag{2.3}\\ \text{weight}(\mathcal{C}[t^8])&=-\frac{z^8(1-z)}{1-z^8}\qquad\qquad t\in\{R,G,B,Y,E\}\tag{2.4} \end{align*} so that according to (2.2) and (2.4) \begin{align*} \text{weight}(\mathcal{C})&=\sum_{t\in\{R,G,B,Y,E\}}\text{weight}(\mathcal{C}[t^8])\\ &=\sum_{t\in\{R,G,B,Y,E\}}(-1)\frac{z^8(1-z)}{1-z^8}\\ &=-\frac{5z^8(1-z)}{1-z^8}\tag{2.5}\\ \end{align*} The additional term on the right-hand side of (2.3) takes account of the overlapping of $t^8, t\in\{R,G,B,Y,E\}$ with parts of itself. We obtain according to (2.1) and (2.5) \begin{align*} \color{blue}{A(z)}=\frac{1}{1-5z+\frac{5z^8(1-z)}{1-z^8}} &\;\;\color{blue}{=\frac{1-z^8}{1-5z+4z^8}} \end{align*} We finally get \begin{align*} \frac{1}{1-5z}-A(z)&=\frac{1}{1-5z}-\frac{1-z^8}{1-5z+4z^8}\\ &=5z^8+\color{blue}{45}z^9+325z^{10}+2\,125z^{11}\\ &\qquad+\cdots+\color{blue}{\left(5^{400}-a_{400}\right)}z^{400}+\cdots\tag{3.1} \end{align*} where the last line was calculated with some help of Wolfram Alpha.

A small example: The blue marked coefficient of $z^{9}$ shows there are $\color{blue}{45}$ words of length $9$ over the alphabet $\mathcal{V}=\{R,G,B,Y,E\}$ which do contain runs of length $8$. These $45$ words are \begin{align*} \begin{array}{ccccccccc} R\color{blue}{RRRRRRRR}&R\color{blue}{GGGGGGGG}&R\color{blue}{BBBBBBBB}\\ G\color{blue}{RRRRRRRR}&G\color{blue}{GGGGGGGG}&G\color{blue}{BBBBBBBB}\\ B\color{blue}{RRRRRRRR}&B\color{blue}{GGGGGGGG}&B\color{blue}{BBBBBBBB}\\ Y\color{blue}{RRRRRRRR}&Y\color{blue}{GGGGGGGG}&Y\color{blue}{BBBBBBBB}\\ E\color{blue}{RRRRRRRR}&E\color{blue}{GGGGGGGG}&E\color{blue}{BBBBBBBB}\\ \color{blue}{RRRRRRRR}G&\color{blue}{GGGGGGGG}R&\color{blue}{BBBBBBBB}R\\ \color{blue}{RRRRRRRR}B&\color{blue}{GGGGGGGG}B&\color{blue}{BBBBBBBB}G\\ \color{blue}{RRRRRRRR}Y&\color{blue}{GGGGGGGG}Y&\color{blue}{BBBBBBBB}Y\\ \color{blue}{RRRRRRRR}E&\color{blue}{GGGGGGGG}E&\color{blue}{BBBBBBBB}E\\ \\ R\color{blue}{YYYYYYYY}&R\color{blue}{EEEEEEEE}\\ G\color{blue}{YYYYYYYY}&G\color{blue}{EEEEEEEE}\\ B\color{blue}{YYYYYYYY}&B\color{blue}{EEEEEEEE}\\ Y\color{blue}{YYYYYYYY}&Y\color{blue}{EEEEEEEE}\\ E\color{blue}{YYYYYYYY}&E\color{blue}{EEEEEEEE}\\ \color{blue}{YYYYYYYY}R&\color{blue}{EEEEEEEE}R\\ \color{blue}{YYYYYYYY}G&\color{blue}{EEEEEEEE}G\\ \color{blue}{YYYYYYYY}B&\color{blue}{EEEEEEEE}B\\ \color{blue}{YYYYYYYY}E&\color{blue}{EEEEEEEE}Y\\ \end{array} \end{align*}

Result: We find according to (3.1) and Wolfram Alpha the wanted number of words is \begin{align*} \color{blue}{5^{400}-a_{400}}&=[z^{400}]\left(\frac{1}{1-5z}-\frac{1-z^8}{1-5z+4z^8}\right)\\ &\,\,\color{blue}{=1.556\,428\,\ldots\cdot 10^{277}} \end{align*} and the wanted probability is \begin{align*} \color{blue}{\frac{5^{400}-a_{400}}{5^{400}}=4.019\,088\ldots\cdot 10^{-3}} \end{align*}
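The coefficient extraction can also be done without a CAS. The following Python sketch is an illustration, not the method used above: it reads the recurrence $a_n = 5a_{n-1} - 4a_{n-8} + [n=0] - [n=8]$ off the identity $(1-5z+4z^8)\,A(z) = 1-z^8$ and iterates it with exact integers. The function name `words_without_run` is an assumption of this example.

```python
from fractions import Fraction

def words_without_run(n_max):
    """Coefficients a_0..a_{n_max} of A(z) = (1 - z^8) / (1 - 5z + 4z^8), as exact integers."""
    a = []
    for n in range(n_max + 1):
        # contribution of the numerator 1 - z^8
        value = (1 if n == 0 else 0) - (1 if n == 8 else 0)
        # contribution of the denominator terms -5z and +4z^8
        if n >= 1:
            value += 5 * a[n - 1]
        if n >= 8:
            value -= 4 * a[n - 8]
        a.append(value)
    return a

if __name__ == "__main__":
    a = words_without_run(400)
    print([5 ** n - a[n] for n in range(8, 12)])    # compare with the series in (3.1)
    print(float(Fraction(5 ** 400 - a[400], 5 ** 400)))
```

The first printed list can be compared with the coefficients $5, 45, 325, 2\,125$ in (3.1).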

Markus Scheuer
  • 108,315
  • I wonder something , question says that at least $8$ houses , so should we not count for $n \geq 8$ , please look at @lulu's comment and edition $2$ part of my answer. Have a nice day – Not a Salmon Fish Sep 12 '21 at 11:22
2

For the complementary event, the row of houses decomposes into monochromatic groups of length between $1$ and $7$, where the colors of adjacent groups are different. I will write this as $$ \text{bad sequence = leftmost group + (sequence of non-left groups)}\tag1 $$ The generating function for the leftmost group is $$ \text{leftmost group g.f. : }\quad5\cdot (x+x^2+\dots +x^7),\tag2 $$ since there are $5$ choices for the color, and the length can be between $1$ and $7$. Each subsequent group has generating function $$\text{non-left group g.f. : }\quad 4\cdot (x+x^2+\dots +x^7),\tag3$$ since the color must be different from its neighbor to the left. Finally, it is well known that if a combinatorial object has generating function $f$, then the g.f. for sequences of that object is $$\text{g.f. for sequence : }\quad 1+f(x)+f(x)^2+\dots=\frac1{1-f(x)}.\tag4$$ Therefore, combining $(1)$ to $(4)$, the number of bad sequences is $$ \text{# bad outcomes}=[x^{400}]\left(5\cdot (x+\dots +x^7)\times \frac{1}{1-4\cdot (x+\dots+x^7)}\right) $$ Here, $[x^{400}]$ is the most important operator for generating functions, the coefficient extraction operator, and it just means the coefficient of $x^{400}$ when the inside function is expanded as a series. Using a CAS like Wolfram|Alpha, you can verify that $$ P(\text{8 of a color in a row})=1-\left(\frac{\text{# bad outcomes}}{5^{400}}\right)=0.00401908813 $$ Here is a Wolfram|Alpha computation of # bad outcomes.
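As an alternative to a CAS, the group decomposition can be turned into a recurrence directly. The sketch below is an assumption-laden illustration (the function name `bad_outcomes` and its parameters are made up here): a bad row of length $n$ is either a single leftmost group, or a shorter bad row followed by one more group of a different colour.

```python
from fractions import Fraction

def bad_outcomes(n_max, colours=5, run=8):
    """b[n] = number of rows of n houses in which every monochromatic group is shorter than `run`."""
    b = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        # case 1: the whole row is the leftmost group (only possible if n < run)
        b[n] = colours if n < run else 0
        # case 2: a bad row of length n - j, then a final group of length j in a new colour
        last_lengths = range(1, min(run - 1, n - 1) + 1)
        b[n] += (colours - 1) * sum(b[n - j] for j in last_lengths)
    return b

if __name__ == "__main__":
    b = bad_outcomes(400)
    print(float(1 - Fraction(b[400], 5 ** 400)))
```

Using `Fraction` for the final ratio keeps the subtraction exact before converting to a float.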

Mike Earnest
  • 75,930
1

EDIT: I really liked @Jukka Kohonen's answer. I created a python script to do what he did in Matlab:

Python Code:

import numpy as np

n = 400; m = 5; k = 8
temp = np.zeros((1, k-1))
temp2 = np.array([[1]])
s = np.concatenate((temp2, temp), axis=1)      # initial state
U = np.ones((k-1, 1))
temp = np.multiply(U, (1 - 1/m))               # first column: colour changes, back to state 1
flattemp = np.concatenate((U/m,), axis=None)   # flatten U/m to 1D for np.diag
temp2 = np.concatenate((temp, np.diag(flattemp)), axis=1)
temp = np.zeros((1, k-1))
temp = np.concatenate((temp, np.array([[1]])), axis=1)   # absorbing last row

P = np.concatenate((temp2, temp), axis=0)      # transition matrix
temp = np.linalg.matrix_power(P, n-1)          # apply transition n-1 times
ss = np.matmul(s, temp)[0]                     # set ss = s * P^(n-1) and convert to 1D array

prob = ss[k-1]                                 # probability of state k
print(f'the answer is {prob}')

Hint: Take a look at Time taken to hit a pattern of heads and tails in a series of coin-tosses to help with understanding this question. There is also some good material in Probability of finding a particular sequence of base pairs.

I'm still working on this, but here's how I'm thinking of it: If there were only $n=8$ houses, the answer would be simple: $5*(20\%)^8$. That's because each house has a 20% chance of being a particular color, and 8 of them have to be that color (hence $(20\%)^8$), but since that logic can be followed with any of the 5 colors, that bit is multiplied by 5.

Now the question is how to extend that to 9, 10, and on to 400 houses. I'm working on this. I believe that, similar to the logic in Probability of finding a particular sequence of base pairs, with 9 houses we would have 2 looks at 8-house groups. So finding a sequence of all the same color would be twice as likely, but to keep from double-counting, we need to subtract the probability that both 8-house groups are of that color, that is, that all 9 houses are the same color, which would be $(20\%)^9$. So the probability with n=9 houses would be:

$5[2(20\%)^8 - (20\%)^9]$

To extend that to $n$ houses (for example n=10),

$5[(n-7)(20\%)^8 - (n-8)(20\%)^9]$

And for n=400 houses:

$5[(393)(20\%)^8 - (392)(20\%)^9]$

$=0.00402688 \approx 0.4\%$
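A quick way to test this formula is to compare it with an exact count for small $n$. The sketch below is only an illustration: the function names `formula` and `exact` are assumptions of this example, and `exact` counts run-free rows by tracking the length of the final same-colour group.

```python
from fractions import Fraction

def formula(n, m=5, k=8):
    """The estimate above: m * [(n-k+1) p^k - (n-k) p^(k+1)] with p = 1/m."""
    p = Fraction(1, m)
    return m * ((n - k + 1) * p ** k - (n - k) * p ** (k + 1))

def exact(n, m=5, k=8):
    """Exact probability of k equal colours in a row, by counting rows without such a run."""
    # c[j] = number of run-free rows of the current length whose final group has length j + 1
    c = [m] + [0] * (k - 2)
    for _ in range(n - 1):
        # new colour starts a group of length 1; same colour lengthens the final group;
        # groups that would reach length k are dropped
        c = [(m - 1) * sum(c)] + c[:-1]
    return 1 - Fraction(sum(c), m ** n)

if __name__ == "__main__":
    for n in (8, 9, 15, 16, 400):
        print(n, float(formula(n)), float(exact(n)), formula(n) == exact(n))
```

For $n$ from 8 to 15 the two functions agree exactly; from $n=16$ on the formula comes out slightly larger than the exact value.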

  • 1
    Close, but no cigar. Your approach is correct with $n=10$ and even up to $n=15$, but starting from $n=16$ it overcounts, although ever so little. With $n =16$ you can have a sequence of 8 white houses and then 8 red houses; your formula counts this as two events. (And same for other color pairs.) With $n \ge 17$ you can also have two or more separate runs of 8 white houses, etc. – Jukka Kohonen Sep 11 '21 at 19:50
  • @JukkaKohonen true. And, your approach is very powerful and easy to understand. But, I still found the answer to within $0.19$%. – bittahProfessional Dec 16 '21 at 19:54