Find the expected value and variance

Question

Determine the expected value ($\mathbb{E}[X]$) and variance of the number of times it is necessary to roll a die until the result "1" occurs 4 times in a row.

I know it is a negative binomial distribution, maybe with parameters 4 and 1/6, and I think $X_n:$ "number of rolls needed to obtain $n$ consecutive 1s" could be the random variable.

Any help is welcome!

shoteyes · Accepted Answer · 2021-06-06T03:31:38.730

Edit: I have changed this answer significantly because my original answer which was accepted was for a slightly different question and, not to mention, incorrect. I have (hopefully) fixed all errors.

Unfortunately, a negative binomial distribution is unlikely to be helpful here because it doesn’t account for the number of times “in a row” that an event occurs, only the total number of times that it does. To take that into account, we treat this experiment as an absorbing Markov chain with transition matrix $$\begin{pmatrix} 5/6 & 1/6 & 0 & 0 & 0 \\ 5/6 & 0 & 1/6 & 0 & 0 \\ 5/6 & 0 & 0 & 1/6 & 0 \\ 5/6 & 0 & 0 & 0 & 1/6 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$ The $(i,j)$-entry of this matrix is the probability that the next trial puts us in state $q_{j - 1}$ given that the current trial is in state $q_{i - 1}$, where $$q_k = \begin{cases} \text{“the current streak of $1$’s is $k$”}, & 0 \leq k \leq 3 \\ \text{“the current streak of $1$’s is $4$, or the experiment has ended”}, & k = 4 \end{cases}.$$ State $q_4$ records that we have hit a streak of four $1$’s in a row, at which point the experiment stops; in other words, $q_4$ is our absorbing state. The other entries are intuitive: at any step we roll a $1$ with probability $1/6$, which increases our streak by $1$ and brings us to the next state, or we roll something other than a $1$ with probability $5/6$, which resets our streak to $0$. Once we reach $q_4$ for the first time, the experiment is over and the chain remains in state $q_4$ with probability $1$.
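For readers who want to check the matrix, here is a minimal Python sketch (not part of the original answer) that builds it with exact fractions. `transition_matrix` is a hypothetical helper name; Python indices start at 0, so `P[i][j]` is the answer's $(i+1, j+1)$-entry, i.e. the probability of moving from $q_i$ to $q_j$.

```python
from fractions import Fraction

def transition_matrix(streak=4, p=Fraction(1, 6)):
    """Transition matrix over states q_0, ..., q_streak (q_streak is absorbing).

    Entry [i][j] is the probability of moving from q_i to q_j in one roll.
    """
    q = 1 - p
    size = streak + 1
    P = [[Fraction(0)] * size for _ in range(size)]
    for k in range(streak):
        P[k][0] = q          # a non-1 resets the streak to q_0
        P[k][k + 1] = p      # a 1 extends the streak from q_k to q_{k+1}
    P[streak][streak] = Fraction(1)  # once in q_streak, stay there
    return P

P = transition_matrix()
for row in P:
    print(" ".join(str(x) for x in row))
```

Every row sums to $1$, and only the last row has a $1$ on the diagonal, which is what makes $q_4$ absorbing.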

It’s a standard result in probability theory that $\operatorname{E}(X) = \sum_{n\geq 0} P(X > n)$ and $\operatorname{E}(X^2) = \sum_{n\geq 0}(2n + 1) P(X > n)$ for any random variable $X$ taking values in the nonnegative integers.
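As a quick sanity check of these two tail-sum identities, the following sketch (our own illustration, with a hypothetical helper `tail_sum_moments`) applies them to a single fair die roll, where $\operatorname{E}(X) = 7/2$ and $\operatorname{E}(X^2) = 91/6$ can also be computed directly.

```python
from fractions import Fraction

def tail_sum_moments(pmf):
    """Return (sum_n P(X > n), sum_n (2n + 1) P(X > n)) for a finite pmf.

    `pmf` maps nonnegative integers to probabilities; P(X > n) = 0 once
    n >= max(pmf), so both sums are finite.
    """
    top = max(pmf)

    def survival(n):  # P(X > n)
        return sum((w for k, w in pmf.items() if k > n), Fraction(0))

    first = sum((survival(n) for n in range(top)), Fraction(0))
    second = sum(((2 * n + 1) * survival(n) for n in range(top)), Fraction(0))
    return first, second

# A single fair die roll: X uniform on {1, ..., 6}
die = {k: Fraction(1, 6) for k in range(1, 7)}
m1, m2 = tail_sum_moments(die)
direct1 = sum(k * w for k, w in die.items())      # E(X) computed directly
direct2 = sum(k * k * w for k, w in die.items())  # E(X^2) computed directly
print(m1, direct1)  # both 7/2
print(m2, direct2)  # both 91/6
```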

One fact about absorbing Markov chains is that $$ P(X > n) = \mathbf{p} A^n \mathbf{1} $$ where $X$ is the number of steps until an absorbing state is reached, $\mathbf{p}$ is the initial-state probability row vector with the absorbing states removed, $A$ is the transition matrix with the rows and columns of the absorbing states removed, and $\mathbf{1}$ is the column vector of appropriate length whose entries are all $1$. In our case, $X$ is the number of trials until we hit four consecutive $1$’s (state $q_4$), $\mathbf{p} = (1, 0, 0, 0)$ because our initial state is $q_0$ (the streak is $0$ before the first trial), and $A$ is our transition matrix without its last row and column.
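A minimal sketch of $P(X > n) = \mathbf{p} A^n \mathbf{1}$, assuming we simply propagate the row vector $\mathbf{p} A^n$ one step at a time. In the code, `p` is the probability of rolling a $1$ (not the row vector $\mathbf{p}$), and `survival_probs` is a hypothetical helper name. Truncating the tail sums at a large $n$ gives numerical estimates of $\operatorname{E}(X)$ and $\operatorname{E}(X^2)$.

```python
from fractions import Fraction

def survival_probs(p, n_max):
    """Return [P(X > 0), ..., P(X > n_max)] using P(X > n) = p A^n 1.

    `p` may be a Fraction (exact) or a float (fast, approximate).
    """
    q = 1 - p
    A = [                      # transient block: states q_0, q_1, q_2, q_3
        [q, p, 0, 0],
        [q, 0, p, 0],
        [q, 0, 0, p],
        [q, 0, 0, 0],
    ]
    row = [1, 0, 0, 0]         # the row vector p A^n, starting from p = (1, 0, 0, 0)
    probs = []
    for _ in range(n_max + 1):
        probs.append(sum(row))  # multiplying by the all-ones vector sums the row
        row = [sum(row[i] * A[i][j] for i in range(4)) for j in range(4)]
    return probs

exact = survival_probs(Fraction(1, 6), 5)
print(exact[4])                 # 1295/1296: only 1111 ends the game by roll 4

approx = survival_probs(1 / 6, 40_000)   # the remaining tail is negligible
mean_est = sum(approx)
second_est = sum((2 * n + 1) * s for n, s in enumerate(approx))
print(mean_est, second_est)     # close to 1554 and 4819566
```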

Hence, $$\begin{align*} \operatorname{E}(X) &= \mathbf{p} \sum_{n\geq 0} A^n \mathbf{1} \\ &= \mathbf{p} (I - A)^{-1} \mathbf{1} \\ &= \mathbf{p} \begin{pmatrix}6^4 & 6^3 & 6^2 & 6\\ 6^4 - 6 & 6^3 & 6^2 & 6\\ 6^4 - 6^2 & 6^3 - 6 & 6^2 & 6\\ 6^4 - 6^3 & 6^3 - 6^2 & 6^2 - 6 & 6 \end{pmatrix} \mathbf{1}\\ &= \begin{pmatrix} 6^4 & 6^3 & 6^2 & 6\end{pmatrix}\mathbf{1}\\ &= 6^4 + 6^3 + 6^2 + 6 \\ &= 1554, \end{align*}$$ and $$\begin{align*} \operatorname{Var} X &= \operatorname{E}(X^2) - \operatorname{E}(X)^2\\ &= \mathbf{p} \sum_{n\geq 0} (2n + 1) A^n \mathbf{1} - 1554^2 \\ &= \mathbf{p} \sum_{n\geq 0} ((2n + 2)A^n - A^n)\mathbf{1} - 1554^2 \\ &= 2\mathbf{p} \sum_{n\geq 0} (n + 1)A^n \mathbf{1} - \underbrace{\mathbf{p} \sum_{n\geq 0} A^n \mathbf{1}}_{=\operatorname{E}(X)} - 1554^2\\ &= 2\mathbf{p}(I - A)^{-2}\mathbf{1} - 1554 - 1554^2\\ &= (2 \cdot 6^8 + 4 \cdot 6^7 + 6 \cdot 6^6 + 8 \cdot 6^5) - 1554 - 1554^2\\ &= 2404650. \end{align*} $$

I have left out the calculation for the entries of $(I - A)^{-2}$ because it is quite tedious and doesn’t look as nice as $(I - A)^{-1}$ does, but any program (or human) that can do matrix operations will confirm the result. Note that I’ve used the fact that $\sum_{n\geq 0}A^n = (I - A)^{-1}$ and $\sum_{n\geq 0} (n + 1)A^n = (I - A)^{-2}$ for any matrix, $A$, whose spectral radius is less than $1$.
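Since the entries of $(I - A)^{-2}$ are omitted, here is a minimal sketch that lets a program do the tedious part, assuming exact Gauss-Jordan elimination over fractions (the helpers `inverse` and `matmul` are our own, not from any library). It reproduces the displayed $(I - A)^{-1}$, the value $2\mathbf{p}(I - A)^{-2}\mathbf{1}$, and the variance.

```python
from fractions import Fraction

def inverse(M):
    """Invert a square matrix exactly with Gauss-Jordan elimination over Fractions."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(X, Y):
    """Plain matrix product of two lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

p, q = Fraction(1, 6), Fraction(5, 6)
A = [[q, p, 0, 0], [q, 0, p, 0], [q, 0, 0, p], [q, 0, 0, 0]]   # transient block
I_minus_A = [[int(i == j) - A[i][j] for j in range(4)] for i in range(4)]

N = inverse(I_minus_A)          # (I - A)^{-1} = sum_n A^n
N2 = matmul(N, N)               # (I - A)^{-2} = sum_n (n + 1) A^n
mean = sum(N[0])                # p (I - A)^{-1} 1 with p = (1, 0, 0, 0)
e_x_x_plus_1 = 2 * sum(N2[0])   # 2 p (I - A)^{-2} 1 = E(X^2) + E(X)
variance = e_x_x_plus_1 - mean - mean ** 2

print([[int(x) for x in row] for row in N])
print(mean, e_x_x_plus_1, variance)
```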

Momo · Answer 2 · 2021-06-06T18:52:57.180

There is an easier and more elementary way to calculate the expectation and variance, by conditioning on the first occurrence of a non-1 face.
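The conditioning argument for the mean can be sketched as follows (our reading of it, with a hypothetical helper `expected_rolls`): if the first non-1 appears at roll $n \le 4$, which happens with probability $\frac{5}{6}\left(\frac{1}{6}\right)^{n-1}$, we have spent $n$ rolls and start over; otherwise the first four rolls are all $1$'s and we stop. This gives a linear equation in $\operatorname{E}(X)$.

```python
from fractions import Fraction

def expected_rolls(run, p):
    """Solve E = sum_{n=1}^{run} (n + E) q p^(n-1) + run * p^run for E."""
    q = 1 - p
    restart = [q * p ** (n - 1) for n in range(1, run + 1)]  # first non-1 at roll n
    constant = sum(n * w for n, w in enumerate(restart, start=1)) + run * p ** run
    coefficient = 1 - sum(restart)       # equals p**run
    return constant / coefficient

print(expected_rolls(4, Fraction(1, 6)))   # 1554
```

With `run=1` this reduces to the geometric mean $1/p = 6$.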

After a thorough search, I found that "drhab" already did the calculations for the general case, so all you have to do is plug in $N=4$ and $p=\frac{1}{6}$ in his second answer here: variance of the number of coin toss to get N heads in row.
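For reference, here is a sketch using the standard closed forms for the waiting time until $N$ consecutive successes with success probability $p$ and $q = 1 - p$: $$\operatorname{E}(X) = \frac{1 - p^N}{q p^N}, \qquad \operatorname{Var}(X) = \frac{1 - (2N+1) q p^N - p^{2N+1}}{q^2 p^{2N}}.$$ We have not checked that these are written in the same form as in drhab's answer, only that they reproduce the numbers in this thread (and the known values $6$ and $22$ for two heads in a row with a fair coin).

```python
from fractions import Fraction

def run_mean(N, p):
    """Mean number of trials until N consecutive successes (success prob p)."""
    q = 1 - p
    return (1 - p ** N) / (q * p ** N)

def run_variance(N, p):
    """Variance of the number of trials until N consecutive successes."""
    q = 1 - p
    return (1 - (2 * N + 1) * q * p ** N - p ** (2 * N + 1)) / (q ** 2 * p ** (2 * N))

p = Fraction(1, 6)
print(run_mean(4, p), run_variance(4, p))   # 1554 2404650
```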

Yet another way is to calculate the generating function as in robjohn's answer here: Expected Number of Coin Tosses to Get Five Consecutive Heads, and use its derivatives to calculate the desired mean and variance.
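A minimal sketch of the generating-function route, assuming the standard probability generating function for the waiting time until $N$ consecutive successes, $$G(s) = \frac{(ps)^N (1 - ps)}{1 - s + q p^N s^{N+1}},$$ which we take to be equivalent to the one in the linked answer (not verified here). Writing $G = P/Q$, differentiating $GQ = P$ twice gives $G'(1)$ and $G''(1)$, and then $\operatorname{Var}(X) = G''(1) + G'(1) - G'(1)^2$. The polynomial helpers are our own.

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate sum_k coeffs[k] * x**k."""
    return sum((c * x ** k for k, c in enumerate(coeffs)), Fraction(0))

def poly_deriv(coeffs):
    """Coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def pgf_mean_variance(run, p):
    """Return (G(1), G'(1), Var X) for G(s) = P(s) / Q(s) with N = run."""
    q = 1 - p
    P = [Fraction(0)] * (run + 2)           # P(s) = p^N s^N - p^(N+1) s^(N+1)
    P[run], P[run + 1] = p ** run, -p ** (run + 1)
    Q = [Fraction(0)] * (run + 2)           # Q(s) = 1 - s + q p^N s^(N+1)
    Q[0], Q[1], Q[run + 1] = Fraction(1), Fraction(-1), q * p ** run
    dP, dQ = poly_deriv(P), poly_deriv(Q)
    d2P, d2Q = poly_deriv(dP), poly_deriv(dQ)
    P0, P1, P2 = (poly_eval(f, 1) for f in (P, dP, d2P))
    Q0, Q1, Q2 = (poly_eval(f, 1) for f in (Q, dQ, d2Q))
    g0 = P0 / Q0                             # G(1), must equal 1
    g1 = (P1 - g0 * Q1) / Q0                 # G'(1)  = E(X)
    g2 = (P2 - 2 * g1 * Q1 - g0 * Q2) / Q0   # G''(1) = E(X(X - 1))
    return g0, g1, g2 + g1 - g1 ** 2

print(pgf_mean_variance(4, Fraction(1, 6)))
```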

Yet another way (perhaps more intuitive) is to condition on the number of non-one rolls. If we denote by $T$ the occurrence of a non-1 roll, then we can divide each sample into a (random) number of runs:

$$\underbrace{1\ldots 1T}_{\text{Run } \#1}\ \underbrace{1\ldots 1T}_{\text{Run } \#2}\ \cdots\ \underbrace{1\ldots 1T}_{\text{Run } \#N-1}\ \underbrace{1111}_{\text{Run } \#N}$$

Here the last run consists of four ones in a row, and each of the other $N-1$ runs is one of $T$, $1T$, $11T$, or $111T$ (note that $N$ is also a random variable).
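To see this decomposition in action, here is a small Monte Carlo sketch (our own; `simulate` and the seed are arbitrary choices) that rolls a simulated die, closes a run at every non-1, and counts the final $1111$ as the last run. With 1000 trials the averages should land near $1554$ rolls and $6^4 = 1296$ runs, up to sampling noise of a few dozen.

```python
import random

def simulate(trials, p=1/6, run=4, seed=2021):
    """Roll until `run` consecutive 1s; return the average number of rolls
    and the average number of runs (the final 11...1 run counts as one)."""
    rng = random.Random(seed)
    total_rolls = total_runs = 0
    for _ in range(trials):
        rolls = runs = streak = 0
        while True:
            rolls += 1
            if rng.random() < p:      # rolled a 1: extend the current run
                streak += 1
                if streak == run:     # the final run 11...1 ends the experiment
                    runs += 1
                    break
            else:                     # rolled a non-1 (T): it closes this run
                runs += 1
                streak = 0
        total_rolls += rolls
        total_runs += runs
    return total_rolls / trials, total_runs / trials

mean_rolls, mean_runs = simulate(1000)
print(mean_rolls, mean_runs)
```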

If we denote by $X_i$ the number of rolls of Run $\#i$, we can write: $$X=X_1+X_2+\ldots+X_{N-1}+X_N$$

Here $X_N=4$, and for $i<N$ the $X_i$ are i.i.d. random variables distributed as:

$$X_i=\begin{cases} 1 & \text{ with probability }\frac{5\cdot 6^3}{6^4-1} \\ 2 & \text{ with probability }\frac{5\cdot 6^2}{6^4-1} \\ 3 & \text{ with probability }\frac{5\cdot 6}{6^4-1} \\ 4 & \text{ with probability }\frac{5}{6^4-1} \end{cases}$$

Therefore $E[X_i]=\frac{1550}{6^4-1}$, $\operatorname{Var}(X_i)=\frac{381750}{(6^4-1)^2}$, and $$X=X_1+X_2+\ldots+X_{N-1}+4,$$ where $N\sim\operatorname{Geometric}\left(\frac{1}{6^4}\right)$ (the number of runs, counting the final one) is independent of the i.i.d. sequence $X_1, X_2, \ldots$, so $E[N]=6^4$ and $\operatorname{Var}(N)=6^4(6^4-1)$.
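These values can be checked with exact fractions. A minimal sketch, assuming (as the run picture suggests) that a non-final run $1\ldots1T$ of length $k$ has unconditional probability $\left(\frac16\right)^{k-1}\frac56$ and that we condition on the run not being the final $1111$; each run is the final one with probability $r = 6^{-4}$, so $N$ is geometric on $\{1, 2, \ldots\}$ with mean $1/r$ and variance $(1-r)/r^2$.

```python
from fractions import Fraction

p, q, run = Fraction(1, 6), Fraction(5, 6), 4

# Non-final run 1...1T of length k = 1, ..., run, conditioned on not being 11...1
raw = {k: p ** (k - 1) * q for k in range(1, run + 1)}
total = sum(raw.values())                    # = 1 - p**run
pmf = {k: w / total for k, w in raw.items()}

mean_Xi = sum(k * w for k, w in pmf.items())
var_Xi = sum(k * k * w for k, w in pmf.items()) - mean_Xi ** 2

# N counts runs including the final one: Geometric(r) on {1, 2, ...}
r = p ** run
mean_N, var_N = 1 / r, (1 - r) / r ** 2
print(mean_Xi, var_Xi, mean_N, var_N)
```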

By conditioning on $N$:

$$E[X]=E[E[X|N]]=E[(N-1)E[X_1]+4]=(E[N]-1)E[X_1]+4=1550+4=1554$$

And:

$$\operatorname{Var}(X)=\operatorname{Var}(E[X|N])+E[\operatorname{Var}(X|N)]=E[X_1]^2\cdot\operatorname{Var}(N)+\operatorname{Var}(X_1)\cdot(E[N]-1)=2404650$$
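The final arithmetic can be reproduced exactly from the moments quoted in this answer; the sketch below simply plugs them into $E[X] = (E[N]-1)E[X_1] + 4$ and the law of total variance.

```python
from fractions import Fraction

# Moments quoted in the answer above
mean_Xi = Fraction(1550, 6 ** 4 - 1)
var_Xi = Fraction(381750, (6 ** 4 - 1) ** 2)
mean_N = 6 ** 4
var_N = 6 ** 4 * (6 ** 4 - 1)

# X = X_1 + ... + X_{N-1} + 4, with N independent of the i.i.d. X_i
mean_X = (mean_N - 1) * mean_Xi + 4
# Law of total variance: Var(E[X|N]) + E[Var(X|N)]
var_X = mean_Xi ** 2 * var_N + var_Xi * (mean_N - 1)
print(mean_X, var_X)   # 1554 2404650
```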

I knew the easier method for calculating $\operatorname{E}(X)$ by conditioning, but I assumed it would be difficult to do the same for $\operatorname{E}(X^2)$, so instead I went for a Markov approach to calculate each moment. Using drhab’s method and solving $\operatorname{E}(X^2) = \sum_{n = 1}^4 \left(\operatorname{E}(X^2) + 2 (1554) n + n^2\right) (5/6) 1/6^{n - 1} + 4^2/6^4$ for $\operatorname{E}(X^2)$ does give the correct answer of $\operatorname{E}(X^2) = 4819566$. Amazing shortcut! — shoteyes, Jun 06 '21 at 06:04
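The equation in this comment is linear in $\operatorname{E}(X^2)$, so it can be solved exactly. A minimal sketch that also recomputes $\operatorname{E}(X)$ from the analogous first-moment equation, so the block does not rely on the hard-coded $1554$:

```python
from fractions import Fraction

p, q, run = Fraction(1, 6), Fraction(5, 6), 4
restart = {n: q * p ** (n - 1) for n in range(1, run + 1)}  # first non-1 at roll n
stop = p ** run                                              # first 4 rolls are all 1s
keep = 1 - sum(restart.values())                             # equals p**run

# First moment:  E = sum_n (n + E) w_n + 4 * stop
E = (sum(n * w for n, w in restart.items()) + run * stop) / keep
# Second moment: S = sum_n (S + 2 E n + n^2) w_n + 4^2 * stop
S = (sum((2 * E * n + n * n) * w for n, w in restart.items()) + run ** 2 * stop) / keep
print(E, S, S - E ** 2)   # 1554 4819566 2404650
```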

