
I know that there are already numerous questions that address this problem. However, I am not interested in a solution at all but in an explanation of a particular solution (see https://math.stackexchange.com/a/972946/579544). Unfortunately, the author didn't respond to my comment, so I hope it's OK if I open a new question.


It is reasonably clear that the required expectation exists. Let us call it $a$. Let $b$ be the expected number of additional rolls we need, given that we have not yet met our goal, but have just tossed a $6$. If the first roll is not a $6$, then we have used $1$ roll, and our conditional expectation, given this happened, is $1+a$. If the first roll is a $6$, then we have used a roll, and the conditional expectation is $1+b$. It follows that $$a=\frac{5}{6}(1+a)+\frac{1}{6}(1+b).\tag{1}$$

Suppose now that we have just rolled a $6$, and have not yet met our goal. With probability $\frac{1}{6}$, we roll a $6$. We have used $1$ roll, and the game is over. With probability $\frac{5}{6}$, we roll a non-$6$, we have used $1$ toss, and the conditional expectation is $1+a$. It follows that $$b=\frac{1}{6}(1)+\frac{5}{6}(1+a).\tag{2}$$


I encountered this question in an introductory stochastics course where we are only equipped with the definitions of expected value and conditional probability. Intuitively, the recursive formulas are clear to me, but I am wondering how to derive them rigorously using only the means at hand.

I guess I have to start with something like $$\mathbb{E}(X)=\sum\limits_{k\in X(\Omega)} k\cdot P(\{X=k\}),$$ where $X$ is a random variable that counts the tosses until two consecutive $6$'s have appeared, and then do some magic manipulations to arrive at a recursive relation...
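(As a sanity check, a quick Monte Carlo simulation gives an estimate of about $42$. This is only a rough sketch; the helper name and the number of trials are arbitrary choices.)

```python
import random

# Estimate E[X], where X is the number of rolls of a fair die
# until two consecutive 6's appear.
def rolls_until_double_six() -> int:
    count, prev_was_six = 0, False
    while True:
        count += 1
        if random.randint(1, 6) == 6:
            if prev_was_six:
                return count      # second 6 in a row: stop
            prev_was_six = True
        else:
            prev_was_six = False

trials = 200_000
print(sum(rolls_until_double_six() for _ in range(trials)) / trials)  # ~ 42
```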

Philipp
    What isn't clear in the given explanation? You have two active states, let $E_1$ be the expected number needed given that the prior throw was a $6$, and let $E_0$ be the expected number if the prior throw wasn't a $6$ (so the answer is $E_0$). Then $E_1=\frac 16\times 1 +\frac 56\times (E_0+1)$ and $E_0=\frac 16\times (E_1+1)+\frac 56\times (E_0+1)$. Now just solve. – lulu Feb 03 '23 at 20:42
    @lulu, what exactly do you mean by "active state"? You are just rephrasing the above answer without explaining why the equations should hold. Let me be more explicit: If $a$ is the expected value of a random variable, then it is not clear what the random variable is, what values it can attain and why its expected value satisfies the recursive relation. – Philipp Feb 03 '23 at 21:59
  • I didn't write a recursive formula. Just consider two separate problems. One in which you start with a "prior $6$", that's $E_1$, and one in which you don't, that's $E_0$. Both problems make sense on their own, and they clearly are related in the manner I describe. True, as the author you quote mentions, you have to separately argue that both expectations exist, but that's rather obvious in this case. – lulu Feb 03 '23 at 22:02
  • Perhaps you'd find it easier to just look at what might happen on the first two trials. Note that either you win or the game restarts. That does give you a recursion, sort of, and you get $E=E_0=\frac 1{36}\times 2+\frac 56\times (E_0+1)+\frac 16\times \frac 56\times (E_0+2)$. Again, you still need to argue existence separately. – lulu Feb 03 '23 at 22:04
    @lulu, don't get me wrong, intuitively I perfectly understand the recursion, and it makes sense to me to say something like "with a probability of $\frac{5}{36}$ you restart the game..." and this produces the recursive structure. But still, this is not a rigorous argument to justify that the expected value can be calculated by the above stated formulas. You need something like $\mathbb{E}(X)=\sum\limits_{k\in X(\Omega)} k\cdot P(\{X=k\})$, and then you realize that the sum/series contains something that you can exploit to create the recursion. – Philipp Feb 03 '23 at 22:20
  • No, you don't. But you are free to try to construct an argument along those lines. You'll find, I think, that you repeat the same logic when you try to compute $P(X=k)$. – lulu Feb 03 '23 at 22:23
  • Look up Markov chains. That's what this is an instance of. They are a very powerful collection of techniques, especially useful when, as here, the paths to success might be very long and messy, but those paths keep returning to familiar positions (what I called "states" earlier). – lulu Feb 03 '23 at 22:24
  • @lulu, ah ok, this is what I suspected at first, that it would be easier to approach the problem from a different perspective such as Markov chains. But I still don't get why, without any other theory, we can simply say that the two equations are a valid way to express the expected value. – Philipp Feb 03 '23 at 22:30
  • Do you understand the second, more recursive, expression I wrote down? That's simply based on the fact that the trials must start out in one of three ways: $X$, $6X$, $66$, where $X$ denotes any outcome other than $6$. The probabilities are $\frac 56$, $\frac 5{36}$, $\frac 1{36}$ respectively. Depending on which branch you take, you can calculate the expectation as I wrote. – lulu Feb 03 '23 at 22:33
  • To do the same thing with your sum, note that $P(X=k)=\frac 56\times P(X=k-1)+\frac 5{36}\times P(X=k-2)$, assuming that $k>2$. Note that you still have to worry about existence. That's not a huge issue here, but in other problems, it can be a problem. – lulu Feb 03 '23 at 22:34
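The recurrence in the last comment is easy to check numerically. Here is a minimal sketch (the cutoff at $k=2000$ is an arbitrary but safe choice, since $P(X=k)$ decays geometrically); it recovers the expected value of $42$:

```python
# Compute E[X] = sum_k k * P(X=k) from the recurrence
# P(X=k) = (5/6) P(X=k-1) + (5/36) P(X=k-2) for k > 2,
# with base cases P(X=1) = 0 and P(X=2) = 1/36.
p_km2, p_km1 = 0.0, 1 / 36       # P(X=1), P(X=2)
expectation = 2 * (1 / 36)       # contribution of the k = 2 term
for k in range(3, 2000):
    p_k = (5 / 6) * p_km1 + (5 / 36) * p_km2
    expectation += k * p_k
    p_km2, p_km1 = p_km1, p_k
print(expectation)               # prints ~ 42.0
```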

1 Answer


You would use the concept of conditional expectation to derive the formulas. Conditional expectation is defined analogously to ordinary expectation, except with conditional probabilities in place of unconditional ones. If $X$ is a discrete random variable and $A$ an event, then the conditional expectation of $X$ given $A$, denoted $\mathrm{E}[X|A]$, is defined as:

$$\mathrm{E}[X|A] = \sum_{x} x P(X=x|A)$$

Note that this could just as well have been regarded as the expectation of some new random variable $Z$ whose distribution is the conditional distribution of $X$ given $A$, since conditional probabilities are just probabilities, so all of the useful properties of expectations like linearity of expectation still hold. Note that if $X$ is independent of $A$ then $\mathrm{E}[X|A]=\mathrm{E}[X]$ since $P(X=x|A)=P(X=x)$.
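For a concrete example (with $W$ a throwaway variable, just to fix ideas): let $W$ be the result of a single roll of a fair die and let $A$ be the event that the roll is even. Then $P(W=w|A)=\frac{1}{3}$ for $w\in\{2,4,6\}$ and $P(W=w|A)=0$ otherwise, so

$$\mathrm{E}[W|A] = 2\cdot\tfrac{1}{3}+4\cdot\tfrac{1}{3}+6\cdot\tfrac{1}{3} = 4.$$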

In our derivation we will also use the so-called law of total expectation. Let $\{A_i\}$ be a finite or countable collection of events that partitions the sample space. Then:

$$\mathrm{E}[X] =\sum_i \mathrm{E}[X|A_i]P(A_i)$$

To see that the above is true, use the definition of conditional expectation: \begin{align} \sum_i \mathrm{E}[X|A_i]P(A_i) &= \sum_i \left(\sum_x x P(X=x|A_i)\right)P(A_i)\\ &= \sum_x x \sum_i P(X=x|A_i)P(A_i)\\ &= \sum_x x P(X=x)\\ &= \mathrm{E}[X] \end{align}

where I have used the law of total probability, $P(X=x) = \sum_i P(X=x|A_i)P(A_i)$, in line 3.
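As a quick sanity check with the die example above: taking the partition $A_1=\{\text{roll is even}\}$, $A_2=\{\text{roll is odd}\}$,

$$\mathrm{E}[W] = \mathrm{E}[W|A_1]P(A_1)+\mathrm{E}[W|A_2]P(A_2) = 4\cdot\tfrac{1}{2}+3\cdot\tfrac{1}{2} = \tfrac{7}{2},$$

which is indeed the expectation of a single fair-die roll. Now with that background, we can see how to derive the system of equations in your question.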

Let $X$ be defined as in your question. Let $B$ be the event that the first die roll is a 6 and let $A$ be the event that the first die roll is not a 6; note that $\{A,B\}$ forms a partition of the sample space. By the law of total expectation: $$\mathrm{E}[X] = \mathrm{E}[X|B]P(B)+\mathrm{E}[X|A]P(A)$$ We know that $P(B) = \frac{1}{6}$ and that $P(A) = 1-P(B)= \frac{5}{6}$. I use $X|A$ to denote the random variable obtained from the conditional distribution of $X$ given $A$; both $X|A$ and $X|B$ are new random variables.

Since the die rolls are independent, the number of additional rolls $Z$ needed to roll 2 consecutive 6's after a failed first roll has, given event $A$, the same distribution as $X$. Therefore $X|A = 1+Z$ has the same distribution as $1+X$, and hence $\mathrm{E}[X|A] = \mathrm{E}[1+X] = 1+\mathrm{E}[X]$, by linearity of expectation.

The distribution of $X|B$ is trickier, however, and cannot be related to $X$ as simply as in the case of event $A$. We define a new random variable $Y$, the number of additional rolls needed to roll 2 consecutive 6's after the first roll, so that $X|B = 1+Y|B$. We have $\mathrm{E}[X|B] = \mathrm{E}[1+Y|B] = 1 + \mathrm{E}[Y|B]$. Let $a = \mathrm{E}[X]$ and let $b=\mathrm{E}[Y|B]$. The equation we derived now reads:

$$a= \frac{5}{6}(1+a)+\frac{1}{6}(1+b)$$

We are almost there! We just need an expression for $\mathrm{E}[Y|B]$ in terms of itself and $\mathrm{E}[X]$. Let $C$ be the event that the second die roll is a 6, and let $D$ be the event that the second die roll is not a 6; again, $\{C,D\}$ partitions the sample space. Again, using the law of total expectation (this time applied to the conditional expectation given $B$, which works the same way since conditional probabilities are just probabilities):

$$\mathrm{E}[Y|B] = \mathrm{E}[Y|C,B]P(C|B)+\mathrm{E}[Y|D,B]P(D|B)$$

Here $\mathrm{E}[Y|C,B]$ denotes the conditional expectation of $Y$ given that events $C$ and $B$ both occurred. Since the die rolls are independent, we have $P(C|B) = P(\text{rolling a 6}) = \frac{1}{6}$ and $P(D|B) = 1-P(C|B)= \frac{5}{6}$.

Now, $\mathrm{E}[Y|C,B] = \sum_y y P(Y=y|C,B)$, where I write $P(Y=y|C,B)$ for $P(Y=y|C \cap B)$. But if the first die roll is a 6 and the second die roll is also a 6, then $Y=1$. So, $\mathrm{E}[Y|C,B]=1$. Now, if events $D$ and $B$ both occur (the first roll is a 6 but the second is not), then after the second roll we are back in the situation we started in, needing to count the number of rolls until we roll 2 consecutive 6's. Therefore, the conditional distribution of $Y$ given that events $D$ and $B$ occurred is exactly the distribution of the random variable $1+X$ (can you see why? $Y$ counts the wasted second roll plus a fresh copy of $X$). Therefore, $\mathrm{E}[Y|D,B] = \mathrm{E}[1+X] = 1 + \mathrm{E}[X]$, and so we finally arrive at the second equation:

\begin{align} \mathrm{E}[Y|B] &= \mathrm{E}[Y|C,B] P(C|B) + \mathrm{E}[Y|D,B]P(D|B) \\ &= (1)\frac{1}{6}+\left(1+\mathrm{E}[X]\right)\frac{5}{6} \end{align}

Or, written in terms of our variables $a$ and $b$:

$$b = \frac{1}{6}+(1+a)\frac{5}{6}$$

And that is the long way to derive that system of equations.
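For completeness, solving the system is routine: the first equation gives $6a = 5(1+a)+(1+b)$, i.e. $a = 6+b$, and the second gives $b = 1+\frac{5}{6}a$. Substituting yields $a = 7+\frac{5}{6}a$, hence $a = 42$ and $b = 36$, matching the simulation estimate in your question.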

Isaac
  • I guess this answer is far less "elementary" than the original answer you cited, though. – Isaac Feb 03 '23 at 23:11
  • Great answer!! I am still going through it. I think there is a little typo (a forgotten $\mid B$): instead of "[...] which is the number of additional rolls needed to roll 2 consecutive 6's after the first roll, so $X|B=1+Y$", it should be "[...] so $X|B=1+Y|B$." – Philipp Feb 03 '23 at 23:56
  • Yes you're right. Glad you found it helpful! – Isaac Feb 04 '23 at 03:27
  • Just one more question: How do you exactly define the new random variables $X\mid A$ and $X\mid B$? It seems to me that they are defined on a subspace of the original sample space. But usually each random variable must be defined on the whole $\Omega$. Can you elaborate a bit on this issue? – Philipp Feb 05 '23 at 13:19
  • Their definition follows from the definition of conditional probability, i.e. $P(A|B) = \frac{P(A,B)}{P(B)}$, where $P(A,B)$ denotes $P(A\cap B)$. You might find this answer helpful for a more formal elaboration: https://math.stackexchange.com/questions/496608/formal-definition-of-conditional-probability – Isaac Feb 07 '23 at 23:32
  • Would you explain how you figured out that 1) $X|A = 1+Z$ has the same distribution as $1+X$, and 2) $Y|B$ doesn't share the same distribution? – revo Apr 10 '23 at 18:41