4

A drunkard stands at the origin. Each second he goes right with probability $p$, left with probability $q$, and remains still with probability $1-p-q$. When he arrives at $x=+D$ or $x=-D$, he pays $M$ dollars and is transported back to the origin.

Now, there is a new option: at each step, after he knows where he is going to go (left or right), the drunkard has an option to pay $m$ dollars and remain in its place. The drunkard's ultimate goal is to minimize his expeced payment over quadrillion seconds. How should the drunkard decide, in each second, whether to pay?

I think I can solve the special case in which $q=0$, i.e, the drunkard moves only rightwards. In this case, $D$ steps rightwards cost $M$ dollars, so a single step costs in average $M/D$ dollars. Therefore, the drunkard should never pay if $m>M/D$, and always pay if $m<M/D$ (there may be exceptions at the first few steps or at the quadrillionth step, but they are negligible).

However, I am not sure whether the same argument is true for arbitrary $q$.

So, when should the drunkard pay?

  • 1
    $p+q<1$? So once the drunkard decides to move there are only two outcomes, left and right, so how is $p+q\neq 1$? – Hamed Jul 06 '17 at 20:54
  • is it a prob. that he stand still? – serg_1 Jul 06 '17 at 20:57
  • Yes, with probability $1-p-q$ he does not move. I clarified it above. – Erel Segal-Halevi Jul 06 '17 at 21:00
  • 1
    I still don't understand. If the event "standing still" is probabilistic, and if the drunkard stands still pays $m$ dollars, then where is the decision? There is nothing to optimize! As I understood it at first, at each step the drunkard makes a decision to move or not to. If the answer is yes, then a coin is tossed with two outcomes. Maybe I don't understand the question well. – Hamed Jul 06 '17 at 21:05
  • Since the decision to pay $m$ dollars is made after the type of step (left or right) is known, I don't think there's much to be gained by allowing $1-p-q$ to be nonzero, unless you intend to replace "quadrillion" with $N$. – Barry Cipra Jul 06 '17 at 21:22
  • 1
    Oh, missed it, the decision is made after the random value is announced. So there are two types of "standing still". One in which the probabilistic outcome is not moving (with probability $1-p-q$). And one in which the drunkard intentionally decides to pay some money and not move after the random value is announced. – Hamed Jul 06 '17 at 21:23
  • @Hamed yes, exactly. – Erel Segal-Halevi Jul 07 '17 at 04:59

2 Answers2

1

Even for the case $q=0$ your argument does not work. Suppose $N$ is the number of seconds the drunkard has to walk (a constant). Let us call each time the drunkard goes all the way to $D$ a tour.

Suppose $k$ tours are completed in $N$ seconds, with respectively $\{n_1, \cdots, n_k\}$ extra times the drunkards has payed. Also suppose the last tour remains incomplete with $\nu$ payups and $\mu$ non-payups. Then the total money paid will be $$ P=Mk+\nu m+m\sum_{j=1}^{k} n_j $$ Now the $j$th tour takes $D+n_j$ seconds. And the last incomplete tour is $\nu+\mu$ seconds, with $\mu<D$ of course. So the above is conditioned on $$ N=Dk +\mu +\nu + \sum_{j=1}^{k} n_j $$ This is now truly a conditional optimization problem. Now suppose $m>M/D$, you suggest never paying a single penny. Your payout then will be $M\lfloor N/D\rfloor$. But if instead, in each tour, we pay exactly $n$ times, and not paying anything in the last incomplete tour, $$ N=(D+n)k+\mu\Longrightarrow k=\lfloor N/(D+n)\rfloor $$ The payup is then $P_n=M\lfloor N/(D+n)\rfloor+mn$. The payup you propose is the $n=0$ case. For example suppose $N=2D\ell$ for some large $\ell$ and $n=D$. Then $$ P_0=\ell M+\ell M, \qquad P_D=M\ell +Dm $$ Clearly if $\ell M>Dm$ then $P_D$ case is much better. This happens when $\ell>Dm/M$, or $N>2D^2m/M$. This actually tells you $m>M/D$ or not, never paying up is not a good idea, if $N$ is large enough.

Your estimate for $m<M/D$ is much better. Suppose one goes with the strategy of always paying. Then the payup is $P'=mN$. In the same example as before this is worse that $P_D$ unless the condition $(2mD-M)\ell<Dm$ is satisfied. Considering $\ell\ggg 1$, the condition is more or less limited to cases when $m<M/2D$, which is similar to your estimate in this example.

However suppose instead $N=sD\ell$ with $s$ a small integer and $n=(s-1)D$. Then a similar analysis shows that $$ P=M\ell+(s-1)Dm $$ Comparing with $P'=sD\ell M$ (always paying case), the condition for $P'$ being better than $P$ is now $m<M/sD$. I guess my point is, this is a very complicated problem even in case of $q=0$. The behavior is very sensitive to the values of $m,M,D,N$.

Hamed
  • 6,793
  • Thanks for the analysis! But I think there is a mistake in your expression for $P_n$ - it should be $(M+mn)k$. Then the following analysis is much simpler: $P_n = (M+mn)N/(D+n)$ (I am neglecting the final incomplete tour since it is negligible relative to the quadrillion rounds...). Taking the derivative of $P_n$ as a function of $n$ gives that the derivative is positive if $m > M/D$ and negative if $m < M/D$, so in the former case you want $n$ as small as possible (0) and in the latter case you want it as large as possible ($N$). – Erel Segal-Halevi Jul 07 '17 at 14:36
1

This is only a very partial answer, looking at the simplest nontrivial case, $D=2$.

Presumably the quantity of interest can be described as the average expenditure per round, where the process is allowed to continue indefinitely (which I take to be the essential meaning of "quadrillion"). With this in mind, we may as well assume that $p+q=1$. Allowing the possibility of randomly remaining stationary only means you're slowing the average expenditure per round down by a factor of $p+q$, whatever stategy is employed.

In general, since the option to remain stationary isn't decided until after the next putative step has been randomly chosen, there is never any need to do so unless you are on the $M$-dollar "precipice" at one of the two ends, in this case at $+1$ or $-1$. At any other point you may as well take the step, incurring no cost, in hopes of reversing it on the next round.

Let's denote by $N$ the average expenditure per step if you Never pay to remain stationary. Under this strategy, you always wind up back at the origin every other round, so it's only on the even numbered rounds, when you're at $\pm1$, that you risk having to pay $M$. The average expenditure is

$$N={p^2+q^2\over2}M$$

Next, let's denote by $B$ the average expenditure per round if you pay to remain stationary at Both $+1$ and $-1$ (if the alternative would take you to $+2$ or $-2$). This gives a Markov process on the states $\{-1,0,1$} with transition matrix

$$\pmatrix{q&q&0\\p&0&q\\0&p&p}$$

and dominant eigenstate

$${1\over1-pq}\pmatrix{q^2\\pq\\p^2}$$

(Note: $q^2+pq+p^2=1-pq$.) It follows that the average expenditure per round (in the limit) is

$$B={q^3+p^3\over1-pq}m$$

For example, if $p=q=1/2$, then

$$N={1\over4}M\quad\text{and}\quad B={1\over3}m$$

so the crossover occurs at $m={3\over4}M$. (Note, if $pq=0$, then $N=M/2$ and $B=m$, with a crossover at $m=M/2=M/D$, as the OP observed.)

There are, of course, two more stategies: pay to remain stationary at $-1$ but not $+1$, and vice versa. Let's denote by $L$ and $R$ the average expenditure per round for these two strategies. If I've done the algebra correctly, the results are

$$L={p^3\over1+p^2}M+{q^2\over1+p^2}m$$ and $$R={q^3\over1+q^2}M+{p^2\over1+q^2}m$$

For $p=q=1/2$ we get

$$L=R={1\over10}M+{1\over5}m$$

It's not hard to check that, for $p=q=1/2$, we have $600(N-L)(L-B)=(3M-4m)^2$, which means that the $L$ (and $R$) strategy always lies between the $N$ and $B$ strategies, with crossover at $m={3\over4}M$, so for the classic, symmetric drunkard's walk, one should be willing to pay to remain stationary either Never or a Both ends (which makes sense, by symmetry). For general $p+q=1$, however, there can be cases when $L$ or $R$ is optimal.

It might be worth looking carefully at the case $D=3$, but hopefully this gives some indication of how things will go.

Barry Cipra
  • 79,832