
CONTEXT: I was wondering: if I play $100$ games, how likely is it that I will have a stretch of $30$ games in which I think I'm good because I have a $67\%$ win rate, when I actually have a $50\%$ win rate (and am just getting lucky)?

I've calculated that the probability of an individual stretch of $30$ coinflips having at least $20$ heads is roughly $5\%$ ($0.04937$).

My initial guess was that you can fit $71$ stretches of $30$ inside $100$ coinflips (one starting at each of flips $1$ through $71$), so you could do $1-0.95^{71} \approx 97\%$. But this is wrong because overlapping stretches of $30$ have linked probabilities (e.g. if coinflips $1$-$30$ have $30\%$ heads, it is impossible for coinflips $5$-$34$ to have $67\%$ heads).

If anyone is aware of a general formula for this (for this example $N=100$, $n=30$, $x=20$, $p_i=0.5$), that would be great. Thanks.

Tianlalu
  • This is going to be quite tricky to solve in closed form. It is not hard to write a Monte Carlo simulation to get a numerical answer. Even finding the probability of a streak of 20 heads (i.e. a stretch of 20 coinflips with 20 heads) is already quite difficult. See e.g. https://math.stackexchange.com/questions/602123/what-are-the-odds-of-getting-heads-7-times-in-a-row-in-40-tries-of-flipping-a-co – zoidberg Dec 06 '18 at 05:42
  • @norfair I don’t quite understand the complexity. Naively, I want to say “Well, $\binom{30}{20}$ counts all the ways to get $20$ heads in a string of $30$. And there are $70$ ways of embedding a string of length $30$ in a string of length $100$. So the answer is $70\binom{30}{20}/2^{100}$.” However, this is clearly far from correct. – Santana Afton Dec 06 '18 at 13:25
  • @SantanaAfton, your intuition has led you close to computing the expected number of stretches rather than the probability, and the expectation is indeed much simpler to compute. You can put an indicator function on each of the (I think it's 71) substrings of length 30 in a string of length 100 that detects whether 20 heads appeared. Then the expectation is $71\binom{30}{20}/2^{30}$. – zoidberg Dec 06 '18 at 17:03
  • The point is that expectation is always linear, so it is relatively easy to compute when you can write it as a sum of indicators. When the indicator events interact in complicated ways (neither independent nor mutually exclusive), the probability can be hard to find. Here, the overlaps between the substrings are what cause difficulties. You can model all these dependencies via a finite-state Markov chain, but naively this would have about $\binom{30}{20}$ states, which is close to useless. – zoidberg Dec 06 '18 at 17:08
  • Actually, $71\binom{30}{20}/2^{30}$ is the expected number of stretches with exactly 20 heads. Of course you need to sum from 20 to 30 to get "at least." – zoidberg Dec 06 '18 at 17:13
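The expectation described in these comments can be written out explicitly. This is only a sketch; the symbols $I_j$ (indicator that the window starting at flip $j$ has at least $20$ heads), $S_j$ (the number of heads in that window) and $W$ (the number of qualifying windows) are notation introduced here for illustration and do not appear in the comments:

$$E[W]=\sum_{j=1}^{71}E[I_j]=71\,P(S_j\ge 20)=71\sum_{k=20}^{30}\binom{30}{k}\frac{1}{2^{30}}\approx 71\times 0.04937\approx 3.505.$$

This is an expected count, not a probability: it exceeds $1$, so it cannot itself be the chance of seeing at least one such stretch.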

1 Answer


As norfair says, this is not difficult to simulate. For example, in R with $100000$ simulated sequences:

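library(matrixStats)  # provides rowCumsums()
set.seed(2018)
games  <- 100     # flips per simulated sequence
run    <- 30      # length of each stretch
target <- 20      # heads needed within a stretch
prob   <- 1/2     # probability of heads
cases  <- 100000  # number of simulated sequences

# one row per simulated sequence, one column per flip
matdat <- matrix(rbinom(games*cases, 1, prob), ncol=games)
matdatcum <- rowCumsums(matdat)
# heads in each of the games-run+1 stretches, as differences of cumulative sums
matdatdiff <- matdatcum[, run:games] - cbind(0, matdatcum[, 1:(games-run)])
atleasttarget <- rowSums(matdatdiff >= target)  # stretches hitting target, per sequence

mean(atleasttarget)       # average number of runs hitting target
mean(atleasttarget >= 1)  # proportion of sequences hitting target at least once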

This simulation suggests about $3.49$ as the average number of runs hitting the target (compare this with the theoretical value $0.04937 \times 71 \approx 3.505$) and, to answer your actual question, about $0.377$ for the proportion of times it happens at least once (the final digit may be off by $1$ or $2$).
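The figures quoted here can be cross-checked in base R without simulating. The sketch below is illustrative only: `p_window`, `n_windows`, `expected_hits`, `p_hat` and `se` are names made up for this example, and the standard error uses the usual binomial approximation, assuming the $100000$ simulated sequences are independent.

```r
# Illustrative cross-check; these variable names are assumptions, not part of the simulation.
games  <- 100     # total coinflips
run    <- 30      # length of each stretch
target <- 20      # heads needed within a stretch
prob   <- 1/2     # probability of heads
cases  <- 100000  # number of simulated sequences

# Exact P(at least `target` heads in `run` flips): upper binomial tail, P(X > target - 1)
p_window <- pbinom(target - 1, run, prob, lower.tail = FALSE)

# Expected number of qualifying stretches, by linearity of expectation
n_windows <- games - run + 1
expected_hits <- n_windows * p_window

# Approximate Monte Carlo standard error of a proportion estimated from `cases` sequences
p_hat <- 0.377    # simulated proportion quoted in the text
se <- sqrt(p_hat * (1 - p_hat) / cases)

print(p_window)
print(expected_hits)
print(se)
```

The standard error comes out at roughly $0.0015$, which is in line with the caveat that the last digit of $0.377$ is uncertain.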

Henry