
Please bear with me; I'm asking this with only a very basic understanding of math.

During these hard times, I'm trying to acquire and hone new skills so that I can be a little more competitive when looking for a job after this whole situation dies down. I chose to learn Python because I've recently become interested in data-analysis jobs. One day I found a probability exercise: write code that returns the probability that a 6-streak (either heads or tails, HHHHHH or TTTTTT) turns up in a hundred coin tosses, with the experiment repeated one thousand times.

I wrote the code and got a 54.54% probability that a 6-streak turns up in a hundred coin tosses.
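The question does not show the code. What follows is only a hypothetical reconstruction of the experiment as described. It assumes that a "6-streak" means a run of at least 6 identical outcomes of either face: toss a fair coin 100 times, check for such a run, and repeat 1,000 times. The names `has_streak` and `estimate_streak_probability` are illustrative and do not come from the original post.

```python
import random


def has_streak(tosses, k=6):
    """Return True if `tosses` contains a run of at least k identical outcomes."""
    run = 0
    previous = None
    for toss in tosses:
        # Extend the current run if this toss repeats the previous one; otherwise restart at 1.
        run = run + 1 if toss == previous else 1
        previous = toss
        if run >= k:
            return True
    return False


def estimate_streak_probability(n_tosses=100, k=6, n_trials=1000, seed=None):
    """Monte Carlo estimate of P(at least one run of length >= k, heads or tails, in n_tosses)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        tosses = [rng.choice("HT") for _ in range(n_tosses)]
        if has_streak(tosses, k):
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # The estimate varies from seed to seed because only 1,000 experiments are run.
    print(f"{100 * estimate_streak_probability(seed=1):.2f} %")
```

With only 1,000 experiments, the standard error of such an estimate is about $\sqrt{p(1-p)/1000}$, which is roughly 1.3 to 1.6 percentage points for $p$ between 0.5 and 0.8. Two people running the same correct code can therefore differ by a few percentage points.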

My friends got different answers, and our having little to no background in math or coding only makes things worse. I tried to find an analytical answer by reading several similar problems. I expected enlightenment, but they only made me more confused, because I had assumed the analytical answer would take just a few simple steps (e.g., plugging constants into an equation, multiplying, or dividing).

My questions to you all are:

  1. Why can't I just use the permutations and combinations I learned in high school to solve the above problem? Also, is 54.54% the right answer?
  2. Many answers to those similar problems refer to Markov chains, which I'm not familiar with at all. Why are they used to solve this probability problem? Is a Markov chain a single equation that solves every coin-toss probability problem, or are there other, similar approaches?
  3. Suppose other similar questions ask for the probability that a 6-streak appears (a) at least 3 times, (b) exactly 3 times, and (c) no more than 3 times. How does the approach differ for each of these problems?
  4. I want to learn more about problems like this and about data-science topics in general (e.g., statistics, big O notation). Where should I start if I have no experience or knowledge in these topics? Is there a curriculum you can recommend?
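On questions 1 to 3: a plain permutation or combination count is awkward here, because whether a toss extends a streak depends on the tosses just before it. A standard workaround is to track only what matters for the future, namely the length of the current run, and to step through the tosses one at a time. That bookkeeping is a Markov chain.

Below is a minimal Python sketch that rests on two assumptions:
- a "streak" is a maximal run of at least k identical outcomes of either face;
- a run longer than k still counts as one streak.

The function name `streak_count_distribution` is hypothetical.

```python
from fractions import Fraction


def streak_count_distribution(n, k, max_count):
    """Exact distribution of the number of maximal runs of length >= k in n fair coin tosses.

    Assumptions (hypothetical helper, not from the original post):
    - a "streak" is a maximal run of at least k identical outcomes, heads or tails;
    - a run longer than k still counts as one streak.

    Returns dist with dist[j] = P(exactly j streaks) for j < max_count and
    dist[max_count] = P(at least max_count streaks).
    """
    if n < 1 or k < 2 or max_count < 1:
        raise ValueError("need n >= 1, k >= 2, max_count >= 1")
    half = Fraction(1, 2)
    # Markov chain state: (length of the current run, capped at k; streaks so far, capped).
    # After the first toss the current run has length 1 and no streak exists yet.
    states = {(1, 0): Fraction(1)}
    for _ in range(n - 1):
        nxt = {}
        for (run, count), prob in states.items():
            # Same face as the previous toss (probability 1/2): the run grows,
            # and a new streak is completed the moment the run first reaches k.
            new_count = count + 1 if run + 1 == k else count
            key = (min(run + 1, k), min(new_count, max_count))
            nxt[key] = nxt.get(key, 0) + prob * half
            # Other face (probability 1/2): a fresh run of length 1 starts.
            key = (1, count)
            nxt[key] = nxt.get(key, 0) + prob * half
        states = nxt
    dist = [Fraction(0)] * (max_count + 1)
    for (_, count), prob in states.items():
        dist[count] += prob
    return dist


if __name__ == "__main__":
    dist = streak_count_distribution(n=100, k=6, max_count=4)
    print("P(at least one 6-streak):", float(1 - dist[0]))
    print("(a) P(at least 3):", float(dist[3] + dist[4]))
    print("(b) P(exactly 3):", float(dist[3]))
    print("(c) P(no more than 3):", float(sum(dist[:4])))
```

The three variants in question 3 use the same chain and differ only in which entries of `dist` are summed. Under these assumptions, the probability of at least one 6-streak of either face in 100 tosses comes out to roughly 0.81. A chain that tracks runs of heads only gives roughly 0.55, close to the 54.54% in the question, so the two readings of "6-streak" are worth checking against each other.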

I'd be really happy with a detailed explanation, but written in an explain-like-I'm-five style. Lastly, sorry for the long rambling; any input is greatly appreciated. Thank you!

yfr

1 Answer


This is not the best forum for details of programming in a specific language. If you did a simulation in Python, you should not expect the simulation results to match exact results from combinatorics precisely; they agree only to within simulation error.

In R, the procedure rle (for Run Length Encoding) shows the number and lengths of runs in a sequence x of results. For example, here is one way to find the length of the longest run of either 0s (say Tails) or 1s (say Heads) among (a particular) 20 tosses of a fair coin. In this example the answer is six.

```r
set.seed(2020)  # for reproducibility
x = sample(c(0,1), 20, rep=T); x
[1] 1 0 1 0 0 0 0 0 0 1 1 1 1 0 0 1 1 1 1 0
rle(x)
Run Length Encoding
 lengths: int [1:8] 1 1 1 6 4 2 4 1
 values : num [1:8] 1 0 1 0 1 0 1 0
rle(x)$lengths
[1] 1 1 1 6 4 2 4 1
max(rle(x)$lengths)
[1] 6
```
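For readers following along in Python, the same run-length summary can be sketched with the standard library's `itertools.groupby`, which groups consecutive equal elements. This sketch reuses the 20 tosses printed in the R output rather than drawing new ones, since Python's random generator will not reproduce R's `set.seed(2020)` sequence. `run_lengths` is a hypothetical helper name.

```python
from itertools import groupby

# The 20 tosses shown in the R output (1 = Heads, 0 = Tails).
x = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0]


def run_lengths(seq):
    """Return (values, lengths) of the runs of consecutive equal elements, like R's rle()."""
    values, lengths = [], []
    for value, group in groupby(seq):
        values.append(value)
        lengths.append(sum(1 for _ in group))
    return values, lengths


values, lengths = run_lengths(x)
print(lengths)       # [1, 1, 1, 6, 4, 2, 4, 1]
print(values)        # [1, 0, 1, 0, 1, 0, 1, 0]
print(max(lengths))  # 6
```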

If I repeat this procedure many times, I can get a good idea of the distribution of the length ML of the longest run among 100 tosses, and thus of its mean. The answer is $6.976 \pm 0.011.$ With $m = 10^5$ iterations one can expect about two-place accuracy. (With $m = 10^6$ I got $6.978 \pm 0.004$.)
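The $\pm$ value above is a 95% margin of simulation error: about twice the standard error of the mean of $m$ simulated values. The worked check below (rounded) backs the sample standard deviation $s$ out of the margin reported for $m = 10^5$. It then reuses that $s$ for $m = 10^6$, assuming $s$ is about the same in both runs:

$$
\text{margin} \approx \frac{2s}{\sqrt{m}}, \qquad
s \approx \frac{0.01128\,\sqrt{10^5}}{2} \approx 1.78, \qquad
\frac{2 \times 1.78}{\sqrt{10^6}} \approx 0.0036 \approx 0.004.
$$

Because the margin shrinks like $1/\sqrt{m}$, ten times as many iterations buys only about $\sqrt{10} \approx 3.2$ times the precision.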

```r
set.seed(1120)
ML = replicate(10^5, max( rle( sample(c(0,1), 100, rep=T))$lengths ))
summary(ML)
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    3.000   6.000   7.000   6.976   8.000  22.000 
mean(ML)
[1] 6.97582          # E(ML)
2*sd(ML)/sqrt(10^5)
[1] 0.01128483       # 95% margin of simulation error
hist(ML, prob=T, br=(2:22)+.5, col="skyblue2", main="Sim Dist'n of ML")
```

[Figure: histogram of the simulated distribution of ML, titled "Sim Dist'n of ML"]

I suppose analogous procedures are available in Python. You seem to have good links for analytic solutions.
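As one possible Python analogue of the R `replicate(...)` line, here is a minimal sketch that uses only the standard library:
- `longest_run` and `simulate_longest_runs` are hypothetical names;
- `statistics.stdev` stands in for R's `sd`;
- the default seed mirrors `set.seed(1120)` in name only.

Results will not match the R numbers digit for digit, because the random streams differ.

```python
import math
import random
import statistics


def longest_run(tosses):
    """Length of the longest run of identical outcomes (like max(rle(x)$lengths) in R)."""
    if not tosses:
        return 0
    best = run = 1
    for prev, cur in zip(tosses, tosses[1:]):
        # Extend the run on a repeat, otherwise start a new run of length 1.
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def simulate_longest_runs(m=10**5, n=100, seed=1120):
    """Simulate m experiments of n fair tosses; return the longest run from each."""
    rng = random.Random(seed)
    return [longest_run([rng.randint(0, 1) for _ in range(n)]) for _ in range(m)]


if __name__ == "__main__":
    ml = simulate_longest_runs(m=10**4)
    mean = statistics.mean(ml)
    margin = 2 * statistics.stdev(ml) / math.sqrt(len(ml))  # 95% margin of simulation error
    print(f"mean longest run: {mean:.3f} +/- {margin:.3f}")
```

Pure Python is slower than R's vectorized `sample`, so the usage example runs $10^4$ rather than $10^5$ iterations; the margin grows accordingly.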

BruceET
  • Thank you so much for the response, @BruceET! I guess I didn't word my question properly (I'm not a native English speaker). My real question is: "What is the probability of at least one 6-streak (either HHHHHH or TTTTTT) appearing in 100 coin tosses? Assume the experiment is repeated ten thousand times." Anyway, I tweaked some lines in my code to match your answer, and I think my generated graph is pretty much the same as yours. – yfr Nov 22 '20 at 06:52