A friend asked me this probability problem but I could not provide an accurate answer.

Suppose you are given a randomly generated N-ary string. Depending on the string, certain digits/characters may form runs (maximal blocks of the same character repeated consecutively). For instance, in the string "2971300770900058444481622", the longest run of 0s has length 3.

Given an alphabet of N digits/characters, a string of length L whose characters are chosen independently and uniformly, and one fixed character/digit (it does not matter which one, since each is equally likely), what is the expected length of the longest run of that character?

Please give an explicit and exact formula if possible. If it is not possible, explain why.

As an example, for a randomly generated decimal string of length 10000 (N=10, L=10000), the expected length of the longest run of zeros is ~3.6.
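For anyone who wants to check a figure like this numerically, here is a minimal Python sketch. It is not a closed form, and it rests on assumptions that are mine rather than part of the problem statement: digits are independent and uniform, so the chosen digit appears with probability $p = 1/N$; the expectation is evaluated through the tail-sum identity $E[X] = \sum_{m \ge 1} P(X \ge m)$; and $a_n$, the probability that $n$ digits contain no run of $m$ or more copies of the chosen digit, is computed with the recurrence $a_n = a_{n-1} - (1-p)\,p^m\,a_{n-m-1}$ for $n > m$, with $a_n = 1$ for $n < m$ and $a_m = 1 - p^m$. The function names and the `tol` cutoff are hypothetical, and the result is a floating-point approximation.

```python
def prob_no_run(L, m, p):
    """P(no run of >= m consecutive target digits among L i.i.d. digits)."""
    if m > L:
        return 1.0
    q = 1.0 - p
    pm = p ** m
    a = [1.0] * (L + 1)      # a[n] = 1 for every n < m
    a[m] = 1.0 - pm          # only the all-target string of length m fails
    for n in range(m + 1, L + 1):
        # The first run of length m ends exactly at position n when the first
        # n-m-1 digits are run-free, digit n-m is not the target, and the
        # last m digits are all the target.
        a[n] = a[n - 1] - q * pm * a[n - m - 1]
    return a[L]


def expected_longest_run(N, L, tol=1e-12):
    """Approximate E[longest run] via the tail sum, truncated once terms < tol."""
    p = 1.0 / N              # assumption: uniform digits
    total = 0.0
    for m in range(1, L + 1):
        tail = 1.0 - prob_no_run(L, m, p)   # P(longest run >= m)
        total += tail
        if tail < tol:       # remaining terms are negligible in floating point
            break
    return total


if __name__ == "__main__":
    print(expected_longest_run(10, 10000))
```

As a sanity check, for N=2 and L=3 the eight binary strings can be enumerated by hand: the longest runs of 0 sum to 11, so the expectation is 11/8, which is what the sketch returns. Each value of $m$ costs $O(L)$ operations, and for N=10, L=10000 the tail terms fall below the cutoff after fewer than twenty values of $m$.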

  • The expected length of the longest run is $\sum_{m=1}^L P(\ell_L \ge m)$, where $\ell_L$ is the length of the longest run in a string of length $L$ and the formula for $P(\ell_L\ge m)$ comes from this question, using $p=1/N$. – Mike Earnest Dec 02 '21 at 00:08
  • It can be shown that the length of the longest run grows like $\log_N L$ as $L\to \infty$. See https://www.sciencedirect.com/science/article/pii/0001870885900039 for a citation; it's equation $(1)$ in the introduction. – Mike Earnest Dec 02 '21 at 00:15
