7

Let $X_n$ be the number of longest runs in a sequence of $n$ i.i.d. Bernoulli(1/2) random variables. Here, "run" means a nonempty block of adjacent $1$s; e.g., in the sequence $110110001$, there are two longest runs (each being $11$).

By using a program from OEIS, the limiting distribution of $X_n$, as $n\to\infty,$ is "empirically" suggested to be $$p_k:=\lim_{n\to\infty}P[X_n=k]={1\over k\ 2^k\ \log(2)}\ \ (k=1,2,3,...)$$

Question: As this is probably a known result, can someone cite a published proof? (Or provide any information on how to prove it.)

"Derivation":
As described elsewhere, for $n\ge 1$, there is a bijection between (a) the compositions of $n$ with exactly $k$ largest parts, and (b) the binary strings of length $n-1$ with exactly $k$ longest runs of $1$s, such that these are equal in number. This number is listed as $T(n,k)$ at OEIS A238341, where a program for $T(n,k)$ is also given. (I've posted a separate question asking for explanation/sources of the algorithm implemented by this program.)

NB: Letting $A(n,k)$ be the number of length-$n$ binary strings with $k$ longest runs, and $T(n,k)$ be the number of compositions of $n$ with exactly $k$ largest parts, there is an "edge effect" due to which these are related as follows:

$$A(n,k)=\begin{cases} 1&\text{if $k=0$}\\[1ex] T(n+1,k) &\text{otherwise} \end{cases}$$

and it is to be noted that in fact $A(n,k)=0$ if $k\gt\lceil{n\over 2}\rceil$.

Thus, we have $$P[X_n=k]={A(n,k)\over 2^n} \ \ (k=0,1,2,3,...)$$

Using the OEIS program gives $P[X_n=1]\approx 0.7213$ for $n\gt 100,$ which led me to guess that $p_1={1\over 2 \log 2}$. I then let $c_k:= p_k\ \log 2$, which implies $\sum_{k=1}^\infty c_k=\log 2;$ i.e., a series converging to $\log 2.$ One of the best-known such series is $c_k={1\over k\ 2^k},$ which is strongly supported by computing $(P[X_n=k] \log 2)_{n\gt 100}\approx(1/2,1/8,1/24,1/64, 1/160,...)=({1\over k\ 2^k})_{k=1,2,...}.$


A Python/SageMath rendition of the OEIS program is online at SageMathCell. It takes about a minute there to plot $(P[X_n=k])_{n=1..45}$ for $k=0,1,2,3$.


Aside: (Sorry for the length, but I thought someone might be interested ...)

@kodlu's comment has led me to show that the limit distribution of the number of longest runs in a random binary sequence of length $n\to\infty$ is the same distribution for all of the following different definitions of "run":

  • D1: "run" means a nonempty block of adjacent $1$s.
  • D2: "run" means a nonempty block of adjacent $1$s, or a nonempty block of adjacent $0$s.
  • D3: "run" means a nonempty block of adjacent $1$s, flanked on each side either by a $0$ or by the start/end of the sequence.
  • D4: "run" means a nonempty block of adjacent $1$s, flanked on each side either by a $0$ or by the start/end of the sequence, or a nonempty block of adjacent $0$s, flanked on each side either by a $1$ or by the start/end of the sequence.

Proof:

Clearly, D1 and D3 give the same distribution for all $n$ because the longest run is the same by either definition (hence the limit distributions are the same). Similarly, D2 and D4 give the same distribution for all $n$ (though different from those for D1 and D3).

Therefore, all that remains to show is that the limit distribution for D1 is the same as that for D2. To do so, let $A(n,k)$ (resp. $B(n,k)$) denote the number of length-$n$ binary strings with exactly $k$ longest runs, according to definition D1 (resp. D2), and let $X_n(s)$ (resp. $Y_n(s)$) denote the number of longest runs in sequence $s$, according to definition D1 (D2, resp.). We will show that $$B(n,k)=2\ A(n-1,k) \ \ (0\lt k\lt n)$$ from which it follows that for $0\lt k\lt n$, $$P[Y_n=k] = {B(n,k)\over 2^n}={A(n-1,k)\over 2^{n-1}}=P[X_{n-1}=k] $$ and hence $$\lim_{n\to\infty}P[Y_n=k]=\lim_{n\to\infty}P[X_n=k]\ \ (k=1,2,3,\dots).$$

(We can dispense with $k=0$, because $P[Y_n=0]=0$ and $P[X_n=0]=2^{-n}\to 0.$)

Thus proceeding, with $s[1]$ denoting the first element of $s$, we have, for $0\lt k\lt n$: $$\begin{align}B(n,k) &\overset{1}{=}|\{s\in\{0,1\}^n:\ Y_n(s)=k\}|\\ &\overset{2}{=}2\ |\{s\in\{0,1\}^n:\ Y_n(s)=k,\ s[1]=0\}|\\ &\overset{3}{=}2\ |\{s\in\{0,1\}^{n-1}:\ X_{n-1}(s)=k\}\\ &\overset{4}{=}2\ A(n-1,k) \end{align}$$

Eq.1 implies Eq.2 because there is a bijection between the sequences beginning with $0$ and those beginning with $1$, pairing each $s$ to its complement $\bar s$ (i.e. with all bits complemented), such that $Y_n(s)=Y_n(\bar s).$ (Any run in $s$ exactly matches a run of the complementary symbol in $\bar s.$ )

To show that Eq.2 implies Eq.3, denote the corresponding sets (suppressing $k$) as $$T_n:=\{s\in\{0,1\}^n:\ Y_n(s)=k,\ s[1]=0\}\\[1ex] S_{n-1}:=\{s\in\{0,1\}^{n-1}:\ X_{n-1}(s)=k\}$$

Now, any element of $T_n$ can be written, with unique positive integers $(n_1,...,n_j)$ summing to $n$, in the usual "run-length encoding" form $$t = \ 0^{n_1}1^{n_2}0^{n_3}1^{n_4}\dots b^{n_j}, \ \ b=(j+1)\bmod 2$$

and any element of $S_{n-1}$ can be written, with unique positive integers $(n_1,...,n_j)$ summing to $n$, in the special form $$s = \ 1^{n_1-1}01^{n_2-1}01^{n_3-1}\dots01^{n_j-1}$$

thus setting up the bijections

$$\underbrace{0^{n_1}1^{n_2}0^{n_3}1^{n_4}\dots b^{n_j}}_{\in T_n}\leftrightarrow(n_1,\dots,n_j)\leftrightarrow\underbrace{1^{n_1-1}01^{n_2-1}01^{n_3-1}\dots01^{n_j-1}}_{\in S_{n-1}}$$

Now, $Y_n(0^{n_1}1^{n_2}0^{n_3}1^{n_4}\dots b^{n_j})=X_{n-1}(1^{n_1-1}01^{n_2-1}01^{n_3-1}\dots01^{n_j-1})=$ the multiplicity of $\max(n_1,...,n_j)$, the sole exception being when $n_1=\dots=n_j=1$ (which occurs iff $k=0$ or $k=n$; i.e. when $Y_n(0101...)=n\ne 0=X_{n-1}(000...)).$

Consequently, $|T_n| = |S_{n-1}|$ for $0\lt k\lt n$. QED

r.e.s.
  • 14,371
  • To understand your definition, had you defined runs of 0's would the example have 2 runs of length 2? i.e., Do you allow overlaps or runs have to be delimited by the other symbol or be at the beginning or end of the sequence? – kodlu Jan 16 '24 at 17:02
  • 1
    @kodlu It won't matter, since we're only looking at longest runs. In 110110001, the longest run of 0s is 000, whether or not overlap is allowed. – r.e.s. Jan 16 '24 at 17:09
  • Your conjecture appears in "Brands, J. J. A. M., Steutel, F. W., & Wilms, R. J. G. (1994). On the number of maxima in a discrete sample. Statistics & Probability Letters, 20(3), 209–217. doi:10.1016/0167-7152(94)90044-2 " , formula 3.6 – leonbloy Jan 17 '24 at 13:42
  • @leonbloy There is a non-paywalled 1992 Memorandum by the same authors, with the same title (but no "formula 3.6"). Would you mind checking whether these are substantially the same? – r.e.s. Jan 18 '24 at 06:13

1 Answers1

1

Not really an answer, rather a reference.

Let it be clear that we are considering the longest runs of ones in $n$ fair coins. Using this approach, we can assume the runlengths correspond to iid geometric ($p=1/2$) variables. Because the mean of each one is $2$, we will have approximately near $n/2$ such variables, of which half are ones.

Then we can consider $t=n/4$ iid geometric variables $Y_j \in \{1,2 \cdots\}$ with $p=1/2$.

Letting $m=\max_{1 \le j \le t}(Y_j)$ be its maximum, we are interested in the distribution of the random variable $Z= \sum_{j} [Y_j = m]$ as $t\to \infty$.

(BTW: this formulation implies that, if the distribution exists, it does not matter if we consider the runs of ones, zeros, or both - and that the border effects are negligible).

This is quite a tough problem. The reference paper seems to be Brands, J. J. A. M.; Steutel, F. W.; Wilms, R. J. G., On the number of maxima in a discrete sample, Stat. Probab. Lett. 20, No. 3, 209-217 (1994). ZBL0802.60048. You already found a non-paywalled preprint from 1992 here.

There it's proved that, for our case, there is not a limit distribution of $Z$, there is an oscillation that does not fully vanish. But, numerically, the oscillation is small, so the distribution "almost" converges to a logarithmic distribution:

$$ P(Z=k) \approx \frac{1}{k \, 2^k \log 2}$$

as you conjectured. This corresponds to formula $3.6$ in the original paper, $3.3$ in the 1992 preprint.

leonbloy
  • 63,430
  • Very interesting! I'm still "digesting" the relevant parts of the preprint, but because of the geometric r.v. heuristics, it seems unclear that $Z_t$ and my $X_n$ necessarily have the same asymptotic behavior. E.g., the OEIS entry for "the number of compositions of n+1 with unique largest part" (which could of course be wrong) states that $a(n)\sim 2^{n-1}/\log(2)$, implying $P[X_n=1]={a(n)\over 2^n}\to{1\over 2\log(2)}.$ – r.e.s. Jan 18 '24 at 15:15
  • Well, that agrees with $P(Z=1)$ (there was an error in my formula) – leonbloy Jan 18 '24 at 15:31
  • I'm referring to whether the limit exists or not. OEIS implies it does, under the usual meaning of "$\sim$". (BTW, I had posted a plot of $(P[X_n=1])_{n=10..100}$ elsewhere and mentioned the curious oscillations.) – r.e.s. Jan 18 '24 at 16:03
  • Interesting. I cannot say. It would seem strange that going from the iid geometric setting to the fixed length $n$ original setting makes the oscillation disappear (asymptotically), but ... I cannot say. – leonbloy Jan 18 '24 at 16:42
  • It seems clear that the reference paper does not strictly apply here. The reason is that it treats the multiplicity of $\max(N_1,\dots,N_n)$ where $n$ is not random, whereas my $X_n$ is the multiplicity of $\max(N_1,\dots,N_R)$, where $N_i$ is the length of the $i$th success run and $R$ is the random number of success runs in $n$ Bernoulli trials. – r.e.s. Jan 18 '24 at 18:15
  • Yes, but that should be asympotically equivalent. If we produce $t=n/2$ geometric rv with $p/2$, then sum $\sum Y_i$ will not be exactly $n$ but it concentrate around that value. This is a standard well known tool, see eg https://maa.org/sites/default/files/pdf/upload_library/22/Polya/07468342.di020742.02p0021g.pdf I concede that here the asymptotics are tricky, but I would doubt that this changes the result. – leonbloy Jan 18 '24 at 18:45
  • FWIW, to my earlier post I've added a new plot of the $P[X_n=1]$ sequence, pushing $n$ up to $60000$. It is indeed consistent with the "almost convergence" described in the reference paper, agreeing with the oscillations having a limiting amplitude of $0.0000073...$ according to the paper. – r.e.s. Jan 31 '24 at 22:00