Do runs of every length occur in this string?

Question

In reference to the strings defined here (constructed by repeatedly appending the last "half" of the current string), consider the particular infinite string $s$ generated by starting with $\text{abc}$:

$$\begin{align} \quad &\text{abc}\\ &\text{abcbc}\\ &\text{abcbccbc}\\ &\text{abcbccbcccbc}\\ &\cdots\\ &\text{______________________________}\\ s = \ &\text{abcbccbcccbcbcccbccbcbcccbccccb...} \end{align} $$

More formally, the general rewriting rule is $$a_0 a_1 \cdots a_{n-1} \ \ \to \ \ a_0 a_1 \cdots a_{n-1} a_{\left\lfloor\frac{n}{2}\right\rfloor } a_{\left\lfloor\frac{n}{2}\right\rfloor+1} \cdots a_{n-1}. $$ Clearly,

$\text{a}$ occurs only in the initial position,
$\text{b}^k$ occurs infinitely often for $k=1$, but never occurs for $k\ge2$,

and one may conjecture that

$\text{c}^k$ occurs for every $k\ge 1$ (and hence, infinitely often for each $k\ge 1$).

How to prove or disprove the conjecture?
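The rewriting rule and the two "clearly" claims above can be checked on a finite prefix with a minimal sketch like the following. The helper names `rewrite` and `iterate` are hypothetical (not from the question), and a finite check of this kind is only a sanity test, not a proof of anything about the infinite string.

```javascript
// One rewriting step: append the "last half" a_{floor(n/2)} ... a_{n-1}.
function rewrite(s) {
  return s + s.substring(Math.floor(s.length / 2));
}

// Apply the rule `steps` times, starting from the seed "abc".
function iterate(steps) {
  let s = 'abc';
  for (let i = 0; i < steps; i++) s = rewrite(s);
  return s;
}

// First intermediate strings: abc, abcbc, abcbccbc, abcbccbcccbc
for (let n = 0; n <= 3; n++) console.log(iterate(n));

const prefix = iterate(25);
console.log(prefix.indexOf('a', 1)); // -1: no "a" after position 0
console.log(prefix.includes('bb'));  // false: the run "bb" does not occur
```

Both checks hold for every prefix because the appended half never starts at position 0 and every intermediate string ends in $\text{c}$, so neither an $\text{a}$ nor a new $\text{bb}$ can be created at a join.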

Some possibly-relevant facts:

Computations show that the index of the first occurrence of $\text{c}^k$ is as follows, for some small $k$: $$\begin{align} &\text{substring} \quad & \text{index}\\ &\text{c} &\text{2}\\ &\text{cc} &\text{4}\\ &\text{ccc} &\text{7}\\ &\text{cccc} &\text{26}\\ &\text{ccccc} &\text{27308}\\ &\text{cccccc} &\approx 10^{519}\\ &\text{ccccccc} &? \ (\gt 10^{40677})\\ \end{align} $$ (The exact index for the first occurrence of $\text{c}^6$ is a $520$-digit number, and $\text{c}^7$ does not occur in the first $10^{40677}$ terms of $s$.)
Let $L_n$ be the length of the $n$th intermediate string in the generating process illustrated above. Then $$L_{n+1} = L_n + \left\lfloor\frac{L_n + 1}{2}\right\rfloor,\ \ L_0 = 3.$$ Hence, $L_n$ grows exponentially: $$L_n \gtrsim 3(\frac{3}{2})^n.$$
Let $s_n$ be the $n$th intermediate string, and let $t_n$ be the $n$th appended string (so $s_n = s_{n-1} t_n$). Now, every intermediate string ends with $\text{bc}$, so $\text{c}^k$ would first occur only when some $t_n$ begins with $\text{c}^{k-1}$, in which case $s_n= s_{n-1} t_n$ will contain the first occurrence of $\text{c}^k$ beginning at index $L_{n-1}-1$.
After $\text{c}^k$ first occurs, the number of instances of $\text{c}^k$ grows approximately exponentially in the number of iterations, as does the length, and the ratio $$p_k = \frac{\text{number of instances of c}^k\text{ on the }n\text{th iteration}}{L_n} \approx \frac{1}{\text{index of the first occurrence of c}^k} $$ is approximately a constant independent of $n$. Since $\text{c}^{k+1}$ first occurs when one of these instances of $\text{c}^k$ happens to begin the "last half" of an intermediate string, this may be compared to a sequence of Bernoulli trials, each with success probability $p_k$. For such a process, the expected number of trials to get the first success is just $1/p_k$, so the index of the first occurrence of $\text{c}^{k+1}$ would be compared to $$\frac{2}{3} L_{n_k + 1/p_k} \approx 2(\frac{3}{2})^{n_k + i_k} $$ where $n_k$ is the number of iterations to get the first occurrence of $\text{c}^k$, and $i_k$ is the corresponding index. E.g., the first-occurrence index of $\text{c}^6$ would be compared to $2(\frac{3}{2})^{n_5 + i_5} = 2(\frac{3}{2})^{22 + 27308} \approx 10^{4813}$ (when in fact it is approximately $10^{519}$). Similarly, the first-occurrence index for $\text{c}^7$ (if it exists) would be compared to $2(\frac{3}{2})^{n_6 + i_6} \approx 10^{10^{518}}$. These comparisons are quite poor, but may help to understand how the first-occurrence indices can be so enormous.
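The length recurrence and the heuristic estimate above can be reproduced numerically. This is a minimal sketch under the question's own conventions (0-based indices, $L_0 = 3$); the function name `lengths` and the variable names are assumptions made for illustration. It checks that each tabulated first-occurrence index plus one is some $L_m$, as the join argument predicts, and evaluates $\log_{10}$ of $2(\frac{3}{2})^{n_5 + i_5}$.

```javascript
// Lengths L_0 = 3, L_{n+1} = L_n + floor((L_n + 1) / 2).
function lengths(count) {
  const L = [3];
  while (L.length < count) {
    const last = L[L.length - 1];
    L.push(last + Math.floor((last + 1) / 2));
  }
  return L;
}

const L = lengths(30);

// First-occurrence indices of c^1 .. c^5, taken from the table in the question.
const firstIndex = [2, 4, 7, 26, 27308];
for (const idx of firstIndex) {
  // A first occurrence starts at L_{n-1} - 1, so idx + 1 should equal some L_m.
  console.log(idx, L.indexOf(idx + 1)); // m = 0, 1, 2, 5, 22
}

// Heuristic estimate for c^6: log10 of 2 * (3/2)^(n_5 + i_5), n_5 = 22, i_5 = 27308.
const log10Estimate = Math.log10(2) + (22 + 27308) * Math.log10(1.5);
console.log(log10Estimate.toFixed(1)); // 4812.9, i.e. roughly 10^4813
```

In particular $L_{22} = 27309$, which matches the first $\text{c}^5$ at index $27308 = L_{22} - 1$.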

Your conjecture can be proved using a string-based computer program. However, I am interested in seeing an approach from high-level mathematicians on this question :-) — MonK, Jul 24 '14 at 14:26
Since $a$ never propagates, I imagine there's a way to state rewrite rule without it. (In which case, my point in your previous question regarding doing $abc$ v. $01$ as digits is quite superfluous.) — Semiclassical, Jul 24 '14 at 14:58
@Semiclassical -- The initial $\text{a}$ just acts as a "placeholder". The same sequence (without the initial $\text{a}$) is generated by starting with $\text{bc}$ and using the modified rewriting rule $a_1 \cdots a_{n} \ \ \to \ \ a_1 \cdots a_{n} a_{\left\lfloor\frac{n+1}{2}\right\rfloor } a_{\left\lfloor\frac{n+1}{2}\right\rfloor+1} \cdots a_{n}$, but I don't see how this helps. (BTW, your reference to "my" previous question suggests that you've mistaken me for the poster of the linked question.) — r.e.s., Jul 24 '14 at 23:20
Ack. Sorry for the mistaken attribution. And I don't mean that it makes things easier to prove results, just that it is superfluous either way — Semiclassical, Jul 24 '14 at 23:25
@Sid -- I would be interested in any algorithm/program capable of proving the conjecture. (I'm skeptical that any such is available, however.) — r.e.s., Jul 24 '14 at 23:28
@r.e.s. It's no proof but I've posted the problem of finding higher indices at http://codegolf.stackexchange.com/questions/35209/where-are-the-runs-in-this-infinite-string. cccccc has not been found. — Calvin's Hobbies, Jul 27 '14 at 15:32
@Calvin'sHobbies - Thanks for the link -- the program posted there succeeded in finding the first occurrence of $\text{c}^6$! (I edited it into the table above.) — r.e.s., Jul 28 '14 at 02:30
The size of those numbers is startling. I doubt very much that anyone will be able to track down the first occurrence of $c^7$! — Semiclassical, Jul 28 '14 at 03:21
That lower bound on $c^7$ is kind of stupefying. I very much wish we could give an argument as to why it's so big, or some kind of heuristic for estimating it. (For instance, I note that in the computational thread you suggested that $c^6$ should occur at least 125 times before $c^7$ can appear. Can you spell out that heuristic a bit?) — Semiclassical, Jul 30 '14 at 16:08
@Semiclassical - After $c^k$ first occurs, more instances of it arise in two ways: (1) as copies of the original and other old copies, and (2) as new cases when an "appended half" again happens to begin with $c^{k-1}$. Now consider only the type-2 occurrences. Computations show that there is 1 occurrence of $c^2$ before $c^3$ occurs, 1 of $c^3$ before $c^4$ occurs, 4 of $c^4$ before $c^5$ occurs, 115 of $c^5$ before $c^6$ occurs, ... So far, it's a non-decreasing sequence (1, 1, 4, 115, ...). If it remains so, we should expect at least 115 $c^6$s before $c^7$ occurs. (I mistyped 115 as 125.) — r.e.s., Jul 31 '14 at 05:23
Nice analysis. That suggests another problem that could be asked as a new question: What is the distribution of $c^k$'s at the $N$th index? Given the Bernoulli trials model above, one should be able to provide a statistical prediction. Numerically, of course, that's quite computationally intensive, and probably better suited for another Code Golf question (interested, @Calvin'sHobbies?) — Semiclassical, Jul 31 '14 at 14:53
@Semiclassical I might get around to it (or you are welcome to ask). I'm just as stupefied as you to how big those indices are. — Calvin's Hobbies, Jul 31 '14 at 16:13
@user1708 I've posted the same question on MO (hoping that it's still on topic over there): http://mathoverflow.net/questions/177996/do-runs-of-every-length-occur-in-this-string — Calvin's Hobbies, Aug 07 '14 at 07:37

score 1 · Answer 1 · edited May 23 '17 at 12:39

Okay, to the OP of this question: since you requested a computer program, here is one.

Below is a string-based program written in JavaScript that calculates the index of $cccc$ in your string. See for yourself that it does construct your string.

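// Build the string by repeatedly appending its last "half".
let input = 'abc';
const m = 20; // number of rewrites
for (let i = 0; i < m; i++) {
  const l = input.length;
  const n = Math.floor(l / 2); // start index of the appended half
  input += input.substring(n, l);
}
// 0-based index of the first occurrence of "cccc" (-1 if absent)
const index = input.indexOf('cccc');
alert(index);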

For $m=20$ it constructs a string where $cccc$ is at index $26$, $ccc$ is at $7$, and $cc$ is at $4$ (change the search string to look for each one).

For $m=30$ it constructs a string where $ccccc$ is at index $27308$ (which is consistent with your conjecture).

For $m=40$ it constructs a string in which $cccccc$ does not occur, so the search returns $-1$ to denote that nothing was found.

Interestingly, when I increase $m$ to $45$, it exceeds the available computational resources and returns nothing in the browser (I am using Chrome on 64-bit Windows 7). This is because the browser storage for strings, as per this answer, is about 5 MB, which is a string of 2,621,440 characters.

You may, however, succeed by rewriting this code in Java, because it supports a maximum string length of $2^{31}-1$ (just over 2 billion characters), as per this answer.

But for that you need at least $1024$ MB of heap size. I firmly believe your conjecture can be proved :).

Good luck!

PS: I did not succeed in recreating this in Java, since I am not an expert. However, I will return to repost the solution in Java.
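As a side note on the memory limits discussed above: a single character of the string can be read without building the string at all, by mapping an index in an appended half back to its source position, repeatedly, until it lands in the seed. The sketch below is one way to do that under the rule stated in the question (0-based indices, seed `abc`); the function name `charAtIndex` is hypothetical, and this is not claimed to be how any other posted program works.

```javascript
// Look up s[i] without constructing s, using BigInt so i may be very large.
// Assumption: 0-based indices, seed "abc", rewriting rule as in the question.
function charAtIndex(i) {
  let idx = BigInt(i);
  // Lengths L_0 = 3, L_{n+1} = L_n + ceil(L_n / 2), up to the first one exceeding idx.
  const lengths = [3n];
  while (lengths[lengths.length - 1] <= idx) {
    const L = lengths[lengths.length - 1];
    lengths.push(L + (L + 1n) / 2n); // BigInt division truncates toward zero
  }
  // Walk back through the rewrites: a position idx >= L lies in the half that was
  // appended to a string of length L, and was copied from idx - L + floor(L / 2).
  for (let n = lengths.length - 2; n >= 0; n--) {
    const L = lengths[n];
    if (idx >= L) idx = idx - L + L / 2n;
  }
  return 'abc'[Number(idx)]; // idx is now 0, 1 or 2
}

console.log([26, 27, 28, 29].map((i) => charAtIndex(i)).join('')); // cccc
console.log(charAtIndex(10n ** 100n)); // one character, far beyond any stored string
```

This only reads individual positions. Locating the first occurrence of a long run still requires a search strategy, so it does not by itself settle the conjecture.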

answered Jul 25 '14 at 10:00 by MonK

Your program does *not* prove that ($\forall k, \text{c}^k \text{ occurs in }s$), because it must be used to test every $k=1,2,3,...$ (infinitely many) -- a procedure that never halts. (Of course I used a similar program to compute the short table of first-occurrence indices posted in the question. On my system, using Sage, 47 rewrites was the maximum without memory errors.) – r.e.s. Jul 25 '14 at 13:09
By switching to a system with more memory (again using Sage), I was able to extend the search to $55$ rewrites, with the result that $\text{c}^6$ does not occur in the first $17,673,600,662$ ($17^+$ billion) terms of $s$. I've updated the table. – r.e.s. Jul 26 '14 at 13:29
Holy smokes! $2.124\times10^{511}$ on python! – MonK Jul 28 '14 at 09:43
And it renders the notion of locating $c^7$ computationally to be quite ludicrous. Heuristic arguments might get an estimate, though. – Semiclassical Jul 28 '14 at 12:15
Oops, that was a typo on my part (now fixed): the index for $\text{c}^6$ is actually $\approx 2.124\times 10^{519}$ (a $520$-digit number!). Of course, that program is doing a smarter search, and the number of positions it has to check is only a very very tiny fraction of $10^{519}$. – r.e.s. Jul 28 '14 at 12:21
