
In reference to the strings defined here (constructed by repeatedly appending the last "half" of the current string), consider the particular infinite string $s$ generated by starting with $\text{abc}$:

$$\begin{align} \quad &\text{abc}\\ &\text{abcbc}\\ &\text{abcbccbc}\\ &\text{abcbccbcccbc}\\ &\cdots\\ &\text{______________________________}\\ s = \ &\text{abcbccbcccbcbcccbccbcbcccbccccb...} \end{align} $$
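The construction illustrated above can be sketched in a few lines of JavaScript. This is a minimal sketch, assuming (as the displayed rows suggest) that the appended "half" of a string of length $n$ is the suffix starting at the 0-based index $\lfloor n/2\rfloor$. The names `rewrite`, `buildString`, and `firstRunIndex` are illustrative and do not come from the original post.

```javascript
// One rewriting step: append the suffix that starts at index floor(n/2).
function rewrite(str) {
  return str + str.substring(Math.floor(str.length / 2));
}

// Apply `steps` rewriting steps to the seed string.
function buildString(seed, steps) {
  let s = seed;
  for (let i = 0; i < steps; i++) {
    s = rewrite(s);
  }
  return s;
}

// 0-based index of the first run of k consecutive c's, or -1 if there is none.
function firstRunIndex(str, k) {
  return str.indexOf('c'.repeat(k));
}

const prefix = buildString('abc', 10);   // a 210-character prefix of s
console.log(prefix.slice(0, 12));        // "abcbccbcccbc"
console.log(firstRunIndex(prefix, 4));   // 26
```

Because each intermediate string is a prefix of the next, any index found this way is also the index in the infinite string $s$.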

More formally, the general rewriting rule is $$a_0 a_1 \cdots a_{n-1} \ \ \to \ \ a_0 a_1 \cdots a_{n-1} a_{\left\lfloor\frac{n}{2}\right\rfloor } a_{\left\lfloor\frac{n}{2}\right\rfloor+1} \cdots a_{n-1}. $$ Clearly,

  • $\text{a}$ occurs only in the initial position,
  • $\text{b}^k$ occurs infinitely often for $k=1$, but never occurs for $k\ge2$,

and one may conjecture that

  • $\text{c}^k$ occurs for every $k\ge 1$ (and hence, infinitely often for each $k\ge 1$).

How to prove or disprove the conjecture?


Some possibly-relevant facts:

  • Computations show that the (0-based) index of the first occurrence of $\text{c}^k$ is as follows, for some small $k$: $$\begin{align} &\text{substring} \quad & \text{index}\\ &\text{c} &\text{2}\\ &\text{cc} &\text{4}\\ &\text{ccc} &\text{7}\\ &\text{cccc} &\text{26}\\ &\text{ccccc} &\text{27308}\\ &\text{cccccc} &\approx 10^{519}\\ &\text{ccccccc} &? \ (\gt 10^{40677})\\ \end{align} $$ (The exact index for the first occurrence of $\text{c}^6$ is a $520$-digit number, and $\text{c}^7$ does not occur in the first $10^{40677}$ terms of $s$.)

  • Let $L_n$ be the length of the $n$th intermediate string in the generating process illustrated above. Then $$L_{n+1} = L_n + \left\lfloor\frac{L_n + 1}{2}\right\rfloor,\ \ L_0 = 3.$$ Hence, $L_n$ grows exponentially: $$L_n \gtrsim 3\left(\frac{3}{2}\right)^n.$$

  • Let $s_n$ be the $n$th intermediate string, and let $t_n$ be the $n$th appended string (so $s_n = s_{n-1} t_n$). Now, every intermediate string ends with $\text{bc}$, so $\text{c}^k$ would first occur only when some $t_n$ begins with $\text{c}^{k-1}$, in which case $s_n= s_{n-1} t_n$ will contain the first occurrence of $\text{c}^k$ beginning at index $L_{n-1}-1$.

  • After $\text{c}^k$ first occurs, the number of instances of $\text{c}^k$ grows approximately exponentially in the number of iterations, as does the length, and the ratio $$p_k = \frac{\text{number of instances of c}^k\text{ on the }n\text{th iteration}}{L_n} \approx \frac{1}{\text{index of the first occurrence of c}^k} $$ is approximately a constant independent of $n$. Since $\text{c}^{k+1}$ first occurs when one of these instances of $\text{c}^k$ happens to begin the "last half" of an intermediate string, this may be compared to a sequence of Bernoulli trials, each with success probability $p_k$. For such a process, the expected number of trials to get the first success is just $1/p_k$, so the index of the first occurrence of $\text{c}^{k+1}$ would be compared to $$\frac{2}{3} L_{n_k + 1/p_k} \approx 2\left(\frac{3}{2}\right)^{n_k + i_k} $$ where $n_k$ is the number of iterations to get the first occurrence of $\text{c}^k$, and $i_k$ is the corresponding index. E.g., the first-occurrence index of $\text{c}^6$ would be compared to $2\left(\frac{3}{2}\right)^{n_5 + i_5} = 2\left(\frac{3}{2}\right)^{22 + 27308} \approx 10^{4813}$ (when in fact it is approximately $10^{519}$). Similarly, the first-occurrence index for $\text{c}^7$ (if it exists) would be compared to $2\left(\frac{3}{2}\right)^{n_6 + i_6} \approx 10^{10^{518}}$. These comparisons are quite poor, but may help to understand how the first-occurrence indices can be so enormous.
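The length recurrence and the heuristic estimate above can be checked numerically. The following is a minimal sketch; the function names `lengths` and `heuristicLog10` are illustrative and not from the original post, and plain JavaScript numbers are assumed to suffice (they are exact only while $L_n < 2^{53}$, i.e. for roughly the first 80 iterations).

```javascript
// Lengths L_0..L_n from L_{n+1} = L_n + floor((L_n + 1) / 2), with L_0 = 3.
function lengths(n) {
  const out = [3];
  for (let i = 0; i < n; i++) {
    const last = out[i];
    out.push(last + Math.floor((last + 1) / 2));
  }
  return out;
}

// Base-10 logarithm of the heuristic estimate 2 * (3/2)^(n_k + i_k).
// Working with the logarithm avoids overflowing floating-point numbers.
function heuristicLog10(nk, ik) {
  return Math.log10(2) + (nk + ik) * Math.log10(1.5);
}

const L = lengths(22);
console.log(L.slice(0, 6));             // [3, 5, 8, 12, 18, 27]
console.log(L[22] - 1);                 // 27308, the tabulated index for ccccc
console.log(heuristicLog10(22, 27308)); // about 4812.9, i.e. roughly 10^4813
```

Consistent with the third fact above, each tabulated first-occurrence index from $\text{cc}$ onward is one less than some $L_m$: $4 = L_1 - 1$, $7 = L_2 - 1$, $26 = L_5 - 1$, and $27308 = L_{22} - 1$.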

r.e.s.
  • Your conjecture can be proved using a string-based computer program. However, I am interested in seeing an approach from high-level mathematicians on this question :-) – MonK Jul 24 '14 at 14:26
  • Since $a$ never propagates, I imagine there's a way to state the rewrite rule without it. (In which case, my point in your previous question regarding doing $abc$ v. $01$ as digits is quite superfluous.) – Semiclassical Jul 24 '14 at 14:58
  • @Semiclassical -- The initial $\text{a}$ just acts as a "placeholder". The same sequence (without the initial $\text{a}$) is generated by starting with $\text{bc}$ and using the modified rewriting rule $a_1 \cdots a_{n} \ \ \to \ \ a_1 \cdots a_{n} a_{\left\lfloor\frac{n+1}{2}\right\rfloor } a_{\left\lfloor\frac{n+1}{2}\right\rfloor+1} \cdots a_{n}$, but I don't see how this helps. (BTW, your reference to "my" previous question suggests that you've mistaken me for the poster of the linked question.) – r.e.s. Jul 24 '14 at 23:20
  • Ack. Sorry for the mistaken attribution. And I don't mean that it makes it easier to prove results, just that it is superfluous either way. – Semiclassical Jul 24 '14 at 23:25
  • @Sid -- I would be interested in any algorithm/program capable of proving the conjecture. (I'm skeptical that any such is available, however.) – r.e.s. Jul 24 '14 at 23:28
  • @r.e.s. It's no proof but I've posted the problem of finding higher indices at http://codegolf.stackexchange.com/questions/35209/where-are-the-runs-in-this-infinite-string. cccccc has not been found. – Calvin's Hobbies Jul 27 '14 at 15:32
  • @Calvin'sHobbies - Thanks for the link -- the program posted there succeeded in finding the first occurrence of $\text{c}^6$! (I edited it into the table above.) – r.e.s. Jul 28 '14 at 02:30
  • The size of those numbers is startling. I doubt very much that anyone will be able to track down the first occurrence of $c^7$! – Semiclassical Jul 28 '14 at 03:21
  • That lower bound on $c^7$ is kind of stupefying. I very much wish we could give an argument as to why it's so big, or some kind of heuristic for estimating it. (For instance, I note that in the computational thread you suggested that $c^6$ should occur at least 125 times before $c^7$ can appear. Can you spell out that heuristic a bit?) – Semiclassical Jul 30 '14 at 16:08
  • @Semiclassical - After $c^k$ first occurs, more instances of it arise in two ways: (1) as copies of the original and other old copies, and (2) as new cases when an "appended half" again happens to begin with $c^{k-1}$. Now consider only the type-2 occurrences. Computations show that there is 1 occurrence of $c^2$ before $c^3$ occurs, 1 of $c^3$ before $c^4$ occurs, 4 of $c^4$ before $c^5$ occurs, 115 of $c^5$ before $c^6$ occurs, ... So far, it's a non-decreasing sequence (1, 1, 4, 115, ...). If it remains so, we should expect at least 115 $c^6$s before $c^7$ occurs. (I mistyped 115 as 125.) – r.e.s. Jul 31 '14 at 05:23
  • Nice analysis. That suggests another problem that could be asked as a new question: What is the distribution of $c^k$'s at the $N$th index? Given the Bernoulli-trials model above, one should be able to provide a statistical prediction. Numerically, of course, that's quite computationally intensive, and probably better suited for another Code Golf question (interested, @Calvin'sHobbies?) – Semiclassical Jul 31 '14 at 14:53
  • @Semiclassical I might get around to it (or you are welcome to ask). I'm just as stupefied as you at how big those indices are. – Calvin's Hobbies Jul 31 '14 at 16:13
  • @user1708 I've posted the same question on MO (hoping that it's still on topic over there): http://mathoverflow.net/questions/177996/do-runs-of-every-length-occur-in-this-string – Calvin's Hobbies Aug 07 '14 at 07:37

1 Answer

Okay, to the OP of this question: since you requested a computer program, here is one.

Below is a string-based program written in JavaScript that calculates the index of $cccc$ in your string. You can see for yourself that it does construct your string.

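// Start from the seed and repeatedly append the "last half".
let input = 'abc';
const m = 20; // number of rewrites
for (let i = 0; i < m; i++) {
    const l = input.length;
    const n = Math.floor(l / 2); // start index of the appended half
    input += input.substring(n, l);
}
// Index of the first occurrence of "cccc" (-1 if it does not occur).
const index = input.indexOf('cccc');
alert(index);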

For $m=20$ it constructs a string where $cccc$ is at index $26$, $ccc$ is at $7$, and $cc$ is at $4$.

For $m=30$ it constructs a string where $ccccc$ is at index $27308$ (which is consistent with your conjecture).

For $m=40$ it constructs a string in which $cccccc$ does not occur, so the search returns $-1$ to denote that it found nothing.

Interestingly, when I increase $m$ to $45$, it exceeds the available computational power and does not return anything in the browser (I am using Chrome on Win 7 64-bit). This is because the browser storage for a string, as per this answer, is about 5 MB, which is a string of 2,621,440 characters.

You may, however, succeed by rewriting this code in Java, because it supports a maximum string length of $2^{31}-1$ (over 2 billion), as per this answer.

But for that you need a heap size of at least $1024$ MB. I firmly believe your conjecture can be proved :).
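The memory limits described above apply to building the string explicitly. Individual characters of $s$ can be looked up without storing the string at all, by mapping an index inside an appended half back to its source position in the earlier string. The sketch below is an illustration under that assumption, not the original poster's program or the one from the Code Golf thread; the name `charAtIndex` is hypothetical, and `BigInt` is used so that very large indices remain exact.

```javascript
// Return the character of s at 0-based index i without building the string.
function charAtIndex(i) {
  let idx = BigInt(i);
  while (idx >= 3n) {
    // Find the intermediate length len with len <= idx < len + ceil(len / 2),
    // i.e. the step whose appended half contains position idx.
    let len = 3n;
    while (len + (len + 1n) / 2n <= idx) {
      len += (len + 1n) / 2n;
    }
    // The appended half copies positions floor(len/2)..len-1, so map idx back.
    idx = idx - len + len / 2n;
  }
  return 'abc'[Number(idx)];
}

// The five characters starting at index 27308 (the tabulated index for ccccc).
let run = '';
for (let i = 27308n; i < 27313n; i++) {
  run += charAtIndex(i);
}
console.log(run); // "ccccc"
```

Each lookup takes a number of steps that is polynomial in the number of digits of the index, so this checks individual positions cheaply. It does not by itself search for runs, and it does not prove the conjecture.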

Good luck!

PS: I did not succeed in recreating this in Java, since I am not an expert. However, I will return to post the solution in Java.

MonK
  • Your program does *not* prove that ($\forall k, \text{c}^k \text{ occurs in }s$), because it must be used to test every $k=1,2,3,...$ (infinitely many) -- a procedure that never halts. (Of course I used a similar program to compute the short table of first-occurrence indices posted in the question. On my system, using Sage, 47 rewrites was the maximum without memory errors.) – r.e.s. Jul 25 '14 at 13:09
  • By switching to a system with more memory (again using Sage), I was able to extend the search to $55$ rewrites, with the result that $\text{c}^6$ does not occur in the first $17,673,600,662$ ($17^+$ billion) terms of $s$. I've updated the table. – r.e.s. Jul 26 '14 at 13:29
  • Holy smokes! $2.124\times10^{511}$ in Python! – MonK Jul 28 '14 at 09:43
  • And it renders the notion of locating $c^7$ computationally to be quite ludicrous. Heuristic arguments might get an estimate, though. – Semiclassical Jul 28 '14 at 12:15
  • Oops, that was a typo on my part (now fixed): the index for $\text{c}^6$ is actually $\approx 2.124\times 10^{519}$ (a $520$-digit number!). Of course, that program is doing a smarter search, and the number of positions it has to check is only a very very tiny fraction of $10^{519}$. – r.e.s. Jul 28 '14 at 12:21