Consider the notorious Collatz function $$ T(n) = \begin{cases}(3n+1)/2&\text{ if $n$ is odd,}\\n/2&\text{ if $n$ is even.}\end{cases} $$

One of the most important techniques for accelerating the convergence test is a sieve: look at the $k$ least significant bits of $n$ (the sieve has $2^k$ entries) and test only those numbers that do not join the path of a lower number within $k$ steps. This technique is explained well, e.g., here or here.

For example, consider the sieve for $k=2$ and particularly the numbers of the form $4n+1$ which join the path of $3n+1$ in two steps. Their path is $$ 4n+1 \rightarrow 6n+2 \rightarrow 3n+1 \text{.}$$
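
The two-step path above can be reproduced mechanically. The following is only a minimal sketch, not code from the linked pages: it tracks $T^j(2^k a + b)$ as an affine form `coef*a + const`, and the helper name `affine_path` is hypothetical.

```python
def affine_path(b, k):
    """Return [(coef, const), ...] such that T^j(2^k*a + b) = coef*a + const
    for j = 0..k, where T is the function defined in the question."""
    coef, const = 2**k, b
    path = [(coef, const)]
    for _ in range(k):
        # During the first k steps coef is even, so the parity of
        # coef*a + const is the parity of const, independent of a.
        if const % 2:
            coef, const = 3 * coef // 2, (3 * const + 1) // 2
        else:
            coef, const = coef // 2, const // 2
        path.append((coef, const))
    return path


print(affine_path(1, 2))  # [(4, 1), (6, 2), (3, 1)]
```

The middle entry `(6, 2)` is the intermediate value $6n+2$ that the sieve never evaluates for a concrete $n$.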

What I don't understand is how this can be used to search for the highest number occurring in the sequence (path records in the terminology of Eric Roosendaal). The sieve cuts the calculation before the computation of any intermediate value (which can actually be the maximum, like the value $6n+2$ in the above example). How can I detect that $4n+1$ does lead to a maximum if no $6n+2$ is computed? Testing the path of $3n+1$ no longer makes sense since the maximum $6n+2$ occurs before this term. Am I missing something?

DaBler
  • 1,000
  • $4n+1$ is not the best example since we are supposed to stop at 1 and not continue (and you would see N=1 Mx(N)=2 instead of N=2 Mx(N)=2 on Eric's page) – Collag3n Nov 28 '19 at 19:24
  • @Collag3n Consider for example the number 5 (which is of the form $4n+1$). Using the sieve with $k=2$ bits, the convergence test is skipped since the sieve indicates that the trajectory of this number joins the trajectory of a lower number in at most two steps (namely, the trajectory of 5 joins the trajectory of 4 in two steps). This, however, misses the maximum (which is the number 8). Do I get it wrong? (The complete trajectory is 5 → 8 → 4 → 2 → 1.) – DaBler Nov 28 '19 at 19:47
  • 5 is not skipped in sieve $k=2$ but in sieve $k>2$. But since 3 was already found with a greater path record (16), it is discarded. – Collag3n Nov 28 '19 at 19:56
  • And if 5 had reached another value higher than 8 later (with a higher $k$), it would have been found with 4 (well, that's not the best example either, you would have an odd value somewhere for that case) – Collag3n Nov 28 '19 at 20:05
  • The number of surviving residues $\mod ({2^{\lceil i \log_23\rceil}})$ is $s_i = s_{i-1}\cdot 2^{\lceil i \log_23\rceil-\lceil (i-1) \log_23\rceil}-a(i)$ with $s_1=1$ and where $a(i)$ is the $i^{th}$ term of this sequence https://oeis.org/A100982. Strangely, Eric's page (http://www.ericr.nl/wondrous/techpage.html) mentions 1720 for $2^{16}$ where I find 2114. Do you know how he got this number? – Collag3n Nov 28 '19 at 20:57
  • $4n+1\to12n+4\to 6n+2\to 3n+1$ actually only intermediate steps are counted. –  Nov 28 '19 at 22:07
  • @Collag3n I also get 2114 out of 65536 numbers for $k=16$. I have no idea how Eric came to that number 1720. Just for the sake of interest, my implementation is here. – DaBler Nov 29 '19 at 08:46
  • @RoddyMacPhee This trajectory, however, does not correspond to my definition of $T(n)$ in the question above. – DaBler Nov 29 '19 at 08:48
  • I think your criticism is correct. For the filter's conclusion to be valid, it must be proved elsewhere that *there is always* a follower from $3n+1$ which is larger than the immediate follower of $4n+1$ (which is $6n+2$). And I didn't see such a statement/proof. – Gottfried Helms Nov 30 '19 at 12:51

2 Answers

Quote: "As $k$ increases, the search only needs to check those residues $b$ that are not eliminated by lower values of $k$"

Take residue 15, for instance. It survives $\mod 2^5$ but is eliminated while sieving $2^7$, so any value $x\equiv 15 \mod 2^7$ will not be searched anymore for $k>7$.

Residue 15 was eliminated because it reached a value lower than itself $\mod 2^7$. This means that these numbers cannot reach higher values later, with $k>7$, that were not already reached (with a smaller $k$) by the lower value they just hit.
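
This can be checked empirically. The sketch below is not a proof and is not the code from the pages linked in the question. It assumes a residue $b$ is eliminated as soon as $3^c < 2^j$ after $j$ steps containing $c$ odd steps (the coefficient criterion), and the names `survives`, `trajectory_max` and `record_holders` are hypothetical.

```python
def T(n):
    return (3 * n + 1) // 2 if n % 2 else n // 2


def survives(b, k):
    """True if residue b mod 2^k is not eliminated within k steps.

    Assumption: b is eliminated once 3^c < 2^j, i.e. once the coefficient
    of a in T^j(2^k*a + b) has dropped below 2^k.
    """
    x, c = b, 0
    for j in range(1, k + 1):
        c += x % 2          # count odd steps
        x = T(x)
        if 3**c < 2**j:
            return False
    return True


def trajectory_max(n):
    """Largest value on the T-trajectory of n, stopping at 1."""
    peak = n
    while n > 1:
        n = T(n)
        peak = max(peak, n)
    return peak


def record_holders(limit):
    """Starting values below limit whose maximum beats all smaller starts."""
    holders, best = [], 0
    for n in range(1, limit):
        m = trajectory_max(n)
        if m > best:
            holders.append(n)
            best = m
    return holders


print(survives(15, 5), survives(15, 7))  # True False

k = 4
missed = [n for n in record_holders(1000)
          if n >= 2**k and not survives(n % 2**k, k)]
print(missed)  # []
```

For starting values below 1000 and $k=4$, every record holder at or above $2^k$ sits in a surviving residue class, so the sieve skips none of them. Record holders below $2^k$ (such as 3) can be skipped, which is why the check is restricted to $n \geq 2^k$.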

Collag3n
  • 2,556
  • sorry for the poor wording, I am bit lazy on this one – Collag3n Nov 28 '19 at 20:37
  • The real acceleration is noting that $2^mx+j$ has the parity of $j$. This means it will follow the same style of path as $j$ does (only for $m$ steps, and only if $m$ is greater than or equal to the number of divisions by 2 in $j$'s sequence; let $x$ be odd). This shows: $$2048x+7\to 6144x+22\to 3072x+11$$$$\to 9216x+34\to 4608x+17\to 13824x+52$$$$\to 6912x+26\to 3456x+13\to 10368x+40$$$$\to 5184x+20\to 2592x+10\to 1296x+5$$ and 4 more steps, eventually landing on $243x+1=242x+(x+1)$. The altered Collatz function would never hit the highest points. –  Nov 28 '19 at 22:57
  • I'm sorry, but I still don't get it. Can we stay on that example with the number 5 and $k = 2$? The Wikipedia page says that the sieve answers the question whether the number $n = a2^k+b$ should be tested for the convergence or not. Right so far? – DaBler Nov 29 '19 at 08:51
  • So if $f^k(a2^k+b) < a2^k+b$ for all nonzero $a$, and this also holds for all $k'<k$, the number $a2^k+b$ can be skipped, okay? – DaBler Nov 29 '19 at 08:54
  • Note that the acceleration using the identity $f^k(a2^k+b)=a3^{c(b)}+d(b)$ is another technique, usually used right after the above sieve. – DaBler Nov 29 '19 at 08:57
  • Perhaps one attempt to understand: Does the sieving with $k$-sieve imply that the maximum was already found before (for lower $n$ which was not eliminated by this sieve)? – DaBler Nov 29 '19 at 09:01
  • I have seen no response in the last 2 days, so let me add a few of my findings. Empirically, I have found that using sieves for gradually increasing $k$ will miss no maximum. More precisely, the sieve for a particular $k$ doesn't miss any maximum above $2^k$, but can miss maxima below this threshold. So I believe your answer is correct, though informal (and I don't fully understand it). If you could explain it better, I would appreciate it a lot. – DaBler Dec 01 '19 at 14:40
  • Sorry, I have not had much time these days. I don't know if there is a formal proof that no new path records can be found there (I am interested if someone knows one), but if you want to be sure from a sieve perspective: since the maximum value that can be reached by a residue $\mod (2^k={2^{\lceil i \log_23\rceil}})$ is smaller than $3^i$, if you keep track of your path records, you know up to what $k$ you are "safe", and I think these safety margins are growing. Now, I didn't give it much thought. Perhaps you'll find some answers in Oliveira's paper – Collag3n Dec 02 '19 at 08:21
  • https://www.researchgate.net/publication/220577131_Maximum_excursion_and_stopping_time_record-holders_for_the_problem_Computational_results – Collag3n Dec 02 '19 at 08:21
  • smaller than $3^i\cdot 2^{k-i}$...sorry – Collag3n Dec 02 '19 at 08:49
  • Yet another finding: This property also holds in the case of using a $3^k$ sieve (as opposed to $2^k$), e.g. numbers $n \equiv 2 \pmod{3}$ can be skipped, etc. – DaBler Dec 05 '19 at 19:18
  • Yes, I saw that too in Oliveira's paper at the end. That's probably why Eric got 1720 instead of 2114 for his $2^{16}$ sieve – Collag3n Dec 05 '19 at 20:33
  • No, that is not the reason for 1720. I asked Eric about it directly. Quoting from his answer: "that can not be shown to either converge or join the path of a lower number [...] The keyword is ‘join’ here. [...] So after calculating 2^16+x, for x=0 to 65535, compare all the results of 3*b + c (including the convergent ones) If two of them are equal, discard the higher one." And I can confirm that using this method, I really did get 1720 out of 65536. – DaBler Dec 06 '19 at 16:43
  • In case you want to play with Eric's idea, my implementation and pre-calculated sieves are available here: https://github.com/xbarin02/collatz-sieve – DaBler Dec 09 '19 at 18:21

(Notation: residue $n_0\mod 2^{\lceil i \log_23\rceil}$ = residue $b\mod2^k$ from your wiki page)

About the "discarded" 5 reaching maximum 8 (or 16), already reached by "surviving" 3:

  • One of the discarded sequences is the inverse V-shaped sequence, which rises for $i$ steps of $f(x)=\frac{3x+1}{2}$ and then falls below the initial value by successive divisions by $2$ (See here). Of all discarded sequences $2^{\lceil i \log_23\rceil}n+n_0$ for a specific $n$, this is the type of sequence that potentially reaches the highest value: $$(2^{\lceil i \log_23\rceil}n+n_0+1)\frac{3^i}{2^{i}}-1$$

Note: $n_0\leq 2^{\lceil i \log_23\rceil}-3$ and the exact value can be found in the link above

e.g. with $4n+1=5$ where $n_0=1$, $i=1$,$n=1$ which reaches $8$ before dropping to $4<5$

  • One of the surviving sequences is the straight line, which rises for the whole $k={\lceil i \log_23\rceil}$ steps of $f(x)=\frac{3x+1}{2}$. Of all surviving sequences for a specific $n$, this is the sequence (starting from $2^{\lceil i \log_23\rceil}(n+1)-1$) that reaches the highest value (limited to $k={\lceil i \log_23\rceil}$ steps): $$3^{\lceil i \log_23\rceil}(n+1)-1$$

Note: here we always have $n_0= 2^{\lceil i \log_23\rceil}-1$

e.g. with $4n+3=7$ where $i=1$,$n=1$ which reaches $17$ (in 2 steps), or with $n=0$: $3$ reaches $8$

Now it is easy to show that the highest value that can be reached by a discarded sequence at $n$ is less than or equal to the highest value already reached by a surviving sequence at $n-1$.

e.g. discarded $4(1)+1=5$ reaches $8$, which was already reached by surviving $4(1-1)+3=3$

Is the surviving highest value at $n-1$ greater than or equal to the discarded value at $n$?

We need $$3^{\lceil i \log_23\rceil}n-1 \geq (2^{\lceil i \log_23\rceil}n+n_0+1)\frac{3^i}{2^{i}}-1$$ and with $n_0< 2^{\lceil i \log_23\rceil}-1$, we just need to show that $$3^{\lceil i \log_23\rceil}n-1 \geq (2^{\lceil i \log_23\rceil}(n+1))\frac{3^i}{2^{i}}-1$$ Cancelling the $-1$ and dividing both sides by $2^{\lceil i \log_23\rceil}$ gives $$\Big(\frac{3}{2}\Big)^{\lceil i \log_23\rceil}n \geq \Big(\frac{3}{2}\Big)^i(n+1)$$ and, dividing by $\big(\frac{3}{2}\big)^i n$ and using $\lceil i \log_23\rceil - i = \lceil i \log_2\frac{3}{2}\rceil$, $$\Big(\frac{3}{2}\Big)^{\lceil i \log_2\frac{3}{2}\rceil} \geq 1+\frac{1}{n}$$ which is already true for $n-1=0$ when $i\geq 3$ (manually checked for $i=1$ and $i=2$ by using the exact value of $n_0$ in those cases).

e.g. with $n-1=0$: discarded $32n+23$ reaches $188$ but surviving $32(n-1)+31$ already reached $242$
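
The two closed forms can be compared against direct iteration. This is a small sketch for the numerical examples in this answer only; the function names are hypothetical, and `v_shape_peak` assumes $n_0+1$ is divisible by $2^i$ so that the integer division is exact.

```python
def T(n):
    return (3 * n + 1) // 2 if n % 2 else n // 2


def peak_within(n, steps):
    """Largest value among n, T(n), ..., T^steps(n)."""
    peak = n
    for _ in range(steps):
        n = T(n)
        peak = max(peak, n)
    return peak


def v_shape_peak(k, n, n0, i):
    # Discarded inverse V-shape: (2^k*n + n0 + 1) * 3^i / 2^i - 1
    return (2**k * n + n0 + 1) * 3**i // 2**i - 1


def straight_line_peak(k, n):
    # Surviving residue n0 = 2^k - 1: 3^k * (n + 1) - 1
    return 3**k * (n + 1) - 1


# k = 5, i = 3: discarded 32*1 + 23 = 55 versus surviving 32*0 + 31 = 31
print(peak_within(55, 5), v_shape_peak(5, 1, 23, 3))  # 188 188
print(peak_within(31, 5), straight_line_peak(5, 0))   # 242 242
```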

Note: you can multiply both sides by 2 to get the "real" maximum (16 instead of 8).

The key idea is that even if the discarded inverse V-Shape at $n$ was at the highest possible residue $n_0= 2^{\lceil i \log_23\rceil}-3$, it would reach a smaller value than the straight line at $n-1$ (always with residue $n_0= 2^{\lceil i \log_23\rceil}-1$).

This means that record paths are always found in residue $b\mod2^k$ (in other words, at $2^k\cdot n+b$ with $n=0$).

EDIT:

Even more, when sieving $2^{k+1}$: values below $2^k$ that are dropping cannot produce new path records (obviously), but the values above $2^k$ that do not survive the $2^{k+1}$ sieve are now known, and their maximum is still the RHS above. Indeed, the condition $n_0+2^{\lceil i \log_23\rceil}< 2^{\lceil i \log_23\rceil+1}-1$, or equivalently $n_0< 2^{\lceil i \log_23\rceil}-1$, does not change, and neither does the value of $i$ (the number of climbing steps), since the last step was a drop below the initial value.

So even if the max value on the LHS does not climb anymore at step $k+1$, it would still be higher (the whole equation would stay the same).

This means that new record paths are only found in surviving residues $b\mod2^k$.

No need to check discarded residue at all, even within the sieve range.

Collag3n
  • 2,556