Mathematicians shocked(?) to find pattern in prime numbers

Question

There is an interesting recent article "Mathematicians shocked to find pattern in "random" prime numbers" in New Scientist. (Don't you love math titles in the popular press? Compare to the source paper's Unexpected Biases in the Distribution of Consecutive Primes.)

To summarize, let $p,q$ be consecutive primes of form $a\pmod {10}$ and $b\pmod {10}$, respectively. In the paper by K. Soundararajan and R. Lemke Oliver, here is the number $N$ (in million units) of such pairs for the first hundred million primes modulo $10$,

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline &a&b&\color{blue}N&&a&b&\color{blue}N&&a&b&\color{blue}N&&a&b&\color{blue}N\\ \hline &1&3&7.43&&3&7&7.04&&7&9&7.43&&9&1&7.99\\ &1&7&7.50&&3&9&7.50&&7&1&6.37&&9&3&6.37\\ &1&9&5.44&&3&1&6.01&&7&3&6.76&&9&7&6.01\\ &1&1&\color{brown}{4.62}&&3&3&\color{brown}{4.44}&&7&7&\color{brown}{4.44}&&9&9&\color{brown}{4.62}\\ \hline \text{Total}& & &24.99&& & &24.99&& & &25.00&& & &24.99\\ \hline \end{array}$$

As expected, each class $a$ has a total of $25$ million primes (after rounding). The "shocking" thing, according to the article, is that if the primes were truly random, then it is reasonable to expect that each subclass will have $\color{blue}{N=25/4 = 6.25}$. As the present data shows, this is apparently not the case.

Argument: The disparity seems to make sense. For example, let $p=11$, so $a=1$ . Since $p,q$ are consecutive primes, then, of course, subsequent numbers are not chosen at random. Wouldn't it be more likely the next prime will end in the "closer" $3$ or $7$ such as $q=13$ or $q=17$, rather than looping back to the same end digit, like $q=31$? (I've taken the liberty of re-arranging the table to reflect this.)

However, what is surprising is the article concludes, and I quote, "...as the primes stretch to infinity, they do eventually shake off the pattern and give the random distribution mathematicians are used to expecting."

Question: What is an effective way to counter the argument given above and come up with the same conclusion as in the article? (Will all the $N$ eventually approach $N\to 6.25$, with the unit suitably adjusted?) Or is the conclusion based on a conjecture and may not be true?

P.S: A more enlightening popular article "Mathematicians Discover Prime Conspiracy". (It turns out the same argument is mentioned there, but with a subtle way to address it.)

I caught that sentence too and also wondered if it had actually been proved, and if so what the big deal is. Reminds me of the situation with Chebyshev's bias. Your argument for why the disparity exists for small primes makes sense to me, but I am curious how it shakes out quantitatively. I also read this article with an even catchier title. Excellent question and cheers. — Dan Brumleve, Mar 15 '16 at 03:44
Here is a 2010 paper considering the same question mod $4$. I found it linked by a commenter in the Quanta article. — Dan Brumleve, Mar 15 '16 at 04:12
Check also this article on Tao's blog: https://terrytao.wordpress.com/2016/03/14/biases-between-consecutive-primes/ — PITTALUGA, Mar 15 '16 at 12:54
Symmetry, at least when viewed as crosstabulated and then horizontally mirrored $$ \begin{array} {r|rrrr} & -1 & -3& 3&1 \ \hline 1& 5.44& 7.50& 7.43& 4.62\ 3& 7.50& 7.04& 4.44& 6.01\ -3& 7.43& 4.44& 6.76& 6.37\ -1& 4.62& 6.01& 6.37& 7.99 \end{array} $$ where the row- and coulmn-headers are 1,3,7,9 set as 1,3,-3,-1 (because being residues modulo 10) — Gottfried Helms, Mar 15 '16 at 14:01
The $10^8$-th prime is about $2\times 10^9$. density of prime there is about $\frac{1}{20}$. If primes are uniformly distributed among numbers ending with $1, 3, 7$ and $9$, then start from any prime in that range, the probability to discover a prime in next slot is around $\frac{10}{4}\times \frac{1}{20} = \frac18$. If discovery of primes in different slots are independent, the relative probability of a $1 \to 1$ event to a $1 \to 3$ event should be around $\left(\frac78\right)^3 \approx 0.67$. This is not that different from the observed value $\frac{4.62}{7.43} \approx 0.63$. — achille hui, Mar 15 '16 at 14:36
@achillehui: So the article's conclusion was a bit misleading, or I misinterpreted it, by assuming that as the prime range go to infinity, then the $16$ values will all assume $N \to 0.625$? — Tito Piezas III, Mar 15 '16 at 14:47
IMHO, the article is misleading b/c a sample of $10^8$ primes is simply not large enough. Even if primes are randomly distributed among the 4 classes, until $\frac{10}{4\log n} \gg 4$, we should not expect to see the $16$ values approaches $\frac{1}{16}$. — achille hui, Mar 15 '16 at 14:54
@achillehui: If we define $F(n) = \frac{10}{4 \log n}$, then, $$F(10^8) =0.135,,F(10^{10}) =0.108,,F(10^{12}) =0.090$$ rounded off. With increasing denominator, it's getting smaller and approaching $\frac{1}{16}=0.0625$. Ok, got it. — Tito Piezas III, Mar 15 '16 at 15:04
Oops, should be $\frac{4\log n}{10} \gg 4$. The LHS estimates the number of try to get to next prime. — achille hui, Mar 15 '16 at 15:09
Soundararajan has another paper which might be of interest: "Small gaps between prime numbers: The work of Goldston-Pintz-Yildirim" — Reikku Kulon, Mar 22 '16 at 00:05
Marek Wolf has plotted distributions of prime gaps up to $2^{44}$. — Reikku Kulon, Mar 22 '16 at 00:52
This question brings together two things which don't have to go together: properties of primes, and properties of integers modulo 10. It may be helpful to ask the following simpler but related question: if a prime $>3$ is of the form $3n+1$, is it or is it not equally likely that the next larger prime is of the form $3n+1$ or $3n+2$? I suspect it is not equally likely, however large the numbers are, because if $3N+1$ is prime, the next possible prime - not divisible by 2 or 3 - is $3(N+1)+2$. This gives $3n+2$ an advantage which seems unlikely to be offset by other considerations. — Adam Bailey, Mar 22 '16 at 17:37
@Adam: Lemke Oliver and Soundararajan describe the bias for decimal in their introduction, and the popular science articles have reported on that because it's easy to understand. However, they later discuss several other moduli. As Terence Tao describes, the effect modulo 3 diverges slightly from your expectation. — Reikku Kulon, Mar 22 '16 at 18:25
@DanBrumleve: Thanks for putting bounty, by the way. Are you giving it to any of the two answers before it expires soon? — Tito Piezas III, Mar 24 '16 at 00:33
@Dan: thank you for the bounty! SE matched it with the association bonus across three sites, so I saw +600. That's a lot to gain all at once... — Reikku Kulon, Mar 24 '16 at 17:46
I have some talk slides with an analysis of the residues-mod-3 case here. Also shows some computations that may help to explain other cases, but mod 3 was the only one that was tackled with a recurrence that gave a plausible (as in, cannot be proven correct, at least not by myself) asymptotic result. — Daniel Lichtblau, Aug 08 '16 at 20:20
@DanielLichtblau: Oh, nice seeing you here. Thanks for the link. — Tito Piezas III, Aug 08 '16 at 23:38

Gottfried Helms · Accepted Answer · 2016-03-23T12:05:19.090

$ \qquad \qquad $ Remark: see also [update 3] at end

1. First observations

I think there is at least one artifact (=non-random) in that list of frequencies.

If we rewrite this as a "correlation"-table, (the row-header indicate the residue classes of the smaller prime p and the column-header that of the larger prime q):
$$ \small \begin{array} {r|rrrr} & 1&3&7&9 \\ \hline 1& 4.62& 7.43& 7.50& 5.44\\ 3& 6.01& 4.44& 7.04& 7.50\\ 7& 6.37& 6.76& 4.44& 7.43\\ 9& 7.99& 6.37& 6.01& 4.62 \end{array}$$ then a surprising observation is surely the striking symmetry around the antidiagonal. But also the asymmetric increase of frequencies from top-right to bottom-left on the antidiagonal is somehow surprising.

However, if we look at this table in terms of primegaps, then

residue-pairs $(1,1)$ $(3,3)$ $(7,7)$,$(9,9)$ (the diagonal) refer to primegaps of the lenghtes $(10,20,30,...,10k,...)$ and those are the entries in the table with lowest frequencies,
residue-pairs $(1,3)$, $(7,9)$ and $(9,1)$ refer to primegaps of the lenghtes $(2,12,22,32,...,10k+2,...)$ and those contain the entry with the highest frequencies
residue-pairs $(3,7)$ $(7,1)$ ,$9,3$ refer to primegaps of the lenghtes $(4,14,24,34,...,10k+4,...)$
residue-pairs $(1,7)$ $(3,9)$ and $(7,3)$ refer to primegaps of the lenghtes $(6,16,26,36,...,10k+6,...)$ and have the two next-largest frequencies
residue-pairs $(1,9)$ $(3,1)$ and $(9,7)$ refer to primegaps of the lenghtes $(8,18,28,38,...,10k+8,...)$

so the -in the first view surprising- different frequencies of pairs $(1,9)$ and $(9,1)$ occurs because one collects the gaps of (minimal) length 8 and the other that of (minimal) length 2 - and the latter are much more frequent, but which is completely compatible with the general distribution of primegaps. The following images show the distribution of the primegaps modulo 100 (whose greater number of residue classes should make the problem more transparent).

(I've left the primes smaller than 10 out of the computation):

in logarithmic scale

We see the clear logarithmic decrease of frequencies with a small jittering disturbance over the residue classes. It is also obvious, that the smaller primegaps dominate the larger ones, so that a "slot" which catches the primegaps of lengthes $2,12,22,...$ has more occurences than the "slot" which catches $8,18,28,...$ - just by the frequencies in the very first residue class. The original table of frequencies in the residue classes modulo 10 splits this into 16 combinations of pairs of 4 residue classes and the observed non-smoothness is due to that general jitter in the resdiue classes of the primegaps.

It might also be interesting to see that primegap-frequencies separated into three subclasses - :

That trisection shows the collected residue classes $6,12,18,...$ (the green line) as dominant over the two other collections and the two other collection change "priority" over the single residue classes.
The modulo-10-problem overlays that curves a bit and irons the variation a bit out and even makes it a bit less visible - but not completely: because the general distribution of residue classes in the primegaps has such a strong dominance in the small residue-classes. So I think that general distribution-characteristic explains that modulo-10 problem, however a bit less obvious...

2. Further observations (update 2)

For further analysis of the remaining jitter in the previous image I've tried to de-trend the frequencies distribution of the primegaps (however now without modulo considerations!).
Here is what I got on base of 5 700 000 primes and the first 75 nonzero lenghtes g. The regression-formula was simply created by the Excel-spreadsheet:

De-trending means to compute the difference between the true frequencies $\small f(g)$ and the estimated ones; however, the frequency-residuals $\small r_0(g)=f(g) - 16.015 e^{-0.068 g }$ decrease in absolute value with the value of g. Heuristically I applied a further detrending function at the residuals $\small r_0(g)$ so that I got $\small r_1(g) = r_0(g) \cdot 1.07^g $ which look now much better de-trended.

This is the plot of the residuals $\small r_1(g)$:

Now we see that periodic occurences of peaks in steps of 6 and even some apparent overlay. Thus I marked the small primefactors $\small (3,5,7,11)$ in g and we see a strong hint for a additive composition due to that primefactors in $g$

The red dots mark that g divisible by 3, green dots that by 5, and we see, that at g which are divisible by both the frequency is even increased.

I've also tried a multiple regression using that small primefactors on that residuals, but this is still in process....

3. observations after Regression/Detrending (update 3)

Using multiple regression to detrend the frequencies of primegaps by their length g and additionally by the primefactors of g I got initially a strong surviving pattern with peaks for the primefactor 5. But those peaks could be explained by the observation, that (mod 100) there are 40 residues of primefactor p where the gaplength g=0 (mod 10) can occur, but only 30 residues where the other gaplengthes can occur.
Thus I computed the relative (logarithmized) frequencies as $\text{fl}(g)=\ln(f(g)/m_p(g))$ where $f(g)$ is the frequency of that gaplength, and $m_p(g)$ the number of possible residue classes of the (first) prime p (in the pair (p,q) ) at where the gaplengthes g can occur.

The first result is the following picture where only the general trend of decreasing of frequencies of larger gaps is detrended:

This computation gives a residue $\text{res}_0$ which is the relative (logarithmized) frequency after the length of the primegap is held constant (see the equation in the picture). The regular pattern of peaks at 5-steps in the earlier pictures is now practically removed.

However, there is still the pattern of 3-step which indicates the dominance of gaplength 6. I tried to remove now the primefactorization of g as additional predictors. I included marker variables for primefactors q from 3 to 29 into the multiple regression equation and the following picture shows the residues $\text{res}_1(g)$ after the systematic influence of the primefactorization of g is removed.

This picture has besides a soft long hill-like trend no more -for me- visible systematic pattern, which would indicate non-random influences.

(For me this is now enough, and I'll step out - but still curious whether there will come out more by someone else)

+1 Gottfried: Good observations! I totally missed the other symmetries, though I did notice the $4.62,, 4.44,, 4.44,, 4.62$ when I highlighted them. — Tito Piezas III, Mar 15 '16 at 14:48
The lengths divisible by six are very strongly favored (in my plot of $\frac{\phi(n)}{n}$ those are the most prominent white lines). It looks like highly composite numbers are the most favorable distances between primes. In particular, products of two primes are avoided, which would disfavor small distances between primes in the same residue class. — Reikku Kulon, Mar 20 '16 at 16:35
@Reikku : yes, the composite primegaps seem to be more frequent (see the red dots for divisibles by 6 - they are on top of the (rescaled) frequency list in the last image). What I was additionally trying was to find, whether there is some exact functional relation of the -scaled- frequencies and the primfactors in g (with or without multiplicity) - thus I tried multiple regression including primefactor markers. But my results are not yet convincing, at least not convincing me... I'll update my answer when I've more. — Gottfried Helms, Mar 20 '16 at 20:54
Very good observations! Just out of curiosity, what program did you use to make those graphs? — KKZiomek, Mar 08 '18 at 22:13
I heard about Ppari/GP, too bad it doesnt work on my computer :( — KKZiomek, Mar 09 '18 at 04:26
@KKZiomek - ?? It works on windows and unix. What a computer/OS do you have? — Gottfried Helms, Mar 09 '18 at 06:32
@GottfriedHelms I found the solution now, I just did the installation wrong — KKZiomek, Mar 15 '18 at 22:15

Reikku Kulon · Answer 2 · 2016-03-20T19:19:19.563

22

It seems unreasonable to expect the prime numbers to "know" which primes are adjacent. The consecutive prime bias must be a symptom of a more general phenomenon. Some experimentation shows that each prime seems to repel others in its residue class, over considerable distance.

Fix a prime $p$ and select primes $q \gg p$. Let $r$ be a reasonably large radius, such as $p \log q$, and let $n$ range over the interval $(q - r, q + r)$. Ignoring $q$, $n$ seems to be prime less often for $n \equiv q \pmod p$ than for $n \not\equiv q \pmod p$.

For example, with $p = 7$ and $q$ ranging from $7^7$ to $7^8$, these are the primes counted in each residue class (with many overlaps):

$$ \small \begin{array} {r|rrrrrr} [q] & n \equiv +1 & +2 & \mathit{+3} & -3 & \mathit{-2} & \mathit{-1}\\ \hline +1 & 108980 & 128952 & 126384 & 127903 & 128088 & 126665\\ +2 & 128952 & 108641 & 128836 & 126463 & 127911 & 127999\\ \mathit{+3} & 126386 & 128838 & 108915 & 128655 & 126043 & 128555\\ -3 & 127904 & 126464 & 128655 & 108843 & 129049 & 126684\\ \mathit{-2} & 128087 & 127910 & 126040 & 129046 & 109062 & 129065\\ \mathit{-1} & 126665 & 128001 & 128553 & 126686 & 129068 & 109293\\ \end{array}$$

Italics indicate the quadratic nonresidues, which do not account for the smaller biases.

The repulsion persists even for intervals $(q + p \log q, q + \sqrt{q})$, which gives 10-30 primes per residue class around each $q$:

$$ \small \begin{array} {r|rrrrrr} [q] & n \equiv +1 & +2 & \mathit{+3} & -3 & \mathit{-2} & \mathit{-1}\\ \hline +1 & 1009455 & 1015043 & 1015079 & 1014692 & 1012735 & 1014648\\ +2 & 1010366 & 1006394 & 1015175 & 1012825 & 1014562 & 1011749\\ \mathit{+3} & 1014932 & 1010510 & 1008805 & 1014377 & 1015580 & 1017266\\ -3 & 1012473 & 1013167 & 1011447 & 1007058 & 1015711 & 1014626\\ \mathit{-2} & 1017126 & 1011133 & 1014870 & 1010336 & 1008950 & 1016188\\ \mathit{-1} & 1015821 & 1014746 & 1012491 & 1014960 & 1012051 & 1010063\\ \end{array}$$

Since there's nothing special about $p = 7$, the repulsion likely occurs for all $p$. This means that simply by determining $q$ to be prime, we learn something about many composite numbers in arbitrary residue classes, without locating any of them precisely.

The following is a plot of $\frac{\phi(n)}{n}$ for odd $n$ with $p = 11, r = 2 \cdot 11^2$ (horizontal), in residue classes modulo $11^2$ (vertical), averaged over all intervals about $q \in (11^5, 11^6)$ and normalized, with $n \equiv q \pmod{11}$ in green, scaled to 2x2 tiles. Dark tiles rarely correspond to primes.

$\frac{\phi(n)}{n}$

First differences:

$first differences of $\frac{\phi(n)}{n}$$

edited Mar 20 '16 at 19:19

answered Mar 16 '16 at 12:43

Reikku Kulon

521

That's really amazing. I verified it for $p=7$ and tried it for some other small values of $p$ and they all repel. I also tried excluding the primes immediately before and after $q$ to make sure that consecutive primes aren't causing the entire effect, and the repulsion is still present. Is there a reference for this or did you discover it yourself? Why does this happen? I'm reminded of quasirandomness. – Dan Brumleve Mar 17 '16 at 03:08
1

Cortana found the "conspiracy" link two days ago, and since then I've been intermittently experimenting with gp while waiting for responses to this question. I hope something more precise can be said about the extra composites; a similar pattern arises in their prime factors. I'm also thinking about random prime models. A simple model produces a uniform distribution; what would replicate the pair bias? Could repulsion alone replicate the prime number distribution? – Reikku Kulon Mar 17 '16 at 23:49
1

The effect on semiprimes is very similar. – Reikku Kulon Mar 18 '16 at 01:55
2

Here are some average gaps between primes in each residue class ($p = 7$):
$$ \small \begin{array} {r|rrrrrr} [q] & n \equiv +1 & +2 & \mathit{+3} & -3 & \mathit{-2} & \mathit{-1}\ \hline +1 & 14.88 & 13.36 & 12.18 & 12.94 & 12.31 & 12.95\ +2 & 11.80 & 14.88 & 13.41 & 12.06 & 12.88 & 12.27\ \mathit{+3} & 12.91 & 11.80 & 14.88 & 13.36 & 12.19 & 12.95\ -3 & 12.29 & 12.98 & 11.83 & 14.88 & 13.38 & 12.08\ \mathit{-2} & 12.93 & 12.31 & 13.04 & 11.74 & 14.88 & 13.37\ \mathit{-1} & 12.10 & 12.89 & 12.33 & 12.94 & 11.81 & 14.88\ \end{array}$$
– Reikku Kulon Mar 18 '16 at 02:27
2

Primes counted in each residue class among the $p$ primes following each $q$:
$$ \small \begin{array} {r|rrrrrr} [q] & n \equiv +1 & +2 & \mathit{+3} & -3 & \mathit{-2} & \mathit{-1}\ \hline +1 & 56590 & 63948 & 68102 & 65538 & 68153 & 65210\ +2 & 68271 & 56381 & 62178 & 67736 & 64531 & 67891\ \mathit{+3} & 63872 & 69961 & 56742 & 63525 & 67531 & 65826\ -3 & 67332 & 63757 & 68853 & 56711 & 62336 & 68272\ \mathit{-2} & 64741 & 68415 & 63926 & 70040 & 56689 & 63793\ \mathit{-1} & 66735 & 64530 & 67650 & 63706 & 68373 & 56708\ \end{array}$$
– Reikku Kulon Mar 18 '16 at 02:43
2

Residue classes of distinct prime factors in the composite halfway between $q$ and next congruent prime (note column minima):
$$ \small \begin{array} {r|rrr} [q] & n \equiv +1 & +2 & \mathit{+3} & -3 & \mathit{-2} & \mathit{-1}\ \hline +1 & 14385 & 15347 & 17920 & 19657 & 18259 & 18045\ +2 & 15861 & 15018 & 19430 & 18436 & 16952 & 17823\ \mathit{+3} & 16229 & 15374 & 17572 & 18714 & 17928 & 17759\ -3 & 14505 & 16525 & 18246 & 17891 & 16710 & 19531\ \mathit{-2} & 14558 & 15525 & 17835 & 20112 & 16424 & 18886\ \mathit{-1} & 15101 & 17057 & 19163 & 18446 & 16571 & 17304\ \end{array}$$
– Reikku Kulon Mar 18 '16 at 03:20
1

The Quanta article states that the consecutive prime bias is a prediction of the k-tuples conjecture. It seems straightforward to estimate the counts using it: take the sum of the k-tuples formula (modified slightly) over all the constellations consisting of two primes whose gap is in whichever congruence class mod $p$. And we could do the same thing to estimate the long-range repulsion, we just have to include a lot of additional constellations. If those calculations are consistent with the data then I guess the phenomena can be seen as evidence for the k-tuples conjecture. – Dan Brumleve Mar 19 '16 at 02:45

score 4 · Answer 3 · edited Apr 13 '17 at 12:58

If I have read the New Scientist article correctly, the so-called "discrepancy" is: If a prime ends in 1 (as in the first class), the observed probability that the next prime also end in 1 is not 1/4. This observation can be explained by elementary probability with the assumption that the classes are indeed truly random. I was also troubled by this and I asked it in the math overflow forum: https://mathoverflow.net/questions/234753/article-in-the-new-scientist-on-last-number-of-prime-number.

Let me restate the argument. Let's write the sequence of all numbers ending with 1, 3, 7, 9 (beginning with 7): 7, 9, 11, 13, 17, 19,... and flag each number with a probability of $p$. Let's denote $q=1−p$. Now if a number ending in 1 has been flagged, the probability that the next number being flagged ends in 1 can easy be computed: $\sum_{k=0}^\infty q^{3+4k}p=q^3p\frac{1}{1-q^4}$. That's not 25%! In order to make that more intuitive, suppose that p is close to 1 that is we flag each number with high probability (but nevertheless randomly). If we have flagged a number ending in 1, the probability that the next number being flagged ends in 1 is very small, because we can expect that at least one of the three following numbers (ending in 3, 7, 9) will be flagged (recall that we have expected p being close to 1).

Now this model is oversimplificated. The probability that a random number $n$ is prime can be evaluated as $1/ln(n)$ (not as a constant $p$) by the prime counting function. If we know that the number ends in $1, 3, 7, 9$; this probability becomes $\frac{10}{4}\frac{1}{ln(n)}$ (assuming the classes are random). Because the sequence $q^{3+4k}p$ tends to zero rapidly for $k\rightarrow\infty$, if a number $n$ ending in 1 has been flagged, the probability that the next number being flagged also ends in 1 can be evaluated as $q_n^3p_n\frac{1}{1-q_n^4}$ with $p_n=\frac{10}{4}\frac{1}{ln(n)}$ and $q_n=1-p_n$.

For $n=100\cdot10^6$, we find 19.8% which is not much different from the number cited in the article (take in mind the simplification in my argument, also the article seem to make the experiment for a random number between 1 and $10\cdot10^6$ not just a number which is approximately equal to $10\cdot10^6$). Moreover as $n\rightarrow\infty$, $p_n\rightarrow 0$ and we can check that $\lim_{p\rightarrow 0}q^3p\frac{1}{1-q^4}=\frac{1}{4}$ which seems to explain that the "discrepancy" vanishes when we take a longer sequence. There are a lot of mysteries concerning prime numbers but it seems that the one pointed by the New Scientist is nothing more than misconception on elementary probability.

The article is not overly precise to say the least, there are better popular expositions. A thing to note is the quote: ' “In ignorance, we thought things would be roughly equal,” says Andrew Granville of the University of Montreal, Canada.' Pay attention to the 'roughly.' What you sketch is the 'roughly equal' but the bias is stronger than this and this is the news. I feel this popular article does a better job at explaining. Read it to the end; the start can make one jump to conclusions as displayed here. — quid, Mar 29 '16 at 17:29
@Olivier: I have deleted an unnecessary sentence at the very start of your post that could have led to confusion. Also, for readability, your long post has been divided into paragraphs. — Tito Piezas III, Mar 29 '16 at 18:21
@quid. The article you cite make much more sense that the New Scientist article; it does indeed mention the explanation I give above (without giving much details) and says that it is still unsatisfactory. Clearly the New Scientist article is badly written, there is no reasonable way to expect 25% to the question posed by the New Scientist article. — Olivier Esser, Mar 29 '16 at 18:51
@quid: I've edited the post to include a link to that article which was also recommended by Dan Brumleve. Yes, it has a subtle way to address the argument I raised. — Tito Piezas III, Mar 29 '16 at 20:19
The difficulty in applying this probabilistic argument (which is similar to Adam Bailey's earlier comment) is the assumption that consecutive prime numbers can be accurately modeled as independent events. The observed bias demonstrates that, at least for "small" primes, the dependencies are less intuitive than previously thought. Cramér's model predicts the primes should be uniformly distributed among residue classes over reasonably small intervals, but the primes are found to behave as if each prime in a chosen class depletes that class over a nearby interval. — Reikku Kulon, Mar 30 '16 at 04:46

Mathematicians shocked(?) to find pattern in prime numbers

3 Answers3

1. First observations

2. Further observations (update 2)

3. observations after Regression/Detrending (update 3)

Linked