In the paper "A Monte Carlo method for factorization", Pollard introduces a method for factoring composite numbers that is faster than trial division. The main idea is to generate pseudorandom numbers and test whether the difference of any pair shares a common divisor greater than $1$ with the number to be factored. By the birthday paradox, a collision modulo an unknown prime factor $p$ is expected after only about $\sqrt{p}$ values, which is what makes the method faster than trial division.
In his paper, Pollard uses the recurrence $x_{i+1} \equiv x_i^2 - 1 \pmod{n}$ to generate these numbers, where $n$ is the number we want to factor and $x_0 = 2$. He notes that any polynomial of degree $\geq 2$ can be used.
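To make this concrete, here is a minimal sketch of the algorithm as I understand it (in Python for readability, which is not necessarily my actual language), using Floyd's cycle detection; the function name is mine, and the parameter $c$ generalizes Pollard's choice of $-1$:

```python
from math import gcd

def pollard_rho(n, x0=2, c=-1):
    # Iterate x_{i+1} = x_i^2 + c (mod n); Pollard's original choice is
    # c = -1 with x0 = 2. The "tortoise" x takes one step per round and
    # the "hare" y takes two, so a collision modulo an unknown prime
    # factor p of n eventually shows up as gcd(x - y, n) > 1.
    f = lambda x: (x * x + c) % n
    x = y = x0
    d = 1
    while d == 1:
        x = f(x)
        y = f(f(y))
        d = gcd(x - y, n)  # math.gcd ignores signs
    return d  # a nontrivial factor, or n itself (failure; retry with another c)

print(pollard_rho(8051, c=1))  # classic worked example: prints 97 (8051 = 83 * 97)
```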
It intuitively makes sense to me that linear functions can have additional properties that cause the generated sequence to not be random enough. However, I see no clear reason why the algorithm should work only for polynomials. Using different functions, for example ones built from bit manipulations that keep intermediate results modulo $2^{64}$, could potentially make the algorithm faster on real hardware.
I have tried implementing the algorithm using my programming language's built-in random number generators (the splitmix PRNG in particular) and noticed that it performs much worse. I doubt this is due to poor statistical properties of splitmix: it is a mature generator and passes many statistical test suites. It seems more likely that there is a deeper mathematical reason why polynomials are more suitable for this algorithm, but I cannot pinpoint what it is.
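For reference, the variant I tried looks roughly like the following (a sketch, not my exact code: I feed each SplitMix64 output back in as the next state, which may differ in detail from how the library iterates internally):

```python
from math import gcd

MASK = (1 << 64) - 1  # keep intermediate results modulo 2^64

def splitmix64_step(x):
    # One SplitMix64 update: add the golden-ratio increment, then apply
    # the output mixing function (constants from the SplitMix64 design).
    x = (x + 0x9E3779B97F4A7C15) & MASK
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    return x ^ (x >> 31)

def rho_with_prng(n, x0=2, max_steps=10**6):
    # Same Floyd-style cycle search as above, but with the polynomial map
    # replaced by the SplitMix64 step. This is roughly what I tried, and
    # in my experiments it performs much worse.
    x = y = x0
    for _ in range(max_steps):
        x = splitmix64_step(x)
        y = splitmix64_step(splitmix64_step(y))
        d = gcd((x - y) % n, n)
        if 1 < d < n:
            return d
    return None  # give up after max_steps iterations
```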
So, my question is: why does Pollard's rho algorithm for factoring composite numbers work so much better with iterations based on polynomials than with general-purpose PRNGs?