
I am interested in modular arithmetic with respect to the prime $p = 2^{64}-2^{32}+1$. Thomas Pornin has some work on a constant-time implementation of arithmetic in $\mathsf{GF}(p)$ for this prime (the paper does other things as well --- this is the part relevant to my question).
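
For reference (and unless I'm misreading why this prime is so convenient), the key structure seems to be that $2^{64} \equiv 2^{32} - 1 \pmod{p}$ and hence $2^{96} \equiv -1 \pmod{p}$. Writing a full $128$-bit product as $x = x_0 + 2^{64} x_1 + 2^{96} x_2$ with $x_0 < 2^{64}$ and $x_1, x_2 < 2^{32}$, this gives
$$x \equiv x_0 + (2^{32} - 1)\,x_1 - x_2 \pmod{p},$$
so reducing a product only needs a few 64-bit additions/subtractions and one $32 \times 32$-bit multiplication.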

Using Montgomery arithmetic, a constant-time implementation is provided whose measured (and theoretically predicted) performance is as follows (a rough textbook sketch of Montgomery multiplication for this prime is included after the list):

  • addition and subtraction are $\approx 4$ clock cycles, and
  • multiplication is $\approx 10$ clock cycles.
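
To fix notation for what I mean by the baseline, here is a rough, untested textbook Montgomery multiplication (with $R = 2^{64}$) for this prime; it is my own sketch, not Pornin's code, and the constant names are mine:

    const P: u64 = 0xFFFF_FFFF_0000_0001;         // p = 2^64 - 2^32 + 1
    const NEG_P_INV: u64 = 0xFFFF_FFFE_FFFF_FFFF; // -p^{-1} mod 2^64

    /// Montgomery multiplication with R = 2^64.
    /// Operands are assumed to be in Montgomery form and < p;
    /// returns a*b*R^{-1} mod p, fully reduced.
    fn mont_mul(a: u64, b: u64) -> u64 {
        let t = (a as u128) * (b as u128);          // t < p^2 < R*p
        let m = (t as u64).wrapping_mul(NEG_P_INV); // m = -(t mod R) * p^{-1} mod R
        let mp = (m as u128) * (P as u128);
        // The low 64 bits of t + m*p are zero by construction, so
        // (t + m*p)/R = t_hi + mp_hi + (carry out of the low halves).
        let carry = ((t as u64).overflowing_add(mp as u64).1) as u64;
        let (u, o1) = ((t >> 64) as u64).overflowing_add((mp >> 64) as u64);
        let (mut u, o2) = u.overflowing_add(carry);
        if o1 || o2 {
            // True value is u + 2^64, which lies in [p, 2p):
            // subtract p by adding 2^64 - p = 2^32 - 1.
            u += (1u64 << 32) - 1;
        } else if u >= P {
            u -= P;
        }
        u
    }

Conversion into Montgomery form would then be mont_mul(a, R^2 mod p) with R^2 mod p = 0xFFFFFFFE00000001, and conversion out would be mont_mul(a, 1), assuming I've computed those constants correctly.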

I'm curious --- if one does not care about the arithmetic being constant time, how much can this be sped up (if at all)? While I care about arithmetic modulo the stated prime specifically, I would of course be interested in general "rule of thumb" answers as well. I am additionally interested in the setting where one has 128-bit hardware arithmetic support.
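
For the 128-bit setting specifically, the kind of non-Montgomery alternative I have in mind is a direct reduction exploiting the identities above; here is my own untested sketch (again, not code from the paper):

    const P: u64 = 0xFFFF_FFFF_0000_0001; // p = 2^64 - 2^32 + 1
    const EPSILON: u64 = 0xFFFF_FFFF;     // 2^32 - 1 = 2^64 mod p

    /// Reduce a 128-bit value modulo p, using
    /// 2^64 = 2^32 - 1 (mod p) and 2^96 = -1 (mod p).
    fn reduce128(x: u128) -> u64 {
        let x_lo = x as u64;            // bits 0..64
        let x_hi = (x >> 64) as u64;
        let x_hi_lo = x_hi & EPSILON;   // bits 64..96
        let x_hi_hi = x_hi >> 32;       // bits 96..128

        // x = x_lo + (2^32 - 1)*x_hi_lo - x_hi_hi  (mod p)
        let (mut t, borrow) = x_lo.overflowing_sub(x_hi_hi);
        if borrow {
            // The subtraction wrapped by 2^64 = 2^32 - 1 (mod p); undo that.
            // Cannot underflow: after a borrow, t >= 2^64 - 2^32 + 1 > EPSILON.
            t -= EPSILON;
        }
        let prod = x_hi_lo * EPSILON;   // <= (2^32 - 1)^2, fits in a u64
        let (mut r, carry) = t.overflowing_add(prod);
        if carry {
            // The addition wrapped past 2^64 = 2^32 - 1 (mod p); add that back.
            // Cannot overflow a u64.
            r += EPSILON;
        }
        // r < 2^64 < 2p, so one conditional subtraction makes it canonical.
        if r >= P {
            r -= P;
        }
        r
    }

    /// Multiplication mod p via a 64x64 -> 128-bit product.
    fn mul_mod_p(a: u64, b: u64) -> u64 {
        reduce128((a as u128) * (b as u128))
    }

The conditional corrections here are data-dependent, so as written this is not constant time; that is exactly the style of implementation I'm asking about.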

Mark Schultz-Wu
    A non-constant-time speedup does not look likely; a conditional jump would mean branch mispredicts, which are expensive on higher-end CPUs. The other obvious non-constant-time operation is a table lookup; there's no immediately obvious way to use that here. Modern (highly pipelined) CPUs are designed to do constant-time operations efficiently; hence, for short sequences like this, constant time is generally optimal. – poncho Oct 06 '23 at 14:14

0 Answers