
I'm trying to implement multiple-precision arithmetic modulo P, where P < 2^256. More specifically, P = 2^256 - 2^32 - 977.

I want to support the following operations: +, -, *, /, pow (each mod P)

As P is close to 2^256, numbers are represented with 8 u32 or 4 u64.

a + b mod P (with 0 <= a, b < P) can be done like this (in pseudocode):

n = a + b            # computed in 256 bits, keeping the carry-out flag
if overflow:         # i.e. a + b >= 2^256
    # n wrapped to a + b - 2^256; adding 2^256 - P gives a + b - P
    n += 2**32 + 977
else:
    if n >= P:
        # P <= n < 2^256
        n -= P
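
A runnable version of the pseudocode above, as a minimal sketch: Python integers are unbounded, so the 256-bit register and its carry flag are simulated with a mask and a shift. The names `add_mod`, `sub_mod`, `MASK` and `C` are illustrative, and the subtraction is an assumed mirror image of the addition rather than something stated in the question.

```python
P = 2**256 - 2**32 - 977
MASK = 2**256 - 1      # simulates a 256-bit register
C = 2**32 + 977        # 2^256 - P


def add_mod(a, b):
    """Return (a + b) mod P for 0 <= a, b < P."""
    total = a + b
    n = total & MASK           # what a 256-bit register would hold
    overflow = total >> 256    # carry-out bit (0 or 1)
    if overflow:
        # n = a + b - 2^256, so n + C = a + b - P, which is < P
        n += C
    elif n >= P:
        n -= P
    return n


def sub_mod(a, b):
    """Return (a - b) mod P for 0 <= a, b < P (assumed mirror of add_mod)."""
    n = (a - b) & MASK         # wraps to a - b + 2^256 when a < b
    if a < b:                  # borrow occurred
        n -= C                 # a - b + 2^256 - C = a - b + P
    return n
```

With real u32/u64 limbs, the mask and shift would be replaced by the carry or borrow produced by the limb-wise addition or subtraction.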

--

For a * b mod P, my first idea was to simply do a long multiplication, but that seems slow, as I would need the carry to be 256 bits wide as well.

Are there any recommended algorithms to calculate a * b modulo P efficiently (using arrays of u32 / u64)?

I'm mostly interested in the multiplication because:

  • a^x mod P can be an optimized version of a * a * ... * a mod P
  • a / b mod P can be calculated as a * b^{P-2} mod P using Fermat's little theorem (P is prime)
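
To make the two points above concrete, here is a minimal sketch of exponentiation by repeated squaring and of division via Fermat's little theorem. `mul_mod` is a stand-in that relies on Python's built-in big integers; any correct a * b mod P routine could replace it. The names `pow_mod` and `div_mod` are illustrative.

```python
P = 2**256 - 2**32 - 977


def mul_mod(a, b):
    # Stand-in: any correct "a * b mod P" routine can be dropped in here.
    return (a * b) % P


def pow_mod(a, e):
    """Square-and-multiply: about log2(e) squarings instead of e - 1 products."""
    result = 1
    base = a % P
    while e > 0:
        if e & 1:                    # current low bit of the exponent is set
            result = mul_mod(result, base)
        base = mul_mod(base, base)   # base now holds a^(2^k) for the next bit
        e >>= 1
    return result


def div_mod(a, b):
    """a / b mod P, using b^(P-2) as the inverse of b (valid because P is prime)."""
    if b % P == 0:
        raise ZeroDivisionError("b has no inverse modulo P")
    return mul_mod(a, pow_mod(b, P - 2))
```

Since P - 2 has 256 bits, each division costs a few hundred modular multiplications with this approach, which is why the speed of the multiplication dominates.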

Note: Bitcoin implements these operations with numbers represented as 10 limbs of 26 bits (each stored in a uint32) instead of 8 full uint32 words, so each "digit" keeps 6 spare bits, but I'm not familiar with their methods.

D.W.
Ervadac
  • https://en.wikipedia.org/wiki/Modular_arithmetic#Example_implementations, https://en.wikipedia.org/wiki/Modular_arithmetic#Computational_complexity, https://en.wikipedia.org/wiki/Kochanski_multiplication, https://en.wikipedia.org/wiki/Montgomery_modular_multiplication – D.W. Jun 24 '21 at 04:18
  • Thanks. Montgomery is actually slower for a single multiplication, as it requires 2 costly transformations; from what I read, it's better suited to exponentiation.

    Kochanski seems like a good fit but there is little detail on the algo to be honest

    – Ervadac Jun 25 '21 at 13:24
  • If you have a specific question about how Kochanski multiplication works, that might make a good question (maybe ask it separately as a separate post). The algorithm in the Wikipedia article seems pretty clear to me. It sounded like you were most interested in exponentiation based on your "I'm mostly interested in..." statement. – D.W. Jun 25 '21 at 18:06
  • https://cs.stackexchange.com/q/140881/755 – D.W. Jun 25 '21 at 18:19

2 Answers


Here is one reasonable method:

To multiply a 32-bit integer by a 256-bit integer modulo $P$, multiply the integers using arbitrary-precision arithmetic to get a 288-bit product (see "How do computers perform operations on numbers that are larger than 64 bits?"; this can be done with eight 32x32 -> 64 multiplications and some 32-bit additions with carry), then reduce the product modulo $P$ (divide the product by $P$ and keep the remainder).
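
The 32x256 -> 288 step can be sketched as follows, assuming little-endian arrays of 32-bit limbs. The helper names (`to_limbs`, `from_limbs`, `mul_small`, `mul_small_mod`) are hypothetical, and the final reduction simply uses Python's `%` in place of a hand-written 288-bit reduction.

```python
P = 2**256 - 2**32 - 977
M32 = 0xFFFFFFFF


def to_limbs(n, count=8):
    """Split n into `count` little-endian 32-bit limbs."""
    return [(n >> (32 * i)) & M32 for i in range(count)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian 32-bit limbs."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))


def mul_small(x, y_limbs):
    """Multiply a 32-bit x by a 256-bit number (8 limbs); returns 9 limbs."""
    out = []
    carry = 0
    for y in y_limbs:
        # One 32x32 -> 64 multiplication plus the incoming carry.
        # (2^32 - 1)^2 + (2^32 - 1) < 2^64, so t fits in a u64.
        t = x * y + carry
        out.append(t & M32)   # low 32 bits become the output limb
        carry = t >> 32       # high 32 bits carry into the next limb
    out.append(carry)         # ninth limb: bits 256..287
    return out


def mul_small_mod(x, y):
    """(x * y) mod P for 32-bit x and 256-bit y."""
    return from_limbs(mul_small(x, to_limbs(y))) % P
```

Only a single 32-bit carry is passed from limb to limb, so no 256-bit carry is needed.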

To multiply a 256-bit integer $X$ by a 256-bit integer $Y$ modulo $P$, write

$$X = 2^{224} X_7 + \dots + 2^{32} X_1 + X_0,$$

then do the following:

  • set $B := 0$
  • for $i := 7,6,\dots,0$:
    • set $A := X_i \times Y \bmod P$
    • set $B := 2^{32} \times B \bmod P$
    • set $B := A + B \bmod P$

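At the end, $B$ will hold the product $X \times Y \bmod P$. Each step can be computed using the method of the first paragraph of this answer, as it only involves 32x256 -> 288 modular multiplications (multiplying $B$ by $2^{32}$ is a shift by one 32-bit word, which again gives a 288-bit value to reduce) and one modular addition.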
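
The loop above can be sketched like this, with Python's big integers and `%` standing in for the limb-level 32x256 multiplication and 288-bit reduction. The function name `mul_mod` is illustrative.

```python
P = 2**256 - 2**32 - 977


def mul_mod(x, y):
    """Return (x * y) mod P, processing x one 32-bit word at a time."""
    b = 0
    for i in range(7, -1, -1):              # i = 7, 6, ..., 0
        x_i = (x >> (32 * i)) & 0xFFFFFFFF  # i-th 32-bit word of x
        a = (x_i * y) % P                   # 32x256 -> 288 bits, then reduce
        b = (b << 32) % P                   # shift by one word, then reduce
        b = (a + b) % P                     # modular addition
    return b
```

This is Horner's rule applied to $X$ in base $2^{32}$, so every intermediate value stays below 289 bits before it is reduced.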

D.W.

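Note that with your particular $P$, $2^{256} \equiv 2^{32} + 977 \pmod P$, so $a \cdot 2^{256} + b \equiv b + a \cdot (2^{32}+977) \pmod P$. If $a$ is wide (for example, the high 256 bits of a 512-bit product), the right-hand side still exceeds 256 bits, so apply the same identity again to the bits above $2^{256}$. Once the value fits in 256 bits, it will only rarely be $P$ or slightly larger, in which case you subtract $P$ once more.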
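
A minimal sketch of this folding, assuming the input is a full product below $2^{512}$. The names `reduce_wide`, `MASK` and `C` are illustrative, and Python integers stand in for the limb arrays.

```python
P = 2**256 - 2**32 - 977
C = 2**32 + 977        # 2^256 mod P
MASK = 2**256 - 1


def reduce_wide(n):
    """Reduce 0 <= n < 2^512 modulo P by folding the high part into the low part."""
    # Each pass replaces hi * 2^256 + lo by lo + hi * C, which is congruent
    # mod P and smaller by hi * P, so the loop ends after a few passes.
    while n >> 256:
        n = (n & MASK) + (n >> 256) * C
    # Now n < 2^256 < 2 * P, so at most one subtraction is needed.
    if n >= P:
        n -= P
    return n
```

For example, `reduce_wide(a * b)` gives a * b mod P from a full 512-bit product, with no division by P anywhere.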

gnasher729