What algorithm is preferred to do a x b mod P with big numbers (256 bits)

Question

I'm trying to implement multiple precision arithmetic operations modulo P, with P < 2^256. More specifically, P = 2^256 - 2^32 - 977.

I want to support the following operations: +, -, *, /, pow (each mod P)

As P is close to 2^256, numbers are represented with 8 u32 or 4 u64.

a + b mod P can be done like this (in pseudo code):

n = a + b
if overflow: # i.e. over 2^256
    # add 2^256 - P to come back modulo P
    n += 2**32 + 977
else:
    if n >= P:
        # P <= n < 2^256
        n -= P
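
A runnable sketch of the same addition, assuming Python integers stand in for the limb array and a mask simulates the 256-bit register. The names `add_mod_p` and `MASK` are only illustrative:

```python
P = 2**256 - 2**32 - 977
MASK = 2**256 - 1  # simulates a fixed-width 256-bit register

def add_mod_p(a, b):
    """Return (a + b) mod P, assuming 0 <= a, b < P."""
    total = a + b
    n = total & MASK         # what a 256-bit register would hold
    overflow = total >> 256  # 1 if the addition carried out of 256 bits
    if overflow:
        # The register dropped 2^256, which is congruent to 2^32 + 977 mod P.
        n += 2**32 + 977
    elif n >= P:
        # P <= n < 2^256
        n -= P
    return n
```

Since a, b < P, the sum is below 2P, so a single correction is enough in either branch.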

--

For a * b mod P, my first intention was to simply do a long multiplication but that seems slow as I would need the carry to be 256 bits as well.

Are there any recommended algorithms to calculate a * b modulo P efficiently (using arrays of u32 / u64)?

I'm mostly interested in the multiplication because:

a^x mod P can be an optimized version of a * a * ... * a mod P
a / b mod P can be calculated as a * b^{P-2} mod P using Fermat's little theorem (P is prime)
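
To make these two reductions concrete, here is a minimal sketch that builds pow and division on top of a multiplication routine. `mul_mod_p` is a placeholder that uses Python's big integers; the function names are assumptions, not taken from any library:

```python
P = 2**256 - 2**32 - 977

def mul_mod_p(a, b):
    # Placeholder: swap in whichever limb-based multiplication is chosen.
    return (a * b) % P

def pow_mod_p(a, x):
    """Square-and-multiply: one squaring per exponent bit, one multiply per set bit."""
    result = 1
    base = a % P
    while x > 0:
        if x & 1:
            result = mul_mod_p(result, base)
        base = mul_mod_p(base, base)
        x >>= 1
    return result

def div_mod_p(a, b):
    """a / b mod P as a * b^(P-2) mod P; b must be nonzero mod P."""
    if b % P == 0:
        raise ZeroDivisionError("b has no inverse modulo P")
    return mul_mod_p(a, pow_mod_p(b, P - 2))
```

For an exponent below 2^256 this takes at most 256 squarings and 256 multiplications, which is why the cost of a single multiplication dominates.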

Note: Bitcoin implements these operations with numbers represented as 10 x uint26 instead of 8 x uint32, so each "digit" stored in a 32-bit word keeps 6 spare bits, but I'm not familiar with their methods.

https://en.wikipedia.org/wiki/Modular_arithmetic#Example_implementations, https://en.wikipedia.org/wiki/Modular_arithmetic#Computational_complexity, https://en.wikipedia.org/wiki/Kochanski_multiplication, https://en.wikipedia.org/wiki/Montgomery_modular_multiplication — D.W., Jun 24 '21 at 04:18
Thanks. Montgomery is actually slower here as it requires 2 costly transformations; from what I read, it's better suited to exponentiation.
Kochanski seems like a good fit but there is little detail on the algo to be honest — Ervadac, Jun 25 '21 at 13:24
If you have a specific question about how Kochanski multiplication works, that might make a good question (maybe ask it separately as a separate post). The algorithm in the Wikipedia article seems pretty clear to me. It sounded like you were most interested in exponentiation based on your "I'm mostly interested in..." statement. — D.W., Jun 25 '21 at 18:06

score 1 · Accepted Answer · answered Jun 25 '21 at 18:29

Here is one reasonable method:

To multiply a 32-bit integer by a 256-bit integer modulo $P$, multiply the integers using arbitrary-precision arithmetic (see "How do computers perform operations on numbers that are larger than 64 bits?"; this can be done with 8 32x32 -> 64 multiplications and then some 32-bit additions) to get a 288-bit product, then reduce the product modulo $P$ (divide the product by $P$, and keep the remainder).
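
A rough sketch of that 32 x 256 step, assuming little-endian 32-bit limbs. The helper names (`to_limbs`, `from_limbs`, `mul_32_by_256`) are made up for illustration, and Python's `%` stands in for the final division by $P$:

```python
P = 2**256 - 2**32 - 977
M32 = 0xFFFFFFFF

def to_limbs(n, count=8):
    """Split n into `count` little-endian 32-bit limbs."""
    return [(n >> (32 * i)) & M32 for i in range(count)]

def from_limbs(limbs):
    """Reassemble an integer from little-endian 32-bit limbs."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))

def mul_32_by_256(x, y_limbs):
    """Multiply 32-bit x by 8 limbs; returns 9 limbs (a 288-bit product)."""
    out = []
    carry = 0
    for limb in y_limbs:
        # Fits in 64 bits: (2^32 - 1)^2 + (2^32 - 1) < 2^64.
        t = x * limb + carry
        out.append(t & M32)
        carry = t >> 32
    out.append(carry)
    return out

def mul_32_by_256_mod_p(x, y):
    product = from_limbs(mul_32_by_256(x, to_limbs(y)))
    return product % P  # stand-in for "divide by P, keep the remainder"
```

Each loop iteration is one 32x32 -> 64 multiplication plus a carry addition, so the carry never needs more than 32 bits.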

To multiply a 256-bit integer $X$ by a 256-bit integer $Y$ modulo $P$, write

$$X = 2^{224} X_7 + \dots + 2^{32} X_1 + X_0,$$

then do the following:

set $B := 0$
for $i := 7,6,\dots,0$:
- set $A := X_i \times Y \bmod P$
- set $B := 2^{32} \times B \bmod P$
- set $B := A + B \bmod P$

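At the end, $B$ will hold the product $X \times Y \bmod P$. Each step can be computed using the method of the first paragraph of this answer, as it only involves 32x256 -> 288 modular multiplications (multiplying by $2^{32}$ is a 32-bit shift, which likewise yields at most 288 bits) and a modular addition.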
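
The loop above can be sketched as follows. This is only an illustration: Python's big integers and `%` replace the limb-level 32 x 256 multiplication and reduction, and `mul_mod_p` is an assumed name:

```python
P = 2**256 - 2**32 - 977

def mul_mod_p(x, y):
    """Return (x * y) mod P for 0 <= x, y < P, consuming x 32 bits at a time."""
    b = 0
    for i in range(7, -1, -1):
        x_i = (x >> (32 * i)) & 0xFFFFFFFF  # i-th 32-bit digit of x
        a = (x_i * y) % P                   # 32 x 256 -> 288 bits, then reduce
        b = (b << 32) % P                   # shift by one digit, then reduce
        b = (a + b) % P                     # modular addition
    return b
```

This is Horner's rule applied to the base-$2^{32}$ digits of $X$, so no intermediate value ever exceeds 288 bits.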

gnasher729 · Answer 2 · 2021-06-26T15:13:13.593

1

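Note that with your particular $P$, $a \cdot 2^{256} + b \equiv b + a \cdot (2^{32} + 977) \pmod P$, because $2^{256} \equiv 2^{32} + 977 \pmod P$. Applying this to the high half of a 512-bit product (and again to the few bits that then spill over 256) leaves a value that will only rarely be $P$ or slightly larger, in which case you subtract $P$ once more.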
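
As a sketch of how this identity could reduce a full 512-bit product, the hypothetical helper below folds the high part down until the value fits in 256 bits. Python integers again stand in for limb arrays, and `reduce_mod_p` and `C` are illustrative names:

```python
P = 2**256 - 2**32 - 977
C = 2**32 + 977        # 2^256 mod P
MASK = 2**256 - 1

def reduce_mod_p(n):
    """Reduce 0 <= n < 2^512 modulo P using 2^256 = C (mod P)."""
    while n >> 256:
        # a * 2^256 + b  ->  b + a * C, which is strictly smaller
        n = (n & MASK) + (n >> 256) * C
    if n >= P:
        # Here n < 2^256 < 2P, so one subtraction is enough.
        n -= P
    return n
```

For a 512-bit input the loop runs at most three times, and each pass only needs a multiplication by the small constant `C` rather than a division by $P$.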

answered Jun 26 '21 at 14:56 · edited Jun 26 '21 at 15:13 · gnasher729
