Montgomery Multiplication with CRT

Question

I am attempting to understand how to use Montgomery multiplication in an RSA private key operation: $X \equiv a^{e} \pmod{n}$ where $a$ is the message, $e$ is the exponent, $n$ is the modulus.

Using the algorithm from Montgomery Reduction (with $r=2^k$, where $k$ is the bit length of modulus $n$):

ModExp(a; e; n) { n is an odd number }
Step 1. Compute n' using the extended Euclid algorithm.
Step 2. a_hat := a*r (mod n)
Step 3. x_hat := 1*r (mod n)
Step 4. for i = k-1 down to 0 do
Step 5.   x_hat := MonPro(x_hat; x_hat)
Step 6.   if e(i) = 1 then x_hat := MonPro(a_hat; x_hat)
Step 7. x := MonPro(x_hat; 1)
Step 8. return x
MonPro(a_hat;b_hat)
Step 1. t := a_hatb_hat
Step 2. m := tn' (mod r)
Step 3. u := (t + m*n)/r
Step 4. if u >= n then return u-n else return u

Now, the modulus $n$ will always be odd in RSA since it is generated from primes, which satisfies the first requirement. Also, from what I understand, in order for Montgomery form to be possible, the size of the base $a$ must be $a < n$. Fortunately, in RSA, this also holds true as the message/signature can't be longer than the modulus.

However, here's where I'm getting stuck. I am adding in hardware acceleration to a preexisting RSA library (mbedTLS) by replacing the modular exponentiations with an accelerated version. It's working great, so long as it's not using the Chinese Remainder Theorem. I don't entirely grasp CRT yet, but I understand that it allows us to perform faster decryption by splitting the message up into two operations and shrinking the modulus size:

$$ m_1 = (M^d \bmod N) \bmod p = ((M \bmod p)^{d \bmod p-1}) \bmod p $$ $$ m_2 = (M^d \bmod N) \bmod q = ((M \bmod q)^{d \bmod q-1}) \bmod q $$

Taken from: Chinese Remainder Theorem and RSA

The issue is, the message length is now longer than the modulus $p$ and $q$. So now, it would violate the requirement for Montgomery form that in $(aR)*mod(N)$, $a$ must be $a < N$.

I've searched all over for a method of doing Montgomery modular exponentiation in the case that the $a > N$, but they all state that the input $a$ is smaller than $N$. I can't seem to wrap my head around how you would perform a modexp using the Montgomery form with a larger input size than the modulus.

I was thinking maybe you could chunk $a$ into binary groups of $bitlen(N)$ with some sort of carry into the next group, but I can't figure out how you would mix in the inner loop that does the squaring. Would it be possible to modify it so that it becomes:

modexp(a[0:len(n)], e, n) ... modexp(a[len(n):len(a)], e, n)

And somehow combine those into an output that would be of len(n)? I'm really lost in understanding the mathematics behind it.

In the CRT $m_1$ is less then p. You need to use Montgomery with $p$. — kelalaka, Jun 28 '20 at 16:01

score 2 · Accepted Answer · answered Jun 28 '20 at 19:31

The issue is, the message length is now longer than the modulus $p$ and $q$

That's not true. In the $p$ track, you are raising $M \bmod p$ to the power $d \bmod p-1$; we have $M \bmod p < p$; that is, the value we are exponentiating is less that $p$, as results by Montomery multiplication. In the same way, the $q$ track also satisfies the requirements on that side, and so everything works.

Montgomery Multiplication with CRT

1 Answers1