Can the CRT speed-up Paillier decryption by more than a factor of two?

Question

In the Paillier cryptosystem, decryption computes $m\gets\displaystyle\left\lfloor\frac {\left(c^\lambda\bmod N^2\right)-1}N\right\rfloor\mu\bmod N$, with $\mu<N$ being part of the private key just like $\lambda=\operatorname{lcm}(p-1,q-1)$ for $N=p\,q$.

The Chinese Remainder Theorem allows speeding up this computation when the factorization $N=p\,q$ is known, as follows:

Evaluate $x=c^\lambda\bmod N^2$ by the Chinese Remainder Theorem, that is:
- $x_{p}\gets c^\lambda\bmod p^2$
- $x_q\gets c^\lambda\bmod q^2$
- $x\gets\left(q^{-2}(x_p-x_q)\bmod p^2\right)q^2+x_q$
  note: $q^{-2}\bmod p^2$ can be precomputed.
Then evaluate $m\displaystyle\gets\left\lfloor\frac{x-1}N\right\rfloor\,\mu\bmod N$.
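A minimal plain-Python sketch of the steps above, comparing direct decryption with the CRT variant (Python 3.9+ for `math.lcm` and `pow(x, -1, m)`). It assumes the common choice $g=N+1$, for which $\mu=\lambda^{-1}\bmod N$; the question does not fix $g$, and the function names and toy primes are illustrative only.

```python
from math import lcm

def keygen(p, q):
    # Toy key; assumes g = N + 1, for which mu = lambda^(-1) mod N
    N = p * q
    lam = lcm(p - 1, q - 1)
    mu = pow(lam, -1, N)
    return N, lam, mu

def encrypt(m, r, N):
    # c = g^m * r^N mod N^2 with g = N + 1 (r must be coprime to N)
    N2 = N * N
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def decrypt_plain(c, N, lam, mu):
    # m = floor((c^lambda mod N^2 - 1) / N) * mu mod N
    x = pow(c, lam, N * N)
    return (x - 1) // N * mu % N

def decrypt_crt(c, p, q, lam, mu):
    # Same result, with c^lambda mod N^2 rebuilt from its residues mod p^2 and q^2
    p2, q2 = p * p, q * q
    q2_inv = pow(q2, -1, p2)          # q^(-2) mod p^2, can be precomputed
    xp = pow(c, lam, p2)
    xq = pow(c, lam, q2)
    x = (xp - xq) * q2_inv % p2 * q2 + xq
    N = p * q
    return (x - 1) // N * mu % N
```

Both decryptions return the same plaintext; the CRT variant only changes the moduli of the exponentiations, not the exponent $\lambda$.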

This speeds up decryption by a factor of at most two: each of the first two modular exponentiations manipulates values half as large as those in $c^\lambda\bmod N^2$, with the same exponent $\lambda$, and is thus at best four times faster. In RSA, the CRT gives larger savings (sometimes approaching four), because the exponents $d_p$ and $d_q$ are about half the size of $d$.
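As a rough check of these factors, assume (a simplification, not from the question) that an exponentiation with a $k$-bit exponent costs about $k$ modular multiplications, each costing about $\ell^2$ for an $\ell$-bit modulus, and let $B$ be the bit size of $p$ and $q$:
$$\frac{\operatorname{cost}\left(c^\lambda\bmod N^2\right)}{\operatorname{cost}\left(c^\lambda\bmod p^2\right)+\operatorname{cost}\left(c^\lambda\bmod q^2\right)}\approx\frac{2B\,(4B)^2}{2\cdot2B\,(2B)^2}=\frac{32B^3}{16B^3}=2,$$
whereas for RSA, with $d$ of $2B$ bits and $d_p,d_q$ of $B$ bits,
$$\frac{\operatorname{cost}\left(c^d\bmod N\right)}{\operatorname{cost}\left(c^{d_p}\bmod p\right)+\operatorname{cost}\left(c^{d_q}\bmod q\right)}\approx\frac{2B\,(2B)^2}{2\cdot B\cdot B^2}=\frac{8B^3}{2B^3}=4.$$
The CRT recombination and the final step are ignored here; with sub-quadratic multiplication the ratios are smaller.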

Can we improve the savings obtained and exceed a factor of two?

The idea behind this question is to compute $m_p=m\bmod p$ and $m_q=m\bmod q$, then use the CRT to get $m$. If the computation of $m_p$ could somehow be performed mostly modulo $p$ or $p^2$, perhaps the savings would be improved.

Daniel S · Accepted Answer · 2023-05-31T07:11:37.723

Yes, there's nothing complicated here. Let's write $\mathcal L_p(x)$ for the Fermat quotient for a prime $p$: $$\mathcal L_p(x)=\frac{\left(x^{p-1}\bmod p^2\right)-1}p.$$ Then if we have $N=pq$ and a generator $g$, plaintext $m\bmod N$ and ciphertext $c$ given by $$c=g^mr^N\bmod{N^2},$$ and we define the decryptions $$d_p=\mathcal L_p(c)(\mathcal L_p(g))^{-1}\bmod p$$ $$d_q=\mathcal L_q(c)(\mathcal L_q(g))^{-1}\bmod q$$ and use the Chinese remainder theorem to find $d\bmod N$ such that $d\bmod p=d_p$ and $d\bmod q=d_q$, then $d=m$. This works because $\mathbb Z_{p^2}^*$ has order $p(p-1)$, so $r^{N(p-1)}\equiv1\pmod{p^2}$ and $c^{p-1}\equiv(g^{p-1})^m\pmod{p^2}$; writing $g^{p-1}\equiv1+ap\pmod{p^2}$ gives $c^{p-1}\equiv1+amp\pmod{p^2}$, that is $\mathcal L_p(c)\equiv m\,\mathcal L_p(g)\pmod p$ (this needs $\mathcal L_p(g)\not\equiv0\pmod p$). If we fix $g$ then $(\mathcal L_p(g))^{-1}\bmod p$ and $(\mathcal L_q(g))^{-1}\bmod q$ can of course be pre-computed. Here's some SageMath:

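def FermatQuotient(a,p):
    # L_p(a) = ((a^(p-1) mod p^2) - 1)/p as an element of GF(p); ZZ(a) lifts a
    # (possibly an element of Integers(N^2)) so the power is taken mod p^2 only
    ln = power_mod(ZZ(a),p-1,p^2)
    return GF(p)((ln-1)//p)
p = random_prime(2^1024)
q = random_prime(2^1024)
N = p*q
plain = randint(0,N-1)    # plaintext m in [0, N)
blind = randint(1,N-1)    # randomizer r, coprime to N with overwhelming probability
cipher = pow(2,plain,N^2)*pow(blind,N,N^2)    # c = g^m r^N mod N^2 with g = 2
mp = FermatQuotient(cipher,p)/FermatQuotient(2,p)    # m mod p
mq = FermatQuotient(cipher,q)/FermatQuotient(2,q)    # m mod q
mn = CRT(lift(mp),lift(mq),p,q)    # recombine into m mod N
mn == plain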
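For readers without Sage, here is a plain-Python sketch of the same Fermat-quotient decryption (Python 3.8+ for `pow(x, -1, m)`). The function names and toy primes are illustrative choices; with real keys, `hp` and `hq` would be computed once per key and $g$, as noted above.

```python
def fermat_quotient(a, p):
    # L_p(a) = ((a^(p-1) mod p^2) - 1) / p, reduced mod p; arithmetic only mod p^2
    p2 = p * p
    return ((pow(a % p2, p - 1, p2) - 1) // p) % p

def crt_pair(ap, aq, p, q):
    # Garner recombination: the unique a in [0, pq) with a = ap mod p, a = aq mod q
    return (ap - aq) * pow(q, -1, p) % p * q + aq

def decrypt_fermat(c, g, p, q):
    # m mod p = L_p(c) / L_p(g) mod p, likewise mod q; hp, hq depend only on g, p, q
    hp = pow(fermat_quotient(g, p), -1, p)
    hq = pow(fermat_quotient(g, q), -1, q)
    mp = fermat_quotient(c, p) * hp % p
    mq = fermat_quotient(c, q) * hq % q
    return crt_pair(mp, mq, p, q)
```

Here the exponents are $p-1$ and $q-1$ rather than $\lambda$, and every exponentiation is modulo $p^2$ or $q^2$.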

We note that for a $B$-bit prime $p$ and value $x$, computing $\mathcal L_p(x)$ should take $(1+o(1))B$ modular multiplications of $2B$-bit numbers, whereas the standard decryption, which computes $c^\lambda\bmod N^2$ with a $2B$-bit exponent $\lambda$, takes $(2+o(1))B$ modular multiplications of $4B$-bit numbers.
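Under a rough cost model (assumed here: about $k$ modular multiplications for a $k$-bit exponent, and quadratic cost per multiplication), the Fermat-quotient decryption uses two exponentiations with the $B$-bit exponents $p-1$ and $q-1$ modulo the $2B$-bit moduli $p^2$ and $q^2$, which suggests a saving of roughly
$$\frac{2B\,(4B)^2}{2\cdot B\,(2B)^2}=\frac{32B^3}{8B^3}=4,$$
ignoring the comparatively cheap quotient, inversion and CRT steps.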

Bonus content: $p$-adic logarithms

Interestingly, it is also possible for higher powers of $p$ to keep the number of modular multiplications required close to $\lg p$. This is relevant for Damgård-Jurik encryption, for example. Although it is usually suggested that decryption in such cases be performed by iterated Paillier decryption, one can take another approach that I think also illuminates some of the logarithmic structure at play. I've never sat down and tried to see whether a more efficient D-J system can be built from this observation, but I do like the mathematics.

Consider the power series $$\ell(t)=t+\frac{t^2}2+\frac{t^3}3+\frac{t^4}4+\cdots$$ High school calculus identifies this as the Taylor series of $-\log(1-t)$, and analysis tells us that for real or complex $t$ with $|t|<1$ the series converges, that its truncations are good approximations, and that these approximations behave (approximately) logarithmically with respect to $1-t$. If we instead consider the same series $p$-adically, it converges for $\nu_p(t)>0$. The truncations can be thought of as approximations to the $p$-adic logarithm; in particular, if $\nu_p(t)\ge 1$ and $m<p$, we have $$\ell(t)\equiv t+\frac{t^2}2+\cdots+\frac{t^{m-1}}{m-1}\pmod {p^m},$$ since every dropped term $t^k/k$ with $k\ge m$ has $p$-adic valuation at least $k-\nu_p(k)\ge m$. IN PROGRESS
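A small plain-Python sketch of the truncated series used as a $p$-adic logarithm modulo $p^k$ (here $k$ plays the role of $m$ above, to avoid a clash with the plaintext). The helper names are hypothetical, and how this would be wired into Damgård-Jurik decryption is not worked out here; the sketch only checks that the truncation behaves logarithmically, and that it recovers an exponent $a\bmod p^{k-1}$ from $y=g^a\bmod p^k$. For $k=2$ it reduces to the Fermat-quotient ratio $\mathcal L_p(y)/\mathcal L_p(g)$.

```python
def trunc_log(u, p, k):
    # Truncated p-adic log of u (u = 1 mod p), mod p^k: log(u) = -l(1 - u) with
    # l(t) = t + t^2/2 + ... + t^(k-1)/(k-1); valid for k < p as argued above
    assert k < p and u % p == 1
    pk = p ** k
    t = (1 - u) % pk
    total, power = 0, 1
    for i in range(1, k):
        power = power * t % pk
        total = (total + power * pow(i, -1, pk)) % pk
    return (-total) % pk

def recover_exponent(y, g, p, k):
    # Given y = g^a mod p^k (y, g coprime to p), return a mod p^(k-1).
    # Raising to p-1 lands in 1 + pZ; both logs are divisible by p, so divide by p
    # and assume log(g^(p-1))/p is a unit mod p (true for g = 2 unless p is Wieferich)
    pk = p ** k
    ly = trunc_log(pow(y, p - 1, pk), p, k) // p
    lg = trunc_log(pow(g, p - 1, pk), p, k) // p
    return ly * pow(lg, -1, p ** (k - 1)) % p ** (k - 1)
```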

duckstar · Answer 2 · 2018-10-23T12:21:49.233

Since the CRT is an isomorphism, computing $m_p=m\bmod p$ and $m_q=m\bmod q$ directly is possible. To see this in the formulas above, replace the final $\bmod n$ step with $\bmod p$ and $\bmod q$.

As to the question: I don't know if working in $\mathbb{Z}_{p^2}^{*}$ and $\mathbb{Z}_{q^2}^{*}$ could let you compute $m_p$ and $m_q$ more quickly than simply doing the steps modulo the prime factors of $n$. One thing that could help is to reduce $\lambda$ modulo the order of $\mathbb{Z}_{p^2}^{*}$ (respectively $\mathbb{Z}_{q^2}^{*}$). That order is given by Euler's totient function of $p^2$, which is $\phi(p^2) = p(p-1)$. This helps when $p$ or $q$ is small, but in general doesn't speed things up.
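A plain-Python sketch of the reduction described above (Python 3.9+ for `math.lcm`; the helper name and toy primes are illustrative). It checks that the reduced exponent gives the same power modulo $p^2$ for $c$ coprime to $p$.

```python
from math import lcm

def reduced_exponents(p, q):
    # lambda = lcm(p-1, q-1) and its reduction modulo phi(p^2) = p(p-1);
    # for c coprime to p, Euler's theorem gives c^lambda = c^(lambda mod p(p-1)) mod p^2
    lam = lcm(p - 1, q - 1)
    return lam, lam % (p * (p - 1))
```

The reduced exponent is always below $p(p-1)$, so for balanced primes it still has about twice the bit length of $p$, consistent with the remark that this reduction does not help in general.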

The only other comment I can make is that the improvement you want would need to exploit some property of the group of elements of the form $x=c^\lambda\bmod n^2$ under multiplication. This is the group of elements of order dividing $n$. Its two nontrivial proper subgroups are the elements of order dividing $p$ and those of order dividing $q$.
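A quick plain-Python check of this structure on toy parameters (Python 3.9+; helper names and values are illustrative): $x=c^\lambda\bmod n^2$ satisfies $x^n\equiv1\pmod{n^2}$, and raising $x$ to $q$ (respectively $p$) projects it into the subgroup of order dividing $p$ (respectively $q$).

```python
from math import lcm

def lambda_power(c, p, q):
    # x = c^lambda mod n^2 for n = p*q and lambda = lcm(p-1, q-1)
    n = p * q
    return pow(c, lcm(p - 1, q - 1), n * n)

def has_order_dividing_n(x, p, q):
    # x has order dividing n iff x^n = 1 (mod n^2)
    n = p * q
    return pow(x, n, n * n) == 1

def split_components(x, p, q):
    # For x of order dividing n: x^q has order dividing p, x^p has order dividing q
    n2 = (p * q) ** 2
    return pow(x, q, n2), pow(x, p, n2)
```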

I see how we have $m_p=\displaystyle\left\lfloor\frac {\left(c^\lambda\bmod N^2\right)-1}N\right\rfloor\mu\bmod p$, and likewise for $m_q$, and that we can get $m$ from $m_p$ and $m_q$. But that's not any faster than computing $m$ by the normal method. And I fail to see how the $\bmod p$ can get into the input of the floor function in that definition of $m_p$, which seems necessary to "reduce $\lambda$" as in the answer. — fgrieu, Oct 23 '18 at 13:51
