8

It is known that computing $a^x \bmod N$ takes $O(|x| + \mathrm{pop}(x))$ multiplications modulo $N$, where $|x|$ is the number of bits of $x$ and $\mathrm{pop}(x)$ is the number of $1$ bits (Hamming weight). This suggests a side channel attack by measuring the run time of the exponentiation.

Can the number of $1$ bits be derived from the run time, and can it somehow speed up finding $x$ if $a$ and $N$ are known (assuming no countermeasures are used)?

Which countermeasures can be applied?
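As a rough illustration of the operation count quoted above, here is a minimal Python sketch of left-to-right square-and-multiply that tallies its modular multiplications. The function name `square_and_multiply` and its return tuple are made up for this illustration; real big-integer libraries usually use windowing and differ in the details.

```python
def square_and_multiply(a, x, n):
    """Left-to-right binary exponentiation that counts its modular multiplications.

    Returns (a**x % n, squarings, multiplies). Illustrative only, not constant time.
    """
    result = 1 % n
    squarings = 0
    multiplies = 0
    for i in reversed(range(x.bit_length())):
        # One squaring per bit of the exponent.
        result = result * result % n
        squarings += 1
        # One extra multiplication per 1 bit: this is the data-dependent part.
        if (x >> i) & 1:
            result = result * a % n
            multiplies += 1
    return result, squarings, multiplies
```

In this sketch the squarings add up to $|x|$ and the extra multiplications to $\mathrm{pop}(x)$, so the total is $|x| + \mathrm{pop}(x)$.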

Smit Johnth

2 Answers

6

Timing attacks against a function $f_k$ generally require two things:

  1. The attacker can observe the target perform $f_k(x)$ for a large number of sufficiently diversified known inputs $x$.
  2. For each $k$, there are inputs $x$ and $x'$ such that $f_k(x)$ and $f_k(x')$ are expected to execute at different speeds.

Now, let's assume $f_k$ is the private key operation of Diffie-Hellman or RSA, that $k$ is the (fixed) private exponent of the target, and that the inputs are, say, 2048-bit integer values. Normally, the implementation of the operation $f_k(x) = x^k \bmod p$ is such that it executes the exact same number of CPU instructions regardless of the value of $x$, so there will be no timing difference to observe (presuming the CPU performs e.g. a MUL or DIV in a fixed number of clock cycles).

There are, however, things to watch out for. If some $x$ is significantly smaller than the maximum size of 2048 bits and the implementation of modular multiplication is optimized to account for that, some of the least or most significant bits of $k$ might leak, depending on whether the exponentiation iterates over the bits of the exponent from right to left or the other way around. Since the size of the intermediate result doubles with each squaring, it reaches the full 2048 bits after at most 11 squarings even in the most unfavorable case of $x=2$ (because $2^{2^{11}} = 2^{2048}$), so this reveals at most 11 bits of the exponent.
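The bound of 11 can be checked with a few lines of Python. This is only a sketch: the helper name `squarings_until_full_size` is hypothetical, and it ignores modular reduction, since it only counts how many squarings it takes before the intermediate value outgrows the modulus size.

```python
def squarings_until_full_size(x, modulus_bits):
    """Count squarings of x until the value needs more than modulus_bits bits."""
    if x < 2:
        raise ValueError("x must be at least 2, otherwise the value never grows")
    count = 0
    value = x
    while value.bit_length() <= modulus_bits:
        value *= value  # each squaring roughly doubles the bit length
        count += 1
    return count
```

For `x = 2` and `modulus_bits = 2048` the loop stops after 11 squarings; larger inputs reach full size sooner.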

To prevent this leak (again assuming the CPU performs MUL and similar instructions in a fixed number of cycles, independently of the values of the operands), it suffices to pad all inputs to the size of the modulus at the beginning of the exponentiation function.


Now, for the next case, suppose the attacker is able to repeatedly slow down the system at a time that, on average, coincides with the moment the target either performs a modular multiplication because the corresponding bit of the private exponent $k$ is $1$, or skips it because the bit is $0$. Suffice it to say that if the modular exponentiation is performed by, e.g., the CPU in a multi-threaded application, this can be done with some statistical accuracy, provided the attacker can somehow control what else is executed by the CPU, either passively or actively.

One simple way to blind the private key even in such cases is to always generate a fresh random integer $z$ that is invertible modulo $q$, where $q$ is a multiple of the order of $x$ (e.g. $p-1$ if the order is not known). Instead of calculating $f_k(x) = x^k \bmod p$, you would calculate

  1. $k' = kz \mod q$
  2. $y = x^{k'} \mod p$
  3. $w = y^{z^{-1}\mod q} \mod p$
  4. return $f_k(x) = w$

The only step here that depends on the private exponent is the first, whose running time has to be assumed not to vary with the value of $k$. The overhead of this method is on average a factor of 2, presuming modular inverses in $\mathbb{Z}_q$ can be calculated in negligible time.
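The four steps above can be sketched in Python as follows. This is a minimal illustration, not a hardened implementation: the function name `blinded_pow` is made up, Python integers are not constant time, and the sketch assumes `q >= 2` is a multiple of the order of `x` modulo `p`. The call `pow(z, -1, q)` for the modular inverse needs Python 3.8 or later.

```python
import math
import secrets

def blinded_pow(x, k, p, q):
    """Compute x**k % p with the exponent blinded by a fresh random z.

    Assumes q >= 2 is a multiple of the order of x modulo p (e.g. p - 1 for prime p).
    """
    # Draw z from 1..q-1 until it is invertible modulo q.
    while True:
        z = secrets.randbelow(q - 1) + 1
        if math.gcd(z, q) == 1:
            break
    k_blinded = (k * z) % q      # step 1: k' = k*z mod q
    y = pow(x, k_blinded, p)     # step 2: y = x^k' mod p
    z_inv = pow(z, -1, q)        # inverse of z modulo q
    return pow(y, z_inv, p)      # steps 3 and 4: w = y^(z^-1) mod p
```

Both exponentiations use exponents that change on every call, which is where the factor-of-2 overhead comes from.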


Another technique would be to make the number of modular multiplications completely independent of the bits of the private exponent, e.g. by using a fixed-window variant of the sliding window technique. This means that a table of $2^m$ integers is built, and one of these values is picked for every $m$ bits of the binary representation of the private exponent. It should however be noted that this technique might make matters worse, e.g. if the time for picking one of the integers from the table depends on the index of that element.
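A minimal Python sketch of the windowed approach, assuming a fixed window of `m` bits and a hypothetical function name `fixed_window_pow`. It performs `m` squarings and exactly one multiplication per window, whatever the exponent bits are. Note that the plain list lookup `table[idx]` is precisely the kind of index-dependent access warned about above; a real implementation would have to read the table in a way that does not depend on `idx`.

```python
def fixed_window_pow(a, k, n, m=4, bits=None):
    """Fixed-window exponentiation: the multiplication count does not depend on k's bits.

    `bits` is the public exponent length to process (defaults to k.bit_length()).
    """
    if bits is None:
        bits = max(k.bit_length(), 1)
    windows = -(-bits // m)  # ceiling division
    # Precompute a^0, a^1, ..., a^(2^m - 1) modulo n.
    table = [1 % n] * (1 << m)
    for i in range(1, 1 << m):
        table[i] = table[i - 1] * a % n
    result = 1 % n
    mask = (1 << m) - 1
    for w in reversed(range(windows)):
        for _ in range(m):
            result = result * result % n
        idx = (k >> (w * m)) & mask
        # Always multiply, even when idx == 0 (then by table[0] == 1).
        result = result * table[idx] % n  # NOTE: index-dependent lookup, see above
    return result
```

Passing a fixed `bits` value (for example the modulus size) keeps the total work the same for every exponent of at most that length.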


It should also be noted that, while it has been advocated to blind the input value $x$ that is to be raised instead of the private exponent, this will have no effect on a CPU that executes MUL in a time independent of the values of the operands, and no effect at all against passive or active timing attacks that measure timing differences relative to which bit position of the exponent is being processed.

Henrick Hellström
  • How could it help in finding the power if we knew the number of 1 bits in it? – Smit Johnth Mar 06 '13 at 01:52
  • A CPU usually can't operate on integers bigger than 64 bits, so it depends not on the CPU but on the modpow implementation in the big integer library. – Smit Johnth Mar 06 '13 at 01:55
  • @SmitJohnth: Well, presumably you implemented the critical parts, such as multiplication, in assembler. If you don't, it will be hard to avoid input dependent conditional branches when dealing with carries. – Henrick Hellström Mar 06 '13 at 02:02
  • @SmitJohnth: The purpose of the side channel attack I describe is to determine the position of the 1 bits with arbitrary precision. – Henrick Hellström Mar 06 '13 at 02:09
  • Well, which part of it? It's hard to understand it all at once :) – Smit Johnth Mar 06 '13 at 15:13
  • @SmitJohnth: The second part of my answer concerns the case where the attacker doesn't only vary the input to the function, but also varies other external factors. In this case it is possible to not only count the bits of the exponent, but determine their position. – Henrick Hellström Mar 07 '13 at 11:11
4

Efficient constant-time exponentiation algorithms exist. For example, one could calculate a sequence as follows: given $a^{k}, a^{k+1}$, calculate either $a^{2k}, a^{2k+1}$ or $a^{2k+1}, a^{2k+2}$. The two calculations differ only in which value is squared and which is multiplied, making them easy to implement with a single conditional swap as the only distinguishing feature. Starting with $1, a$, the choice at each step depends on the bits of your exponent, taken from the high-order end. These methods are used in elliptic-curve cryptography: look up the Montgomery ladder amongst others.
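A minimal Python sketch of such a ladder for modular exponentiation, using an arithmetic conditional swap before and after a fixed square-and-multiply step. The names `cswap` and `ladder_pow` are made up for this illustration, and Python's big integers are not constant time themselves, so this only shows the structure of the algorithm.

```python
def cswap(bit, x, y):
    """Swap x and y when bit == 1, without branching on bit."""
    mask = -bit              # 0 when bit == 0, all ones (-1) when bit == 1
    t = mask & (x ^ y)
    return x ^ t, y ^ t

def ladder_pow(a, e, n, bits=None):
    """Montgomery-ladder style exponentiation: one square and one multiply per bit."""
    if bits is None:
        bits = e.bit_length()
    r0, r1 = 1 % n, a % n    # invariant: (r0, r1) = (a^j, a^(j+1)), starting at j = 0
    for i in reversed(range(bits)):
        bit = (e >> i) & 1
        r0, r1 = cswap(bit, r0, r1)
        r1 = r0 * r1 % n     # always one multiplication
        r0 = r0 * r0 % n     # always one squaring
        r0, r1 = cswap(bit, r0, r1)
    return r0
```

For a 0 bit the pair becomes $a^{2j}, a^{2j+1}$ and for a 1 bit it becomes $a^{2j+1}, a^{2j+2}$, so the sequence of operations is the same either way.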

Watson Ladd
  • Constant time = worst time, right? 2 * |x| – Smit Johnth Mar 06 '13 at 15:12
  • 1
    Of course! But there are plenty of faster constant time methods. You need to hide what you are doing. – Watson Ladd Mar 07 '13 at 01:21
  • 1
    If there are conditional branches, you can't rely on it being constant time. In the worst case the second alternative is just out of the instruction cache, and has to be loaded prior to execution, at each step. The algorithm here http://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication#Montgomery_ladder is misleading. In order to get to constant time, you have to use a constant-time swap on the intermediates first, perform the square and mul and store the results in temps, and swap the temps prior to assigning back to the intermediates. – Henrick Hellström Nov 27 '13 at 09:09