
Consider XSalsa20-Poly1305-SIV. This is obtained by:

  • computing a MAC $t_{secret}$ of the plaintext from the key and nonce, as in ChaCha20-Poly1305, except that the plaintext rather than the ciphertext is MAC'd;
  • computing $t = F(K,t_{secret})$, where $F$ is the ChaCha20 core (or any other PRF);
  • using $t$ as both the authentication tag and part of the XSalsa20 nonce.

Questions:

  • Is this secure?
  • Is this provably secure, assuming that the ChaCha20 and Salsa20 cores are strong PRFs?
Demi
  • If you're writing F's inputs in the usual order then that's certainly not necessarily secure. – Feb 22 '16 at 10:43
  • @RickiDemer oops, fixes. – Demi Feb 22 '16 at 13:52
  • Assuming I've understood what you're asking, your question is equivalent to: "Is SIV a secure mode of operation", which it is [http://web.cs.ucdavis.edu/~rogaway/papers/keywrap.html] – Cryptographeur Feb 23 '16 at 10:59

1 Answer


This can be a secure construction, if by MAC you mean a universal hash family, like Poly1305. Call this hash family $H_r$ and the short-input pseudorandom function family $F_k$. Rough justification for why this is secure:

  1. The function $m \mapsto F_k(H_r(m))$ is a long-input, short-output PRF.
  2. A good PRF makes a good MAC.
  3. A good PRF has birthday-bounded collision probability, so the probability of nonce reuse for the stream cipher is small.
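
The data flow behind these three points can be sketched in Python. Everything below is a stand-in chosen so the sketch runs with only the standard library: `toy_poly_hash` is a toy polynomial hash (not Poly1305), `prf` is truncated HMAC-SHA-256 (not the ChaCha20 core), and `keystream` is SHAKE-256 (not XSalsa20). The function names and the three-key layout are hypothetical. It illustrates the composition only and is not meant for deployment.

```python
import hashlib
import hmac

P = 2**130 - 5  # prime modulus borrowed from Poly1305; the hash itself is only a toy


def toy_poly_hash(r: int, message: bytes) -> bytes:
    """Stand-in for H_r: polynomial evaluation at r modulo P (NOT Poly1305)."""
    acc = 0
    for i in range(0, len(message), 16):
        # Append a 0x01 byte so blocks of different lengths encode differently.
        block = message[i:i + 16] + b"\x01"
        acc = (acc + int.from_bytes(block, "little")) * r % P
    return acc.to_bytes(17, "little")


def prf(key: bytes, data: bytes) -> bytes:
    """Stand-in for F_k: HMAC-SHA-256 truncated to 128 bits."""
    return hmac.new(key, data, hashlib.sha256).digest()[:16]


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for the stream cipher: SHAKE-256 over key || nonce."""
    return hashlib.shake_256(key + nonce).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(k_prf: bytes, r: int, k_enc: bytes, plaintext: bytes):
    # The tag F_k(H_r(m)) is the MAC and also the synthetic nonce.
    tag = prf(k_prf, toy_poly_hash(r, plaintext))
    return tag, xor(plaintext, keystream(k_enc, tag, len(plaintext)))


def open_sealed(k_prf: bytes, r: int, k_enc: bytes, tag: bytes, ciphertext: bytes) -> bytes:
    # Decrypt first, then recompute the tag over the recovered plaintext.
    plaintext = xor(ciphertext, keystream(k_enc, tag, len(ciphertext)))
    expected = prf(k_prf, toy_poly_hash(r, plaintext))
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return plaintext
```

Because the tag is computed over the plaintext, `open_sealed` has to decrypt before it can verify, and sealing the same message twice under the same keys yields the same tag and ciphertext.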

In a bit more detail: Let $\varepsilon_F$ be a bound on the PRF-distinguisher advantage of any algorithm making $q$ queries against $F$, and let $\varepsilon_H$ be a bound on the collision probability $\Pr[H_r(x) = H_r(y)]$ of $H$ for any $x \ne y$ and uniform random $r$.

  • The PRF-distinguisher advantage for a $q$-query attack against $m \mapsto F_k(H_r(m))$ is bounded by $\varepsilon_F + \binom{q}{2} \varepsilon_H$ (proof).

  • For fixed-size keys, the collision probability $\varepsilon_H$ grows linearly with the maximum message length. So this figures into the concrete numbers.

  • The probability of a synthetic nonce collision (i.e., a collision in $m \mapsto F_k(H_r(m))$) is higher than the probability of a collision in $H$, because either a collision in $H$ or a collision in $F$ means there's a nonce collision. Whether this matters depends on how large the output of $F$ is, and in particular on the birthday bound for $F$, which may be much smaller than the collision probability of $H$.
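
As a rough sketch of the last point, suppose $F_k$ has an $n$-bit output (the symbol $n$ is introduced here for illustration) and the $q$ messages are distinct. Replacing $F_k$ by a uniform random function costs at most $\varepsilon_F$, after which any fixed pair of messages collides either in $H$ or, failing that, in the random function's output. A union bound over all pairs then gives:

$$\Pr[\text{synthetic nonce collision}] \le \varepsilon_F + \binom{q}{2}\left(\varepsilon_H + 2^{-n}\right).$$

Whichever of $\varepsilon_H$ and $2^{-n}$ is larger dominates the second term.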

There are some practical details to work out.

Here $q$ represents the number of messages that you are willing to send or receive in your application. For example, if $\varepsilon_H \approx 2^{-100}$, the bound above means nothing unless your application's total volume is limited to $q \lll 2^{50}$ messages, since $\binom{q}{2} \varepsilon_H$ is already about $1/2$ at $q = 2^{50}$. (And you don't want to mess with collisions in $H$.)

Is the Poly1305 collision probability bound $\varepsilon_{\operatorname{Poly1305}} = 8\ell/2^{106}$ for $16\ell$-byte messages comfortable enough for trillions of megabyte-long messages? Maybe it is, maybe it isn't—for now I leave it as an exercise for the reader to compute specific bounds for specific data volumes. (See a similar table for AES-GCM.)
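
As a starting point for that exercise, here is a small Python sketch that evaluates the hash-collision term $\binom{q}{2} \varepsilon_H$ using the bound $8\ell/2^{106}$ quoted above. The function names are hypothetical, rounding the message length up to whole 16-byte blocks is an assumption, and the $\varepsilon_F$ term is left out entirely.

```python
from fractions import Fraction
from math import comb, log2


def poly1305_collision_bound(message_bytes: int) -> Fraction:
    """8 * l / 2^106 for a message of l 16-byte blocks (length rounded up)."""
    blocks = -(-message_bytes // 16)  # ceiling division
    return Fraction(8 * blocks, 2**106)


def hash_collision_term(q: int, message_bytes: int) -> Fraction:
    """C(q, 2) * eps_H: the hash-collision part of the PRF bound."""
    return comb(q, 2) * poly1305_collision_bound(message_bytes)


if __name__ == "__main__":
    # (log2 of message count, message size in bytes): illustrative volumes only.
    for q_exp, size in [(20, 2**10), (32, 2**20), (40, 2**20)]:
        term = hash_collision_term(2**q_exp, size)
        print(f"q = 2^{q_exp}, {size}-byte messages: log2(term) = {log2(term):.1f}")
```

With these formulas, $2^{40}$ messages of $2^{20}$ bytes each give a term of about $2^{-8}$, which suggests that a single key would need much tighter limits than "trillions of megabyte-long messages".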

What should you choose for $F_k$? If you're already using XSalsa20, the obvious choice is XSalsa20 truncated to 128 bits or similar. Of course, you'll have to quantify the probability of a collision between the XSalsa20 nonce used for encryption and the XSalsa20 input from $H$. But maybe you can do better than the multiple HSalsa20 invocations this would imply, each of which adds overhead to small packets. Maybe you should use a 192-bit or 256-bit hash $H$, and a 192-bit authentication tag, so that the probability of a synthetic nonce collision is negligible even for extremely large volumes of data.
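
To see what the tag width buys, the following Python sketch compares the birthday term $q^2 / 2^{n+1}$ (an approximation of $\binom{q}{2} 2^{-n}$) for 128-bit and 192-bit tags. It assumes the tag behaves like a uniform random $n$-bit string, ignores collisions in $H$, and uses a hypothetical function name.

```python
def log2_birthday_term(log2_q: float, tag_bits: int) -> float:
    """log2 of q^2 / 2^(tag_bits + 1), with q given as log2(q)."""
    return 2 * log2_q - (tag_bits + 1)


if __name__ == "__main__":
    for tag_bits in (128, 192):
        for log2_q in (40, 64):
            exponent = log2_birthday_term(log2_q, tag_bits)
            print(f"{tag_bits}-bit tag, q = 2^{log2_q}: about 2^{exponent:.0f}")
```

Under these assumptions a 128-bit tag reaches a term of about $2^{-1}$ at $q = 2^{64}$ messages, while a 192-bit tag is still at about $2^{-65}$ there.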

Squeamish Ossifrage