Is XSalsa20-Poly1305-SIV a reasonable choice for nonce-misuse-resistant authenticated encryption?

Question

Consider XSalsa20-Poly1305-SIV. This is obtained by:

1. Compute a MAC $t_{secret}$ of the plaintext from the key and nonce, as in ChaCha20-Poly1305, except that the plaintext, not the ciphertext, is MAC'd.
2. Compute $t = F(K, t_{secret})$, where $F$ is the ChaCha20 core (or any other PRF).
3. Use $t$ as the authentication tag and as part of the XSalsa20 nonce.
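To make the data flow of the three steps above concrete, here is a minimal Python sketch. It is heavily hedged: every primitive is a standard-library stand-in (HMAC-SHA256 for the MAC and for $F$, SHAKE-256 for the XSalsa20 keystream), the 32-byte key, 16-byte tag and 8-byte caller nonce are assumptions, and the names `siv_seal`/`siv_open` are hypothetical. It illustrates the SIV structure only and is not a reviewed implementation of XSalsa20-Poly1305-SIV.

```python
import hashlib
import hmac
import os

# Structural sketch of the SIV data flow only. Every primitive below is a
# hypothetical stdlib stand-in, NOT Poly1305, the ChaCha20 core, or XSalsa20:
#   MAC key from (K, nonce)      -> HMAC-SHA256(K, b"mac" + nonce)
#   t_secret (MAC of plaintext)  -> HMAC-SHA256(mac_key, plaintext)
#   t = F(K, t_secret)           -> HMAC-SHA256(K, b"prf" + t_secret)[:16]
#   stream cipher                -> SHAKE-256(b"stream" + K + t + nonce)
# Assumes a fixed 32-byte key so the concatenations are unambiguous.
TAG_LEN = 16


def _synthetic_tag(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # Steps 1 and 2: MAC the plaintext (not the ciphertext), then apply the PRF.
    mac_key = hmac.new(key, b"mac" + nonce, hashlib.sha256).digest()
    t_secret = hmac.new(mac_key, plaintext, hashlib.sha256).digest()
    return hmac.new(key, b"prf" + t_secret, hashlib.sha256).digest()[:TAG_LEN]


def _xor_keystream(key: bytes, stream_nonce: bytes, data: bytes) -> bytes:
    # Stand-in stream cipher: XOR data with a SHAKE-256 keystream.
    ks = hashlib.shake_256(b"stream" + key + stream_nonce).digest(len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


def siv_seal(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    tag = _synthetic_tag(key, nonce, plaintext)
    # Step 3: t is both the tag and part of the stream-cipher nonce.
    return tag + _xor_keystream(key, tag + nonce, plaintext)


def siv_open(key: bytes, nonce: bytes, sealed: bytes) -> bytes:
    tag, ct = sealed[:TAG_LEN], sealed[TAG_LEN:]
    pt = _xor_keystream(key, tag + nonce, ct)
    # SIV must decrypt before it can verify; release pt only if the tag matches.
    if not hmac.compare_digest(tag, _synthetic_tag(key, nonce, pt)):
        raise ValueError("authentication failed")
    return pt


if __name__ == "__main__":
    key, nonce = os.urandom(32), os.urandom(8)
    a = siv_seal(key, nonce, b"hello")
    # Reusing the nonce with the same message gives the same output, so this
    # kind of misuse reveals only that the two messages were equal.
    assert a == siv_seal(key, nonce, b"hello")
    assert siv_open(key, nonce, a) == b"hello"
```

The point of the structure is visible in `siv_seal`: because the stream-cipher nonce depends on the plaintext through $t$, a repeated caller nonce leads to keystream reuse only if the synthetic tags also collide.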

Questions:

Is this secure?
Is this provably secure, assuming that the ChaCha20 and Salsa20 cores are strong PRFs?

If you're writing F's inputs in the usual order, then that's not necessarily secure. — Feb 22 '16 at 10:43
Assuming I've understood what you're asking, your question is equivalent to: "Is SIV a secure mode of operation", which it is [http://web.cs.ucdavis.edu/~rogaway/papers/keywrap.html] — Cryptographeur, Feb 23 '16 at 10:59

Squeamish Ossifrage · Answer 1 · 2019-11-15T02:05:00.810

This can be a secure construction, if by MAC you mean a universal hash family like Poly1305. Call this hash family $H_r$ and the short-input pseudorandom function family $F_k$. Rough justification for why this is secure:

- The function $m \mapsto F_k(H_r(m))$ is a long-input, short-output PRF.
- A good PRF makes a good MAC.
- A good PRF has birthday-bounded collision probability, so the probability of nonce reuse for the stream cipher is small.

In a bit more detail: Let $\varepsilon_F$ be a bound on the PRF-distinguisher advantage of any algorithm making $q$ queries against $F$, and let $\varepsilon_H$ be a bound on the collision probability $\Pr[H_r(x) = H_r(y)]$ of $H$ for any $x \ne y$ and uniformly random $r$.

- The PRF-distinguisher advantage for a $q$-query attack against $m \mapsto F_k(H_r(m))$ is bounded by $\varepsilon_F + \binom{q}{2} \varepsilon_H$ (proof).
- For fixed-size keys, the collision probability $\varepsilon_H$ grows linearly with the maximum message length, so message length figures into the concrete numbers.
- The probability of a synthetic nonce collision (i.e., a collision in $m \mapsto F_k(H_r(m))$) is higher than the probability of a collision in $H$, because either a collision in $H$ or a collision in $F$ means there's a nonce collision. Whether this matters depends on how large the output of $F$ is: the birthday bound for $F$ may be much smaller than the collision probability of $H$, in which case the $H$ term dominates.
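Putting the last two points together gives a rough bound on synthetic nonce collisions, assuming $F_k$ has $n$-bit outputs and the $q$ messages $m_1, \dots, m_q$ are distinct. Swap $F_k$ for a uniformly random function (costing at most $\varepsilon_F$), then union-bound a collision in $H$ and, given no $H$ collision, a collision among $q$ uniform $n$-bit outputs. This is a back-of-the-envelope hybrid argument, not a careful proof (for instance, it does not treat adaptively chosen messages):

```latex
\Pr\bigl[\exists\, i<j:\ F_k(H_r(m_i)) = F_k(H_r(m_j))\bigr]
  \;\le\; \varepsilon_F + \binom{q}{2}\,\varepsilon_H + \binom{q}{2}\,2^{-n}
```

The last term is the birthday bound for $F$; when $2^{-n} \ll \varepsilon_H$, the $\varepsilon_H$ term is the one to watch.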

There are some practical details to work out.

Here $q$ represents the number of messages that you are willing to send or receive in your application. For example, if $\varepsilon_H \approx 2^{-100}$, the term $\binom{q}{2}\varepsilon_H \approx q^2/2^{101}$ means nothing unless your application's total traffic is limited to $q \lll 2^{50}$ messages. (And you don't want to risk collisions in $H$.)
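The arithmetic behind that example can be checked with a few lines of Python. This sketch simply evaluates the union-bound term $\binom{q}{2}\varepsilon_H$ for the assumed $\varepsilon_H = 2^{-100}$; the function name `collision_term` is illustrative, and the message counts are arbitrary sample points.

```python
from math import comb, log2


def collision_term(q: int, eps_h: float) -> float:
    """Union bound C(q, 2) * eps_h on any H collision among q distinct messages."""
    return comb(q, 2) * eps_h


# Sample message counts; eps_h = 2^-100 is the assumed per-pair bound.
for log_q in (30, 40, 45, 50):
    bound = collision_term(2 ** log_q, 2.0 ** -100)
    print(f"q = 2^{log_q}: bound ~ 2^{log2(bound):.1f}")
```

The printed bounds are roughly $2^{-41}$, $2^{-21}$, $2^{-11}$ and $2^{-1}$: at $q = 2^{50}$ the bound is about one half, which is why $q$ must stay far below $2^{50}$.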

Is the Poly1305 collision probability bound $\varepsilon_{\operatorname{Poly1305}} = 8\ell/2^{106}$ for $16\ell$-byte messages comfortable enough for trillions of megabyte-long messages? Maybe it is, maybe it isn't; for now I leave it as an exercise for the reader to compute specific bounds for specific data volumes. (See a similar table for AES-GCM.)
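As a starting point for that exercise, this Python sketch plugs the stated bound $8\ell/2^{106}$ into the union-bound term $\binom{q}{2}\varepsilon_H$. The workload (about a trillion, i.e. $2^{40}$, messages of 1 MiB each) is a hypothetical example, and the sketch ignores the $\varepsilon_F$ and $F$-collision terms.

```python
from math import comb, log2


def poly1305_eps(msg_bytes: int) -> float:
    """Stated collision bound 8*l / 2^106 for messages of up to 16*l bytes."""
    blocks = -(-msg_bytes // 16)  # l = ceil(msg_bytes / 16)
    return 8 * blocks / 2.0 ** 106


def total_bound(q: int, msg_bytes: int) -> float:
    """Union bound C(q, 2) * eps_H over all pairs among q messages."""
    return comb(q, 2) * poly1305_eps(msg_bytes)


# Hypothetical workload: 2^40 messages of 2^20 bytes (1 MiB) each.
print(log2(total_bound(2 ** 40, 2 ** 20)))
```

Here $\ell = 2^{16}$, so $\varepsilon_H = 2^{-87}$, and with $\binom{2^{40}}{2} \approx 2^{79}$ the bound comes out near $2^{-8}$. Whether that is acceptable is for your application to decide.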

What should you choose for $F_k$? If you're already using XSalsa20, the obvious choice is XSalsa20 truncated to 128 bits or similar. Of course, you'll have to quantify the probability of a collision between the XSalsa20 nonce used for encryption and the XSalsa20 input from $H$. But maybe you can do better than the multiple HSalsa20 invocations this would imply, each of which adds overhead for small packets. Maybe you should use a 192-bit or 256-bit hash $H$ and a 192-bit authentication tag, so that the probability of a synthetic nonce collision is negligible even for extremely large volumes of data.
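To see why a wider tag helps, this Python sketch evaluates only the birthday term $\binom{q}{2}/2^{n}$ for $n$-bit synthetic nonces, assuming the outputs of $F$ behave like independent uniform $n$-bit values and ignoring the $H$ term. The message count $q = 2^{40}$ is a hypothetical example, and `birthday_bound` is an illustrative name.

```python
from math import comb, log2


def birthday_bound(q: int, n_bits: int) -> float:
    """Union bound C(q, 2) / 2^n on a collision among q uniform n-bit values."""
    return comb(q, 2) / 2 ** n_bits  # exact big-int ratio, rounded to float


q = 2 ** 40  # hypothetical number of messages
for n in (128, 192, 256):
    print(n, log2(birthday_bound(q, n)))
```

Moving from a 128-bit to a 192-bit output pushes this term from about $2^{-49}$ to about $2^{-113}$, which is the sense in which a 192-bit tag makes synthetic nonce collisions negligible even for very large data volumes.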
