Proof that MAC and hash composition is insecure

Question

Let $F$ be a secure PRF and $H$ a universal hash function.

How can I exhibit a pair $(F,H)$ whose composition $$S'((k_1, k_2), m) = F(k_2, H(k_1,m))$$

is an insecure MAC (or an insecure PRF, since a MAC can be defined as a PRF)?

I guess that, to find such a pair $(F,H)$, the trick would be to construct some $H$ whose image space is small enough that collisions are easy to find. But I'm not good at coming up with examples of such functions; the books I read always treat these functions abstractly.

There's no statement about the security of $H$. What exactly is the definition used for a universal hash function? As an aside, MACs are defined for a single key, so isn't it rather $S'(k_1\mathbin|k_2, m) = F(k_2, H(k_1,m))$? — fgrieu, May 13 '18 at 10:54
A universal hash function is a family of hash functions from some domain to some codomain, so when you choose $H(k_1, m)$, you are choosing a function $H_{k_1}(m)$, and an attacker should have no more than a negligible chance of finding a pair of inputs that collide. — Daniel, May 13 '18 at 14:30
Yeah, I just removed that comment. OTOH, it seems like you're trying to find a collision for a hash with that definition? — Maarten Bodewes, May 13 '18 at 14:31
Yes, I'm trying to build a pair of secure functions (a PRF and a UHF) that, when composed, become insecure. — Daniel, May 13 '18 at 14:32
@fgrieu you are right, MACs are defined over a single key, my mistake. This MAC's key is a pair $(k_1, k_2)$, I updated the question. — Daniel, May 13 '18 at 18:27
The thing to keep in mind is that once a key has been fixed for the UHF, there is no guarantee that it's hard to find collisions. This should be how to construct the counter example. — Maeher, May 13 '18 at 18:33
@Maeher On the contrary: As long as the adversary doesn't know the UHF key, the UHF property is precisely that it's hard to find collisions because the probability of collision for any pair of messages is negligible. Note we don't get to see the possibly structured output of the UHF, only the output of the PRF. (Collision resistance is a different property with a different attack model: the adversary learns the key. For pseudorandomness, the adversary never learns the key, only the oracle to evaluate the PRF under the key.) — Squeamish Ossifrage, May 14 '18 at 15:03
@SqueamishOssifrage That's not entirely correct. The standard definition of a UHF only requires that, for any input pair, the probability of collision over a uniformly chosen key is bounded. Here, however, the order of quantification is reversed: first a key is fixed and then the inputs are chosen. This makes no difference in the absence of an oracle, but the presence of the oracle invalidates the security guarantee. Nevertheless, you may be correct that the leakage from the oracle (an efficient attacker should only be able to test for collisions) is not enough to efficiently construct collisions. — Maeher, May 14 '18 at 15:39
@Maeher Let $F$ be a random oracle. Does an oracle for $G\colon m \mapsto F(H_k(m))$ help to find a collision in $H_k$? A collision in $G$ implies either a collision in $F$, which happens with the same probability as a birthday coincidence since $F$ is uniform random, or a collision in $H_k$. — Squeamish Ossifrage, May 14 '18 at 20:06
As said in the other answers, the PRF UHF composition is provably secure. Interestingly, however, it is not the case for the MAC UHF composition. — Marc Ilunga, Jan 10 '23 at 21:54

Squeamish Ossifrage · Accepted Answer · 2019-11-06T02:27:07.907

Without breaking $F$, you can't: $S'$ is a PRF with almost the same security as $F$.

Let $k_1$ and $k_2$ be uniform random keys. Let $F$ be a PRF, with advantage $$\operatorname{Adv}^{\operatorname{PRF}}_F(A) = \lvert\Pr[A(F_{k_2}) = 1] - \Pr[A(f) = 1]\rvert$$ for any distinguisher $A$, where $f$ is a uniform random function with the domain and codomain of $F$. Let $H$ be an $\varepsilon$-almost universal hash family, so that $\Pr[H_{k_1}(x) = H_{k_1}(y)] \leq \varepsilon$ for any $x \ne y$. (Without qualification, $\varepsilon = 1/|T|$ where $T$ is the codomain of $H$.)

Define $$S'_{k_1,k_2}(m) = F_{k_2}(H_{k_1}(m)).$$
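
A minimal Python sketch of this composition may help make the notation concrete. Everything specific in it is an assumption of the sketch, not part of the question: HMAC-SHA256 stands in for the PRF $F$, a polynomial-evaluation hash over the prime field of size $2^{127}-1$ stands in for $H$, and the names `uhf`, `prf`, and `s_prime` are made up for illustration. Each 15-byte chunk gets a trailing `0x01` byte so that distinct messages give distinct polynomials.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # Mersenne prime; field size chosen for this sketch only


def uhf(k1: int, m: bytes) -> int:
    """Toy H_{k1}(m): evaluate sum_i m_i * k1^(n-i+1) mod P by Horner's rule."""
    acc = 0
    for i in range(0, len(m), 15):
        # Append 0x01 so every block is nonzero and encodes its own length.
        block = int.from_bytes(m[i:i + 15] + b"\x01", "little")
        acc = (acc + block) * k1 % P
    return acc


def prf(k2: bytes, x: bytes) -> bytes:
    """Stand-in for F_{k2}(x): HMAC-SHA256, assumed here to behave as a PRF."""
    return hmac.new(k2, x, hashlib.sha256).digest()


def s_prime(k1: int, k2: bytes, m: bytes) -> bytes:
    """S'_{k1,k2}(m) = F_{k2}(H_{k1}(m))."""
    return prf(k2, uhf(k1, m).to_bytes(16, "big"))


# Usage: independent uniform random keys, as in the definition.
k1 = secrets.randbelow(P)
k2 = secrets.token_bytes(32)
tag = s_prime(k1, k2, b"attack at dawn")
```

The caller only ever sees `tag`, the output of `prf`; the structured value returned by `uhf` stays hidden, which is the point the proof relies on.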

Fix any PRF-distinguisher $A'$ for $S'$ making $q$ queries, and let $U$ be a uniform random function with the domain and codomain of $S'$. We will bound the advantage of $A'$ at distinguishing $S'$ in terms of the advantage of another algorithm $A$ at distinguishing $F$ and the collision probability $\varepsilon$ of $H$: \begin{align*} \operatorname{Adv}^{\operatorname{PRF}}_{S'}(A') &= \lvert\Pr[A'(S'_{k_1,k_2}) = 1] - \Pr[A'(U) = 1]\rvert \\ &\leq \operatorname{Adv}^{\operatorname{PRF}}_F(A) + \binom{q}{2} \varepsilon, \end{align*} where $A$ is a PRF-distinguisher for $F$. As long as $\operatorname{Adv}^{\operatorname{PRF}}_F(A)$ is small and $q$ is not too large, $\operatorname{Adv}^{\operatorname{PRF}}_{S'}(A')$ is small too.

We will do this by the triangle inequality with the intermediate probability $\Pr[A'(f \circ H_{k_1}) = 1]$ that $A'$ returns 1 on a variant $f \circ H_{k_1}$ of $S'_{k_1,k_2} = F_{k_2} \circ H_{k_1}$, where a uniform random $f$ has been substituted for $F_{k_2}$.

Define the PRF-distinguisher $A$ for $F$ by $A(\mathcal O) = A'(\mathcal O \circ H_{k_1})$. Then \begin{align*} \operatorname{Adv}^{\operatorname{PRF}}_F(A) &= \lvert\Pr[A(F_{k_2}) = 1] - \Pr[A(f) = 1]\rvert \\ &= \lvert\Pr[A'(F_{k_2} \circ H_{k_1}) = 1] - \Pr[A'(f \circ H_{k_1}) = 1]\rvert. \end{align*} If $A'$ is a good distinguisher for $S'$, we will find that $A$ is a good distinguisher for $F$, unless $A'$ just got lucky finding collisions in $H$.
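
The reduction $A(\mathcal O) = A'(\mathcal O \circ H_{k_1})$ is just oracle wrapping, which a short Python sketch can show. The type aliases and the name `make_A` are hypothetical, and `H` is assumed to be the already-keyed function $H_{k_1}$, with $k_1$ sampled by $A$ itself.

```python
from typing import Callable

Oracle = Callable[[bytes], bytes]          # either F_{k2} or a random function f
Distinguisher = Callable[[Oracle], int]    # outputs a guess bit, 0 or 1


def make_A(A_prime: Distinguisher, H: Callable[[bytes], bytes]) -> Distinguisher:
    """Build the distinguisher A for F from a distinguisher A' for S'."""
    def A(O: Oracle) -> int:
        # Each query m made by A' is answered with O(H(m)),
        # so A' sees exactly the oracle O composed with H.
        return A_prime(lambda m: O(H(m)))
    return A
```

Given $F_{k_2}$ as its oracle, `A` runs `A_prime` against $S'_{k_1,k_2}$; given a random $f$, it runs `A_prime` against $f \circ H_{k_1}$. It makes the same number of oracle queries as `A_prime`.
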
Now consider the $q$ queries $x_1, x_2, \ldots, x_q$ submitted by $A'$ for the oracle $f \circ H_{k_1}$.

From queries to $H_{k_1}$ alone, of which we assume only the weak property of collision probabilities on two distinct inputs, an adversary could find a collision among three inputs with high probability—e.g., in a polynomial evaluation MAC $M_{r,s}(m) = s + \sum_{i=1}^{|m|} m_i r^{|m| - i + 1}$ the adversary could trivially recover the keys $r$ and $s$ from two distinct queries and find arbitrarily many collisions with probability 1 after that.
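
The key-recovery remark can be checked with a small Python sketch. It assumes a toy prime field of size $2^{61}-1$ and gives the adversary direct access to the outputs of $M_{r,s}$, which is exactly what the composition with $f$ takes away. The names `poly_mac` and `oracle` are illustrative only.

```python
import secrets

P = 2**61 - 1  # toy prime field; an assumption of this sketch


def poly_mac(r: int, s: int, m: tuple) -> int:
    """M_{r,s}(m) = s + sum_i m_i * r^(|m|-i+1) mod P, via Horner's rule."""
    acc = 0
    for block in m:
        acc = (acc + block) * r % P
    return (s + acc) % P


r, s = secrets.randbelow(P), secrets.randbelow(P)
oracle = lambda m: poly_mac(r, s, m)   # direct access to the hash output

# Two one-block queries reveal both keys: M(0) = s and M(1) = s + r.
s_rec = oracle((0,))
r_rec = (oracle((1,)) - s_rec) % P
assert (r_rec, s_rec) == (r, s)

# With r known, collisions are free: (a, b) and (a + 1, b - r) hash alike,
# because (a + 1) r^2 + (b - r) r = a r^2 + b r.
x = (5, 7)
y = (6, (7 - r_rec) % P)
assert x != y and oracle(x) == oracle(y)
```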

But since $f$ is a uniform random function, the only information $A'$ can learn from oracle access to $f \circ H_{k_1}$ is whether the queries collide in one of $H_{k_1}$ or $f$, or definitely do not collide in either. The adversary can adaptively act on the information that queries might collide only if a collision actually occurs in $H_{k_1}$, which happens with probability at most $\varepsilon$ for any pair of inputs submitted. Thus, to study $\Pr[A'(f \circ H_{k_1}) = 1]$, it suffices to set a bound on the probability that there is a collision at all.

Among the queries $x_1, x_2, \ldots, x_q$ submitted by $A'$ to $f \circ H_{k_1}$, the event $C$ of a collision in $H_{k_1}$ has probability \begin{multline*} \Pr[C] = \Pr[\exists i < j\colon H_{k_1}(x_i) = H_{k_1}(x_j)] \\ \leq \sum_{i<j} \Pr[H_{k_1}(x_i) = H_{k_1}(x_j)] \leq \sum_{i<j} \varepsilon = \binom{q}{2} \varepsilon. \end{multline*} In the event $\lnot C$ that the queries do not collide in $H_{k_1}$, the values $f(H_{k_1}(x_i))$ for distinct queries $x_i$ are independent and uniformly distributed, identical to the distribution of the values $U(x_i)$. Hence necessarily $\Pr[A'(f \circ H_{k_1}) = 1 \mid \lnot C] = \Pr[A'(U) = 1]$, so that \begin{align*} \Pr[A'(f \circ H_{k_1}) = 1] &= \Pr[A'(f \circ H_{k_1}) = 1 \mid C]\,\Pr[C] \\ &\quad + \Pr[A'(f \circ H_{k_1}) = 1 \mid \lnot C]\,\Pr[\lnot C] \\ &\leq \Pr[C] + \Pr[A'(f \circ H_{k_1}) = 1 \mid \lnot C] \\ &\leq \binom{q}{2} \varepsilon + \Pr[A'(U) = 1], \end{align*} and thus $\Pr[A'(f \circ H_{k_1}) = 1] - \Pr[A'(U) = 1] \leq \binom{q}{2} \varepsilon$. The same argument applied to the event that $A'$ returns 0 bounds the difference in the other direction, so $\lvert\Pr[A'(f \circ H_{k_1}) = 1] - \Pr[A'(U) = 1]\rvert \leq \binom{q}{2} \varepsilon$.
Summing up, \begin{align*} \operatorname{Adv}^{\operatorname{PRF}}_{S'}(A') &= \lvert\Pr[A'(S'_{k_1,k_2}) = 1] - \Pr[A'(U) = 1]\rvert \\ &\leq \lvert\Pr[A'(F_{k_2} \circ H_{k_1}) = 1] - \Pr[A'(f \circ H_{k_1}) = 1]\rvert \\ &\quad + \lvert\Pr[A'(f \circ H_{k_1}) = 1] - \Pr[A'(U) = 1]\rvert \\ &\leq \operatorname{Adv}^{\operatorname{PRF}}_F(A) + \binom{q}{2} \varepsilon, \end{align*} QED.
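
To get a feel for the collision term $\binom{q}{2}\varepsilon$ in this bound, it can be evaluated exactly for a few query counts. The value $\varepsilon = 2^{-128}$ and the chosen values of $q$ are illustrative assumptions, not parameters taken from the question, and `collision_bound` is a made-up helper name.

```python
from fractions import Fraction
from math import comb


def collision_bound(q: int, eps: Fraction) -> Fraction:
    """Union bound binom(q, 2) * eps on a collision in H among q queries."""
    return comb(q, 2) * eps


eps = Fraction(1, 2**128)  # assumed collision probability of H
for log_q in (20, 32, 48, 64):
    bound = collision_bound(2**log_q, eps)
    print(f"q = 2^{log_q}: bound ~ {float(bound):.3e}")
```

The term grows quadratically in $q$, so the bound becomes vacuous once $q$ approaches $\varepsilon^{-1/2}$.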

This follows the structure of the proof of Lemma 3.3 in:

Shay Gueron and Yehuda Lindell, ‘GCM-SIV: Full Nonce Misuse-Resistant Authenticated Encryption at Under One Cycle per Byte’, in Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2015, pp. 109–119

Variants of the theorem appear in many earlier papers, including the MDx-MAC paper that preceded the creation of HMAC, and the HMAC/NMAC security papers.

I recognize that an answer flies high over my head when I can't immediately tell if its conclusion is that the proposition in the question is right or wrong. I'm precisely at that point. — fgrieu, May 14 '18 at 16:43
At least, I think that now I get that we both lean towards impossibility to exhibit the question's counterexample, contrary to that comment. That's a progress. But I'm still lost among the definitions of smurf-universal-hash-function, and reconciling with what the OP stated. — fgrieu, May 15 '18 at 20:12
@fgrieu As far as I can tell it's the standard definition of $\varepsilon$-almost universal for $\varepsilon$ negligible in the security parameter: a random function $H$ such that for any $x \ne y$, $\Pr[H(x) = H(y)] \leq \varepsilon$. — Squeamish Ossifrage, May 15 '18 at 20:31
Ahh; so I would be off when I reason with this definition of (plain vanilla) universal hash function, also regurgitated at the end on my current tentative answer. Sad, I was so proud that it nicely goes with Katz and Lindell's strongly universal (hash) function $$\forall(m,m')\in\mathcal M^2,\forall(t,t')\in\mathcal T^2,\ m\ne m'\implies\mathsf{Pr}_{k\in\mathcal K}\Big[H(k,m)=t\wedge H(k,m')=t'\Big]=\frac1{|\mathcal T|^2}$$ — fgrieu, May 15 '18 at 21:09
@fgrieu They are related: $\varepsilon$-almost strongly universal means $\Pr[H(x) = h, H(y) = h'] \leq \varepsilon^2$ for all $x \ne y$, $h$, and $h'$; that implies $\varepsilon$-almost universal, meaning $\Pr[H(x) = H(y)] \leq \varepsilon$ for all $x \ne y$; unqualified, these mean $\varepsilon = 1/|\mathcal T|$, which when $|\mathcal T| = 2^\lambda$ is negligible in the security parameter $\lambda$. (Here $H$ is a random variable taking values in a function space; write $H_k$ for some key $k$ if you want to make the key explicit.) — Squeamish Ossifrage, May 16 '18 at 00:13

fgrieu · Answer 2 · 2018-05-15T12:16:51.150

My reading is that one can't exhibit the counterexample asked for under the definition of universal hash function given in the comments, when that definition is read as stating that $H_{k_1}: m\mapsto H(k_1,m)$ is collision-resistant for fixed $k_1$, including when $k_1$ is random and public.

That follows from the following proposition, and the remark that turning $k_1$ from public to secret can't harm security.

Proposition: Applying a secure PRF $F_{k_2}: h\mapsto F(k_2,h)$ with random secret constant $k_2$ to the output of a public collision-resistant function $H$ yields a secure MAC (at worst only slightly less secure than the weaker of $F$ and $H$).

That proposition holds because distinguishing $F_{k_2}(H(m_i))$ from random, for random secret $k_2$ and chosen distinct messages $m_i$, requires breaking the indistinguishability of $F_{k_2}$ or the collision-resistance of the public function $H$. Proof sketch: take hypothetical distinct messages $m_i$ that allow distinguishing $F_{k_2}(H(m_i))$ from random. If there is a collision among the $h_i=H(m_i)$, that exhibits a pair of $m_i$ breaking the collision-resistance of $H$. Otherwise, we can distinguish the $F_{k_2}(h_i)$ from random for chosen distinct $h_i$, which we can compute from the hypothesized $m_i$ (since $H$ is public), thus breaking the indistinguishability of $F$.

As is apparent from the many revisions and the convoluted argument surrounding $k_1$, I'm struggling quite a bit with this one, especially when I use the more formal definition of a (not-necessarily-strongly) universal hash function: $H:\mathcal K\times\mathcal M\to\mathcal T$ is a family of universal hash functions when $$\forall(m,m')\in\mathcal M^2,\quad m\ne m'\implies\mathsf{Pr}_{k\in\mathcal K}\Big[H(k,m)=H(k,m')\Big]\le\frac1{|\mathcal T|}$$
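
For a concrete family meeting this definition, one can check by brute force that $H((a,b),m) = (am+b) \bmod p$ with key $(a,b)\in\mathbb Z_p^2$ has collision probability exactly $1/|\mathcal T|$ for every pair $m\ne m'$, and that it also satisfies the strongly universal condition. The family and the choice $p=5$ are assumptions picked for illustration, not taken from the question.

```python
from fractions import Fraction
from itertools import product

p = 5                                      # toy prime; M = T = Z_p
keys = list(product(range(p), repeat=2))   # K = Z_p x Z_p, keys (a, b)


def H(k, m):
    a, b = k
    return (a * m + b) % p


def collision_prob(m, m2):
    """Pr over uniform k that H(k, m) = H(k, m2)."""
    hits = sum(H(k, m) == H(k, m2) for k in keys)
    return Fraction(hits, len(keys))


def pair_prob(m, t, m2, t2):
    """Pr over uniform k that H(k, m) = t and H(k, m2) = t2."""
    hits = sum(H(k, m) == t and H(k, m2) == t2 for k in keys)
    return Fraction(hits, len(keys))


distinct = [(m, m2) for m, m2 in product(range(p), repeat=2) if m != m2]

# Universal: collisions happen only for a = 0, i.e. for p of the p^2 keys.
assert all(collision_prob(m, m2) == Fraction(1, p) for m, m2 in distinct)

# Strongly universal: each target pair (t, t2) is hit by exactly one key.
assert all(pair_prob(m, t, m2, t2) == Fraction(1, p * p)
           for m, m2 in distinct for t, t2 in product(range(p), repeat=2))
```

Note that the probability is taken over the key for a fixed pair of inputs; once a key is fixed and known, collisions in this family are trivial to find or rule out.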
