8

Context: We usually assume that the hash functions we use in practice are both collision resistant and pseudorandom. I wonder what the relation between those properties is.

Question: Is a pseudorandom function always collision resistant?


Clarification: This question suggests that hash functions are pseudorandom functions, which is often not the case (e.g., see Length extension attack). The main question, whether pseudorandomness implies collision resistance, is still valid.

mti
  • Possible counterexample: the MD5 compression function, which is conjectured to be a PRF but which probably isn't collision-resistant. However, I'm not sure it fails the technical property of keyed collision-resistant hash family, and the way its PRFness is used to justify NMAC security is a little wacky, so… I don't know! – Squeamish Ossifrage Dec 21 '17 at 18:32
  • Some related work https://eprint.iacr.org/2006/043.pdf. – mti Mar 24 '24 at 08:58
  • 1
    A PRF with a 1-bit output is obviously not collision resistant... – poncho Mar 26 '24 at 22:07
  • 1
    @poncho Obviously! Why didn't you point that out 5 years ago? jk It shows that there is one more thing (in addition to what was pointed out in other discussions) that I should have been more precise about when asking this question. Nevertheless, I'm happy with the discussions this question sparked, because it also shows that the details often do matter a lot in cryptography. – mti Mar 27 '24 at 20:53

5 Answers

4

First, a PRF is a keyed function, whereas a hash function is usually keyless. So, strictly speaking, a keyless hash function cannot be a PRF.

Second, a secure PRF can be swapped with a uniform random function without detection, and the best way to find a collision of a uniform random function is the birthday attack. So, if the PRF has superpolynomially large codomain, then it would be collision-resistant, since the birthday attack would take more than polynomial-time.
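To make the birthday argument concrete, here is a minimal sketch of a generic birthday search that only uses oracle access (the key stays hidden). The stand-in PRF (truncated HMAC-SHA256), the names `prf` and `birthday_collision`, and the 24-bit output size are my own assumptions for illustration, not something from the answer. With an $m$-bit output, roughly $2^{m/2}$ queries are expected, which is why a superpolynomially large codomain puts this attack out of reach.

```python
import hashlib
import hmac
import os


def prf(key: bytes, x: bytes, out_bytes: int) -> bytes:
    """Stand-in PRF (an assumption): HMAC-SHA256 truncated to out_bytes bytes."""
    return hmac.new(key, x, hashlib.sha256).digest()[:out_bytes]


def birthday_collision(oracle, max_queries: int):
    """Query the oracle on distinct inputs until two outputs coincide.

    Returns (x0, x1, queries_used), or None if no collision was found.
    """
    seen = {}  # maps each output to the input that produced it
    for i in range(max_queries):
        x = i.to_bytes(8, "big")
        y = oracle(x)
        if y in seen:
            return seen[y], x, i + 1
        seen[y] = x
    return None


key = os.urandom(32)  # never shown to the attacker; only oracle access is used
out_bytes = 3  # deliberately tiny 24-bit codomain, so about 2**12 queries suffice
result = birthday_collision(lambda x: prf(key, x, out_bytes), max_queries=1 << 20)
x0, x1, used = result
print(f"collision after {used} queries: {x0.hex()} and {x1.hex()}")
```

Raising `out_bytes` to 16 or 32 makes the same loop hopeless in practice, which is the point of the argument. Note that this only covers an attacker who does not know the key.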

AYun
  • 2
    I am considering the keyed setting here. Could you support your second statement by providing an algorithm that given access to a collision finding oracle distinguishes the PRF from a random function? – mti Dec 21 '17 at 16:43
  • @mti: I believe the reasoning is "truly random functions with large outputs don't have efficient oracles that find collisions, hence if you have one, you don't have a truly random function" – poncho Dec 21 '17 at 16:47
  • 1
    @poncho: What do you mean by "efficient oracle"? Do you mean "efficient algorithm"? Then you are saying that there is no efficient algorithm for finding collisions in a PRF, right? I would like to understand why! Again, I think what one has to show here is that given an algorithm for computing collisions, we can construct an algorithm for distinguishing the PRF from a randomly chosen function. – mti Dec 21 '17 at 17:23
  • 1
    In more technical terms: Suppose you had a collision-finder $A(k)$ with cost $C$ for which $\operatorname{Adv}^{\operatorname{Coll}}_H(A) = \Pr[x_0 \ne x_1, H_k(x_0) = H_k(x_1)]$ is significantly better than the birthday probability for $q$, where $k$ is uniform random and $(x_0, x_1) = A(k)$. How does that let us make a PRF-distinguisher $B(f)$ making some $q$ calls to $f$ for which $\operatorname{Adv}^{\operatorname{PRF}}_H(B) = \Pr[B(H_k) = 1] - \Pr[B(F) = 1]$ is nontrivial, where $F$ is a uniform random function? Our desired $B$ can't just pass the unknown key $k$ to $A$. – Squeamish Ossifrage Dec 21 '17 at 18:12
  • @SqueamishOssifrage: good question; by "collision-resistant", do you mean against adversaries that don't know the key, or ones that do? My comment assumed that the oracle didn't need to know the key; I would agree that the lack of resistance against adversaries that do know the key isn't a necessary property. – poncho Dec 21 '17 at 18:48
  • @poncho: What I described is the standard notion of collision resistance of a keyed hash family, as formalized in, e.g., Rogaway & Shrimpton, ‘Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance’, FSE 2004. What you're asking about is sometimes called (enhanced) target collision resistance. (For technical difference between enhanced and pedestrian cases, see paper.) – Squeamish Ossifrage Dec 21 '17 at 19:27
  • @poncho: Errr, sorry, TCR and eTCR are unrelated (but interesting and worth knowing about, for any passers-by who stumble upon this exchange!). I don't even know what the property you're referring to might be called. Certainly if $H_k$ is indistinguishable from $F$ then the best algorithm for finding a collision given $H_k$ and not $k$ is a generic birthday search. But collision resistance is a much stronger property about the family $H_k$ than what you describe. – Squeamish Ossifrage Dec 21 '17 at 19:53
  • blake3 can be keyed. – TypicalHog Mar 26 '24 at 17:49
3

No, a PRF isn't always collision resistant.

Take CBC-MAC, which is a PRF assuming prefix-free messages. If you know the key, you can create a different message that produces the same tag, which is a collision.
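A minimal sketch of that known-key collision, under loud assumptions: `toy_encrypt` is a made-up 4-round Feistel permutation built from HMAC-SHA256 so the example runs with the standard library only (it stands in for a real block cipher such as AES and is not meant to be secure), and the message contents are invented. Both messages are exactly two blocks long, so the prefix-free condition is respected.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation on 16-byte blocks (illustrative only)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def cbc_mac(key: bytes, msg: bytes) -> bytes:
    """Plain CBC-MAC with a zero IV over a whole number of blocks."""
    assert len(msg) % BLOCK == 0
    state = bytes(BLOCK)
    for i in range(0, len(msg), BLOCK):
        state = toy_encrypt(key, xor(state, msg[i:i + BLOCK]))
    return state


def collide(key: bytes, msg: bytes, new_first: bytes) -> bytes:
    """Given the key and a two-block message, build a different two-block
    message with the same tag by compensating in the second block."""
    m1, m2 = msg[:BLOCK], msg[BLOCK:]
    c1 = toy_encrypt(key, m1)
    c1_new = toy_encrypt(key, new_first)
    # The input to the final encryption is c1 XOR m2; keep it unchanged.
    return new_first + xor(xor(c1, m2), c1_new)


key = os.urandom(16)
msg = b"pay Alice 10 USD" + b"reference 000001"
forged = collide(key, msg, b"pay Mallory 999$")
print(forged != msg, cbc_mac(key, msg) == cbc_mac(key, forged))  # prints: True True
```

The attack only needs the forward direction of the cipher and the key. Without the key, the attacker cannot compute `c1` or `c1_new`, which is why this does not contradict CBC-MAC being a PRF.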

With hash-based MACs (e.g., HMAC), this isn't the case assuming a collision-resistant hash function is used and not something cryptographically broken like MD5 or SHA-1.

There's a difference between being weakly collision resistant (collision resistant when the key is unknown) and strongly collision resistant (even when the key is known).

And it turns out this difference matters in practice, because a lack of collision resistance is what has led to commitment attacks on popular AEAD schemes. Key/context commitment has now become a desirable property in new schemes, which can be achieved using the duplex construction or similar.

samuel-lucas6
1

This sidesteps the question somewhat, but collision resistance isn't usually a property that is considered for pseudorandom functions.

Looking at the definition of a PRF:

$F : \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^m$

$F(\text{key}, x) = y$

The main consideration is how well it emulates a random function $f(x) = y$ when it is given a random key.

Jackoson
1

Not every PRF is collision resistant. If the PRF is a PRP, then for inputs exactly one block long it is necessarily collision resistant, because a permutation is one-to-one: no two distinct inputs can be mapped to the same output. When the PRF is not a PRP, it may still be collision resistant, depending on its output length.

Evgeni Vaknin
-2

A PRF is indistinguishable from a random oracle. A random oracle is collision resistant. Hence, a PRF is also collision resistant.


Update: This answer lacks some clarity. The comments showed that the definition of collision resistance in the context of a PRF is not necessarily clear. A PRF appears only random to an observer that doesn't know the key. However, collision resistance typically requires that collisions are also hard to find when the key is known.
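For the case where the key is unknown, the reduction asked for in the comments can be sketched as follows: if a collision finder needs only oracle access, a distinguisher can run it and check whether the returned pair really collides. Everything in this sketch is hypothetical and chosen for illustration: `weak_keyed_function` is a deliberately flawed candidate, `collision_finder` is a finder tailored to that flaw, and `RandomFunction` lazily samples a uniform random function. If the finder needs the key as input, this distinguisher cannot run it, which is exactly where the argument stops working.

```python
import hashlib
import hmac
import os


def weak_keyed_function(key: bytes, x: bytes) -> bytes:
    """Hypothetical flawed candidate: ignores the lowest bit of the last byte."""
    masked = x[:-1] + bytes([x[-1] & 0xFE])
    return hmac.new(key, masked, hashlib.sha256).digest()


class RandomFunction:
    """Lazily sampled uniform random function with 32-byte outputs."""

    def __init__(self):
        self.table = {}

    def __call__(self, x: bytes) -> bytes:
        if x not in self.table:
            self.table[x] = os.urandom(32)
        return self.table[x]


def collision_finder(oracle):
    """Hypothetical finder A that works without the key.

    This toy version does not even need to query the oracle; a real
    finder could, as long as it never needs the key itself.
    """
    return b"\x00" * 8, b"\x00" * 7 + b"\x01"


def distinguisher(oracle) -> int:
    """B outputs 1 iff A's output is a genuine collision under the oracle."""
    x0, x1 = collision_finder(oracle)
    return int(x0 != x1 and oracle(x0) == oracle(x1))


key = os.urandom(32)
real = distinguisher(lambda x: weak_keyed_function(key, x))
ideal = distinguisher(RandomFunction())
print(real, ideal)
```

Against the flawed function the distinguisher always outputs 1, while against a random function with 256-bit outputs it outputs 1 only with negligible probability, so the gap between the two is the PRF advantage.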

mti
  • Why downvote? Please explain in comments... – mti Mar 25 '24 at 08:43
  • Did you see the Dec 21, 2017 comment made by Squeamish Ossifrage under your question? – DannyNiu Mar 25 '24 at 11:36
  • @DannyNiu Sure I did. So what? – mti Mar 25 '24 at 13:39
  • MD5 is not a PRF. It is susceptible to the length extension attack. https://en.wikipedia.org/wiki/Length_extension_attack , https://crypto.stackexchange.com/questions/3978/understanding-the-length-extension-attack My feeling is that the quality of this page is going down. Instead of high quality answers I tend to get lengthy discussions without much value recently. – mti Mar 25 '24 at 13:45
  • @DannyNiu I've added a bunch of clarifications. Hopefully this removes some confusion around the question. – mti Mar 25 '24 at 14:00
  • SHA-256 is also susceptible to the length extension attack; it applies to all untruncated iterated Merkle-Damgård hash functions. But this doesn't prevent SHA-256 from being used with HMAC. Personally, I use some tactics when asking questions to fend off trivial answers. If you want, we can talk about tactical asking in chat. – DannyNiu Mar 25 '24 at 14:01
  • @DannyNiu, I disagree with the downvote based on the 2017 comment. IMHO that comment is incorrect or, better put, it's irrelevant to the question. A collision attack on MD5 as a hash function does not necessarily imply a collision attack on the compression function as a PRF. The key difference is that the primitive is keyed in the latter case. In fact, Bellare's new proof clarified that HMAC security can be based on the PRFness of the compression function, which explains why we don't have great attacks on HMAC-MD5 even today. – Marc Ilunga Mar 25 '24 at 16:51
  • A PRF isn't always collision resistant though. Take CBC-MAC, which is a PRF assuming prefix-free messages. If you know the key, you can create a different message that produces the same tag, which is a collision. With hash-based MACs, this isn't the case due to the collision resistance of hash functions. There's a difference between being weakly collision resistant (collision resistant when the key is unknown) and collision resistant (even when the key is known). – samuel-lucas6 Mar 25 '24 at 19:37
  • 1
    @samuel-lucas6, the CBC-MAC example is a great example for the lack of collision resistance if we interpret collisions in the technical sense of collision of a keyed hash family, in which case the key is indeed given to the adversary. So I think it comes down to the OP being precise about what collision notion we are evaluating and on what primitive. I agree with your comment otherwise. – Marc Ilunga Mar 25 '24 at 22:10
  • @MarcIlunga Maybe the downvote based on the 2017 comment is baseless, but OP himself had said (and I think most can agree): "My feeling is that the quality of this page is going down. Instead of high quality answers I tend to get lengthy discussions without much value recently". While I would've expected OP to accept the highest-voted answer, it's his liberty not to. Personally, I'd propose moderator lock or close this question, and have OP ask a new one (with some tricks in wording to avoid trivial answers like the ones here). – DannyNiu Mar 26 '24 at 01:49
  • 1
    @MarcIlunga However, my vote in itself is based on my judgement that OP's reasoning is flawed. – DannyNiu Mar 26 '24 at 01:59
  • 1
    @samuel-lucas6 thank you for bringing that up. Indeed, it shows that my question lacked some precision. I did not consider whether the key would be known, but that certainly plays a role here. I’ve unaccepted my answer. I’m considering adding some more clarity and will look into your answer in more detail later. – mti Mar 27 '24 at 20:01
  • @DannyNiu if my reasoning is flawed, then please point out the flaw. I’m happy to learn. After Samuel’s comment I realize that there is a possible distinction between knowing the key and not knowing the key, which I should have considered because it drastically affects the outcome. – mti Mar 27 '24 at 20:08
  • @DannyNiu Regarding your suggestion to close this question. I'm really happy that it wasn't closed immediately. In my view, this would have been the wrong thing to do. Finally, after more than 5 years and some valuable discussion, this question finally has an insightful answer. – mti Mar 27 '24 at 20:26