does SHA* collide more or less than a random function?

Question

There were several questions here regarding how surjective SHA-1/2 is, that is, how many output values have no pre-image among inputs of a fixed size. Others and I answered by modelling the relevant hash function as a random function. The question is whether we can venture a guess as to whether common hash functions are likely to collide more or less often than an ideal random function. Are there other cryptographic hash functions where we know exactly in which direction such a weakness lies?

Somewhat relevant https://crypto.stackexchange.com/questions/48946/ — Rukako, Jul 10 '17 at 16:24
Yes, that was one of the questions I was referring to in my question. — Meir Maor, Jul 10 '17 at 18:13
Your wording makes this difficult to answer. There are two topics here: one is hashes and pre-images, the other is PRFs and probabilities. I think you may need to specify an exact formulation for what you want the answer to be. We can be quite confident that the difference between the output of a PRF and a sequence of related hashes is quite small (well below 0.0000000001%), so the answers are likely to depend on the exact formulation you wish to use for comparing the two. — Cort Ammon, Jul 10 '17 at 20:12
I guess I'm asking: if we take a common cryptographic hash function which produces $n$ bits and hash all values $0..N-1$ where $N=2^n$, are we likely to get more or fewer than $(1-1/e)N$ distinct values, as an idealized hash would give? — Meir Maor, Jul 11 '17 at 03:51
@Meir Maor: the formulation in the above comment restricts to exactly $n$-bit messages, thus exactly $2^n$ possible messages; that restriction is not in the question itself, and is a huge one (for example, SHA-256 accepts messages from $0$ to $2^{64}-1$ bits, for a whopping $2^{(2^{64})}-1$ possible messages; the restriction made drastically lowers that to $2^{256}$ messages, all hashed in a single block, and even a small subset of the $2^{447}-1$ such messages). — fgrieu, Jul 11 '17 at 05:45
The restriction is there to make a concrete example. For a random function with any input and output size I can calculate the expected number of distinct output values, the maximum load on a specific value, etc. I'm wondering how these might differ in common cryptographic hash functions. — Meir Maor, Jul 11 '17 at 06:54
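
The concrete formulation from the comments can be tried at toy scale. The sketch below is only an illustration under stated assumptions: enumerating all $2^{256}$ inputs of SHA-256 is infeasible, so it uses SHA-256 truncated to $n$ bits as a stand-in for an $n$-bit hash, which is not the same object as the full function. The helper names `truncated_sha256` and `distinct_outputs` are hypothetical. It hashes every value $0..N-1$ and compares the number of distinct outputs with the random-function expectation $N(1-(1-1/N)^N) \approx (1-1/e)N$.

```python
import hashlib
import math


def truncated_sha256(value: int, n_bits: int) -> int:
    """Hash an n_bits-wide integer with SHA-256 and keep the top n_bits.

    Assumption: truncated SHA-256 stands in for an n-bit hash function.
    """
    n_bytes = (n_bits + 7) // 8
    digest = hashlib.sha256(value.to_bytes(n_bytes, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - n_bits)


def distinct_outputs(n_bits: int) -> int:
    """Count distinct truncated hashes over all inputs 0 .. 2^n_bits - 1."""
    return len({truncated_sha256(x, n_bits) for x in range(1 << n_bits)})


if __name__ == "__main__":
    n_bits = 16
    N = 1 << n_bits
    observed = distinct_outputs(n_bits)
    # Expected number of distinct outputs of a uniformly random function
    # from N inputs to N outputs, and its large-N limit (1 - 1/e) * N.
    exact_expectation = N * (1 - (1 - 1 / N) ** N)
    limit = (1 - 1 / math.e) * N
    print(f"observed={observed} expected={exact_expectation:.1f} limit={limit:.1f}")
```

A single run like this gives one sample, so any gap from the expectation has to be judged against the random-function standard deviation (roughly $\sqrt{0.097\,N}$ for this setting) before reading anything into its direction.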

score 3 · Answer 1 · answered Mar 10 '19 at 21:31

If we could prove anything one way or another about this, it would be a remarkable result worthy of publication in a cryptography journal, since (except for length extension issues in Merkle–Damgård hashes) we usually expect these collision-resistant hash functions to behave like good little random oracles.

Of course, there are other families of functions which are not collision-resistant but from which random choices of function have bounded collision probability, namely universal hashes like Poly1305 and GHASH. We can use these as building blocks for fast authenticators or PRFs. But I don't think you were asking about those.
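
To make "bounded collision probability" concrete, here is a minimal sketch of a toy polynomial-evaluation hash in the same spirit as Poly1305 and GHASH. It is neither of those constructions: the tiny prime `P = 101` and the names `poly_hash` and `colliding_keys` are assumptions chosen so that every key can be enumerated. For two distinct messages of equal length $L$, the difference of their hashes is a nonzero polynomial in the key of degree at most $L$, so at most $L$ of the $P$ keys make them collide.

```python
P = 101  # toy prime modulus (assumption); real universal hashes use far larger fields


def poly_hash(key: int, message: list[int]) -> int:
    """Evaluate sum(m_i * key^(i+1)) mod P with Horner's rule."""
    acc = 0
    for m in reversed(message):
        acc = (acc + m) * key % P
    return acc


def colliding_keys(a: list[int], b: list[int]) -> int:
    """Count the keys under which two equal-length messages collide."""
    return sum(poly_hash(k, a) == poly_hash(k, b) for k in range(P))


if __name__ == "__main__":
    a, b = [1, 2, 3], [3, 2, 1]
    bad = colliding_keys(a, b)
    # The bound is len(a) / P for any pair of distinct equal-length messages.
    print(f"{bad} of {P} keys collide; bound is {len(a)} of {P}")
```

The guarantee holds over the random choice of key for a fixed pair of messages, which is a different kind of statement from asking how one fixed function such as SHA-256 distributes its outputs.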
