
I know that some hashes, like MD5 or SHA-1, that were previously thought to be safe are now known to be vulnerable to collision attacks. But it is obvious that collisions exist for all hashes, given that the space of possible hashes is smaller than the space of possible contents. For example, if one considers all possible files whose size is smaller than or equal to the hash size, there must be some collisions.

However, I wonder if I can be sure that hashes will be different for “small enough” differences in contents. For example, for a given hash, can I assume that:

  • All contents whose size equals the hash size will have different hashes (so that if $H(m_1) ≠ H(m_2)$ then $H(H(m_1)) ≠ H(H(m_2))$)?
  • All contents smaller than $m$ bits/bytes will have different hashes?
  • All contents that differ by fewer than $m$ bits/bytes will have different hashes?
  • All contents that differ by fewer than $m$ consecutive bits/bytes will have different hashes?
  • Inserting fewer than $m$ bits/bytes within a content will change its hash?
  • Inserting fewer than $m$ bits/bytes at the end/beginning of a content will change its hash?
  • Anything else?

If there are such assumptions that are true, do they survive the hash being truncated?

I guess the answers to these questions depend heavily on the chosen hash function. I’m very interested in answers about hashes of the SHA-2 and SHA-3 families, but answers about other hash functions (even MD5 and SHA-1) are welcome as well.

Maarten Bodewes
user2233709
  • @SteffenUllrich Point taken, is there a way I can request moving that question there, rather than duplicating it? –  Apr 15 '17 at 09:36
  • I've marked the question this way; if others do so too, or if a moderator does, it will be moved. – Steffen Ullrich Apr 15 '17 at 09:40

4 Answers


All contents whose size equals the hash size will have different hashes (so that if hash(file1) ≠ hash(file2) then hash(hash(file1)) ≠ hash(hash(file2)))?

No, but finding such a pair of values should be computationally infeasible for a secure hash.

All contents smaller than m bits/bytes will have different hashes?

That depends on the value of m. If m = 1 (bit or byte) then it will be true for any secure hash. If m is very large, identical hashes must exist because of the pigeonhole principle: there are more possible contents than possible hash values, so at least two contents must share a hash.
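To make the pigeonhole argument concrete, here is a minimal sketch. It assumes a toy hash, `truncated_hash`, that keeps only the first byte of SHA-256; this helper and `first_collision` are illustrative names, not part of any standard. With only 256 possible outputs, any 257 distinct messages must contain a collision.

```python
import hashlib


def truncated_hash(data: bytes, n_bytes: int = 1) -> bytes:
    """Toy hash (assumption for illustration): first n_bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:n_bytes]


def first_collision(messages, n_bytes: int = 1):
    """Return the first pair of distinct messages with equal toy hashes, or None."""
    seen = {}  # maps toy hash value -> first message that produced it
    for msg in messages:
        tag = truncated_hash(msg, n_bytes)
        if tag in seen:
            return seen[tag], msg
        seen[tag] = msg
    return None


# 257 distinct two-byte messages, but only 256 possible one-byte hash values,
# so the pigeonhole principle guarantees that a pair is found.
messages = [i.to_bytes(2, "big") for i in range(257)]
pair = first_collision(messages)
print(pair)
```

For a full-length secure hash the same argument still holds, but the number of messages needed to force a collision is far beyond what can be enumerated.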

All contents that differ by less than m bits/bytes will have different hashes?

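No. Collisions must exist because of the pigeonhole principle, and nothing in the definition of a secure hash rules out colliding messages that differ in only a few bits. However, finding any pair of messages that collide should be computationally infeasible for a secure hash.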
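As a rough illustration that closeness of the inputs gives no protection, the sketch below searches for two messages that differ in exactly one bit yet share a hash. It again assumes a toy hash that keeps only the first byte of SHA-256, so that the search finishes quickly; `truncated_hash`, `flip_bit` and `one_bit_collision` are hypothetical helpers written for this example.

```python
import hashlib


def truncated_hash(data: bytes, n_bytes: int = 1) -> bytes:
    """Toy hash (assumption for illustration): first n_bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:n_bytes]


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def one_bit_collision(n_bytes: int = 1, msg_len: int = 8, max_tries: int = 10_000):
    """Search for two messages that differ in one bit and share a toy hash.

    Returns (base, other), or None if nothing is found within max_tries bases.
    """
    for counter in range(max_tries):
        base = counter.to_bytes(msg_len, "big")
        target = truncated_hash(base, n_bytes)
        for bit in range(msg_len * 8):
            other = flip_bit(base, bit)
            if truncated_hash(other, n_bytes) == target:
                return base, other
    return None


result = one_bit_collision()
print(result)
```

With a one-byte toy hash each flipped bit matches with probability of about 1/256, so a pair turns up after a handful of base messages. For an untruncated secure hash such pairs may well exist too, but no feasible search is expected to find one.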

All contents that differ by less than m consecutive bits/bytes will have different values?

See above.

Inserting less than m bits/bytes within a content will change its hash?

See above.

Inserting less than m bits/bytes at the end/beginning of a content will change its hash?

See above.

Anything else?

Basically it all comes down to the basic properties of secure hash functions (preimage resistance, second-preimage resistance and collision resistance).

If there are such assumptions that are true, do they survive the hash being truncated?

In general, truncating a secure hash of course limits its security, but it should only cost 1 bit of security for each 2 bits removed (for collision attacks; possibly more for other attacks, but those start from a higher security level in the first place).
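The "1 bit per 2 bits removed" figure follows from the birthday bound: a generic collision search on an n-bit hash needs on the order of 2^(n/2) hash evaluations, so shortening the output by 2 bits lowers that exponent by 1. The following sketch, under the assumption that truncating SHA-256 to its leading `n_bits` bits behaves like an ideal n-bit hash, counts how many hashes a simple birthday search needs for a few small sizes. The function names are made up for this example.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bits: int) -> int:
    """Toy hash (assumption for illustration): leading n_bits of SHA-256."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)


def birthday_search(n_bits: int):
    """Hash successive counters until two of them collide.

    Returns (first_message, second_message, number_of_hashes_computed).
    """
    seen = {}  # maps truncated hash -> message that produced it
    for i in count():
        msg = i.to_bytes(8, "big")
        tag = truncated_hash(msg, n_bits)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg


for n_bits in (8, 16, 24):
    m1, m2, hashes = birthday_search(n_bits)
    print(f"{n_bits:2d} bits: collision after {hashes} hashes "
          f"(2^(n/2) = {2 ** (n_bits // 2)})")
```

The observed counts should land within a small factor of 2^(n/2), which is why a hash truncated to n bits is usually credited with about n/2 bits of collision resistance.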

I guess the answers to these questions depend heavily on the chosen hash function. I’m very interested in answers about hashes of the SHA2 and SHA3 families, but answers about other hash functions (even MD5 and SHA1) are welcome as well.

The answers above are for generic secure hash functions. MD5 / SHA-1 are obviously not considered secure anymore.

Detailing each and every security property of each and every secure hash and testing if it is vulnerable to attacks is way too broad for any answer.

kodlu
Maarten Bodewes
  • 1) Sorry for asking, but what do you call “pigeonhole principle”? 2) Do you mean that there might be 2 messages that only differ by a single bit and have the same hash? 3) About the $m$ values, sorry if this wasn’t clear, but my question was about the largest values for which each statement is true. It is obvious to me that all statements are false for large $m$ and I thought they were all true for ridiculously small $m$. – user2233709 Apr 15 '17 at 22:15
  • While the first property is unusual, it can be obtained; see this. – fgrieu Apr 16 '17 at 07:21
  • Yeah, I didn't venture in the realms of hashes based on number theory, I stand corrected. – Maarten Bodewes Apr 16 '17 at 10:56