
Let me elaborate: say we have two distinct inputs, $A$ and $B$. We also have some arbitrary deterministic mutation protocol $M$ (for example, reversing the characters and then applying a Caesar cipher).

Say by chance (I understand this is extremely unlikely) $A$ and $B$ result in the same hash when run through SHA-256. However, when $A$ and $B$ are mutated with $M$ first, and then run through SHA-256, they have different hashes.

Is my thinking correct? Deterministic mutations would let us compute multiple hashes for a single input, which would make the probability of a collision even lower, because another input would have to produce the same hash under every mutation.
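Assuming the hashes are concatenated (which is what the comment below presumes), the proposed scheme might be sketched in Python as follows. The function names, the Caesar shift of 3 and the restriction of the shift to ASCII letters are illustrative assumptions, not details given in the question.

```python
import hashlib

def caesar(text: str, shift: int = 3) -> str:
    # Shift ASCII letters by `shift` places; other characters are left unchanged.
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

def mutate(text: str) -> str:
    # Hypothetical M: reverse the characters, then Caesar-shift them.
    return caesar(text[::-1])

def multi_hash(text: str) -> bytes:
    # Assumed combination: SHA-256(x) || SHA-256(M(x)), i.e. 64 bytes (512 bits).
    first = hashlib.sha256(text.encode("utf-8")).digest()
    second = hashlib.sha256(mutate(text).encode("utf-8")).digest()
    return first + second

print(mutate("abc"))             # fed
print(len(multi_hash("hello")))  # 64
```

Because reversal and a Caesar shift are both invertible, $M$ maps distinct inputs to distinct inputs, so the second hash is computed over genuinely different data.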

Maarten Bodewes
kebab-case
  • Are you proposing to concatenate the output of the multiple hashes or just choosing one of them? I presume concatenation, but this isn't explicitly mentioned in the question. Could you edit your question at the end to make clear what you meant? – Maarten Bodewes Aug 05 '18 at 17:19

2 Answers


An arbitrary mutation protocol is, for this purpose, equivalent to a random permutation (i.e., a random invertible function), so running the input through $M$ is basically equivalent to encrypting it with some random key $K$. You don't even need something as complex as a permutation: adding a fixed prefix, suffix, etc. works just as well.
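A minimal sketch of the prefix variant, using Python's `hashlib`. The prefix value and the function names are hypothetical; the only property that matters is that $M(x) = \text{prefix} \| x$ is injective, so distinct inputs are still hashed as distinct data.

```python
import hashlib

# Hypothetical constant; any fixed byte string would serve as the "mutation".
PREFIX = b"mutation-1:"

def h_plain(data: bytes) -> bytes:
    # Ordinary SHA-256 of the input.
    return hashlib.sha256(data).digest()

def h_prefixed(data: bytes) -> bytes:
    # SHA-256 of M(x) = PREFIX || x; M is injective, so distinct inputs stay distinct.
    return hashlib.sha256(PREFIX + data).digest()

def combined(data: bytes) -> bytes:
    # Concatenating both digests yields 2 * 256 = 512 bits of output.
    return h_plain(data) + h_prefixed(data)

print(combined(b"example").hex())
```

The result is as long as a single SHA-512 digest but costs two SHA-256 invocations, which is why simply using a hash with a larger output is the more natural choice.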

From a hashing perspective, there's no real reason to do this. For two given inputs, your chance of a collision is already $2^{-256}$ (with more inputs the chance grows because of the birthday paradox). Each mutation you apply and output you append basically adds another 256 bits to the output length, giving $2^{-512}$, $2^{-768}$, etc., so you are increasing security. But you could just as well use SHA-512 if 256 bits isn't enough for you (and it generally runs faster on modern 64-bit processors).
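In the idealized model this answer uses, where each 256-bit output behaves like an independent uniformly random value, concatenating $k$ outputs gives an effective output length of $b = 256k$ bits. For two fixed, distinct inputs the collision probability is $2^{-b}$; for $n$ inputs, the expected number of colliding pairs is $\binom{n}{2}/2^{b}$, which gives the usual birthday approximation:

$$
P_{\text{collision}}(n, b) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2^{b+1}}\right) \approx \frac{n^2}{2^{b+1}}, \qquad b = 256k,
$$

where the last step holds while $n^2 \ll 2^{b+1}$. A 50% chance of some collision therefore needs on the order of $2^{b/2}$ inputs: about $2^{128}$ for SHA-256, and about $2^{256}$ for either SHA-512 or two concatenated 256-bit outputs, under the stated independence assumption.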

MotiNK
  • You could use the output of the first hash as constant values (instead of using a prefix); then you would just run the hash over the input again after outputting the first run. But yeah, that's still a 50% drop in performance; just using a hash with a larger output is faster and likely more secure (larger state, etc.). – Maarten Bodewes Aug 05 '18 at 10:09

You are still getting a 256-bit hash, and if we assume the hash is effectively random, the probability of two different messages colliding is still the same. Messages A and B might now have different hashes with M in place, but messages A and C, which would otherwise have different hashes, might have the same hash once M is applied.

I'll give an example using the hash H(x) = x mod 13 (a hash for which generating collisions is easy).

So suppose all messages are two-digit numbers. If message A is 14 and message B is 27, then H(A) = H(B) = 1.

Next, add a mutation M(x) that reverses the digits of the message x.

Now H(M(A)) = H(41) = 2 and H(M(B)) = H(72) = 7, so they do have different hashes.

But consider message C = 51. H(C) = 12, which differs from H(A) and H(B), but H(M(C)) = H(15) = 2, the same as H(M(A)).

So collisions will still occur, just with different messages.
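The toy example can be checked mechanically. A minimal Python sketch, with H and M as defined in this answer; the function names and the exhaustive search over all two-digit messages are additions for illustration:

```python
def H(x: int) -> int:
    # Toy hash from the answer: x mod 13, so collisions are trivial to find.
    return x % 13

def M(x: int) -> int:
    # Mutation: reverse the decimal digits, e.g. 14 -> 41.
    return int(str(x)[::-1])

A, B = 14, 27
print(H(A), H(B))        # 1 1  (A and B collide under H)
print(H(M(A)), H(M(B)))  # 2 7  (they no longer collide under H(M(x)))

# Two-digit messages that did not collide with A under H but do under H(M(x)).
new_collisions = [c for c in range(10, 100) if H(c) != H(A) and H(M(c)) == H(M(A))]
print(new_collisions)    # [20, 39, 45, 51, 76, 82]
```

Every entry in the list is a valid choice for the message C: the mutation removed the A/B collision but introduced others.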

Eugene Styer
  • I assumed, like MotiNK, that the hash output would be concatenated. I've asked Mynic510 below the question to make this explicit (or at least indicate what is meant). – Maarten Bodewes Aug 05 '18 at 17:20
  • I may have misread the question: concatenating both hashes would reduce the chance of a collision, but wouldn't eliminate it. – Eugene Styer Aug 05 '18 at 17:37
  • Meh, voted up nonetheless. The issue is with the missing info in the question, not so much with the answers. And the question can easily be amended as well. – Maarten Bodewes Aug 05 '18 at 18:20
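The point that concatenation reduces but cannot eliminate collisions can be illustrated with the same toy hash (x mod 13) and digit reversal. The concatenated output (H(x), H(M(x))) has only 13 × 13 = 169 possible values, so any set of more than 169 messages must contain collisions by the pigeonhole principle; the same argument applies to two concatenated SHA-256 outputs and any set of more than $2^{512}$ inputs. The choice of three-digit messages below is an illustrative assumption.

```python
from collections import defaultdict

def H(x: int) -> int:
    # Toy hash: x mod 13.
    return x % 13

def M(x: int) -> int:
    # Mutation: reverse the decimal digits, e.g. 120 -> 21.
    return int(str(x)[::-1])

def concat_hash(x: int) -> tuple[int, int]:
    # Concatenated output (H(x), H(M(x))): at most 13 * 13 = 169 distinct values.
    return (H(x), H(M(x)))

# Group every three-digit message (900 of them) by its concatenated hash.
buckets = defaultdict(list)
for x in range(100, 1000):
    buckets[concat_hash(x)].append(x)

# 900 inputs into at most 169 outputs: some outputs must be shared.
collisions = {k: v for k, v in buckets.items() if len(v) > 1}
print(len(buckets), "distinct outputs;", len(collisions), "of them shared by several messages")
```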