5

I was reading some guy's description of the SHA-256 algorithm and I noticed that the basic operations it uses appear trivial to reverse: addition, rotate bits, etc. When I say "reverse" I don't mean retrieving the exact input used to generate the hash, I mean finding some input which generates the same hash (finding a collision). I would like to understand why this is actually difficult.

Preimage resistance: For a given h in the output space of the hash function, it is hard to find any message x with H(x)=h. Source

Here is an general explanation for why a function can be easy to calculate in one way, but very difficult to reverse (calculate the other way). What in particular gives SHA-256 this property? If someone tried to generate a hash collision, which obstacles would they face?

SHA-256 diagram from Wikimedia commons

The initial values for A-H are predetermined in the algorithm. The actual input comes from W. In this sense the "initial state" can be thought of as a combination of user input and predetermined values. Does this contribute to the difficulty in finding collisions? If the initial values for A-H could be any arbitrary values, would it be trivial to find hash collisions?

bkoodaa
  • 629
  • 7
  • 12

1 Answers1

8

You appear to be thinking "the output of the round function appears to be a simple function of the previous round, and a word from the message; why can't we select message words to steer the output to whatever we wanted".

Well, there's something missing in the simple description you gave; message expansion. The W inputs are not just directly taken from the input message; the first 16 are, but the other 48 are linear functions of the input message. If you modify one input word, you necessarily modify a number of the W inputs; this effectively stops this simple steering strategy, as tweaking one W input (to get a desired state) will necessarily modify others (which will modify the state you're trying to set up).

This doesn't prove the SHA-256 is secure (SHA-1 does the same sort of message expansion, and we know that's weak); however it does show that this approach doesn't appear to be fruitful.

poncho
  • 147,019
  • 11
  • 229
  • 360
  • 1
    Note: The message expansion function in SHA2 family, is not linear but is a non-linear transformation, which is reason why same attacks which work against SHA1 dont work for SHA2. – rapadura Jan 22 '19 at 12:56