What if a block cipher is used directly as the compression function in Merkle–Damgård?

Question

The Merkle–Damgård construction builds a hash by iterating a compression function $F$, with $S_{j+1}=F(B_j,S_j)$ where $B_j$ is one of $n$ padded message blocks, $S_0$ is the IV, and $S_n$ is the hash.

Customarily, $F$ is built from a block cipher $E$ with block width the hash width and key size the message block size, as $F(B,S)=E(B,S)\boxplus S$, where $B$ is at the key input of the block cipher, and $\boxplus$ is some simple group operation (typically addition with carry suppressed across words, or perhaps just XOR).
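To make the two variants concrete, here is a minimal Python sketch. The 16-bit Feistel cipher (`encrypt`, `decrypt`), the IV value, and the function names are all hypothetical stand-ins chosen for illustration, with XOR playing the role of $\boxplus$; padding is omitted. This is not SHA-256 or any real design.

```python
import hashlib

MASK = 0xFF  # each Feistel half is 8 bits, so the toy state is 16 bits


def _round(half: int, key: bytes, i: int) -> int:
    """Toy round function: one byte derived from round index, half and key."""
    return hashlib.sha256(bytes([i, half]) + key).digest()[0]


def encrypt(key: bytes, state: int, rounds: int = 4) -> int:
    """Toy block cipher E(key, state); the message block is the key."""
    left, right = state >> 8, state & MASK
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 8) | right


def decrypt(key: bytes, state: int, rounds: int = 4) -> int:
    """Inverse of encrypt under the same key."""
    left, right = state >> 8, state & MASK
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 8) | right


def f_feedforward(block: bytes, state: int) -> int:
    """Customary form: F(B, S) = E(B, S) combined with S (XOR here)."""
    return encrypt(block, state) ^ state


def f_bare(block: bytes, state: int) -> int:
    """The variant asked about: F(B, S) = E(B, S), with no feed-forward."""
    return encrypt(block, state)


def md_hash(blocks, compress, iv: int = 0x1234) -> int:
    """Merkle-Damgard iteration S_{j+1} = F(B_j, S_j), without padding."""
    state = iv
    for block in blocks:
        state = compress(block, state)
    return state


blocks = [b"block-0", b"block-1", b"block-2"]
print(hex(md_hash(blocks, f_feedforward)), hex(md_hash(blocks, f_bare)))
```

The structural difference is that `f_bare` can be undone by anyone who knows the block, since `decrypt(block, f_bare(block, s))` returns `s`, whereas the feed-forward in `f_feedforward` removes that shortcut.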

What security property would suffer if we used $F(B,S)=E(B,S)$, e.g. in SHA-256?

I see only two obvious negative consequences:

For an unknown message fragment $A$ whose length is a multiple of the block size, knowing $B$, $C$, and $H(A\|B)$ would make it trivial to compute $H(A\|C)$; sort of a length extension property on steroids, but nothing that breaks collision resistance, first or second preimage resistance, or indistinguishability from a random oracle for one not knowing some arbitrary constant;
the IV would no longer be suitable as the only arbitrary constant unknown to the adversary in that random oracle model; but we could use a constant part of the definition of $E$ towards that, and there are plenty of such constants in the typical $E$.
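The first consequence can be illustrated with a small Python sketch. It assumes a hypothetical 16-bit toy Feistel cipher standing in for $E$, single-block $B$ and $C$, and no padding; none of these names or sizes come from a real hash function.

```python
import hashlib

MASK = 0xFF  # each Feistel half is 8 bits, so the toy state is 16 bits


def _round(half: int, key: bytes, i: int) -> int:
    """Toy round function: one byte derived from round index, half and key."""
    return hashlib.sha256(bytes([i, half]) + key).digest()[0]


def encrypt(key: bytes, state: int, rounds: int = 4) -> int:
    left, right = state >> 8, state & MASK
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 8) | right


def decrypt(key: bytes, state: int, rounds: int = 4) -> int:
    left, right = state >> 8, state & MASK
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 8) | right


def hash_bare(blocks, iv: int = 0x1234) -> int:
    """Merkle-Damgard with F(B, S) = E(B, S); padding omitted for brevity."""
    state = iv
    for block in blocks:
        state = encrypt(block, state)
    return state


# A is unknown to the attacker; only B, C and H(A || B) are known.
A = [b"secret-1", b"secret-2"]
B = b"known-B"
C = b"chosen-C"
h_ab = hash_bare(A + [B])

# Peel off B by decrypting, then absorb C instead.
state_after_a = decrypt(B, h_ab)
forged = encrypt(C, state_after_a)

print(forged == hash_bare(A + [C]))
```

If $B$ or $C$ spans several blocks, the same idea applies one block at a time: decrypt under the blocks of $B$ in reverse order, then encrypt under the blocks of $C$ in forward order.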

On the positive side, the hash of a long random message whose length is a multiple of the block size would have practically as much entropy as the hash width, whereas the standard construction gives slightly less (like, 0.8272... bit less).
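A rough numerical check of that figure, as a sketch only: it assumes the last compression step of the standard construction can be modelled as a random function applied to a uniformly distributed 16-bit state, and measures how much Shannon entropy the output loses. The width, the seed, and the function name are arbitrary choices for illustration.

```python
import math
import random
from collections import Counter


def entropy_loss_bits(width_bits: int = 16, seed: int = 1) -> float:
    """Entropy lost when a random function is applied to a uniform input."""
    n = 1 << width_bits
    rng = random.Random(seed)
    # Model a random function: every input gets an independent uniform output.
    counts = Counter(rng.randrange(n) for _ in range(n))
    # Shannon entropy of the output distribution for a uniform input.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return width_bits - entropy


loss = entropy_loss_bits()
print(f"entropy loss: {loss:.4f} bits")
```

Under this model the loss should come out close to the quoted value. A permutation of the state, which is what $E(B,\cdot)$ is for a fixed block, maps a uniform input to a uniform output and so loses nothing.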

score 5 · Accepted Answer · answered Oct 18 '16 at 15:11

One issue is that it would make finding preimages significantly easier: $O(2^{n/2})$ time rather than $O(2^n)$ time, where $n$ here denotes the hash width in bits.

Here's how you would do such a search:

Select $2^{n/2}$ distinct initial halves of the message, and determine the intermediate state of the hash after processing each of them.
Select $2^{n/2}$ distinct final halves of the message, and starting with the target hash, compute the hash state backwards, determining what the state would have to be at the start of the final half of the message to derive the target.
Scan through the two lists to find a common state.

Once you have a collision, the initial half and the final half concatenated together will hash to the target.

Making the hash compression function noninvertible prevents this, as it blocks the second step.
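The search described above can be sketched in Python on a toy scale. Everything here is an illustrative assumption: a hypothetical 16-bit Feistel cipher stands in for $E$, each "half" is a single block, padding is ignored, and the target value is arbitrary. To make a match all but certain, the forward table holds a small multiple of $2^{n/2}$ entries and the backward side keeps trying candidates until one hits.

```python
import hashlib
from itertools import count

MASK = 0xFF  # each Feistel half is 8 bits, so the toy state is 16 bits
IV = 0x1234  # arbitrary toy IV


def _round(half: int, key: bytes, i: int) -> int:
    """Toy round function: one byte derived from round index, half and key."""
    return hashlib.sha256(bytes([i, half]) + key).digest()[0]


def encrypt(key: bytes, state: int, rounds: int = 4) -> int:
    left, right = state >> 8, state & MASK
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 8) | right


def decrypt(key: bytes, state: int, rounds: int = 4) -> int:
    left, right = state >> 8, state & MASK
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 8) | right


def hash_bare(blocks) -> int:
    """Merkle-Damgard with F(B, S) = E(B, S); padding omitted for brevity."""
    state = IV
    for block in blocks:
        state = encrypt(block, state)
    return state


def find_preimage(target: int, forward_tries: int = 1024):
    # Step 1: run forward from the IV for many initial halves.
    forward = {}
    for i in range(forward_tries):
        first = b"first-%d" % i
        forward[encrypt(first, IV)] = first
    # Steps 2 and 3: run backward from the target and look for a common state.
    for j in count():
        second = b"second-%d" % j
        needed = decrypt(second, target)
        if needed in forward:
            return [forward[needed], second]


target = 0xBEEF
preimage = find_preimage(target)
print(preimage, hex(hash_bare(preimage)))
```

With a 16-bit state, brute force would need on the order of $2^{16}$ hash evaluations, while this search needs roughly a thousand forward steps plus a few dozen backward ones. Replacing `encrypt(block, state)` by `encrypt(block, state) ^ state` in the hash removes the `decrypt` shortcut that the backward step depends on.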
