
RFC 6962 states:

Note that the hash calculations for leaves and nodes differ. This domain separation is required to give second preimage resistance.

That means that whenever the hash is computed over a leaf, a distinct known value is prepended to the element $e$: $$H(0\mathbin\|e)$$ and whenever the hash is applied to the parent node of the leaves $h_0=H(0\mathbin\|e_0)$ and $h_1=H(0\mathbin\|e_1)$, a $1$ is put at the beginning:

$$h_2=H(1\mathbin\|h_0\mathbin\|h_1)$$
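For reference, here is a minimal Python sketch of the RFC 6962 Merkle Tree Hash (MTH) with SHA-256. It assumes entries are raw byte strings; the single bytes `0x00` and `0x01` play the role of the $0$ and $1$ prefixes above, and the function names are illustrative rather than taken from any library.

```python
import hashlib
from typing import List

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def mth(entries: List[bytes]) -> bytes:
    """Merkle Tree Hash of a list of byte strings, following RFC 6962, section 2.1."""
    n = len(entries)
    if n == 0:
        return sha256(b"")                   # MTH({}) = SHA-256()
    if n == 1:
        return sha256(b"\x00" + entries[0])  # leaf hash: 0x00 prefix
    k = 1
    while k * 2 < n:                         # k = largest power of two strictly less than n
        k *= 2
    # interior node: 0x01 prefix over the two subtree hashes
    return sha256(b"\x01" + mth(entries[:k]) + mth(entries[k:]))
```

For a two-entry list this computes exactly $h_2 = H(1 \| h_0 \| h_1)$ with $h_i = H(0 \| e_i)$.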

It is not clear to me what the security implication is if $0$ and $1$ are not prepended to separate the two domains. The authors state that this prevents second-preimage attacks. But for a Merkle tree hash we already require the hash function $H$ to be collision-resistant.

curious
  • A similar question has been asked and answered here: http://crypto.stackexchange.com/questions/2106/what-is-the-purpose-of-using-different-hash-functions-for-the-leaves-and-interna – Encombe Jan 31 '17 at 15:54

3 Answers


The document you refer to describes a method for hashing lists of data entries. Assume you do not prepend $0$ or $1$. Then, the hash for the list $(e_1, e_2)$ is $H(h_1 \| h_2)$ for $h_1 = H(e_1)$ and $h_2 = H(e_2)$. It is now easy to find a second preimage of that, namely the "list" with the single entry $h_1 \| h_2$, which will be hashed to $H(h_1 \| h_2)$.
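The attack can be checked directly. Below is a minimal Python sketch of a hypothetical list hash that uses the RFC's tree shape but drops the $0$/$1$ prefixes; SHA-256 stands in for $H$, and the entry values are made up.

```python
import hashlib
from typing import List

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def naive_list_hash(entries: List[bytes]) -> bytes:
    """Merkle-style list hash WITHOUT leaf/node prefixes (hypothetical, for illustration)."""
    if not entries:
        raise ValueError("need at least one entry")
    if len(entries) == 1:
        return H(entries[0])
    k = 1
    while k * 2 < len(entries):   # same split rule as RFC 6962
        k *= 2
    return H(naive_list_hash(entries[:k]) + naive_list_hash(entries[k:]))

e1, e2 = b"entry one", b"entry two"
h1, h2 = H(e1), H(e2)

original = [e1, e2]
forged = [h1 + h2]                # a different list: one 64-byte entry

assert original != forged
assert naive_list_hash(original) == naive_list_hash(forged)
```

No collision in SHA-256 is needed: both lists end up hashing the identical byte string $h_1 \| h_2$.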

This attack does not work if you prepend $0$s and $1$s as suggested: the hash of the list $(e_1, e_2)$ is then $H(1 \| h_1 \| h_2)$ for $h_1 = H(0 \| e_1)$ and $h_2 = H(0 \| e_2)$. You now cannot easily find a second preimage of that: the single-entry list $h_1 \| h_2$ gets hashed to $H(0 \| h_1 \| h_2)$, and $1 \| h_1 \| h_2$ gets hashed to $H(0 \| 1 \| h_1 \| h_2)$.
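A short Python sketch of the same comparison with the prefixes in place, again with SHA-256 standing in for $H$ and made-up entries. Both candidate forgeries hash an input whose bytes differ from $1 \| h_1 \| h_2$, so matching the list hash would require a genuine SHA-256 collision.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(e: bytes) -> bytes:
    return H(b"\x00" + e)                 # H(0 || e)

def node_hash(left: bytes, right: bytes) -> bytes:
    return H(b"\x01" + left + right)      # H(1 || left || right)

e1, e2 = b"entry one", b"entry two"
h1, h2 = leaf_hash(e1), leaf_hash(e2)

list_hash = node_hash(h1, h2)             # hash of the list (e1, e2)

# The two single-entry candidates discussed above:
forged_a = leaf_hash(h1 + h2)             # H(0 || h1 || h2)
forged_b = leaf_hash(b"\x01" + h1 + h2)   # H(0 || 1 || h1 || h2)

print(list_hash != forged_a, list_hash != forged_b)
```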

Christian Matt
  • I do not understand why it is now 'easy'. What you describe as an attack in the first paragraph is not an attack... The publicly known $h_1||h_2$ is hashed to a value. You did not show that you can find another $h'$ such that $H(h')=H(h_1||h_2)$. So how can you treat that as an attack? – curious Jan 31 '17 at 00:26
  • I found another list, namely $(h_1 \| h_2)$, that is hashed to the same value as the list $(e_1, e_2)$. I don't have to (and cannot) find a collision for $H$ itself, only for the scheme built from $H$ for hashing lists. – Christian Matt Jan 31 '17 at 00:30
  • But that will change the height of the tree, which will be detectable in the verification phase, so that attack is not valid... – curious Jan 31 '17 at 14:36
  • What do you mean by "detectable in the verification phase"? The reference describes a method for building a hash function MTH for hashing lists. The two lists I've described above are different and are hashed to the same value. Therefore, this is a second-preimage attack on MTH. The height of the tree and other internal details are not included in the output of MTH and are therefore not relevant for this attack. – Christian Matt Jan 31 '17 at 14:46

I believe that the issue is not what we normally call a second preimage attack on the hash function, but is actually a forgery attack on the system.

Suppose that the leaf hash was $H(e)$, and that the Merkle node hash was $h_2 = H(h_0 || h_1)$.

In that case, if we see a valid signature that involves a Merkle node computation $h_2 = H(h_0 || h_1)$, we can immediately generate a signature for the message $h_0 || h_1$ (as $h_2$ is the leaf hash for that message, and we can just copy the rest of the authentication path).

While $h_0 || h_1$ might not be an interesting message to forge, it is nevertheless a good idea to eliminate that possibility anyway.
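A minimal Python sketch of this forgery, under simplifying assumptions: the "signature" is reduced to an authentication path checked against a known root, the tree holds four made-up messages, and the `(side, sibling)` path format is hypothetical rather than the RFC 6962 audit-path encoding. The forger takes a legitimate path, treats $h_0 \| h_1$ as the message, and keeps only the upper part of the path.

```python
import hashlib
from typing import Callable, List, Tuple

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Hypothetical scheme without domain separation.
def naive_leaf(e: bytes) -> bytes:
    return H(e)

def naive_node(left: bytes, right: bytes) -> bytes:
    return H(left + right)

# RFC 6962-style scheme: 0x00 prefix for leaves, 0x01 for interior nodes.
def sep_leaf(e: bytes) -> bytes:
    return H(b"\x00" + e)

def sep_node(left: bytes, right: bytes) -> bytes:
    return H(b"\x01" + left + right)

def verify(leaf_data: bytes, path: List[Tuple[str, bytes]], root: bytes,
           leaf_fn: Callable[[bytes], bytes],
           node_fn: Callable[[bytes, bytes], bytes]) -> bool:
    """Recompute the root bottom-up; side 'L' means the sibling is on the left."""
    h = leaf_fn(leaf_data)
    for side, sibling in path:
        h = node_fn(sibling, h) if side == "L" else node_fn(h, sibling)
    return h == root

def forgery_succeeds(leaf_fn, node_fn) -> bool:
    msgs = [b"m0", b"m1", b"m2", b"m3"]
    h = [leaf_fn(m) for m in msgs]
    n01, n23 = node_fn(h[0], h[1]), node_fn(h[2], h[3])
    root = node_fn(n01, n23)
    # The honest path for m0 verifies in both schemes.
    assert verify(msgs[0], [("R", h[1]), ("R", n23)], root, leaf_fn, node_fn)
    # Forgery: claim the "message" h0 || h1 and reuse only the top of the path.
    return verify(h[0] + h[1], [("R", n23)], root, leaf_fn, node_fn)

print("naive:", forgery_succeeds(naive_leaf, naive_node))
print("separated:", forgery_succeeds(sep_leaf, sep_node))
```

In the naive scheme the forged leaf hash equals the interior node $h_2$, so verification passes; with the prefixes it would need $H(0 \| h_0 \| h_1) = H(1 \| h_0 \| h_1)$.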

poncho
  • It is not clear to me. Why can $h_0||h_1$ be forged but not $0||h_0||h_1$? – curious Jan 30 '17 at 22:57
  • Do you mean the adversary can plug the value $H(h_0||h_1)$ into a new tree as a leaf node and claim a valid hash tree? If that is the case, what prevents him from computing the value $H(0||h_0||h_1)$ and plugging it into the new tree, which is also a valid hash? – curious Jan 30 '17 at 23:00
  • If that is the case, then the new forged tree won't have the same height as the original one, so this will be caught in the verification phase. – curious Jan 30 '17 at 23:11
  • @curious: this attack does not work against the RFC; the attacker could compute the value $H(0||h_0||h_1)$, however what he has the authentication path for is $H(1 ||h_0||h_1)$, and so he can't generate an authentication path for the value he has. Yes, the new forged tree won't be the same height; however, I believe that the RFC uses variable-height Merkle trees, hence the attacker submitting a shorter tree isn't an issue. – poncho Jan 30 '17 at 23:56

This attack is typically easier to understand with a concrete example.

Suppose the data being hashed is a User record with the fields firstName, middleName, lastName and age.

The resulting Merkle tree might look something like:

$$\begin{aligned} root &= H(a || b) \\ a &= H(c || d), \quad b = H(e || f) \\ c &= H(first), \quad d = H(middle), \quad e = H(last), \quad f = H(age) \end{aligned}$$

If your verification algorithm does not bound the shape of the tree or the size of the data it accepts, an attacker could omit $age$ and trick the verifier into accepting $f = H(age)$ as the User's actual age. This attack wouldn't make much sense against firstName, as the binary data of $c = H(first)$ likely isn't human-readable.

However, if age were deserialized in such a way that any remaining bytes were discarded, an attacker could find a value that is both plausible and still verifiable.
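One concrete reading of this attack, as a Python sketch: the tree uses RFC 6962-style splitting with no leaf/inner prefixes (a hypothetical scheme), SHA-256 stands in for $H$, and the record values are made up. Dropping age and replacing lastName with the 64 raw bytes $e \| f$ yields a three-field "User" with the same root.

```python
import hashlib
from typing import List

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def naive_root(fields: List[bytes]) -> bytes:
    """Merkle root with RFC 6962-style splitting but NO leaf/inner prefixes (hypothetical)."""
    if not fields:
        raise ValueError("need at least one field")
    if len(fields) == 1:
        return H(fields[0])
    k = 1
    while k * 2 < len(fields):   # largest power of two below len(fields)
        k *= 2
    return H(naive_root(fields[:k]) + naive_root(fields[k:]))

# firstName, middleName, lastName, age (made-up sample values)
user = [b"Alice", b"B", b"Smith", b"42"]
root = naive_root(user)

# Drop age: the new last field is e || f, with e = H(last) and f = H(age).
e, f = H(user[2]), H(user[3])
forged = [user[0], user[1], e + f]

assert forged != user
assert naive_root(forged) == root    # same root, one field fewer
```

Whether such a forged record is useful depends entirely on how the application parses that 64-byte field, which is the deserialization caveat above.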

One mitigation is to use a different hash function $H'$ for leaf nodes, such that:

$$H(leaf) \ne H'(leaf)$$

This can also be done with a single hash function by defining $H^L(x) = H(0 || x)$ and $H^I(x) = H(1 || x)$, where $H^L$ is the leaf hash and $H^I$ is the inner-node hash.
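Both mitigations can be sketched in Python on the same made-up User record. Option 1 uses the prefix definitions $H^L$ and $H^I$; option 2 uses a separate leaf function $H'$, and choosing BLAKE2b-256 with a `person` label for it is an illustrative assumption, not something this answer or RFC 6962 prescribes. In both cases the "drop age" forgery no longer reproduces the root.

```python
import hashlib
from typing import Callable, List

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Option 1: one hash function, domain-separated by a prefix byte.
def leaf_prefix(x: bytes) -> bytes:
    return H(b"\x00" + x)              # H^L(x) = H(0 || x)

def inner_prefix(x: bytes) -> bytes:
    return H(b"\x01" + x)              # H^I(x) = H(1 || x)

# Option 2: a different function H' for leaves (illustrative choice: personalized BLAKE2b-256).
def leaf_alt(x: bytes) -> bytes:
    return hashlib.blake2b(x, digest_size=32, person=b"merkle-leaf").digest()

def inner_alt(x: bytes) -> bytes:
    return H(x)

def root(fields: List[bytes], leaf_fn: Callable[[bytes], bytes],
         inner_fn: Callable[[bytes], bytes]) -> bytes:
    """Merkle root with RFC 6962-style splitting and the given leaf/inner hashes."""
    if not fields:
        raise ValueError("need at least one field")
    if len(fields) == 1:
        return leaf_fn(fields[0])
    k = 1
    while k * 2 < len(fields):         # largest power of two below len(fields)
        k *= 2
    return inner_fn(root(fields[:k], leaf_fn, inner_fn) + root(fields[k:], leaf_fn, inner_fn))

def forgery_blocked(leaf_fn, inner_fn) -> bool:
    user = [b"Alice", b"B", b"Smith", b"42"]   # made-up sample record
    honest = root(user, leaf_fn, inner_fn)
    # Same trick as before: replace (lastName, age) by the bytes e || f.
    e, f = leaf_fn(user[2]), leaf_fn(user[3])
    forged = [user[0], user[1], e + f]
    return root(forged, leaf_fn, inner_fn) != honest

print(forgery_blocked(leaf_prefix, inner_prefix), forgery_blocked(leaf_alt, inner_alt))
```

The prefix approach is what RFC 6962 uses; it needs only one primitive and makes the separation explicit in the hashed bytes.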