$2^{64}$ versions of the same message

Question

I am reading a textbook that explains the properties of hash functions. In particular, it gives an example of how unlikely it would be to find a second input whose hash matches the hash of the original input. Here's the example:

We show now how Oscar could turn his ability to find collisions (modifying two messages) into an attack. He starts with two messages, for instance:

$x_1 = \text{Transfer \$10 into Oscar’s account}$
$x_2 = \text{Transfer \$10,000 into Oscar’s account}$

He now alters $x_1$ and $x_2$ at "nonvisible" locations, e.g., he replaces spaces by tabs, adds spaces, etc. The meaning of the messages is the same (e.g. for a bank) but the hash changes.

Oscar tries until the condition $h(x_1)=h(x_2)$ holds. Note that if an attacker has e.g., $64$ locations that he can alter or not, this yields $2^{64}$ versions of the same message with $2^{64}$ different hash values.

Could somebody please explain what they mean by $2^{64}$ versions of the same message? This completely flew over my head. I know that a hash function such as SHA-256 produces a 256-bit output (written as 64 hexadecimal characters), so that, for example:

SHA256(Transfer \$10 into Oscar’s account) = 250e62ddffbdf20a0ea40d69287327e8aff58b6ad49c03dab3f714b596804dc1

I understand that Oscar wants to modify "Transfer \$10,000 into Oscar’s account" so that this message, when plugged into the SHA-256 function, yields the same output as the one above. But what do they mean by saying that the attacker can "alter or not" the 64 locations, and how does this "altering or not" yield $2^{64}$ versions of the same message?

Empty spaces, what are we living for? Apart from the fun, isn't it clear? Add 64 whitespace characters, each either a space or a tab; this gives 64 positions to vary. Then you have roughly the amount of data a birthday attack needs for a 1/2 success probability. — kelalaka, Oct 14 '21 at 15:22
@kelalaka sorry, I'm quite new to the space so it's not clear for me. — Slim Shady, Oct 14 '21 at 15:36
@kelalaka The change in the output of the hash function is clear to me. I know that, in theory, we can arrange $x_2$ so that its hash output is the same as that of $x_1$. This is all I know. — Slim Shady, Oct 14 '21 at 15:48

kelalaka · Accepted Answer · 2021-11-06T16:21:50.960

The quoted text seems to talk about finding a collision of a 128-bit hash function with the birthday attack. In a birthday attack, one creates around $\sqrt{2^{128}} = 2^{64}$ messages and expects to find a colliding pair with probability roughly 1/2.
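Under the usual assumption that hash outputs behave like independent, uniformly random $n$-bit strings, the standard birthday approximation makes the "about 1/2" figure more precise. For $k$ messages:

$$
\begin{aligned}
\Pr[\text{collision among } k \text{ messages}] &\approx 1 - e^{-k(k-1)/2^{n+1}} \approx 1 - e^{-k^2/2^{n+1}} \\
n = 128,\ k = 2^{64}: \quad \Pr &\approx 1 - e^{-2^{128}/2^{129}} = 1 - e^{-1/2} \approx 0.39 \\
\Pr = \tfrac{1}{2} \iff k &\approx \sqrt{2\ln 2}\cdot 2^{n/2} \approx 1.18 \cdot 2^{64}
\end{aligned}
$$

So $2^{64}$ messages give a collision with probability somewhat below 1/2, and a small constant factor more gets it to 1/2; either way the work is on the order of $2^{n/2}$.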

In the described attack, Oscar wants to create two specific messages that have the same hash value.

$x_1$= Transfer \$10 into Oscar’s account
$x_2$= Transfer \$10,000 into Oscar’s account

In order to create $2^{64}$ messages, one can use invisible characters like the space and the tab. If you append 64 characters to $x_1$ or $x_2$, each of which is either a tab or a space, you get 64 locations with two choices each. This gives $2^{64}$ messages that have the same meaning and, with high probability, different hashes.
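A minimal Python sketch of the space/tab encoding described above. It assumes (this is our choice, not the answer's) that bit $j$ of an index $i$ selects the $j$-th appended character: 0 for a space, 1 for a tab. Every $i$ in $[0, 2^{64})$ then names exactly one variant. The function name `whitespace_variant` is hypothetical.

```python
def whitespace_variant(message: str, i: int, positions: int = 64) -> str:
    """Append `positions` characters to `message`; bit j of `i` picks space (0) or tab (1).

    Hypothetical helper: distinct values of i give distinct variants, and every
    variant looks the same as `message` when displayed.
    """
    if not 0 <= i < 2 ** positions:
        raise ValueError("i must be in [0, 2**positions)")
    suffix = "".join("\t" if (i >> j) & 1 else " " for j in range(positions))
    return message + suffix


if __name__ == "__main__":
    x2 = "Transfer $10,000 into Oscar’s account"
    # With only 3 positions there are 2**3 = 8 variants, all distinct
    variants = [whitespace_variant(x2, i, positions=3) for i in range(2 ** 3)]
    print(len(set(variants)))       # 8
    print(repr(variants[5][-3:]))   # '\t \t'  (5 = 0b101: tab, space, tab)
```

With the default `positions=64`, the same function indexes all $2^{64}$ variants; only a tiny fraction of them would ever be hashed in a demo.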

This invisible modification applies to both $x_1$ and $x_2$.

Create $2^{64}$ different strings for each of $x_1$ and $x_2$ and combine them in a set. In this set, we expect a collision. Keep in mind that, in this way, the collision may be between two variants of $x_1$ (or two variants of $x_2$), which does not help Oscar.
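To see the combined search work without $2^{64}$ effort, the sketch below shrinks the problem. It assumes a toy 24-bit hash (SHA-256 truncated to 3 bytes, purely for illustration) and 14 whitespace positions per message. With $2^{14}$ variants of each message there are $2^{28}$ cross pairs, so about $2^{28}/2^{24} = 16$ colliding $x_1$/$x_2$ pairs are expected. Rather than one combined set, it indexes the $x_1$ variants in a dictionary, so only the useful cross collisions are reported. All function names are hypothetical.

```python
import hashlib


def truncated_hash(data: str, nbytes: int = 3) -> bytes:
    """Toy hash: the first `nbytes` bytes of SHA-256 (24 bits by default)."""
    return hashlib.sha256(data.encode("utf-8")).digest()[:nbytes]


def whitespace_variant(message: str, i: int, positions: int) -> str:
    """Bit j of i selects a space (0) or a tab (1) for the j-th appended character."""
    return message + "".join("\t" if (i >> j) & 1 else " " for j in range(positions))


def find_cross_collision(x1: str, x2: str, positions: int = 14, nbytes: int = 3):
    """Return (v1, v2): a variant of x1 and a variant of x2 with equal toy hashes, or None."""
    # Hash every variant of x1 once and index it by digest.
    table = {}
    for i in range(2 ** positions):
        v1 = whitespace_variant(x1, i, positions)
        table.setdefault(truncated_hash(v1, nbytes), v1)
    # Look up each variant of x2; collisions among x1 variants alone are ignored.
    for i in range(2 ** positions):
        v2 = whitespace_variant(x2, i, positions)
        v1 = table.get(truncated_hash(v2, nbytes))
        if v1 is not None:
            return v1, v2
    return None


if __name__ == "__main__":
    x1 = "Transfer $10 into Oscar’s account"
    x2 = "Transfer $10,000 into Oscar’s account"
    pair = find_cross_collision(x1, x2)
    if pair is not None:
        v1, v2 = pair
        print(repr(v1), repr(v2), truncated_hash(v1).hex())
```

The two returned strings display as $x_1$ and $x_2$ but share a (toy) hash value; against a real 128-bit hash the same loop structure would need about $2^{64}$ variants of each message.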

Now, Oscar seeks a way to deceive you. Oscar sends you the message $x_1$ under the hash-and-sign paradigm, and you verify the signature. Later, Oscar claims that they sent you $x_2$. They show you that its signature is the same as the previous one, and here we have a conflict to resolve.

For other examples of hash collisions used in realistic attacks, see this question:

What are other good attack examples that use the hash collision?

Collision attack vs second pre-image attack

In a collision attack, we look for two messages $m_1$ and $m_2$ with $m_1 \neq m_2$ such that $h(m_1) = h(m_2)$. The attacker is free to choose the hash value; they only seek two messages that share the same hash value. This freedom reduces the attack cost: the generic cost of finding a collision is $\mathcal{O}(\sqrt{2^{n}}) = \mathcal{O}(2^{n/2})$ time for an $n$-bit hash function.

In some other scenarios, the attacker needs a second-preimage attack: given a message $m$ and its hash value $x=h(m)$, find another message $m' \neq m$ such that $h(m')=h(m)$. This is the scenario where the attacker forges a digital signature (hash-and-sign): given a signature on $m$, they try to find another message $m'$ for which the same signature is valid.

The generic cost of a second-preimage attack is $\mathcal{O}(2^n)$ time for an $n$-bit hash function.
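The gap between the two generic costs is easy to observe on a toy scale. This sketch assumes a 16-bit hash (the first 2 bytes of SHA-256, an illustrative stand-in rather than a real design). A collision among freshly generated messages typically appears after a few hundred hashes (order $2^{8}$), while a second preimage for one fixed message typically needs tens of thousands (order $2^{16}$). The function names and message formats are hypothetical.

```python
import hashlib
from itertools import count


def h16(data: bytes) -> bytes:
    """Toy 16-bit hash: the first 2 bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:2]


def generic_collision():
    """Hash msg-0, msg-1, ... until two distinct messages share a digest.

    Returns (earlier_message, later_message, number_of_hashes); about 2**(n/2) work.
    """
    seen = {}
    for i in count():
        m = b"msg-%d" % i
        d = h16(m)
        if d in seen:
            return seen[d], m, i + 1
        seen[d] = m


def generic_second_preimage(m: bytes):
    """Hash fresh messages until one matches the fixed target h16(m); about 2**n work."""
    target = h16(m)
    for i in count():
        candidate = b"other-%d" % i
        if candidate != m and h16(candidate) == target:
            return candidate, i + 1


if __name__ == "__main__":
    a, b, tries = generic_collision()
    print("collision after", tries, "hashes")          # typically a few hundred
    m2, tries2 = generic_second_preimage(b"Transfer $10 into Oscar's account")
    print("second preimage after", tries2, "hashes")   # typically tens of thousands
```

The only difference between the two loops is the target: the collision search accepts a match with *any* earlier digest, while the second-preimage search must hit one fixed digest.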

Formal definitions can be found in

Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance by P. Rogaway and T. Shrimpton, 2004.

I don't think they are talking about a birthday attack. One message seems to be fixed; otherwise one could modify both at 64 positions, giving $2^{64}$ different versions of each. Also, the attack only makes sense when $x_1$ has been authorized somehow (for example, its hash has been signed). — jjj, Oct 15 '21 at 07:30
@jjj the text says (modifying two messages), and I made a simple argument: combine all of the variants of $x_1$ and $x_2$ into a set and look for a collision at the birthday bound. If one message is fixed, then it is a second-preimage attack. Didn't I say hash and sign? — kelalaka, Oct 15 '21 at 10:32
Oh, I missed that. I still think it is worth mentioning that in this example one would normally have a given hash and therefore one fixed message. — jjj, Oct 15 '21 at 10:38

score 2 · Answer 2 · answered Oct 17 '21 at 10:03


To: ["The",""] First International Bank ["Panama",""] Subject: ["Money",""] ["Wire","Transfer"] ["Order",""] from my account ["Num","#"]1234[".",""]

I ["hear by","would like to"] Instruct ["you ",""] to ["transfer","wire","move"] ["The sum","an amount"] of ["one million","1,000,000"] ["USD","$"] to ["The following",""] account ["num","#"] 3456.

etc. The template above already has $3 \cdot 2^{14}$ combinations for just the beginning of a relevant message, without playing with whitespace at all.
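The count can be checked directly. This sketch transcribes only the bracketed choices of the template into a hypothetical `CHOICES` list (the fixed text between them does not affect the count) and confirms the $3 \cdot 2^{14} = 49152$ figure both by multiplying the option counts and by enumerating every combination with `itertools.product`.

```python
from itertools import product
from math import prod

# Each inner list is one bracketed choice from the template; "" means "omit this word".
CHOICES = [
    # First line: 7 two-way choices
    ["The", ""], ["Panama", ""], ["Money", ""], ["Wire", "Transfer"],
    ["Order", ""], ["Num", "#"], [".", ""],
    # Second line: 7 two-way choices and 1 three-way choice
    ["hear by", "would like to"], ["you ", ""], ["transfer", "wire", "move"],
    ["The sum", "an amount"], ["one million", "1,000,000"], ["USD", "$"],
    ["The following", ""], ["num", "#"],
]

total = prod(len(options) for options in CHOICES)
print(total)                                  # 49152
print(total == 3 * 2 ** 14)                   # True
print(sum(1 for _ in product(*CHOICES)))      # 49152, by explicit enumeration
```

Every extra two-way choice doubles the count, so about 64 such choices (wording, punctuation, or whitespace) are enough for $2^{64}$ versions.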


Meir Maor


Nice, a language solution. – kelalaka Oct 17 '21 at 21:06
The word normally used there in English is not 'hear by' but 'hereby' (also not 'here by'). But these would (both!) be plausible mistakes you could use to generate equivalent-seeming texts. – dave_thompson_085 Nov 05 '21 at 22:08

score 0 · Answer 3 · answered Oct 16 '21 at 14:02

If I take any message, I can append a space character or a tab character to get two different messages. To each of those two, I can append another space or tab to get four possible messages. A third character gives 8 possible messages, then 16, and so on.

If I take "send 10,000" and append 64 characters, each of them either a space or a tab, I get exactly $2^{64}$ different messages. With a 64-bit hash, there's a good chance that one of these messages has the same hash code as the "send 10" message.
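The doubling argument and the search it enables can be sketched in Python. Since $2^{64}$ hashes are out of reach for a demo, the sketch assumes a toy 16-bit hash (SHA-256 truncated to 2 bytes) and 20 appended characters, i.e. $2^{20}$ candidate messages. Each candidate matches the fixed target with probability $2^{-16}$, so a match is expected after roughly $2^{16}$ tries. For the real case of $2^{64}$ suffixes against a 64-bit hash, the same reasoning gives a success probability of about $1 - (1 - 2^{-64})^{2^{64}} \approx 1 - 1/e \approx 0.63$. The function names are hypothetical.

```python
import hashlib
from itertools import product


def toy_hash16(text: str) -> bytes:
    """Toy 16-bit hash (first 2 bytes of SHA-256); stands in for the 64-bit hash."""
    return hashlib.sha256(text.encode("utf-8")).digest()[:2]


def whitespace_second_preimage(original: str, forged: str, positions: int = 20):
    """Try every space/tab suffix of length `positions` on `forged` until its hash equals toy_hash16(original)."""
    target = toy_hash16(original)
    # product(" \t", repeat=k) yields 2**k suffixes: each extra position doubles the count
    for suffix in product(" \t", repeat=positions):
        candidate = forged + "".join(suffix)
        if toy_hash16(candidate) == target:
            return candidate
    return None


if __name__ == "__main__":
    print(sum(1 for _ in product(" \t", repeat=3)))   # 8 messages from 3 appended characters
    forged = whitespace_second_preimage("send 10", "send 10,000")
    if forged is not None:
        print(repr(forged), toy_hash16(forged) == toy_hash16("send 10"))
```

Here "send 10" is fixed, so this is the second-preimage variant discussed in the comments on the accepted answer, not a birthday search.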

The reason to add space or tab, and not just any character, is that the reader will not see these characters. If I received "Send 10,000 xlfe13^" I would be suspicious.
