Can ChaCha be turned into a collision resistant hash function by xoring keystreams?

Question

In his paper about RFSB Bernstein states that the compression function

$(m_1,\ m_2,\ m_3,\ ...\ ,\ m_n) \rightarrow c_1[m_1]\ \oplus\ c_2[m_2]\ \oplus c_3[m_3]\ \oplus\ ... \oplus\ c_n[m_n]$

is surprisingly collision resistant. Reading this paper, I asked myself whether one can use ChaCha with an increasing counter instead of arrays to achieve nonlinearity. To do so, one would use the alternative initial matrix

k k k k
m m m m 
m m m m 
c c c c

where $k$ are the constants provided by Bernstein, $m$ is the message (which takes the place of the key) and $c$ is the counter, which is expanded to 128 bits since there is no need for a nonce in this scheme. The usual ChaCha algorithm is applied to this matrix.
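To make the layout concrete, here is a minimal Python sketch of what I have in mind. The quarter-round and the constants are the standard ChaCha ones; the names `chacha_block` and `xor_hash`, the little-endian encoding of the 128-bit counter into the last row, and the zero starting value of the accumulator are my own assumptions for illustration. Padding is not handled.

```python
import struct

MASK32 = 0xFFFFFFFF
# Standard ChaCha constants: "expand 32-byte k" as little-endian 32-bit words.
CONSTANTS = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(s, a, b, c, d):
    """Standard ChaCha quarter-round on state indices a, b, c, d."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha_block(message_block, counter, rounds=20):
    """ChaCha on the matrix k / m / m / c; returns 64 bytes.

    Assumption: the 128-bit counter fills the whole last row, little-endian.
    """
    assert len(message_block) == 32
    assert 0 <= counter < 2 ** 128
    m_words = struct.unpack("<8I", message_block)
    c_words = struct.unpack("<4I", counter.to_bytes(16, "little"))
    init = list(CONSTANTS) + list(m_words) + list(c_words)
    s = init[:]
    for _ in range(rounds // 2):
        # Column round.
        quarter_round(s, 0, 4, 8, 12)
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        # Diagonal round.
        quarter_round(s, 0, 5, 10, 15)
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    # Feed-forward: add the initial state to the permuted state.
    out = [(x + y) & MASK32 for x, y in zip(s, init)]
    return struct.pack("<16I", *out)


def xor_hash(blocks):
    """XOR together ChaCha(m_i, i) over all 256-bit message blocks."""
    acc = bytes(64)
    for counter, block in enumerate(blocks):
        keystream = chacha_block(block, counter)
        acc = bytes(x ^ y for x, y in zip(acc, keystream))
    return acc
```

Calling `xor_hash([b"\x00" * 32, b"\x01" * 32])` then gives the 512-bit value for a two-block message.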

Obviously, the maximal message length should be $2^{128}$ message blocks of 256 bits each (minus padding) to prevent counter overflow.

Also, an attacker who intercepts a hash $h$ and knows one message block $m$ and its position $c$ can easily replace it with a block $m'$ by calculating $h' = h\ \oplus\ ChaCha(m, c)\ \oplus\ ChaCha(m', c)$.
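The replacement only relies on the XOR structure, so it can be sketched with any per-block function. In the snippet below, `block_fn` is a stand-in built from SHA-512 rather than ChaCha (an assumption made purely to keep the example short), and `replace_block` is a hypothetical helper name.

```python
import hashlib


def block_fn(message_block, counter):
    """Stand-in for ChaCha(m, c): any function of (block, position) works here."""
    return hashlib.sha512(counter.to_bytes(16, "little") + message_block).digest()


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def xor_hash(blocks):
    """XOR of block_fn(m_i, i) over all blocks."""
    acc = bytes(64)
    for counter, block in enumerate(blocks):
        acc = xor_bytes(acc, block_fn(block, counter))
    return acc


def replace_block(h, position, old_block, new_block):
    """Compute h' = h xor F(m, c) xor F(m', c) without knowing the other blocks."""
    h = xor_bytes(h, block_fn(old_block, position))
    return xor_bytes(h, block_fn(new_block, position))


original = [b"A" * 32, b"B" * 32, b"C" * 32]
h = xor_hash(original)
forged = replace_block(h, 1, b"B" * 32, b"X" * 32)
# The forged value equals the honest hash of the modified message.
assert forged == xor_hash([b"A" * 32, b"X" * 32, b"C" * 32])
```

The attacker never touches blocks 0 and 2, which is why an unkeyed XOR of this kind cannot serve as a MAC.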

However, if the attacker wants to perform a second-preimage attack on $h$, he would essentially have to find an $m$ with $ChaCha(m, 0) = h$, which is as hard as recovering a key from a 512-bit ChaCha keystream. The attacker can control the counter by adding dummy blocks, which may give him an advantage, given that reduced-round versions of ChaCha have differential characteristics.

However, when it comes to finding any collision, I'm not sure whether one could find a collision with fewer than $2^{256}$ (birthday attack) calls to the compression function.

I imagine this is secure in some applications. It can't be used where a random oracle is required, since it has properties that random oracles don't have, but it might work if an output filter is added. — Demi, Jan 13 '18 at 18:49

score 2 · Accepted Answer · answered Jan 14 '18 at 01:34

The only expectation for each $c_i[m_i]$ is that it's independent and uniformly random. ChaCha20, with the nonce/counter words acting as the table index, is perfectly fine here, under standard assumptions about ChaCha20. Furthermore, the $m_i$ are expected to be small (in RFSB, they are 8 bits wide), so you can put the message in the counter just as well.

As an example of using a larger message as input to each function, you may also look at Rumba20, $$\text{Rumba20}(x_1, x_2, x_3, x_4) = f_1(x_1) \oplus f_2(x_2) \oplus f_3(x_3) \oplus f_4(x_4)\,,$$ which instead uses different constants to differentiate between the 4 different input blocks, and uses 384 bits of each Salsa20 block as input.

Note that if you let the number of xors ($w$ in the RFSB paper) grow too much relative to the output size ($r$), the compression function can become weak. This is discussed at length in Section 4.
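One well-known way this can go wrong is linearization: once there are more positions than output bits, a collision follows from linear algebra over GF(2). The toy sketch below is only an illustration under loud assumptions: `block_fn` is truncated SHA-256 standing in for the per-position function, the output is shrunk to 64 bits, and all names are hypothetical. It is not taken from the RFSB paper.

```python
import hashlib

R_BITS = 64  # toy output size (assumption); real parameters are much larger


def block_fn(message_block, counter):
    """Stand-in for c_i[m_i]: truncated SHA-256 of (position, block), as an int."""
    digest = hashlib.sha256(counter.to_bytes(16, "little") + message_block).digest()
    return int.from_bytes(digest[: R_BITS // 8], "little")


def xor_hash(blocks):
    acc = 0
    for counter, block in enumerate(blocks):
        acc ^= block_fn(block, counter)
    return acc


def find_dependency(vectors):
    """Return indices of a nonempty subset of vectors that XOR to zero, or None."""
    basis = {}  # pivot bit -> (reduced vector, bitmask of contributing indices)
    for i, v in enumerate(vectors):
        mask = 1 << i
        while v:
            pivot = v.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (v, mask)
                break
            basis_vec, basis_mask = basis[pivot]
            v ^= basis_vec
            mask ^= basis_mask
        else:
            # v reduced to zero: the indices recorded in mask XOR to zero.
            return [j for j in range(len(vectors)) if (mask >> j) & 1]
    return None


def linearization_collision(n_positions=R_BITS + 1):
    """Build two distinct messages of n_positions blocks with the same XOR hash."""
    a = [b"\x00" * 32 for _ in range(n_positions)]
    b = [b"\x01" * 32 for _ in range(n_positions)]
    # Swapping a[i] for b[i] changes the hash by exactly diffs[i].
    diffs = [block_fn(a[i], i) ^ block_fn(b[i], i) for i in range(n_positions)]
    subset = set(find_dependency(diffs))
    second = [b[i] if i in subset else a[i] for i in range(n_positions)]
    return a, second


m1, m2 = linearization_collision()
assert m1 != m2 and xor_hash(m1) == xor_hash(m2)
```

With `R_BITS + 1` positions the difference vectors are necessarily linearly dependent, so the search always succeeds and costs only a few thousand word operations. The same argument would apply to a 512-bit output as soon as more than 512 positions are available, which is one reason to bound $w$ relative to $r$.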

Note also that both of these compression functions only attempt to be collision-resistant. Other properties, such as preimage resistance, need to be ensured by some other mechanism, as the papers stress.
