
Cryptographic hash functions normally take as input a bitstring.

I am looking for a hash function that takes as input a finite multiset of values. In other words, given a finite multiset $S$ of elements of $\{0,1\}^*$, I want to compute $H(S)$, a hash of $S$. I would also like the function to be chosen so that, given the hashes $H(S),H(T)$ of two multisets $S,T$, I can efficiently compute the hash $H(S \cup T)$ of their union (with multiplicities added). (If you like, you can think of this as a sort of associativity property.)

I would prefer a hash function that behaves essentially like a random oracle on this domain.

Are there any good constructions? Are there any constructions that are based upon a standard hash function and whose security is reducible to the security of the underlying hash?


Here is another way to think about it. I want a hash function that accepts a sequence of bit-strings as input. If the input is $x_1,\dots,x_m \in \{0,1\}^*$, I'll let $H(x_1,\dots,x_m)$ denote the output. I want this function to have two properties:

  • Commutative: If $y_1,\dots,y_m$ is a re-ordering of $x_1,\dots,x_m$, then $H(x_1,\dots,x_m) = H(y_1,\dots,y_m)$.

  • Associative: Given $H(x_1,\dots,x_m)$ and $H(y_1,\dots,y_n)$, there is an efficient way to compute $H(x_1,\dots,x_m,y_1,\dots,y_n)$ (without knowing the underlying entries $x_1,\dots,x_m,y_1,\dots,y_n$).

D.W.

1 Answer


One solution is a group-based hash function. Choose an abelian group $(G,+)$ and a conventional hash function $h:\{0,1\}^* \to G$. Then, set

$$H(S) = \sum_{x \in S} h(x),$$

or in other words,

$$H(x_1,\dots,x_m) = h(x_1) + h(x_2) + \cdots + h(x_m),$$

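where $+$ represents the group operation of $G$ and elements of a multiset are summed with their multiplicities. Notice that this is commutative, because $G$ is abelian. It is also associative: given $H(S),H(T)$, it is easy to compute $H(S \cup T) = H(S) + H(T)$, where $\cup$ denotes multiset union (multiplicities add).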
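As a rough illustration of the construction above, here is a minimal Python sketch with $G = (\mathbb{Z}/p\mathbb{Z},+)$. The names `P`, `h`, `multiset_hash`, and `combine` are hypothetical, the choice of SHAKE-256 for $h$ is an assumption, and the prime $2^{255}-19$ is used only to keep the example small; it is far below the modulus size discussed in this answer and should not be taken as a secure parameter.

```python
import hashlib

# Illustrative prime modulus only (assumption); much too small for real security.
P = 2**255 - 19

def h(x: bytes) -> int:
    """Map a bitstring to an element of Z/pZ (assumed instantiation of h)."""
    # Take 512 bits of SHAKE-256 output so that reduction mod P has little bias.
    digest = hashlib.shake_256(x).digest(64)
    return int.from_bytes(digest, "big") % P

def multiset_hash(items) -> int:
    """H(x_1, ..., x_m) = h(x_1) + ... + h(x_m) in the group (Z/pZ, +)."""
    acc = 0
    for x in items:
        acc = (acc + h(x)) % P
    return acc

def combine(hash_s: int, hash_t: int) -> int:
    """Compute H(S u T) from H(S) and H(T) alone, without the elements."""
    return (hash_s + hash_t) % P
```

For example, `multiset_hash([b"a", b"b", b"a"])` and `multiset_hash([b"a", b"a", b"b"])` agree, and `combine(multiset_hash(S), multiset_hash(T))` equals `multiset_hash(S + T)` for any two Python lists `S` and `T` of byte strings.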

The security of this depends partially upon the choice of $G$. Here are some sample instantiations:

  • If you choose $G=(\mathbb{Z}/p\mathbb{Z},+)$ where $p$ is a large prime, the resulting scheme has been proposed by Bellare et al. under the name AdHash. I would expect this to provide approximately 80-bit security if you choose $p \approx 2^{1600}$.

  • If you choose $G=((\mathbb{Z}/p\mathbb{Z})^*,\times)$ where $p$ is a large prime, the resulting scheme has been proposed by Bellare et al. under the name MuHash. I would expect this to provide approximately 80-bit security if you choose $p$ such that the discrete log problem modulo $p$ provides at least $2^{80}$ security.

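  • If you choose $G=(GF(2^n),+)$ (i.e., the group operation is xor of $n$-bit strings), the resulting scheme is insecure: it can be broken using a simple linear-algebra-based attack. Given the hashes of slightly more than $n$ inputs, Gaussian elimination over $GF(2)$ finds a subset of them whose hashes xor to a chosen target.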
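To make the linear algebra attack on the xor instantiation concrete, here is one way it might be realized in Python. This is a sketch under stated assumptions, not the attack as specified in any particular paper: $h$ is taken to be SHA-256 (so $n = 256$), the candidate messages `msg-0`, `msg-1`, ... are arbitrary, and the function names are hypothetical. Any $n+1$ vectors in an $n$-dimensional space over $GF(2)$ are linearly dependent, so elimination always finds a non-empty set of distinct messages whose hash is $0$, which collides with the hash of the empty multiset.

```python
import hashlib

N_BITS = 256  # output size of the assumed underlying hash (SHA-256)

def h(x: bytes) -> int:
    """Assumed underlying hash, viewed as a vector in GF(2)^256."""
    return int.from_bytes(hashlib.sha256(x).digest(), "big")

def xor_hash(items) -> int:
    """The insecure instantiation: H(S) is the xor of h(x) over x in S."""
    acc = 0
    for x in items:
        acc ^= h(x)
    return acc

def find_zero_subset(messages):
    """Gaussian elimination over GF(2): return a non-empty subset with xor_hash 0."""
    basis = {}  # pivot bit -> (reduced vector, bitmask of messages combined)
    for i, m in enumerate(messages):
        vec, mask = h(m), 1 << i
        while vec:
            top = vec.bit_length() - 1
            if top not in basis:
                basis[top] = (vec, mask)  # new independent vector
                break
            bvec, bmask = basis[top]
            vec ^= bvec    # cancel the leading bit
            mask ^= bmask  # track which messages were combined
        else:
            # vec reduced to zero: the messages selected by mask xor to zero.
            return [messages[j] for j in range(i + 1) if (mask >> j) & 1]
    return None

# N_BITS + 1 distinct messages guarantee a linear dependency.
messages = [b"msg-%d" % i for i in range(N_BITS + 1)]
forged = find_zero_subset(messages)
```

Here `forged` is a non-empty list of distinct messages with `xor_hash(forged) == xor_hash([])`. The same elimination, run against a non-zero target, would look for a subset matching the hash of some other multiset.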

See also https://crypto.stackexchange.com/a/5231/351

D.W.
  • Is there any associative, noncommutative (with near zero probability of commuting) similar approach? – dawid Apr 09 '21 at 09:27
  • @olyk, the requirement for commutativity is right there in the question. If you want something different, I suggest asking a new question using the 'Ask Question' button and list your requirements. Do some research and searching first. If you do a search you might come across https://crypto.stackexchange.com/q/17935/351 which I believe has some relevant information. – D.W. Apr 09 '21 at 21:25
  • One must ask the question, what about $(GF(2^n), \times)$? The discrete logarithm is known to be easier for $GF(2^n)$, but that's not sufficient by itself, assuming we reject attacks that give a solution such as "repeat input X $n$ times to collide" where $n \approx 2^{128}$. – orlp Feb 25 '24 at 01:54