
Are there cryptographic hash functions that have homomorphism-like properties?

E.g., one satisfying the relation $h(a || b) = h(a) · h(b)$, where $h(x)$ is the hash function itself, $x || y$ is concatenation, and $x · y$ is some hash-specific combination function that produces a single hash from two of them. If it makes any difference, it can also be assumed that $a$ and $b$ are of equal length, and $h(x)$ is expected to produce hashes of the same length regardless of $x$.

I can think of a basic, non-cryptographic example: $h(x)$ can be the plain arithmetic sum of character codes modulo 256, with $x · y := (x + y) \bmod 256$, so

$$ h(\texttt{"foo"}) = (102 + 111 + 111) \bmod 256 = 68 $$
$$ h(\texttt{"bar"}) = (98 + 97 + 114) \bmod 256 = 53 $$
$$ h(\texttt{"foobar"}) = (102 + 111 + 111 + 98 + 97 + 114) \bmod 256 = 121 $$
$$ h(\texttt{"foo"}) · h(\texttt{"bar"}) = (68 + 53) \bmod 256 = 121 $$
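The toy example can be checked with a few lines of Python. This is only a sketch of the checksum described in the question; the function names `h` and `combine` are chosen here for illustration, and the last line shows why this construction has no collision resistance.

```python
def h(data: bytes) -> int:
    """Toy checksum: sum of the byte values modulo 256 (not collision resistant)."""
    return sum(data) % 256


def combine(x: int, y: int) -> int:
    """The combination operator from the example: addition modulo 256."""
    return (x + y) % 256


foo, bar = b"foo", b"bar"
print(h(foo), h(bar))             # 68 53
print(h(foo + bar))               # 121
print(combine(h(foo), h(bar)))    # 121

# The same structure makes collisions trivial: reordering the input keeps the sum.
print(h(b"foobar") == h(b"barfoo"))  # True
```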

So, is there something similar, but with high collision resistance?

Paŭlo Ebermann
toriningen

1 Answer


My understanding is that, in the even more special case where a and b are not only of equal length but each some power of two times a fixed block size, any hash tree (also called a Merkle tree or a binary hash chain) meets your criteria.

E.g. satisfying following relation h(a || b) = h(a) · h(b), where h(x) is hash function itself, x || y is concatenation and x · y is some hash-specific combination function making single hash given two of them.

In particular, the hash specified by the Tree Hash EXchange format (THEX) spec uses the hash-specific combination function x · y == SHA1( 0x00 || x || y ) whenever the underlying pieces of text a and b are the same length and are both some power of two times a fixed block size.

When c and d are exactly one block in size, the tree hash T() used in THEX is defined something like

T(c) == SHA1( 0x01 || c ) # only for 'c' exactly 1 block long
T(d) == SHA1( 0x01 || d ) # only for 'd' exactly 1 block long
T( c || d ) == SHA1( 0x00 || T(c) || T(d) )
            == SHA1( 0x00 || SHA1( 0x01 || c ) || SHA1( 0x01 || d ) )
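The definition above can be turned into a small Python sketch. It follows only the prefixes shown in the pseudocode (0x01 for leaves, 0x00 for internal nodes) and deliberately rejects any input that is not a power-of-two number of full blocks; the names `leaf_hash`, `combine`, and `tree_hash` are illustrative and not taken from the THEX spec.

```python
import hashlib

BLOCK_SIZE = 1024  # bytes; the typical block size mentioned in this answer


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def leaf_hash(block: bytes) -> bytes:
    """T(c) for a single block: SHA1(0x01 || c)."""
    return sha1(b"\x01" + block)


def combine(x: bytes, y: bytes) -> bytes:
    """The combination function x · y: SHA1(0x00 || x || y)."""
    return sha1(b"\x00" + x + y)


def tree_hash(data: bytes) -> bytes:
    """Tree hash for inputs made of a power-of-two number of full blocks."""
    n_blocks, remainder = divmod(len(data), BLOCK_SIZE)
    if remainder or n_blocks == 0 or n_blocks & (n_blocks - 1):
        raise ValueError("this sketch needs a power-of-two number of full blocks")
    if n_blocks == 1:
        return leaf_hash(data)
    half = len(data) // 2
    return combine(tree_hash(data[:half]), tree_hash(data[half:]))


a = bytes(range(256)) * 8  # 2048 bytes = 2 blocks
b = b"\xff" * 2048         # 2048 bytes = 2 blocks

# h(a || b) == h(a) · h(b) holds for equal-length, power-of-two inputs.
print(tree_hash(a + b) == combine(tree_hash(a), tree_hash(b)))  # True
```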

Typically a block has a size of 1024 bytes; Dan Williams and Emin Gün Sirer have written a paper on picking an optimal block size.

There are apparently two common ways to avoid the easy collisions described by "What is the purpose of using different hash functions for the leaves and internals of a hash tree?":

  • Some Merkle trees -- such as the THEX described above -- use one hash function for the leaves, and a different hash function for the internal nodes.
  • Other Merkle trees -- such as the one used by BitTorrent -- keep track of both the file length and the root SHA1 hash value, and files are considered "the same" only if both match. This allows them to use the same unmodified SHA1 hash function for both the leaves and the internal nodes. (Some people think of this as a single "tree hash value" that includes two parts, the file length and the cryptographic hash value.)
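The "easy collision" and both countermeasures can be illustrated with a rough Python sketch. It is not the actual THEX or BitTorrent format: the `root` helper, its prefix parameters, and the rule that an unpaired node is promoted unchanged are assumptions made only for this demonstration.

```python
import hashlib

BLOCK_SIZE = 1024  # bytes


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def root(data: bytes, leaf_prefix: bytes = b"", node_prefix: bytes = b"") -> bytes:
    """Merkle root; with empty prefixes, leaves and internal nodes use plain SHA1."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    level = [sha1(leaf_prefix + block) for block in blocks]
    while len(level) > 1:
        nxt = [sha1(node_prefix + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # assumption: an unpaired node is promoted unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


c = b"c" * BLOCK_SIZE
d = b"d" * BLOCK_SIZE
two_blocks = c + d
forged = sha1(c) + sha1(d)  # a 40-byte "file" built from the two leaf hashes

# Same hash everywhere and no length check: the two different files collide.
print(root(two_blocks) == root(forged))  # True

# Countermeasure 1: different prefixes for leaves and internal nodes.
print(root(two_blocks, b"\x01", b"\x00") == root(forged, b"\x01", b"\x00"))  # False

# Countermeasure 2: compare (length, root) pairs instead of the root alone.
print((len(two_blocks), root(two_blocks)) == (len(forged), root(forged)))  # False
```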

Merkle trees can handle files whose size is not a power of two (see "How does a "Tiger Tree Hash" handle data whose size isn't a power of two?"), but if the first file isn't a power of two times the fixed block size, concatenating two files doesn't give the nice relationship you wanted between the hashes of the two smaller files and the hash of the bigger combined file.
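A small experiment makes the last point concrete. The sketch below assumes one particular way of handling an odd number of nodes (the unpaired node is promoted to the next level unchanged); other tree hashes may pad or pair differently, so treat the rule and the helper names as assumptions rather than as any spec's definition.

```python
import hashlib

BLOCK_SIZE = 1024  # bytes


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def combine(x: bytes, y: bytes) -> bytes:
    """Internal-node hash: SHA1(0x00 || x || y)."""
    return sha1(b"\x00" + x + y)


def tree_hash(data: bytes) -> bytes:
    """Tree hash for any length; leaves are SHA1(0x01 || block)."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    level = [sha1(b"\x01" + block) for block in blocks]
    while len(level) > 1:
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # assumption: promote the unpaired node unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def make_blocks(count: int, fill: int) -> bytes:
    """Hypothetical helper: 'count' full blocks filled with one byte value."""
    return bytes([fill]) * (count * BLOCK_SIZE)


a2, b2 = make_blocks(2, 0xAA), make_blocks(2, 0xBB)  # power-of-two block counts
a3, b3 = make_blocks(3, 0xAA), make_blocks(3, 0xBB)  # three blocks each

print(tree_hash(a2 + b2) == combine(tree_hash(a2), tree_hash(b2)))  # True
print(tree_hash(a3 + b3) == combine(tree_hash(a3), tree_hash(b3)))  # False
```

In the three-block case the combined file pairs block 3 of the first file with block 1 of the second, so the subtree boundaries no longer line up with the file boundary.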

David Cary