If one has files $x_1, x_2, \ldots, x_n$, what are the benefits of using hash trees, also known as Merkle trees (used, for example, in git), instead of computing a single hash value $h(x_1, x_2, \ldots, x_n)$?

kelalaka

1 Answer

A Merkle tree makes retrieving or sending data over a network efficient: you can send or retrieve the files in any order and verify each one with only $O(\log n)$ additional data transmitted and in $O(\log n)$ time. The verifier only needs to store the root hash, which takes $O(1)$ space. As long as the root hash is kept, any file retrieved or sent can be verified against it.

\begin{array}{lcr} & \text{With Merkle tree} & \\ \hline \text{Receiver} & \text{Data transmitted} & \text{Databank} \\ \hline \text{keeps the root hash} & & \text{keeps the files}\\ O(1)\text{ space} & & \\ & \xrightarrow{\text{requests the } i\text{th file}} & \\ & \xleftarrow{\text{sends the } i\text{th file with the } O(\log n) \text{ sibling hashes on the path to the root}} & \\ \text{verifies in} & & \\ O(\log n)\text{ time} & & \end{array}

The above diagram assumes that you are the owner of the data and have outsourced it. If a client wants to upload the data instead, they can first transmit the digitally signed root hash to the server, and the exchange then continues as in the diagram.
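The exchange in the diagram can be sketched in Python as follows. This is only an illustrative sketch, not how git or any particular system does it: the choice of SHA-256, the `0x00`/`0x01` prefixes separating leaf and inner-node hashes, the rule that an unpaired node is promoted unchanged, and all function names (`build_levels`, `prove`, `verify`) are assumptions made for the example.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # Prefix 0x00 marks a leaf (assumed domain separation for this sketch).
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks an inner node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(files: list[bytes]) -> list[list[bytes]]:
    """Databank side: return every tree level, leaves first, root last."""
    level = [leaf_hash(f) for f in files]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # unpaired node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Databank side: collect the sibling hashes from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append(level[sibling])
        index //= 2
    return proof


def verify(root: bytes, n: int, index: int, data: bytes, proof: list[bytes]) -> bool:
    """Receiver side: recompute the root from one file and its sibling hashes."""
    node = leaf_hash(data)
    siblings = iter(proof)
    size = n
    while size > 1:
        if (index ^ 1) < size:
            sib = next(siblings, None)
            if sib is None:
                return False  # proof is too short
            # An odd index means this node is the right child.
            node = node_hash(sib, node) if index & 1 else node_hash(node, sib)
        index //= 2
        size = (size + 1) // 2
    return next(siblings, None) is None and node == root


files = [f"file {i}".encode() for i in range(8)]
levels = build_levels(files)
root = levels[-1][0]          # the only value the receiver has to keep
proof = prove(levels, 5)      # sent along with files[5]
assert verify(root, len(files), 5, files[5], proof)
```

For the 8 files above the proof holds 3 hashes, so the receiver checks one file with 4 hash computations without ever seeing the other 7 files.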

If you use a single hash, then to verify anything you need to send or receive all of the data and compute the hash over all of it: $O(n)$ data transmitted and $O(n)$ time. There are also parallel hash constructions, such as ParallelHash (derived from SHA-3) or BLAKE3. These can decrease the time needed to compute $h(x_1, x_2, \ldots, x_n)$ if you have more than one core or thread. In theory, with enough parallel processors the hashing time drops to $O(\log n)$; in practice, this may not be achievable. Still, to verify, one needs to transfer everything at once, i.e. $O(n)$ data transmitted.

\begin{array}{lcr} & \text{With single hash} & \\ \hline \text{Receiver} & \text{Data transmitted} & \text{Databank} \\ \hline \text{keeps the hash} & & \text{keeps the files}\\ O(1)\text{ space} & & \\ & \xrightarrow{\text{requests the } i\text{th file}} & \\ & \xleftarrow{\text{sends all files: } O(n) \text{ data transmitted}} & \\ \text{verifies in} & & \\ O(n)\text{ time} & & \end{array}
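To put rough numbers on the two diagrams, here is a small sketch of the single-hash baseline and of the bytes that must cross the network in each case. The length-prefixed encoding of $h(x_1, \ldots, x_n)$, the 32-byte digest size, the bound of $\lceil \log_2 n \rceil$ sibling hashes for a balanced tree, and the names `single_hash` and `transfer_cost` are all assumptions made for illustration.

```python
import hashlib


def single_hash(files: list[bytes]) -> bytes:
    """One SHA-256 over all files; verifying any file needs every file."""
    h = hashlib.sha256()
    for f in files:
        # Length prefix keeps the encoding of the file list unambiguous.
        h.update(len(f).to_bytes(8, "big"))
        h.update(f)
    return h.digest()


def transfer_cost(file_sizes: list[int], i: int, digest_len: int = 32) -> tuple[int, int]:
    """Bytes sent to verify file i: (single hash, Merkle tree)."""
    n = len(file_sizes)
    single = sum(file_sizes)              # all files must be sent
    depth = (n - 1).bit_length()          # ceil(log2 n) for n >= 1
    merkle = file_sizes[i] + depth * digest_len
    return single, merkle


# 1024 files of 1 MiB each, receiver wants file 7.
single, merkle = transfer_cost([2**20] * 1024, 7)
```

With these assumed sizes, the single-hash receiver downloads 1 GiB to check one file, while the Merkle receiver downloads the 1 MiB file plus 320 bytes of sibling hashes.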

Therefore, the benefits of a Merkle tree are reduced verification time and reduced data transmission.

kelalaka