I am looking for a hashing algorithm that works like this. I start with a text file and compute its hash. Now suppose I remove one character at, say, position 67, and I know this character is an "m". I would like to compute the new hash without reapplying the hash function to the full text, using only the hash of the full text and the knowledge that I removed "m" from position 67. If I then re-add the "m" in the same place and recompute with the same method, I should get the initial hash back.

What is the technical name for such hash functions, so that I can search for pointers? A CRC is not what I need, I guess, because (afaik) a CRC is updated as data is appended to a stream, not under arbitrary changes anywhere in the initial data.

1 Answer

A "Divide and Conquer" strategy would work well here. Instead of hashing the entire file, hash portions of the file, maintaining an array of hashes to detect changes. The most straightforward way to do it would be to maintain a hash for each line of text.
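A minimal sketch of the per-line idea, assuming SHA-256 from Python's `hashlib` and a top-level hash computed over the concatenated line digests. The function names (`line_hashes`, `top_hash`, `update_line`) are made up for illustration, and the sketch only covers edits inside an existing line:

```python
import hashlib

def line_hashes(text):
    """Return one SHA-256 digest per line of text."""
    return [hashlib.sha256(line.encode("utf-8")).digest()
            for line in text.split("\n")]

def top_hash(hashes):
    """Combine the per-line digests into a single digest for the whole file."""
    return hashlib.sha256(b"".join(hashes)).hexdigest()

def update_line(hashes, index, new_line):
    """Rehash only the line that changed, in place."""
    hashes[index] = hashlib.sha256(new_line.encode("utf-8")).digest()

text = "first line\nsome more text\nlast line"
hashes = line_hashes(text)
original = top_hash(hashes)

# Remove the "m" from "some" on line 1; only that line is rehashed.
update_line(hashes, 1, "soe more text")
changed = top_hash(hashes)

# Re-add the "m" in the same place; the original hash comes back.
update_line(hashes, 1, "some more text")
restored = top_hash(hashes)
```

The top-level hash still has to be recomputed, but over one 32-byte digest per line rather than over the full text. An edit that adds or removes a whole line would need an entry inserted into or deleted from the list.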

Robert Harvey
  • That was in fact my fallback solution, and I think I'll definitely follow this approach. Thanks – Stefano Borini Jul 17 '14 at 17:41
  • 2
    @StefanoBorini This is known as a hash list, and is generalized by Merkle trees. The latter have a couple of advantages, but I don't know if any of those are applicable do your use case. –  Jul 17 '14 at 18:11
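
To illustrate the Merkle tree mentioned in the comment above, here is a rough sketch, assuming SHA-256 and a non-empty list of byte chunks (for example, one chunk per line). The names `build_tree`, `update_leaf`, and `root` are hypothetical, and real Merkle tree implementations differ in how they treat an unpaired node. Changing one chunk rehashes only the path from its leaf to the root:

```python
import hashlib

def _h(data):
    return hashlib.sha256(data).digest()

def build_tree(chunks):
    """Return a list of levels; levels[0] holds leaf hashes, levels[-1] the root."""
    level = [_h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        # Pair adjacent nodes; an unpaired last node is hashed on its own.
        level = [_h(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def update_leaf(levels, index, chunk):
    """Replace one chunk and rehash only the path from that leaf to the root."""
    levels[0][index] = _h(chunk)
    for depth in range(1, len(levels)):
        index //= 2
        below = levels[depth - 1]
        levels[depth][index] = _h(b"".join(below[2 * index:2 * index + 2]))

def root(levels):
    """Hex digest of the root node."""
    return levels[-1][0].hex()

chunks = [b"line one", b"some text", b"line three", b"line four", b"line five"]
tree = build_tree(chunks)
before = root(tree)

update_leaf(tree, 1, b"soe text")    # remove the "m"
after = root(tree)

update_leaf(tree, 1, b"some text")   # re-add the "m"
restored_root = root(tree)
```

With n chunks, an update touches about log2(n) nodes, whereas the flat hash list has to rehash all n digests to refresh the top-level hash.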