Hash function for progressive changes

Question

I am looking for a hashing algorithm that works like this. I start with a text file and compute its hash. Then I remove one character, say the "m" at position 67, and I would like to compute the new hash without reapplying the hash function to the full text, using only the hash of the full text and the knowledge that I removed "m" from position 67. If I re-add the "m" in the same place and recompute with the same method, I should get the initial hash back.

What is the technical name for such hash functions, so that I can search for pointers? I guess a CRC is not what I need, because (afaik) a CRC is computed over data appended to a stream, not over arbitrary changes throughout the initial data.

What problem are you trying to solve? What you're describing is a solution to some problem, not the problem itself. — Robert Harvey, Jul 17 '14 at 16:46
@robert I need to keep track of periodic changes to a file, but spot when the file "collapses" back to the original contents, eventually hundreds of changes later that as a whole happen to cancel out. — Stefano Borini, Jul 17 '14 at 16:52
Why wouldn't you just recompute the entire hash, and compare it with the original? — Robert Harvey, Jul 17 '14 at 16:57
@robert: because I would have to recompute it for every change, and the file can be large. — Stefano Borini, Jul 17 '14 at 16:59

score 1 · Answer 1 · answered Jul 17 '14 at 17:14

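A "Divide and Conquer" strategy would work well here. Instead of hashing the entire file, hash portions of the file and maintain an array of those hashes to detect changes. An edit then only requires rehashing the portion it touches, and the file is back to its original contents exactly when every portion's hash matches the original array. The most straightforward way to do it would be to maintain a hash for each line of text.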
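A minimal sketch of the per-line idea, assuming the file fits in a list of lines and that SHA-256 from Python's `hashlib` is an acceptable hash. The `HashList` class and its method names are hypothetical, chosen only for illustration:

```python
import hashlib


def line_digest(line: str) -> bytes:
    """Hash a single line of text (SHA-256 is an assumption, any hash works)."""
    return hashlib.sha256(line.encode("utf-8")).digest()


class HashList:
    """Keeps one digest per line so an edit only rehashes the touched line."""

    def __init__(self, lines):
        self.digests = [line_digest(line) for line in lines]

    def update_line(self, index: int, new_line: str) -> None:
        # Only the changed line is rehashed, not the whole file.
        self.digests[index] = line_digest(new_line)

    def insert_line(self, index: int, new_line: str) -> None:
        self.digests.insert(index, line_digest(new_line))

    def delete_line(self, index: int) -> None:
        del self.digests[index]

    def top_hash(self) -> str:
        # Combines the per-line digests. This still walks every digest,
        # but each is 32 bytes, so it is far cheaper than rehashing the text.
        h = hashlib.sha256()
        for d in self.digests:
            h.update(d)
        return h.hexdigest()


lines = ["first line", "some text", "last line"]
tracker = HashList(lines)
original = tracker.top_hash()

tracker.update_line(1, "soe text")    # the "m" is removed
changed = tracker.top_hash()

tracker.update_line(1, "some text")   # the "m" is re-added
restored = tracker.top_hash()
```

Under these assumptions `restored` equals `original` while `changed` differs, which is the "collapse back to the original" check described in the question.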

Robert Harvey

That was in fact my fallback solution, and I think I'll definitely follow this approach. Thanks – Stefano Borini Jul 17 '14 at 17:41

@StefanoBorini This is known as a hash list, and is generalized by Merkle trees. The latter have a couple of advantages, but I don't know if any of those are applicable to your use case. – Jul 17 '14 at 18:11
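One way to picture the Merkle tree variant is the rough sketch below. It assumes a fixed number of blocks (inserting or deleting a block would change the tree shape and is not handled) and SHA-256 as the hash. The `MerkleTree` class is hypothetical, and the `0x00`/`0x01` prefixes are just one common convention for keeping leaf and inner hashes apart:

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MerkleTree:
    """Binary hash tree stored in an array; node i has children 2i and 2i+1."""

    def __init__(self, blocks):
        # Pad the leaf count up to a power of two (an assumption of this sketch).
        n = 1
        while n < len(blocks):
            n *= 2
        self.n = n
        padding = _h(b"")
        self.nodes = [b""] * (2 * n)
        for i in range(n):
            if i < len(blocks):
                self.nodes[n + i] = _h(b"\x00" + blocks[i])
            else:
                self.nodes[n + i] = padding
        for i in range(n - 1, 0, -1):
            self.nodes[i] = self._parent(i)

    def _parent(self, i: int) -> bytes:
        return _h(b"\x01" + self.nodes[2 * i] + self.nodes[2 * i + 1])

    def update(self, index: int, block: bytes) -> None:
        # Rehash one leaf, then only the nodes on its path to the root.
        i = self.n + index
        self.nodes[i] = _h(b"\x00" + block)
        i //= 2
        while i >= 1:
            self.nodes[i] = self._parent(i)
            i //= 2

    def root(self) -> str:
        return self.nodes[1].hex()


tree = MerkleTree([b"first line", b"some text", b"last line"])
original_root = tree.root()

tree.update(1, b"soe text")
changed_root = tree.root()

tree.update(1, b"some text")
restored_root = tree.root()
```

Compared with a flat hash list, the root here is refreshed by rehashing only about log2(n) nodes per edit, so comparing `restored_root` with `original_root` stays cheap even with many blocks.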
