34

Are checksums basically toned-down versions of cryptographic hashes? As in: they are supposed to detect errors that occur naturally/randomly as opposed to being designed to prevent a knowledgeable attacker's meticulous engineering feat?

So, essentially they are non-secure versions of cryptographic hashes, one could say? Thus for the same reason, these checksums are "cheaper" to compute than cryptographic hashes? (e.g. CRC32 vs SHA-256)
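To put rough numbers on the "cheaper" intuition, here is a minimal Python sketch (the 1 MB input size and repetition count are arbitrary choices, just for illustration):

```python
import timeit

# Checksum/hash 1 MB of data 100 times each and compare wall-clock time.
setup = "import zlib, hashlib; data = b'x' * 1_000_000"
crc = timeit.timeit("zlib.crc32(data)", setup=setup, number=100)
sha = timeit.timeit("hashlib.sha256(data).digest()", setup=setup, number=100)
print(f"CRC32:   {crc:.3f} s for 100 x 1 MB")
print(f"SHA-256: {sha:.3f} s for 100 x 1 MB")
```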

Sorry for my poor English and the potentially trivial question. I just need to get the concepts straightened out.

AlanSTACK
  • CRC is not a "secure cryptographic hash". It has very different properties. Hashes can't be easily reversed. A CRC lets you know what bytes to add in order to produce the desired CRC. – JDługosz Feb 22 '16 at 04:51
  • @JDługosz I never made the claim that CRCs were "secure cryptographic hashes". I asked whether or not they could be considered "non-secure cryptographic hashes" - and if the same general terminology could be applied to other checksums as well. – AlanSTACK Dec 12 '19 at 06:33

4 Answers

42

Are checksums basically toned-down versions of cryptographic hashes? As in: they are supposed to detect errors that occur naturally/randomly as opposed to being designed to prevent a knowledgeable attacker's meticulous engineering feat?

That is one way to look at it. However, hash functions have many purposes. They are also meant to be one-way (an attacker cannot know the preimage without guessing), for which there is no parallel with checksums.

So, essentially they are non-secure versions of cryptographic hashes, one could say? Thus for the same reason, these checksums are "cheaper" to compute than cryptographic hashes? (e.g. CRC32 vs SHA-256)

Due to their different requirements, checksums are not just "worse, but faster" hashes. They are meant to catch particular kinds of errors. A cyclic redundancy check (CRC), for example, can detect all 1-2 bit errors in short inputs, as well as some other common classes of errors in typical applications (e.g. burst errors). This is better than a truncated cryptographic hash of similar length would be able to do.


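To make the guarantee concrete, here is a minimal sketch in Python (zlib.crc32 and a SHA-256 truncation chosen purely for illustration). It exhaustively flips every 1- and 2-bit pattern in a short message and checks that the CRC always changes; the truncated hash also changes here in practice, but only with probability $1 - 2^{-32}$ per pattern, with no guarantee behind it.

```python
import hashlib
import zlib
from itertools import chain, combinations

msg = b"hello, world"  # 96 bits, well inside CRC-32's guarantee range

def sha256_32(data: bytes) -> int:
    """A cryptographic hash truncated to 32 bits, for comparison."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def flip(data: bytes, bits) -> bytes:
    """Return a copy of `data` with the given bit positions flipped."""
    out = bytearray(data)
    for b in bits:
        out[b // 8] ^= 1 << (b % 8)
    return bytes(out)

n = len(msg) * 8
patterns = chain(combinations(range(n), 1), combinations(range(n), 2))
for bits in patterns:
    corrupted = flip(msg, bits)
    # Guaranteed by CRC-32's design for inputs this short:
    assert zlib.crc32(corrupted) != zlib.crc32(msg)
    # Holds too, but only with probability 1 - 2**-32 per pattern:
    assert sha256_32(corrupted) != sha256_32(msg)
print("every 1- and 2-bit corruption was detected")
```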

otus
  • "[CRC] can detect e.g. 1-2 bit errors in short inputs, as well as some other common classes of errors in typical applications (e.g. bursts errors). This is better than a truncated cryptographic hash of similar length would be able to do." - Really? Why? They both break down when the message is manipulated with brute-force, but CRC make it easier to craft a malicious, but valid message without brute-force. – Artjom B. Feb 21 '16 at 18:48
  • 2
    @ArtjomB. A cryptographic hash truncated to 32 bits can easily collide with two inputs that differ in only one or two bits, whereas a CRC won't. The CRC is geared towards reliably detecting error patterns that commonly occur in transit, so it will do better on those kinds of errors and worse on others. The short hash does optimally over all inputs, and as a result does worse than CRC on the inputs CRC is good at dealing with. – Thomas Feb 21 '16 at 19:21
  • 3
    @ArtjomB. An 8-bit checksum will let you detect any single-bit (or even single-byte) change anywhere in the string; a cryptographic hash truncated to 8 bits would have a 1/256 chance of not detecting a change (or whatever size). Thus, while a crypto is stronger against a malicious opponent, a checksum is better against small random changes (i.e. data transmission errors). Which one you should use depends on what you're trying to protect against. BTW, checksums and CRCs are significantly different things; CRCs protect against several classes of (random) errors that checksums do poorly at. – Gordon Davisson Feb 21 '16 at 19:22
  • @GordonDavisson, err, you understand that an 8-bit checksum has exactly the same 1/256 chance of not detecting an error? It doesn't matter what algorithm you use when you have only 8 bits of space. Also, depending on the implementation, a checksum is much more likely to have two errors "cancel" each other out and produce a collision on data that differs in several matching bits. – Oleg V. Volkov Feb 21 '16 at 20:10
  • 3
    I see now that you wanted to stress that CRC always detects 1-2 bit errors (depending on the chosen polynomial) and a hash not necessarily. I've just read A Painless Guide to CRC Error Detection Algorithms, which makes this fact clear and the Wikipedia page doesn't for some reason. – Artjom B. Feb 21 '16 at 20:24
  • 4
    @OlegV.Volkov, with some types of errors you have a 100% chance of detection. To see why you can consider the parity check, which detects all one-bit flips. – otus Feb 21 '16 at 21:32
  • @OlegV.Volkov: Suppose one is storing data in a not-totally-reliable memory chip where each bit has an independent one-in-a-billion probability of reading back the opposite of what was written. When reading a page, the most likely outcome is that all bits will be correct. The next most likely outcome is that exactly one bit will be wrong. The probability of two bits being wrong will be much lower, and the probability of three or more being wrong lower still. If all incorrect patterns with 3 or fewer bit errors are recognizable as wrong... – supercat Feb 22 '16 at 01:14
  • ...then if the validation code doesn't match the data, a program can check whether there's any single bit which, if changed, would make the data pass validation. If so, that change can be applied. If two bits were wrong, there would be no single bit which, if flipped, would make the data valid, and it wouldn't be possible to tell what the correct data should be, but the fact that an unrecoverable error occurred would still be detectable. – supercat Feb 22 '16 at 01:15
  • 1
    @supercat, I don't need to "suppose". I seen it firsthand last time I worked in RTB. One time people decided to augment one lightweight internal protocol with CRC8. With RTB traffic we had dozens of cases where syntax parser failed to parse clearly broken unpacked message while checksum was "correct" in the very first day. And I don't know how many of more broken messages we've got where we actually had broken VALUES with correct syntax. Of course it was quickly replaced with hash with proper avalanche behavior after that. – Oleg V. Volkov Feb 22 '16 at 01:52
  • 2
    @OlegV.Volkov: CRC8 is suitable for short packets, but with an order-128 polynomial it would be inadequate to guard anything over 15 bytes of payload. How big were your packets? – supercat Feb 22 '16 at 06:52
  • Using the birthday attack it is quite easy to construct a collision on cryptographic hashes truncated to 32 bits with inputs differing in only two bit positions. I quickly found this one: `sha512('0' * 9071 + '1' + '0' * (100000 - 9071)).hexdigest()[:8]` and `sha512('0' * 91013 + '1' + '0' * (100000 - 91013)).hexdigest()[:8]`. However I don't think this approach could be used to find a collision between inputs differing in only a single bit. Is there an algorithm to construct such a collision with less than $2^{31}$ evaluations of the hash function? – kasperd Feb 22 '16 at 09:48
  • Could be worth adding that a CRC can often be used to recover the lost data for certain types of damage while a truncated hash could not do that. – OldCurmudgeon Feb 22 '16 at 14:13
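To make the parity and cancellation points from this thread concrete, here is a minimal sketch in Python using a plain XOR (longitudinal parity) checksum, chosen purely for illustration: it detects every single-bit change with certainty, yet two flips in the same bit position of different bytes cancel out, exactly the kind of blind spot a hash with good avalanche behavior does not have.

```python
from functools import reduce

def xor_checksum(data: bytes) -> int:
    """Longitudinal parity: XOR of all bytes (an 8-bit checksum)."""
    return reduce(lambda a, b: a ^ b, data, 0)

msg = b"hello, world"
ck = xor_checksum(msg)

# Every single-bit flip is detected, 100% of the time:
for i in range(len(msg) * 8):
    corrupted = bytearray(msg)
    corrupted[i // 8] ^= 1 << (i % 8)
    assert xor_checksum(bytes(corrupted)) != ck

# ...but two flips in the same bit position of different bytes cancel:
corrupted = bytearray(msg)
corrupted[0] ^= 0x01
corrupted[1] ^= 0x01
assert xor_checksum(bytes(corrupted)) == ck  # collision goes undetected
```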
9

I think it's more helpful to think of checksums as toned-down versions of message authentication codes (not hashes).

Message authentication codes (MACs) are designed to detect any modification to a message, while it is in transit. They are secure against even adversarially-chosen modifications.

Checksums are designed to detect some modifications to a message, while it is in transit. They are designed to detect random modifications: the kinds of modifications that might happen by chance (e.g., due to a burst of noise, or interference, or something), but not adversarial modifications.

As a result, checksums can be faster than MACs. But MACs can be made pretty fast, too.
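A minimal sketch of that adversarial difference, in Python with HMAC-SHA256 standing in for the MAC (the key and messages are hypothetical): after tampering, an attacker can simply recompute a checksum, but cannot recompute the MAC without the shared key.

```python
import hmac
import zlib
from hashlib import sha256

key = b"shared secret"             # known only to sender and receiver
msg = b"pay alice 100"
tampered = b"pay mallory 999"

# The checksum offers no protection: the attacker just recomputes it.
forged_crc = zlib.crc32(tampered)  # no secret needed

# The MAC does: without `key`, the attacker cannot produce a valid tag,
# so the best they can do is replay the old one, and verification fails.
tag = hmac.new(key, msg, sha256).digest()
assert not hmac.compare_digest(
    hmac.new(key, tampered, sha256).digest(), tag
)
```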

D.W.
  • 6
    The key difference between a MAC and a hash is that the MAC is keyed and the hash is not. The CRC is not keyed either, so it seems more appropriate to compare the CRC to a hash. – kasperd Feb 22 '16 at 09:42
  • 4
    @kasperd, that's a difference, but far from the only relevant difference. For instance, hashes are required to be one-way; a MAC isn't, and a checksum isn't. You could compare the CRC to a hash, but then you'd find that a hash has extra requirements that don't show up for checksums and seem orthogonal, so the comparison gets messy. In contrast, MACs and checksums serve a similar purpose, with the only difference being adversarial vs not, so I personally find it more intuitive to think of checksums as being vaguely analogous to MACs. – D.W. Feb 22 '16 at 18:43
4

From my point of view, they would be extremely distant relatives. But I understand the point: both generate fixed-length values that can help indicate when integrity has somehow been compromised. They should remain distinct tools with different purposes that should not be confused, but CRC32 complicates the distinction.

CRC32 is a checksum that derives a 32-bit digest, used, for instance, to check whether a compressed file was damaged while being transferred. However, the fact that it generates a 32-bit digest has led to the belief that it can be used as a cryptographic hash for integrity control. In particular, CRCs are used as hash functions in industrial networks, where hardware capability is usually tightly bounded and real cryptographic hashes can be too heavy a choice. That does not mean they can actually replace a cryptographic hash function to any extent, but it shows that the descriptions of the two families of functions are similar enough to be confused by an inattentive observer.
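One structural reason CRC32 cannot stand in for a cryptographic hash: it is affine over GF(2), so for equal-length inputs $\mathrm{crc}(x \oplus y \oplus z) = \mathrm{crc}(x) \oplus \mathrm{crc}(y) \oplus \mathrm{crc}(z)$, and collisions can be constructed algebraically instead of searched for. A small Python sketch of this identity (the sample strings are arbitrary):

```python
import zlib

def bxor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

x, y, z = b"12345678", b"abcdefgh", b"ABCDEFGH"

# Affine structure: the length-dependent constants cancel across three terms.
lhs = zlib.crc32(bxor(bxor(x, y), z))
rhs = zlib.crc32(x) ^ zlib.crc32(y) ^ zlib.crc32(z)
assert lhs == rhs

# SHA-256 has no such structure; hitting a chosen digest requires brute force.
```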

Sergio A. Figueroa
0

A MAC ensures a receiver that a message was authentically generated by a sender using a shared secret (key). The MAC is a pair of algorithms: generate and verify. Typically, a sender generates a tag to append to a message. This tag is generated by mixing the message, the secret key and a counter, and then hashing the result. The receiver can then use the same process to verify that the message is authentic. There are no guarantees made by the MAC other than a low probability of two messages having the same tag (about $1/2^n$ for an $n$-bit tag), which makes a MAC good for authentication.
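A minimal sketch of the generate/verify pair in Python, with HMAC-SHA256 chosen as one standard MAC construction (the exact key/counter mixing described above varies between schemes):

```python
import hmac
from hashlib import sha256

def generate(key: bytes, message: bytes) -> bytes:
    """Generate: tag = HMAC-SHA256(key, message)."""
    return hmac.new(key, message, sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Verify: recompute the tag and compare in constant time."""
    return hmac.compare_digest(generate(key, message), tag)

key = b"shared secret"
tag = generate(key, b"hello")
assert verify(key, b"hello", tag)      # authentic message accepted
assert not verify(key, b"hellp", tag)  # any modification is rejected
```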

According to the study by Koopman et al.,* the 15-bit CRC in CAN guarantees that all message collisions are at least 6 bits apart in Hamming distance. This means that an error will always be detected in a message frame (82 bits) with 1-5 random bit flips. The same guarantee of error detection does not hold if you MAC an 82-bit message into a 15-bit tag. This is why CRCs are still used today for error detection.

* Koopman, Philip, and Tridib Chakravarty. "Cyclic Redundancy Code (CRC) Polynomial Selection for Embedded Networks." Proceedings of the International Conference on Dependable Systems and Networks (DSN 2004), IEEE, 2004.
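The CAN guarantee can be spot-checked empirically. Below is a rough Python sketch assuming the CAN generator polynomial 0x4599 and the bitwise algorithm from the CAN specification; by linearity it suffices to confirm that no low-weight error pattern has a zero CRC. Only weights 1-3 are checked here to keep the run to a few seconds; weights 4 and 5 use the same loop, just more slowly.

```python
from itertools import combinations

CAN_POLY = 0x4599   # x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1
FRAME_BITS = 82

def crc15(frame: int, nbits: int = FRAME_BITS) -> int:
    """Bitwise CRC-15 (MSB first, init 0), as in the CAN specification."""
    reg = 0
    for i in reversed(range(nbits)):
        bit = (frame >> i) & 1
        top = (reg >> 14) & 1
        reg = (reg << 1) & 0x7FFF
        if bit ^ top:
            reg ^= CAN_POLY
    return reg

# The CRC is linear with init 0, so crc(m) == crc(m ^ e) iff crc(e) == 0.
# Hamming distance 6 means no error pattern of weight 1-5 may have CRC 0.
for weight in (1, 2, 3):
    for bits in combinations(range(FRAME_BITS), weight):
        e = sum(1 << b for b in bits)
        assert crc15(e) != 0, f"undetected {weight}-bit error"
print("all 1-3 bit errors in an 82-bit frame change the CRC")
```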