
Dropbox uses CRC32 and MD5 to create a checksum of each 4 MB "block" as part of their compression/file system.

I suspect that Dropbox uses an additional cryptographic hash to check for collisions, but let's assume that it doesn't.

Is using CRC32 and MD5 in combination cryptographically secure?

user3201068
  • Are you sure what's done amounts to a 160-bit hash that is the concatenation of (32-bit) CRC32 and (128-bit) MD5? Any pointer or evidence? Also, do you know (or have information that could help to determine) if, in the use made of that, the required property is collision resistance, or preimage resistance? That's of paramount importance, for MD5 is not cryptographically secure w.r.t. collision resistance, but largely remains so w.r.t. preimage resistance. – fgrieu Mar 04 '14 at 12:07
  • @fgrieu Collision resistance, naturally. – user3201068 Mar 04 '14 at 13:28

2 Answers


Disclaimer: I have no first-hand knowledge of which hash (or MAC, or other method) Dropbox uses for de-duplication, nor of whether knowing that value (and its key, for a MAC) is enough to download something from Dropbox; and I see slightly diverging opinions on these points.

If we consider the problem of finding a collision, the 160-bit hash defined by $$H(M)=\text{CRC32}(M)||\text{MD5}(M)$$ is NOT cryptographically secure, for it is only marginally stronger than $\text{MD5}$ is.
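A minimal Python sketch of the 160-bit construction $H$ above, using the standard-library `zlib.crc32` and `hashlib.md5`. Encoding the CRC32 value as 4 big-endian bytes is an assumption made for illustration only; it is not a claim about how Dropbox (or anyone else) serializes it.

```python
import hashlib
import zlib


def h(message: bytes) -> bytes:
    """H(M) = CRC32(M) || MD5(M): 4 + 16 bytes = 160 bits."""
    crc = zlib.crc32(message) & 0xFFFFFFFF  # unsigned 32-bit CRC32 value
    # Big-endian encoding of the CRC is an illustrative choice.
    return crc.to_bytes(4, "big") + hashlib.md5(message).digest()


digest = h(b"example block")
print(len(digest) * 8)  # 160
```

Any collision for `h` is necessarily a collision for both CRC32 and MD5 at once, which is why its collision resistance is at least that of MD5, though only marginally more.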

$\text{MD5}$ no longer provides good protection against collisions: with the Fast Collision Attack on $\text{MD5}$ we can find a collision (for messages of at least 128 bytes) at a cost of about $2^{18}$ compression-function evaluations, and finding collisions for $\text{CRC32}$ is totally trivial. Finding a collision for $H$ is harder than for $\text{MD5}$, but with a dumb method (finding $\text{MD5}$ collisions until one is also a collision for $\text{CRC32}$) it is only about $2^{32}$ times harder, i.e. about $2^{50}$ compression-function evaluations, which is non-trivial but feasible. Update: as pointed out by poncho, we can (for messages of at least about 4 kiB) make that only about 33 times harder, or about $2^{23}$ compression-function evaluations, which is nothing. As the saying attributed to the NSA goes, attacks only get better; they never get worse.
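To illustrate why collisions for $\text{CRC32}$ alone are trivial: a generic birthday search over random-looking messages finds two distinct inputs with the same 32-bit CRC after roughly $2^{16}$ attempts. This Python sketch derives its messages from SHA-256 of a counter purely to get distinct, random-looking inputs; the helper name `find_crc32_collision` is hypothetical.

```python
import hashlib
import zlib


def find_crc32_collision(max_tries: int = 1 << 20):
    """Birthday search for two distinct messages with the same CRC32."""
    seen = {}  # CRC32 value -> message that first produced it
    for i in range(max_tries):
        # Distinct, random-looking 32-byte messages derived from a counter.
        msg = hashlib.sha256(i.to_bytes(8, "big")).digest()
        crc = zlib.crc32(msg)
        if crc in seen:
            return seen[crc], msg
        seen[crc] = msg
    raise RuntimeError("no collision found within max_tries")


m1, m2 = find_crc32_collision()
print(m1 != m2, zlib.crc32(m1) == zlib.crc32(m2))  # True True
```

With $2^{20}$ tries allowed, failing to find a collision is astronomically unlikely; in practice the search stops after a few tens of thousands of messages.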


If we consider the problem of finding a second preimage, $\text{MD5}$ remains impractical to attack as far as we know, and $H$ is at least as strong as $\text{MD5}$ is.

If we consider the problem of finding a first preimage, $H$ can't be more than $2^{32}$ times easier to attack than $\text{MD5}$ is. That bound can likely only be approached for short messages, where brute force is the best attack; for long messages, this is likely (at least) about as hard as for $\text{MD5}$, which remains impractical to attack as far as we know.

fgrieu
  • Actually, you can find a collision for $\text{CRC32}(M)||\text{MD5}(M)$ with an effort of only 33 times that of an $\text{MD5}$ collision, or about $2^{23}$ compression-function evaluations. – poncho Mar 04 '14 at 17:27
  • @poncho: I can't find right now how that 33 would hold for any collision attack against MD5 (and I have not considered the internals of the attack). Any hint, or pointer to a general result? – fgrieu Mar 04 '14 at 17:33
  • Two hints (if you want any more, submit a question): you can create a $2^{33}$-way collision with the effort of finding 33 $\text{MD5}$ collisions, and you can use the linearity properties of CRC so you don't have to evaluate the CRCs of all $2^{33}$ messages. – poncho Mar 04 '14 at 17:36
  • @poncho: ah I now see the clever idea; that will make a message of 4 kiB. – fgrieu Mar 04 '14 at 17:55
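poncho's hint can be sketched as follows, under stated assumptions. In the real attack, each of the 33 pairs $(A_i, B_i)$ would be an $\text{MD5}$ collision found from the chaining value left by the previous stage, so all $2^{33}$ concatenations share one $\text{MD5}$ value (a Joux-style multicollision). Producing those needs an actual MD5 collision tool, so the Python code below uses pseudo-random 128-byte stand-in blocks that do NOT collide under MD5; it only demonstrates the CRC32 half. Because CRC32 is affine over GF(2) for equal-length inputs, Gaussian elimination on the 33 single-stage CRC differences (33 vectors in a 32-dimensional space) always yields two distinct choices with equal CRC32, without evaluating all $2^{33}$ messages. The function names are hypothetical.

```python
import hashlib
import zlib


def build(pairs, choice):
    """Concatenate stage blocks: pairs[i][choice[i]] for each stage i."""
    return b"".join(pair[c] for pair, c in zip(pairs, choice))


def crc32_collision_in_multicollision(pairs):
    """Return two distinct choice vectors whose messages share a CRC32.

    Uses that CRC32 is affine over GF(2) for equal-length inputs:
    CRC32(build(x)) == CRC32(base) ^ XOR of d[i] over stages with x[i] == 1,
    where d[i] is the CRC change caused by flipping stage i alone.
    """
    n = len(pairs)
    base = [0] * n
    base_crc = zlib.crc32(build(pairs, base))
    basis = {}  # pivot bit -> (reduced vector, bitmask of stages combined)
    for i in range(n):
        flipped = base.copy()
        flipped[i] = 1
        vec = zlib.crc32(build(pairs, flipped)) ^ base_crc  # d[i]
        mask = 1 << i
        while vec:
            pivot = vec.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (vec, mask)
                break
            bvec, bmask = basis[pivot]
            vec ^= bvec
            mask ^= bmask
        else:
            # vec reduced to 0: flipping the stages in mask leaves CRC32 unchanged
            return base, [(mask >> j) & 1 for j in range(n)]
    return None  # no dependency found; cannot happen once n > 32


# Stand-in blocks: pseudo-random 128-byte pairs, NOT real MD5 collisions.
pairs = [(hashlib.sha512(b"A%d" % i).digest() * 2,
          hashlib.sha512(b"B%d" % i).digest() * 2) for i in range(33)]
x, y = crc32_collision_in_multicollision(pairs)
m1, m2 = build(pairs, x), build(pairs, y)
print(m1 != m2, zlib.crc32(m1) == zlib.crc32(m2))  # True True
```

With 33 stages of 128 bytes each, the messages are 4224 bytes long, which matches the "about 4 kiB" noted above.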

Neither CRC32 nor MD5 is cryptographically secure. MD5 has known collision weaknesses and is therefore no longer considered cryptographically secure. And CRC32 isn't even a hash… it's a “cyclic redundancy check” algorithm, which produces an “error-detecting code”. Cyclic redundancy checks are not, and were never meant to be, cryptographically secure.

Even if they were, Dropbox doesn't base its file storage on a checksum and/or a collision-prone hash. It's not as if they simply take your upload, cut it up into 4 MB parts and throw it into MD5 to prevent duplicates; they would have drowned in chaos if they had done so. The way they handle file storage involves smarter techniques such as De-Duplication (with 256-bit block checksums).

Rumours confirm that Dropbox may be using raw SHA256 hashes to “uniquely” identify data, and some articles explain how this can be exploited in a number of ways. Also, SHA256, SHA1 and MD5 checksums have been seen alongside download links, which rules out their relying on CRC32 and/or MD5 alone. A practical analysis of Dropbox came to the same conclusion. But since we cannot peek inside the box, it's hard to tell exactly what we're looking at. All we know is what Dropbox has published… which isn't much when it comes to the technologies they use to handle such an amount of data in a (somewhat) optimal way. Still, it's not hard to see that it's stronger than your CRC32 + MD5 assumption.
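To make the de-duplication idea concrete, here is a toy content-addressed block store in Python. Everything about it is an assumption for illustration: fixed 4 MiB blocks (taken from the question), SHA-256 as the block key (as in the rumoured scheme), and the class name `DedupStore`; it is not Dropbox's actual design. It also shows why the key's collision resistance matters: if two different blocks ever shared a key, `setdefault` would silently keep the first one and `get` would return the wrong data.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB blocks, as in the question


class DedupStore:
    """Toy content-addressed store keyed by SHA-256 of each block."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block bytes (stored once)

    def put(self, data: bytes, block_size: int = BLOCK_SIZE) -> list:
        """Store data block by block; return the list of block keys."""
        keys = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            key = hashlib.sha256(block).digest()
            self.blocks.setdefault(key, block)  # duplicate blocks are not re-stored
            keys.append(key)
        return keys

    def get(self, keys) -> bytes:
        """Reassemble data from its block keys; knowing the keys is enough."""
        return b"".join(self.blocks[k] for k in keys)


store = DedupStore()
a, b = b"A" * 10, b"B" * 10
keys1 = store.put(a + b, block_size=10)  # tiny blocks keep the demo small
keys2 = store.put(b + a, block_size=10)  # same blocks in a different order
print(len(store.blocks))                 # 2
print(store.get(keys2) == b + a)         # True
```

Note that `get` needs nothing but the list of block keys, which is the property behind the rumoured "download any file if you know its hash" issue.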

Anyway… setting aside Dropbox-related speculation and getting back to the more important part of your question: for file-integrity checking, companies like Tripwire explicitly perform both an MD5 and a CRC32 check on a file to detect a change, because it is harder to find a collision that matches two different algorithms at once. From that point of view, the combination may be practical, as it increases collision resistance. Personally, though, I think it only lowers the chance of a collision slightly. Therefore, I would not prefer a CRC32 and MD5 combination over (for example) SHA-3. But if I needed more collision resistance and combining a CRC with a hash were the only available option, I would probably opt for the CRC32 and MD5 combination, as it catches every change that MD5 or CRC32 alone would catch, and some that each of them would miss.

e-sushi
  • I disagree with " using a combination of CRC32 and MD5 will only be as strong as it's weakest part ". Change MD5 to SHA-512 in this statement and now, just because you can break CRC32, you can break CRC32||SHA-512? That's not correct. It remains that the collision resistance of CRC32||MD5 is only marginally better than that of MD5, which is bad. – fgrieu Mar 04 '14 at 14:04
  • Concerning your last paragraph, it really depends on what you want to achieve. E.g. if you want to find a collision for both schemes applied to the same message, then you actually have to break both to be successful. – tylo Mar 04 '14 at 14:05
  • @e-sushi: You give a single link stating that " you can download any file from Dropbox's servers if you know its Dropbox hash, which apparently is a sequence of SHA256 hashes of 4MB blocks "; but you mention rumors (plural) that confirm something similar. Can you point to the original statement, other substantiated rumor(s), or anything that substantiates such a claim? – fgrieu Mar 04 '14 at 14:18
  • @fgrieu You're right… that last part was pretty unclear. That's what happens when you start out by posting a two-liner and then catch yourself editing the answer over and over again just to add all the pieces you think you forgot. ;) Anyway, I've edited my answer into what I think now resembles a “final version”. Hope the last paragraph now makes more sense to you too? If not, please feel invited to tell me where it might be lacking some nuts and bolts. – e-sushi Mar 04 '14 at 14:28
  • @fgrieu Did you check that De-Duplication link? Mainly the talk about “…lock-level dedup also maps naturally to ZFS's 256-bit block checksums, which provide unique block signatures for all blocks in a storage pool as long as the checksum function is cryptographically strong (e.g. SHA256)…” and “…namely, to use the 256-bit block checksums in ZFS as hash signatures for dedup… ” point in the same direction. And some Sec-Alerts hint at the same. – e-sushi Mar 04 '14 at 14:36
  • @fgrieu Note that I wrote “a combination of CRC32 and MD5” as that was what OP asked about. I did not write “any hash combination”… as you've interpreted it. I'm sure we agree there's a difference, as you point to that difference yourself in the argumentation of your first comment. Also, I added theoretically as I looked at it from a cipher-combining point of view (weakest-cipher influence) at first. But I also noted that Tripwire and Co use CRC32+MD5 and practically show it might indeed be more secure than just MD5. – e-sushi Mar 04 '14 at 16:28
  • @e-sushi: Do you mean that I interpret " CRC32 and md5 in combination " of the question as the concatenation of CRC32 and MD5, when other forms of combination are conceivable? Even if that were so, concatenation remains a possibility, and it leads to a hash that is significantly stronger than CRC32 (with the exception of the first-preimage problem when hashing less than about 8 bytes). Thus, in the answer, " using a combination of CRC32 and MD5 *could theoretically only be as strong* as it's weakest part and that weakest part would be CRC32 " is wrong. – fgrieu Mar 04 '14 at 16:39
  • @fgrieu No… I didn't think about concatenation. If anything, I personally would state that if you're using a CRC for anything other than a communications checksum, it's being abused. But chances are that we're hitting a language barrier, because I have a feeling we're talking about different sides of a coin (or maybe it's simply that I'm still recovering from my birthday and need sleep). Trusting that you're the smarter crypto-connoisseur, I've removed the part that I think you disagreed with most. I'll look at my answer again tomorrow morning; sleep can do wonders. ;) – e-sushi Mar 04 '14 at 17:27