
I understand that the following is becoming feasible, or already is:

Find any 2 data (d1 and d2), for which SHA1(d1) = SHA1(d2)

However, it is not entirely clear to me if there is evidence of the feasibility of:

Find d2 for a specific d1, such that SHA1(d1) = SHA1(d2)

My difficulty in understanding the available literature is that I typically see the attack referred to as "seeking collisions" rather than "seeking a collision". This implies that what is being identified is two data that happen to share a SHA-1 hash, rather than a datum that shares its SHA-1 hash with a specific target datum.

EDIT: My question is partially redundant to, though more specific in purpose than, this question: "Second pre-image resistance vs Collision resistance"

  • By the way: withstanding the second attack is known as second preimage resistance. – yyyyyyy Jan 15 '15 at 07:22
  • Thanks, I found another question thread using the specific term you provided. – NickBorgers Jan 15 '15 at 18:53
  • @yyyyyy: Yes, but incorrectly, since for second preimage resistance, d1 is chosen uniformly at random from the set of strings whose length is length(d0) (where d0 is the adversary's initial output). In particular, any collision trivially allows one to find a d1 and an easy way to perform the OP's task for that d1. –  Jan 16 '15 at 01:40

1 Answer


Your understanding is correct. As far as I know, there is no publicly known attack on SHA-1 that, for a given $d1$, finds a $d2$ such that $h(d1) = h(d2)$.

What SHA-1 is vulnerable to is finding some pair of $d1$ and $d2$ such that $h(d1) = h(d2)$, where the attacker is free to choose both.

The same, however, applies to MD5, and MD5 is nevertheless unusable for SSL/TLS certificates: a successful attack on them has been performed. So this kind of collision can allow (under some circumstances) more than you might think.

The notable difference between SHA-1 and MD5 attacks is the cost of such attacks. For MD5, the attack is very fast. For SHA-1, the attack would be very expensive, but for some groups (e.g. governments) feasible.

(There might also be other differences that I am not aware of. However, both of them use the Merkle–Damgård construction, which implies some common security characteristics.)
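To make the difference between the two tasks concrete, here is a minimal sketch in Python. It is not an attack on SHA-1: it brute-forces a toy hash made by truncating SHA-1 to 16 bits. The names `h`, `find_collision`, `find_second_preimage` and the constant `BITS` are made up for this illustration.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size (assumption for illustration; real SHA-1 has 160 bits)


def h(data: bytes) -> int:
    """Toy hash: the first BITS bits of SHA-1(data)."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - BITS)


def find_collision():
    """Birthday search: find ANY pair d1 != d2 with h(d1) == h(d2)."""
    seen = {}  # maps toy hash value -> first input that produced it
    for i in count():
        d = i.to_bytes(8, "big")
        t = h(d)
        if t in seen:
            return seen[t], d, i + 1  # the pair and the number of inputs hashed
        seen[t] = d


def find_second_preimage(d1: bytes):
    """Exhaustive search: find d2 != d1 with h(d2) == h(d1) for a GIVEN d1."""
    target = h(d1)
    for i in count():
        d2 = b"x" + i.to_bytes(8, "big")
        if d2 != d1 and h(d2) == target:
            return d2, i + 1  # the second preimage and the number of inputs hashed


if __name__ == "__main__":
    a, b, tries = find_collision()
    print("collision after", tries, "hashes:", a.hex(), b.hex())
    d2, tries = find_second_preimage(b"specific target datum")
    print("second preimage after", tries, "hashes:", d2.hex())
```

On this toy, the collision search typically needs a few hundred hashes, while the second-preimage search needs on the order of $2^{16}$. For an ideal 160-bit hash the corresponding generic costs are about $2^{80}$ and $2^{160}$. The known SHA-1 attacks improve on the first figure, not the second.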

v6ak
  • Interesting, I hope others weigh in to confirm. Related to the cost distinction, do you think it is more attributable to the additional 32 bits of length for a SHA-1 hash or to the essential weakness in MD5's collision-resistance characteristics? – NickBorgers Jan 14 '15 at 19:49
  • Added a note about Merkle–Damgård construction. – v6ak Jan 14 '15 at 20:26
  • @user2700751: Hmm… Just some intuition: the additional 32 bits would likely make it harder by a factor of just $2^{16}$, i.e. $65,536$, because of the birthday paradox. That factor, however, applies to brute-force attacks. Since there are more efficient attacks, the factor is likely to be smaller. Generating a SHA-1 collision is said to cost 700,000 USD by 2015 (see https://casecurity.org/2014/11/18/the-cost-of-creating-collisions-using-sha-1/). Generating an MD5 collision is almost free, say 0.01 USD. That would mean it is 70,000,000 times harder to find a SHA-1 collision than an MD5 collision. – v6ak Jan 14 '15 at 20:40
  • @user2700751 If we accept the numbers $65,536$ and $70,000,000$, it would mean that the additional 32 bits make it $65,536$ times harder, while the better construction makes it roughly $1,000$ times harder. There are, however, various pitfalls, namely: 1. The cost of an MD5 collision was very roughly guessed. 2. The additional 32 bits seem to have added much less security. (This seems to apply to all Merkle–Damgård constructions.) 3. MD5 is slightly (but negligibly for these purposes) faster. I guess that the better design has added more security than the additional 32 bits, but I am not sure. – v6ak Jan 14 '15 at 20:50
  • @v6ak I can generate an MD5 collision using an Amazon EC2 free trial, so it costs $0.00, meaning it would be infinitely harder to find a SHA1 collision. (My point being that that sort of comparison isn't very useful!) – Reid Rankin Jan 15 '15 at 02:36
  • @MrNerdHair Difficulty in cryptography isn't ordinarily measured in USD. Even if it were, it's not about how much it actually costs on the actual market, it'd be about an idealization of what it costs (the fact that Amazon will give everyone, for free, at least as much money as it costs to break MD5, doesn't mean that breaking it is in fact free). – cpast Jan 15 '15 at 07:19
  • USD might not be the ideal unit for comparison, but it might be OK under some conditions. If you compare how many collisions you can achieve with 700,000 USD, provided that you don't reuse precomputed data across collisions, it seems OK for a rough comparison. Using an EC2 trial is thus not OK, but estimating a low price for an MD5 collision (0.01 USD was probably an overestimate) seems OK for a rough comparison. Note that I've used the numbers that are available without too much effort. – v6ak Jan 15 '15 at 08:55
  • More inaccuracy seems to come from the $2^{16}$ assumption. The factor is said to be generally much lower for the Merkle–Damgård construction. Moreover, if a hash has some weakness, it can be even lower. That is actually why I believe that good design has contributed more to security than the additional 32 bits. – v6ak Jan 15 '15 at 08:58
  • The correlation between bit length and "costs" as some security measure is nothing that could be generalized in a meaningful way. At worst, thought processes like that can be dangerous pitfalls. The statement with the factor $2^{16}$ is valid for brute-force attacks only. It has absolutely no meaning for sophisticated attacks, which target a specific algorithm. – tylo Jan 16 '15 at 15:43
  • @tylo Sure, it can cause confusion. In general, it should work as a valid upper bound. An upper bound does not have "absolutely no meaning". We can't use it for claiming "this algorithm is that secure", but rather for claiming "this algorithm is at most that secure". I've used it in a modified form for a rough and very informal comparison of two algorithms. – v6ak Jan 16 '15 at 17:30
  • That is only true for brute-forcing an algorithm, and there you can argue about the size of the keyspace. But any advantage beyond that comes from unique weaknesses in specific algorithms, regardless of whether this is some correlation of values (linear and differential cryptanalysis) or lies in the underlying mathematical problem (factoring for RSA). Even for common standards like SHA and MD5, both are considered broken, but the attacks differ quite a lot. There is a reason why every new hash function or symmetric cipher has to withstand a lot of cryptanalysis before being used. – tylo Jan 19 '15 at 10:20
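For readers who want to check the back-of-the-envelope arithmetic in the comments above, here is a small Python sketch. The inputs are exactly the figures quoted there (700,000 USD for a SHA-1 collision, a guessed 0.01 USD for an MD5 collision), so the result inherits all the caveats raised in the thread. The variable names are made up for this illustration.

```python
# Digest sizes: SHA-1 has 160 bits, MD5 has 128 bits.
EXTRA_BITS = 160 - 128

# Generic birthday bound: n extra output bits cost a factor of 2**(n/2)
# for brute-force collision search.
birthday_factor = 2 ** (EXTRA_BITS // 2)

# Cost figures from the comments, in cents to keep the arithmetic exact.
sha1_cost_cents = 700_000 * 100  # quoted estimate for a SHA-1 collision
md5_cost_cents = 1               # the commenter's rough guess of 0.01 USD

cost_ratio = sha1_cost_cents // md5_cost_cents

# Whatever is left after removing the birthday factor is what the comment
# attributes to "better construction".
design_factor = cost_ratio / birthday_factor

print("birthday factor:", birthday_factor)
print("cost ratio:", cost_ratio)
print("remaining factor:", round(design_factor))
```

The remaining factor comes out a little above 1,000, which matches the "$1,000$ times harder" figure in the comment. As the later comments point out, it is only as meaningful as the guessed MD5 cost and the brute-force assumption behind $2^{16}$.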