These hashes are sometimes called "password hashes", because they are designed to protect against exactly the threat model you mention: someone getting hold of a copy of your password database and brute-forcing it. A subset of them is also known as "password-based key-derivation functions" (PBKDFs).
Scrypt is a relatively new but widely known and widely used one. It was created to address a weakness of the more established PBKDF2 and bcrypt algorithms: both can be drastically sped up by custom ASICs or FPGAs, and easily parallelized on a GPU.
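If you just want to use scrypt rather than study it, Python's standard library exposes it directly. Here is a minimal sketch; the cost parameters (`n=2**14`, `r=8`, `p=1`) and the 16-byte salt are illustrative choices of mine, not a recommendation from the scrypt authors, so tune them for your own hardware and latency budget:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a password hash with scrypt via Python's standard library.

    The cost parameters below are illustrative, not a recommendation.
    """
    if salt is None:
        salt = os.urandom(16)          # fresh random salt per password
    key = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,   # CPU/memory cost: the size of the large vector
        r=8,       # block size
        p=1,       # parallelization factor
        dklen=32,  # length of the derived key/hash
    )
    return salt, key

# Verify by re-deriving with the stored salt and comparing in constant time.
salt, stored = hash_password("correct horse battery staple")
_, candidate = hash_password("correct horse battery staple", salt)
print(hmac.compare_digest(stored, candidate))  # True
```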
The central innovation in scrypt is a very large pseudo-random bitvector, which in turn is accessed very often in a pseudo-random fashion. What this means is that the standard "trick" for changing the performance characteristics of code, the space-time trade-off, is expensive in both directions. In particular, the very large bitvector makes the algorithm hard to parallelize, since you will either have lots of computing elements thrashing the memory bus (limiting the parallel speedup) or lots of copies of the very large bitvector in the individual computing elements (making parallelism expensive). The pseudo-random access pattern also ensures that branch prediction, memory prefetching and similar cache-miss-reducing optimizations are useless, and the size of the bitvector ensures that you will blow each and every cache you throw at it. (A toy sketch of this structure follows below.)
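To make the two phases concrete, here is a drastically simplified sketch of the idea (scrypt calls this part ROMix). It is *not* real scrypt: SHA-256 stands in for scrypt's actual BlockMix/Salsa20/8 core, and all sizes are toy values. The point is only to show the shape: first build a large vector, then read it at data-dependent, effectively unpredictable indices.

```python
import hashlib

def toy_romix(seed, n=2**14):
    """Toy sketch of scrypt's memory-hard core (not real scrypt)."""
    x = hashlib.sha256(seed).digest()

    # Phase 1: fill a large vector V with a pseudo-random hash chain.
    v = []
    for _ in range(n):
        v.append(x)
        x = hashlib.sha256(x).digest()

    # Phase 2: n more rounds, each reading V at an index derived from the
    # current state -- which element is needed next cannot be known ahead
    # of time, so the whole vector has to stay available.
    for _ in range(n):
        j = int.from_bytes(x[:8], "little") % n   # data-dependent index
        x = hashlib.sha256(bytes(a ^ b for a, b in zip(x, v[j]))).digest()
    return x
```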
Theoretically, since both the bitvector and the access pattern are "only" pseudo-random, they are fully determined by the algorithm and its inputs.
Ergo, you could reduce the memory requirement by computing everything on the fly and not keeping the bitvector in memory at all. However, the algorithm is designed so that this recomputation is itself very slow, and the same elements are accessed over and over again (in an order you cannot easily predict), so you would end up re-computing the same elements again and again. OTOH, you could reduce the time requirement by pre-computing all possible values, but then the memory requirements explode.
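Continuing the toy sketch above (again, an illustration of the principle, not actual scrypt): the low-memory variant below computes the same function as `toy_romix`, but recomputes each vector element from the seed on demand instead of storing it. Each of the n pseudo-random reads then costs up to n hash evaluations, so the total work grows from roughly 2n to roughly n²/2, which is exactly the kind of trade-off the design wants to make prohibitive.

```python
import hashlib

def toy_romix_low_memory(seed, n=2**14):
    """Same toy construction as above, but without storing the vector V.

    Every lookup of V[j] re-walks the hash chain from the seed, so the
    memory saving is paid for with quadratic recomputation.
    """
    def v_at(j):
        y = hashlib.sha256(seed).digest()
        for _ in range(j):                 # re-derive element j from scratch
            y = hashlib.sha256(y).digest()
        return y

    x = v_at(n)                            # same starting state as before
    for _ in range(n):
        j = int.from_bytes(x[:8], "little") % n
        x = hashlib.sha256(bytes(a ^ b for a, b in zip(x, v_at(j)))).digest()
    return x
```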
Either way, the trade-off is prohibitively expensive: you have no choice but to use both large memory and lots of CPU cycles.
Basically, you can think of the two pseudo-random sequences as a single, very complex pseudo-random sequence, and of the bitvector as a cache for it. But the "cache" is designed in such a way that you can neither remove it and make up for that with increased processing speed or parallelism, nor expand it to save processing time at the cost of increased memory usage.