Can you break PBKDF2 if you know the hash of the password?

Question

I know that PBKDF2 hashes the password a number of times, the result being a key. Can an attacker find the key if they don't know the password, but know the value of the hash of the password?

Possibly relevant: https://mathiasbynens.be/notes/pbkdf2-hmac — CodesInChaos, May 26 '22 at 09:35
What is the strength of the password? What is the output size for the key? What is the iteration count? — kelalaka, May 26 '22 at 10:14
Are you asking this: the login system uses PBKDF2 and at the same time generates keys from the user's password; is it safe if the attacker gets access to the PBKDF2 hash of the password? — kelalaka, May 26 '22 at 16:00

fgrieu · Answer 1 · 2022-05-27T05:15:28.407

When the question's “hash” is an internal component of PBKDF2

Only this first section of the answer assumes that the question's “know the value of the hash of the password” uses “hash” for the hash function $H$ used to build PBKDF2, e.g. SHA-1, SHA-256, SHA-512.

Then there's a special case where knowing the hash of the password is a break in itself. That's when the representation as bytes of the password exceeds the block size of that hash: 64 bytes, or 128 bytes for SHA-512. UTF-8 and other multi-byte encodings make this plausible. In this case, and when PBKDF2 uses $\operatorname{HMAC}(H)$ as its PRF (as customary), by definition of PBKDF2 and HMAC $$\operatorname{PBKDF2}(\mathsf{password},\mathsf{salt},\mathsf{count},\mathsf{dkLen})=\operatorname{PBKDF2}(H(\mathsf{password}),\mathsf{salt},\mathsf{count},\mathsf{dkLen})$$ Therefore, for such a long-enough password, the known $H(\mathsf{password})$ is in principle just as valid a password as the unknown $\mathsf{password}$. If $H(\mathsf{password})$ passes the input filter used for $\mathsf{password}$ on the system under attack (which is not impossible), that can be enough for the attacker. See this for illustration.
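The equality above can be checked with a few lines of Python. This is a minimal sketch, assuming PBKDF2 with HMAC-SHA-256 as exposed by the standard library's `hashlib.pbkdf2_hmac`; the password, salt, and iteration count are made-up illustration values.

```python
import hashlib

# Illustration values only (assumptions, not from any real system).
salt = b"example-salt"
count = 1000
dk_len = 32

# 87 bytes once encoded, which exceeds the 64-byte block size of SHA-256.
long_password = ("correct horse battery staple " * 3).encode("utf-8")

# HMAC replaces a key longer than the block size by its hash, so
# PBKDF2-HMAC-SHA-256 sees H(password) instead of the password itself.
hashed_password = hashlib.sha256(long_password).digest()

dk_from_password = hashlib.pbkdf2_hmac("sha256", long_password, salt, count, dk_len)
dk_from_hash = hashlib.pbkdf2_hmac("sha256", hashed_password, salt, count, dk_len)
long_case_matches = dk_from_password == dk_from_hash

# For a password no longer than the block size, the equality does not hold.
short_password = b"hunter2"
short_case_matches = hashlib.pbkdf2_hmac(
    "sha256", short_password, salt, count, dk_len
) == hashlib.pbkdf2_hmac(
    "sha256", hashlib.sha256(short_password).digest(), salt, count, dk_len
)

print(long_case_matches, short_case_matches)
```

The 32-byte digest is raw binary, so whether it can actually be typed in as a password depends on the input filter of the target system, as noted above.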

Also, regardless of password length, if attackers know $H(\mathsf{password})$, then they can run password cracking (as in the rest of this answer) on that fast hash function $H$, rather than on PBKDF2. This effectively negates the protection offered by PBKDF2 and its iteration count parameter.
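As a rough sketch of why this matters, the loop below tests each candidate with a single SHA-256 call instead of a full PBKDF2 computation. The function name `crack_via_inner_hash`, the candidate list, and the choice of SHA-256 as $H$ are assumptions for illustration; real tools are far more optimized.

```python
import hashlib

def crack_via_inner_hash(leaked_hash: bytes, candidates):
    """Return the first candidate whose SHA-256 equals leaked_hash, else None.

    Each guess costs one fast hash, independent of PBKDF2's iteration count.
    """
    for guess in candidates:
        if hashlib.sha256(guess.encode("utf-8")).digest() == leaked_hash:
            return guess
    return None

# Hypothetical leak: the attacker somehow knows H(password).
leaked = hashlib.sha256("letmein".encode("utf-8")).digest()
wordlist = ["123456", "password", "letmein", "qwerty"]

found = crack_via_inner_hash(leaked, wordlist)
print(found)

# Once the password is found, PBKDF2 is computed only once to get the key.
if found is not None:
    key = hashlib.pbkdf2_hmac("sha256", found.encode("utf-8"), b"example-salt", 100_000, 32)
```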

When the question's “hash” is PBKDF2

Many practitioners of PBKDF2 call it a password hash, and would understand “know the value of the hash of the password” as meaning the output of PBKDF2 is known. That would also imply the salt and iteration count inputs of PBKDF2 are known, because salt and iteration count are typically part of what practitioners call the hash, or stored alongside it, or (for the iteration count) a public or guessable constant. I assume this from now on.

Then in practice often yes, the¹ password can be found with enough effort.

Attackers essentially test passwords from a list of common passwords, approximately from most to least common², using the algorithm normally used to check a password, only implemented in an optimized way. A typical attack tool is hashcat. It can use CPUs, GPUs, and (with appropriate extensions) FPGAs or ASICs in parallel. The speedup compared to the time it takes to compute PBKDF2 in legitimate use is formidable. That's the industry of password crackers, which I conjecture is like the proverbial iceberg: mostly invisible.
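The core of such an attack can be sketched as follows. This is an unoptimized illustration, assuming PBKDF2-HMAC-SHA-256 and a known salt and iteration count; `crack_pbkdf2` and the sample values are hypothetical, and tools like hashcat do the same thing massively in parallel.

```python
import hashlib

def crack_pbkdf2(target_dk: bytes, salt: bytes, count: int, candidates,
                 hash_name: str = "sha256"):
    """Try candidates in order (most common first); return the match or None.

    Each guess costs one full PBKDF2 computation, so the total work grows
    with both the number of guesses and the iteration count.
    """
    for guess in candidates:
        dk = hashlib.pbkdf2_hmac(hash_name, guess.encode("utf-8"), salt, count, len(target_dk))
        if dk == target_dk:
            return guess
    return None

# Hypothetical stolen record: PBKDF2 output plus its salt and iteration count.
salt = b"per-user-salt"
count = 1000
stolen_dk = hashlib.pbkdf2_hmac("sha256", b"qwerty", salt, count, 32)

common_passwords = ["123456", "password", "qwerty", "letmein"]
recovered = crack_pbkdf2(stolen_dk, salt, count, common_passwords)
print(recovered)
```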

The difficulty depends immensely on the quality of the password (how hard it is to guess), and almost linearly on the number of rounds. RFC 2898 states:

A modest number of iterations, say 1000, is not likely to be a burden for legitimate parties when computing a key, but will be a significant burden for opponents.

(emphasis mine) but that's now extremely wrong, and was already at least ill-advised³ in 2000. I've seen this $10^3$ in production use decades later, ignoring Moore's law. I've also seen $10^4$ and $10^5$, which are progress, but still less than satisfactory.

But the truth is: the security-conscious thing to do is to drop PBKDF2 in favor of a memory-hard password hash, like Argon2, scrypt, or even the obsolete and lesser but still much preferable bcrypt. This considerably adds to the cost of password cracking (at constant burden for legitimate use), by increasing the amount of memory and the number of memory accesses an attack tool must use.
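For illustration, here is a minimal sketch of deriving a key with scrypt through Python's `hashlib.scrypt` (available when Python is built against a recent OpenSSL). The parameter values are assumptions chosen for the example, not a recommendation; with these, scrypt needs roughly $128 \cdot n \cdot r$ bytes, about 16 MiB, per evaluation.

```python
import hashlib
import os

password = b"example password"   # illustration value
salt = os.urandom(16)            # random per-password salt

# Example cost parameters (assumed): n = CPU/memory cost, r = block size,
# p = parallelism. maxmem is raised so the ~16 MiB requirement is allowed.
params = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)

derived_key = hashlib.scrypt(password, salt=salt, **params)

# Verification re-derives with the stored salt and parameters.
same_again = hashlib.scrypt(password, salt=salt, **params) == derived_key
print(len(derived_key), same_again)
```

An attacker testing guesses in parallel must provision that memory for every concurrent guess, which is what limits the advantage of GPUs and ASICs.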

¹ Or rather, a working password that with overwhelming probability (for practical $\mathsf{dkLen}$) is the original one, or for extremely long passwords is the hash of the original password for the hash function used internally by PBKDF2, often SHA-1, SHA-256 or SHA-512.

² And if they got hold of multiple password hashes, they test against all of them, which typically improves the probability of at least one success compared to attacking the password hashes one after another, since people vary widely in their ability or willingness to select a good password.

³ I'm not jumping to the conclusion that this 1000 was thrown in by RSA Security at the gentle suggestion of the NSA to deliberately weaken practical uses of PBKDF2, as with the parity in DES keys by IBM before that, or the use by RSA Security of Dual_EC_DRBG as the default RNG later. I'm only wondering. Especially when I see that NIST still endorses PBKDF2:

(our standard) recommends, but does not require, the use of a memory-hard function for password derivation.

You might want to describe PBKDF2's behaviour for passwords longer than the block size, since that'll use a particular hash of the password as input instead of the password itself. — CodesInChaos, May 26 '22 at 09:33
Knowing the internal PRF output of PBKDF2 doesn't help to break the password. Do you see something that I don't? — kelalaka, May 26 '22 at 16:05
@kelalaka: I tried to clarify the answer. I maintain that knowing the hash of the password as in the first part of my answer negates any benefit of the iteration count of PBKDF2. — fgrieu, May 26 '22 at 18:15
