How secure is using a hash function as a Key Derivation Function?

Question

I am developing a personal project and I have the following need: given a randomly generated seed (with enough entropy), I need to generate a number $x$ of Ed25519 key pairs. These key pairs are not stored, so I need to be able to regenerate exactly the same key pairs from the seed at any time.

To achieve that, I use the following algorithm to generate address number $y$:

I concatenate the seed and $y$ and calculate the SHA-3 hash: hash = SHA-3(seed+y).
I then generate an Ed25519 key pair using this hash as the seed: keyPairY = new ed25519KeyPair(hash) (ed25519KeyPair is nacl.sign.keyPair.fromSeed, which expects a seed as its argument).

How secure is this kind of algorithm?

Squeamish Ossifrage · Accepted Answer · 2018-05-04T16:53:56.817

This is reasonable as long as (a) you mean some unique encoding of the seed and index $y$ when you say seed + y, (b) you never use the same seed and index for another purpose, and (c) you chose SHA3-256, or SHAKE128-256, or SHAKE256-256, and you are using the standard Ed25519 32-byte pre-master secret seed as the secret.

For (a): if, e.g., the seed is always exactly 32 bytes, you can use concatenation. If the seed may vary in length, you might consider concatenating $n \mathbin\Vert \mathit{seed} \mathbin\Vert y$ where $n$ is the number of bytes in the seed so that there are no pairs of $(\mathit{seed}, y)$ that might be confounded by concatenation.
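As a rough illustration of the encoding $n \mathbin\Vert \mathit{seed} \mathbin\Vert y$ described above, here is a minimal Python sketch using the standard library's SHA3-256. The function name, the one-byte length prefix, and the 4-byte big-endian index are assumptions made for this example, not something the question specifies. With a fixed-width index the length prefix is strictly redundant, but it keeps the encoding unambiguous if more fields are appended later.

```python
import hashlib


def derive_ed25519_seed(master_seed: bytes, index: int) -> bytes:
    """Derive a 32-byte Ed25519 seed from (master_seed, index) with SHA3-256.

    Encoding (an assumption for this sketch):
        1 byte   length of master_seed
        n bytes  master_seed
        4 bytes  index, big-endian
    """
    if len(master_seed) > 255:
        raise ValueError("master_seed too long for a one-byte length prefix")
    if not 0 <= index < 2**32:
        raise ValueError("index must fit in 32 bits")

    encoded = bytes([len(master_seed)]) + master_seed + index.to_bytes(4, "big")
    return hashlib.sha3_256(encoded).digest()


if __name__ == "__main__":
    # Placeholder seed for demonstration only; a real seed must come from a CSPRNG.
    demo_seed = bytes(32)
    print(derive_ed25519_seed(demo_seed, 0).hex())
```

The 32-byte result would then be handed to whatever "key pair from seed" function the Ed25519 library provides, such as the nacl.sign.keyPair.fromSeed call mentioned in the question.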

Note that the answer is specific to the SHA-3 functions above. This does not apply to everything that might be called a hash function. It is certainly not true of GHASH. It doesn't even apply to SHA-256 unless you impose the additional constraint on the seed and the index that their lengths each be fixed, or that their encodings be length-prefixed. This is because given $h = \operatorname{SHA256}(\mathit{seed} \mathbin\Vert y)$, it is easy to compute $\operatorname{SHA256}(\operatorname{pad}(\mathit{seed} \mathbin\Vert y) \mathbin\Vert y')$ without knowing the seed—the standard length-extension attack on SHA-256. If you must use SHA-256 with variable-length seeds and indices (which it sounds like you needn't, but other passersby might read this), it may be simpler to just use it with HMAC to make a PRF, if not HKDF.
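For readers who are stuck with SHA-256, the HMAC route mentioned above might look like the following sketch. It keys HMAC-SHA256 with the seed and uses the index as the message, so the seed and index are never mixed into one ambiguous string and length extension does not apply. The function name and the 4-byte big-endian index encoding are assumptions for illustration.

```python
import hashlib
import hmac


def derive_seed_hmac_sha256(master_seed: bytes, index: int) -> bytes:
    """Use HMAC-SHA256 as a PRF: key = master seed, message = encoded index."""
    if not 0 <= index < 2**32:
        raise ValueError("index must fit in 32 bits")

    message = index.to_bytes(4, "big")  # fixed-width encoding (assumed)
    return hmac.new(master_seed, message, hashlib.sha256).digest()


if __name__ == "__main__":
    # Placeholder seed for demonstration only; a real seed must come from a CSPRNG.
    demo_seed = bytes(32)
    print(derive_seed_hmac_sha256(demo_seed, 7).hex())
```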

Generally, you should consider using something tailored for the purpose unless you have constraints ruling it out: KMAC, if you want a Keccak-based PRF; HKDF, if you must use SHA-2 and you want structured inputs for application and purpose labels.
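To make the HKDF option concrete, here is a small sketch of HKDF-SHA256 following the extract-then-expand construction of RFC 5869, written with only the Python standard library. The label string "example-app/ed25519-seed/" and the function derive_ed25519_seed_hkdf are hypothetical, chosen only to show how an application and purpose label could go into the info field. In practice a maintained library implementation of HKDF would be preferable to a hand-rolled one.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = 32


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: PRK = HMAC(salt, IKM); an empty salt means HASH_LEN zero bytes."""
    if not salt:
        salt = bytes(HASH_LEN)
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_ed25519_seed_hkdf(master_seed: bytes, index: int) -> bytes:
    """Derive a 32-byte seed; the purpose label below is a hypothetical example."""
    if not 0 <= index < 2**32:
        raise ValueError("index must fit in 32 bits")
    info = b"example-app/ed25519-seed/" + index.to_bytes(4, "big")
    prk = hkdf_extract(b"", master_seed)
    return hkdf_expand(prk, info, 32)
```

Changing the label yields unrelated outputs from the same seed, which is one way to satisfy the requirement that a seed and index never be reused for another purpose.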

How do you compare this algorithm to argon2? hash = argon2d(seed + y); keyPair = ed25519.fromPrivateKey(hash); — Hisko, May 04 '18 at 05:09
@Hisko The reason to use argon2 is to raise the adversary's attack costs when the seed has low entropy, e.g. a human-chosen password. If the seed was generated by a computer with >=256 bits of entropy, or if you have some per-seed salt stored alongside a seed generated with >=128 bits of entropy, then there's no reason to use argon2. — Squeamish Ossifrage, May 04 '18 at 05:19
Can you expand on why SHA3-256 is practically safer than SHA-256 with fixed length for seed or/and index? — fgrieu, May 04 '18 at 15:05
@fgrieu Better? If they're both fixed-length or length-prefixed, as far as I'm aware there's no evidence SHA-256 fails to make a good PRF; all I meant about SHA-256 is the standard length-extension attacks, defending against which requires going beyond a merely unique encoding of the (seed, y) pairs. — Squeamish Ossifrage, May 04 '18 at 15:53
