From the BIP39 spec:
From mnemonic to seed
A user may decide to protect their mnemonic with a passphrase. If a passphrase is not present, an empty string "" is used instead.
To create a binary seed from the mnemonic, we use the PBKDF2 function with a mnemonic sentence (in UTF-8 NFKD) used as the password and the string "mnemonic" + passphrase (again in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and HMAC-SHA512 is used as the pseudo-random function. The length of the derived key is 512 bits (= 64 bytes).
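To make the quoted step concrete, here is a minimal sketch of how I read it, using only Python's standard library. The function name `bip39_seed` is my own and not part of the spec, and the sketch does not validate the mnemonic against any wordlist or checksum.

```python
import hashlib
import unicodedata


def bip39_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive a 64-byte seed from a mnemonic sentence (hypothetical helper)."""
    # Both inputs are normalized to NFKD and then encoded as UTF-8 bytes.
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    # PBKDF2 with HMAC-SHA512, 2048 iterations, 512-bit (64-byte) output.
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)
```

Note that the only input on the password side is the text of the sentence itself; the bit sequence the words were generated from never enters the function.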
I assume that "(in UTF-8 NFKD)" means that the encoded characters of the mnemonic sentence are hashed, rather than the original binary data. Is this correct? If so, is there a security reason why it was done this way?
One consequence is that, if the seed is generated from the encoded words rather than from the original bit sequence, then the same bit sequence cannot be encoded in multiple languages and still give access to the same private keys. This is a significant downside for wallets that provide a language setting. If the user changes their language setting, for example, they cannot retrieve their wallet words in the new language and enter them to reach the same wallet.
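The concern can be illustrated with a toy sketch. The two four-word lists below are hypothetical stand-ins that I made up, not the real BIP39 wordlists, and `toy_seed` is a hypothetical helper that applies the quoted PBKDF2 step with an empty passphrase. The same word indices, rendered in two languages, produce different seeds because the words themselves are what gets hashed.

```python
import hashlib
import unicodedata

# Hypothetical four-word lists for illustration; NOT the real BIP39 wordlists.
TOY_WORDLISTS = {
    "english": ["apple", "bridge", "candle", "dolphin"],
    "spanish": ["manzana", "puente", "vela", "delfín"],
}


def toy_seed(indices, language):
    """Turn word indices into a sentence, then hash the sentence text."""
    sentence = " ".join(TOY_WORDLISTS[language][i] for i in indices)
    password = unicodedata.normalize("NFKD", sentence).encode("utf-8")
    # Empty passphrase, so the salt is just the string "mnemonic".
    return hashlib.pbkdf2_hmac("sha512", password, b"mnemonic", 2048, dklen=64)


indices = [2, 0, 3, 1]  # stands in for the same underlying bit sequence
seed_en = toy_seed(indices, "english")
seed_es = toy_seed(indices, "spanish")
print(seed_en != seed_es)  # prints True: same indices, different seeds
```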