How to securely store data? / How to securely encrypt a file?

Question

How do I safely encrypt a file?

or formulated differently:

How do I safely store the data of my application in a file?

Note: Using established tools isn't an option for this. The tasks need to be done at program level (using libraries and such).

Please note: the purpose of this Q&A is to have a reference to which one can point people asking the above two (or similar) questions. Thus this isn't a duplicate, nor is the question too broad, because we need a broad question + answer to answer every possible variation of this kind of question. — SEJPM, May 17 '15 at 10:40
If you want to vote on this question, please read the above note and the answer below. This question is kept very broad on purpose, to serve as a guide for a lot of different questions on how to accomplish the above two tasks. Answers other than the given one are possible if they reach the same level of completeness. — SEJPM, May 18 '15 at 18:40
I am voting to close as too broad because, seeing the answer, it's clear that there's just too much stuff to say on the topic. Besides, the answer to the question as written is “use an existing implementation such as dmcrypt, TrueCrypt, GPG, etc.”. — Gilles 'SO- stop being evil', May 18 '15 at 19:46
@Gilles, thank you for explaining the reasons for your vote. I truly appreciate this and try to do the same on every question I have to vote to close. To mitigate I may open a few new "questions" containing parts of this answer as answer (which seems indeed to be too large), so we can decide to redirect users to this (then probably closed) question or to the more specific ones. — SEJPM, May 18 '15 at 20:02
Given the scope, I suggest a chat or meta discussion on the topic first. There's a lot of good material in your answer, but I fear that it's all packed so tightly that it's difficult to find the relevant parts for any given problem. — Gilles 'SO- stop being evil', May 18 '15 at 20:23
@Gilles, thank you for the suggestion, I opened this on meta, as chat seems to be pretty "dead" on crypto. — SEJPM, May 18 '15 at 20:50

SEJPM · Accepted Answer · 2015-05-18T18:27:49.147

Ask yourself whether you can securely store the user's data at all.
This includes checking that the user can't be spied on by the root / admin user.
Also make sure that your implementation is secure.
Employ standard code review mechanisms, and make sure that your implementation has countermeasures against timing attacks (such as those known against AES).
Also make sure that you use appropriate data types.
Using a byte* allocated using malloc usually isn't very secure. If your library provides mechanisms for data storage, use them!
Your OS may also provide means to protect data while in memory; for example, the Data Protection API in Windows relies only on the current user and can be used to conceal sensitive keys in memory. Reading Schneier's Cryptography Engineering may be a very good idea, to ensure you don't have side-channel leaks.
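As one small illustration of the timing-attack point, here is a minimal sketch, assuming Python and its standard library (the helper name `tags_match` is hypothetical). It compares authentication tags with `hmac.compare_digest` instead of `==`, since an ordinary comparison may stop at the first differing byte and thereby leak how much of a guessed tag was correct.

```python
import hashlib
import hmac


def tags_match(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute HMAC-SHA-256 over `message` and compare it to `received_tag`."""
    expected_tag = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest is designed so that its running time does not reveal
    # the position of the first mismatching byte.
    return hmac.compare_digest(expected_tag, received_tag)
```

This only covers tag comparison; it says nothing about timing behaviour inside the cipher implementation itself, which is the library's job.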
Ask yourself what you can use to authenticate the user. The options are (it is explicitly allowed and recommended to combine as many of them as convenience and availability allow):
1. Passwords: Your user can supply a password or a passphrase. Do not limit its length, so that the user can choose a good, long passphrase. This is usually a good choice, as it requires the user to know something, which cannot be lost like a USB stick.
2. Keyfiles: Allow the user to provide keyfiles. These files may look arbitrary, but should be rare. Allow the user to use the cryptographic random number generator of your application to generate strong keyfiles (64 bytes or longer). Examples of keyfiles are rare documents, like personal (Word / text) documents, or purpose-generated files containing only random data (created using the application's random number generator).
3. Custom Hardware: If you are deploying at large scale, you can require the user to use a tamper-resistant hardware device. This device should support symmetric encryption with a device-held key.
4. User's Hardware: Allow the user to use their PKCS#11 (or similar) smartcard or cryptographic token that provides public-key encryption. Also allow the user to protect the key using the TPM (if available).
5. OS data: Allow the user to make decryption dependent on something stored in the current OS, like the user credentials or OS-provided secure key storage.
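For the keyfile option, generating a random keyfile can be sketched as follows. This assumes Python and uses the OS-backed `secrets` module in place of an application-specific generator; the function name `generate_keyfile` is hypothetical, and file permissions are deliberately left out.

```python
import secrets

KEYFILE_BYTES = 64  # minimum keyfile length suggested in the list above


def generate_keyfile(path: str, length: int = KEYFILE_BYTES) -> None:
    """Write `length` bytes from the OS CSPRNG to a new file at `path`."""
    if length < KEYFILE_BYTES:
        raise ValueError("keyfile should be at least 64 bytes long")
    data = secrets.token_bytes(length)
    # Mode "xb" fails if the file already exists, so an existing keyfile
    # is never silently overwritten (which would lock the user out).
    with open(path, "xb") as f:
        f.write(data)
```

Refusing to overwrite is a design choice in this sketch: losing a keyfile means losing access to everything protected by it.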
Inspect what algorithms you have at your disposal. You'll need (at worst) five classes of algorithms: Password-based key derivation functions, encryption algorithms, modes of operation, message authentication codes and cryptographically secure random number generators.
At the time of writing, the following algorithms are recommended (in descending order of preference). If a higher-ranked option is available to you, don't use a lower-ranked one.
1. Cryptographically secure random number generators:
  1. Fortuna
  2. An approved algorithm. This may either be the result of a contest or some approved algorithm by some trusted institute (e.g. ANSSI, NIST, BSI, ...). Avoid algorithms with "provable" security where the institute provides the parameters, these parameters may be chosen on purpose to get back-door access. For seeding use the cryptographically secure random number generator of your OS. They are usually a bit hidden.
  3. Some "weak" standard approach. (i.e. run a blockcipher in CTR mode and delete key and IV as soon as possible) Get the key and the IV from the cryptographically secure random number generator of your OS. They are usually a bit hidden.
2. Password based key derivation functions:
  1. The winner of the PHC-competition. If there are multiple winners, ask for the differences and use the one most suited for your needs.
  2. scrypt
  3. bcrypt
  4. PBKDF2, using an efficient and secure hash function (on your platform)
  5. implement your own PBKDF2!
3. Ciphers:
  1. The winner of the CAESAR-competition (2018+). If there are multiple winners, ask for the differences and choose the appropriate one.
  2. AES. Note: For data that needs to be confidential until 2070 or earlier, it is recommended to use AES-128. In every other case use AES-192 or AES-256.
  3. ChaCha (preferred) or Salsa20.
  4. One of the following three: Threefish, Serpent or Twofish
  5. 3DES with 3 keys. (needs 168 bits of keying material)
4. Modes of operation (this is obsolete for CAESAR-winners):
  1. OCB (If the patent has expired)
  2. GCM
  3. EAX (preferred) or CCM
  4. CTR. From this entry downwards (this one included), a message authentication code is required.
  5. One of the following: CFB or OFB
  6. ECB, avoid at all cost, rather implement your own CTR!
5. Message authentication codes:
  1. Poly1305 using a standardized construction
  2. HMAC, using a secure standardized hash function, preferably SHA-3 or a member of the SHA-2 family. Encrypt the data first, then authenticate the ciphertext!
  3. implement your own HMAC!
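To make the key derivation and "encrypt first, then authenticate the ciphertext" items concrete, here is a minimal sketch, assuming Python's standard library. It uses PBKDF2 only because `hashlib` ships it everywhere; a higher-ranked function from the list would be preferable where available. The function names are hypothetical, and the ciphertext is treated as opaque bytes produced by whichever cipher and mode you picked.

```python
import hashlib
import hmac


def derive_keys(password: str, salt: bytes, iterations: int) -> tuple[bytes, bytes]:
    """Derive separate encryption and MAC keys with PBKDF2-HMAC-SHA-512."""
    material = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, iterations, dklen=64
    )
    # First half encrypts, second half authenticates; the two are never reused
    # for each other's purpose.
    return material[:32], material[32:]


def authenticate_ciphertext(mac_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: the tag covers the nonce and ciphertext, not the plaintext."""
    # The nonce has a fixed length in a given format, so plain concatenation
    # is unambiguous here.
    return hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
```

On decryption, the tag would be recomputed and checked before any ciphertext is decrypted. With an authenticated mode such as GCM or EAX the separate MAC step is not needed.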
Ask yourself how to design the header. The header should contain:
- The salt, a random value of 64 bytes or more that is freshly chosen every time a new file is created. This will be fed into the password-based key derivation function. The salt can be stored openly, but may be authenticated using the authentication mechanism.
- The parameters for the password-based key derivation function. These parameters need to be stored openly. They should be chosen such that key derivation takes approximately 100 ms to 1 s on the target platform. A value indicating which password-based key derivation function is used is considered a parameter.
- A nonce or an IV for the cipher / mode. This value should be chosen at random and be stored openly.
- The remainder of the header should be encrypted and authenticated.
  The aforementioned nonce/IV should be used along with the key derived from the password using the password-based key derivation function and the stored salt and parameters.
- The master keys for the data part of the file. These should be random and of maximal length. They are part of the encrypted part of the header.
- A nonce / IV for the data part. This value should be random. This value can (and should) be part of the encrypted part of the header.
The actual data is now encrypted using the cipher/mode/authentication triple chosen above and the master key and nonce/IV stored in the encrypted part of the header.
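The open (unencrypted) part of such a header can be sketched as below, assuming Python's `struct` module. The field order, the magic marker, the KDF identifier and the 16-byte nonce length are all hypothetical choices for illustration, not a format defined in this answer; the encrypted part holding the master keys and data nonce would follow these bytes.

```python
import secrets
import struct

MAGIC = b"EXMPL1"        # hypothetical file-format marker
KDF_PBKDF2_SHA512 = 1    # hypothetical identifier for the KDF in use
SALT_LEN = 64            # salt length suggested in the header description
NONCE_LEN = 16           # assumed nonce length; depends on the cipher/mode

# magic | KDF id (1 byte) | iteration count (4 bytes) | salt | header nonce
# ">" means big-endian with no padding between fields.
OPEN_HEADER = struct.Struct(f">6sBI{SALT_LEN}s{NONCE_LEN}s")


def new_open_header(iterations: int) -> bytes:
    """Build the open header with a fresh random salt and nonce."""
    salt = secrets.token_bytes(SALT_LEN)
    nonce = secrets.token_bytes(NONCE_LEN)
    return OPEN_HEADER.pack(MAGIC, KDF_PBKDF2_SHA512, iterations, salt, nonce)


def parse_open_header(blob: bytes) -> dict:
    """Read the open header from the start of a file's bytes."""
    magic, kdf_id, iterations, salt, nonce = OPEN_HEADER.unpack(
        blob[: OPEN_HEADER.size]
    )
    if magic != MAGIC:
        raise ValueError("not a recognised file")
    return {"kdf_id": kdf_id, "iterations": iterations, "salt": salt, "nonce": nonce}
```

Storing the KDF identifier and its parameters in the file lets the cost be raised for new files later without breaking old ones.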

Now some questions that may arise:

Is it worth it to use multiple encryption?
No.
The benefit of double encryption is only about 1 bit of security, due to the meet-in-the-middle attack. Stream ciphers may be easier to combine, but this is absolutely unnecessary, as all the ciphers listed above are expected to remain unbreakable for the foreseeable future (50+ years).
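To spell out the 1-bit figure, here is a rough sketch. It assumes a block cipher with two independent $k$-bit keys $K_1, K_2$ and a known plaintext/ciphertext pair $(P, C)$, and it ignores both the memory needed for the lookup table and the few extra pairs needed to discard false matches. The symbols are illustrative.

$$C = E_{K_2}(E_{K_1}(P)) \iff E_{K_1}(P) = D_{K_2}(C)$$

An attacker can tabulate $E_{K_1}(P)$ for all $2^k$ values of $K_1$, then compute $D_{K_2}(C)$ for all $2^k$ values of $K_2$ and look for matches in the table:

$$\underbrace{2^k}_{\text{encryptions}} + \underbrace{2^k}_{\text{decryptions}} = 2^{k+1} \ll 2^{2k}$$

So the attacker's work grows from about $2^k$ to about $2^{k+1}$ cipher operations, i.e. one extra bit, rather than the $2^{2k}$ that the combined key length would suggest.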
What authentication methods should I use?
As many as possible, but at least a password.
You need to have a very good reason not to use passwords because everything else can be attacked with admin rights or can be stolen.
You didn't mention the other four authentication methods in the header part, how should I incorporate them?
1. Keyfiles: Assume that each keyfile has low entropy. That being said, you can't really do a lot about it. Derive a key from every keyfile using some digest-expansion function and your favorite secure hash function (SHA-2/3), and construct one big XOR of all the files' hashes and the key derived from the user's password. This looks like: $Key = PBKDF(PW) \oplus H(File_1) \oplus H(File_2) \oplus \dots$, where $\oplus$ denotes bitwise XOR.
2. Custom Hardware: At the creation of the file, let the device generate a symmetric key that will be kept inside the device. Perform everything up to the keyfile / password level (see the point directly above) and then send the result to the device for encryption. The key being used doesn't have to be unique for each file. It is suggested that one key be used for all files. It is also possible to add an identifier to the file to identify the associated hardware key.
3. User's Hardware: Let the device generate an OpenPGP key or a valid S/MIME encryption certificate. Send the key resulting from the step directly above (or two steps above) to the device for decryption. It will decrypt it and return the key used to decrypt the master keys.
4. OS data: If you have user credentials (maybe some random strings), hash them and incorporate them at the same time in the same way as the keyfiles. If your OS provides you secure storage (or your TPM does), store a key and pass it in as "keyfile", thus hash it and incorporate it in the XOR step.
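The XOR combination of the password-derived key with the keyfile hashes can be sketched like this, assuming Python's standard library, PBKDF2-HMAC-SHA-256 as the key derivation function and SHA-256 as $H$. With a 32-byte key the hash output already has the right length, so no digest expansion is needed in this sketch. The names `xor_bytes` and `combine_key` are hypothetical.

```python
import hashlib

KEY_LEN = 32  # bytes; matches the SHA-256 output length


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def combine_key(password: str, salt: bytes, iterations: int,
                keyfile_contents: list[bytes]) -> bytes:
    """Key = PBKDF(PW) xor H(File_1) xor H(File_2) xor ..."""
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=KEY_LEN
    )
    for content in keyfile_contents:
        # Only the file contents are hashed, so renaming a keyfile changes nothing.
        key = xor_bytes(key, hashlib.sha256(content).digest())
    return key
```

Because XOR is commutative, `combine_key(pw, salt, n, [a, b])` and `combine_key(pw, salt, n, [b, a])` give the same key. One consequence to be aware of is that supplying the same keyfile twice cancels it out, so an application may want to reject duplicates.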
Should I use an open source library or some closed-source library?
Open-Source
Chances are very good that there aren't any security issues in well-known open-source libraries, whereas closed-source libraries restrict when you can use them, usually have to be paid for, and usually can't be inspected to make sure the code is secure.
Why did you use XOR to include keyfiles? (relates to question 3)
Convenience.
If you don't use something that is commutative, like XOR, users will run into problems because files have to be supplied in exactly the same order. This gets even harder if files are merely renamed while their contents are kept. From a security perspective this is no less secure, as one basically constructs a "stream cipher" from the output of the password-based key derivation function, the hashes of the keyfiles and the other outputs (TPM, credentials, ...), which is secure as long as one source is secure.

If anyone knows anything that relates to this, please comment and I'll incorporate it (like other means of authentication, other possible questions, ...). Note: biometric methods are excluded because they rely on software-driven access control, due to the slight variation in the measured data each time. — SEJPM, May 17 '15 at 10:42
"implement ur own PBKDF ?" or "HMAC" , not sure if its good idea ? definitely not — sashank, May 17 '15 at 17:08
@sashank What is meant is that, provided you have a crypto library providing hash functions, PBKDF2 and HMAC are relatively easy to implement. But this case should be extremely rare. I'll add "implement your own CTR" instead of ECB. — SEJPM, May 17 '15 at 17:10
Doing your own custom crypto is usually not so good advice . https://www.schneier.com/blog/archives/2015/05/amateurs_produc.html — sashank, May 17 '15 at 17:19
@sashank, I'm aware of "don't roll your own crypto", but I think recommending highly standardized, easy and secure mechanisms for self-implementation is still a lot better than "don't authenticate at all" or "just use plain hash / plain passwords for keys". HMAC and PBKDF2 are specified in RFCs, so documentation is easily available for those who implement them themselves. However, every single crypto library I'm aware of supplies HMAC and PBKDF2 or better mechanisms, so it would be a really bad (pre-90s, I think) library if it didn't provide the two. — SEJPM, May 17 '15 at 17:42
@sashank, did the edit clarify things? I'm not sure what you're asking. Do you ask "Where the hell is XOR used?" or more something like "Why the hell do you XOR key files and not their hashes (in relation to Q5)?" — SEJPM, May 18 '15 at 18:07
first , what is a keyfile ? is it a general file with cryptographic keys ? then why dont we just call it keys ? — sashank, May 18 '15 at 18:20
@sashank by point 2 of point 2, a keyfile is a user-supplied file, that is rare (and hence provides entropy) and may be generated using a CSPRNG. I extended this point a bit. — SEJPM, May 18 '15 at 18:25
