3

I'm new to encryption and cryptography, and I was wondering whether there is a good or best-suited AES mode for file encryption (I'm planning to zip a folder and encrypt it as a single file). If there is, how complex is it, and is it easy to implement in Python (my preferred language)? Thank you.

kelalaka
user63579

2 Answers

3

Since you zip the directory before encryption, the compressed data already looks close to random. You can use CBC mode or CTR mode; however, neither of these modes provides any authentication.

You should use an authenticated encryption mode such as AES-GCM.

There is another issue waiting for you: how do you derive a cryptographic key from the user's password? The common method is to use a key derivation function (KDF) such as PBKDF2 or Argon2.
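
As a rough illustration of the password-to-key step, here is a minimal sketch that uses only the standard library. The helper name `derive_key`, the 16-byte salt, and the iteration count are assumptions chosen for the example, not values from this answer; tune the iteration count for your own hardware.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256.

    The iteration count is an illustrative default, not a recommendation
    taken from the answer above.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random but not secret; store it next to the ciphertext
# so the same key can be derived again at decryption time.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
```

The same password and salt always give the same key, so only the salt needs to be stored, never the key itself.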

Generate a random AES key and encrypt the zip file with it. After the encryption, encrypt that AES key with the key derived from the user's password by the KDF, and store the encrypted key together with the encrypted file.

For random byte generation you should, at a minimum, use os.urandom.

Python has libraries that provide AES-GCM and PBKDF2. You can find example code here and here
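
The scheme described above (random file key, password-derived key that wraps it, everything stored together) might be sketched as follows. This assumes the third-party `cryptography` package and its `AESGCM` class, which is a swapped-in choice and not necessarily the library this answer linked to. The function names, the byte layout of the output, and the iteration count are hypothetical, and the whole file is held in memory for simplicity.

```python
import hashlib
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PBKDF2_ITERATIONS = 600_000  # illustrative value, tune for your hardware


def _password_key(password: str, salt: bytes) -> bytes:
    # Key-encryption key derived from the password (PBKDF2-HMAC-SHA256).
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=32
    )


def encrypt_blob(password: str, plaintext: bytes) -> bytes:
    salt = os.urandom(16)
    wrap_nonce = os.urandom(12)   # 96-bit nonce for wrapping the file key
    data_nonce = os.urandom(12)   # 96-bit nonce for the file contents
    file_key = AESGCM.generate_key(bit_length=256)  # random per-file key

    # Wrap the file key under the password-derived key: 32 + 16 tag = 48 bytes.
    wrapped_key = AESGCM(_password_key(password, salt)).encrypt(
        wrap_nonce, file_key, None
    )
    ciphertext = AESGCM(file_key).encrypt(data_nonce, plaintext, None)

    # Hypothetical layout: salt | wrap_nonce | wrapped_key | data_nonce | ciphertext
    return salt + wrap_nonce + wrapped_key + data_nonce + ciphertext


def decrypt_blob(password: str, blob: bytes) -> bytes:
    salt, wrap_nonce = blob[:16], blob[16:28]
    wrapped_key, data_nonce = blob[28:76], blob[76:88]
    ciphertext = blob[88:]

    # Both decrypt calls raise InvalidTag on a wrong password or modified data.
    file_key = AESGCM(_password_key(password, salt)).decrypt(
        wrap_nonce, wrapped_key, None
    )
    return AESGCM(file_key).decrypt(data_nonce, ciphertext, None)


blob = encrypt_blob("my password", b"bytes of the zip file")
try:
    decrypt_blob("wrong password", blob)
    wrong_password_rejected = False
except InvalidTag:
    wrong_password_rejected = True
```

One consequence of this layout is that changing the password only requires re-wrapping the 32-byte file key, not re-encrypting the file.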

kelalaka
  • 1
    Is there a reason you recommend pycryptodome? I find "PyCryptodome is not a wrapper to a separate C library like OpenSSL. To the largest possible extent, algorithms are implemented in pure Python." to be a bad thing, rather than a good thing... – Ella Rose Jan 10 '19 at 16:20
  • @EllaRose got it. – kelalaka Jan 10 '19 at 16:41
  • How about salt and base64? I haven't got that deep into those topics. Do I need them here? Or are they vital to every encryption? – user63579 Jan 10 '19 at 23:44
  • A salt makes the KDF output unique for each file and defeats precomputed tables; generate it with urandom. Base64 is just an encoding of the output. For the IV size of AES-GCM see. (Rewritten since the link for Base64 was missing.) – kelalaka Jan 11 '19 at 16:56
  • Why should one use authenticated encryption? Isn't an attacker able to use an adjusted AES-GCM algorithm which does no authenticity-check? – tipa Sep 03 '23 at 15:36
  • @tipa What if an attacker just executes a bit-flipping attack (possible with CBC, and even easier with CTR) on your files to gain an advantage at a later stage? What if your file system gets corrupted and you want to know how far you can rely on your encrypted files? – kelalaka Sep 03 '23 at 16:11
  • What would such an attack look like in practice (using the OP's example, an encrypted zip file) if all that matters is that the attacker cannot get hold of the zipped folder contents? For example, the OP zips and encrypts the folder, stores it on a USB stick, loses the USB stick, and an attacker finds it. How can the attacker launch an attack without having the correct key? I am not saying that AES-GCM isn't useful, but I think "authentication" is not always needed in certain scenarios – tipa Sep 03 '23 at 16:40
  • @tipa Why not take advantage of modern designs? Each specific case needs a new analysis, and strict requirements must be set. Suppose you later need to send this file online and forget those requirements: you must not send it as it is. Instead of relying on humans, who are prone to error, why not take the advantage? Well, have you ever heard of the USB attack on Iran's nuclear facility? What if you later find your USB stick and start using it again? I'm not saying such an attack on encryption is easy, but it is not impossible. Better to use authentication; you will only lose a little space and CPU time... – kelalaka Sep 03 '23 at 16:58
  • @kelalaka The reason I came across this thread and wondered whether authenticated encryption is really necessary is that I would have to pull an additional third-party dependency into the project (with all the drawbacks that has) if I decided to use AES-GCM. I wouldn't have to do that if AES-CBC is sufficient. AES-CBC is older and has wider support. I don't doubt the correctness of your answer, but it would have helped me more if I understood why authentication is better when someone just wants to encrypt a file. And I don't understand how GCM could have prevented Stuxnet – tipa Sep 03 '23 at 18:10
  • @tipa Attacks always get better, never worse. The Stuxnet example only illustrates that an attacker may get a chance to gain an advantage from data at rest, nothing more. Authentication prevents the bit-flipping attack on the CTR mode that GCM uses internally. Nothing is perfect; see What are the rules for using AES-GCM correctly? and Is CTR more secure than CBC?. Though CBC is no longer in TLS (as of 1.3) because of the padding oracle attacks and the endless downgrade attacks. – kelalaka Sep 03 '23 at 18:25
  • @tipa And XChaCha20 is easier to use than AES-GCM thanks to its longer nonces, if you have the library and are not restricted to AES. – kelalaka Sep 03 '23 at 18:27
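
To make the bit-flipping concern from the comments above concrete, here is a small sketch. It uses a toy SHA-256-based keystream as a stand-in for AES-CTR, an assumption made only so the example runs on the standard library; it must not be used for real encryption. What carries over to CTR mode is the structure: ciphertext is plaintext XOR keystream, so flipping a ciphertext bit flips the same plaintext bit. The message and helper names are hypothetical.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream (stand-in for AES-CTR, NOT for real use)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the keystream."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


key, nonce = os.urandom(32), os.urandom(16)
plaintext = b"pay alice 0100 dollars"
ciphertext = xor_stream(key, nonce, plaintext)

# The attacker does not know the key, only (or guesses) the plaintext layout.
offset = plaintext.index(b"0100")
delta = bytes(a ^ b for a, b in zip(b"0100", b"9100"))
tampered = bytearray(ciphertext)
for i, d in enumerate(delta):
    tampered[offset + i] ^= d

# Decryption succeeds without any error and yields the attacker's value.
recovered = xor_stream(key, nonce, bytes(tampered))
print(recovered)  # b'pay alice 9100 dollars'
```

Without an authentication tag, nothing tells the legitimate user that the decrypted content is not what was originally encrypted.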
1

I prefer AES-GCM because GCM is an authenticated encryption mode (in contrast to CBC and CTR, which are not). However, the one significant limitation of GCM is that you must not encrypt more than about 64 GB of data with a single key/IV pair.
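
The 64 GB figure follows from the per-invocation plaintext cap of 2^39 − 256 bits in NIST SP 800-38D, which is 2^32 − 2 blocks of 16 bytes. A short sketch of the arithmetic, with a hypothetical helper name:

```python
BLOCK_BYTES = 16              # AES block size
MAX_GCM_BLOCKS = 2**32 - 2    # data blocks available under one key/IV pair

# Same limit as 2**39 - 256 bits, expressed in bytes.
MAX_GCM_PLAINTEXT_BYTES = MAX_GCM_BLOCKS * BLOCK_BYTES


def fits_in_one_gcm_message(size_in_bytes: int) -> bool:
    """Return True if the data can be encrypted under a single key/IV pair."""
    return 0 <= size_in_bytes <= MAX_GCM_PLAINTEXT_BYTES


print(MAX_GCM_PLAINTEXT_BYTES)  # 68719476704, i.e. 32 bytes short of 64 GiB
```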

Authenticated encryption does not mean that you can tell that a specific person encrypted the file, but it does mean that you can determine if the file has been corrupted (accidentally or intentionally). This is something that most other encryption modes do not provide. You can add this functionality by implementing some sort of MAC (e.g. HMAC), but that's extra work for you.
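
For comparison, the "add a MAC yourself" route might look roughly like this with the standard library, in an encrypt-then-MAC arrangement. The helper names, the 16-byte IV length, and the output layout are assumptions for illustration. The ciphertext is treated as opaque bytes produced by CBC or CTR, and the MAC key is assumed to be separate from the encryption key.

```python
import hashlib
import hmac
import os
from typing import Tuple

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def add_tag(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed over IV and ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def verify_and_strip(mac_key: bytes, blob: bytes, iv_len: int = 16) -> Tuple[bytes, bytes]:
    """Check the tag before any decryption; return (iv, ciphertext)."""
    if len(blob) < iv_len + TAG_LEN:
        raise ValueError("blob too short")
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: file corrupted or tampered with")
    return body[:iv_len], body[iv_len:]


mac_key = os.urandom(32)
iv = os.urandom(16)
fake_ciphertext = os.urandom(48)  # placeholder for real CBC/CTR output
blob = add_tag(mac_key, iv, fake_ciphertext)
```

This is the extra work that GCM does for you: the tag has to cover the IV as well as the ciphertext, and it has to be checked before the data is decrypted.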

Swashbuckler
  • 2
    GMAC / AEAD / MAC do offer message authentication; checking the authentication tag allows you to establish that the encrypted message was generated by someone that had access to the key, after all (unlike, e.g. public key encryption). What it doesn't offer is entity authentication: you cannot establish which identity performed the encryption - unless there was just one possible entity that held the secret key, of course. – Maarten Bodewes Jan 10 '19 at 17:18
  • I'm actually planning to go for GCM. But what can I do about the issue you mentioned? Do I add another algorithm or change the mode? – user63579 Jan 10 '19 at 23:46
  • The easiest thing to do is to use the same key and switch IVs at regular intervals, e.g. every 32 GB. You can even just increment the IV; with GCM it's OK for the IV to be predictable, it just must never be reused. – Swashbuckler Jan 11 '19 at 18:26
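
The incrementing-IV idea from the last comment could be sketched like this. The function name and the 96-bit big-endian counter encoding are assumptions for illustration; each segment would then be encrypted as its own GCM message with the yielded nonce.

```python
from typing import Iterator, Tuple

SEGMENT_BYTES = 32 * 2**30  # 32 GiB per segment, as suggested in the comment


def segment_nonces(
    total_bytes: int,
    segment_bytes: int = SEGMENT_BYTES,
    first_counter: int = 0,
) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (nonce, start, end) for each segment of a large file.

    The nonce is a 96-bit big-endian counter: predictable, but never
    repeated as long as the counter keeps increasing under a given key.
    """
    counter = first_counter
    for start in range(0, total_bytes, segment_bytes):
        end = min(start + segment_bytes, total_bytes)
        yield counter.to_bytes(12, "big"), start, end
        counter += 1


# Small numbers to show the shape: a 100-byte "file" in 40-byte segments.
segments = list(segment_nonces(100, segment_bytes=40))
```

Restarting the counter at zero is only safe with a fresh key; if the same key encrypts a second file, the counter has to continue from where it left off. Each segment also gets its own tag, so the order and number of segments are not protected unless that is handled separately.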