
Using 7-Zip 19.00 on Windows 10 1909 (build 18363.592), I encrypted a text file containing "hello there" with AES-256 and the password "123". I did this twice, following exactly the same procedure, but as shown below, the outputs differ:

[Screenshots of the two encrypted outputs]


Why?

Maarten Bodewes
super

2 Answers


This is expected behavior, since 7-Zip uses AES in Cipher Block Chaining (CBC) mode, which requires the Initialization Vector (IV) to be unique and unpredictable for every encryption.

It originally used a 64-bit IV, but fortunately this was increased to 128 bits:

Encryption strength for 7z archives was increased: the size of random initialization vector was increased from 64-bit to 128-bit, and the pseudo-random number generator was improved.

If encrypting the same file twice produced the same ciphertext, we would suspect a problem with IV generation. The differing outputs suggest there is no such problem.
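To make the effect of a random IV concrete, here is a minimal CBC sketch using only the Python standard library. It is an illustration under stated assumptions, not 7-Zip's code: because AES is not in the standard library, a toy 4-round Feistel network built from HMAC-SHA256 stands in for the block cipher, and the key is simply SHA-256 of the password rather than 7-Zip's key derivation. All function names (`toy_encrypt_block`, `cbc_encrypt`, and so on) are hypothetical. The CBC chaining, the fresh random IV per call, and the IV being stored in the clear in front of the ciphertext are the parts that matter.

```python
import hashlib
import hmac
import os
from typing import Optional

BLOCK = 16  # block size in bytes (same as AES)
ROUNDS = 4  # Feistel rounds in the toy cipher


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, i: int, half: bytes) -> bytes:
    # Feistel round function: HMAC-SHA256 over (round index || half-block), truncated to 8 bytes.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Toy 16-byte block cipher (a Feistel network) standing in for AES.
    left, right = block[:8], block[8:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _f(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    # Runs the Feistel rounds backwards to invert toy_encrypt_block.
    left, right = block[:8], block[8:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _f(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, plaintext: bytes, iv: Optional[bytes] = None) -> bytes:
    # PKCS#7 padding up to a whole number of blocks.
    n = BLOCK - len(plaintext) % BLOCK
    data = plaintext + bytes([n]) * n
    if iv is None:
        iv = os.urandom(BLOCK)  # fresh, unpredictable IV on every call
    out, prev = [iv], iv  # the IV travels in the clear, in front of the ciphertext
    for i in range(0, len(data), BLOCK):
        prev = toy_encrypt_block(key, _xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, blob: bytes) -> bytes:
    prev, body, out = blob[:BLOCK], blob[BLOCK:], []
    for i in range(0, len(body), BLOCK):
        block = body[i:i + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block
    data = b"".join(out)
    return data[:-data[-1]]  # strip PKCS#7 padding (not validated in this sketch)


key = hashlib.sha256(b"123").digest()  # hypothetical key; NOT how 7-Zip derives it
c1 = cbc_encrypt(key, b"hello there")
c2 = cbc_encrypt(key, b"hello there")
print(c1.hex())
print(c2.hex())  # differs from c1 because the random IV differs
```

Both ciphertexts decrypt to "hello there". Passing a fixed `iv=bytes(16)` on both calls yields identical ciphertexts, which is exactly the kind of IV-generation problem that identical outputs would point to.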

Note 1: 7-Zip derives the AES-256 key from the password using $2^{19}$ iterations of a SHA-256-based construction. Iteration alone does not resist massive parallelization (e.g., on GPUs), so a high-entropy password, such as a Diceware passphrase, is recommended.
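The following is a hypothetical sketch of that password-to-key step, based on fgrieu's reading of `CKeyInfo::CalcKey` in CPP/7zip/Crypto/7zAes.cpp in the comments below: a single SHA-256 computation over $2^{19}$ concatenated copies of (prefix || password || 8-byte counter). fgrieu's comment puts the 16-byte random IV in the prefix position; this sketch just takes the prefix as a parameter. The UTF-16LE password encoding and the little-endian counter are assumptions of this sketch, not confirmed by the answer, and the function name is made up.

```python
import hashlib


def sevenzip_style_key(password: str, prefix: bytes = b"", num_cycles_power: int = 19) -> bytes:
    """Hypothetical sketch of a 7-Zip-style password-to-key step.

    Feeds 2**num_cycles_power copies of (prefix || password || counter) into a
    single SHA-256 computation and returns the 32-byte digest as the AES-256 key.
    Assumptions: password encoded as UTF-16LE, counter as 8-byte little-endian.
    """
    pw = password.encode("utf-16-le")
    sha = hashlib.sha256()
    for counter in range(1 << num_cycles_power):
        sha.update(prefix + pw + counter.to_bytes(8, "little"))
    return sha.digest()


key = sevenzip_style_key("123")  # 2**19 small SHA-256 updates for one password guess
print(key.hex())
```

Every guess costs the same fixed amount of SHA-256 work and almost no memory, and guesses are independent of each other, which is why GPUs can test many passwords in parallel. Against a high-entropy password this matters little: a six-word Diceware passphrase from the standard 7,776-word list carries about $6 \log_2 7776 \approx 77.5$ bits.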

Note 2: The first comment under the question claims that CBC mode provides IND-CCA security. It cannot; it provides only IND-CPA security. See Rogaway's seminal work on modes, Evaluation of Some Blockcipher Modes of Operation.
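A short sketch of why CBC on its own cannot be IND-CCA secure. In the IND-CCA game the adversary may ask for the decryption of any ciphertext except the challenge itself. Since the first plaintext block is $P_1 = D_K(C_1) \oplus IV$, flipping a bit of the IV flips the same bit of the decrypted plaintext, so a modified (hence "different") ciphertext leaks the challenge plaintext. As above, a toy HMAC-based Feistel network stands in for AES (the attack works for any block cipher), messages are whole blocks so padding is omitted, and all names are hypothetical.

```python
import hashlib
import hmac
import os
import secrets

BLOCK = 16


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, i: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]


def enc_block(key: bytes, block: bytes) -> bytes:
    # Toy 4-round Feistel block cipher standing in for AES.
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _f(key, i, right))
    return left + right


def dec_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _f(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, pt: bytes) -> bytes:
    # No padding: pt must be a whole number of 16-byte blocks in this sketch.
    iv = os.urandom(BLOCK)
    out, prev = [iv], iv
    for i in range(0, len(pt), BLOCK):
        prev = enc_block(key, _xor(pt[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, ct: bytes) -> bytes:
    # Each plaintext block is D(C_i) xor C_{i-1}, with C_0 = IV.
    out = []
    for i in range(BLOCK, len(ct), BLOCK):
        out.append(_xor(dec_block(key, ct[i:i + BLOCK]), ct[i - BLOCK:i]))
    return b"".join(out)


def ind_cca_game() -> bool:
    """One round of the IND-CCA game; returns True if the adversary wins."""
    key = os.urandom(32)
    m0, m1 = b"A" * BLOCK, b"B" * BLOCK
    b = secrets.randbelow(2)
    challenge = cbc_encrypt(key, (m0, m1)[b])

    # Adversary: flip one bit of the IV. The result is a different ciphertext,
    # so the decryption oracle is allowed to answer it.
    delta = b"\x01" + bytes(BLOCK - 1)
    forged = _xor(challenge[:BLOCK], delta) + challenge[BLOCK:]
    leaked = _xor(cbc_decrypt(key, forged), delta)  # undo the flip to recover P1
    guess = 0 if leaked == m0 else 1
    return guess == b


print(all(ind_cca_game() for _ in range(100)))  # True: the adversary always wins
```

Against a chosen-plaintext adversary only (IND-CPA), no decryption oracle exists and this trick is unavailable, which is the distinction the note draws.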

kelalaka
  • @fgrieu Combining philsmd/7z2hashcat with the hashcat examples (7-Zip is hash mode -m 11600), it is $2^{19}$. – kelalaka Feb 11 '20 at 20:30
  • Hum, I was looking at the wrong part of the source, handling some zip crypto. Indeed, the 7-zip 19.00 password-to-key is CKeyInfo::CalcKey, in CPP/7zip/Crypto/7zAes.cpp and by default (not changeable in the UI) uses $2^{19}$ iterations of something ad-hoc having to do with SHA-256 (but less than a round at each iteration, hence the $2^{18}$ which is an approximation valid for a password of 8 bytes). – fgrieu Feb 11 '20 at 21:07
  • This answer does not address the question “Why?” but addresses only the how. – zrajm Feb 14 '20 at 05:43
  • @zrajm Why they chose CBC may be found in the archives. As for why CBC gives a different output every time: it is designed that way. CBC is a probabilistic encryption mode, and achieving that requires a random IV for every call. – kelalaka Feb 14 '20 at 07:46
  • Can you recommend a better alternative to 7z as part of your answer as well? – jaaq Feb 14 '20 at 15:02
  • @jaaq Actually, 7-Zip is fine, except for the password mechanism. Regardless of that mechanism, you should already be using high-entropy passwords, as I indicated in the answer: Diceware or BIP-39. You can also encrypt gzip files with OpenSSL. Never use closed source. Is there anything in 7-Zip besides the password mechanism, which doesn't use large memory to counter massive parallelization? – kelalaka Feb 14 '20 at 21:20
  • @kelalaka thanks, that covers it. So just use e.g. scrypt or argon2id to generate the password used for encryption with 7zip – jaaq Feb 17 '20 at 08:25
  • @jaaq If you have a password with strong entropy you are fine. Using scrypt or argon2id will increase the attack time. Well, if you have a password with 256-bit entropy, there is no meaningful attack time. – kelalaka Feb 17 '20 at 13:33
  • @kelalaka are you able to share where you got the 2^18 times iterated SHA256 information from? 7Zip documentation often mentions SHA256 but I'm struggling to find the number of iterations in anything official. – Alex Hague Jul 26 '20 at 21:09
  • @AlexHague See the hashcat example list (hash mode 11600) and the explanation at https://github.com/philsmd/7z2hashcat. It seems it should be $2^{19}$. Could you verify it? – kelalaka Jul 26 '20 at 21:31
  • Why is it "not safe from massive parallelization"? – EllipticalInitial May 08 '22 at 07:01
  • @jippyjoe4 In modern key derivation, we want not only a huge number of iterations but also memory-hardness and high parallelism (many threads). Iterations can be massively parallelized, so if your password is weak, massive parallelization can find it. If you add memory-hardness, then ASIC and FPGA mining are doomed. All of this can be quantified up to a point. – kelalaka May 08 '22 at 08:49
  • Looking at the source again, file 7z2201-src/CPP/7zip/Crypto/7zAes.cpp, code in CKeyInfo::CalcKey(), I think that the key is the SHA-256 of (by default) $2^{19}$ times (16-byte-random-IV || password || 8-byte-counter), which would be like $2^{18}$ rounds of SHA-256 for a moderate-size password. – fgrieu Dec 17 '22 at 11:58
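The comments above mention memory-hard key derivation (scrypt, Argon2id) as the remedy for massive parallelization. Argon2id is not in Python's standard library, so this minimal sketch uses `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later) as the memory-hard stand-in. The parameter values are illustrative assumptions, not a recommendation, and 7-Zip itself does not do this; per jaaq's suggestion, the hex output could serve as a high-entropy 7-Zip password.

```python
import hashlib
import os


def derive_key_scrypt(password: str, salt: bytes, n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    # Each evaluation fills roughly 128 * r * n bytes of memory (16 MiB with these
    # defaults), so an attacker must pay for that RAM on every parallel guess.
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=n, r=r, p=p, dklen=32)


salt = os.urandom(16)  # random, stored next to the ciphertext
key = derive_key_scrypt("correct horse battery staple", salt)
print(key.hex())
```

Raising `n` scales both the time and the memory per guess, whereas raising an iteration count (as in 7-Zip's scheme) scales only the time.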

Normally, encrypting the same input multiple times is supposed to produce a different output each time. This way, an eavesdropper not only cannot tell that the input was "hello there", but cannot even tell that the two files were produced from the same input. For example, you could send Mary the first file and Bob the second one, and an eavesdropper wouldn't be able to tell that you had sent them exactly the same information.

This is achieved by one or both of the following:

  1. Deriving a different encryption key each time you reuse the same password, by using not just the password but also a random salt in this process;
  2. Supplying a different or random initial value (IV) to the encryption algorithm each time you call it.
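Mechanism 1 can be illustrated with a generic salted key-derivation function. This minimal sketch uses PBKDF2-HMAC-SHA256 from Python's `hashlib` purely as an example of the idea; it is not 7-Zip's scheme, the iteration count is an arbitrary assumption, and `derive_key` is a hypothetical name. Mechanism 2 (a random IV per call) is what the CBC answer above relies on.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    # PBKDF2-HMAC-SHA256 as a generic salted key-derivation function.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)


salt1 = os.urandom(16)  # a new random salt for each encryption,
salt2 = os.urandom(16)  # stored in the clear alongside the ciphertext
k1 = derive_key("123", salt1)
k2 = derive_key("123", salt2)
# Same password, different salts: k1 and k2 are unrelated 256-bit keys.
# The recipient reads the salt from the file and re-derives the same key:
print(derive_key("123", salt1) == k1)  # True
```

Because the salt only needs to be unique, not secret, it can be stored unencrypted next to the ciphertext, just like a random IV.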
Luis Casillas
  • Re: "This is so that an eavesdropper not only cannot tell that the input was hello there, but cannot even tell that the two files were produced from the same input": More importantly, the use of randomness prevents an eavesdropper who sees the differences between two ciphertexts from computing the differences between the two plaintexts, which can be more useful to an attacker than one might think. (See the second bullet point at https://en.wikipedia.org/wiki/One-time_pad#Exploits, where use of a one-time pad just twice let some messages be cracked.) – ruakh Feb 12 '20 at 08:01
  • Am I correct in assuming that the IV must be transported to the recipient of the message as well? If so, is that just done in clear-text/encrypted without an IV? – Wolter Feb 12 '20 at 15:03
  • @Bananenaffe: Random IVs must be transmitted along with the message, yes, but there are also designs that use or allow counter IVs, which can often be determined from context. Random IVs are safe to send unencrypted because they're random bits and thus (in theory) don't reveal anything. Counter IVs reveal at most how many messages have been sent so far and their order. – Luis Casillas Feb 12 '20 at 17:58
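ruakh's point about the "two-time pad" can be shown in a few lines. In this sketch a toy stream cipher (SHA-256 over key || nonce || counter, standing in for a real stream mode such as AES-CTR) is used twice with the same nonce; the eavesdropper then obtains the XOR of the two plaintexts without knowing the key. The names `keystream`, `encrypt`, and `xor` are hypothetical. As the reply above notes, the nonce or IV itself can be sent in the clear; what matters is that it is not reused.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher (stand-in for AES-CTR): SHA-256(key || nonce || counter) blocks.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # Encryption and decryption are the same operation: XOR with the keystream.
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


key = os.urandom(32)
p1, p2 = b"attack at dawn!!", b"retreat at dusk!"
nonce = bytes(12)  # the mistake: the same nonce is reused for both messages
c1, c2 = encrypt(key, nonce, p1), encrypt(key, nonce, p2)
# Without the key, the eavesdropper learns the XOR of the two plaintexts:
print(xor(c1, c2) == xor(p1, p2))  # True

nonce2 = os.urandom(12)  # fix: a fresh random nonce, sent in the clear with the message
c3 = encrypt(key, nonce2, p2)
print(xor(c1, c3) == xor(p1, p2))  # almost certainly False now
```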