Encrypting files with known headers

Question

If I encrypt a file of a known format that has a lengthy header (e.g. an XML Excel file), does that render the encrypted file susceptible to a "known plain text" attack? In other words, if the first X bytes of the decrypted text are known, does that weaken the encryption for the remaining data?

score 7 · Answer 1 · edited Apr 13 '17 at 12:48

If a popular encryption scheme is being used: No.

The typical design is that symmetric stream/block ciphers generate a constant stream of new pseudo-random bits which are XORed with the plaintext to produce the ciphertext.

The pseudo-random stream is seeded indirectly by the secret key, so as long as neither the earlier or later bits of the stream nor the seed itself can be inferred from the known bits, the cipher is protected from classical known-plaintext attacks.
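As a rough illustration of the point above, here is a minimal Python sketch. It does not use a real cipher: the keystream is a toy built from HMAC-SHA256 over a counter, standing in for whatever a real stream cipher or CTR mode would produce, and the `keystream`, `xor_bytes` and `encrypt` helpers and the sample XML strings are all made up for this example. It shows that XORing the known header with the ciphertext gives the attacker the keystream for the header bytes only.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (assumption): HMAC-SHA256(key, nonce || counter), block after block."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream; decryption is the same operation."""
    return xor_bytes(plaintext, keystream(key, nonce, len(plaintext)))


key = secrets.token_bytes(32)     # secret key, unknown to the attacker
nonce = secrets.token_bytes(12)   # public per-message value (the IV)
header = b'<?xml version="1.0" encoding="UTF-8"?>'  # known to the attacker
body = b"<salary>123456</salary>"                   # unknown to the attacker
ciphertext = encrypt(key, nonce, header + body)

# Known plaintext XOR ciphertext yields the keystream, but only for the known bytes.
leaked = xor_bytes(ciphertext[:len(header)], header)
```

Here `leaked` covers the header positions and nothing else. Recovering `body` would require predicting later keystream bytes from earlier ones, which is exactly what a sound cipher is designed to prevent.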

answered Oct 10 '13 at 02:01

LateralFractal

If the private key is the same, and the first n blocks of the file have the same data, then the first n blocks of the ciphertext will be identical in any scheme not involving an IV (which you never mentioned in your answer). – Stephen Touset Oct 10 '13 at 02:14
Sigh. I would assume someone using an encryption scheme without subverting it at the code level would follow the scheme's initialisation process. You're not referring to a plaintext attack on a single message but across messages, where we create fresh PRNGs for the same private key by adding some public random bits/IV to the stream. The IV and private key influence the entropic state equally for a properly implemented cipher, so the IV aspect is relevant across key re-use only. – LateralFractal Oct 10 '13 at 02:29
+1, but I thought that the "constant stream of ... random bits" was a property of stream ciphers, not block ciphers, though I see from your link that the CTR mode makes a block cipher act like a stream cipher. So, does what you say not apply to other modes of block ciphers (ECB aside)? Also, a very "popular" encryption scheme is zip file encryption, which is notoriously bad; it would be helpful if your answer could be more specific about which "popular encryption schemes" are okay in the context of this question. Thanks! – Paul Lynch Oct 10 '13 at 18:02
@PaulLynch Now that this question has migrated across to cryptography.se, Gilles' more in-depth answer is the most suitable for the level of detail you desire :-). In theory, as computers are linear and deterministic, block and stream ciphers differ only in whether and when working memory for avalanching is written out to the ciphertext. In practice, most ciphers are cognitive tarpits of interrelated bitwise-avalanches and best left as end-user black-boxes called 'block' or 'stream' depending on the staggering of input and output data. – LateralFractal Oct 10 '13 at 21:44
@PaulLynch On a side note, was this question a list question initially? I think the kerfuffle of Stephen vs Woliveirajr and I was actually regarding implicit assumptions about the question. – LateralFractal Oct 10 '13 at 21:55
@LateralFractal If by "list question" you mean was I asking for a recommended approach for encrypting in this situation, no I wasn't. However, if the answer is "sometimes", then some discussion of the circumstances under which there is a problem is (I think) warranted, and if that incidentally pulls in some recommendations, those can be helpful. – Paul Lynch Oct 10 '13 at 23:37

score 6 · Accepted Answer · edited Mar 17 '17 at 10:46

No proper modern encryption algorithm is vulnerable to known-plaintext attacks. Even if the adversary knows all of the plaintext except one bit, and knows the ciphertext, that doesn't help him determine the value of that one unknown bit. If you use a cryptographic library then you're unlikely to pick a construction that is broken in this sense, with the exception of ECB mode, which encrypts identical plaintext blocks to identical ciphertext blocks. Just make sure that you're using the algorithm properly: if a mode requires a random IV, use a random IV. AES, DES or 3DES with CBC, CTR, GCM, EAX, or any number of other popular modes are safe in this regard (DES's 56-bit key is too short to resist brute force, but that is a separate weakness).

There is one caveat. The adversary generally knows the length of the plaintext, because it is identical or close to the length of the ciphertext. When the “real” plaintext is compressed (such as an Excel file or any other zipped format) and the compressed text is then encrypted, the adversary knows the length of the compressed text. In isolation, this is rarely enough to extract useful information. However, in some scenarios, the adversary is able to submit part of the input for encryption, so that the text that is encrypted contains some parts provided by the adversary and some parts that must remain secret. This can allow the adversary to map the length of the compressed plaintext against his submitted inputs, and thus to determine which inputs most resemble the secret part (because they result in a better compression ratio) and to reconstruct the secret part given enough chosen inputs. This was the basis of the CRIME attack against SSL.
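The length side channel described above can be sketched in a few lines of Python using only `zlib`. This is an illustration under simplifying assumptions, not a reproduction of CRIME: encryption is not modelled at all (it is assumed to preserve length), the `SECRET` value and the `observed_length` helper are invented for the example, and the attacker compares two whole guesses rather than recovering the secret piece by piece.

```python
import zlib

# Hypothetical secret that is compressed together with attacker-supplied data.
SECRET = b"sessionid=7f3a9c2e51b84d06"


def observed_length(attacker_input: bytes) -> int:
    """Length the attacker would observe on the wire.

    Encryption is assumed to preserve length, so the ciphertext length
    equals the length of the compressed (attacker input + secret).
    """
    return len(zlib.compress(attacker_input + b"&" + SECRET, 9))


# A guess that equals the secret lets the compressor replace the whole
# secret with one back-reference, so the output is shorter.
matching = observed_length(b"sessionid=7f3a9c2e51b84d06")

# A wrong guess of the same length shares only the "sessionid=" prefix.
non_matching = observed_length(b"sessionid=q8Zk1WxPv0Lm3TyR")

print("matching guess:    ", matching, "bytes")
print("non-matching guess:", non_matching, "bytes")
```

The matching guess compresses to fewer bytes than the non-matching one, and that difference survives encryption because the cipher hides content but not length.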

Accepted as the most understandable and reassuring answer. (I lack the expertise to say the most correct.) – Paul Lynch Oct 10 '13 at 17:52
@PaulLynch :) sorry, just after you asked it in information security, all those answers became debates, and some confusion arose. Happy that someone answered in a way that you could understand ... – woliveirajr Oct 10 '13 at 19:29

score 0 · Answer 3 · answered Oct 10 '13 at 02:27

An emphatic yes.

TL;DR: Don't try to do encryption on your own. Use a consumer-friendly solution like GPG for data at rest, or TLS for data in motion.

This is easy to test for yourself. For example, look at what happens with AES in ECB mode when the same key is used to encrypt two strings that start with the same data:

$ echo testtesttesttesta | openssl enc -aes-128-ecb -K "a1a1a1a1" -a
y0Vu2U+d4uThMygPLppuXbrIamvJTfsHNppU8Zje1tc=
$ echo testtesttesttestb | openssl enc -aes-128-ecb -K "a1a1a1a1" -a
y0Vu2U+d4uThMygPLppuXTh3YNl0Mu3ID8W3g67Qv9A=

You can see the first part of the ciphertext is the same for both plaintexts. There are many attacks that involve this type of weakness. For example, an attacker who can inject his own plaintext somewhere into your documents can potentially discover the plaintext for everything that comes after their input.
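The same effect can be reproduced without openssl. The Python sketch below is only an illustration: the standard library has no AES, so `toy_encrypt_block` is a made-up 4-round Feistel permutation keyed with HMAC-SHA256, and the zero padding in `ecb_encrypt` is a shortcut rather than a real padding scheme. What it shares with AES in ECB mode is the property that matters here: each 16-byte block is encrypted independently and deterministically.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes, the same as AES


def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Round function of the toy Feistel network (assumption, not AES)."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Keyed permutation on one 16-byte block using 4 Feistel rounds."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        mixed = bytes(a ^ b for a, b in zip(left, _round(key, i, right)))
        left, right = right, mixed
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """ECB: split into blocks and encrypt each one independently."""
    data += b"\x00" * (-len(data) % BLOCK)  # zero padding, for illustration only
    return b"".join(
        toy_encrypt_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)
    )


key = b"\xa1" * 16
c1 = ecb_encrypt(key, b"testtesttesttesta\n")
c2 = ecb_encrypt(key, b"testtesttesttestb\n")

print(c1[:BLOCK] == c2[:BLOCK])  # first blocks: same plaintext, same ciphertext
print(c1[BLOCK:] == c2[BLOCK:])  # second blocks: different plaintext, different ciphertext
```

Because the two inputs share their first 16 bytes, the first ciphertext blocks are identical, which tells an observer that the two files start the same way without any key material being recovered.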

Thankfully, the solution is simple. Use an encryption mode that requires an initialization vector. Additionally, it is crucial (for other, but equally important reasons) that you use a mode which includes an authentication tag. Modes that satisfy this requirement include GCM, EAX, and CCM. For example:

$ echo testtesttesttesta | openssl enc -aes-128-gcm -K "a1a1a1a1" -iv "11111111" -a
XFxU81mnoiEbGMAM+1jBAfJ8
$ echo testtesttesttestb | openssl enc -aes-128-gcm -K "a1a1a1a1" -iv "bcbcbcbc" -a
L+vvGULFQXt7DtmzFs95qf+v

For these modes, you must use a unique (and for some modes, a cryptographically random) initialization vector for every message encrypted under a given key, so that identical plaintexts do not produce identical ciphertexts.
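For readers who want to see both properties (a fresh IV per message and an authentication tag) in one place, here is a hedged Python sketch. It is not GCM, EAX or CCM: it is a toy encrypt-then-MAC construction built from HMAC-SHA256, and the names `seal` and `open_sealed` are invented for this example. Note also that the `openssl enc` documentation states that the tool does not support authenticated modes such as GCM, so the transcript above may not be reproducible with current OpenSSL releases.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (assumption): HMAC-SHA256(key, nonce || counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes):
    """Encrypt under a fresh random nonce, then MAC the nonce and ciphertext."""
    nonce = secrets.token_bytes(12)
    body = bytes(p ^ k for p, k in zip(plaintext, keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce, body, tag


def open_sealed(enc_key: bytes, mac_key: bytes, nonce: bytes, body: bytes, tag: bytes) -> bytes:
    """Verify the tag first; decrypt only if it matches."""
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return bytes(c ^ k for c, k in zip(body, keystream(enc_key, nonce, len(body))))


enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)
msg = b"testtesttesttesta"
n1, c1, t1 = seal(enc_key, mac_key, msg)
n2, c2, t2 = seal(enc_key, mac_key, msg)  # same key, same plaintext, new nonce

# An attacker who knows the first plaintext byte is "t" can flip it to "T"
# in the ciphertext, but the tag no longer verifies.
forged = bytes([c1[0] ^ ord("t") ^ ord("T")]) + c1[1:]
try:
    open_sealed(enc_key, mac_key, n1, forged, t1)
    tampering_detected = False
except ValueError:
    tampering_detected = True
```

With fresh nonces, `c1` and `c2` differ even though the key and plaintext are the same, and the forged ciphertext is rejected before any decryption happens. Without the tag, the forged message would decrypt to `Testtesttesttesta` and nothing would flag the change.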

Crypto can be extremely hard to get right yourself, even if you pick a secure, modern cipher. Do yourself a favor and don't try to do it on your own; if you find yourself at the level of choosing algorithms and generating nonces, you're working at too low a level. Use GPG for data at rest, or TLS for data in motion.

Stephen Touset

A crypto-primitive is not an "encryption scheme"; of course any compliant end-user API will include IVs where applicable. – LateralFractal Oct 10 '13 at 02:31
I never used the words "encryption scheme" as you quote me. I have provided evidence for my claims. And on top of it all, I am actually correct. Using cryptographic primitives like AES directly as a novice is the path to ruin. Are you downvoting me because you're upset I've contradicted your answer, or because you think I am actually wrong? – Stephen Touset Oct 10 '13 at 02:59
I must clearly have sock puppets to have down voted you twice, make that thrice. – LateralFractal Oct 10 '13 at 03:24
I downvoted. While it is true that some crypto primitives are vulnerable, the overwhelming majority of known good ciphers are not. Implementation matters and is difficult to get right, but that has no bearing on the underlying math at all. – Ayrx Oct 10 '13 at 03:27
@TerryChia And yet, as I'm sure you're familiar, people come to Crypto.SE daily using the bad ones. By OP's very question, he is clearly a novice and thus overwhelmingly likely to get it wrong. – Stephen Touset Oct 10 '13 at 03:32
@StephenTouset On a side note, I rarely downvote answers, especially on any question I'm also answering - but I will apply TFT resolution to discourage casual sniping. – LateralFractal Oct 10 '13 at 03:34
@woliveirajr Knowing the first bytes of the plaintext utterly devastates that scheme, in fact. No, you can't decrypt the rest, but you can trivially manipulate the underlying data for the part of the plaintext you do know. Did you perhaps forget to include an authenticator in your encryption scheme? – Stephen Touset Oct 10 '13 at 03:37
I'm not going to debate the necessity of authenticated encryption for virtually all real-world contexts with you here. If you don't understand its importance, clearly nothing I say here will convince you otherwise. Suffice it to say, if you are not an expert and choose to do it on your own, you are going to get important details wrong. GPG for data at rest, TLS for data in motion should not be a controversial opinion. – Stephen Touset Oct 10 '13 at 03:49
While it is not the only answer to the question, the answers would be incomplete without a warning about the IV vector. To say "it depends" is surely correct. – Oct 10 '13 at 07:27
@woliveirajr - You are welcome, hoped the presentation will help somebody. – Oct 10 '13 at 12:00
Personally I don't like using GPG for data at rest. I prefer a simple AEAD API. – CodesInChaos Oct 10 '13 at 12:43
Even when using TLS one must still choose which algorithms will be used (e.g. when configuring a web server). Still, +1 for some useful information and concerns. – Paul Lynch Oct 10 '13 at 17:46
