If I encrypt a file of a known format that has a lengthy header (e.g. an XML Excel file), does that render the encrypted file susceptible to a "known plain text" attack? In other words, if the first X bytes of the decrypted text are known, does that weaken the encryption for the remaining data?
3 Answers
If a popular encryption scheme is being used: No.
Typically, a stream cipher (or a block cipher run in a streaming mode such as CTR) generates a continuous stream of pseudo-random bits that is XORed with the plaintext to produce the ciphertext.
The pseudo-random stream is seeded indirectly by the secret key, so as long as earlier or later bits of the stream cannot be inferred from the known bits (and the seed cannot be inferred either), the cipher is protected from classical known-plaintext attacks.
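To make the argument concrete, here is a minimal sketch in Python. It is a toy, not a real cipher: the keystream is built from HMAC-SHA256 in a counter arrangement purely for illustration, and the key, nonce, header and body values are made up. It shows that a known header hands the attacker only the keystream bytes that covered the header.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter) blocks, concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings (truncates to the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"k" * 32      # hypothetical secret key
nonce = b"n" * 12    # hypothetical per-message nonce

header = b'<?xml version="1.0" encoding="UTF-8"?>'  # known to the attacker
body = b"<salary>123456</salary>"                    # the part that must stay secret
plaintext = header + body
ciphertext = xor(plaintext, keystream(key, nonce, len(plaintext)))

# Known plaintext XOR ciphertext yields the keystream bytes under the header...
recovered_prefix = xor(ciphertext[:len(header)], header)
print(recovered_prefix == keystream(key, nonce, len(header)))

# ...but the body is masked by later keystream bytes, which the attacker
# can only obtain by predicting the generator's output without the key.
body_ciphertext = ciphertext[len(header):]
print(len(body_ciphertext) == len(body))
```

Whether the rest of the message stays safe then rests entirely on the generator being unpredictable, which is the property a sound cipher is designed to provide.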

No proper modern encryption algorithm allows known-plaintext attacks. Even if the adversary knows all of the plaintext except one bit, and knows the ciphertext, that doesn't help him determine the value of that one unknown bit. If you use a cryptographic library, you're unlikely to end up with a construction that is broken in this sense, the notable exception being ECB mode. Just make sure that you're using the algorithm properly: if a mode requires a random IV, use a random IV. AES, DES or 3DES with CBC, CTR, GCM, EAX, or any number of other popular modes are safe in this regard.
There is one caveat. The adversary generally knows the length of the plaintext, because it is identical or close to the length of the ciphertext. When the “real” plaintext is compressed (such as an Excel file or any other zipped format) and the compressed text is then encrypted, the adversary knows the length of the compressed text. In isolation, this is rarely enough to extract useful information. However, in some scenarios, the adversary is able to submit part of the input for encryption, so that the text that is encrypted contains some parts provided by the adversary and some parts that must remain secret. This can allow the adversary to map the length of the compressed plaintext against his submitted inputs, and thus to determine which inputs most resemble the secret part (because they result in a better compression ratio) and to reconstruct the secret part given enough chosen inputs. This was the basis of the CRIME attack against SSL.
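The length leak can be sketched with nothing but zlib. This is an illustration under assumptions, not the actual CRIME procedure: the secret value, the message layout and the helper name observed_length are all hypothetical, and encryption is left out because it would not change the lengths being compared.

```python
import zlib

# Hypothetical secret that is compressed together with attacker-chosen input.
SECRET = b"Cookie: session=7f3a9c2e1b5d"


def observed_length(attacker_input: bytes) -> int:
    """Size the attacker can observe: encryption hides content, not length."""
    message = attacker_input + b"\r\n" + SECRET
    return len(zlib.compress(message, 9))


# A guess that matches the secret compresses better than one that does not,
# because the compressor replaces the repeated text with a short back-reference.
right = observed_length(b"Cookie: session=7f3a9c2e1b5d")
wrong = observed_length(b"Cookie: session=XQZKWJVPGHYM")
print("matching guess is shorter:", right < wrong)
```

A practical attack would refine its guess a little at a time rather than testing the whole secret at once, but the signal it relies on is the same size difference.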

- Accepted as the most understandable and reassuring answer. (I lack the expertise to say the most correct.) – Paul Lynch Oct 10 '13 at 17:52
- @PaulLynch :) Sorry, right after you asked this on Information Security, all those answers turned into debates and some confusion arose. Happy that someone answered in a way that you could understand ... – woliveirajr Oct 10 '13 at 19:29
An emphatic yes.
TL;DR: Don't try to do encryption on your own. Use a consumer-friendly solution like GPG for data at rest, or TLS for data in motion.
This is easy to test for yourself. For example, look at what happens with AES in ECB mode when the same key is used to encrypt two strings that start with the same data:
$ echo testtesttesttesta | openssl enc -aes-128-ecb -K "a1a1a1a1" -a
y0Vu2U+d4uThMygPLppuXbrIamvJTfsHNppU8Zje1tc=
$ echo testtesttesttestb | openssl enc -aes-128-ecb -K "a1a1a1a1" -a
y0Vu2U+d4uThMygPLppuXTh3YNl0Mu3ID8W3g67Qv9A=
You can see that the first part of the ciphertext is the same for both plaintexts. There are many attacks that exploit this type of weakness. For example, an attacker who can inject their own plaintext somewhere into your documents can potentially discover the plaintext for everything that comes after their input.
Thankfully, the solution is simple. Use an encryption mode that requires an initialization vector. Additionally, it is crucial (for other, equally important reasons) that you use a mode which includes an authentication tag. Modes that satisfy this requirement include GCM, EAX, and CCM. For example (the openssl enc tool is not designed for authenticated modes, and the 18-byte outputs below carry no tag, so this only illustrates the effect of the IV):
$ echo testtesttesttesta | openssl enc -aes-128-gcm -K "a1a1a1a1" -iv "11111111" -a
XFxU81mnoiEbGMAM+1jBAfJ8
$ echo testtesttesttestb | openssl enc -aes-128-gcm -K "a1a1a1a1" -iv "bcbcbcbc" -a
L+vvGULFQXt7DtmzFs95qf+v
For these modes, you must use a unique (and for some, a cryptographically random) initialization vector that seeds the ciphertext with initial randomness.
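Why the IV has to be unique can be shown with a small Python sketch. It is a toy under stated assumptions: the keystream construction (HMAC-SHA256 over nonce and counter) is a stand-in for a real counter-style mode and is not AES-GCM, and the function names are made up for the example.

```python
import hashlib
import hmac
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter) blocks, concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Toy stream encryption: plaintext XOR keystream (no authentication)."""
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


key = os.urandom(32)
p1 = b"testtesttesttesta"
p2 = b"testtesttesttestb"

# Reusing one nonce under the same key: the keystream cancels out.
reused = b"\x11" * 12
c1 = encrypt(key, reused, p1)
c2 = encrypt(key, reused, p2)
print(xor(c1, c2) == xor(p1, p2))      # ciphertexts leak p1 XOR p2
print(xor(xor(c1, c2), p1) == p2)      # knowing p1 then reveals p2

# Fresh random nonces: the two keystreams are unrelated, so nothing cancels.
c3 = encrypt(key, os.urandom(12), p1)
c4 = encrypt(key, os.urandom(12), p2)
print(xor(c3, c4) == xor(p1, p2))
```

With a reused nonce, anyone who knows one plaintext (say, a standard file header) recovers the matching bytes of the other message, which is exactly the known-plaintext situation the question asks about.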
Crypto can be extremely hard to get right yourself, even if you pick a secure, modern cipher. Do yourself a favor and don't try to do it on your own; if you find yourself at the level of choosing algorithms and generating nonces, you're working at too low a level. Use GPG for data at rest, or TLS for data in motion.

- A crypto-primitive is not an "encryption scheme"; of course any compliant end-user API will include IVs where applicable. – LateralFractal Oct 10 '13 at 02:31
- I never used the words "encryption scheme" as you quote me. I have provided evidence for my claims. And on top of it all, I am actually correct. Using cryptographic primitives like AES directly as a novice is the path to ruin. Are you downvoting me because you're upset I've contradicted your answer, or because you think I am actually wrong? – Stephen Touset Oct 10 '13 at 02:59
- I must clearly have sock puppets to have downvoted you twice, make that thrice. – LateralFractal Oct 10 '13 at 03:24
- I downvoted. While it is true that some crypto primitives are vulnerable, the overwhelming majority of known good ciphers are not. Implementation matters and is difficult to get right, but that has no bearing on the underlying math at all. – Ayrx Oct 10 '13 at 03:27
- @TerryChia And yet, as I'm sure you're familiar, people come to Crypto.SE daily using the bad ones. By OP's very question, he is clearly a novice and thus overwhelmingly likely to get it wrong. – Stephen Touset Oct 10 '13 at 03:32
- @StephenTouset On a side note, I rarely downvote answers, especially on any question I'm also answering - but I will apply TFT resolution to discourage casual sniping. – LateralFractal Oct 10 '13 at 03:34
- @woliveirajr Knowing the first bytes of the plaintext utterly devastates that scheme, in fact. No, you can't decrypt the rest, but you can trivially manipulate the underlying data for the part of the plaintext you do know. Did you perhaps forget to include an authenticator in your encryption scheme? – Stephen Touset Oct 10 '13 at 03:37
- I'm not going to debate the necessity of authenticated encryption for virtually all real-world contexts with you here. If you don't understand its importance, clearly nothing I say here will convince you otherwise. Suffice it to say, if you are not an expert and choose to do it on your own, you are going to get important details wrong. GPG for data at rest, TLS for data in motion should not be a controversial opinion. – Stephen Touset Oct 10 '13 at 03:49
- While it is not the only answer to the question, the answers would be incomplete without a warning about the IV. To say "it depends" is surely correct. – Oct 10 '13 at 07:27
- Personally I don't like using GPG for data at rest. I prefer a simple AEAD API. – CodesInChaos Oct 10 '13 at 12:43
- Even when using TLS one must still choose which algorithms will be used (e.g. when configuring a web server). Still, +1 for some useful information and concerns. – Paul Lynch Oct 10 '13 at 17:46
n blocks of the file have the same data, then the first n blocks of the ciphertext will be identical in any scheme not involving an IV (which you never mentioned in your answer). – Stephen Touset Oct 10 '13 at 02:14