Just wondered: I currently have TrueCrypt on my system (I am aware that there are various doubts about its penetrability, and that it is no longer under development). TC "volumes" obviously consist of bytes.

It seems to me that decryption must always rely on being able to recognise that you've managed to reach the unencrypted text. But suppose that, of 1000 bytes of text which then get encrypted, only about 40 actually contain the information (e.g. a password), and the other 960 bytes are just junk. And you then encrypt that as a 1000-byte TC volume?

Furthermore, you might make it so that the 40 characters that you want to hide actually consist of 7-bit ASCII characters, but you spread them over 40 * 7 / 8 = 35 8-bit bytes? So the first 8-bit byte contains all 7 bits of the first 7-bit byte, and the 1st bit of the next 7-bit byte, etc.?
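To make the arithmetic concrete, here is a rough sketch of that packing. The function names `pack_7bit` and `unpack_7bit` are made up for illustration, and the bit order (most significant bit first, zero padding in the last byte) is an assumption; it only shows the 40 * 7 / 8 = 35 layout, not anything about how TC stores data.

```python
def pack_7bit(text: str) -> bytes:
    """Pack 7-bit ASCII characters back to back into 8-bit bytes (MSB first)."""
    bits = 0    # accumulator holding bits not yet written out
    nbits = 0   # number of valid bits in the accumulator
    out = bytearray()
    for ch in text.encode("ascii"):  # raises if a character is not 7-bit ASCII
        bits = (bits << 7) | ch
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1     # keep only the leftover bits
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)  # pad the last byte with zero bits
    return bytes(out)


def unpack_7bit(data: bytes, count: int) -> str:
    """Recover `count` 7-bit characters from the packed bytes."""
    bits = 0
    nbits = 0
    chars = []
    for byte in data:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= 7 and len(chars) < count:
            nbits -= 7
            chars.append((bits >> nbits) & 0x7F)
        bits &= (1 << nbits) - 1
    return bytes(chars).decode("ascii")


packed = pack_7bit("A" * 40)
print(len(packed))  # 40 characters * 7 bits / 8 = 35 bytes
```

The first output byte holds all 7 bits of the first character plus the top bit of the second, and so on.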

It seems to me that under these circumstances a decryption app would never be able to recognise that it had actually reached the hidden text which had been encrypted.

More generally, how do decryption programs know that they've succeeded in breaking an encryption?

How my question might differ from the referenced one

I don't think my question is really the same as that one, because I'm not asking about "double" encryption, which obviously merely requires more computing power.

Also I should not have mentioned TrueCrypt or the business of disguising 7-bit text within 8-bit text. This is too specific. The fact that TC contains the word "TRUE" in a certain position actually makes me laugh somewhat: I didn't know that. Also the fact that many such apps may "randomly" fill with "random" zeros also makes me laugh. Can the question not be considered on its merits, rather than pointing to the inadequacies of certain existing apps, which are not really germane to the point I'm making?

What I'm trying to get at is: given that a password (or a bank account number) or any item of information may consist of a sequence of bytes which is completely indistinguishable, and I mean completely indistinguishable, from randomly generated sequences, how can you (i.e. a human cryptographer or a decryption application) know that you have found the correct way of decrypting?

In the case of the Enigma codebreakers, for example, they only managed to break the code because they were looking for, and found, human language, which obviously contains all sorts of patterns. If the Germans had only ever communicated numbers (and I don't mean numbers corresponding to "code words") to one another they could have done so and it would have been impossible to crack the code. How useful that would have been to them in WW2 is another matter. For certain purposes, however, all you need is to communicate a number.

If you are trying to encrypt a specific number, which does not contain a recognition pattern of any kind, how can decryption ever know that it's found the right way of interpreting these (encrypted) bytes?

Thus, my use of the word "disguising": if your unencrypted text does not contain any pattern or give-away indication which distinguishes it from a random sequence of bytes, how can a would-be decryptor (human or other) ever be certain that the result of the decryption is the byte-sequence which was in fact encrypted?

For clarification: I am referring in this question to encryption situations in which the key does not need to travel with the message.

PS JimmyB, in his answer to that other question, touches on what I'm wondering about. But even he is assuming that there will ultimately be some sort of underlying "plaintext" which, when found, will be identifiable as such because of patterns of some kind.

  • In TrueCrypt's case, it looks at a specific offset in the file for the ASCII string "TRUE", as well as a checksum. If it finds it, then with great probability, the password is correct. This is generally how it's done with most encryption utilities. It is futile to try to "hide" this. – forest Jun 07 '18 at 13:30
  • There are a few erroneous assumptions in your question. First of all, the entire disk is encrypted using TrueCrypt. So there is just this one key, and plenty of information to validate it. Second, "just junk" is probably not securely randomized. For new systems, it probably consists of zeros, for old systems it may contain any previously generated data. Finally, passwords are generally not very random / well distributed; again, plenty of info to use for verification. Removing that single zero bit won't save you (but not leaking the key / password, generally, will, so no panic please). – Maarten Bodewes Jun 07 '18 at 14:02
  • "the entire disk is encrypted" ... Not sure I understand: you can make a TC volume consisting of a small number of bytes. And I shouldn't really have referenced TC anyway... too specific. Please see update. – mike rodent Jun 07 '18 at 17:27

2 Answers

... any item of information may consist of a sequence of bytes which is completely indistinguishable, and I mean completely indistinguishable, from randomly generated sequences

How can a would-be decryptor (human or other) ever be certain that the result of the decryption is the byte-sequence which was in fact encrypted?

Assuming that you mean an illegitimate decryptor, as in an adversary who is not supposed to have access to the key... (if this is not the case, then the answer by Luis Casillas is more appropriate).

If the byte-sequence is actually used for something, then there may exist an oracle that says whether or not the pin/password/etc is valid.

For example, if you try to obscure a password this way, upon each guess at the key and recovery of a candidate plaintext, an attacker can try all possible substrings of the plaintext as the password to the relevant service. The login server acts as an oracle that outputs failure on an incorrect password and success on a correct password, which implies failure/success of cracking the key.
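A minimal sketch of that attack, under loud assumptions: the "cipher" is a toy XOR keystream built from SHA-256 (not a real cipher), the keyspace is deliberately tiny (1024 keys) so the loop finishes quickly, and `make_login_oracle` is a hypothetical stand-in for the login server. The password and the surrounding junk are both random bytes, so nothing in the candidate plaintext "looks like" a password; only the oracle tells the attacker when the key is right.

```python
import hashlib
import hmac
import os


def keystream_xor(key: int, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256-derived keystream (same call encrypts and decrypts)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def make_login_oracle(real_password: bytes):
    """Hypothetical login server: it only ever answers success or failure."""
    def try_login(guess: bytes) -> bool:
        return hmac.compare_digest(guess, real_password)
    return try_login


def crack(ciphertext: bytes, try_login, keyspace):
    """For each candidate key, submit every substring of the candidate plaintext to the oracle."""
    for key in keyspace:
        candidate = keystream_xor(key, ciphertext)
        for start in range(len(candidate)):
            for end in range(start + 1, len(candidate) + 1):
                if try_login(candidate[start:end]):
                    return key, candidate[start:end]
    return None


# Demo: an 8-byte random password hidden among random junk.
password = os.urandom(8)
plaintext = os.urandom(9) + password + os.urandom(7)
secret_key = 777
ciphertext = keystream_xor(secret_key, plaintext)

result = crack(ciphertext, make_login_oracle(password), range(1024))
print(result == (secret_key, password))
```

In practice a real login server would rate-limit or lock out after a handful of failures, which is a separate defence from anything the encryption does.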

Analogous techniques exist if the truly random plaintext is itself an encryption key/MAC key/PRNG seed/etc.

What if the plaintext is not used for anything?

This seems like a contrived example; what use case is there for storing truly random strings of data if they are not being used as a source of secrecy for some kind of cryptographic scheme?

In that highly unlikely scenario, then no, there is no algorithmic oracle that can verify whether or not the string in question is valid or simply random garbage.

However, even in the scenario where you keep a pet random string whose existence serves no purpose other than to give you comfort, there does exist a physical technique that will provide an adversary with the ability to recover your string. It's not very expensive, and it requires very little sophistication to perform.

Aside

These points are more or less irrelevant. "Cracking the key" via brute force is not a realistic threat for any competent cryptographic scheme. If you are using a setup where brute forcing the key is a realistic concern, chances are good that there are easier ways to break the scheme.

Even for schemes that consist of strong components that are assembled in a competent manner, many (if not most) schemes can be broken by guessing a user's password instead of brute forcing a secret key.

Why encrypt plaintext from which you haven't first sought to eradicate patterns of all kinds as far as is humanly (or machinely) possible?

Because it is the responsibility of the cipher to ensure that ciphertexts are indistinguishable from random and that known (and chosen) plaintext does not help to recover the key. It is not the responsibility of the plaintext to meet these goals; the job of the plaintext is to convey the intended message. The cipher can be designed explicitly to meet these goals. Placing that responsibility on the plaintext instead of the cipher is doomed to fail.

You can encrypt any plaintext that you want as many times as you want* with an algorithm such as AES and it will not help an adversary to discover the key, or to learn anything more about the plaintext.

Partly what I'm suggesting is that the more the plaintext can be confined to data which is free of "patterns"...

To completely minimize the number of patterns, you would only be able to encrypt truly random bytes. Truly random bytes are not useful as a plaintext message. The plaintext message must have some pattern/structure to it if it is to be useful and have meaning.

... the more secure this must surely make the encrypted file

This is incorrect; it would only apply if you were using a classical cipher.

Modern ciphers are explicitly designed to be secure not only when an adversary knows the entire plaintext, but even when an adversary can choose plaintexts to submit to the encryption algorithm. The knowledge and structure of the plaintext provides no advantage to an adversary when using a modern encryption algorithm.

*There are technically (very) large limits to how many times you can do this without losing the security game, but for the purposes of this answer it is effectively unlimited.

Ella Rose
  • Thanks... I'm not saying that the plaintext is not used for anything. Intrinsically meaningless alphanumeric strings can signify a bank account or password etc. I like your links, and was aware of such "human factors". Partly what I'm suggesting is that the more the plaintext can be confined to data which is free of "patterns", the more secure this must surely make the encrypted file. This might be something which could usefully be exploited in encryption "culture", the importance of which your cartoons illustrate... – mike rodent Jun 07 '18 at 19:17
  • Why encrypt plaintext from which you haven't first sought to eradicate patterns of all kinds as far as is humanly (or machinely) possible? Again, in the case of Enigma: as is well-known, it was the operators' tendency to start their messages with "Heil Hitler" (and very very occasionally to repeat a message) which helped Bletchley Park... but supposing the Germans had missed out all spaces between words... and for that matter all vowels... ? – mike rodent Jun 07 '18 at 19:30
  • @mikerodent I have made edits to address your comments. – Ella Rose Jun 07 '18 at 20:09
  • Thanks. But we really appear to be going round in circles here "The plaintext message must have some pattern/structure to it if it is to be useful and have meaning." As I say, a bank account or password has no "meaning", is distinctly "useful"... and has no structure. Apart from that I defer to your greater knowledge of all this stuff and assume there are subtleties here I can't grasp... (!). – mike rodent Jun 07 '18 at 20:16
  • A password almost certainly does have structure - very very few people use truly random strings for a password. Additionally, passwords and bank pins are poor examples of "messages" to be encrypted - In the example you cite with enigma, very very few (if any) of the messages were bank pins or passwords. Anyways, the answer does address "What if the plaintext truly has no structure". – Ella Rose Jun 07 '18 at 20:29
  • "very few people use truly random strings for a password"... Just because someone has never posted before in a new SE forum doesn't mean they're stupid or naive. This is a disappointing response. It'd be better not to drop in any condescending red herrings. Perhaps if I use the last 10 figures of a SHA-1 key from a git commit you might concede that my password is as random as one can get. The substantive point is the value of plaintext which is not distinguishable as such. And I clearly anticipated the point about Enigma. I'm not convinced you understand. – mike rodent Jun 10 '18 at 09:21
  • @mikerodent Nowhere did I say anything of the sort. "Perhaps if I use the last 10 figures of a SHA-1 key from a git commit you might concede that my password is as random as one can get" - No, I would not, because the input to the hash is what determines the resistance to an adversary's ability to guess it, regardless of how random the output looks. If I (and others) failed to understand your question, then perhaps you should consider rephrasing it in a manner that we can understand. If you choose to do so, I recommend asking about an application (what do you want to do, and why?). – Ella Rose Jun 10 '18 at 14:59

It seems to me that decryption must always be reliant on being able to recognise that you've managed to reach the unencrypted text.

The 21st century answer to this question is that cryptographers have settled on authenticated encryption, which guarantees not only the confidentiality of messages but also their authenticity, as the default recommended practical form of encryption. Authenticated encryption algorithms must meet the requirement that, as part of the decryption process, they prove that the plaintext they're recovering is authentic, or fail otherwise (and generally without revealing inauthentic plaintext).

But even if we limit our attention to old-fashioned, confidentiality-only algorithms, your statement that decryption must be able to "recognize" the plaintext is false, at least as I read it: that it must perform some sort of conditional test on its decryption state to discern whether it has completed the process of decryption or else must keep going. But no, that's not how it works. Encryption and decryption algorithms are generally just fixed sequences of steps that blindly scramble their inputs without any regard for their contents. The fact that you can recover the plaintext from the ciphertext is guaranteed by nothing more than the decryption function being the mathematical inverse of the encryption function.
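As an illustration of "fixed sequence of steps, inverse function", here is a toy 4-round Feistel network on 16-byte blocks. This is a sketch for exposition only: the round function (truncated SHA-256), the round count, and the names `encrypt_block` and `decrypt_block` are all assumptions made for the example, and it is not a secure or standard cipher. Note that nothing in `decrypt_block` inspects the result; with the wrong key it runs the same steps and quietly returns 16 different bytes.

```python
import hashlib
import os

ROUNDS = 4
HALF = 8  # bytes per half-block


def _round_fn(key: bytes, rnd: int, half: bytes) -> bytes:
    """Toy round function: truncated SHA-256 of key, round number and one half-block."""
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(ROUNDS):
        # One Feistel round: (L, R) -> (R, L xor F(R))
        left, right = right, _xor(left, _round_fn(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(ROUNDS)):
        # Exact inverse of the round above, applied in reverse order
        left, right = _xor(right, _round_fn(key, rnd, left)), left
    return left + right


block = os.urandom(16)                     # plaintext with no structure at all
ciphertext = encrypt_block(b"right key", block)

recovered = decrypt_block(b"right key", ciphertext)
garbage = decrypt_block(b"wrong key", ciphertext)

print(recovered == block)   # the inverse steps undo the encryption
print(len(garbage))         # wrong key: no error, just 16 other bytes
```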

What I'm trying to get at is: given that a password (or a bank account number) or any item of information may consist of a sequence of bytes which is completely indistinguishable, and I mean completely indistinguishable, from randomly generated sequences, how can you (i.e. a human cryptographer or a decryption application) know that you have found the correct way of decrypting?

Either:

  1. You're using an authenticated encryption algorithm, in which case if decryption succeeds you're guaranteed you got the correct plaintext;
  2. You're not using an authenticated encryption algorithm, in which case you're only guaranteed you got the correct plaintext back if you can guarantee you provided the correct ciphertext and key.

Point #2 means that unauthenticated ciphers don't even try to solve that potential problem; they leave it to their users to deal with.
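Point #1 can be sketched with a toy encrypt-then-MAC construction using only the Python standard library. Everything here is an assumption for illustration: the names `seal` and `unseal`, the SHA-256 keystream, and the ad hoc way the two sub-keys are derived. Real code should use a vetted AEAD such as AES-GCM or ChaCha20-Poly1305 from a maintained library instead. The point is only that decryption checks a tag and fails, rather than trying to "recognise" the plaintext, which here is random bytes.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16
TAG_LEN = 32


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || nonce || counter) blocks."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def _subkeys(key: bytes):
    # Ad hoc derivation of separate encryption and MAC keys (illustrative only)
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def seal(key: bytes, plaintext: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(NONCE_LEN)
    body = _keystream_xor(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unseal(key: bytes, sealed: bytes) -> bytes:
    if len(sealed) < NONCE_LEN + TAG_LEN:
        raise ValueError("message too short")
    enc_key, mac_key = _subkeys(key)
    nonce, body, tag = sealed[:NONCE_LEN], sealed[NONCE_LEN:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        # Wrong key or modified ciphertext: refuse to release any plaintext
        raise ValueError("authentication failed")
    return _keystream_xor(enc_key, nonce, body)


message = os.urandom(20)            # plaintext indistinguishable from random
boxed = seal(b"right key", message)
print(unseal(b"right key", boxed) == message)
```

Calling `unseal(b"wrong key", boxed)` raises `ValueError`, even though the correct plaintext has no recognisable pattern.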

Luis Casillas
  • Thanks. I wasn't considering authenticated encryption, in fact. "they leave it to their users to deal with"... but in the case of the Enigma codebreakers... ? They tried ... and succeeded. – mike rodent Jun 07 '18 at 19:26