6

Considering the English alphabets to be encrypted why is the computation of mod 26 necessary after adding the pad to plain text. Is it just that it adds to another level of encryption or is it used so that two letters may end up with the same symbol in the cypher? 3 mod 26 is 3 and 29 mod 26 is 3.

GunJack
  • 163
  • 1
  • 5
  • 8
    Hints: (1) with mod used, and then not used, ask yourself the following: what is the highest possible ciphertext value? When would it occur? Would an adversary noticing that value in the ciphertext learn something about the plaintext? (2) Dig deeper: assume plaintext consisting of random independent characters, and a random pad: how much information (in bit or fraction thereof per plaintext character) is learned from the plaintext with mod used, and not used? – fgrieu Aug 15 '21 at 09:52
  • 1
    No two letters end up with the same symbol. Think of it like the alphabet repeats after 25. So 26 would be "a" again. 3 and 29 correspont to the same letter. Mod just makes the number smaller, so you only need 0 to 25 – jjj Aug 16 '21 at 12:34
  • Am I the only one whose first use of One Time Pad used five digit numbers with the numbers being nowhere close to either end? If it was random Gaussian rather than random uniform it leaks very little information. – Joshua Aug 16 '21 at 16:58
  • @Joshua No, you can't have a Gaussian OTP key. Unless you're missing the randomness extractor from your TRNG – Paul Uszak Aug 16 '21 at 20:48

3 Answers3

21

There are two main reasons.

First, when we encrypt data with a symmetric algorithm, we generally want each unit to encrypt or decrypt to a unit of the same size (ignoring padding and MACs). In your case, when we're using English letters, we'd want to also get English letters out, and not a set of random numbers. Similarly, when we're encrypting a byte, we also want to get a byte out, since computers usually work with bytes and it's most convenient to process them that way.

Secondly, and more important, not using modular arithmetic here leaks information, sometimes a lot of information, about the data. For example, if we're using the range 0-25 to represent our letters, if we see a 0 as the encrypted output, we know that both the pad and the input were 0, and if we see 50, we know that both the pad and the input were 25. Similarly, 49 tells us that the two numbers involved were 24 and 25 in some order. With that type of information and statistical analysis, we can probably decrypt the ciphertext.

However, if we used modular arithmetic, then the output value doesn't teach us anything about the pad or the input, since every output value is equally likely. If the pad is truly random and used only once, then it provides perfect confidentiality.

A. Hersean
  • 934
  • 10
  • 21
bk2204
  • 3,246
  • 6
  • 12
  • 1
    "we can probably decrypt the ciphertext": assuming English without space, I think that's going to be challenging. Studying when that can be done is actually an interesting problem! – fgrieu Aug 16 '21 at 07:45
  • 1
    @fgrieu Challenging, but not necessarily impossible. – user253751 Aug 16 '21 at 08:50
  • 1
    @user253751 With a random key stream I would say that a ciphertext only attack which succeeds in recovering the plaintext is typically impossible. What you can do is rule out some candidate plaintexts, so it definitely leaks information, but not generally enough for a complete break. For some shorter messages there might be enough information for a lucky guess, especially if there is some prior information about the likely content of the plaintext. – John Coleman Aug 16 '21 at 11:46
  • If the plaintext was random, I get we'd learn ≈0,7191 bit/symbol, which I think is not enough to reconstruct English even semi-reliably. How much we learn for a particular language plaintext depends on letter frequencies and the value assigned to each letter, but I think it remains significantly less than 1 bit/symbol, and that's still not much. – fgrieu Aug 16 '21 at 13:18
  • 1
    Given that the question asks specifically about mod 26, I assume the message would be in plain english with punctuation and spaces not encrypted. This would be silly if you're trying for real security. You could create tables based on the likelihood of each letter given the value and take into account how often the letter occurs and what range is possible. You could do the same for words and constrain it with the possible and likely letters to come up with a likelihood for each word. Then you could constrain that with parsing the language, and any information you know about the message... – Jason Goemaat Aug 16 '21 at 15:19
  • 1
    For example say the original message is 'a-z' and without mod the values could be added to be 'A-Z' (i.e. 'e' is 4, adding 21 gives 'z', adding 22 gives 'A', etc.). If the ciphertext has a 'd', you know the letter has to be a, b, c, or d. If the ciphertext for a word is 'aB', you know that it is 'a' and some other letter from c to z. That could be 'at', 'an', 'as', but not 'aj', 'ao', 'ap' because those aren't words. 'as' is an advertb or conjunction and the 17th most popular english word, 'at' is the 20th and a preposition, 'an' is 32nd and an article. You could not decrypt the... – Jason Goemaat Aug 16 '21 at 15:30
  • 1
    message with 100% certainty in most cases, but you could write software to come up with the most likely decryptions without too much work. – Jason Goemaat Aug 16 '21 at 15:32
  • @JasonGoemaat In classical cryptography, spaces and punctuation are suppressed rather than left unencrypted. Of course this is silly from the point of view of true security, but the unicity distance with a random key would still be close to the whole key length. You could come up with a notion of the most likely decryption, but given the astronomically large number of possible decryptions the probability in question could be well below 1%. – John Coleman Aug 16 '21 at 17:30
  • 1
    Well, to test this idea, here's a message where each letter is encoded A=0...Z=25 plus a pseudorandom “key” number between 0 and 25: 13-28-30-10-38-22-38-21-36-8-21-15-15-30-39-14-28-16-19-20-39-19-25-18-14-21-26-36-41-31-23-24-21-22-14-14-17-35-11-7-31-28-23-11-36-26-42-20-40-29-12-11-31-28-37. – dan04 Aug 16 '21 at 21:34
  • 1
    @dan04 interesting idea. There are 37979562564556140050029561655702064479432532295680000000000000000000 possible decryptions, though the vast majority wouldn't correspond to sensible English. Looking at just the first three letters it is easy to find multiple 3-letter words which could encrypt that way (and of course it need not start with a 3-letter word). To make more serious progress I would construct a DAWG for a large word list and write a script for turning cipher-text initial sequences into paths through it. – John Coleman Aug 17 '21 at 11:29
  • 2
    @dan04 I'm going to assume it's not "congratulations, dramatic illustration, milfhunter privileges" (yes "milfhunter" seriously appears in google's 10k word list). But "congratulations" is a very likely starting word given the context, and "dramatic-" is also reasonably likely. Long words are much more likely to be guessed correctly because there are more constraints on them. Of course, it could just be a coincidence that "congratulations" fits. I did not try smaller words such as "congratulations, you have" or "congratulations, this is" – user253751 Aug 17 '21 at 11:31
  • @user253751: “Congratulations” is indeed the intended first word. Now you just need to find a combination of other English words that (after removing punctuation and spaces) matches the regex [A-O][D-Z][A-Q][A-T][A-U][O-Z][A-T][A-Z][A-S][A-O][A-V][B-Z][L-Z][Q-Z][G-Z][A-X][A-Y][A-V][A-W][A-O][A-O][A-R][K-Z][A-L][A-H][G-Z][D-Z][A-X][A-L][L-Z][B-Z][R-Z][A-U][P-Z][E-Z][A-M][A-L][G-Z][D-Z][M-Z] – dan04 Aug 17 '21 at 23:51
  • 2
    @dan04 "dramatic illustration, milfhunter privileges" is such a combination :P – user253751 Aug 18 '21 at 08:15
  • 1
    @dan04 'congratulations: dictionaries provide many choices: hunt secret' is a possible decryption for which I have zero confidence. I wrote a script to find matches in segments of the cipher text using the 10K word list and was surprised at how many there were. Unless my script is buggy, there are nearly 2000 possibilities for the next word after 'congratulations'. Lucky guesses aside, this encryption method seems resistant to complete recovery of the plaintext (which of course doesn't mean it is any good). – John Coleman Aug 18 '21 at 20:39
19

Not using modulus leaks information.

For the English language, it's not so obvious. For an image, on the other hand...

what appears to be random noise on the left, and a noisy yet clearly identifiable tux on the right

Left half uses addition and modulus, right half uses addition and division by two (i.e. half brightness).

Fax
  • 303
  • 1
  • 6
  • Not sure that young TUX is an appropriate example (re. OTPs), as uneven character values can't be divided into integers. So of course you've rounded your RBG values (integers). You really need a completely different algorithm/character/key mapping. Which then wouldn't be uniformly random, but rather a group of some sort. – Paul Uszak Aug 17 '21 at 23:12
  • 4
    @PaulUszak: To be technically correct, the output format would need to support RGB values in the range [0, 510], and use lossless compression. The image is just being approximated here because of technical restrictions.

    Point is, if you “encrypt” a raster image and get something that's easily recognizable to the human eye (albeit with random “noise”), then you have a really bad encryption algorithm.

    – dan04 Aug 17 '21 at 23:48
  • @PaulUszak It's certainly possible to produce a 9-bit lossless image without division and rounding, but even if you have a monitor capable of displaying it, it won't change the conclusion. – Fax Sep 30 '21 at 15:28
3

It's not.

Encryption/decryption can be done in anyway you can think of. Maths isn't needed at all. All it represents is a 1:1 mapping (bijection) between the message and cipher text. The only important thing is that the key material is generated through a truly random (physical) process. It's just that either a modulo operation or XOR makes it simpler for a computer implementation.

This is encryption/decryption without maths using a reciprocal DIANA table:-

table

Also realise that there are more than 26 characters in cryptography. There are numbers, punctuation (very important as it can change the meaning of a sentence when declaring war on someone). Then there are enumerated codewords, like 73 (meaning something to someone). From the wiki page: " The JN-25 code used in World War II used a code book of 30,000 code groups superencrypted with 30,000 random additives. ". You might heretofore perform the mapping modulo 10, on a digit by digit basis.

Paul Uszak
  • 15,390
  • 2
  • 28
  • 77
  • 4
    The question reads: "why is the computation of mod 26 necessary after adding the pad to plain text". Your post is certainly interesting, but it doesn't answer the question. The second part of your answer is also at odds with the assumption in the first line of the question: "Considering the English alphabets to be encrypted" – Maarten Bodewes Aug 15 '21 at 13:45
  • 1
    and if you look into how the table is generated, you see e.g. under column N the two entries Ma and Nz next to each other. That tells of modular arithmetic. – ilkkachu Aug 16 '21 at 10:30
  • @MaartenBodewes My answer is correct because the question is moot. We can do "A/9" $\to$ which only involves friendship, and no maths. – Paul Uszak Aug 19 '21 at 22:45