I know there are already a few questions about this, and I'm working with the advice given there, but I still doubt my approach is the fastest, so I'd really appreciate it if you helped me find a faster way.

What I want to do: I have 2 encrypted messages (about 200 characters each, in hex) and I want to decrypt them. I know that the same key was used on both of them and that the original message is written in English (it should be something about a book), so I attempted to crack it with crib words (" the ", "the ", "he t", ..., and the same with " and ", " book " and " novel "), but I was only able to find two words so far ("guess" in one text, "the" in the other), which is kinda sad.

I made a little Java program (just in Eclipse) that's doing the following:

  • Get 3 Strings with the same length as input (text A, text B and the crib repeated over and over again)
  • Take 2 ints, compute A XOR B, take the next 2, ... and repeat until the end of the String is reached
  • Do the same thing with the result of A XOR B and the crib (in the same "for")
  • Output it as Hex

I'm then using a little applet to convert the Hex to ASCII.
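For reference, the steps above could be sketched roughly like this in Java. This is only a minimal sketch, not my actual program: the names `CribDrag`, `fromHex`, `xor` and `tryCrib` are made up for illustration, the two toy plaintexts stand in for the real messages, and the crib is tried at one offset at a time instead of being repeated over the whole length. It also prints ASCII directly, so no separate hex-to-ASCII step is needed.

```java
import java.nio.charset.StandardCharsets;

final class CribDrag {
    /** Decode a hex string (even length, no separators) into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** XOR two byte arrays, truncated to the shorter length. */
    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[Math.min(a.length, b.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    /** XOR the crib into x at the given offset; non-printable bytes are shown as '.'. */
    static String tryCrib(byte[] x, byte[] crib, int offset) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < crib.length; j++) {
            int c = (x[offset + j] ^ crib[j]) & 0xFF;
            sb.append(c >= 0x20 && c < 0x7F ? (char) c : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Toy stand-ins for the real data: XORing two ciphertexts that share a key
        // gives the same bytes as XORing the two plaintexts directly.
        byte[] p1 = "I guess so".getBytes(StandardCharsets.US_ASCII);
        byte[] p2 = "on the mat".getBytes(StandardCharsets.US_ASCII);
        byte[] x = xor(p1, p2); // with real data: xor(fromHex(hexA), fromHex(hexB))
        byte[] crib = " the ".getBytes(StandardCharsets.US_ASCII);
        // Slide the crib over every offset and print what the other text would contain.
        for (int off = 0; off + crib.length <= x.length; off++) {
            System.out.println(off + ": " + tryCrib(x, crib, off));
        }
    }
}
```

Readable output at some offset suggests the crib sits there in one text and the printed fragment sits at the same position in the other.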

I read that there's a way to get the positions of blanks, which would be at least a little bit helpful, but I didn't quite get how it works. Is there another method of getting a result faster?

Neph
  • First thing is that you should compute $X := A\oplus B$ once and store that - don't do it each time you change the crib! Personally, the first thing I'd check for would be $X \oplus "\ \ \ \ \ \ \ "$ – Cryptographeur Jan 14 '14 at 23:25
  • If you have "guess", you should also have " guess " (i.e., with the spaces around the word). That should also give you seven letters in the other ciphertext, which ideally don't fall perfectly on word boundaries; if they don't, you can try to guess the remaining characters of the partial word revealed. – Stephen Touset Jan 14 '14 at 23:54
  • @figlesquidge There's no noticeable delay in running the program, and since I have to get X ^ crib anyway and I only use it for myself, I don't mind. ;) And thanks, I just tried that (didn't even think about it) and at least I now know where the spaces in one of the texts are (this is what it means, right?). – Neph Jan 15 '14 at 00:05
  • @StephenTouset I know that the word definitely isn't "guessable" (I don't know about "outguess..."), but yes, unfortunately it falls on word boundaries. The result is "g the m" (when I used " guess "), which gives me pretty much nothing. – Neph Jan 15 '14 at 00:12
  • That's not even close to nothing. "m" is highly likely to be followed by a lowercase vowel. That's only five plausible candidates. On the other hand, a word ending in "g" is likely to really end in "ing", or one of only a few other digraphs. – Stephen Touset Jan 15 '14 at 01:12
  • Here's what I'd do. Find a corpus and generate a Markov model for the text. This will give you an approximate probability for the original plaintexts given any sequence of xor differences, for however many characters of history you choose. It will happen that some xor differences are incredibly distinctive, meaning you can guess the corresponding plaintext with high confidence. For example, you could probably find all the spaces and punctuation this way, at which point the rest should be easy. – Antimony Jan 15 '14 at 06:35
  • As pointed out by figlesquidge, $X := A\oplus B$ is very useful; further, you demonstrably get no other clue from the ciphertext if the keystream is random. Also: in ASCII, space is 0x20, and uppercase letters are [0x41-0x5A]. It follows that if the plaintext only uses these, and this is a straight OTP using XOR, then a byte of $X$ has bit 5 (corresponding to the 0x20 mask) set only if exactly one of the two plaintext characters is a space. When (and only when) both plaintext characters are identical (including but not limited to space), $X$ is 0 (thus has bit 5 clear). – fgrieu Jan 15 '14 at 11:37
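The space observation in the comments can be turned into a quick scan over $X$. The following is a minimal Java sketch under a stated assumption, not something taken from the thread: it assumes the plaintexts consist only of ASCII letters (upper or lower case) and spaces. Under that assumption, a space XORed with a letter gives the same letter with its case flipped, while letter XOR letter never lands in the letter ranges, so a byte of $X$ that is itself an ASCII letter marks a position where exactly one plaintext has a space. The names `SpaceFinder` and `likelySpacePositions` are hypothetical, and the toy strings stand in for the real messages.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SpaceFinder {
    /**
     * Returns the positions where a byte of X = A xor B is an ASCII letter.
     * Assumption: plaintexts contain only letters and spaces; punctuation or
     * digits in the real text can produce false positives.
     */
    static List<Integer> likelySpacePositions(byte[] x) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < x.length; i++) {
            int c = x[i] & 0xFF;
            boolean isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (isLetter) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Toy plaintexts; XORing them equals XORing two ciphertexts made with the same key.
        byte[] p1 = "I guess so".getBytes(StandardCharsets.US_ASCII);
        byte[] p2 = "on the mat".getBytes(StandardCharsets.US_ASCII);
        byte[] x = new byte[p1.length];
        for (int i = 0; i < x.length; i++) {
            x[i] = (byte) (p1[i] ^ p2[i]);
        }
        System.out.println(likelySpacePositions(x));
    }
}
```

The scan only says that one of the two texts has a space at a flagged position, not which one. Positions where both texts have a space give $X = 0$ and are indistinguishable from any other pair of identical characters.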

0 Answers