
Say you have a file that is not random, and you XOR every bit with a random bit (not pseudorandom, but truly random). Can someone who sees only the result extract any information from it? Obviously it won't be 100% accurate, but I imagine you could do some sort of statistical analysis and get a vague idea. If yes, how? If no, is there a mathematical proof?

e-sushi
not sure

2 Answers


This cipher is called a one-time pad. It is unbreakable ("perfect secrecy") assuming that:

  1. The pad (the collection of random bits) is truly random
  2. The pad is never reused to encrypt other messages
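A minimal sketch of the cipher and of why condition 2 matters, assuming Python's `secrets.token_bytes` as a stand-in pad. Note that `secrets` is a cryptographically secure *pseudo*random generator, not the truly random source the answer requires; `otp_xor` is a hypothetical helper name.

```python
import secrets


def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with a pad of the same length; encryption and decryption are the same operation."""
    if len(pad) != len(data):
        raise ValueError("pad must be exactly as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"attack at dawn"
# Assumption: a CSPRNG stands in for a truly random pad (illustration only).
pad = secrets.token_bytes(len(message))
ciphertext = otp_xor(message, pad)
assert otp_xor(ciphertext, pad) == message  # XOR-ing with the same pad again decrypts

# Reusing the pad (violating condition 2): XOR-ing the two ciphertexts cancels the pad,
# leaving message XOR other, which no longer depends on any random bits.
other = b"retreat now!!!"
ciphertext2 = otp_xor(other, pad)
leak = otp_xor(ciphertext, ciphertext2)
assert leak == otp_xor(message, other)
```

The leaked `message XOR other` is exactly the kind of non-random data that frequency analysis and known-plaintext guessing can attack, which is why a one-time pad must be used only once.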

So, apart from the file's length, no information can be extracted from $\text{file} \oplus \text{random bits}$.
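Stated formally, this is Shannon's notion of perfect secrecy. The symbols here are introduced for illustration: let $M$ be an $n$-bit file with any distribution at all (however non-random), let the pad $K$ be uniform on $\{0,1\}^n$ and independent of $M$, and let $C = M \oplus K$. Then for every file $m$ and ciphertext $c$:

$$
\begin{aligned}
\Pr[C = c \mid M = m] &= \Pr[K = m \oplus c] = 2^{-n},\\
\Pr[C = c] &= \sum_{m'} \Pr[M = m']\,\Pr[C = c \mid M = m'] = 2^{-n},\\
\Pr[M = m \mid C = c] &= \frac{\Pr[C = c \mid M = m]\,\Pr[M = m]}{\Pr[C = c]} = \Pr[M = m].
\end{aligned}
$$

Seeing the ciphertext leaves the probability of every possible file exactly where it was before, so no statistical analysis of the ciphertext alone can shift the odds.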

The basic idea of the proof is that an attacker can try every possible key, but has no way of knowing which resulting plaintext is the correct one. If I encrypt "attack" with a one-time pad, then for any six-character string there is exactly one pad that turns it into the same ciphertext, and since every pad is equally likely, any six-character string could equally well have been the original message.
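The "attack" argument can be checked directly: for any candidate of the same length, XOR-ing the ciphertext with the candidate yields the one pad that would have produced it. This is a sketch only; `xor_bytes` is a hypothetical helper, the candidate words are arbitrary, and `secrets.token_bytes` (a CSPRNG) stands in for a truly random pad.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Encrypt "attack" with a fresh pad (CSPRNG used as a stand-in for true randomness).
ciphertext = xor_bytes(b"attack", secrets.token_bytes(6))

# Every six-byte guess is explained by exactly one pad, and every pad was equally likely.
for candidate in (b"attack", b"defend", b"retire", b"banana"):
    pad = xor_bytes(ciphertext, candidate)  # the unique key consistent with this guess
    assert xor_bytes(ciphertext, pad) == candidate
    print(candidate, "<- pad", pad.hex())
```

Trying all keys therefore produces every possible plaintext, each exactly once, and nothing in the ciphertext singles out the real one.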

Reid

If the file has been deliberately crafted to survive this form of damage, then yes, you should be able to recover your data.

There are many quite simple methods, from adding CRCs to replicating the data multiple times.

There are other possible routes to recovery. If, for example, the file was an ASCII text file, then it may be possible to recover something close to the original data by reasoning and dictionary work.

  • The "damage" done by XOR-ing with a truly random source (each bit independently has a 50% chance of being 0 or 1) is too much for any recovery scheme. No amount of statistical analysis or combining of known repeated elements will give you better than a 50% guess on any individual bit, and a correct guess at any bit value gives you no advantage in guessing any other bit value. Any CRCs would be equally mangled and not recoverable. – Neil Slater Jun 25 '15 at 11:32
  • If you changed the source to have some bias such as $p(0)=0.4, p(1)=0.6$, then enough repetition or suitably robust error correction codes could in theory work. – Neil Slater Jun 25 '15 at 11:36
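A small simulation contrasting the two comments, assuming a simple repetition code (each bit sent many times, decoded by majority vote) as the error-correcting scheme. The function names and parameters are hypothetical, and `random.Random` is a seeded pseudorandom generator used only to make the experiment reproducible, not a model of a truly random pad.

```python
import random


def transmit(bits, copies, p_one, rng):
    """Send each bit `copies` times, XOR-ing every copy with a pad bit that is 1 with probability p_one."""
    return [[b ^ (rng.random() < p_one) for _ in range(copies)] for b in bits]


def decode(received, p_one):
    """Majority vote per bit; if the pad favours 1, most copies are flipped, so undo that flip."""
    out = []
    for reps in received:
        majority = int(sum(reps) * 2 > len(reps))
        out.append(majority ^ (p_one > 0.5))
    return out


def accuracy(p_one, n_bits=1000, copies=301, seed=0):
    """Fraction of original bits recovered correctly (odd `copies` avoids ties)."""
    rng = random.Random(seed)
    bits = [rng.randrange(2) for _ in range(n_bits)]
    decoded = decode(transmit(bits, copies, p_one, rng), p_one)
    return sum(d == b for d, b in zip(decoded, bits)) / n_bits


print(accuracy(0.5))  # unbiased pad: repetition does not help, result hovers around 0.5
print(accuracy(0.6))  # biased pad with p(1)=0.6: nearly every bit is recovered
```

With an unbiased pad the majority vote is just another fair coin XOR-ed onto each bit, so the decoder does no better than guessing; with the biased pad each copy carries a little information, and enough copies add up to reliable recovery, as the second comment suggests.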