
I've got this neural network running like a champ on my machine. The data is the concatenated works of Shakespeare in a plain-text file. The RNN trains on the data, then gives you a 'sample', i.e. it tries to freestyle its own Shakespeare based on what it has learned.

So now I've encrypted the dataset with a standard ROT cipher (a mid-range shift) and given that to the NN to train on. It trained fine and returned a sample that was itself ROT-ciphered, and decrypting that sample yielded valid results. That is what I expected, since the model was trained on data that kept all its 'context', without randomness, unpredictability, or 'fuzz'. Not sure if those are the exact words, but you get it.
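A minimal Python sketch of the round trip described above, assuming a plain-text corpus and a letters-only ROT-n (the function name `rot` is hypothetical, and the RNN itself is not shown). Because each letter always maps to the same letter, a sample generated in ciphertext space decrypts with the inverse shift.

```python
def rot(text: str, n: int) -> str:
    """Shift ASCII letters n places (mod 26), preserving case; leave all else alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + n) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + n) % 26 + ord("A")))
        else:
            out.append(ch)  # spaces, punctuation, newlines pass through
    return "".join(out)

plain = "To be, or not to be"
cipher = rot(plain, 13)          # what the network would be trained on
print(cipher)                    # Gb or, be abg gb or
assert rot(cipher, -13) == plain # decrypting a sample is just the inverse shift
```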

I'm wondering whether there are any other ciphers, or really any other algorithms such as compression schemes, that would yield the same result as my experiment: training on the encrypted data produces a sample that can then be decrypted accordingly. Or would most other schemes be 'lossy' in some way?

Ella Rose
Bango
  • I have a hunch that diffusion will make or break your network's ability to do this. I suspect that you are looking for ciphers that do not possess diffusion. – Ella Rose Mar 14 '17 at 16:28
  • It does seem reasonable to assume that any algorithm that utilizes diffusion would not work for my use-case. Can you provide example(s) of potential protocols, other than pure simple substitution ciphers, which could potentially work in the manner I've described? – Bango Mar 14 '17 at 17:04
  • Any classical cipher that is closer to an "encoding" than "encryption" should work. Modern encryption algorithms have been known to incorporate diffusion since the time of Shannon's Communication Theory of Secrecy Systems, so you are not likely to find any modern designs that will exhibit this effect. As fgrieu mentions in their answer, any simple substitution cipher should exhibit this effect. It might work with simple transposition ciphers as well. Also, I removed the homomorphic-encryption tag; it did not appear relevant to your Q. – Ella Rose Mar 14 '17 at 18:08
  • I thought homomorphic encryption was completely relevant, based on what I read here: "A homomorphic encryption scheme is a crypto system that allows computations to be performed on data without decrypting it. A homomorphically encrypted search engine, for instance, could take in encrypted search terms and compare them with an encrypted index of the web." Computations are performed on, and arguably "meaning" is derived from, the data without decrypting it. That's what I want to be able to do in this case. I'll check out transposition! – Bango Mar 14 '17 at 19:27
  • In principle, (fully) homomorphic encryption is what you are looking for. Then you can encrypt your plaintext (Shakespeare in your case), define operations on the ciphertexts (as circuits) and evaluate them (in the best case, multiple times). At the end you decrypt and get your result. Whoever runs the RNN (as circuits) may use circuit privacy to hide not only the input but also the computation. However, all of this works in theory; it is not really feasible for "normal" usage yet, as it would be way too slow. Maybe in some years. – CAR Mar 14 '17 at 23:35
  • Great stuff. You're saying fully homomorphic encryption protocols are exactly what I'm looking for, and from what I've read about homomorphic encryption, I can get on board with that. You also see a lot of the same relevance in it as I do; however, I'm curious about your last comment, that it's not feasible for normal usage yet. Why not? AFAIK the only added complexity would be encrypting the dataset beforehand and then decrypting the sampled result, both of which I am already doing with the ROT cipher. How is FHE any different? – Bango Mar 15 '17 at 00:08
  • It looks like this is already well-traveled territory. I would still love to hear about more homomorphic algorithms. Are there any practical or ready-made examples I could use, other than substitution ciphers? I can't seem to find any online. – Bango Mar 16 '17 at 23:12
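To make the diffusion point from the comments concrete, here is a hedged Python sketch. `toy_chain` is a made-up chaining rule, not a real cipher; it only mimics the property that changing one input letter alters many output letters. A simple substitution changes exactly one output letter, so the local context a character-level RNN learns from is preserved up to relabeling.

```python
import random
import string

LETTERS = string.ascii_lowercase

def substitute(text, key):
    # Simple substitution: each letter is mapped independently (no diffusion).
    return "".join(key.get(ch, ch) for ch in text)

def toy_chain(text):
    # Toy chaining, NOT a real cipher: c_i = (p_i + c_{i-1}) mod 26, with c_{-1} = 0.
    # A change in one letter propagates to every later output letter.
    out, prev = [], 0
    for ch in text:
        c = (LETTERS.index(ch) + prev) % 26
        out.append(LETTERS[c])
        prev = c
    return "".join(out)

def diff_positions(a, b):
    # Number of positions at which two equal-length strings differ.
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)
perm = list(LETTERS)
rng.shuffle(perm)
key = dict(zip(LETTERS, perm))

p1 = "tobeornottobe"
p2 = "xobeornottobe"  # differs from p1 only in the first letter

print(diff_positions(substitute(p1, key), substitute(p2, key)))  # 1
print(diff_positions(toy_chain(p1), toy_chain(p2)))              # 13
```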

1 Answer


Any letter (simple) substitution cipher will do (it might be necessary that it special-cases any character that your text-generating software special-cases, like space, tab, linefeed, perhaps punctuation).
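A minimal sketch of a general letter-substitution key that special-cases whitespace and punctuation, assuming the text generator treats those characters specially (the helper names `make_key`, `substitute` and `invert` are hypothetical). Anything not in the key passes through unchanged, and the same permutation is mirrored onto uppercase so capitalization survives.

```python
import random
import string

def make_key(seed=None):
    """Hypothetical helper: a random permutation of the 26 letters."""
    rng = random.Random(seed)
    lower = list(string.ascii_lowercase)
    shuffled = lower[:]
    rng.shuffle(shuffled)
    key = dict(zip(lower, shuffled))
    # Apply the same permutation to uppercase so capitalization is preserved.
    key.update({a.upper(): b.upper() for a, b in zip(lower, shuffled)})
    return key

def substitute(text, key):
    # Characters missing from the key (space, tab, newline, punctuation,
    # digits) pass through unchanged, i.e. they are special-cased.
    return "".join(key.get(ch, ch) for ch in text)

def invert(key):
    # Decryption key: swap each mapping.
    return {v: k for k, v in key.items()}

key = make_key(seed=42)
line = "Shall I compare thee\tto a summer's day?\n"
cipher = substitute(line, key)
assert substitute(cipher, invert(key)) == line
```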

ROT-n is a particular case of substitution cipher, with a much smaller key set (if we restrict to uppercase letters, ROT-n has $26<2^5$ keys, versus $26!>2^{88}$ keys for a general substitution cipher; that is, less than 5 bits of key versus more than 88).
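The key-size figures above can be checked with a couple of lines of Python (a sketch; `math.log2` accepts arbitrarily large integers, so `26!` needs no approximation):

```python
import math

rot_keys = 26                  # shifts 0..25
sub_keys = math.factorial(26)  # all permutations of 26 letters

print(math.log2(rot_keys))     # about 4.70 bits
print(math.log2(sub_keys))     # about 88.38 bits
```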

Depending on what the text-generating software does, it might also work to substitute words, in addition to letters. That further increases the key space.
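A hedged sketch of word-level substitution, assuming a simple regex tokenizer `[A-Za-z']+` and a vocabulary built from the corpus itself (both hypothetical choices; the helper names are illustrative). The key is a random bijection over the vocabulary, so with a vocabulary of size V there are V! possible keys. Tokens are case-sensitive here, so "To" and "to" are separate words.

```python
import random
import re

WORD = re.compile(r"[A-Za-z']+")

def make_word_key(corpus, seed=None):
    # Hypothetical helper: random bijection over the corpus vocabulary.
    vocab = sorted(set(WORD.findall(corpus)))
    shuffled = vocab[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def substitute_words(text, key):
    # Replace each word token via the key; whitespace and punctuation
    # between tokens are left untouched, so token boundaries are preserved.
    return WORD.sub(lambda m: key.get(m.group(0), m.group(0)), text)

def invert(key):
    return {v: k for k, v in key.items()}

corpus = "To be, or not to be: that is the question."
wkey = make_word_key(corpus, seed=1)
enc = substitute_words(corpus, wkey)
assert substitute_words(enc, invert(wkey)) == corpus
```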

Substitution ciphers are hopelessly insecure, so it is unclear what this achieves, or what is being attempted.

fgrieu
  • It wasn't a matter of security originally, so much as obscurity. I posted a couple of questions over in Law also. I was worried about the legality of open-sourcing the RNN if it were to use a dataset full of copyrighted intellectual property instead of Shakespeare. I suggested that a Caesar cipher, obfuscating the data with a ROT or modulo shift, could potentially circumvent copyright issues, since the NN would effectively be training itself on "non-copyrighted data." I was wrong. If it can't be shared free and clear, it can't be shared encrypted or obfuscated. – Bango Mar 14 '17 at 15:15
  • Then it just became a personal thought experiment: Which encryption schemes are able to be "learned on anyway," so to speak. I too am struggling to find much relevance or application, though it kind of amazed me when I put it into practice for the first time. – Bango Mar 14 '17 at 15:22