4

Is it possible to derive a secure cryptographic key from an array of sorted bytes, assuming the bytes themselves were generated in a secure manner (say, from quantum phenomena)?

What would be the best approach to this issue?

Jakub
  • I presume duplicate byte values are allowed? Otherwise after 256 bytes there is zero entropy left. I'm trying to come up with a calculation of how much entropy there is for a specific number of bytes, but failing. – Maarten Bodewes Oct 25 '21 at 08:25
  • Yes, duplicate values are allowed. Thanks – Jakub Oct 25 '21 at 08:29
  • Alright. After we find out how to get to 128 bits of entropy with ordered bytes, you can just use a KDF to derive a key. Some ciphers may also take a key of any size & form, so you could even use the bytes directly in case you have enough of them (but I would probably default to a KDF just to be sure). – Maarten Bodewes Oct 25 '21 at 08:42
  • This is a probability problem: given 16 boxes, place numbers from 0 to 255 in increasing order, where repetitions are allowed. Probability of Increasing Sequence – kelalaka Oct 25 '21 at 08:47
  • 1
    Wondering about the source of the problem. What is the source, and why do you get increasing random sequences? – kelalaka Oct 25 '21 at 09:01
  • To get a precise answer you need to define the source of the bytes. Do you have a source that produces an increasing sequence of bytes, or does the source produce random bytes that you then sort? – kelalaka Oct 25 '21 at 10:37

1 Answer

4

Assuming that you know that your source is producing IID bytes, what you have in this case is a sample from a multinomial distribution with $k=256$. If you have a good idea of the probability of each byte (e.g. 1/256 if they are equiprobable), then you can compute the entropy in the distribution which will grow with $n$ (the size of the array) as suggested in the comments. The formula for the entropy is given in the Wikipedia article.
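One way to evaluate this numerically (a rough sketch, assuming IID equiprobable bytes and measuring entropy in bits; it uses the standard decomposition of the multinomial entropy into binomial marginal terms rather than any particular textbook formula):

```python
from math import exp, lgamma, log, log2

def lg_fact(x: int) -> float:
    """log2(x!), via lgamma for numerical stability."""
    return lgamma(x + 1) / log(2)

def multinomial_entropy_bits(n: int, k: int = 256) -> float:
    """Shannon entropy (in bits) of the vector of byte counts for n IID draws
    from k equiprobable values, i.e. all the information left after sorting.
    Uses H = -log2(n!) + n*log2(k) + sum_i E[log2(X_i!)] with X_i ~ Bin(n, 1/k),
    which follows from taking the expectation of -log2 of the multinomial pmf."""
    p = 1.0 / k
    e_lg_fact = 0.0  # E[log2(X!)] for a single binomial marginal
    for x in range(n + 1):
        # log of the Binomial(n, p) pmf at x, computed via lgamma to avoid overflow
        log_pmf = (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
                   + x * log(p) + (n - x) * log(1 - p))
        e_lg_fact += exp(log_pmf) * lg_fact(x)
    return -lg_fact(n) + n * log2(k) + k * e_lg_fact

print(multinomial_entropy_bits(256))  # Shannon entropy of 256 sorted uniform bytes
```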

However, the Shannon entropy could still hide individual outcomes that occur too often for a good cryptographic key. Instead you should make sure that the min-entropy $H_\infty$ is somewhat greater than the required key size. For equiprobable bytes and for $256|n$, this will be $$-\log_2 \left(\frac{n!}{\left(\frac n{256}\right)!^{256}}256^{-n}\right) .$$
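As a quick sanity check, for $n=256$ the most likely outcome has every byte value appearing exactly once, so this evaluates to $-\log_2\left(256!\cdot 256^{-256}\right)\approx 364$ bits, comfortably above a 256-bit key.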

Again, this will grow with $n$. Once you have enough min-entropy to feel comfortable, just take the byte counts and feed them into your key derivation function of choice.
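As an illustration (a minimal sketch, not a recommendation of a particular KDF: SHA-256 merely stands in here for whatever KDF you prefer, such as HKDF, and the `quantum_source` in the usage comment is hypothetical):

```python
import hashlib
from collections import Counter

def key_from_sorted_bytes(sorted_bytes: bytes) -> bytes:
    """Derive a 256-bit key from the multiset of values in a sorted byte array.
    The per-value counts carry exactly the information the sorted array has,
    so encode them canonically and feed them to a hash standing in as the KDF."""
    counts = Counter(sorted_bytes)
    # Fixed-width big-endian encoding of the count of every byte value 0..255
    encoded = b"".join(counts[v].to_bytes(4, "big") for v in range(256))
    return hashlib.sha256(encoded).digest()

# Hypothetical usage once enough min-entropy has been collected:
# sample = bytes(sorted(quantum_source.read(1024)))  # quantum_source is illustrative
# key = key_from_sorted_bytes(sample)
```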

ETA: @fgrieu asks for a min-entropy formula for more general values of $n$. The following is more cumbersome, but I think that it correctly captures the modal value of the multinomial. For $n=256d+r$ with $0\le r<256$ the formula is $$-\log_2 \left(\frac{n!}{(d!)^{256-r}((d+1)!)^r}256^{-n}\right) .$$
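A small numerical check of this general expression (a sketch, in bits; it reproduces the $0$ and $\approx 364.004$-bit values discussed in the comments for $n=0$ and $n=256$):

```python
from math import lgamma, log

def min_entropy_bits(n: int, k: int = 256) -> float:
    """Min-entropy (in bits) of the byte counts of n IID equiprobable bytes.
    Writing n = k*d + r with 0 <= r < k, the modal outcome has r counts equal
    to d+1 and k-r counts equal to d; return -log2 of its probability."""
    d, r = divmod(n, k)

    def lg_fact(x: int) -> float:  # log2(x!)
        return lgamma(x + 1) / log(2)

    return -(lg_fact(n) - (k - r) * lg_fact(d) - r * lg_fact(d + 1)
             - n * log(k) / log(2))

print(min_entropy_bits(0), min_entropy_bits(256))  # 0.0 and roughly 364.0 bits
```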

Daniel S
  • 2
    These are "sorted bytes", hence very much NOT I.I.D. Thus the also can't be equiprobable. So simple Shannon doesn't apply here. – Paul Uszak Oct 25 '21 at 10:12
  • @paul Have I misunderstood? I took the question to mean that the bytes were generated in an IID manner and then sorted. – Daniel S Oct 25 '21 at 10:15
  • 1
    Yes, you did. The array is sorted and the question is not totally answerable. And I'm not sure that this is pure curiosity from the OP. If it's a newbie, always take care that it might be homework. – kelalaka Oct 25 '21 at 10:38
  • 2
    Thank you for the answer. The reason I was asking was that I have access to a quantum machine I was hoping to use to generate encryption keys, but the results I received from repeated shots come back sorted, not in chronological order. – Jakub Oct 25 '21 at 14:23
  • If the results are sorted, how can they be random? – kelalaka Oct 25 '21 at 14:43
  • 2
    As @DanielS observed, the randomness does not come from the numbers themselves but from how often they occur. At least that's how I understood his response. – Jakub Oct 25 '21 at 16:04
  • @fgrieu: Thanks for the eagle eyes: I omitted a factorial which obviously makes a significant difference! Note that the formula only holds for $256|n$. It is the log of the modal probability in the multinomial distribution. – Daniel S Oct 26 '21 at 12:30
  • @Daniel: I agree with the new formula for $n=0$ and $n=256$ at least (for $0$ and $364.004$ bits of min-entropy). A formula for intermediary values would be nice. – fgrieu Oct 26 '21 at 12:48