4

I am not exactly sure whether this belongs on Math Stack Exchange or Crypto:

A TRNG outputs numbers in $[0,1]$ with a Gaussian distribution. I would like to convert them into uniform random bytes ($[0,255]$) to perform byte operations. What is a cryptographically secure method of doing this?

Here is an example distribution from my generator, before being normalized to $[0,1]$: [histogram of raw generator output]

Edit:

Output from my original methodology: normalize the values to lie in $[0,1]$, remove the first and second decimal places via $x \cdot 100 - \lfloor x \cdot 100 \rfloor$, then map to discrete values in $[0,255]$ via $\lfloor 255x \rfloor$. The resulting distribution is as follows: [histogram of transformed output]
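In code, the transform just described looks roughly like this (a minimal Python sketch; the function name and example value are illustrative, and this is not a claim that the method is secure):

```python
import math

def transform(x):
    """Drop the first two decimal places of x in [0,1], then map to [0, 255]."""
    frac = x * 100 - math.floor(x * 100)  # keeps the digits from the 3rd decimal on
    return math.floor(frac * 255)         # discretize into [0, 255]

# example: x = 0.12345 -> frac ~= 0.345 -> floor(0.345 * 255) = 87
```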

dylan7
  • What are the parameters of the gaussian generator? What is the actual distribution of output bytes? – Thomas Jul 25 '15 at 10:51
  • @Thomas I cleared up my question. – dylan7 Aug 01 '15 at 17:29
  • A Gaussian distribution is defined from $-\infty$ to $+\infty$. The [0,1] range doesn't make any sense. – Chris Aug 01 '15 at 18:51
  • @Chris I will post a picture of it when I get a chance. It is Normal, but it might be discrete values in $[0,1]$, i.e. 3-decimal floating point. – dylan7 Aug 01 '15 at 18:56
  • A Normal distribution is continuous, not discrete. I'm afraid you are mixing up a number of things here. – Chris Aug 01 '15 at 19:22
  • @Chris It's sampling, so I assume approximate normal is valid? – dylan7 Aug 01 '15 at 19:25
  • Maybe you can somehow say this. However, to be precise, continuous and discrete distributions are quite different from each other. In a continuous distribution, you have infinitely many possible events, and each single event has a probability of 0; you have to integrate over a region to get a probability > 0. In a discrete probability distribution you have a finite number of events and each event has a probability > 0. – Chris Aug 01 '15 at 20:04
  • @Chris I will post a picture of it when I get a chance, but is there a cryptographically secure method of taking sampled values that lie between 0 and 1 and form an approximately normal distribution, and converting them to discrete $[0,255]$ bytes? The method in the answer below works for converting them into bits, but I am performing byte operations. – dylan7 Aug 01 '15 at 20:42
  • @dylan7 A byte is just 8 bits. To get a uniformly distributed byte, take 8 uniformly distributed independent bits and put them together (as in, binary notation). – Thomas Aug 02 '15 at 20:21
  • You have changed the question so that most of my answer no longer applies... – otus Aug 03 '15 at 09:12
  • @Chris I posted a picture of the distribution and my attempted transformation above. – dylan7 Aug 12 '15 at 01:35
  • @dylan7: The technique I suggested is a scientifically accepted technique that will work for sure if your source provides independent true random values; Ilmari Karonen even provided a reference. Unfortunately you are 'answer resistant' and you want to stay with your wild approach that somehow seems to do what you want, but you don't know if it is secure or not (well, probably it is not). :) – Chris Aug 12 '15 at 07:17
  • @Chris When I use the technique suggested I get wild results, and I can't understand why. What I get is far from uniform. It is only uniform among the bits. When I group them into bytes (traverse the bits and convert every 8 bits into decimal) I get a wild distribution. That's the reason it's hard to accept when the technique above is producing uniform results. My source of the numbers is radio noise. I am using a low sampling rate, hence the "Normal-looking" input curve. In addition, a low sampling rate should skip over any cycles within the wave, hence it should be pretty "random", correct? – dylan7 Aug 12 '15 at 10:25
  • @dylan7: As far as I can tell from the distribution, the samples coming from your generator look good. Now, for instance, if you have a sample -0.3 followed by a sample -1.5, you say this is a 1 bit... When you have 8 bits $b_0,\dots,b_7\in\{0,1\}$ you can compute an integer $X=b_0+2b_1+4b_2+8b_3+\dots+128b_7$. If this doesn't work, there is either a problem with the randomness of your source (which is unlikely), or you have a bug in your computations (very likely). – Chris Aug 12 '15 at 11:13
  • @Chris Ok well at least the source looks good. I can't seem to find a bug, but I'll keep looking. Thank you – dylan7 Aug 12 '15 at 11:15

2 Answers

3

There is no such thing as a Gaussian distribution over $[0,1]$; a Gaussian is supported on all of $(-\infty,+\infty)$, so this doesn't make any sense. It is therefore not clear what you have to begin with.

However, if you have independent, identically distributed random values, you can generate a random bit by taking two values $A$ and $B$ and comparing them: if $A<B$, set the bit to 0; otherwise, set it to 1. A sequence of 8 such bits is then a uniformly distributed random byte.

PS: As correctly mentioned by Ilmari Karonen, if there is a non-negligible probability that $A=B$, you have to check for this, and if it happens you have to discard both $A$ and $B$.
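A minimal sketch of this extractor, assuming `next_sample` is a zero-argument function returning one independent sample from the source (that interface is an illustrative assumption):

```python
def random_byte(next_sample):
    """Extract one uniform byte from i.i.d. samples via pairwise comparison.

    Pairs with A == B are discarded (rejection step), so each accepted
    comparison yields one unbiased bit; 8 bits are packed into a byte.
    """
    byte = 0
    bits = 0
    while bits < 8:
        a, b = next_sample(), next_sample()
        if a == b:
            continue  # discard ties, as noted in the PS
        byte = (byte << 1) | (1 if a > b else 0)
        bits += 1
    return byte

# usage with a hypothetical sample source:
# samples = iter(my_trng_readings)
# b = random_byte(lambda: next(samples))
```

Note that the byte is only uniform if the samples are truly independent; serial correlation in the source will bias the output.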

Chris
  • As I said above in my comment to the previous answer, I tested that with a normal pseudo-RNG, and a histogram of the bytes produced a distribution far from uniform. Are only the bits themselves supposed to be uniformly distributed? – dylan7 Aug 02 '15 at 14:46
  • 2
    Note that, for this method to really generate unbiased bits, you'll have to discard both $A$ and $B$ and repeat the process if $A=B$. In fact, such rejection sampling is unavoidable in general: for input distributions having less than 0.5 bits of entropy, there's no way to get an unbiased output bit without sometimes consuming more than two input samples. Also note that this scheme relies on the samples being independent; if subsequent samples may be linked (as they, inevitably, are for PRNGs; good PRNGs try to hide this dependence, more or less successfully), the output can easily be biased. – Ilmari Karonen Aug 02 '15 at 14:46
  • 1
    @dylan7: You may have just demonstrated that your pseudo-RNG isn't as random as you think it is. In fact, serial correlation between successive samples is a common flaw in popular LCRNGs. – Ilmari Karonen Aug 02 '15 at 14:56
  • @Ilmari Karonen Thank you. I will try it with my TRNG and see if there is a difference. So I was using the mean of the distribution for comparison; when I come to the mean I get $A=B$. When you say repeat the process, do I choose another number to compare the mean to, and then compare all subsequent values to that new number until $A=B$ again? – dylan7 Aug 02 '15 at 15:04
  • 1
    @dylan7: Ah, no. You don't need the mean for anything. What Chris is saying is that you should take two random values, and see which one is greater. Assuming that they're independent and identically distributed, and that they don't happen to be equal, that will give you one unbiased bit. Then take two more values and compare them to get another bit, and so on. – Ilmari Karonen Aug 02 '15 at 15:41
  • @Ilmari Karonen Ah, that makes sense now, so I could just compare adjacent values in my generated sample (discard both after comparison) and if two are equal find a third and compare both them to that third number and discard all three? – dylan7 Aug 02 '15 at 16:27
  • 2
    @dylan7: If the two values are equal, you'll need to discard them both, and get two new values. Basically, you're taking in two random values A and B (from any distribution) and returning either 1 (if A > B), 0 (if A < B) or no value (if A = B). Otherwise, you may end up with biased results. For reference, this is basically a variant of von Neumann whitening, extended to non-binary input distributions. – Ilmari Karonen Aug 02 '15 at 17:58
  • @Ilmari Karonen I posted the distribution from the generator in my original post, along with the result from my original method. I tried the method proposed in user Chris's answer, which did not result in a uniform distribution among the bytes. However, my method did. But when I tested my method with Matlab's entropy() function, it produced a very low number (0.034...). However, this function in Matlab seems incorrect, since it produces a higher entropy for a normal distribution. The method I proposed seems to be producing what appears to be close to a uniform distribution. – dylan7 Aug 05 '15 at 23:58
1

Depending on the parameters of the Gaussian, each byte $X_i$ will have entropy of less than 8 bits. So you cannot produce cryptographically random bytes from each of them, unless you add entropy from another source.

You can, however, turn them into smaller values. For example, if they have at least 1 bit of entropy each, you can turn them into bits: if the distribution has its peak at 127.5, you could map everything smaller than that to 0 and everything larger to 1. Since the transform is not an injection, it's non-invertible. The resulting output is uniformly random and independent.
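A sketch of this thresholding idea (the threshold value and the helper names are illustrative; the bits are only unbiased if the threshold sits at the distribution's median and the samples are independent):

```python
def samples_to_bits(samples, threshold=127.5):
    """Map each sample below the threshold to 0 and above it to 1."""
    return [0 if s < threshold else 1 for s in samples]

def bits_to_bytes(bits):
    """Pack bits into bytes, 8 at a time (MSB first); leftover bits are dropped."""
    return [int("".join(map(str, bits[i:i + 8])), 2)
            for i in range(0, len(bits) - 7, 8)]
```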

Or you can use a secret key and a one way transform to produce an output byte stream, like the first byte of $H(K||X_i||i)$ for some hash function $H$. But the $X_i$ aren't really doing a whole lot in that case – you could be using just $H(K||i)$.
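A sketch of this keyed-hash construction, with SHA-256 standing in for $H$ (the key and the byte encoding of $X_i$ are illustrative assumptions):

```python
import hashlib

def extract_byte(key: bytes, sample: float, i: int) -> int:
    """First byte of H(K || X_i || i), as in the construction above."""
    h = hashlib.sha256()
    h.update(key)
    h.update(repr(sample).encode())  # one possible encoding of X_i
    h.update(i.to_bytes(8, "big"))   # the counter i
    return h.digest()[0]

# stream of output bytes from a list of samples:
# out = [extract_byte(b"my secret key", x, i) for i, x in enumerate(samples)]
```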

otus
  • Thank you. One thing I was able to do was put the values between 0 and 1, chop off the tenths place, and then convert them back to $[0,255]$. This produces a uniform distribution; since it stops them from "clustering around a mean value", it makes them more "independent" (not in the probability sense). Is this also cryptographically secure, or do they still lose entropy? – dylan7 Jul 25 '15 at 14:41
  • @dylan7, I'm not really sure what you mean... if you convert a byte into a floating point between 0-1 and chop something off, you will probably no longer be able to get every possible value in $[0,255]$ when you convert them back. That would definitely not be good enough for anything crypto. – otus Jul 25 '15 at 16:58
  • My fault, I realized I wasn't clear enough. $X_i$ is converted to a floating-point value between 0 and 1, then I compute $X_{i,\mathrm{float}} \cdot 100 - \lfloor X_{i,\mathrm{float}} \cdot 100 \rfloor$, then convert back to $[0,255]$. This changes the tenths and hundredths. So I believe all values 0–255 are covered. – dylan7 Jul 25 '15 at 21:04
  • @dylan7, from a quick test in Python, less than half the numbers are possible outputs. Anyway, with any such transform you either have a bijection that only shuffles the probabilities, or you lose some codomain values due to collisions. You can't get a uniform distribution with a deterministic mapping $[0,255] \mapsto [0,255]$ if the original isn't uniform. – otus Jul 26 '15 at 05:52
  • When I did it in MATLAB I was able to get a uniform distribution. The values came in as 0 to 1, the transformation was performed, and then they were converted to discrete values in 0 to 255 (flooring the result, so there were no floating points). So the whole thing is really $[0,1] \mapsto [0,255]$. If it's not really a uniform distribution, because you said it can't happen, what else could be causing what I am seeing? Thank you again – dylan7 Jul 26 '15 at 15:31
  • @dylan7, well if you started from a continuous distribution in $[0,1]$ you may well have gotten something close to uniform out. I doubt it's really uniform, but if you replace 100 by $n$ and let it grow, it may approach uniform. – otus Jul 26 '15 at 16:00
  • But the technique you said about making the bytes into bits is much more cryptographically secure? – dylan7 Jul 26 '15 at 16:01
  • @dylan7, Yes, unless you reduce them to something $le$ the entropy you cannot have secure uniform random values. Whether that has to be bits depends on the original distribution. – otus Jul 26 '15 at 16:03
  • I'm sorry, what's the $le $ in your comment mean? – dylan7 Jul 27 '15 at 00:27
  • @dylan7, sorry, typo $\le$. – otus Jul 27 '15 at 05:37
  • When I tried your technique out: I tested it on a pseudo-random Gaussian set. I generated 1000 numbers with a N(0,1) distribution. Then I converted them to between 0 and 1 (normalized, then divided by the max) and then to 0-to-255 discrete values. I then turned every value above the mean into 1 and below into 0. This resulted in a perfectly uniform distribution. However, I needed to work with bytes, so I grouped them into octets and converted them back to 0 to 255. The resulting distribution was far from uniform. Was this supposed to happen? I need them to be bytes in the end. Thank you. – dylan7 Aug 01 '15 at 17:14
  • @dylan7, likely the initial values weren't independent. – otus Aug 03 '15 at 08:51