Where would it come from, and how would you measure it? If we had $P(x_i = 0) = P(x_i = 1) = 0.5 \pm 0.0$, that would be mathematical perfection given a large enough sample. We believe that ChaCha is a fairly decent cryptographic random number generator. Yet see the abridged `ent` output from a pass over a 2GB sample from `/dev/urandom`:-
Arithmetic mean value of data bytes is 127.5018 (127.5 = random)
It's 0.0018 $(\approx 2^{-9})$ out from the expected value. That's because it's random, and that's expected too. Now consider that we are talking about infinitesimally small biases of the order of $2^{-64}$ to $2^{-128}$ (I use the 128 value in some of my designs). Bigger sample? Due to the enormous data sets necessary (or new maths), we don't know the output biases, if any, of our current cryptographic primitives, even theoretically. I tried to find out, but no satisfactory answer was presented.
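As a rough back-of-the-envelope sketch of why a bigger sample quickly becomes hopeless (the $3\sigma$ detection threshold here is just an assumption): the observed mean of $n$ fair bits has a standard error of about $0.5/\sqrt{n}$, so resolving a bias $\epsilon$ takes on the order of $1/\epsilon^2$ bits.

```python
import math

def bits_needed(bias: float, z: float = 3.0) -> float:
    """Rough sample size (in bits) to resolve a bit bias at ~z standard deviations.

    The observed mean of n fair bits has standard error ~0.5/sqrt(n),
    so we need z * 0.5 / sqrt(n) < bias, i.e. n > (z / (2 * bias))**2.
    """
    return (z / (2.0 * bias)) ** 2

for k in (9, 64, 128):
    n = bits_needed(2.0 ** -k)
    print(f"bias of 2^-{k}: need ~2^{math.log2(n):.0f} bits of output")
```

A $2^{-128}$ bias would need around $2^{257}$ bits of output, hence the new maths.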
2. Clearly hash functions can extract randomness, as you've seen in your opening link. I've shown how to do it.
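A minimal sketch of that kind of conditioning, assuming a raw source read from `/dev/urandom` as a stand-in and SHA-512 as the extractor (the block and output sizes are arbitrary):

```python
import hashlib

def extract(raw: bytes, out_len: int = 32) -> bytes:
    """Condense a block of raw entropy into out_len bytes via SHA-512.

    The hash acts purely as a conditioner here; the caller has to ensure
    that `raw` carries comfortably more min-entropy than 8 * out_len bits.
    """
    assert out_len <= hashlib.sha512().digest_size
    return hashlib.sha512(raw).digest()[:out_len]

# e.g. condense 1 KiB of raw sampler output into a 32 byte seed
with open("/dev/urandom", "rb") as f:          # stand-in for a raw TRNG source
    seed = extract(f.read(1024))
print(seed.hex())
```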
3. There is lots of intuition, but unfortunately I have to leave that for the more mathematically capable. I just build this stuff; others understand the basic theories. All I can do is highlight the lemma and its usage.
It is predicated on min-entropy ($H_\infty$), a measure that allows for auto-correlation in the raw input data. Since we build with it, we can set the final bias ($\epsilon$) ourselves. So it's possible to go silly with $\epsilon \approx 2^{-10,000}$ if you believe that hash functions are entirely unbiased (as per the previously linked question). But as you can see, the output would nevertheless still be biased. A little. Maybe.
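To see how we set $\epsilon$ ourselves: one common form of the leftover hash lemma bounds the output's distance from uniform as $\epsilon \le 2^{(m - H_\infty)/2 - 1}$ when $m$ bits are extracted with a universal hash from an input of min-entropy $H_\infty$. A tiny sketch of dialling in a target bias (the 256 bit output size is just an example):

```python
def required_min_entropy(m_out_bits: int, t: int) -> int:
    """Input min-entropy k needed so the hashed output is within
    epsilon = 2**-t of uniform, using the leftover hash lemma bound
    epsilon <= 2**((m - k)/2 - 1), i.e. k >= m + 2*t - 2."""
    return m_out_bits + 2 * t - 2

for t in (64, 128, 10_000):
    k = required_min_entropy(256, t)
    print(f"epsilon = 2^-{t}: hash in >= {k} bits of min-entropy for 256 output bits")
```

So the silly $\epsilon \approx 2^{-10,000}$ is simply a matter of feeding in enough measured min-entropy.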
Auto-correlation ($R$) has the same measurement issues as bias. It's a continuous statistical metric, so again the user has to decide on a threshold before declaring that it exists or doesn't. Typically $R \le 10^{-3}$ (physicists), $p > 0.001$ (NIST) or $p > 0.01$ (RRR), but those are anecdotal thresholds which influence the bias measure and $H_\infty$.
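For illustration, a lag-1 serial correlation estimate of the sort `ent` reports can be computed like this (the exact formula `ent` uses may differ slightly):

```python
import math

def lag1_correlation(data: bytes) -> float:
    """Pearson correlation between each byte and its successor (lag 1)."""
    x, y = data[:-1], data[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

with open("/dev/urandom", "rb") as f:
    sample = f.read(1 << 20)                   # 1 MiB sample
print(f"R = {lag1_correlation(sample):+.6f}")  # hovers near, but never exactly at, zero
```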
Interestingly (Daniel), this also directly affects von Neumann extraction. vN is predicated on a totally uncorrelated stream; otherwise auto-correlations can propagate to the output. By virtue of the above paragraph, we can't prove that $R = 0.0$ with sufficient certainty to celebrate $\epsilon_{vN} = 0.0$.
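For reference, the von Neumann step itself is tiny; its unbiasedness proof needs each pair of input bits to be independent with a constant (possibly unknown) bias, which is exactly what a non-zero $R$ undermines. A minimal sketch:

```python
def von_neumann(bits):
    """Classic von Neumann debiasing over non-overlapping pairs:
    (0,1) -> 0, (1,0) -> 1, (0,0)/(1,1) -> discarded.
    Only provably unbiased if the pairs are independent and identically
    distributed; correlated input leaks bias into the output."""
    it = iter(bits)
    return [a for a, b in zip(it, it) if a != b]

print(von_neumann([0, 1, 1, 0, 1, 1, 0, 0, 0, 1]))   # -> [0, 1, 0]
```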
I'm not suggesting that IID data doesn't exist. It's just that proving it with 100% certainty, rather than say 95%/$3\sigma$, is difficult. My `ent` test from above had $R = 0.000009$. But not zero.