Enumerating the bias of NIST and Pearson randomness extractors

Question

The following is an extract from a paper about using your mobile phone as a true random number generator (TRNG). The mobile phone bit is incidental to this question though. I'm interested in the calculated bias of the $M \times r$ matrix extraction mechanism. Specifically equation 5 at the bottom.

A. Does a similar bias equation exist for a cryptographic hash based extractor such as SHA-X? NIST advice simply states that if you extract less than half of the input entropy, you'll be fine. If you plug in values for SHA-1 with 320 bits of input entropy, you get $\epsilon = 2^{-80}$. That's below the NIST (DRAFT Special Publication 800-90B, August 2012) bias threshold of $2^{-64}$ for declaring a sequence as having 'full entropy'. I suspect that it is exactly the same equation, and that it's generic to all extractors.

B. Could one derive a similar bias equation for using an 8 bit Pearson hash as an extractor? Using Pearson with its random permutation is not a million miles away from a vector matrix. Yes, it's got that XORy thing in it, but otherwise the transmission of bias principle must still exist, and must similarly depend on the length of the input entropy.

Part A takes precedence.

It's a bit of a strange question; you would not be able to extract entropy bits from a secure one-way hash function. That's why PRNGs are predominantly created from hash functions and there are functions like SHAKE defined for Keccak / SHA-3. — Maarten Bodewes, Oct 08 '17 at 12:48
Hmm, maybe that's because I don't completely get the question. — Maarten Bodewes, Oct 08 '17 at 19:55

Squeamish Ossifrage · Answer 1 · 2019-05-24T21:27:46.380

If $M$ is a uniform random $l \times k$ matrix, then the random function $H\colon x \mapsto M \cdot x$ is a 2-universal hash family[1] (paywall-free: (a), (b)). This means that for any distinct $l$-bit strings $x$ and $y$, $\Pr[H(x) = H(y)] \leq 1/2^k$, where the probability is over the choice of $H$.
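The collision bound can be checked exhaustively for tiny parameters. The following is a minimal sketch, not taken from the paper: it assumes bit matrices over GF(2) stored as $k$ rows of $l$ bits each (so that $M \cdot x$ maps $l$ bits to $k$ bits), and the helper names `matvec_gf2` and `collision_probability` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

def matvec_gf2(matrix, x):
    """Multiply a bit matrix (tuple of rows) by a bit vector over GF(2)."""
    return tuple(sum(m & b for m, b in zip(row, x)) % 2 for row in matrix)

def collision_probability(l, k, x, y):
    """Exact Pr[H(x) = H(y)] over all 2^(l*k) matrices, by enumeration."""
    rows = list(product((0, 1), repeat=l))      # every possible row of l bits
    matrices = list(product(rows, repeat=k))    # every matrix with k such rows
    hits = sum(matvec_gf2(m, x) == matvec_gf2(m, y) for m in matrices)
    return Fraction(hits, len(matrices))

# Two distinct 3-bit inputs hashed to 2 bits: collision probability is 1/2^2.
print(collision_probability(3, 2, (1, 0, 1), (0, 1, 1)))  # 1/4
```

Enumeration is only feasible for toy sizes; it is here to make the definition concrete, not to suggest a way of testing real parameters.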

If $H\colon \{0,1\}^l \to \{0,1\}^k$ is a 2-universal hash family and the random variable $X \in \{0,1\}^l$ has min-entropy $H_\infty[X] = s\cdot l$, then $H(X)$ has total variation distance at most $\epsilon$ from the uniform distribution on $k$ bits, where $\epsilon = 2^{-(s l - k)/2}$. This is the leftover hash lemma. (Sometimes it is phrased instead as: if you want TVD $\epsilon$ from uniform, then you must pick $k \leq s\cdot l - \log(1/\epsilon^2)$.)
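As a quick arithmetic aid, the two phrasings of the lemma can be written as small functions. This is only a sketch of the formulas above; the function names are hypothetical, and the numbers 320, 160 and $2^{-64}$ are the ones from the question, used purely as inputs to the formula (nothing here claims that SHA-1 is a 2-universal family).

```python
import math

def lhl_log2_epsilon(min_entropy_bits, output_bits):
    """log2 of the bound eps = 2^(-(s*l - k)/2)."""
    return -(min_entropy_bits - output_bits) / 2

def lhl_max_output_bits(min_entropy_bits, log2_epsilon):
    """Largest k with k <= s*l - log2(1/eps^2) = s*l + 2*log2(eps)."""
    return math.floor(min_entropy_bits + 2 * log2_epsilon)

print(lhl_log2_epsilon(320, 160))     # -80.0, i.e. eps = 2^-80
print(lhl_max_output_bits(320, -64))  # 192 output bits for eps = 2^-64
```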

The total variation distance between two discrete random variables $R$ and $S$ is $$\frac 1 2 \sum_x |\Pr[R = x] - \Pr[S = x]|.$$
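In code, the definition is a one-liner. The sketch below assumes distributions are given as dictionaries from outcome to probability (a representation chosen here for illustration) and uses exact fractions so that the result is not blurred by floating point.

```python
from fractions import Fraction

def total_variation_distance(p, q):
    """TVD between two distributions given as {outcome: probability} dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2

fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}
biased = {0: Fraction(3, 4), 1: Fraction(1, 4)}
print(total_variation_distance(fair, biased))  # 1/4
```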

What does this all mean?

We are considering a probabilistic model of an uncertain state of knowledge, such as the adversary's. We do not know the specific values of $H$ and $X$ in this model, but we can quantify the uncertainty by giving numeric weights to each of the possible values:

$H$ has a probability distribution on $l \times k$ matrices that is uniform.
- This means for any specific value $h$ that $H$ could take on, $\Pr[H = h] = 1/2^{l \cdot k}$, since there are $2^{l \cdot k}$ distinct $l \times k$ bit matrices.
  - One possible matrix is the matrix that returns the first $k$ bits of an $l$-bit string. For this particular choice of matrix, the min-entropy of the result depends entirely on the first $k$ bits of the input and not at all on the last $l - k$ bits.
  - Another possible matrix is the all-zero matrix, in which case $H(x) = 0$ for all $x$. For this particular choice of matrix, the min-entropy of the result is always exactly zero! But it is only with probability $1/2^{l \cdot k}$ that $H$ is the all-zero matrix.
  - The point is that no particular matrix makes a randomness extractor; it is rather a probability distribution on matrices that does.
- You could pick a value for $H$ by flipping a coin $l \cdot k$ times and filling in the matrix entries with the outcomes.
- $H$ is not uniformly distributed in all functions from $l$-bit strings to $k$-bit strings: there are many functions that $H$ cannot be, namely those that do not have matrix representations. For example, $H$ cannot be SHAKE128.
$X$ has a probability distribution that has min-entropy $s \cdot l$.
- This means that the maximum probability of any specific value that $X$ could be is $2^{-s\cdot l}$; that is, $-\max_x \log_2 \Pr[X = x] = s \cdot l$.

Given this state of knowledge about $H$ and $X$, the two theorems let us conclude that our state of knowledge about the output $H(X)$, a $k$-bit string, is a probability distribution with at most $\epsilon$ in total variation distance from the uniform distribution. That is, $$\frac{1}{2}\sum_y \bigl|\Pr[H(X) = y] - 1/2^k\bigr| \leq 2^{-(s l - k)/2}.$$
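For toy sizes the statement can be checked by brute force: average the TVD of $H(X)$ from uniform over every possible matrix and compare with the bound. This is a sketch under assumptions of my own choosing: matrices are stored as $k$ rows of $l$ bits, and the source $X$ is uniform on the eight 4-bit strings whose first bit is zero, so its min-entropy is 3 bits.

```python
from fractions import Fraction
from itertools import product

def matvec_gf2(matrix, x):
    """Multiply a bit matrix (tuple of rows) by a bit vector over GF(2)."""
    return tuple(sum(m & b for m, b in zip(row, x)) % 2 for row in matrix)

def tvd_from_uniform(matrix, source, k):
    """TVD between matrix*X and uniform k-bit strings, X uniform on `source`."""
    counts = {}
    for x in source:
        y = matvec_gf2(matrix, x)
        counts[y] = counts.get(y, 0) + 1
    total = sum(
        abs(Fraction(counts.get(y, 0), len(source)) - Fraction(1, 2**k))
        for y in product((0, 1), repeat=k)
    )
    return total / 2

def average_tvd(l, k, source):
    """Average of the TVD over every bit matrix with k rows and l columns."""
    rows = list(product((0, 1), repeat=l))
    matrices = list(product(rows, repeat=k))
    return sum(tvd_from_uniform(m, source, k) for m in matrices) / len(matrices)

l, k = 4, 2
source = [x for x in product((0, 1), repeat=l) if x[0] == 0]  # min-entropy 3
zero_matrix = ((0,) * l,) * k

print(tvd_from_uniform(zero_matrix, source, k))  # 3/4: this one matrix is terrible
print(float(average_tvd(l, k, source)))          # averaged over all matrices
print(2 ** (-(3 - k) / 2))                       # the bound 2^(-(s*l - k)/2)
```

The all-zero matrix is as bad as it can be, yet the average over the whole family stays under the bound, which is the sense in which the distribution on matrices, rather than any single matrix, is the extractor.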

The total variation distance between two random variables $R$ and $S$ is relevant to usual theorems in cryptography because it turns out to be a bound on the advantage of a random algorithm $A$ at distinguishing them, $|\Pr[A(R)] - \Pr[A(S)]|$. (There are other metrics of probability distributions too, such as KL divergence (or symmetrized KL divergence to get a proper metric), Hellinger distance, etc. Which one you use depends on what your needs are.) So this almost says something useful for cryptography.
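To see why the TVD bounds the distinguishing advantage, consider the deterministic test that outputs 1 exactly on those outcomes that are more likely under $R$ than under $S$: its advantage equals the TVD, and no other test does better. A small sketch with made-up example distributions:

```python
from fractions import Fraction

def total_variation_distance(p, q):
    """TVD between two distributions given as {outcome: probability} dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2

def best_distinguisher_advantage(p, q):
    """Advantage of the test that outputs 1 exactly when p(x) > q(x)."""
    accept = [x for x in set(p) | set(q) if p.get(x, 0) > q.get(x, 0)]
    pr_r = sum(p.get(x, 0) for x in accept)   # Pr[A(R) = 1]
    pr_s = sum(q.get(x, 0) for x in accept)   # Pr[A(S) = 1]
    return pr_r - pr_s

r = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
s = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
print(best_distinguisher_advantage(r, s))  # 1/4
print(total_variation_distance(r, s))      # 1/4
```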

But what if the adversary knows $H$? Then the conclusion goes out the window! In the paper you cited, the matrix $M$ is a pregenerated constant. So either,

- by Kerckhoffs' principle, you should assume that the adversary knows $M$, i.e. $M$ is public, so it cannot satisfy the conditions of the leftover hash lemma and it is meaningless to use with a randomness extractor; or
- you already have a sufficiently large uniform random secret, namely $M$, that you can use with standard cryptography like ChaCha or SHA-3 to expand into as many keys as you need; or
- you don't believe in cryptography, but then you probably wouldn't be hanging around here.

In other words, as this paper applies randomness extractors, either it fails to satisfy the premises of the theorem (that $H$ is secret), or it does satisfy the premises, in which case the conclusion is unnecessary for cryptography.

On to the specific questions you asked:

A: What if we choose $H$ to be a fixed function, yet instead of choosing a fixed matrix we choose a fixed ‘cryptographic hash function’ like SHAKE128? It is hard to say, because unlike universal hash families, cryptographic hash functions are designed not to have any interesting properties like linearity of matrix multiplication.

Suppose we model it as a uniform random function from $l$-bit strings to $k$-bit strings. For every distinct input $x$, the output $H(x)$ is an independent uniform random $k$-bit string—every possible output is equiprobable. This model is much stronger than a randomness extractor in some sense even though the adversary has access to it as a public oracle, and this model, like all models, is wrong, but some models are useful.

Suppose $X$ has $s \cdot l$ bits of min-entropy. What is the expected min-entropy of $H(X)$, over uniform random choice of $H$? To make a conservative estimate of this, let's say $X$ is really just a uniform random $\lambda = \lfloor s \cdot l \rfloor$-bit string. For any fixed function $h$, the min-entropy of $h(X)$ is $-\max_y \log_2 \Pr[h(X) = y]$. Note that $\Pr[h(X) = y] \leq (C(h) + 1)/2^k$, where $C(h)$ is the number of possible $k$-bit strings that $h$ does not reach, since in the worst case of min-entropy, there is a single output that is reached $C(h) + 1$ times instead. Consequently, $$-\max_y \log_2 \Pr[h(X) = y] \geq -\log_2 \bigl((C(h) + 1)/2^k\bigr) = k - \log_2 (C(h) + 1).$$ Since $\log_2$ is concave, Jensen's inequality gives $E[\log_2 (C(H) + 1)] \leq \log_2 (E[C(H)] + 1)$. What's $E[C(H)]$? First, the probability that we do not reach a particular output $y$ is

\begin{align*} \Pr[\lnot\exists x. H(x) = y] &= \Pr[\forall x. H(x) \ne y] \\ &= \prod_x \Pr[H(x) \ne y] \\ &= \prod_x (1 - \Pr[H(x) = y]) \\ &= \prod_x (1 - 1/2^k) \\ &= (1 - 1/2^k)^{2^\lambda}. \end{align*}

By linearity of expectations, this is also the expected fraction of unreachable outputs, so the expected number of unreachable outputs is $E[C(H)] = 2^k (1 - 1/2^k)^{2^\lambda}$. When $\lambda \leq k$ this puts an unsatisfying bound on the min-entropy. That happens because, to set a hard bound, we made an extremely conservative estimate: that every input involved in any collision lands on one single output, which must feel very abused right now.
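The formula for $E[C(H)]$ can be confirmed by enumerating every function for very small $k$ and $\lambda$. This is a sketch with hypothetical helper names; a function is represented as its table of $2^\lambda$ outputs.

```python
from fractions import Fraction
from itertools import product

def expected_unreached_formula(k, lam):
    """E[C(H)] = 2^k (1 - 1/2^k)^(2^lam), as an exact fraction."""
    return 2**k * (1 - Fraction(1, 2**k)) ** (2**lam)

def expected_unreached_enumerated(k, lam):
    """Average number of unreached outputs over all functions, by brute force."""
    outputs = range(2**k)
    total = count = 0
    for table in product(outputs, repeat=2**lam):  # one function per table
        total += 2**k - len(set(table))            # outputs this function misses
        count += 1
    return Fraction(total, count)

print(expected_unreached_formula(2, 2))     # 81/64
print(expected_unreached_enumerated(2, 2))  # 81/64
```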

But $E[C(H)]$ rapidly goes to zero as $\lambda$ increases. Specifically, for $k \geq 1$, we have $1 - 1/2^k \leq e^{-1/2^k}$, so $$E[C(H)] \leq 2^k e^{-2^\lambda/2^k} = 2^k e^{-2^{\lambda - k}} \leq 2^{k - 2^{\lambda - k}}.$$ Consequently, as long as $\lambda \geq k + \log_2 k$, $E[C(H)] \leq 1$, so the min-entropy is at least $k - \log_2 (E[C(H)] + 1) \geq k - 1$.
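To get a feel for how fast this decays, the exact expectation and the bound $2^{k - 2^{\lambda - k}}$ can be compared in log space, which avoids floating-point underflow. A sketch with illustrative parameters of my own choosing ($k = 8$, so that $k + \log_2 k = 11$):

```python
import math

def log2_expected_unreached(k, lam):
    """log2 of E[C(H)] = 2^k (1 - 2^-k)^(2^lam), computed in log space."""
    return k + (2**lam) * math.log1p(-(2.0 ** -k)) / math.log(2)

def log2_upper_bound(k, lam):
    """log2 of the bound 2^(k - 2^(lam - k))."""
    return k - 2.0 ** (lam - k)

for k, lam in [(8, 8), (8, 11), (8, 16)]:
    # Each row: k, lambda, log2 E[C(H)], log2 of the bound.
    print(k, lam, log2_expected_unreached(k, lam), log2_upper_bound(k, lam))
```

At $\lambda = 11$ the bound is exactly $2^0 = 1$, and the true expectation is already well below it.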

For a distribution on $k$-bit strings with min-entropy $k - \delta$, where $\delta = \log_2 (E[C(H)] + 1)$, can we set an upper bound on the total variation distance from uniform? Suppose for simplicity that $N = 2^{k - \delta}$ is an integer; the greatest TVD is attained by assigning probability $1/N$ to $N$ of the $k$-bit strings, and probability $0$ to the remaining $2^k - N$, so that the TVD is bounded by

\begin{multline} \varepsilon = \frac 1 2 N \biggl|\frac{1}N - \frac{1}{2^k}\biggr| + \frac 1 2 (2^k - N) \biggl|0 - \frac{1}{2^k}\biggr| \\ = \frac 1 2 N \frac{2^k - N}{N 2^k} + \frac 1 2 \cdot \frac{2^k - N}{2^k} = \frac 1 2 \cdot \frac{2^k - N}{2^k} + \frac 1 2 \cdot \frac{2^k - N}{2^k} \\ = \frac{2^k - N}{2^k} = \frac{2^k - 2^{k - \delta}}{2^k} = 1 - 2^{-\delta}. \end{multline}

For $\delta = 1$, the TVD is bounded by $\varepsilon = 1 - 2^{-1} = 1/2$, but the bound $\varepsilon = 1 - 2^{-\delta}$ rapidly approaches zero as $\delta \to 0$.
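The closed form $1 - 2^{-\delta}$ can be checked directly against the definition of TVD for the flat worst-case distribution. The sketch below only covers integer $\delta$, so that $N = 2^{k - \delta}$ is an integer as assumed above; the function name is hypothetical.

```python
from fractions import Fraction

def flat_tvd_from_uniform(k, n_support):
    """TVD from uniform on k bits of a distribution flat on n_support strings."""
    size = 2**k
    on_support = n_support * abs(Fraction(1, n_support) - Fraction(1, size))
    off_support = (size - n_support) * Fraction(1, size)
    return (on_support + off_support) / 2

k = 8
for delta in (0, 1, 2, 3):
    n = 2 ** (k - delta)
    # Each row: delta, TVD from the definition, closed form 1 - 2^-delta.
    print(delta, flat_tvd_from_uniform(k, n), 1 - Fraction(1, 2**delta))
```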

A more practical way to put it is: The security conjecture of, e.g., SHAKE128 is that there is no better attack at guessing $X$ given $\operatorname{SHAKE128-}\!k(X)$ than a generic search through all possible values of $X$, whose expected cost is at least $2^{H_\infty[X]}/2$ trials, or $2^k/2$, or $2^{256}/2$, whichever is smaller.
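In practice, conditioning raw samples through SHAKE128 takes only a few lines with Python's standard `hashlib`. The sketch below is illustrative: the input bytes are a placeholder, `shake128_extract` and `log2_generic_search_cost` are hypothetical names, and the cost function simply evaluates the formula stated above rather than measuring anything.

```python
import hashlib

def shake128_extract(raw_samples: bytes, k_bits: int) -> bytes:
    """Condense raw samples into k_bits of output with SHAKE128."""
    if k_bits % 8:
        raise ValueError("k_bits must be a multiple of 8 in this sketch")
    return hashlib.shake_128(raw_samples).digest(k_bits // 8)

def log2_generic_search_cost(min_entropy_bits, k_bits):
    """log2 of the expected guessing cost min(2^H, 2^k, 2^256) / 2."""
    return min(min_entropy_bits, k_bits, 256) - 1

seed = shake128_extract(b"placeholder for raw sensor samples", 256)
print(seed.hex())
print(log2_generic_search_cost(min_entropy_bits=320, k_bits=256))  # 255
```

The output is only as unpredictable as the input: hashing does not create entropy, it only concentrates whatever min-entropy the raw samples already had.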

(Of course, if there are $n$ targets $X_1, X_2, \dots, X_n$, the cost to find at least one of them given $\operatorname{SHAKE128-}\!k(X_1),$ $\operatorname{SHAKE128-}\!k(X_2),$ $\dots,$ $\operatorname{SHAKE128-}\!k(X_n)$ is $2^{\min\{256, H_\infty[X_1], \dots\}}/(2n)$ instead, i.e. there is a standard factor of $n$ cost reduction for an $n$-way multi-target attack. This is why it is a good idea either to use 256-bit seeds $X_i$, or to use a globally unique per-seed salt.)

B: What if we choose $H$ to be a Pearson hash, i.e. CBC mode of a uniform random permutation? By the standard CBC theorem, an algorithm limited to $q$ queries on inputs of up to $m$ $b$-bit blocks can't attain better advantage than $mq(mq - 1)/2^{b+1}$ at distinguishing $H$ from a uniform random function, and any algorithm to distinguish the output from uniform random bit strings would serve as an algorithm to distinguish the hash from a uniform random function, so the ‘bias’ (here meaning distinguishing advantage) of the resulting distribution is bounded by $mq(mq - 1)/2^{b + 1}$.
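For concreteness, here is a minimal sketch of an 8-bit Pearson hash together with the bound just quoted. The table is shuffled with Python's `random.Random` purely so the example is reproducible; that generator is not cryptographic, and the helper names are made up for illustration.

```python
import random
from fractions import Fraction

def make_permutation(rng):
    """A shuffled table of the 256 byte values."""
    table = list(range(256))
    rng.shuffle(table)
    return table

def pearson_hash(table, message: bytes) -> int:
    """8-bit Pearson hash: h = T[h XOR c] for each message byte c."""
    h = 0
    for c in message:
        h = table[h ^ c]
    return h

def cbc_advantage_bound(m, q, b):
    """The bound mq(mq - 1) / 2^(b+1) quoted above."""
    return Fraction(m * q * (m * q - 1), 2 ** (b + 1))

# random.Random is NOT a cryptographic generator; seed 1 is an arbitrary choice.
table = make_permutation(random.Random(1))
print(pearson_hash(table, b"hello"))
print(cbc_advantage_bound(m=4, q=4, b=8))  # 15/32
```

With $b = 8$ the bound exceeds 1 once $mq$ reaches 24, so for an 8-bit Pearson hash it stops saying anything after a couple of dozen blocks.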

Of course, you need at least about $2^b b - 2^b \log_2 e$ bits to store such a permutation, which rapidly becomes impractical as $b$ exceeds 8, the typical size for what gets called a Pearson hash. You could accommodate a much larger block $b$ by choosing the permutation to be $\operatorname{AES}_k$ for a uniform random key $k$. But that's only a minor setback. Much worse: The whole premise is nonsensical, because however you do this, you need a secret permutation to begin with, which is back to begging the question with a randomness extractor: If you already had a long enough secret, you wouldn't need a randomness extractor or a Pearson hash; if you don't have a long enough secret, you can't use a randomness extractor or a Pearson hash and get anything meaningful out of the use.
