You're not going to like this answer, but strictly speaking, you can't.
In a number of specific cases you can do what you want. For example, with the Debian PRNG debacle you could detect the failure (and in fact, bad keys generated by that broken PRNG were tracked down). But that's because the RNG was barely seeded at all and ended up with only a small number of possible states. (I say RNG rather than CSPRNG because cryptographically insecure pseudo-random number generators aren't under consideration here.)
But if the output stage of your PRNG comes from a suitable pseudo-random function, it will pass all your tests. AES in counter mode, an iterated hash function, and so on will all be just fine -- their outputs are the outputs of pseudo-random functions and therefore pass all your pseudo-random tests. Such a generator is cryptographically secure from a mathematical standpoint. If it is insecure, it will be insecure because of an operational failure; the Debian problem was operational, not mathematical.
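To make that concrete, here is a minimal sketch in Python (standard library only; `prng_stream` is a name made up for illustration) of a hash-based counter-mode output stage. Because each output block is a pseudo-random function of the seed, the stream sails through statistical tests even when the seed is laughably weak:

```python
import hashlib

def prng_stream(seed: bytes, n_blocks: int) -> bytes:
    """SHA-256 in counter mode: block i is SHA256(seed || i).

    The stream is a pseudo-random function of the seed, so it passes
    statistical tests even when the seed has almost no entropy.
    """
    out = bytearray()
    for i in range(n_blocks):
        out += hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
    return bytes(out)

# A one-byte "seed" still produces output that looks perfectly random:
print(prng_stream(b"\x07", 4).hex())
```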
You are only going to find practical failures of a PRNG by white-box testing. You will need a reasonably smart crypto person to code-review it. You will have to look inside it to see whether it is constructed correctly.
Consider this badly-built PRNG: imagine a chip whose PRNG is seeded with a completely random 64-bit number plus the chip's serial number, after which a high-quality output stage (Yarrow, an SP 800-90 DRBG, etc.) generates random bits from that seed from then on.
In other words, for the entire family of chips there is a single 64-bit chunk of entropy E, and chip n is seeded with E+n, for n = 1, 2, ..., N across the N chips we manufacture. Call chip n's seed E_n.
We'll presume that all re-seeding is broken and has no effect. You very likely won't be able to tell this from testing the output alone.
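Here is a sketch of that hypothetical chip family in Python, with SHA-256 in counter mode standing in for the high-quality output stage (the chip above would use Yarrow or an SP 800-90 DRBG; `E` and `chip_output` are invented names for illustration):

```python
import hashlib
import secrets

E = secrets.randbits(64)  # the single real 64-bit secret for the whole family

def chip_output(serial: int, n_blocks: int) -> bytes:
    """Chip number `serial`: its seed is E + serial (mod 2^64), and a
    hash-based output stage runs from that seed forever, since we
    presume re-seeding is broken and never takes effect."""
    seed = ((E + serial) % 2**64).to_bytes(8, "big")
    return b"".join(hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
                    for i in range(n_blocks))

# Each chip's stream looks statistically clean on its own, yet
# recovering the one 64-bit value E breaks every chip ever made.
print(chip_output(1, 2).hex())
print(chip_output(2, 2).hex())
```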
You might in some cases -- for example, let's suppose that the output is a naïve iterated hash. You might be able to detect that the substring R[i..i+31] is equal to SHA256(R[i-32..i-1]). But you'd be lucky to detect that, because you'd have to explicitly check for it.
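As a sketch, that explicit check might look like the following, specialized here to the naïve construction where each 32-byte output block is the SHA-256 of the previous one (a real test would also slide across unaligned offsets; `looks_like_iterated_sha256` is a made-up name):

```python
import hashlib

def looks_like_iterated_sha256(stream: bytes) -> bool:
    """Flag the naive construction R_{k+1} = SHA256(R_k): split the
    stream into 32-byte blocks and test whether each block hashes to
    the next one."""
    blocks = [stream[i:i + 32] for i in range(0, len(stream) - 31, 32)]
    return len(blocks) > 1 and all(
        hashlib.sha256(a).digest() == b for a, b in zip(blocks, blocks[1:]))

# Build a naive iterated-hash stream and confirm the check fires:
r = hashlib.sha256(b"seed").digest()
stream = b""
for _ in range(8):
    stream += r
    r = hashlib.sha256(r).digest()
print(looks_like_iterated_sha256(stream))         # True
print(looks_like_iterated_sha256(b"\x00" * 256))  # False
```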
Similarly, testing the output can tell you whether the output function fails to behave like a PRF, but not much more than that, and these days the errors aren't going to be in the output function. Even dummies get that part right; look at the Debian debacle again. The problem was seeding, not pseudo-randomness.
You wouldn't find it if the bogus chip hashed in E_n along with each iterated hash, or if there was an HMAC. You wouldn't find it if a simple counter were hashed in, either. And you almost certainly wouldn't detect anything amiss if the PRNG used a cipher in counter mode with E_n as the key. (There are plenty of edge cases where you would -- for example, if you used DES in counter mode and happened to pick a weak key, you might detect that. Bear with me for the general case here.)
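For contrast, a sketch of why hashing in a counter defeats that check: once something like a counter (or E_n, or an HMAC key) goes into each hash, no output block is the hash of the block before it, so the structural test above comes up empty:

```python
import hashlib

def counter_mode_stream(seed: bytes, n_blocks: int) -> bytes:
    """Block i is SHA256(seed || i): no block is the hash of the
    previous block, so the iterated-hash structure is gone."""
    return b"".join(hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
                    for i in range(n_blocks))

stream = counter_mode_stream(b"weak seed", 8)
blocks = [stream[i:i + 32] for i in range(0, len(stream), 32)]
# The structural check from the previous sketch finds nothing:
print(any(hashlib.sha256(a).digest() == b
          for a, b in zip(blocks, blocks[1:])))  # False
```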
This completely bogus random number chip has only 64 bits of good randomness across the entire production run, and yet no black-box testing is going to detect that. You're only going to find the problem by auditing its design and implementation, not by testing its output.
I told you you weren't going to like my answer, but there it is. Black-box testing will only detect a small class of stupid errors, and none of the evil ones.
Jon
Is /dev/random itself a black-box random number generator that is supposed to be suitable for long-lived key generation, without any extra library? Is the library supposed to compensate for some weakness of some implementations of /dev/random, which one, and using what principle? Or is the objective of the library simply to limit the entropy gathered from /dev/random, for performance reasons? – fgrieu Mar 21 '12 at 08:08

/dev/random isn't cryptographically random (i.e. not indistinguishable from a true RNG). – recluze Mar 27 '12 at 03:22