14

There are a lot of quite elaborate PRNGs out there (e.g. Mersenne Twister et al.), and they have some important properties, especially when it comes to crypto applications.

So, I was wondering how hash functions like SHA1 or MD5 would perform in such a scenario / compare to actual PRNGs. For instance, one could use a1 = hash(seed) to generate the first batch of random bits and then a2 = hash(seed + a1) for the next series and so forth.
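
A minimal Python sketch of the construction described above, for concreteness. It assumes SHA-256 from the standard `hashlib` module in place of SHA1/MD5 and reads `seed + a1` as byte concatenation; the function name `chained_hash_stream` is purely illustrative.

```python
import hashlib

def chained_hash_stream(seed: bytes, blocks: int) -> bytes:
    """Return `blocks` 32-byte outputs: a1 = H(seed), a2 = H(seed || a1), ..."""
    out = []
    a = hashlib.sha256(seed).digest()          # a1 = hash(seed)
    for _ in range(blocks):
        out.append(a)
        a = hashlib.sha256(seed + a).digest()  # a_{n+1} = hash(seed + a_n)
    return b"".join(out)
```

The same seed always yields the same stream, so the output is only as unpredictable as the seed itself.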

e-sushi
bitmask

4 Answers

11

Mersenne Twister (as an example) is a fast random number generator, and has good properties (long period, good distribution) for most applications of random numbers (like statistics, simulations, modelling).

For cryptographic applications, we need more: we need a cryptographically secure random number generator. Such a function (implemented by an algorithm) has the property that an attacker who does not know the seed has no means of distinguishing the output from a truly random bit sequence.

Such CSPRNGs (cryptographically secure random number generators) can be built from several other cryptographic primitives:

  • cryptographic hash functions
  • block ciphers
  • stream ciphers

All these primitives effectively need the properties of a "normal" PRNG, thus they are usable as such.

On the other hand, the algorithms implementing such primitives are normally a lot slower than "normal" PRNGs, so don't use iterated SHA-1 when Mersenne Twister is enough for your needs.

Paŭlo Ebermann
  • Actually, I don't think they're that much slower. I think claims about "a lot slower", in this context, need to be quantified with hard numbers. A well-designed cryptographic PRNG can be mighty fast. – D.W. Aug 04 '11 at 06:01
  • You are right ... I should make some tests. For first numbers, have a look at Thomas' answer. – Paŭlo Ebermann Aug 04 '11 at 14:37
  • I'd say that Thomas's numbers appear to cast doubt on your claim. He says non-cryptographic PRNGs run at > 1 GB/s, whereas a cryptographic PRNG runs at either about 2 GB/s (if you have a modern x86 processor) or about 0.75 GB/s (if you don't). Either way, it doesn't sound like the cryptographic PRNGs are "a lot slower". – D.W. Aug 04 '11 at 20:35
  • They certainly don't impose the memory overhead of a Mersenne Twister. – Marsh Ray Aug 06 '11 at 08:00
11

It is possible to design a PRNG upon a hash function, but it requires some care, notably because existing hash functions are not random oracles (being collision-resistant and preimage-resistant is not all that can be dreamed of for a hash function).

NIST Special Publication SP 800-90 describes some PRNG designs which are "Approved" (in the bureaucratic sense) for cryptographic purposes. Hash_DRBG and HMAC_DRBG are based upon a hash function (within HMAC for the latter). If you use a hash function with an output of n bits, then Hash_DRBG will require, asymptotically, one hash function invocation (over a small input) per n bits of produced alea (for HMAC_DRBG, this will be two such invocations). This means that on a basic PC, using SHA-256, you will be able to produce, say, about 60 to 70 megabytes of alea per second, using a single core; I am talking about an Intel x86 Core2, with no fancy programming -- one should be able to double that bandwidth with SSE2 instructions (for HMAC_DRBG, divide performance by 2). Depending on your application, this speed can be total overkill, grossly inefficient, or anything in between.

Hash functions are (usually) very good at processing much input data, for which they yield a small output. This is exactly the opposite of what we want for a fast PRNG, which is why performance of a hash-based PRNG may be somewhat low. Some hash functions are designed on a "reversible" core which can accept input data and produce output very efficiently; these are designs which can be used as hash functions or stream ciphers. PANAMA is such a function (very fast, even faster than MD4 as a hash function; unfortunately, it turned out to be very broken too). A more recent reversible design is Skein, a candidate for SHA-3; other hash function designs are amenable to conversion to a stream cipher (e.g. all so-called Sponge functions). Caution should be exercised: hash functions and stream ciphers are not analyzed with the same techniques or goals; that a reversible function looks secure as a hash function does not mean that the corresponding stream cipher is secure, or vice versa. In particular, the SHA-3 process tells very little about use of Skein as anything else than a hash function.

For faster cryptographically secure PRNG, look up stream ciphers, in particular those selected by the eSTREAM project. A good, secure stream cipher should be able to output, say, 750 MB/s worth of alea on a basic PC (that's what I do on my 2.4 GHz Core2, there again with a single core, using SOSEMANUK).

Non-cryptographic RNG can be devilishly fast (more than 1 GB/s), albeit they do so by having detectable biases which may or may not be an issue for any given application. A sure sign of a PRNG not being cryptographically secure is any assertion about how large the "period" is. For cryptography, the period is mostly irrelevant (anything beyond 2^128 is good enough); a long period says almost nothing about security.

On a recent enough x86 processor, forget all of the above: the AES-NI instructions should be used to implement an AES-based PRNG (like CTR_DRBG in NIST SP 800-90) which will provide excellent alea (fit for any purpose, including cryptography) at 2 GB/s or so.

Thomas Pornin
  • The Skein paper itself provides support for variable output size. That's pretty close to a PRNG. It doesn't contain any security proof for using it as a PRNG though (to my knowledge). With even better newer Intel x86 processors there is a seeded PRNG built around the RdRand instruction. – Maarten Bodewes May 18 '15 at 12:34
  • What is alea, is it area not sure. – kelalaka Dec 22 '19 at 11:57
5

In general, cryptographic hash functions make great building blocks for secure PRNGs. In fact, Skein is a third-round contender remaining in the SHA-3 contest. It documents its use as a CSPRNG as a side-effect of its operation.

But as others have mentioned they are slower than a non-CS PRNG function like MT that only needs to appear statistically random.

But watch out, the construction you gave "a1 = hash(seed), a2 = hash(seed + a1), ..." suffers from a problem. If you were to use, say, SHA-1 for the hash function it has an output size of 160 bits. Due to the "birthday bound" on collision resistance, you would expect to see your function enter a cycle after only about 2^80 blocks output.
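
The birthday effect is easy to observe at a small scale. The sketch below is only an illustration under stated assumptions: it uses SHA-256 truncated to `bits` bits as a stand-in for a small hash function, and the name `steps_until_repeat` is made up here. It counts how many distinct outputs the chained construction produces before one of them repeats, after which the sequence cycles.

```python
import hashlib

def steps_until_repeat(seed: bytes, bits: int = 16) -> int:
    """Count outputs of a_{n+1} = H(seed || a_n), truncated to `bits` bits,
    that are produced before the first repeated value (bits must be a multiple of 8)."""
    nbytes = bits // 8
    seen = set()
    a = hashlib.sha256(seed).digest()[:nbytes]
    while a not in seen:
        seen.add(a)
        a = hashlib.sha256(seed + a).digest()[:nbytes]
    return len(seen)
```

For a 16-bit output one would expect counts on the order of 2^8 rather than 2^16, which is the same square-root behaviour that gives roughly 2^80 blocks for a 160-bit hash.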

A better design is to use a "CTR mode" in which you hash the concatenation of the seed and an incrementing block counter. There are still some traps: for example, you need to make sure the seed and the counter are unambiguously delimited. I believe NIST has a standard for this scheme, or a very similar construction using block ciphers.

The variable-length input property of hash functions allows the counter to run forever, so you get an effectively unlimited output period.

But wait, that's not all! As a bonus, you also get O(1) access to any position in the stream!
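
A hedged sketch of such a counter-mode generator follows. It is not the NIST construction; it assumes SHA-256, a 4-byte length prefix on the seed and a fixed 16-byte counter as one possible way to keep seed and counter delimited, and the names `ctr_block` and `ctr_stream` are illustrative. The fixed-width counter caps the stream at 2^128 blocks instead of letting the counter grow without bound.

```python
import hashlib

def ctr_block(seed: bytes, index: int) -> bytes:
    """Block number `index` of the stream: H(len(seed) || seed || counter)."""
    # Length-prefixing the seed and fixing the counter width means two
    # different (seed, counter) pairs can never encode to the same message.
    msg = len(seed).to_bytes(4, "big") + seed + index.to_bytes(16, "big")
    return hashlib.sha256(msg).digest()

def ctr_stream(seed: bytes, start: int, blocks: int) -> bytes:
    """Concatenate `blocks` consecutive blocks beginning at block `start`."""
    return b"".join(ctr_block(seed, i) for i in range(start, start + blocks))
```

Because each block depends only on the seed and its index, `ctr_block(seed, 10**9)` costs one hash call regardless of how far into the stream it lies.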

Marsh Ray
  • It is not really O(1) if you have unlimited-length input. It is more O(length(n)) = O(log(n)). – Paŭlo Ebermann Aug 02 '11 at 23:35
  • Well the 'input' to such a CSPRNG, the seed value, need not exceed a reasonable fixed amount. It could certainly be less than the 447 input bits consumed by a single block SHA-(n < 512) operation. Unless you want to include some ongoing reseeding policy in the length, but that's a whole 'nother discussion. :-) – Marsh Ray Aug 03 '11 at 06:10
  • This was directed at your counter mode ... if you use hash(seed || counter) and want to increment the counter unlimited (to have an unlimited output period), the counter must grow in length eventually. Of course, the block size of SHA-2 is large enough so this is not really a problem in practice - this is more a theoretical remark. – Paŭlo Ebermann Aug 03 '11 at 09:44
  • Yeah, if you don't pay the incremental costs before the entire universe burns out, I consider it O(1). Guess I'm a coder, not a mathematician. :-) – Marsh Ray Aug 04 '11 at 03:17
  • By this argument every algorithm is O(1) if only used for inputs which fit into the universe :-p – Paŭlo Ebermann Aug 04 '11 at 14:41
  • @Marsh, The hash-based construction you mention is OK; I don't think your criticisms are valid. A cycle length of 2^80 blocks is not a security problem; that won't loop within the lifetime of our civilization. (Run the numbers and see for yourself.) And I think your claims about cycling within 2^64 blocks are inaccurate and unfounded. The known collision attacks on SHA1 are not relevant to the cycle length of this construction. – D.W. Aug 04 '11 at 20:39
  • @Paŭlo I don't get your 'by this argument...'. A string comparison algorithm can be easily demonstrated to be O(N) with just a few small observations, whereas a 512-bit counter will simply not ever roll over to needing a second block. Only in practice, not theory, of course. :-) – Marsh Ray Aug 06 '11 at 05:02
  • @D.W. Well I didn't mean to say that 2^80 was an imminent security problem - but there are certainly many users who wouldn't accept it in a new design either, e.g., NIST. Also -- I didn't believe that paper either. So I tested it with exhaustive and Monte Carlo analysis on much smaller (and likely much more ideal) random functions. My results suggested that, if anything, the effect was understated a bit. Here's some links to the ensuing discussion (including some source): http://bit.ly/oOe8tX http://bit.ly/rn3my3 My feeling about 2^64 wasn't based on the analytic collisions...just 160/2.5. – Marsh Ray Aug 06 '11 at 05:27
  • @Marsh, I disagree. On the 2^80, I think a PRNG that is secure for 2^80 blocks of output is more than adequate for most purposes; even AES-CTR doesn't achieve that level of security. On the 2^64, I don't think you understood my comment. It's not a question of believing or not believing that paper; even if the paper's claims are accurate, the paper is irrelevant in this context. That paper talks about collision attacks involving chosen messages (very long messages), which is a different concern. Nothing in that paper implies that your construction will loop earlier than 2^80 blocks. – D.W. Aug 07 '11 at 04:39
  • OK, I'll buy that about the cycle length. – Marsh Ray Aug 07 '11 at 08:36
  • You should buy it about the 2^80, too. There's a huge difference between a 80-bit key and a 2^80 limit on the number of blocks of output. When you talk about NIST not accepting it, I think you've gotten the two confused. NIST says: don't use a 80-bit key, because that's susceptible to brute-force attacks that take 2^80 work. But they gladly accepted AES, even though AES-CTR is not safe for outputting 2^80 blocks of output. – D.W. Aug 09 '11 at 02:16
  • (cont.) There's a huge difference between an attack that requires 2^80 steps of computation, vs an attack that requires 2^80 blocks of output: the latter is much less feasible than the former. The good guys control how much known text an attacker gets; the bad guys control how much computing power they put in. Therefore, we should try hard to resist attacks that take up to 2^80 steps of computation -- but attacks that require 2^64 or 2^80 blocks of output are generally pretty harmless, because no application is likely to ever produce anywhere near that much output under a single key. – D.W. Aug 09 '11 at 02:18
  • (cont.) For example, consider a system that encrypts every packet transmitted. If it sends packets at an average rate of 128 Gbps (a ridiculously high rate), it'll take 35 million years before the system has produced 2^80 blocks of output. I doubt any system we ever see is going to continue using the same key for 35 million years, so an attack that only kicks in once the system has encrypted 2^80 blocks under the same key does not seem like a very realistic threat to system security. – D.W. Aug 09 '11 at 02:24
  • Agreed, 2^80 output blocks under the same key is not worth losing sleep over. But why would you even do it that way when the CTR-mode PRNG is no more expensive? On NIST, I read them saying (in their 2007 request for SHA-3 candidate submissions): "The 160-bit hash value produced by SHA–1 is becoming too small to use for digital signatures, therefore, a 160-bit replacement hash algorithm is not contemplated." This suggests to me that they would frown upon "only" 2^80 collision resistance in a new design. They even commissioned a function with a 512-bit output size too. – Marsh Ray Aug 09 '11 at 04:46
  • @Marsh, you are still getting confused about the difference between a hash function and a PRNG. Your thinking seems muddled to me. Your NIST quotes don't prove what you think they prove. You keep bringing up collision-resistance, but as I've mentioned before, collision-resistance is not super relevant here; a PRNG does not need to be collision-resistant (the notion isn't even defined for PRNGs). – D.W. Aug 10 '11 at 23:04
  • Sorry for the confusion, let me try again: For a given seed, the PRNG we're discussing recursively evaluates a hash function on its own output. When this recursion count approaches the Collision Resistance of the function, it risks colliding with one of its previous outputs. So our expectation for such a PRNG is that its cycle length will have an upper bound determined by the HF's CR. Looking at NIST SP 800-90, Hash_DRBG uses both counters and recursive HF application, and still places very conservative limits on the allowed output (< 2^48 reqs even for SHA-512) – Marsh Ray Aug 11 '11 at 21:29
  • So I was right that NIST wouldn't like "a1 = hash(seed), a2 = hash(seed + a1), ...", their approved method is somewhat more complex and does involve counters. I was wrong thinking that NIST wouldn't like a recursive SHA-1 PRNG at 160*2^80 bits output, AFAICT they don't allow anything to output more than 2^67 bits! – Marsh Ray Aug 11 '11 at 21:45
  • "But wait, that's not all! As a bonus, you also get O(1) access to any position in the stream!" That's great if you want to regenerate the pseudo random values, but generally it is not something you require from a SPRNG. You just want random data - random data at a certain position makes no sense, especially if you want to reseed. Talking about reseed: CTR in itself doesn't support that notion, so you need to define a protocol to allow for reseeding. CTR on itself is not a good SPRNG in that sense. – Maarten Bodewes May 18 '15 at 12:40
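
The 35-million-year estimate quoted in the comments above can be reproduced with a few lines of arithmetic. The figures below are assumptions chosen to match that comment, not values stated in it: 128-bit blocks (as in AES) and "128 Gbps" read as 128 * 2^30 bits per second. The function name `years_to_emit` is illustrative.

```python
BLOCK_BITS = 128                       # assumed block size (AES-sized blocks)
RATE_BITS_PER_S = 128 * 2**30          # assumed reading of "128 Gbps"
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_emit(blocks: int) -> float:
    """Years needed to output `blocks` blocks at the assumed line rate."""
    return blocks * BLOCK_BITS / RATE_BITS_PER_S / SECONDS_PER_YEAR
```

Under these assumptions `years_to_emit(2**80)` comes out at roughly 35.7 million years, while `years_to_emit(2**64)` is a few hundred years.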
2

Like other PRNGs, this scheme is repeatable (deterministic) and uniform.

This also has the (mostly desirable) property of having a very large period. This method only falls into a cycle after a hash collision.

However, this is NOT cryptographically secure if the seed could be easily guessed. (e.g. if it were based solely on the clock time or a static hardware identifier.)
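
To illustrate why a guessable seed defeats the scheme, here is a small sketch of a seed search. Everything in it is a made-up scenario: the seed is assumed to be a Unix timestamp in whole seconds encoded as 8 bytes, SHA-256 stands in for the hash, and `first_block` and `recover_clock_seed` are hypothetical names.

```python
import hashlib

def first_block(seed: bytes) -> bytes:
    """First output block of the scheme, a1 = H(seed)."""
    return hashlib.sha256(seed).digest()

def recover_clock_seed(observed: bytes, approx_time: int, window: int = 3600):
    """Try every second within +/- `window` of `approx_time`.
    Returns the timestamp whose first block matches `observed`, or None."""
    for t in range(approx_time - window, approx_time + window + 1):
        if first_block(t.to_bytes(8, "big")) == observed:
            return t
    return None
```

An attacker who knows the generator was seeded within an hour of some moment needs at most 7201 hash calls to recover the seed, and with it every past and future output.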

This seems like a pretty feasible route for a PRNG, if you are willing to sacrifice a ton of performance for the huge cycle length.

John Gietzen
  • As for CSPRNG, what you say is applicable to every PRNG, as you have to seed each, don't you? However, I didn't consider the performance point (I don't know why I overlooked this). – bitmask Aug 02 '11 at 22:16
  • Well, some PRNGs are unacceptable for cryptography, regardless of the seed. I made this distinction to show that it is not sufficient to simply increase the period in order to make it secure. – John Gietzen Aug 02 '11 at 23:10