Is there something wrong with using a hash function as a PRG?

Question

I need to create cryptographically secure pseudorandomness in JavaScript. However, when I searched for PRGs, everything I found looked very sketchy.

My idea is as follows (in pseudocode):

seed = "0x1a29fd..." // long number I always get passed (impossible to guess but used in a different context as well)
hashedAndSaltedSeed = sha256sum("seed: " + seed)
purpose = "..." // my current function's name (no spaces)
usageIndex = 1; // will increment this each time

randomness = myPrg(hashedAndSaltedSeed, purpose, usageIndex, 32) // numberOfBytes, at most 32

function myPrg(hashedAndSaltedSeed, purpose, usageIndex, numberOfBytes) {
    if(numberOfBytes > 32) {
        fail()
    }

    input = purpose + " " + usageIndex + " " + hashedAndSaltedSeed
    return sha256sum(input).binary2hex().slice(0, 2*numberOfBytes).hex2binary()
}

I only need a small amount of randomness (at most 32 bytes at a time, very few times), so the speed difference won't matter. But is there anything else wrong with this approach?

The output doesn't need to be perfectly uniformly distributed, but it must be infeasible to guess the resulting randomness when given only purpose and usageIndex, without seed or hashedAndSaltedSeed.

Edit: I'm sorry that I forgot to mention an important requirement. I'm sure it was in my question at some point as I wrote it but I seem to have deleted that part accidentally. I need to be able to reproduce the same randomness when given the same seed. That's why I can't just use something that gives me randomness but doesn't let me control the seed.

If seed is secret and drawn uniformly at random from $2^{256}$ possibilities, that's fine, but your naming is a little confusing: what is the salt in hashedAndSaltedSeed? This also, of course, raises the question of where seed comes from, but if your present goal is just to have a pseudorandom function family for which it is the caller's responsibility to choose a seed uniformly at random, then that's fine. — Squeamish Ossifrage, Aug 26 '19 at 21:16
@kelalaka That won't work because I need to be able to reproduce the same randomness. (Sorry for not having mentioned this requirement in my initial question.) — UTF-8, Aug 26 '19 at 23:40
@SqueamishOssifrage That's exactly how it's drawn. :) The reason I don't want to use it straight away, or even its hash, is that it's also used as a key in a different part of the project. That's why I want to add some string that indicates my use case prior to hashing, so that if I hand hashedAndSaltedSeed around, no one can do anything with it that it's not intended for. It's not technically a salt because it's constant. Maybe I should find a better name. — UTF-8, Aug 26 '19 at 23:48
I would recommend having the number of bytes requested as an input to the final output function — Richie Frame, Aug 27 '19 at 01:05
You should make sure that the inputs are encoded uniquely. As is, distinct inputs with spaces might get confused with one another and cause the same result. — Squeamish Ossifrage, Aug 27 '19 at 01:13
@SqueamishOssifrage I'm sorry, I don't understand what you mean. Do you mean when the seeding triplets are written down by a human who doesn't know where one of them ends and the next one begins? — UTF-8, Aug 27 '19 at 09:45
If I pass in purpose="hello world" and usageIndex="foobar", I'll get the same key as if I pass in purpose="hello" and usageIndex="world foobar" even though the inputs are distinct. — Squeamish Ossifrage, Aug 27 '19 at 12:29
@SqueamishOssifrage Ah, okay. But this can't work for two reasons: Function names cannot contain spaces; usage indices are integers. I chose a space as the delimiter in the example because spaces do not occur anywhere in the input triplet. — UTF-8, Aug 27 '19 at 13:47
I now changed the order of what goes into the hash function in case the hash function is vulnerable to a length extension attack. — UTF-8, Aug 27 '19 at 13:49
Maybe use HMAC-SHA256 or HKDF-SHA256, with test vectors to confirm you're implementing them correctly, so you can simplify the job of auditors who will be wondering about uniquely encoded inputs and length extensions? — Squeamish Ossifrage, Aug 28 '19 at 14:43
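As a sketch of the HKDF-SHA256 suggestion, assuming Node.js v15 or later and its built-in `crypto.hkdfSync`: the hypothetical helper `encodeInfo` prefixes every field with its 4-byte length, so purpose="hello world", usageIndex="foobar" can no longer collide with purpose="hello", usageIndex="world foobar". Following the earlier comment about the number of bytes, `numberOfBytes` is also folded into the info string. The constant salt label and the field layout are illustrative choices, not part of any standard.

```javascript
const { hkdfSync } = require("node:crypto");

// Hypothetical helper: encode each field as a 4-byte big-endian length
// followed by its UTF-8 bytes. Distinct field lists give distinct byte
// strings, so no delimiter characters need to be forbidden.
function encodeInfo(...fields) {
  const parts = [];
  for (const field of fields) {
    const bytes = Buffer.from(String(field), "utf8");
    const len = Buffer.alloc(4);
    len.writeUInt32BE(bytes.length);
    parts.push(len, bytes);
  }
  return Buffer.concat(parts);
}

// Derive numberOfBytes reproducible bytes from a secret seed.
// HKDF-SHA256 can output at most 255 * 32 = 8160 bytes per call.
function deriveBytes(seed, purpose, usageIndex, numberOfBytes) {
  const salt = Buffer.from("example-app prg v1", "utf8"); // hypothetical constant label
  const info = encodeInfo(purpose, usageIndex, numberOfBytes);
  return Buffer.from(hkdfSync("sha256", seed, salt, info, numberOfBytes));
}

// Stand-in 32-byte seed; in practice this is the secret 256-bit value.
const seed = Buffer.from("1a29fd" + "00".repeat(29), "hex");
const a = deriveBytes(seed, "myFunction", 1, 32);
const b = deriveBytes(seed, "myFunction", 1, 32);
console.log(a.equals(b)); // true: same seed and inputs give the same bytes
```

Because the encoding is unambiguous and HKDF is a standardized construction with published test vectors, reviewers do not have to reason about delimiters or length extension.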

score 6 · Accepted Answer · answered Aug 26 '19 at 20:45

This is fine, although you should avoid these conversions to and from hexadecimal, which are not needed and may introduce side channels.
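Following this advice, here is a minimal sketch of the question's myPrg that works on raw bytes throughout, assuming Node.js and its `node:crypto` module. It keeps the asker's space-delimited prefix (valid only under the stated assumptions that purpose has no spaces and usageIndex is an integer) and hashes the raw 32-byte hashedAndSaltedSeed rather than a text rendering of it; the seed value and purpose name are stand-ins.

```javascript
const { createHash } = require("node:crypto");

// SHA-256 of a string (UTF-8) or bytes, returned as a 32-byte Buffer.
function sha256(data) {
  return createHash("sha256").update(data).digest();
}

// The question's myPrg, returning raw bytes. Assumes (as the asker states)
// that purpose contains no spaces and usageIndex is an integer.
function myPrg(hashedAndSaltedSeed, purpose, usageIndex, numberOfBytes) {
  if (!Number.isInteger(numberOfBytes) || numberOfBytes < 0 || numberOfBytes > 32) {
    throw new RangeError("numberOfBytes must be an integer from 0 to 32");
  }
  // Hash purpose, index, and the raw seed-derived bytes (no hex encoding).
  const digest = createHash("sha256")
    .update(purpose + " " + usageIndex + " ")
    .update(hashedAndSaltedSeed)
    .digest();
  // Take the leading bytes directly instead of hex-encoding, slicing, and decoding.
  return digest.subarray(0, numberOfBytes);
}

const seed = "0x1a29fd"; // stand-in only; the real seed is a secret 256-bit value
const hashedAndSaltedSeed = sha256("seed: " + seed);
const r1 = myPrg(hashedAndSaltedSeed, "myFunction", 1, 16);
const r2 = myPrg(hashedAndSaltedSeed, "myFunction", 1, 16);
console.log(r1.equals(r2)); // true: same inputs reproduce the same bytes
```

Note that r1 is exactly the first 16 bytes of `myPrg(hashedAndSaltedSeed, "myFunction", 1, 32)`; folding numberOfBytes into the hashed input, as suggested in the comments on the question, would make outputs of different lengths unrelated.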

Now, since you are using JavaScript, you can simply use the WebCrypto API, specifically crypto.getRandomValues, which can fill a typed array of up to 65,536 bytes per call.
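A short sketch of that API, assuming either a browser (where it is `globalThis.crypto`) or Node.js v15 or later (where it is `require("node:crypto").webcrypto`). The hypothetical `randomBytes` helper fills larger buffers in chunks because each call is limited to 65,536 bytes. Note that getRandomValues cannot be seeded, so on its own it does not satisfy the reproducibility requirement from the question's edit.

```javascript
// Use the global WebCrypto object if present, else Node's webcrypto export.
const webcrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// getRandomValues fills a typed array in place but rejects requests larger
// than 65,536 bytes, so larger buffers are filled one chunk at a time.
function randomBytes(length) {
  const out = new Uint8Array(length);
  const MAX = 65536;
  for (let offset = 0; offset < length; offset += MAX) {
    webcrypto.getRandomValues(out.subarray(offset, Math.min(offset + MAX, length)));
  }
  return out;
}

const key = randomBytes(32);     // 32 fresh random bytes
const big = randomBytes(100000); // above the per-call limit: filled in two chunks
```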

Frank Denis

That's one of the libraries I came across when looking for one prior to asking. It doesn't let me control the seed, does it? (Sorry for not having mentioned this requirement in my initial question.) – UTF-8 Aug 26 '19 at 23:39
@UTF-8 what amount of control do you need? – Natanael Aug 27 '19 at 05:54
I need to be able to specify a seed. When I specify the same seed again, I need to get the same pseudorandomness out (all the calls will be exactly the same order and exactly the same ranges if different ranges / sizes are possible). – UTF-8 Aug 27 '19 at 09:39

qz- · Answer 2 · answered Nov 7 '21 at 17:22

In practice it is probably fine.

In theory (for anyone looking for the theoretical answer), hash functions only guarantee that collisions are infeasible to find for a PPT attacker, while a PRG guarantees that its output is indistinguishable from uniform randomness to a PPT attacker. The two are very different. Notably, a hash function whose first $n/2$ output bits are always $0$ can still be collision-resistant, but if used as a PRG it would be trivially distinguishable in $O(n)$ time. A PRG is also supposed to be length-expanding, whereas hash functions are often length-compressing.
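To make the counterexample concrete, assuming Node.js: the hypothetical `paddedHash(x)` below outputs 32 zero bytes followed by SHA-256(x), so $n = 512$. Any collision in it is a SHA-256 collision, so it is collision-resistant whenever SHA-256 is, yet a check of the first $n/2$ bits separates its outputs from uniform random bytes.

```javascript
const { createHash, randomBytes } = require("node:crypto");

// Hypothetical 512-bit hash whose first half is always zero.
// A collision for paddedHash is a collision for SHA-256.
function paddedHash(x) {
  return Buffer.concat([Buffer.alloc(32), createHash("sha256").update(x).digest()]);
}

// Distinguisher: answers "hash output" iff the first n/2 bits are all zero.
// A uniform 512-bit string passes with probability 2^-256.
function looksLikePaddedHash(bytes) {
  return bytes.subarray(0, 32).every((b) => b === 0);
}

console.log(looksLikePaddedHash(paddedHash("any input"))); // true
console.log(looksLikePaddedHash(randomBytes(64)));         // false, except with probability 2^-256
```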

One-way functions (OWFs), PRGs, and hash functions are all related. OWFs imply PRGs and vice versa. A good practical hash function will probably be a PRG, as @fgrieu mentions in the comments below.

The modern ideal for a hash is computational indistinguishability from a random oracle. SHA-3 is designed for (and believed to meet) that goal. SHA-2 was not designed for that goal but is believed to meet it, except for the length-extension property. — fgrieu, Nov 03 '21 at 07:37

score 1 · Answer 3 · answered Aug 28 '19 at 06:04

If instead of using a hash function, you used a block cipher, this would be precisely using CTR mode as a PRNG. This can definitely work, but I'd recommend you read this post to see some implementation details you'd want to be careful about (for the case of using a block cipher/AES).
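For comparison, a sketch of the block-cipher variant, assuming Node.js and its built-in `aes-256-ctr` cipher via `crypto.createCipheriv`: encrypting all-zero bytes in CTR mode yields the raw keystream, a deterministic function of the key and the initial counter block. Deriving the key from the seed and choosing a unique counter block per purpose/usageIndex are left out; the values below are stand-ins, and a (key, counter block) pair must never be reused for different outputs.

```javascript
const { createCipheriv } = require("node:crypto");

// Keystream generator: AES-256 in CTR mode applied to an all-zero plaintext.
// key: 32-byte secret; iv: 16-byte initial counter block, unique per key.
function aesCtrStream(key, iv, numberOfBytes) {
  const cipher = createCipheriv("aes-256-ctr", key, iv);
  return Buffer.concat([cipher.update(Buffer.alloc(numberOfBytes)), cipher.final()]);
}

const key = Buffer.alloc(32, 7); // stand-in key; derive it from the secret seed in practice
const iv = Buffer.alloc(16, 0);  // counter block starting at zero
const s1 = aesCtrStream(key, iv, 48);
const s2 = aesCtrStream(key, iv, 48);
console.log(s1.equals(s2)); // true: same key and counter give the same stream
```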

Mark Schultz-Wu

This seems like a better idea than using a hash function. Thank you! – UTF-8 Aug 28 '19 at 11:53

Actually you'll get better security with essentially no limits on the volume of data you can process if you use SHA-256, and less complexity of code to hash structured inputs like your purpose and usage index (as long as the inputs are uniquely encoded and prefix-free), and you won't invite the standard timing side channel attacks that AES invites. The performance may not be as good but that's probably not going to be your bottleneck here. – Squeamish Ossifrage Aug 28 '19 at 14:41
@SqueamishOssifrage I can see that a random oracle would be better than an ideal cipher. But real hash functions might only produce multiples of 7 without violating the criteria of collision-resistant hash functions, or they might have other characteristics. Furthermore, ciphers are guaranteed to use the entire value space. – UTF-8 Aug 28 '19 at 19:08
@UTF-8 That's why I said SHA-256 specifically, and not ‘a collision-resistant hash function’ generally. (See https://crypto.stackexchange.com/a/70709 for a more nuanced discussion.) Use of the entire value space is actually part of what weakens the security of, e.g., AES-CTR in contrast to ‘SHA256-CTR’ and why safe data volume limits are something you actually have to worry about for AES-CTR, unlike SHA256-CTR. – Squeamish Ossifrage Aug 28 '19 at 19:15
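A minimal sketch of the "SHA256-CTR" idea from these comments, assuming Node.js: output block i is SHA-256 over a fixed-length 32-byte seed, a length-prefixed label, and a fixed-width 8-byte counter, so every hash input is uniquely encoded and the stream can be as long as needed. The hypothetical `sha256Ctr` layout is for illustration only, not a standardized construction.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical SHA256-CTR stream:
//   block i = SHA-256(seed || len(label) || label || i)
// The seed has a fixed length, the label is length-prefixed, and the counter
// is a fixed 8 bytes, so distinct (label, i) pairs never share a hash input.
function sha256Ctr(seed, label, numberOfBytes) {
  if (seed.length !== 32) throw new RangeError("seed must be 32 bytes");
  const labelBytes = Buffer.from(label, "utf8");
  const labelLen = Buffer.alloc(4);
  labelLen.writeUInt32BE(labelBytes.length);
  const blocks = [];
  for (let i = 0; blocks.length * 32 < numberOfBytes; i++) {
    const counter = Buffer.alloc(8);
    counter.writeBigUInt64BE(BigInt(i));
    blocks.push(
      createHash("sha256").update(seed).update(labelLen).update(labelBytes).update(counter).digest()
    );
  }
  return Buffer.concat(blocks).subarray(0, numberOfBytes);
}

const seed = Buffer.alloc(32, 0x1a); // stand-in; use the secret seed in practice
const stream = sha256Ctr(seed, "myFunction", 100); // four SHA-256 blocks, truncated to 100 bytes
```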