7

I have two pseudorandom generators:

  1. $f_1$ takes a random seed $l_0||r_0 \in \{0,1\}^{160}$ as input and outputs $r_1||r_2|| \dots ||r_k$, where $l_i||r_i = \operatorname{SHA-1}(l_{i-1}||r_{i-1})$ and $l_i, r_i \in \{0,1\}^{80}$

  2. $f_2$ takes a random seed $s \in \{0,1\}^{160}$ as input and outputs $t_1||t_2|| \dots ||t_{k/2}$, where $t_i = \operatorname{SHA-1}(s||s_i)$ and $s_i \in \{0,1\}^{80}$ is the binary representation of $i$.

($l||r$ denotes the concatenation of $l$ and $r$.)
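
In code, I picture the two constructions roughly like this (a minimal Python sketch; whether $r_i$ is the first or second half of the digest and the byte order of the counter are not fixed above, so I have just picked one):

```python
import hashlib

def f1(seed: bytes, k: int) -> bytes:
    """f1: iterate SHA-1 on the 160-bit state and output the half r_i at each step."""
    assert len(seed) == 20                        # 160-bit seed l_0 || r_0
    state = seed
    out = b""
    for _ in range(k):
        state = hashlib.sha1(state).digest()      # l_i || r_i = SHA-1(l_{i-1} || r_{i-1})
        out += state[10:]                         # r_i: the second 80 bits (my assumption)
    return out

def f2(seed: bytes, k: int) -> bytes:
    """f2: hash the fixed seed together with an 80-bit counter, output each full digest."""
    assert len(seed) == 20                        # 160-bit seed s
    out = b""
    for i in range(1, k // 2 + 1):
        s_i = i.to_bytes(10, "big")               # 80-bit binary representation of i
        out += hashlib.sha1(seed + s_i).digest()  # t_i = SHA-1(s || s_i)
    return out
```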

I'm thinking that $f_1$ seems better from a "randomness" perspective, because each step uses the output of the previous step as input, whereas $f_2$ reuses the same seed together with a position counter, which seems very predictable. But maybe that doesn't matter, since we hash it? If so, I would prefer $f_2$, because we get a longer output (more pseudorandomness) from the same amount of random input.

Am I on the right track or have I totally missed something?

asked by Sup3rgnu, edited by Paŭlo Ebermann

2 Answers

5

In cryptography, the standard we use when evaluating a cryptographically secure random number generator is "how much effort does it take to distinguish this generator from a truly random source".

By this criterion, we find that, as specified, your $f_2$ is considerably better than your $f_1$.

With $f_1$, we can distinguish the generator from random (and predict future outputs) by selecting an $r_i$, iterating through the possible values of $l_i$, and checking whether there is an $l_i$ value that successfully predicts the $r_{i+1}, r_{i+2}, \ldots$ values. There are $2^{80}$ possible $l_i$ values, and hence this generator can be distinguished with approximately $2^{80}$ SHA-1 evaluations.
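
To make the structure of this attack concrete, here is a scaled-down toy version in Python (16-bit halves instead of 80-bit ones, purely so the brute force finishes instantly; the search itself is exactly the "guess the hidden half" loop described above):

```python
import hashlib, os

HALF = 2                                          # 16-bit halves; the real f1 uses 10 bytes (80 bits)

def step(state: bytes) -> bytes:
    """One iteration of the toy f1: SHA-1 truncated back down to the small state."""
    return hashlib.sha1(state).digest()[:2 * HALF]

def toy_f1(seed: bytes, k: int) -> list:
    state, outs = seed, []
    for _ in range(k):
        state = step(state)
        outs.append(state[HALF:])                 # only r_i is visible to the attacker
    return outs

def predicts(l1: bytes, r1: bytes, later: list) -> bool:
    """Does the guessed hidden half l_1 reproduce the later outputs?"""
    state = l1 + r1
    for r in later:
        state = step(state)
        if state[HALF:] != r:
            return False
    return True

outs = toy_f1(os.urandom(2 * HALF), 6)
for guess in range(2 ** (8 * HALF)):              # 2^16 candidates here, 2^80 in the real f1
    l1 = guess.to_bytes(HALF, "big")
    if predicts(l1, outs[0], outs[1:]):
        print("recovered hidden half l_1 =", l1.hex(), "- all later outputs now predictable")
        break
```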

In contrast, with $f_2$, there is no known weakness in SHA-1 that, given a long series of hashes of related equal-length plaintexts, lets us deduce anything about the plaintexts (or even whether the hashes came from related plaintexts). Hence, the best approach would be to iterate through the possible $s$ values and try to find one that correctly predicts the outputs. There are $2^{160}$ possible $s$ values, hence this approach would take approximately $2^{160}$ SHA-1 evaluations.

In addition, I would note a practical advantage of $f_2$: it performs one SHA-1 evaluation for every 160 bits of output, while $f_1$ performs one SHA-1 evaluation for every 80 bits of output. Practically speaking, $f_2$ is therefore likely to be about twice as fast.

On the other hand, while $f_2$ is distinctly better, I couldn't really recommend either; I would suggest you look at the NIST SP 800-90 random number generators, and in particular CTR_DRBG, which would be rather more efficient than either of your approaches.
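
For a rough sense of why it is faster: the core of CTR_DRBG is just a block cipher run in counter mode, giving 128 bits of output per AES call. The sketch below is not an SP 800-90 implementation (it omits the derivation function, reseeding and so on, and the key/counter setup here is my own placeholder); it only shows that AES-CTR core, using the pyca/cryptography package:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)                              # 256-bit AES key, taken from a real entropy source
initial_counter = os.urandom(16)                  # 128-bit initial counter block
keystream = Cipher(algorithms.AES(key), modes.CTR(initial_counter)).encryptor()

def generate(n_bytes: int) -> bytes:
    """Encrypting zero bytes in CTR mode returns the raw AES keystream."""
    return keystream.update(b"\x00" * n_bytes)

random_bytes = generate(64)
```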

answered by poncho, edited by J_H
4

Under the assumption that SHA-1 is a random oracle, $f_2$ is better (if I understand it properly, i.e. it concatenates a constant seed with a counter) in that it rules out the possibility of a short cycle. Specifically, $f_2$ guarantees that no two inputs are ever the same, which means that any two outputs are independent of each other, whereas $f_1$ does not guarantee this.
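
A scaled-down experiment makes the cycle point visible (a 24-bit truncated state, my choice so it runs in milliseconds): an iterated construction like $f_1$ behaves like a random function on its state space, so the walk typically falls onto a cycle after only about $2^{12}$ steps, far fewer than the $2^{24}$ states available, while a counter never repeats an input:

```python
import hashlib, os

def step(state: bytes) -> bytes:
    return hashlib.sha1(state).digest()[:3]       # truncate to a 24-bit state

state = os.urandom(3)
seen = {}                                         # state -> step index at which it was first seen
for i in range(1 << 24):
    if state in seen:
        print(f"hit a cycle after {i} steps (tail {seen[state]}, cycle length {i - seen[state]})")
        break
    seen[state] = i
    state = step(state)
```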

Since SHA-1 is not a random oracle, we cannot exclude the possibility that feeding a counter into its compression function produces some output bias. But block ciphers have been used this way for a long time now (CTR mode), so I doubt it would be a problem in practice (but I could be wrong; see Thomas Pornin's answer to this question).

On a related note, $f_2$ can be enhanced: SHA-1 has a block size of 512 bits, so for a maximum pseudorandom output of $2^n$ hashes, you could make $s$ be $512 - n$ bits long (or, if you want the padding to fit in the same block, $447 - n$ bits) to get some more entropy. $f_1$ cannot be enhanced in this way because of its feedback structure.
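
A byte-aligned sketch of that enhancement (rounding to whole bytes is my simplification): one SHA-1 block is 64 bytes and the padding needs at least 9 of them (a 0x80 byte plus the 64-bit length field), so up to 55 bytes of message fit in a single block; a 4-byte counter then leaves room for a 51-byte (408-bit) seed instead of the original 160-bit one:

```python
import hashlib, os

COUNTER_BYTES = 4                                 # supports up to 2^32 output blocks
SEED_BYTES = 55 - COUNTER_BYTES                   # 64-byte block minus at least 9 bytes of padding

def f2_wide(seed: bytes, num_blocks: int) -> bytes:
    assert len(seed) == SEED_BYTES                # 51 bytes = 408 bits of seed
    out = b""
    for i in range(1, num_blocks + 1):
        out += hashlib.sha1(seed + i.to_bytes(COUNTER_BYTES, "big")).digest()
    return out

stream = f2_wide(os.urandom(SEED_BYTES), 4)       # 4 x 160 = 640 bits of output
```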

answered by Thomas