12

If I hash a keyword with SHA-512, feed the output back in as the input for the next round, and keep repeating this process, will I get a stream of random numbers?

user2256790
  • 2
    Pseudorandom, yes. Depending on how you use the outputs, the method you described may be a massive security risk – Richie Frame Apr 09 '14 at 08:38
  • 2
    http://crypto.stackexchange.com/questions/48/is-it-feasible-to-build-a-stream-cipher-from-a-cryptographic-hash-function describes some of the problems you may run into with schemes like this. – archie Apr 09 '14 at 09:07

2 Answers

14

For an adversary not knowing the definition of SHA-512 (or just not knowing the 512-bit initialization constant of SHA-512, defined as the first sixty-four bits of the fractional parts of the square roots of the first eight prime numbers), the sequence obtained by $$\begin{align*} H_0&=\text{SHA-512}(Seed){\small\text{ where }}Seed{\small\text{ is the statement's keyword}}\\ H_{i+1}&=\text{SHA-512}(H_i)\\ \end{align*}$$ is a Cryptographically Secure Pseudo-Random Number Generator as far as we know. It is in practice indistinguishable from random for said adversary, with residual odds of the contrary less than $2^{-100}$, assuming a few additional requirements:

  1. Fewer than $2^{200}$ outputs are available to the adversary [rationale: if the generator happens to enter a cycle, the adversary can predict future output, even with feasibly little memory; after about $2^{(512-100+1)/2}$ iterations of a random function with 512-bit output, odds of cycling are about $2^{-100}$; I kept some margin]. Notice that producing even the first $2^{60}$ outputs would take at least five years with current technology, because this RNG and SHA-512 are serial processes [estimate based on two gate delays of one picosecond each per round, over the 80 rounds of SHA-512].
  2. This adversary uses classical computing means and is bound to perform less work than needed for $2^{250}$ hash computations (a safe assumption), or anything that I can fantasize today (your call) [rationale: the best explicit attack I have enumerates the SHA-512 initialization values, and reaches odds $2^{-100}$ of success at about $2^{412}$ hashes; I kept a helluva margin]. Note: 1 and 2 can be combined into: the adversary cannot perform the classical-computing equivalent of counting to $2^{200}$, which is still very credible.
  3. $Seed$ is never reused.
  4. The adversary does not obtain the SHA-512 specification (including initialization value) by some oblique means: reverse engineering, operational goof, spying (including bribery and planting trojans), rubber hose cryptanalysis, side channels... [Note: reading the official specification was discounted in the first sentence].
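The two numbers in requirement 1 can be checked with a few lines of arithmetic. This is only a sketch: the constants below restate the bracketed assumptions (two gate delays of one picosecond per round, 80 rounds, one compression call per output) and are not measurements of any real hardware.

```python
# Assumptions restated from requirement 1; these are not measured values.
ROUNDS_PER_HASH = 80          # rounds in the SHA-512 compression function
GATE_DELAYS_PER_ROUND = 2     # optimistic hardware assumption
GATE_DELAY_SECONDS = 1e-12    # one picosecond per gate delay

# Each H_i is 64 bytes, so hashing it takes a single compression call.
outputs = 2**60
seconds = outputs * ROUNDS_PER_HASH * GATE_DELAYS_PER_ROUND * GATE_DELAY_SECONDS
years = seconds / (365.25 * 24 * 3600)

# Exponent of the iteration count at which cycling odds reach about 2^-100.
cycle_bound_log2 = (512 - 100 + 1) / 2

print(f"time for 2^60 serial outputs: about {years:.1f} years")
print(f"cycling odds reach 2^-100 near 2^{cycle_bound_log2} iterations")
```

The limit of $2^{200}$ outputs sits below the computed exponent, which is the margin mentioned in the rationale.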

However, with respect to an adversary knowing the full definition of SHA-512 (which is the assumption a cryptographer will make by Kerckhoffs's principle), the generator is unsafe. In particular, $H_j$ for $j>i$ can be trivially predicted from $H_i$; the generator fails the next-bit test.
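A minimal sketch of that prediction, using Python's `hashlib`. The helper name `hash_chain` and the keyword are illustrative assumptions; only the recurrence $H_{i+1}=\text{SHA-512}(H_i)$ comes from the question.

```python
import hashlib

def hash_chain(seed: bytes, count: int) -> list:
    """Return H_0 .. H_{count-1}: H_0 = SHA-512(seed), H_{i+1} = SHA-512(H_i)."""
    outputs = []
    state = hashlib.sha512(seed).digest()
    for _ in range(count):
        outputs.append(state)
        state = hashlib.sha512(state).digest()
    return outputs

# The legitimate user derives a stream from a secret keyword (hypothetical value).
stream = hash_chain(b"correct horse", 5)

# An adversary who sees only H_1 and knows SHA-512 recomputes all later outputs.
observed = stream[1]
predicted = [observed]
for _ in range(3):
    predicted.append(hashlib.sha512(predicted[-1]).digest())

# True when the adversary's prediction matches the real stream from H_1 onward.
print(predicted == stream[1:])
```

The adversary never needs the keyword: each output is the complete internal state of the generator.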


In addition, from a practical perspective, the generator is very bad by the mere fact that it is simultaneously

  • deterministic;
  • without any key other than a keyword, presumably of low entropy;
  • without provision for key stretching to slow keyword enumeration.
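For the last point, one possible form of key stretching is to run the keyword through PBKDF2 before it seeds anything. This is a sketch under assumptions: the choice of PBKDF2-HMAC-SHA512, the 16-byte random salt, and the iteration count are illustrative, not something the question or this answer prescribes, and the helper name `stretched_seed` is hypothetical.

```python
import hashlib
import os

def stretched_seed(keyword: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 64-byte seed; each keyword guess costs `iterations` HMAC calls."""
    return hashlib.pbkdf2_hmac("sha512", keyword.encode("utf-8"), salt, iterations)

salt = os.urandom(16)                 # assumed to be stored alongside the output
seed = stretched_seed("keyword", salt)
print(len(seed), "byte seed")
```

Stretching only slows keyword enumeration. It does nothing about the other two points, nor about the predictability of $H_j$ from $H_i$.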

If the keyword is simple enough to be reliably memorized in a real application by a majority of adult humans, then password cracking can quickly find the keyword by enumeration, knowing, say, 10 bytes of $H_0$.
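The enumeration can be sketched as follows. The five-word list, the secret, and the helper name `recover_keyword` are hypothetical; a real attack would run the same loop over a large password dictionary.

```python
import hashlib

def recover_keyword(leaked_prefix: bytes, candidates):
    """Return the first candidate whose SHA-512 digest starts with the leaked bytes."""
    for word in candidates:
        digest = hashlib.sha512(word.encode("utf-8")).digest()
        if digest.startswith(leaked_prefix):
            return word
    return None

# Toy dictionary and secret, for illustration only.
wordlist = ["password", "letmein", "dragon", "sunshine", "monkey"]
secret = "sunshine"

# The adversary learns only the first 10 bytes of H_0.
leaked = hashlib.sha512(secret.encode("utf-8")).digest()[:10]

print(recover_keyword(leaked, wordlist))
```

With 80 leaked bits, a false match among dictionary words is very unlikely, so the first hit is almost certainly the keyword; every later output then follows.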


So all in all, the generator is secure for some non-cryptographic applications like numerical simulations, and disastrous from the perspectives on-topic here.

fgrieu
5

What "random" means is not very clear and deserves some more explanation, such as what you expect from the output number sequence.

  • If you want a uniformly distributed sequence, you will get it.

  • If you want an unpredictable sequence, you won't.

  • If you want a "sequence indistinguishable from random", you won't get it either.

xxxxxxxxx
  • 3
    Strictly speaking, the sequence is not uniformly distributed: it will ultimately enter a cycle (that's expected after roughly $2^{256}$ hashes), and it is extremely unlikely that the proportion of $1$ bits in this cycle is exactly $1/2$, a requirement for any uniformly distributed ultimately cyclic sequence. – fgrieu Apr 09 '14 at 09:48
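The cycling behaviour described in this comment can be observed on a toy version of the generator. The sketch below assumes a hash truncated to 16 bits so that a cycle appears after a few hundred steps; the real 512-bit chain is expected to cycle only after roughly $2^{256}$ hashes, far beyond anything computable. The helper names are hypothetical.

```python
import hashlib

def step(state: bytes) -> bytes:
    # Toy assumption: keep only the first 16 bits of SHA-512.
    return hashlib.sha512(state).digest()[:2]

def find_cycle(seed: bytes):
    """Iterate until a state repeats; return (tail, cycle) as lists of states."""
    seen = {}
    sequence = []
    state = step(seed)
    while state not in seen:
        seen[state] = len(sequence)
        sequence.append(state)
        state = step(state)
    start = seen[state]           # index where the cycle begins
    return sequence[:start], sequence[start:]

tail, cycle = find_cycle(b"keyword")

# Count the 1 bits over one full period of the cycle.
ones = sum(bin(byte).count("1") for value in cycle for byte in value)
total_bits = 16 * len(cycle)
print(f"tail {len(tail)}, cycle {len(cycle)}, fraction of ones {ones / total_bits:.4f}")
```

Once the chain is inside the cycle, the long-run fraction of ones equals the fraction over one period, which is why it would have to be exactly $1/2$ for the sequence to be uniformly distributed.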