SHA-256 doesn't follow a uniform distribution?

Question

I have been playing with SHA-2-256 in Julia and I noticed that the hashes produced don't appear to follow a uniform distribution. My understanding of secure hashing algorithms is that they should approximate a uniform distribution well, so they are not predictable.

Here is the Julia code I'm using:

using BitIntegers, Distributions, HypothesisTests, Random, SHA
function sha256_rounds()
    rounds::Array{Array{UInt8,1}} = Array{Array{UInt8,1}}(undef, 10000) # 10000 Samples
    hash::Array{UInt8} = Array{UInt8}(undef, 64) # 64-byte array
    for i = 1:10000
        hash = sha2_256(string(rand(UInt64), base = 16)) # Random number, convert to hex string, then seed
        rounds[i] = hash
    end
    return rounds
end
sha256_str_vals = [join([string(x, base = 16) for x in y]) for y in sha256_rounds()] # Stitch the bytes together into strings
sha256_num_vals_control = [parse(UInt256, x, base = 16) for x in sha256_str_vals] # Get the numerical value from the strings
OneSampleADTest(sha256_num_vals, Uniform()) # One sample Anderson-Darling test

And the result of the test:

One sample Anderson-Darling test
--------------------------------
Population details:
    parameter of interest:   not implemented yet
    value under h_0:         NaN
    point estimate:          NaN
Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           <1e-7
Details:
    number of observations:   10000
    sample mean:              8.73991847621225e75
    sample SD:                2.2742656031884893e76
    A² statistic:             Inf

To me this says that the produced hashes do not conform to a uniform distribution. Am I using the test incorrectly, or is my sample faulty? Thank you for your thoughts.

Your hash array stores $64 \cdot 8 = 512$ bits, but SHA-256 outputs 256 bits. Define it as hash::Array{UInt8} = Array{UInt8}(undef, 32) # 32-byte array — kelalaka, Nov 26 '21 at 20:31
I remember earlier similar claims that a hash or a block cipher's output is not random. They turned out to be wrong. SHA-256's outputs (for distinct inputs prepared independently of the constants in SHA-256) are usable to validate statistical tests. Independently: the claim needs to be expressed independently of the Julia code, and include a description of the statistical test performed. — fgrieu, Nov 26 '21 at 20:41
See also the similar question "How to get an output of SHA-1 with first 2-bit are zeros?" — kelalaka, Nov 26 '21 at 21:24
I voted to reopen even without the improvements fgrieu suggested. SHA-256 will not fail a simple statistical test. I would test individual bits, and bit pairs, to convince myself that it approximates uniformity. If you insist on the test you applied, look at how you are converting to numeric values; the bug is very likely there. — Meir Maor, Nov 28 '21 at 08:00
This is actually really simple. Generate 1GB of stuff in counter mode and run ent on it. If it passes so be it. If it fails, then so does your code... — Paul Uszak, Dec 28 '21 at 22:39
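
The per-bit check suggested in the comments can be sketched as follows. This is a minimal illustration, not code from the thread: the inputs (decimal strings of 1 to n), the sample size, and the names ones_per_bit and z are assumptions chosen for the example. For each of the 256 output bit positions it counts how often the bit is set across n digests. Under uniformity each count should be close to n/2, with a standard deviation of sqrt(n)/2.

```julia
using SHA

n = 10_000                      # number of digests (illustrative choice)
ones_per_bit = zeros(Int, 256)  # count of 1s observed at each bit position

for i in 1:n
    digest = sha2_256(string(i))          # distinct inputs "1", "2", ...
    for (j, byte) in enumerate(digest), k in 0:7
        # bit k of byte j, most significant bit first
        ones_per_bit[8 * (j - 1) + k + 1] += (byte >> (7 - k)) & 0x01
    end
end

# Standardize each count: under uniformity it is Binomial(n, 1/2)
z = (ones_per_bit .- n / 2) ./ (sqrt(n) / 2)
println(extrema(z))             # smallest and largest z-score over 256 bits
```

With 256 positions, a few z-scores of magnitude 2 to 3 are expected by chance; only values far beyond that would suggest a problem in the sampling code.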

fgrieu · Answer 1 · 2021-11-28T19:50:03.993

Again, we are not a code review site, especially for code in a language seldom used for cryptography. And there are obvious issues with the code:

sha256_num_vals_control is computed but not used, when presumably the intent was to use it.
I can see neither an attempt to normalize the generated material to interval $[0,1)$, nor an input to OneSampleADTest specifying a range.

I conclude the samples for OneSampleADTest are not formatted as expected for this test. Malformed in, garbage out.
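
One way to address the formatting issue is sketched below, under stated assumptions. Uniform() in Distributions is the uniform distribution on [0, 1], so 256-bit integers near 1e75 fall outside its support, which likely explains the reported A² of Inf. The helper digest_to_unit is a hypothetical name, and the choice to keep only the leading 53 bits of each digest (the precision of a Float64) is an assumption made for the sketch. Separately, building hex strings with string(x, base = 16) per byte drops leading zeros (0x0a becomes "a"); Base's bytes2hex pads each byte to two digits. The sketch avoids the string round trip altogether.

```julia
using Distributions, HypothesisTests, SHA

# Assumption: only the digest's leading 53 bits are used, because a Float64
# has 53 significant bits; the remaining bits are ignored for this test.
function digest_to_unit(digest::AbstractVector{UInt8})
    x = UInt64(0)
    for b in digest[1:8]          # first 8 bytes, big-endian
        x = (x << 8) | b
    end
    return (x >> 11) / 2.0^53     # 53-bit integer scaled into [0, 1)
end

# Distinct inputs: decimal strings "1", "2", ... (illustrative choice)
samples = [digest_to_unit(sha2_256(string(i))) for i in 1:10_000]

# Every sample now lies in the support of Uniform()
@assert all(s -> 0 <= s < 1, samples)

result = OneSampleADTest(samples, Uniform())
println(pvalue(result))
```

A single run proves little either way: under the null hypothesis the p-value is itself roughly uniform, so about one run in twenty falls below 0.05 by chance.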

Even if the samples were correctly formatted, cryptography would not care about bugs in OneSampleADTest in a certain version of Julia and the library used. It would care about a valid claim that SHA-256 output for distinct inputs prepared independently of the constants in SHA-256 can be distinguished from random. But such an extraordinary claim would need extraordinary evidence and, as a preliminary, a description independent of the language and its libraries.
