Is there a guarantee that for each possible hash y there exists a number x such that with hash function H, H(x) = y?

Question

This is specifically about SHA-256 and its involvement in Bitcoin; someone asked me this and I didn't know the answer. Yes, I am aware that a SHA-256 hash can only take values from $0$ to $2^{256}-1$, so it is fine for this to be the range of our $y$. What I am asking is whether, within that range, there is a guarantee that there is an $x$ for each possible $y$.

I ask this mainly because I wonder if there is anything that guarantees that a new Bitcoin block can always be made with a hash below its target.

Note, if you didn't notice it yet: You are asking whether SHA-256 is surjective. — SEJPM, Jul 08 '17 at 11:33
Crossdupe with ursine answer https://stackoverflow.com/questions/2658601/do-cryptographic-hash-functions-reach-each-possible-value-e-g-are-they-surject (perhaps he was migrating?) although I can't find any similar on bitcoin.SX where it would fit fine — dave_thompson_085, Jul 09 '17 at 00:02

fgrieu · Accepted Answer · 2018-02-12T19:09:55.627

TL;DR: there is no mathematical certainty that every output value of common cryptographic hash functions is reachable, but for most that's overwhelmingly likely. A notable exception is double-SHA-256 (SHA256d) used in Bitcoin mining, where it is overwhelmingly likely that some outputs are unreachable.

For an idealized 256-bit hash, it becomes likely that every possible hash value $y$ is reached by hashing some $x$ once there are about $2^{264}$ possible values of $x$, for example all 33-byte $x$. Beyond that, the odds that some $y$ is not covered become vanishingly small: less than one in $2^{b-264}$ if there are $2^b$ possible values of $x$. So while there is never a guarantee, it is practically certain that all $y$ are covered when we allow, say, 40-byte $x$: odds of the contrary are less than $2^{-56}$.

Argument: a hash can be idealized as a random function. Whether a random function covers its whole destination set is the coupon collector's problem. If the destination set has $n$ elements, the expected number of elements of the source set that must be hashed to fully cover the destination set is $E(n)=n\log n+\gamma\,n+{1\over2}+O({1\over n})$, where $\gamma\approx0.577$ is the Euler–Mascheroni constant; and (by Markov's inequality) the odds of not covering the destination set after hashing $E(n)/\epsilon$ elements are less than $\epsilon$. Changing $n$ to $2^b$ for a $b$-bit hash, we get $E(2^b)\approx2^b(b\log(2)+\gamma)$ and

$$\begin{align}\log_2(E(2^b)) &\approx b+\log_2\left(b+{\gamma\over\log(2)}\right)+\log_2(\log(2))\\ &\approx b+\log_2(b+0.833)-0.528766\end{align}$$

For $b=256$ this gives $\log_2(E(2^{256}))\approx263.5$, hence the figure of about $2^{264}$ above.
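As a rough numerical check of the estimate above, here is a minimal Python sketch that evaluates $\log_2(E(2^b))$ from the leading terms $2^b(b\log 2+\gamma)$. The function name `log2_expected_inputs` and the choice of sample sizes are mine, for illustration only.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def log2_expected_inputs(b: int) -> float:
    """log2 of E(2^b) ~ 2^b * (b*ln 2 + gamma), the coupon collector expectation.

    The +1/2 and O(1/n) terms of E(n) are dropped; they are negligible here.
    """
    return b + math.log2(b * math.log(2) + EULER_GAMMA)


if __name__ == "__main__":
    for b in (32, 128, 256):
        print(f"b = {b:3d}: E(2^b) is about 2^{log2_expected_inputs(b):.2f}")
```

Working with $\log_2$ directly avoids ever forming $2^{256}$ as a floating-point number.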

SHA-256 is a Merkle-Damgård hash using a compression function built per the Davies-Meyer construction. For messages of up to 55 bytes, which fit in a single padded block, the question of whether all $y=\operatorname{SHA-256}(x)$ are reached reduces to whether all $\operatorname{Enc}_x(IV)$ are reached, for a certain constant $IV$ and a certain block cipher $\operatorname{Enc}$ where $x$ is the key and the key scheduling encompasses the block padding (the Davies-Meyer feed-forward only adds the constant $IV$, which is a bijection and does not change coverage). For an idealized block cipher the same analysis as for an idealized hash would hold, so we'd cover all values with overwhelming odds. SHA-256 uses an ARX block cipher and I see no reason why it would significantly differ from an idealized one, but that's a weak argument.
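The 55-byte limit comes from SHA-256 padding: one `0x80` byte and an 8-byte length field must fit in the same 64-byte block as the message. A small sketch of that arithmetic, with a helper name (`sha256_block_count`) that is mine and not part of any library:

```python
def sha256_block_count(message_len: int) -> int:
    """Number of 64-byte blocks SHA-256 processes for a message of message_len bytes.

    Padding appends one 0x80 byte, then zero bytes, then an 8-byte length field,
    so message_len + 9 bytes are rounded up to a multiple of 64.
    """
    if message_len < 0:
        raise ValueError("message length must be non-negative")
    return (message_len + 1 + 8 + 63) // 64


if __name__ == "__main__":
    for length in (0, 55, 56, 80, 100):
        print(f"{length:3d}-byte message -> {sha256_block_count(length)} block(s)")
```

With this count, a 55-byte message is the longest that stays within one compression function call, while 80-byte and 100-byte messages both need two.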

We can be more positive that we get full coverage when we allow two rounds (say, 100-byte messages), because the question is now whether all $\operatorname{Enc'}_{x_1}(y_0)\boxplus y_0$ are reached, where $x_1$ is the 36-byte second block and $y_0$ is allowed to vary among the nearly-full set of 32-byte values reachable with the first round ($\boxplus$ is 256-bit addition without carry across 32-bit boundaries). Still, there is no mathematical guarantee.
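To make the $\boxplus$ operation concrete, here is a minimal sketch of 256-bit addition done as eight independent 32-bit additions modulo $2^{32}$. The function name `boxplus` is an illustrative choice; the big-endian word order matches how SHA-256 serializes its state.

```python
MASK32 = 0xFFFFFFFF


def boxplus(a: bytes, b: bytes) -> bytes:
    """Add two 32-byte values as eight independent big-endian 32-bit words.

    Carries are discarded at each 32-bit boundary (addition modulo 2^32 per word).
    """
    if len(a) != 32 or len(b) != 32:
        raise ValueError("both operands must be exactly 32 bytes")
    words = []
    for i in range(0, 32, 4):
        wa = int.from_bytes(a[i:i + 4], "big")
        wb = int.from_bytes(b[i:i + 4], "big")
        words.append(((wa + wb) & MASK32).to_bytes(4, "big"))
    return b"".join(words)


if __name__ == "__main__":
    all_ones = b"\xff" * 32
    one_per_word = b"\x00\x00\x00\x01" * 8
    # Each word wraps around to zero; no carry leaks into the neighbouring word.
    print(boxplus(all_ones, one_per_word).hex())
```

For a fixed second operand this map is a bijection on 32-byte values, which is why the feed-forward by itself neither adds nor removes reachable outputs.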

In Bitcoin mining, what's computed is $y=\operatorname{SHA-256}(\operatorname{SHA-256}(x))$. The outer hash receives at most $2^{256}$ distinct inputs, so we are much below the coupon collector's bound and overwhelmingly above the birthday bound; therefore it is overwhelmingly likely that there are some values of $y$ not reached.
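The effect can be observed on a toy scale. The sketch below is an illustration under assumptions of my own, not a statement about full SHA-256: it truncates SHA-256 to 12 bits as a stand-in for an idealized hash, feeds the inner hash $2^{18}$ inputs (well above the coupon collector's bound for $2^{12}$ outputs), and then applies the outer hash only to the inner outputs.

```python
import hashlib

BITS = 12          # toy output size, an assumption chosen for speed
N = 1 << BITS      # number of possible toy hash values


def toy_hash(data: bytes) -> int:
    """SHA-256 truncated to BITS bits, standing in for an idealized hash."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:2], "big") >> (16 - BITS)


# Inner hash: 2^18 distinct 4-byte inputs, far more than needed to cover N values.
inner_image = {toy_hash(i.to_bytes(4, "big")) for i in range(1 << 18)}

# Outer hash: its only inputs are the inner outputs, so at most N of them.
outer_image = {toy_hash(v.to_bytes(2, "big")) for v in inner_image}

unreachable_fraction = 1 - len(outer_image) / N

if __name__ == "__main__":
    print(f"inner hash covers {len(inner_image)} of {N} values")
    print(f"outer hash misses a fraction {unreachable_fraction:.3f} of values")
```

In this toy model the inner hash covers everything while the outer hash misses roughly a fraction $1/e$ of its output values, which is what a random function from $N$ inputs to $N$ outputs is expected to do.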

I second Dave Thompson's comment below.

The bitcoin header (your $x$) is 80 octets, hence two input blocks/rounds for the inner SHA256 with 256 bits state and about 70 bits variable data in the second round. Given bitcoin is limited to one planet (at a time) and matter and energy available for mining on this planet are bounded, it's likely possible to show that the adaptively-set target will never go small enough to seriously risk failing to find a hash satisfying it. — dave_thompson_085, Jul 08 '17 at 23:57

score 1 · Answer 2 · answered Jul 08 '17 at 10:55

No, there is no such guarantee. The number of $y$ values with no preimage depends greatly on which values you are willing to consider for $x$. If you consider only $n$-bit values for $x$ and an $n$-bit value for $y$, then you expect many values to have no preimage: specifically, about 1 out of $e$ values (roughly 37%) is expected to have no preimage. However, for a compression function from $2n$ bits to $n$ bits, the expected number of values with no preimage is negligibly small (about $2^n e^{-2^n}$, effectively zero), though there is still no guarantee. All of this assumes the hash function behaves like a pseudorandom function, which I believe is a reasonable assumption for this kind of analysis and for common cryptographic hash functions.
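These expectations follow from one formula: for a random function from $M$ inputs to $N$ outputs, each output is missed with probability $(1-1/N)^M$, so the expected number of outputs with no preimage is $N(1-1/N)^M$. A minimal sketch that evaluates it for small $n$ (the function name and the sample sizes are illustrative assumptions):

```python
import math


def expected_unreachable(n_out_bits: int, n_in_bits: int) -> float:
    """Expected number of outputs with no preimage, N * (1 - 1/N)^M,
    for a random function from M = 2^n_in_bits inputs to N = 2^n_out_bits outputs."""
    n_outputs = 2 ** n_out_bits
    n_inputs = 2 ** n_in_bits
    # log1p keeps precision when 1/N is tiny; exp underflows to 0.0 for huge M.
    return n_outputs * math.exp(n_inputs * math.log1p(-1.0 / n_outputs))


if __name__ == "__main__":
    for n in (4, 8, 16):
        same = expected_unreachable(n, n)
        double = expected_unreachable(n, 2 * n)
        print(f"n = {n:2d}: n -> n misses a fraction {same / 2 ** n:.4f}; "
              f"2n -> n misses {double:.3e} values on average")
```

The $n$-to-$n$ fraction approaches $1/e$ as $n$ grows, while the $2n$-to-$n$ expectation shrinks so fast that it underflows to zero in floating point well before cryptographic sizes.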
