TL;DR: there is no mathematical certainty that every output value of common cryptographic hash functions is reachable, but for most that's overwhelmingly likely. A notable exception is double-SHA-256 (SHA256d) used in Bitcoin mining, where it is overwhelmingly likely that some outputs are unreachable.
For an idealized 256-bit hash, it becomes likely that every possible hash value $y$ is reached by hashing some $x$ once there are about $2^{264}$ possible values of $x$, for instance when $x$ ranges over all 33-byte strings. Beyond that, the odds that some $y$ is not covered become vanishingly small: less than one in $2^{b-264}$ if there are $2^b$ possible values of $x$. So while there is never a guarantee, it is practically certain that all $y$ are covered when we allow, say, 40-byte $x$: the odds of the contrary are less than $2^{-56}$.
Argument: a hash can be idealized as a random function. Whether a random function reaches every element of its destination set is the coupon collector's problem. If the destination set has $n$ elements, the expected number of elements of the source set that must be hashed to fully cover it is $E(n)=n\log n+\gamma\,n+{1\over2}+O({1\over n})$, where $\gamma\approx0.577$ is the Euler–Mascheroni constant; and, by Markov's inequality, the odds of not covering the destination set after hashing $E(n)/\epsilon$ elements are less than $\epsilon$. Substituting $n=2^b$ for a $b$-bit hash, we get $E(2^b)\approx2^b(b\log(2)+\gamma)$ and $\begin{align}\log_2(E(2^b))
&\approx b+\log_2\left(b+{\gamma\over\log(2)}\right)+\log_2(\log(2))\\
&\approx b+\log_2(b+0.833)-0.528766\end{align}$
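As a quick numerical check of the formula above, here is a minimal Python sketch. The function names are mine, the $\frac12+O(\frac1n)$ terms are dropped, and the hash is assumed to behave as an ideal random function.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def log2_expected_inputs(b: int) -> float:
    """log2 of E(2^b) ~ 2^b * (b*ln(2) + gamma), lower-order terms dropped."""
    return b + math.log2(b * math.log(2) + EULER_GAMMA)

def log2_miss_bound(b: int, input_bits: int) -> float:
    """log2 of the bound E(2^b) / 2^input_bits on the odds that some
    output value is missed when 2^input_bits inputs are hashed."""
    return log2_expected_inputs(b) - input_bits

if __name__ == "__main__":
    # 256-bit hash: expected number of inputs needed, as a power of two
    print(f"log2 E(2^256) = {log2_expected_inputs(256):.3f}")
    # 40-byte inputs are 320 bits
    print(f"log2 bound for 40-byte inputs = {log2_miss_bound(256, 320):.3f}")
```

For $b=256$ this gives a little under 263.5, which is where the "about $2^{264}$" figure comes from, and the bound for 40-byte inputs comes out slightly below $2^{-56}$.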
SHA-256 is a Merkle-Damgård hash using a compression function built per the Davies-Meyer construction. The question of whether all $y=\operatorname{SHA-256}(x)$ are reached reduces, for up to 55 bytes hashed, to whether all $\operatorname{Enc}_x(IV)$ are reached for a certain constant $IV$ and a certain block cipher $\operatorname{Enc}$ where $x$ is the key, and key scheduling encompasses the block padding (the final feed-forward addition of the constant $IV$ is a bijection, so it does not change coverage). For an idealized block cipher the same analysis as for an idealized hash would hold, so we'd cover all values with overwhelming odds. SHA-256 uses an ARX block cipher and I see no reason why it would significantly differ from an idealized one, but that's a weak argument.
We can be more confident of full coverage when we allow two compression-function calls (say, 100-byte messages), because the question is now whether all $\operatorname{Enc'}_{x_1}(y_0)\boxplus y_0$ are reached, where $x_1$ is the 36-byte second block and $y_0$ is allowed to vary among the nearly-full set of 32-byte values reachable with the first block ($\boxplus$ is 256-bit addition without carry across 32-bit boundaries). Still, there is no mathematical guarantee.
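The 55-byte and 100-byte figures follow from SHA-256's padding rule: one `0x80` byte and an 8-byte length field are appended, with zero fill up to a multiple of 64 bytes. A small sketch (the helper name is hypothetical) that counts the resulting blocks:

```python
BLOCK_BYTES = 64   # SHA-256 block size
LENGTH_BYTES = 8   # big-endian bit-length field appended by the padding

def sha256_block_count(message_len: int) -> int:
    """Number of 64-byte blocks processed for a message of message_len bytes.

    Padding adds one 0x80 byte, then zero bytes, then the 8-byte length,
    so the padded size is rounded up to a multiple of 64.
    """
    padded_min = message_len + 1 + LENGTH_BYTES
    return (padded_min + BLOCK_BYTES - 1) // BLOCK_BYTES

if __name__ == "__main__":
    for n in (0, 55, 56, 100):
        print(n, "bytes ->", sha256_block_count(n), "block(s)")
    # A 100-byte message leaves 100 - 64 = 36 message bytes for the second block.
```

Messages of up to 55 bytes fit in a single block, while a 100-byte message takes two, with 36 message bytes landing in the second one.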
In Bitcoin mining, what's computed is $y=\operatorname{SHA-256}(\operatorname{SHA-256}(x))$. The outer hash receives at most $2^{256}$ distinct inputs, which is far below the coupon collector's bound of about $2^{264}$ and overwhelmingly above the birthday bound; therefore it is overwhelmingly likely that some values of $y$ are not reached (for an idealized hash, about a fraction $1/e\approx37\%$ of them).
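The effect can be observed on a toy scale. The sketch below is only an illustration, not evidence about full SHA-256: it truncates SHA-256 to 16 bits (my choice, to make exhaustive counting feasible), hashes $2^{20}$ counter values once, then hashes the distinct 16-bit results a second time, and reports the fraction of the $2^{16}$ possible outputs that is never reached in each case.

```python
import hashlib

OUT_BYTES = 2                      # toy output size: 16 bits
NUM_OUTPUTS = 256 ** OUT_BYTES     # 65536 possible toy hash values
NUM_INPUTS = 1 << 20               # well above the coupon collector's estimate for 2^16

def toy_hash(data: bytes) -> bytes:
    """SHA-256 truncated to OUT_BYTES bytes (a toy stand-in, not a real design)."""
    return hashlib.sha256(data).digest()[:OUT_BYTES]

def missed_fraction(outputs) -> float:
    """Fraction of the toy output space that does not appear in outputs."""
    return 1 - len(set(outputs)) / NUM_OUTPUTS

# Single hash over many inputs: plenty of inputs relative to the output space.
single = {toy_hash(i.to_bytes(4, "big")) for i in range(NUM_INPUTS)}
# Double hash: the outer hash only ever sees the (at most 65536) inner outputs.
double = {toy_hash(y) for y in single}

single_missed = missed_fraction(single)
double_missed = missed_fraction(double)

if __name__ == "__main__":
    print(f"single hash, missed fraction: {single_missed:.5f}")
    print(f"double hash, missed fraction: {double_missed:.5f}")
```

Under the random-function idealization, one would expect the single hash to miss essentially nothing and the double hash to miss a fraction close to $1/e$.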
I second Dave Thomson's comment below.