I was wondering if there is a cryptographic hash where the resulting hash is identical to the input for the hash function.
If this is the case, would it be a weakness in the hashing function?
If you mean that for all $x$, $H(x) = x$, this is usually known as the identity function. It has the convenient property that it is guaranteed not only to be collision-resistant, but to have absolutely no collisions whatsoever. However, it has the inconvenient property that finding preimages is slightly easier than one might hope, which one might consider to be a ‘weakness’.
That said, whether it is weak or not depends on which security goals you want it to meet! There are many kinds of things called hash functions, even in cryptography: universal hash families like Poly1305 and GHASH, message authentication codes like Poly1305-AES and AES-GMAC, stream ciphers like Salsa20, pseudorandom function families like SipHash, collision-resistant hashes like SHA3-256, password-based key derivation functions like Argon2, and so on. But the identity function is usually not considered a hash function.
If you mean that there exists $x$ such that $H(x) = x$, called a fixed point of $H$, well, for a uniform random function $H$, what's the probability that one exists? Let's suppose the input and output are both $h$ bits long, so there are $2^h$ possible inputs. We can ask the complementary question of the probability that there are no fixed points, and note that for each distinct $x$ independently, $\Pr[H(x) = x] = 1/2^h$. Then
\begin{align}
\Pr[\exists x. H(x) = x]
  &= \Pr\bigl[\lnot \forall x. \lnot(H(x) = x)\bigr] \\
  &= 1 - \Pr[\forall x. H(x) \ne x] \\
  &= 1 - \prod_x \Pr[H(x) \ne x] \\
  &= 1 - \prod_x \bigl(1 - \Pr[H(x) = x]\bigr) \\
  &= 1 - \prod_x (1 - 1/2^h) \\
  &= 1 - (1 - 1/2^h)^{2^h} \\
  &\approx 1 - e^{-1} \approx 63\%.
\end{align}
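As a sanity check on this estimate, here is a minimal Python sketch that evaluates $1 - (1 - 1/2^h)^{2^h}$ numerically and compares it with a small Monte Carlo experiment. The function names, the seed, and the trial count are illustrative assumptions, and the simulation is only practical for small $h$, since it samples a whole random function each trial.

```python
import math
import random


def exact_probability(h: int) -> float:
    """Return 1 - (1 - 1/2^h)^(2^h), the chance that a uniform random
    function on h-bit strings has at least one fixed point."""
    n = 2 ** h
    # log1p/expm1 keep precision when 1/n is far below machine epsilon.
    return -math.expm1(n * math.log1p(-1.0 / n))


def simulated_probability(h: int, trials: int, seed: int = 0) -> float:
    """Estimate the same probability by sampling random functions.

    Each trial draws H(x) uniformly for every h-bit x and records whether
    any x satisfies H(x) == x.
    """
    rng = random.Random(seed)
    n = 2 ** h
    hits = 0
    for _ in range(trials):
        # any() stops at the first fixed point found in this sampled function.
        if any(rng.randrange(n) == x for x in range(n)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("closed form, h = 8:  ", exact_probability(8))
    print("closed form, h = 256:", exact_probability(256))
    print("simulation,  h = 8:  ", simulated_probability(8, trials=2000))
```

Both closed-form values should sit close to $1 - e^{-1}$, and the simulated frequency should land near them up to sampling noise.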
So there are better than 50-50 odds that a fixed point exists in a uniform random choice of function, which is how we often model functions like SHA3-256. Does this indicate a weakness? Well, there are two sides to this:
The traditional properties that we might think of when we hear ‘cryptographic hash function’ after reading Wikipedia without consulting any cryptography literature are preimage resistance, second-preimage resistance, and collision resistance. Knowing a fixed point of SHA3-256 doesn't mean you can find preimages, second preimages, or collisions. So in that sense, a fixed point does not indicate a weakness. Indeed, it is very easy to find fixed points in the internal compression function of any Davies–Meyer hash function like MD5, SHA-256, etc., although that doesn't give you a fixed point of the full hash function.
On the other hand, the probability that you come upon a fixed point by happenstance in a single trial is $1/2^h$. For, e.g., SHA3-256, this is unimaginably improbable for all imaginable numbers of trials. So if you did find one, the practical inference is that there likely is a weakness in SHA3-256 that enabled you to find it. The details would depend on how you found it and on details of SHA3-256, not on the general fact of a fixed point. And while it would cast serious doubt on the security of SHA-3, it might not actually have any practical consequences in any real protocols.
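To illustrate why fixed points of a Davies–Meyer compression function are easy to find, here is a minimal Python sketch. It assumes a toy 64-bit Feistel cipher built from `hashlib.sha256` purely for demonstration; it is not the block cipher inside MD5 or SHA-256, and all names are hypothetical. The compression function is $f(h, m) = E_m(h) \oplus h$, so choosing $h = D_m(0)$ gives $E_m(h) = 0$ and therefore $f(h, m) = h$.

```python
import hashlib

MASK32 = 0xFFFFFFFF
ROUNDS = 4


def _round(key: bytes, i: int, half: int) -> int:
    """Toy Feistel round function: 32 bits derived from key, round, half block."""
    digest = hashlib.sha256(key + bytes([i]) + half.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")


def encrypt(key: bytes, block: int) -> int:
    """Encrypt a 64-bit block with the toy Feistel cipher (illustration only)."""
    left, right = block >> 32, block & MASK32
    for i in range(ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return (left << 32) | right


def decrypt(key: bytes, block: int) -> int:
    """Invert encrypt() by undoing the rounds in reverse order."""
    left, right = block >> 32, block & MASK32
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    return (left << 32) | right


def compress(chaining: int, message_block: bytes) -> int:
    """Davies-Meyer: the message block keys the cipher, and the chaining
    value is fed forward with XOR."""
    return encrypt(message_block, chaining) ^ chaining


def fixed_point(message_block: bytes) -> int:
    """Return h with compress(h, m) == h, namely h = D_m(0)."""
    return decrypt(message_block, 0)


if __name__ == "__main__":
    m = b"any message block"
    h = fixed_point(m)
    print(compress(h, m) == h)
```

MD5 and SHA-256 feed the chaining value forward with modular addition rather than XOR, but the same trick of decrypting the all-zero block applies. The chaining value obtained this way is not something the attacker gets to choose, which is why this does not yield a fixed point of the full hash.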
"It has the convenient property that it is guaranteed not only to be collision-resistant, but to have absolutely no collisions whatsoever"
Note that this property is easy to achieve if you don't compress anything.
– Ella Rose
Mar 31 '19 at 14:16
No.
Take a look at the definition Wikipedia offers (shortened):
A cryptographic hash function is a special class of hash function that has certain properties which make it suitable for use in cryptography. It is a mathematical algorithm that maps data of arbitrary size to a bit string of a fixed size (a hash) and is designed to be a one-way function, that is, a function which is infeasible to invert. The only way to recreate the input data from an ideal cryptographic hash function's output is to attempt a brute-force search of possible inputs to see if they produce a match, or use a rainbow table of matched hashes. [...]
The ideal cryptographic hash function has five main properties:
- it is deterministic so the same message always results in the same hash
- it is quick to compute the hash value for any given message
- it is infeasible to generate a message from its hash value except by trying all possible messages
- a small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value
- it is infeasible to find two different messages with the same hash value
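The fourth property (a small change should scramble the output) can be checked with a short experiment. The sketch below, with hypothetical helper names, flips one input bit and counts how many output bits change for the identity function and for SHA-256 from Python's `hashlib`.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit flipped."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def identity(data: bytes) -> bytes:
    """The 'hash' proposed in the question: H(x) = x."""
    return data


def sha256(data: bytes) -> bytes:
    """SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def changed_output_bits(hash_fn, message: bytes, bit_index: int = 0) -> int:
    """Flip one input bit and report how many output bits change."""
    return hamming_distance(hash_fn(message), hash_fn(flip_bit(message, bit_index)))


if __name__ == "__main__":
    message = bytes(32)  # 256 zero bits, an arbitrary test input
    print("identity:", changed_output_bits(identity, message))
    print("SHA-256: ", changed_output_bits(sha256, message))
```

The identity function changes exactly the one bit that was flipped, whereas for SHA-256 one would expect roughly half of the 256 output bits to change.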
Your proposed identity function obviously cannot satisfy them: