Cryptographic hash where the hashing function input is similar to the result

Question

I was wondering if there is a cryptographic hash where the resulting hash is identical to the input for the hash function.

If this is the case, would it be a weakness in the hashing function?

Use Cryptography Stack Exchange for questions about cryptography. — Z.T., Mar 31 '19 at 01:52
We don't know. It can happen with any cryptographic hash function, with unknown probability. — kelalaka, Mar 31 '19 at 08:09
Does your question ask whether, for all inputs, the hash function output is equal to the input, or whether, for some (possibly very few) inputs, the hash function output is equal to the input? — Ella Rose, Mar 31 '19 at 14:18
Or even: can $H(x \,\|\, \mathit{remainder}) = x$, where $\mathit{remainder}$ is the part of the input beyond the block width? This would agree with the question's title. — Paul Uszak, Apr 01 '19 at 13:22

Squeamish Ossifrage · Answer 1 · 2019-04-01T14:59:20.600

If you mean that for all $x$, $H(x) = x$, this is usually known as the identity function. It has the convenient property that it is guaranteed not only to be collision-resistant, but to have absolutely no collisions whatsoever. However, it has the inconvenient property that finding preimages is slightly easier than one might hope, which one might consider to be a ‘weakness’.
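Both halves of that joke can be checked directly. The following is a minimal sketch; `identity_hash` and `invert` are hypothetical helpers for illustration only. It confirms that the identity map never collides (checked exhaustively over all 2-byte inputs) and that "inverting" it costs nothing, because the digest already is the preimage.

```python
from itertools import product


def identity_hash(x: bytes) -> bytes:
    """Hypothetical 'hash' H(x) = x, for illustration only."""
    return x


def invert(digest: bytes) -> bytes:
    # Finding a preimage is free: the digest *is* the preimage.
    return digest


# Exhaustively check injectivity over every 2-byte input (65536 values):
# the number of distinct outputs equals the number of inputs, so no collisions.
domain = [bytes(p) for p in product(range(256), repeat=2)]
assert len({identity_hash(x) for x in domain}) == len(domain)

# Preimage "attack": recover the message from its digest instantly.
msg = b"attack at dawn"
assert invert(identity_hash(msg)) == msg
```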

That said, whether it is weak or not depends on which security goals you hope it meets! There are many kinds of things called hash functions, even in cryptography: universal hash families like Poly1305 and GHASH, message authentication codes like Poly1305-AES and AES-GMAC, stream ciphers like Salsa20, pseudorandom function families like SipHash, collision-resistant hashes like SHA3-256, password-based key derivation functions like Argon2, etc. But the identity function is usually not considered a hash function.

If you mean that there exists $x$ such that $H(x) = x$, called a fixed point of $H$, well, for a uniform random function $H$, what's the probability this can happen? Let's suppose the input and output are $h$ bits long. We can ask the complementary question of the probability that there are no fixed points, and note that for each distinct $x$ independently, $\Pr[H(x) = x] = 1/2^h$. Then

\begin{align} \Pr[\exists x. H(x) = x] &= \Pr\bigl[\lnot \forall x. \lnot(H(x) = x)\bigr] \\ &= 1 - \Pr[\forall x. H(x) \ne x] \\ &= 1 - \prod_x \Pr[H(x) \ne x] \\ &= 1 - \prod_x \bigl(1 - \Pr[H(x) = x]\bigr) \\ &= 1 - \prod_x (1 - 1/2^h) \\ &= 1 - (1 - 1/2^h)^{2^h} \\ &\approx 1 - e^{-1} \approx 63\%. \end{align}
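The derivation can be sanity-checked numerically for small $h$. A minimal sketch, assuming we model a uniform random function on $h$-bit strings as a lookup table of $2^h$ independent uniform outputs (the trial count and seed are arbitrary choices): it compares the exact formula $1 - (1 - 1/2^h)^{2^h}$ against a Monte Carlo estimate and the limit $1 - e^{-1}$.

```python
import math
import random


def p_fixed_point_exact(h: int) -> float:
    """1 - (1 - 2^-h)^(2^h): probability that a uniform random function
    on h-bit strings has at least one fixed point."""
    n = 2 ** h
    return 1 - (1 - 1 / n) ** n


def p_fixed_point_simulated(h: int, trials: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate: sample random functions as lookup tables
    and count how many have some x with H(x) == x."""
    rng = random.Random(seed)
    n = 2 ** h
    hits = 0
    for _ in range(trials):
        table = [rng.randrange(n) for _ in range(n)]
        if any(table[x] == x for x in range(n)):
            hits += 1
    return hits / trials


for h in (1, 2, 4, 8):
    print(h, round(p_fixed_point_exact(h), 4), round(p_fixed_point_simulated(h), 4))
print("limit 1 - 1/e =", round(1 - math.exp(-1), 4))
```

For $h = 1$ the exact value is $1 - (1/2)^2 = 0.75$; as $h$ grows the exact column approaches $1 - e^{-1}$ from above, and the simulated column should agree with it up to sampling noise.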

So there are better than 50-50 odds that a fixed point exists in a uniform random choice of function. We often use this to model functions like SHA3-256. Does this indicate a weakness? Well, there are two sides to this:

The traditional properties that we might think of when we hear ‘cryptographic hash function’ after reading Wikipedia without consulting any cryptography literature are preimage resistance, second-preimage resistance, and collision resistance. Knowing a fixed point of SHA3-256 doesn't mean you can find preimages, second preimages, or collisions. So in that sense, a fixed point does not indicate a weakness. Indeed, it is very easy to find fixed points in the internal compression function of any Davies–Meyer hash function like MD5, SHA-256, etc., although a fixed point of the compression function does not give a fixed point of the full hash function.
On the other hand, the probability of stumbling upon a fixed point by chance in a single trial is $1/2^h$. For SHA3-256, e.g., this is unimaginably improbable for any imaginable number of trials. So if you did find one, the practical inference would be that there is likely a weakness in SHA3-256 that enabled you to find it. The details would depend on how you found it and on the internals of SHA3-256, not on the general fact of a fixed point; and while such a discovery would cast serious doubt on the security of SHA-3, it might not have any practical consequences for real protocols.
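The remark about Davies–Meyer compression functions can be made concrete. Davies–Meyer computes $f(H, m) = E_m(H) \oplus H$, using the message block as the key of a block cipher $E$. Pick any $m$ and set $H = E_m^{-1}(0)$; then $f(H, m) = 0 \oplus H = H$. The sketch below assumes a hypothetical toy 32-bit Feistel cipher (`toy_encrypt`/`toy_decrypt`, with a round function built from SHA-256) standing in for the real cipher inside MD5 or SHA-256; it is insecure and for illustration only.

```python
import hashlib

MASK16 = 0xFFFF


def _round_fn(key: bytes, i: int, half: int) -> int:
    # Hypothetical round function; a Feistel network is invertible for any
    # deterministic round function, so this choice is arbitrary.
    data = key + bytes([i]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def toy_encrypt(key: bytes, block: int, rounds: int = 4) -> int:
    """Toy 32-bit Feistel cipher E_key(block). NOT secure; illustration only."""
    left, right = block >> 16, block & MASK16
    for i in range(rounds):
        left, right = right, left ^ _round_fn(key, i, right)
    return (left << 16) | right


def toy_decrypt(key: bytes, block: int, rounds: int = 4) -> int:
    """Inverse of toy_encrypt: undo the rounds in reverse order."""
    left, right = block >> 16, block & MASK16
    for i in reversed(range(rounds)):
        left, right = right ^ _round_fn(key, i, left), left
    return (left << 16) | right


def davies_meyer(chaining: int, message_block: bytes) -> int:
    # f(H, m) = E_m(H) XOR H: the message block is used as the cipher key.
    return toy_encrypt(message_block, chaining) ^ chaining


def find_fixed_point(message_block: bytes) -> int:
    # Choose H = D_m(0); then E_m(H) = 0, so f(H, m) = 0 XOR H = H.
    return toy_decrypt(message_block, 0)


m = b"any message block"
h = find_fixed_point(m)
assert davies_meyer(h, m) == h
print(f"fixed point for m={m!r}: H = {h:#010x}")
```

This only fixes the compression function for a chaining value *we* chose. In a real Merkle–Damgård hash the first chaining value is a fixed IV and the padding encodes the message length, so this trick does not yield a fixed point of the full hash.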

‘It has the convenient property that it is guaranteed not only to be collision-resistant, but to have absolutely no collisions whatsoever’: note that this property is easy to achieve if you don't compress anything. — Ella Rose, Mar 31 '19 at 14:16
Would the downvoters care to elaborate on what you disagreed with in this? — Squeamish Ossifrage, Mar 31 '19 at 18:50
Perhaps the disagreement is about the presumed implication that the identity function is a cryptographic hash? I'm not sure; I wasn't a downvoter. — forest, Apr 01 '19 at 22:37
‘But the identity function is usually not considered a hash function.’ — Squeamish Ossifrage, Apr 02 '19 at 01:06
I think the question is NOT precise. Is he asking "is there a fixed point?" or "can a hash be the identity?" And no, given the vague question, there is nothing wrong with your answer. — kodlu, Apr 02 '19 at 03:32

score 2 · Answer 2 · answered Mar 31 '19 at 07:57

No.

Take a look at the definition Wikipedia offers (shortened):

A cryptographic hash function is a special class of hash function that has certain properties which make it suitable for use in cryptography. It is a mathematical algorithm that maps data of arbitrary size to a bit string of a fixed size (a hash) and is designed to be a one-way function, that is, a function which is infeasible to invert. The only way to recreate the input data from an ideal cryptographic hash function's output is to attempt a brute-force search of possible inputs to see if they produce a match, or use a rainbow table of matched hashes. [...]

The ideal cryptographic hash function has five main properties:

it is deterministic so the same message always results in the same hash

it is quick to compute the hash value for any given message

it is infeasible to generate a message from its hash value except by trying all possible messages

a small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value

it is infeasible to find two different messages with the same hash value

Your proposed identity function obviously cannot satisfy them:

it does not produce a fixed-size bit string as a result
it is trivially reversible, as nothing has changed
a small change in the input leads to an equally small change in the output
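The first and third points are easy to observe side by side. A minimal sketch using Python's `hashlib.sha3_256` as the real hash and a hypothetical `identity_hash` for the proposed function: flipping one input bit changes exactly one output bit under the identity, while SHA3-256 should change roughly half of its 256 output bits; and the identity's output length tracks the input length, whereas SHA3-256 always returns 32 bytes.

```python
import hashlib


def identity_hash(x: bytes) -> bytes:
    # Hypothetical "hash" from the question: output equals input.
    return x


def sha3(x: bytes) -> bytes:
    return hashlib.sha3_256(x).digest()


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


m1 = b"The quick brown fox"
m2 = bytes([m1[0] ^ 0x01]) + m1[1:]  # flip a single bit of the first byte

for name, H in (("identity", identity_hash), ("sha3_256", sha3)):
    d1, d2 = H(m1), H(m2)
    print(f"{name:9} len={len(d1):3} bytes, bits changed = {bit_difference(d1, d2)}")

# Output length: the identity grows with the input, SHA3-256 is fixed.
long_msg = b"x" * 1000
print(len(identity_hash(long_msg)), len(sha3(long_msg)))  # 1000 32
```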
