3

I'm trying to figure out whether the following is impossible or merely improbable:

Does there exist an input, I and its hashed value, H, such that:

hash(I) --> H and hash(H) --> I

If not, what prevents it?
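The property can be written down as a direct check. A minimal sketch using SHA-256 (the helper names here are my own, for illustration only):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def is_two_cycle(i: bytes) -> bool:
    # The asked-for property: hash(I) --> H and hash(H) --> I.
    # Note that a 2-cycle under SHA-256 would require I itself to be
    # exactly 32 bytes long, since sha256(H) is always a 32-byte digest.
    h = sha256(i)
    return sha256(h) == i

print(is_two_cycle(b"some input"))  # False: a 10-byte input can never equal a 32-byte digest
```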

John Ruiz
  • See this answer on Is it possible to demonstrate that md5(x) != x for any x?. We expect fixed points to be very likely, but at the same time, really hard to find. – mikeazo Sep 24 '18 at 16:30
  • 2
    @mikeazo actually this question admits any cycles of length at most 2 as an answer, so it's actually more general than asking for a fixed point, even though the answer is probably about the same. – SEJPM Sep 24 '18 at 16:58
  • @SEJPM I see what you mean. I immediately thought of fixed points, but you are correct. – mikeazo Sep 24 '18 at 17:37
  • 1
    Nothing much is proven about the practical hash functions like SHA. (That's probably a good thing; any structure that would allow to prove things could also allow attacks.) – fkraiem Sep 25 '18 at 02:21
  • @SEJPM I don't see how that influences the calculations. And with these kind of numbers, I expect any difference in calculation to have a huge impact on the result. – Maarten Bodewes Sep 25 '18 at 13:00
  • @MaartenBodewes see korosensei's answer on how this affects the calculations ;) – SEJPM Sep 26 '18 at 21:09
  • by what reasoning do you call it a palindrome? – Fractalice Sep 27 '18 at 17:15
  • @Hyperflame - perhaps it's poor reasoning. A palindrome is a word that spells the same backwards and forwards. That's what I had in mind when I thought about an input that can be hashed forward and then hashed back to itself. I'm not invested in the name; feel free to call it whatever you like. – John Ruiz Sep 28 '18 at 04:14

2 Answers

4

TL;DR: The probability of a hash palindrome should be around $1-\frac{1}{e^2} \approx 0.8647.$

I'd like to generalize the question and answer it immediately afterwards.
First of all, let's make the setting more abstract. Say we have a finite set $M$ of hash values and denote its cardinality $|M|$ by $m$; typically $M=\{0,1\}^b$ for some output length $b$ in bits, so that $m=2^b$.

Then a more general question could be: for a random permutation $H: M \rightarrow M$, how probable is it that there exists some $x\in M$ such that $\underbrace{H(H(...x))}_{\text{n times}}=x$, meaning that after $n$ repetitions of $H$ on $x$ the hashed value is equal to the input? We'll have to assume that the hash function is a permutation (in particular surjective) for now, despite that technically not being accurate. Now that we know what we are looking for, let's focus on the case where the $n$-th time is the first time this occurs.

If you are wondering whether $n\geq m$ is possible, there's a section about this at the end of the post; otherwise, let's continue with $n < m$.
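To get a feel for iterated hashing, here is a toy sketch. I use a 16-bit truncation of SHA-256 as a stand-in (my own construction, for illustration); unlike the permutation assumed above, a truncated hash need not be a permutation, but by pigeonhole the iteration must still repeat some value within $2^{16}+1$ steps:

```python
import hashlib

def h16(x: int) -> int:
    # Toy "hash": SHA-256 truncated to 16 bits (illustration only)
    d = hashlib.sha256(x.to_bytes(2, "big")).digest()
    return int.from_bytes(d[:2], "big")

seen = {}
x, step = 0, 0
while x not in seen:
    seen[x] = step   # record the step at which each value first appeared
    x = h16(x)
    step += 1
# By pigeonhole, a repeat must occur within 2**16 + 1 steps.
print("first repeat after", step, "steps; cycle length:", step - seen[x])
```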

Case 1:
For simplicity let's look at $n = 1$ first. The question is then equivalent to asking what the probability is that $H$ has some fixed point, or, to speak in the language of permutations: $H$ does not produce a derangement of $M$, as there is one value (the fixed point) that is not deranged.

Let's first calculate the probability of a permutation being a derangement. In general there exist $m!$ permutations of $M$, and $!m$ of them are in fact derangements, with $!m$ meaning

$$!m=m! \sum_{i=0}^{m} \frac{(-1)^i}{i!}=m!\cdot\left(1-1+\frac12-\frac16+\frac1{24}-\ldots\right).$$

So we get $$P(\text{H produces a derangement})=\frac{!m}{m!}$$

Now, as our $m$ gets larger and larger, our $n$ may do so as well. So I will be assuming that we have large enough values, such as $m=2^{256}$, for the known approximation $\lim_{m\rightarrow \infty} \frac{!m}{m!}=\frac{1}{e}\approx 0.3679$ to apply, which results in

\begin{align} P(\text{H does not produce a derangement})&=\\ 1-P(\text{H produces a derangement})&=\\ 1-\frac{!m}{m!}&=1-\frac{1}{e} \approx 0.632 \end{align}

So far I have not said much more than Maarten's answer does; I have just framed it a little differently.
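The $1-\frac{1}{e}$ figure for Case 1 can be checked empirically on small random permutations (a sketch; the sizes and seed are arbitrary, and the random permutation stands in for the idealized hash):

```python
import random
from math import e

random.seed(1)
n, trials = 50, 20000
hits = 0
for _ in range(trials):
    p = random.sample(range(n), n)          # a uniformly random permutation
    if any(p[i] == i for i in range(n)):    # does it have a fixed point?
        hits += 1
print(hits / trials)  # close to 1 - 1/e ≈ 0.632
```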

At this point it's time to let our $n$ increase. I'll reason inductively. We already established that $n<m$ holds. For notation purposes, let's denote the $n$-th repetition of $H$ by $H^{n}$, meaning $H^{n}(x)=\underbrace{H(H(...x))}_{\text{n times}}$.

Observe that the first hash not producing a derangement leaves only around $37\%$ of a chance for the remaining cases to occur. Since those cases are disjoint, you can add their corresponding probabilities.

Case 2:
Now, if the first hash did produce a derangement, what is the probability that the second hash does not produce one as well?
As argued before, we get

\begin{align} P(H^{2}\text{ does not produce a derangement}) &= P(H \text{ does not produce a derangement}) \\ &\quad + P(H \text{ produces a derangement})\cdot P(H^{2} \text{ does not produce a derangement}) \\ &= \left(1-\frac{1}{e}\right) + \frac{1}{e}\cdot\left(1-\frac{1}{e}\right) = 1-\frac{1}{e^{2}}\approx 0.8647. \end{align}

I should mention that $H^2$ is just another permutation on $M$, so I treat its probability of producing a derangement as the same $\frac{1}{e}$ as for $H$, and as independent of the first event, which is an approximation.

That much for the start of my inductive reasoning. In the next steps $H^3, H^4, \dots, H^{m-1}$ the probabilities have to be added in a similar way. This yields the formula

$P(H^{m-1} \text{ does not produce a derangement})= \sum_{i=1}^{m-1} \left(\frac{1}{e^{i-1}} - \frac{1}{e^{i}}\right)$. This telescoping series evaluates to $1-e^{1-m}$.
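The telescoping can be verified numerically (a quick sketch; $m=20$ is an arbitrary small choice):

```python
from math import e

m = 20
# sum over i = 1 .. m-1 of (1/e^(i-1) - 1/e^i); successive terms cancel
s = sum(e ** -(i - 1) - e ** -i for i in range(1, m))
print(s, 1 - e ** (1 - m))  # the two values agree
```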

The result also mirrors intuition in that not producing any derangement becomes less and less likely as we continue to hash our values, so at some point it is nearly guaranteed that some hash value cycles back to the input; the case of $m$ iterations is covered separately, as cycling back is then guaranteed.


Now, as promised, the argument why $n<m$:
First of all, note that $n$ can't be equal to or greater than $m$. The reason: let $j$ be any element of $M$. If $\underbrace{H(H(...H(j)))}_{m-1 \ times}$ has not produced a cycle back to any previously hashed value yet, it has to do so in the next iteration, since all other $m-1$ elements of $M$ have already been reached exactly once. This means that any hash repeated $m$ times is guaranteed to cycle back to the input.

Korosensei
  • One may want to note that $1-e^{-2}$ seems to be an upper bound for the probability of a 2-cycle and $1-e^{-1}$ to be a lower bound, due to most hash functions actually not being permutations / surjective. Even though this answer can pretty much directly be applied to actual permutations like AES though :) Also, welcome to Crypto.SE korosensei and an impressive first answer at that, I hope you stay around. – SEJPM Sep 26 '18 at 21:45
  • This is a fantastic breakdown of the problem! And I also thank @Maarten Bodewes for taking the time to answer. One question I have is whether your TL;DR summary means that there's a 0.8647 probability of a hash palindrome when n = 1, or when n >= 1. – John Ruiz Sep 28 '18 at 04:27
  • I also want to mention that while I am marking this as the answer (thus giving @Korosensei credit for their work), I am not qualified to check the answer's correctness. Caveat Emptor – John Ruiz Sep 28 '18 at 04:29
3

Generally speaking, a cryptographic hash function's output should be close to random. This means that the algorithm that makes up the hash could have the stated property. Actually, the chance of it happening is identical to the chance of hash(H) having a predefined result, say R (randomly chosen before the experiment). Another form of this property is a hash having a fixed point, as mikeazo suggests in the comments.

The chance of hitting the result for a single input is therefore $2^{-hlen}$. Of course, given the formula, there are only $2^{hlen}$ possible inputs. As collisions $H(X) = H(Y)$ may be present as well, the chance that some input has the right property hash(I) = H and hash(H) = I is 1 minus the chance that it is absent for all $2^{hlen}$ input messages: $$1 - \bigg({{2^{hlen} - 1} \over {2^{hlen}}}\bigg)^{2^{hlen}}$$ which comes down to around $1 - 0.3679 = 0.6321$ - that's more than 50% at least.
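The expression can be evaluated for realistic digest sizes (a sketch; `log1p` is used because for large hlen the quotient $\frac{2^{hlen}-1}{2^{hlen}}$ rounds to exactly 1.0 in floating point, which would make the naive computation return 0):

```python
from math import e, exp, log1p

for hlen in (8, 16, 64, 256):
    N = 2 ** hlen
    # 1 - ((N-1)/N)^N, computed stably as 1 - exp(N * log(1 - 1/N))
    p = 1 - exp(N * log1p(-1.0 / N))
    print(hlen, p)  # approaches 1 - 1/e ≈ 0.6321 as hlen grows
```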

Just as it is computationally infeasible to find $M$ such that $H(M) = Y$, though, it is infeasible to demonstrate that an unbroken cryptographically secure hash has this property: doing so is as hard as finding an ordinary preimage for the hash function.

Maarten Bodewes