
If I understand correctly, hashes run a defined set of operations iteratively until all of the original data has been hashed.

Can you take a hash value, run it backward through the last step (or once, for an appropriately sized starting block), and get a pool of possible starting states?

If you run that pool of states through the next iteration of the hash algorithm in reverse, getting an even larger pool of states, will one of them be the state the original forward computation was actually in at that point in the algorithm?

If so, how many steps becomes prohibitive (too expensive) to continue reconstructing a hash original data input? Is it easy to calculate the size of the pool after each reverse step?

Forgive me if I misunderstand something. I've only watched a couple of YouTube explainers on hashing (by hand), and not everything was spelled out.


Yes, you could in principle do something like this, but

how many steps becomes prohibitive (too expensive) to continue reconstructing a hash original data input?

will be quite a small number. I'll discuss the details for a particularly simple and common type of hash, with the caveat that there are many other types.


A compression function is a function

$$h : \{0,1\}^n\times \{0,1\}^n\to\{0,1\}^n$$

i.e. a "hash" that maps a $2n$-bit input to an $n$-bit output. A common design paradigm for hash functions is to

  1. specify a compression function, and
  2. use a generic technique to extend the compression function from $2n$-bit inputs to arbitrary-length inputs.
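To make the signature of $h$ concrete, here is a toy compression function with a tiny $n$. It is only a stand-in under stated assumptions: the choice $n = 8$ and the trick of hashing both halves with SHA-256 and keeping the low $n$ bits are hypothetical illustrations, not how any real compression function is designed.

```python
import hashlib

N_BITS = 8                       # toy n (hypothetical; real designs use n >= 256)
MASK = (1 << N_BITS) - 1         # keeps the low n bits
NBYTES = (N_BITS + 7) // 8       # bytes needed to encode one n-bit value


def h(x: int, y: int) -> int:
    """Toy compression function {0,1}^n x {0,1}^n -> {0,1}^n.

    Stand-in only: encodes the two n-bit inputs as bytes, hashes them with
    SHA-256, and keeps the low n bits of the digest.
    """
    data = x.to_bytes(NBYTES, "big") + y.to_bytes(NBYTES, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") & MASK
```

Any function with this input/output shape would do for the discussion; SHA-256 is used here only so the outputs look "random".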

An example of this is the Merkle–Damgård (MD) transform, which proceeds as follows:

  1. Start with a message $m$ of some length

  2. Apply a padding scheme $\mathsf{pad}(m)$, which (among other things) makes $\mathsf{pad}(m)$ an element of $(\{0,1\}^n)^k$ for some $k\in\mathbb{N}$, i.e. $\mathsf{pad}(m)$ can be written as $k$ $n$-bit "blocks"

  3. Chain the computation of the hash. For (fixed) initialization values $\mathsf{iv}_0, \mathsf{iv}_1$, define $X_{-1} := h(\mathsf{iv}_0, \mathsf{iv}_1)$, and then $X_i := h(X_{i-1}, \mathsf{pad}(m)_i)$ for $i = 0, \dots, k-1$, where $\mathsf{pad}(m)_i$ is the $i$th block of $\mathsf{pad}(m)$ (counting from $0$).

  4. Return $X_{k-1}$
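The four steps above can be sketched with the same toy $h$ ($n = 8$). The padding used here (append the byte `0x80`, then one block holding the message length mod $2^n$) and the values `IV0`, `IV1` are hypothetical choices for illustration; real MD-style hashes such as SHA-256 specify their own padding and initial values.

```python
import hashlib

N_BITS = 8                       # toy n; with n = 8 each block is one byte
MASK = (1 << N_BITS) - 1
NBYTES = (N_BITS + 7) // 8

IV0, IV1 = 0x5A, 0xC3            # hypothetical fixed initialization values


def h(x, y):
    """Toy compression function (SHA-256 truncated to n bits, stand-in only)."""
    data = x.to_bytes(NBYTES, "big") + y.to_bytes(NBYTES, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") & MASK


def pad(m):
    """Hypothetical toy padding: message bytes, then 0x80, then len(m) mod 2^n.

    Returns pad(m) as a list of k n-bit blocks (here: one byte per block).
    """
    return list(m) + [0x80, len(m) & MASK]


def md_hash(m):
    x = h(IV0, IV1)              # X_{-1} := h(iv_0, iv_1)
    for block in pad(m):         # X_i := h(X_{i-1}, pad(m)_i), i = 0..k-1
        x = h(x, block)
    return x                     # return X_{k-1}
```

For example, `pad(b"abc")` is `[97, 98, 99, 128, 3]`, so `md_hash(b"abc")` makes one call for $X_{-1}$ followed by five chained calls.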

This takes roughly the form you describe, so we can attempt to "work backwards" as you suggest. Say we have a hash value $X_?$. It comes from some compression function call, so how many pairs $(X_{?-1}, \mathsf{pad}(m)_{?})$ do we expect there to be that map to $X_?$?

Under appropriate assumptions on $h$ (that it is "regular", in the sense that $|h^{-1}(x)|$ is the same for every $x$), the $2^{2n}$ possible inputs are split evenly among the $2^n$ possible outputs, so we expect there to be $2^{2n}/2^n = 2^n$ inputs consistent with the hash. For reasonably designed hashes (e.g. SHA-256 has $n = 256$), this is already far too large to mount your type of attack. One might be able to cut this number down somewhat (for example, of the $\approx 2^{n}$ pairs $(Y_0, Y_1)$ with $h(Y_0, Y_1) = X_?$, we know $Y_1$ must be a validly padded final block $\mathsf{pad}(m)_?$, which is an additional constraint), but the set of candidate inputs will still blow up tremendously with each step backwards.
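The pool-size question can be checked on the toy $h$ with $n = 8$ (again a hypothetical stand-in). If each output has about $2^n$ preimages, then going back $t$ compression calls gives roughly $(2^n)^t = 2^{nt}$ candidate states (a chaining value plus $t$ blocks). The sketch below tabulates every preimage of $h$ by brute force, which costs $2^{2n}$ evaluations and is only possible because $n$ is tiny, and then grows the reverse pool step by step. It also confirms that the true earlier state is always somewhere in the pool; the problem is only how large the pool is.

```python
import hashlib

N_BITS = 8                       # toy n; the brute-force table below needs 2^(2n) calls
MASK = (1 << N_BITS) - 1
NBYTES = (N_BITS + 7) // 8


def h(x, y):
    """Toy compression function (SHA-256 truncated to n bits, stand-in only)."""
    data = x.to_bytes(NBYTES, "big") + y.to_bytes(NBYTES, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") & MASK


# Tabulate h^{-1}: output -> every (chaining value, block) pair mapping to it.
preimages = {}
for x in range(1 << N_BITS):
    for y in range(1 << N_BITS):
        preimages.setdefault(h(x, y), []).append((x, y))


def reverse_pool(target, steps):
    """All (earlier chaining value, [blocks...]) states that reach `target`
    after `steps` forward compression calls."""
    pool = [(target, [])]
    for _ in range(steps):
        new_pool = []
        for value, blocks in pool:
            # Every pair (prev, block) with h(prev, block) == value is a candidate.
            for prev, block in preimages.get(value, []):
                new_pool.append((prev, [block] + blocks))
        pool = new_pool
    return pool


if __name__ == "__main__":
    target = h(h(17, 42), 99)    # a "hash" produced by two forward calls
    for steps in (1, 2):
        print(steps, len(reverse_pool(target, steps)))
```

One step back should give a pool of roughly $2^8 = 256$ states and two steps roughly $2^{16}$; a third step (about $2^{24}$ entries) is already slow in pure Python. With $n = 256$, even the first step is about $2^{256}$.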

Mark Schultz-Wu
  • I will need to go to school to understand most of that. The important part is that it is prohibitive after one iteration. Thank you. – Zekchelovek Nov 29 '23 at 01:19
  • Further question: for a set hash output value, say 256 bits, if we were to hash all sequential numbers up to a number equal to the max value hash output plus 1, how many collisions should we expect overall? Is it calculable? – Zekchelovek Nov 29 '23 at 01:23
  • @Zekchelovek it's best practice to ask further questions as new questions rather than discussing them in the comments of an old question. – Mark Schultz-Wu Nov 29 '23 at 01:26
  • Ok. Didn't know that. Thanks – Zekchelovek Nov 29 '23 at 01:27
  • A moderator told me posting too many questions is not good. I was told to clarify in comments. I assumed that meant follow-up questions. Trying to avoid orphan posts. I did post a new question per your suggestion. – Zekchelovek Nov 29 '23 at 01:33
  • @Zekchelovek I didn't notice you had posted ~10 questions today. Perhaps you should slow down this number some --- the forum is small, and if one user posts that many times it will become the entirety of the activity feed. I would try to limit myself to maybe ~1 or 2 questions per week if I were you, and use the chat rooms for "smaller" questions. – Mark Schultz-Wu Nov 29 '23 at 01:51
  • Here is a link to the crypto stackexchange chat rooms. – Mark Schultz-Wu Nov 29 '23 at 01:52
  • I will. I likely won't post again for weeks or months. I apologize. I'll bookmark the chatroom for future multiple question days. – Zekchelovek Nov 29 '23 at 01:59