How can a hash have a fixed length?

Question

I am very new to cryptography, so I apologize if my question is too basic.

I do not understand how a hash can have a fixed length, because a hash function can take infinitely many different inputs. With a fixed length there are only a limited number of possible hash outputs. So can different messages have the exact same hash output?

It seems like you've asked and then abandoned this question. If one of the answers suffices then please accept one of them; otherwise please followup with a comment. — Maarten Bodewes, May 23 '19 at 11:55

score 8 · Answer 1 · answered May 12 '19 at 19:36

So a hash can have different messages with the exact same output?

Yes; that's true of any function with inputs longer than the output.

Collision resistance of a hash function doesn't mean that there aren't two messages $X$ and $Y$ that hash to the same value; instead, it means that it's hard to find them.
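As a small illustration of why collisions must exist, the sketch below hashes every two-byte message down to a single byte. The 8-bit `tiny_hash` (the first byte of SHA-256) is a hypothetical toy chosen only so the whole input space can be enumerated; it is not something anyone would use in practice.

```python
import hashlib
from collections import Counter

def tiny_hash(message: bytes) -> int:
    """Toy 8-bit hash (assumption for illustration): first byte of SHA-256."""
    return hashlib.sha256(message).digest()[0]

# 65,536 possible two-byte messages, but only 256 possible outputs.
counts = Counter(tiny_hash(i.to_bytes(2, "big")) for i in range(2**16))

# Pigeonhole: some output must be shared by at least 65536 / 256 = 256 messages.
most_common_value, hits = counts.most_common(1)[0]
print(len(counts), "distinct outputs;", hits, "messages share output", most_common_value)
```

With only 8 output bits, collisions are both unavoidable and trivial to find; a real hash function keeps the first property and removes the second by making the output long enough.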

poncho


I like your very short take on this answer. I'd reword slightly the part about inputs/outputs, maybe along the line of "number of possible inputs is bigger than number of possible outputs" (surjective function? not sure). – domen May 13 '19 at 09:20

@domen there's already a name for this concept: the pigeonhole principle. – Natanael May 13 '19 at 10:34
@Natanael, I know. I agree, it would make it clearer. – domen May 13 '19 at 10:54

score 2 · Answer 2 · answered Jun 04 '19 at 01:57

To understand the mechanism, consider one of the simplest and crudest hashing methods, one that everyone knows: the arithmetic modulo operation.

Input numbers of different lengths produce results of the same length under $\bmod$.

For example:

If we want only $8$-bit output values, we choose $2^{8} = 256$ as the modulus.
We can then pick $4$ input numbers of different lengths (8, 16, 24, and 32 bits) that all produce the same output value, that is, $4$ colliding inputs:
- $(2^{32}-1) \equiv 255\ (\textrm{mod}\ 256) $
- $(2^{24}-1) \equiv 255\ (\textrm{mod}\ 256) $
- $(2^{16}-1) \equiv 255\ (\textrm{mod}\ 256) $
- $(2^{8}-1) \equiv 255\ (\textrm{mod}\ 256) $

For a cryptographic hash such as SHA-256 or SHA-512, by contrast, it is very hard to find two inputs that produce the same output.
See also the pigeonhole principle.
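The modulo example above can be checked in a few lines of Python. The helper name `mod_hash` is made up for this sketch.

```python
def mod_hash(x: int, bits: int = 8) -> int:
    """Reduce x modulo 2**bits, so the result always fits in `bits` bits."""
    return x % (2 ** bits)

# Inputs of 8, 16, 24 and 32 bits, as in the list above.
inputs = [2**8 - 1, 2**16 - 1, 2**24 - 1, 2**32 - 1]

for x in inputs:
    print(f"{x.bit_length():2d}-bit input {x:>10d} -> {mod_hash(x)}")
```

Every input maps to 255, which also shows why plain modular reduction is useless as a cryptographic hash: colliding inputs can be written down by hand.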

score 1 · Accepted Answer · edited Jun 17 '20 at 08:17

When we talk about a cryptographic hash function, we want it to process arbitrary-length inputs and produce fixed-length outputs.

\begin{align} H:&\{0,1\}^*\to \{0,1\}^\ell\\ m&\mapsto H(m) \end{align} where $\ell$ is the output size of the hash function in bits.
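For a concrete instance of this mapping, the following sketch uses Python's standard `hashlib` with SHA-256, for which $\ell = 256$. The helper `digest_bits` is a name introduced here for illustration.

```python
import hashlib

def digest_bits(message: bytes) -> int:
    """Return the SHA-256 output length in bits for the given message."""
    return len(hashlib.sha256(message).digest()) * 8

# Inputs of very different sizes all map to 256-bit outputs.
for message in (b"", b"abc", b"x" * 1_000_000):
    print(len(message), "bytes in ->", digest_bits(message), "bits out")
```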

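Hash functions based on the Merkle–Damgård construction^† use a compression function $f$ to achieve the fixed-size output:

The message $M$ is padded and divided into fixed-length blocks $M_1,\ldots,M_n$ (the block length is a parameter of $f$ and need not equal $\ell$; SHA-256, for example, has $\ell = 256$ but uses 512-bit blocks);
$H_0$ is set to a fixed initial value;
For $i$ from $1$ to $n$, compute $$H_i = f(H_{i-1},M_i)$$
Output $H(M) = H_n$.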
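The iteration above can be sketched as a toy Merkle–Damgård hash. Everything concrete here is an assumption made for illustration: the 8-byte block length, the 4-byte state, the all-zero initial value, and the compression function (a truncated SHA-256 call). No standardized hash function is defined this way.

```python
import hashlib

BLOCK_BYTES = 8              # toy block length (assumption)
STATE_BYTES = 4              # toy output length, ell = 32 bits (assumption)
IV = b"\x00" * STATE_BYTES   # arbitrary initial value H_0 (assumption)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function f: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes up to a block boundary, then the 8-byte bit length."""
    length = (8 * len(message)).to_bytes(8, "big")
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK_BYTES)
    return padded + length

def toy_md_hash(message: bytes) -> bytes:
    """Compute H_i = f(H_{i-1}, M_i) over all blocks and return H_n."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_BYTES):
        state = compress(state, padded[i:i + BLOCK_BYTES])
    return state

for message in (b"", b"abc", b"a" * 1000):
    print(len(message), "bytes ->", toy_md_hash(message).hex())
```

However long the message is, the state passed from block to block never grows, so the final value always has `STATE_BYTES` bytes.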

An output size that varies with the input size is hard to build and impractical for protocols. Building a hash function with a larger output and then truncating it is much safer, as in SHA-224 (note: its parameters differ from those of SHA-256).

By a simple combinatorial argument, since the input space is much larger than the output space, the pigeonhole principle implies that more than one input value maps to the same hash value. Indeed, because the input can be arbitrarily long, numerous inputs hash to the same value. As pointed out in poncho's answer, what we want from hash functions is collision resistance:

If it is hard to find two inputs $a$ and $b$ with $a \neq b$ such that $H(a) = H(b)$, then we have collision resistance.

Collision resistance is considered an easier security goal to achieve (Joux, 2004). There is a generic attack based on the birthday paradox: after about $2^{\ell/2}$ hash computations, we expect to find a collision with probability about 50%. To resist generic birthday attacks, one has to use a hash function whose output size in bits is twice the targeted security level. As an example, SHA-1 has a 160-bit output and therefore offers only 80-bit security against the generic birthday attack; it is no longer recommended by NIST:

SHA-1: Federal agencies should stop using SHA-1 for generating digital signatures, generating time stamps and for other applications that require collision resistance. Federal agencies may use SHA-1 for the following applications: verifying old digital signatures and time stamps, generating and verifying hash-based message authentication codes (HMACs), key derivation functions (KDFs), and random bit/number generation. Further guidance on the use of SHA-1 is provided in SP 800-131A.

August 5, 2015
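The generic birthday attack mentioned above can be observed on a deliberately weakened hash. The sketch below truncates SHA-256 to 24 bits, so a collision is expected after roughly $2^{12}$ attempts, which is only a few thousand hashes. The truncation, the helper names, and the choice of decimal counters as messages are all assumptions made for illustration.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256 (a deliberately weak toy hash)."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int):
    """Hash distinct messages until two share a truncated hash value."""
    seen = {}
    for i in count():
        message = str(i).encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message

a, b, tries = find_collision(24)
print(f"{a!r} and {b!r} collide after {tries} hashes")
```

Each additional output bit multiplies the expected work by about $\sqrt{2}$, which is why the same search is out of reach for a full-length modern hash.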

For other properties of hash functions, see How do hashes really ensure uniqueness?.

^† SHA-3 uses a sponge function.
