
Please help! How can I calculate the probability of a collision? I need a mathematical equation for my studies.

Assume I am using SHA256 to hash 100 bits. Thus:

SHA256 {100} = 256 bits (hash value). I would like to know the probability of a collision.

This website: https://preshing.com/20110504/hash-collision-probabilities/

gives the equation $k^2 / (2N)$.

But I could not figure out what $k$ and $N$ are. Is $k$ the output (256 bits) and $N$ the input (100 bits)?

Please help, and thanks in advance.

kelalaka
Al-Ani
  • Umm... did you read the entire web page you linked to? It defines $k$ and $N$ very close to the top. – Ilmari Karonen Dec 08 '18 at 15:26
  • Well, then based on the equation: $100^2 / 2^{256}$, right? – Al-Ani Dec 08 '18 at 15:34
  • @Al-Ani: No. For one, the question is about "using SHA256 to hash 100-bits", which would be a single hash, and we need at least two to get a collision. If we change that to hashing 100 random large bitstrings, then $100^2/2^{256}$ is still off by a factor of two from the approximation given, which itself is about 1% off. – fgrieu Nov 23 '22 at 11:33

1 Answer


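$n$ is the output size of the given hash function in bits, and $k$ is the number of different, randomly generated inputs you try in order to find a collision. When we say the output size is $n$, it means that the output space has $N = 2^n$ elements; this $N$ is the number of possible hash values in the formula from the website.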

What you see on that website is the general case of collision probability. We normally quote the number of inputs needed for a roughly 50% collision probability (the birthday attack), which is on the order of

$$ k \approx \sqrt{2^n} = 2^{n/2} $$

You can also see the general result from the birthday paradox.
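
As a sketch of where these figures come from, assume the hash behaves like a uniformly random function into $N = 2^n$ values and that the $k$ inputs are distinct. The probability of at least one collision is then

$$ p(k) = 1 - \prod_{i=1}^{k-1}\left(1 - \frac{i}{N}\right) \approx 1 - e^{-k(k-1)/(2N)} \approx \frac{k(k-1)}{2N} \approx \frac{k^2}{2N}, $$

where the last two steps hold only while $k \ll \sqrt{N}$. Setting $p(k) = 1/2$ in the exponential form gives

$$ k \approx \sqrt{2 \ln 2 \cdot N} \approx 1.18 \sqrt{N} = 1.18 \cdot 2^{n/2}, $$

while $k = \sqrt{N}$ exactly gives $p \approx 1 - e^{-1/2} \approx 0.39$.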

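To reach a collision probability of 50% for a hash function with output size $n = 256$, you need about $k \approx 1.18 \cdot 2^{128} \approx 4.0 \times 10^{38}$ different, randomly generated inputs. Note that $2^{128}$ itself is about $3.4 \times 10^{38}$, which gives a collision probability of roughly 39%.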
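
These numbers can be checked numerically. The following is a minimal Python sketch, assuming an ideal hash whose outputs are uniformly random over $N$ values; the function names `collision_probability` and `inputs_for_probability` are made up for this illustration and do not come from the linked page.

```python
import math


def collision_probability(k: int, N: int) -> float:
    """Approximate P(at least one collision) among k random values out of N.

    Uses 1 - exp(-k(k-1)/(2N)), which assumes uniformly random outputs.
    """
    # Integer arithmetic first, so huge values such as N = 2**256 stay exact
    # until the final division.
    exponent = -(k * (k - 1)) / (2 * N)
    # -expm1(x) equals 1 - exp(x) but keeps precision when x is tiny.
    return -math.expm1(exponent)


def inputs_for_probability(p: float, N: int) -> float:
    """Approximate number of random inputs needed to reach collision probability p."""
    return math.sqrt(2 * N * math.log(1 / (1 - p)))


if __name__ == "__main__":
    N = 2 ** 256  # number of possible SHA-256 outputs
    print(collision_probability(100, N))       # 100 random inputs
    print(collision_probability(2 ** 128, N))  # k = sqrt(N)
    print(inputs_for_probability(0.5, N))      # inputs for a 50% chance
```

With $N = 2^{256}$ this gives about $4.3 \times 10^{-74}$ for $k = 100$, about 0.39 for $k = 2^{128}$, and about $4.0 \times 10^{38}$ inputs for a 50% chance.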


Dear readers, I have only answered the question as asked; however, it seems to get a lot of attention. We have a 101 answer on hash collisions; please refer to it for more detail about probability calculations.

kelalaka
  • Is it possible for an $n$-bit cryptographically secure hashing function to have collisions for less than $2^n$ many inputs? Or would it contradict the definition of "cryptographically secure hash"? – caveman Oct 15 '22 at 06:07
  • @caveman First of all, collisions are inevitable by the pigeonhole principle: once you hash $2^n+1$ distinct inputs, you are guaranteed at least one collision. That is 100%. When we talk about collisions with fewer inputs, we need probability, and this probability is given by the birthday attack calculations. For secure hash functions we expect collision resistance close to the birthday bound. If someone finds a method easier than the birthday attack, then the hash function is not collision resistant, as with MD5 and SHA-1. – kelalaka Oct 15 '22 at 10:49
  • @caveman Instead of writing more, I thought it better to link to our 101. The link is in the answer. – kelalaka Nov 23 '22 at 11:05
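
The birthday bound discussed in the comments can be observed on a toy scale. The sketch below is an illustration only: it truncates SHA-256 to `n_bits` bits (a deliberately weak stand-in, not a real hash design) and hashes the counters 0, 1, 2, ... rather than truly random inputs. The helper names `truncated_sha256` and `first_collision` are hypothetical.

```python
import hashlib
import itertools


def truncated_sha256(data: bytes, n_bits: int) -> int:
    """Return the top n_bits of SHA-256(data) as an integer (a toy n-bit hash)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)


def first_collision(n_bits: int):
    """Hash the counters 0, 1, 2, ... until two share a truncated hash.

    Returns (earlier_input, later_input, number_of_inputs_hashed).
    The pigeonhole principle guarantees a result within 2**n_bits + 1 inputs.
    """
    seen = {}  # truncated hash -> input that produced it
    for i in itertools.count():
        h = truncated_sha256(i.to_bytes(8, "big"), n_bits)
        if h in seen:
            return seen[h], i, i + 1
        seen[h] = i


if __name__ == "__main__":
    a, b, tries = first_collision(16)
    print(f"inputs {a} and {b} collide after {tries} hashes")
```

For a 16-bit truncation the first collision typically shows up after a few hundred hashes, on the order of $2^{8}$, far below the $2^{16}+1$ inputs at which a collision becomes certain.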