-3

I'm searching for an existing hash function, or trying to make one, that has a lot of collisions.

Normally a hash function is valued for its ability to produce unique hashes, for hash tables or security purposes, but I want the opposite: a hash function that has a lot of collisions, preferably a simple one, so that finding a collision is easier and faster.

By the definition of a hash function, the hashes also have a predetermined length. I'd like a hash function that takes this length as a parameter.

I'm new to this subject, so I'm looking for sources that would help me create such a hash function myself, or for candidates that might fulfill the requirements.

If I have any definitions or terms wrong, please correct me.

2 Answers

1

How many collisions you get depends on the ratio between how many items you have and how many distinct values the hash can output.

If you have 1024 items (10 bits) and the hash outputs 8 bits, you would expect 4 items (2^(10-8)) to have the same hash, and thus 4 collisions.
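To make the arithmetic concrete, here is a minimal Python sketch. It assumes a toy hash, the hypothetical `low_bits_hash`, that simply keeps the lowest bits of an integer; a real hash would only spread items this evenly on average.

```python
from collections import Counter


def low_bits_hash(value: int, output_bits: int) -> int:
    """Hypothetical toy hash: keep only the lowest `output_bits` bits."""
    return value & ((1 << output_bits) - 1)


# 1024 distinct items (10 bits of input) hashed down to 8 bits of output.
items = range(1024)
buckets = Counter(low_bits_hash(item, 8) for item in items)

distinct_hashes = len(buckets)          # number of hash values actually used
items_per_hash = set(buckets.values())  # how many items landed on each value

print(distinct_hashes, items_per_hash)
```

With this toy hash every one of the 2^8 output values receives exactly 2^(10-8) items, so shrinking `output_bits` by one doubles the number of items sharing each value.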

There are plenty of hash functions with a small number of output bits, such as the trivial Pearson hash. Because it is simple and easy to understand, it should be easy to modify so that it produces a variable number of output bits.
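As a rough sketch of that idea in Python: the loop below is the usual Pearson construction (a 256-entry permutation table indexed by the running hash XOR the next byte). Two things are assumptions of this sketch rather than part of any standard: the table is a seeded shuffle, not the one from Pearson's paper, and the variable output size is obtained by simply masking the 8-bit result down to `output_bits`.

```python
import random

# Assumption: any permutation of 0..255 can serve as the lookup table.
# A fixed seed keeps this one reproducible between runs.
_rng = random.Random(0)
TABLE = list(range(256))
_rng.shuffle(TABLE)


def pearson_hash(message: bytes, output_bits: int = 8) -> int:
    """8-bit Pearson hash, truncated to `output_bits` bits (1 to 8)."""
    if not 1 <= output_bits <= 8:
        raise ValueError("output_bits must be between 1 and 8")
    h = 0
    for byte in message:
        h = TABLE[h ^ byte]  # one table lookup per input byte
    # Assumed truncation scheme: keep only the lowest `output_bits` bits.
    return h & ((1 << output_bits) - 1)


# Five messages but only four possible 2-bit values:
# at least two of them must collide.
digests = [pearson_hash(word.encode(), output_bits=2)
           for word in ("a", "b", "c", "d", "e")]
print(digests)
```

The fewer bits you keep, the fewer messages you need to try before two of them share a hash, which is what makes collisions cheap to find.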

Euphoric
  • 37,384
  • To clarify, it will be (on average) 4 items for each of the 256 possible hash values, not just 4 collisions total. – 8bittree Jul 13 '20 at 16:33
1

Here is an example of a hash function that has the properties you desire:

H(message, outputLength) = 1 << outputLength

  • It is clearly a hash function, since it maps a larger input space to a smaller output space.
  • It has the maximum possible performance for every possible sequence of inputs.
  • It has the maximum possible number of collisions for every possible sequence of inputs.
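Written out as runnable Python, a direct transcription of the formula above might look like this (the parameter types are an assumption; the formula itself does not specify them):

```python
def H(message: bytes, output_length: int) -> int:
    """Constant 'hash': ignores the message entirely."""
    return 1 << output_length


# Any two messages collide for the same output length.
first = H(b"some message", 4)
second = H(b"a completely different message", 4)
print(first, second)
```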
Jörg W Mittag
  • 103,514