Forgive me if this is a noob question - my CS education is somewhat incomplete.
Basically, I need a way to hash an input, so that someone seeing the output doesn't see the original input value. Security is not an issue, and I need to be able to make this unique across about 1 million inputs.
Originally I was using hashCode() just fine, but when I got to around 1 million inputs, I started seeing a very high collision rate.
Can someone help me conceptually understand where I should look to approach this problem? Should I be trying to write my own function to do this? I thought I could just find a better hash function, like hashcode(), but I only find stuff about writing my own.
hashCode() is not guaranteed to be unique. The built-in function may rely on memory addresses, but on 64-bit systems it has to lose data to fit into 32 bits, which can result in collisions. For custom hashCode() implementations (e.g. java.lang.String), you will have collisions. As ratchetfreak mentioned, 32 bits is not enough output for your purpose. Delnan has the right idea: use a better-quality hashing algorithm with many more bits in the output. Related reading: What is a good 64bit hash function in Java for textual strings? – Oct 27 '14 at 21:36
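To make the "many more bits" suggestion concrete, here is a minimal sketch using SHA-256 from the JDK's java.security.MessageDigest. SHA-256 is my choice for illustration, not something named in the thread, and the class name WideHash and method name sha256Hex are hypothetical. Any well-distributed hash with a much wider output would serve the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the name is an assumption made for this example.
class WideHash {

    // Returns the 256-bit SHA-256 digest of the input as 64 lowercase hex characters.
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling WideHash.sha256Hex(value) on each input gives a fixed-length string that hides the original value. With 256 bits of output, an accidental collision among about 1 million inputs is not something you would expect to see in practice, whereas a 32-bit hashCode() will usually produce some at that scale.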