I have a problem that I'll simplify here. Let's say I want to store a table of {person, fav color}. Only, I don't want anyone to ever be able to look up a person's favorite color; the only thing someone should be able to do is group people whose favorite colors are exactly the same (not merely similar). The obvious solution would be to store not the favorite color but {person, hash(fav color)}, using a good cryptographic hash(). Only, it turns out that there is a very small list of color names, so anyone can easily enumerate all the colors and compute their hashes, thereby reversing the hash and extracting a person's favorite color. I could use key stretching, but that makes computing hash() slow, and it needs to be a fast operation.
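To make the enumeration attack concrete, here is a minimal Python sketch against the unkeyed scheme. The color list and the function names are hypothetical and only for illustration. The point is that a small input space lets anyone rebuild the full hash-to-color table with one hash per candidate.

```python
import hashlib
from typing import Optional

# Hypothetical small, public candidate space (a real application would use
# whatever color names it accepts).
COLORS = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]


def unkeyed_tag(color: str) -> str:
    """The naive scheme: store SHA-256(color) instead of the color itself."""
    return hashlib.sha256(color.encode("utf-8")).hexdigest()


def reverse_tag(tag: str) -> Optional[str]:
    """Dictionary attack: hash every candidate color and look the tag up."""
    lookup = {unkeyed_tag(c): c for c in COLORS}
    return lookup.get(tag)


# What an attacker sees in the table, and what they recover from it.
stored = {"alice": unkeyed_tag("blue"), "bob": unkeyed_tag("green")}
for person, tag in stored.items():
    print(person, reverse_tag(tag))
```

Swapping SHA-256 for a slower stretched hash only multiplies the cost of each guess; with a handful of candidates the attack stays cheap, which is why stretching alone does not help here.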
Basically, I have input x chosen from a small, easily enumerated space. I want a fast-to-compute function H that guarantees x == y <=> H(x) == H(y) (with high probability), but where knowing H(x) does not allow one to easily compute x.
I'm creating a secret key and using H(x) = HMAC-SHA256(key, x), with the secret as the HMAC key and x as the message.
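A minimal sketch of this keyed-tag approach in Python, using only the standard library (`hmac`, `hashlib`, `secrets`). The 32-byte random key, the names `KEY`, `color_tag`, and `group_by_tag`, and the in-memory table are all assumptions for illustration; how the key is stored and protected is out of scope.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from collections import defaultdict

# Hypothetical secret key, generated once and kept away from the table.
KEY = secrets.token_bytes(32)


def color_tag(color: str, key: bytes = KEY) -> str:
    """H(x) = HMAC-SHA256(key, x).

    Equal colors give equal tags, but without the key nobody can compute
    candidate tags to compare against.
    """
    return hmac.new(key, color.encode("utf-8"), hashlib.sha256).hexdigest()


def group_by_tag(rows: dict[str, str]) -> dict[str, list[str]]:
    """Group people whose stored tags are identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for person, tag in rows.items():
        groups[tag].append(person)
    return dict(groups)


# The table stores only {person, tag}; the color itself is never kept.
table = {
    "alice": color_tag("blue"),
    "bob": color_tag("green"),
    "carol": color_tag("blue"),
}
print(sorted(sorted(g) for g in group_by_tag(table).values()))
```

Tag equality is a plain string comparison, so a database can index or group on the tag column directly. Two caveats follow from the construction: inputs must be canonicalized the same way before tagging (e.g. "Blue" and "blue" give different tags), and anyone who holds `KEY` can still run the dictionary attack by tagging every candidate color.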
Does this work? Is there a better, more accepted way of doing this?
In any case, I'd say that your proposed solution is valid, albeit with the problem that anyone who knows the key can still do the enumeration. So, who is expected to perform the comparison? Only the database administrator (or someone in a similar role)? Anyone? Some restricted set of users?
You could probably use some zero-knowledge-based construction, which would likely give you stronger privacy, but at the cost of much lower efficiency.
– Ginswich Aug 24 '18 at 14:47