I have a database table that needs to be shared between multiple parties. There is a sensitive column that must be obfuscated in a way that preserves equality relationships between records. A trusted party must have the ability to reverse the obfuscation. Is there a composition of cryptographic primitives that satisfies these requirements?
By “equality relationship” I mean that for any two records A and B, the obfuscation function F should satisfy F(A) = F(B) if A = B. A cryptographic hash function, such as SHA-2, preserves equality but is not reversible. We need some form of encryption.
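To make the equality requirement concrete, here is a minimal sketch using SHA-256 from Python's standard library as a stand-in for F. The function name `obfuscate` and the sample record values are hypothetical, chosen only for illustration. It shows the property I want to keep (equal inputs map to equal tokens) and the property I am missing (there is no way to get the input back).

```python
import hashlib


def obfuscate(value: str) -> str:
    """Hypothetical F: a deterministic, one-way mapping from a record value to a token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Made-up sample values, not real data.
token_a = obfuscate("patient-123")
token_b = obfuscate("patient-123")
token_c = obfuscate("patient-456")

# Equal inputs give equal tokens, so records can still be joined on the token.
print(token_a == token_b)
# Different inputs give different tokens (up to hash collisions).
print(token_a == token_c)
```

Nothing in this sketch lets the trusted party recover `"patient-123"` from its token, which is why a plain hash does not meet the requirement.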
My first thought was to use a cascade of asymmetric encryption schemes: encrypt records with the trusted party's public key, followed by my private key (see footnote 1). I tried a modern implementation of RSA, which implicitly applied OAEP, so the ciphertext became non-deterministic and the equality relationships were lost. My understanding is that RSA without OAEP is inherently vulnerable, but my knowledge isn’t deep enough to understand why.
My reading of some entry-level encryption literature suggests that the goal of OAEP is to make the system “semantically secure”, which means that no information about plaintext messages can be derived from encrypted messages, including equality. I understand why this property is desirable in most contexts, but in our application we specifically want to preserve equality (and only equality).
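The behaviour I observed can be reproduced with a toy sketch. This is not real RSA-OAEP and must not be used for anything: the key is far too small, and the "padding" is just random bits prepended to the message, which I am assuming captures the gist of why randomized padding destroys equality. The parameters `P`, `Q`, `E` and all function names are my own illustrative choices.

```python
import secrets

# Toy key material (illustrative assumption): two known Mersenne primes.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                           # modulus, roughly 92 bits
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (requires Python 3.8+)


def encrypt_deterministic(m: int) -> int:
    """Textbook RSA: the same message always yields the same ciphertext."""
    return pow(m, E, N)


def encrypt_randomized(m: int) -> int:
    """Prepend 64 random bits to a 16-bit message before encrypting (toy padding)."""
    r = secrets.randbits(64)
    return pow((r << 16) | m, E, N)


def decrypt_randomized(c: int) -> int:
    """Decrypt and discard the random bits, keeping the low 16 message bits."""
    return pow(c, D, N) & 0xFFFF


m = 4242  # made-up 16-bit record value

# Deterministic: equality is preserved across encryptions.
print(encrypt_deterministic(m) == encrypt_deterministic(m))

# Randomized: two encryptions of the same value differ (with overwhelming
# probability), yet both decrypt to the original value.
c1, c2 = encrypt_randomized(m), encrypt_randomized(m)
print(c1 != c2, decrypt_randomized(c1) == m, decrypt_randomized(c2) == m)
```

The sketch also hints at the weakness I am worried about: with the deterministic variant, anyone holding the public key can encrypt every candidate value and compare the results against the shared column.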
Are there other designs that result in encryptions that are only reversible by a trusted party but the equality relationships are preserved across encrypted messages?
1 My understanding is that this mitigates the risk of a rainbow table attack, assuming both private keys are not compromised, because a chosen plaintext cannot be encrypted through both levels of the cascade without a private key.
EDIT: It has been requested that I expand on the use case to better motivate the need for deterministic encryption. We are attempting to create a decentralized (and open source) tokenization scheme for obfuscating patient PII in health data prior to disclosing the data to third parties. This is part of a larger suite of privacy tools focused on data sharing.
Here are a couple of instances of similar technologies that are commercialized. To my knowledge, all are proprietary and don't allow the organizations sharing tokens to manage their own public/private encryption keys.