Deterministic Authenticated Encryption with AES-OFB and HMAC

Question

I have an encrypted column in a database that stores sensitive information. The existing encryption scheme is deterministic and a database index exists on the encrypted value to allow searching.

I am upgrading the encryption scheme to use authenticated encryption, so the obvious choice seems to be AES-SIV. However, the data is accessed both from a Java app and from PL/SQL stored procedures so I am limited to the cryptographic primitives supported by Oracle's DBMS_CRYPTO package (SIV is not supported).

I read several questions on this site, mainly "Why is synthetic IV (SIV) mode considered deterministic authenticated encryption (DAE)?" and "Is it okay to use an HMAC of the plaintext and a (possibly distinct) key as the IV for symmetric cryptography?", and I have come up with the following design:

1. Use a 64-byte key, where the first 32 bytes are the HMAC key and the remaining 32 bytes are the AES cipher key.
2. Use HMAC-SHA512 on the plaintext to generate the initialisation vector (use the first 16 bytes only).
3. Encrypt using AES in OFB mode (since CTR mode is not available). OFB mode will preserve length and (I believe) avoid padding oracle attacks.
4. Concatenate the 16-byte IV and the ciphertext in the database column.
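
The steps above can be sketched on the Java side roughly as follows. This is only a minimal sketch under stated assumptions: the class name `OfbHmacSketch` and the method `encrypt` are hypothetical, and it relies on the standard JCA algorithm names `HmacSHA512` and `AES/OFB/NoPadding`. The PL/SQL side would need to produce byte-identical output with DBMS_CRYPTO.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical illustration of the design in the question, not a vetted implementation.
final class OfbHmacSketch {
    private static final int IV_LEN = 16;

    // key64: bytes 0..31 are the HMAC key, bytes 32..63 are the AES-256 key.
    static byte[] encrypt(byte[] key64, byte[] plaintext) throws GeneralSecurityException {
        if (key64.length != 64) {
            throw new IllegalArgumentException("expected a 64-byte key");
        }
        byte[] macKey = Arrays.copyOfRange(key64, 0, 32);
        byte[] aesKey = Arrays.copyOfRange(key64, 32, 64);

        // Synthetic IV: first 16 bytes of HMAC-SHA512(macKey, plaintext).
        Mac mac = Mac.getInstance("HmacSHA512");
        mac.init(new SecretKeySpec(macKey, "HmacSHA512"));
        byte[] iv = Arrays.copyOf(mac.doFinal(plaintext), IV_LEN);

        // AES in OFB mode without padding, so the ciphertext keeps the plaintext length.
        Cipher cipher = Cipher.getInstance("AES/OFB/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(aesKey, "AES"), new IvParameterSpec(iv));
        byte[] body = cipher.doFinal(plaintext);

        // Stored column value: IV || ciphertext.
        byte[] out = Arrays.copyOf(iv, IV_LEN + body.length);
        System.arraycopy(body, 0, out, IV_LEN, body.length);
        return out;
    }
}
```

Because the IV depends only on the key and the plaintext, encrypting the same value twice yields the same column value, which is what lets the existing index keep working.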

My questions are:

1. Does this design provide deterministic authenticated encryption?
2. Is there a better design that would allow searching on the encrypted data?

Regarding point 2: What kind of searching do you want to do? Please keep in mind the possible attacks described in https://blog.cryptographyengineering.com/2019/02/11/attack-of-the-week-searchable-encryption-and-the-ever-expanding-leakage-function/ and remember that allowing searches on encrypted databases could open it up to attacks. — rlee827, Mar 29 '19 at 01:45
Also are you stuck using that particular cryptography library? They still support MD5 and (3)DES, which makes me think that it is quite dated and that other libraries may be better. — rlee827, Mar 29 '19 at 01:48
@rlee827 thanks for the linked article. I only need exact-match searching so if I understand correctly, the range-based attacks in the article and associated paper don't apply. I can use any standard Oracle database feature, and I believe the DBMS_CRYPTO package is the only one offering modern (ish) crypto support. — Nathan, Mar 31 '19 at 22:12

Squeamish Ossifrage · Accepted Answer · 2019-03-31T07:25:15.203

Does this design provide deterministic authenticated encryption?

This provides reasonable security as long as you limit the total volume of data encrypted to well below $2^{64}$ bytes.

Details. The SIV theorem[1], roughly, is that if $F_{k_1}$ is a $t$-bit PRF, with PRF distinguisher advantage bounded by $\varepsilon_F$, and if $E_{k_2}$ is a randomized cipher, with CPA distinguisher advantage bounded by $\varepsilon_E$, then $$m \mapsto F_{k_1}(m) \mathbin\| E_{k_2}(F_{k_1}(m), m)$$ with decryption $$t \mathbin\| c \mapsto \begin{cases} m, & \text{if $F_{k_1}(m) = t$;} \\ \bot, & \text{otherwise,} \end{cases} \quad \text{where $m = E_{k_2}^{-1}(t, c)$,}$$ is a deterministic authenticated cipher with distinguisher/forgery advantage bounded by $\varepsilon_F + \varepsilon_E + q^2/2^t$ for any adversary making at most $q$ queries to $E$ or $F$.

Under standard conjectures about SHA-512, the PRF advantage of a $q$-query cost-limited adversary against a 128-bit truncation of HMAC-SHA512 with a 256-bit key is bounded by about $q/2^{128}$.
Under standard conjectures about AES, the IND-CPA advantage of a $q$-query cost-limited adversary against AES-OFB with a 256-bit key and messages up to $\ell$ blocks long is bounded by
- $q^2 \ell^2/2^{128}$, if the IVs are unique, and therefore
- $q^2/2^{128} + q^2 \ell^2/2^{128}$, if the IVs are chosen uniformly at random, as SIV hypothesizes.

Summing up, the adversary's advantage is bounded by about $$\frac{q}{2^{128}} + \frac{q^2 \ell^2}{2^{128}} + \frac{q^2}{2^{128}} + \frac{q^2}{2^{128}} = \frac{q + 2 q^2 + q^2 \ell^2}{2^{128}}.$$

I say ‘about’, because this excludes the probability of guessing $k_1$ and $k_2$, which, if they are 256 bits, is minuscule in comparison even in a multi-target attack on $2^{64}$ users; and this excludes the probability of an internal collision in HMAC-SHA512, which is also minuscule in comparison. The dominating term $q^2 \ell^2$ is the square of the total number of blocks encrypted, $q \ell$. Hence: keep the total volume of data encrypted well below $2^{64}$ bytes. A limit of a few gigabytes should be fine.
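
As a rough worked example (the figures are illustrative assumptions, not taken from the question): with $q = 2^{20}$ messages of at most $\ell = 2^{6}$ blocks (1 KiB) each, about 1 GiB in total, the bound evaluates to $$\frac{q + 2 q^2 + q^2 \ell^2}{2^{128}} = \frac{2^{20} + 2^{41} + 2^{52}}{2^{128}} \approx \frac{2^{52}}{2^{128}} = 2^{-76},$$ so the $q^2 \ell^2$ term accounts for essentially all of it.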

The real SIV, as presented in the paper and standardized with CTR mode, also handles associated data with a custom tuple PRF construction, instantiated with AES. The custom tuple PRF doesn't matter if you don't have associated data, and the instantiation with a PRP like AES instead of a PRF like HMAC actually hurts security a little (by the standard PRP/PRF switching lemma).

Is there a better design that would allow searching on the encrypted data?

This looks to me like the easiest secure option with the tools you have at hand.

You could derive a fresh AES key for each message, and cut it down from 64 bytes of key material to 32 bytes of key material, but there's a cost—switching AES keys is more expensive than switching AES-OFB IVs, and you'd have to call HMAC-SHA512 twice or something—and there's no benefit to security unless you expand the tag from 128 bits to 192 or 256 bits.

You didn't ask, but you perhaps should have asked:

Are there any implementation pitfalls to be aware of?

Make sure that you do not touch the unauthenticated plaintext except to compute the authenticator until after you have verified it. Unauthenticated data is pure evil—don't touch it!
Make sure that you don't use PKCS7 padding or anything. Make sure that the AES-OFB decryption cannot fail, because failure of the decryption distinct from failure of the authentication might leak information about the plaintext in a padding oracle attack like POODLE.
Make sure to write test vectors in another language to cross-check and perform self-tests.
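
One way to follow the first two points on the Java side is sketched below, assuming the stored record is the 16-byte tag followed by the AES-OFB ciphertext, and the 64-byte key is split as in the question. The class name `OfbHmacOpenSketch` and the convention of returning `null` on failure are hypothetical choices for illustration. The candidate plaintext is used only to recompute the tag, the comparison uses `MessageDigest.isEqual`, and nothing is returned unless the tag matches.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical illustration of verify-before-use decryption, not a vetted implementation.
final class OfbHmacOpenSketch {
    private static final int TAG_LEN = 16;

    // Returns the plaintext, or null if the record does not authenticate.
    static byte[] decrypt(byte[] key64, byte[] record) throws GeneralSecurityException {
        if (key64.length != 64) {
            throw new IllegalArgumentException("expected a 64-byte key");
        }
        if (record.length < TAG_LEN) {
            return null; // too short to contain a tag
        }
        byte[] tag = Arrays.copyOfRange(record, 0, TAG_LEN);
        byte[] body = Arrays.copyOfRange(record, TAG_LEN, record.length);

        // OFB without padding: decryption has no data-dependent failure mode.
        Cipher cipher = Cipher.getInstance("AES/OFB/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec(Arrays.copyOfRange(key64, 32, 64), "AES"),
                new IvParameterSpec(tag));
        byte[] candidate = cipher.doFinal(body);

        // Touch the candidate only to recompute the authenticator.
        Mac mac = Mac.getInstance("HmacSHA512");
        mac.init(new SecretKeySpec(Arrays.copyOfRange(key64, 0, 32), "HmacSHA512"));
        byte[] expected = Arrays.copyOf(mac.doFinal(candidate), TAG_LEN);

        // Constant-time comparison; wipe and reject on mismatch.
        if (!MessageDigest.isEqual(expected, tag)) {
            Arrays.fill(candidate, (byte) 0);
            return null;
        }
        return candidate;
    }
}
```

Callers should treat every `null` result the same way, so that an attacker cannot distinguish one kind of rejection from another.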

Finally, encrypted database search is a goldmine for attackers even if they can't see the content of the encrypted field[2] (popular exposition). Caveat data hoarder!

Thanks so much for the detailed answer and especially for the answer to the un-asked question #3. I only need exact-match searching so if I understand correctly, the range-based attacks in the article and associated paper don't apply. In answer to my own question #2, I'm aware of using a blind index for the database searching, where the plaintext is hashed using HMAC. This seems like it would have similar security properties to the AES-OFB-HMAC encryption above, since the plaintext is HMACed in both cases. Is that correct? Or should I ask a separate question on that topic? — Nathan, Mar 31 '19 at 22:49
Concerning the index: whether you also use AES-OFB or not, you reveal plaintext equality through the HMAC. That is, your data field is (hmac, ciphertext), while your index is just hmac—if the same plaintext occurs in two rows, the same hmac will occur in the two rows. Concerning database search: it's a problem even beyond range queries, e.g. the AOL search history dump. — Squeamish Ossifrage, Apr 01 '19 at 01:20
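
To make the equality leak concrete, here is a minimal sketch of a blind index as described in the comments above. The class name `BlindIndexSketch`, the method `blindIndex`, and the choice of an untruncated HMAC-SHA512 over the UTF-8 bytes are assumptions made for illustration.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

// Hypothetical illustration of a blind index, not a vetted implementation.
final class BlindIndexSketch {
    // The index value is a keyed hash of the plaintext. Equal plaintexts give
    // equal index values, which is what makes exact-match lookup possible and
    // is also exactly what an observer of the column learns.
    static byte[] blindIndex(byte[] indexKey, String plaintext) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA512");
        mac.init(new SecretKeySpec(indexKey, "HmacSHA512"));
        return mac.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
    }
}
```

Two rows holding the same plaintext get the same index value, so anyone who can read the column sees which rows match and how often each value occurs, with or without the AES-OFB ciphertext alongside.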
