Is this method of deterministically using CBC secure?

Question

I'm trying to design a deterministic encryption scheme to enable searching in untrusted databases with the smallest possible IV overhead on ciphertexts. I know it's very bad practice but unfortunately it's a business requirement, so I'm just trying to offer the best security I can under the circumstances.

Right now I'm thinking about a way of using AES-CBC deterministically. My idea is based on a scheme presented in Deterministic and Efficiently Searchable Encryption by Bellare, Boldyreva and O'Neill.

Given a plaintext $x$ that we need to encrypt deterministically, we first compute $\mathrm{MAC}_{k_1}(x)$, where $k_1$ is a key used only in this step. Next, we take the bottom four bytes of the computed MAC, call them $y$, and compute $H(y)$, where $H$ is a public hash function (e.g. SHA-256). We truncate this hash to the cipher's block size (16 bytes for AES) and use it as the IV for CBC mode. The ciphertext is $y \| \mathrm{CBC}_{k_2, H(y)}(x)$, where $k_2$ is our encryption key.
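A minimal sketch of the IV-derivation step, assuming HMAC-SHA256 as $\mathrm{MAC}_{k_1}$, SHA-256 as $H$, and "bottom four bytes" meaning the last four bytes of the tag. The question does not fix any of these choices. The CBC step itself is left to whatever AES library is in use, so the name `derive_y_and_iv` and the demo key are hypothetical.

```python
import hashlib
import hmac

BLOCK_BYTES = 16  # AES block size
TAG_BYTES = 4     # "bottom four bytes" of the MAC, as in the question


def derive_y_and_iv(k1: bytes, x: bytes):
    """Return (y, iv) for plaintext x.

    Assumptions (not fixed by the question): MAC_{k1} is HMAC-SHA256,
    H is SHA-256, and "bottom four bytes" means the last four bytes of the tag.
    """
    tag = hmac.digest(k1, x, "sha256")             # MAC_{k1}(x), 32 bytes
    y = tag[-TAG_BYTES:]                           # low-order four bytes of the MAC
    iv = hashlib.sha256(y).digest()[:BLOCK_BYTES]  # H(y), truncated to one block
    return y, iv


# The ciphertext would then be y || AES-CBC_{k2, iv}(x), using any AES library.
k1 = bytes(range(32))  # hypothetical demo key
y, iv = derive_y_and_iv(k1, b"alice@example.com")
print(y.hex(), iv.hex())
```

Because $(y, \mathrm{IV})$ depends only on $k_1$ and $x$, equal plaintexts always yield equal ciphertexts, which is what makes equality search on the untrusted database possible.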

This may seem convoluted (and it is), but I'm trying to think of a way to get around the equality leakage that occurs when a predictable IV is used with CBC. With a fixed IV, different plaintexts with the same first block encrypt to ciphertexts with identical first blocks. With the above scheme, the only way for two different plaintexts that share a first block to produce identical first ciphertext blocks is for their MAC values to have the same four low-order bytes, which by the birthday bound we expect to happen only after about $2^{16}$ encryptions.

We can of course decrease the likelihood of a collision by using more than four bytes of the MAC, but for length-restricted environments four bytes might be acceptably insecure.
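The $2^{16}$ figure is the birthday bound for a 32-bit $y$. A small calculation using the standard approximation $P \approx 1 - e^{-n(n-1)/2^{b+1}}$ for $n$ messages and a $b$-bit $y$ shows how much keeping more MAC bytes helps. The function names below are illustrative, not from any library.

```python
import math


def collision_probability(tag_bits: int, n: int) -> float:
    """Birthday approximation of P(some two of n messages share the same y)."""
    space = 2.0 ** tag_bits
    # 1 - exp(-n(n-1)/(2*space)), computed with expm1 to stay accurate for tiny values
    return -math.expm1(-n * (n - 1) / (2 * space))


def messages_for_even_odds(tag_bits: int) -> float:
    """Approximate n at which a repeated y becomes ~50% likely: sqrt(2 ln 2 * 2^bits)."""
    return math.sqrt(2 * math.log(2) * 2.0 ** tag_bits)


for bits in (32, 48, 64, 128):
    print(f"{bits:3d}-bit y: P(collision | 2^16 msgs) = "
          f"{collision_probability(bits, 2 ** 16):.3g}, "
          f"50% after ~2^{math.log2(messages_for_even_odds(bits)):.1f} msgs")
```

For a 32-bit $y$, $2^{16}$ messages already give roughly a 39% chance of a repeated $y$, while a 64-bit $y$ pushes the even-odds point out to about $2^{32}$ messages.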

I'm wary of this construction because of how kludgy it seems - if there's an obvious flaw please don't hesitate to point it out.

Thanks for your help, all.

I'm posting this as a comment rather than an answer because I'm not sure I understand the security requirements. The MAC-then-hash part aims at deterministically computing a pseudorandom nonce for every message. This will do that (and it correctly avoids length-extension attacks on the hash that may enable attacks against the encryption), but a cleaner (IMO) approach is to compute the nonce by first applying the hash function and then AES. Both constructions give a "variable input-length PRF," which is the correct theoretical primitive for this job. — David Cash, Aug 09 '13 at 01:48
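For comparison, the question's MAC-then-hash derivation and the hash-then-AES alternative suggested in the comment can be written side by side. This is one reading of the suggestion. It assumes $H$ is SHA-256 and that its 256-bit output is truncated to one 128-bit AES block before encryption, a detail the comment does not specify:

$$\mathrm{IV}_{\text{question}} = \mathrm{trunc}_{128}\Big(H\big(\mathrm{lsb}_{32}(\mathrm{MAC}_{k_1}(x))\big)\Big), \qquad \mathrm{IV}_{\text{comment}} = \mathrm{AES}_{k_1}\big(\mathrm{trunc}_{128}(H(x))\big)$$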

score 4 · Accepted Answer · edited Oct 07 '21 at 06:47

Your construction seems quite similar to SIV mode, except that you're using CBC rather than CTR mode for the encryption step and that you're truncating the MAC value to only 32 bits (and re-expanding it by hashing to derive the actual IV).

The only obvious weakness I see in your scheme is the shortness of $y$ and the consequent small IV space. I'd prefer a 128-bit $y$, but even just 64 or 48 bits would at least be a considerable improvement over 32. With a 32-bit $y$, not only are collisions likely after about $2^{16}$ encryptions, but with about $2^{32}$ encryption oracle queries an attacker could even arrange for a chosen message prefix to be encrypted with any possible IV, which would allow them to carry out block-by-block dictionary attacks. Depending on your implementation setting, that many oracle queries may or may not be feasible, but it's a bit too close to a practical attack for me to feel comfortable about.

(Also note that this attack can be easily carried out against multiple target messages simultaneously with little if any extra effort. In fact, if the attacker only needs to decode one message out of several, having multiple targets will speed it up.)
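To make the dictionary attack concrete, here is a toy simulation under heavily simplified, hypothetical assumptions. $y$ is shrunk to 16 bits so the search finishes in about a second. HMAC-SHA256 stands in for both the MAC and AES; only the first CBC block $C_1 = E_{k_2}(\mathrm{IV} \oplus P_1)$ is computed, which needs only the forward direction. `oracle` models an encryption oracle the attacker can query with chosen messages. The attacker reads $y$ from the target ciphertext, since it is sent in the clear, and varies a suffix after a guessed first block until a query lands on the same $y$.

```python
import hashlib
import hmac

BLOCK = 16
TAG_BYTES = 2  # hypothetical 16-bit y so the demo runs quickly; the question uses 4 bytes


def encrypt_first_block(k1: bytes, k2: bytes, x: bytes):
    """Toy model of the scheme: return (y, C1) for a plaintext of at least one block.

    HMAC-SHA256 stands in for MAC_{k1}, SHA-256 for H, and HMAC-SHA256 truncated
    to 16 bytes stands in for AES_{k2}. Only C1 = E_{k2}(IV xor P1) is computed,
    and that needs only the forward direction, so a PRF suffices here.
    """
    assert len(x) >= BLOCK
    y = hmac.digest(k1, x, "sha256")[-TAG_BYTES:]
    iv = hashlib.sha256(y).digest()[:BLOCK]
    mixed = bytes(a ^ b for a, b in zip(iv, x[:BLOCK]))  # CBC: IV xor P1
    return y, hmac.digest(k2, mixed, "sha256")[:BLOCK]


def check_first_block_guess(oracle, target_y, target_c1, guess, max_queries=1 << 20):
    """Encrypt guess || counter until some query's y equals the target's public y.

    Equal y means equal IV, so C1 matches the target's C1 exactly when the guessed
    first block is right. Returns (is_correct, queries_used), or (None, max_queries).
    """
    for i in range(max_queries):
        y, c1 = oracle(guess + i.to_bytes(8, "big"))
        if y == target_y:
            return c1 == target_c1, i + 1
    return None, max_queries


k1, k2 = b"k" * 32, b"K" * 32  # hypothetical demo keys
secret = b"SSN:123-45-6789|rest of record"
oracle = lambda m: encrypt_first_block(k1, k2, m)
target_y, target_c1 = oracle(secret)  # the attacker sees y in the clear
good = check_first_block_guess(oracle, target_y, target_c1, b"SSN:123-45-6789|")
bad = check_first_block_guess(oracle, target_y, target_c1, b"SSN:999-99-9999|")
print(good, bad)
```

With a 32-bit $y$, the same loop needs about $2^{32}$ queries per target on average. Against $t$ targets with distinct $y$ values, stopping at a match with any of them cuts that to roughly $2^{32}/t$, which is the multi-target speedup noted above. Once the first block is confirmed, the same trick extends to the second block (guess $P_1 \| P_2$ plus a varying suffix), which is what makes the attack block-by-block.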

Other than that, your construction looks OK to me. Note that you don't necessarily need to use a hash function for the IV expansion step: simply padding $y$ to a full cipher block and applying the block cipher encryption function would do. This method of deriving CBC IVs from a nonce is even explicitly endorsed by NIST SP 800-38A in Appendix C.
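Concretely, the NIST method applies the forward cipher function, under the same key used to encrypt the plaintext, to the nonce. With a 32-bit $y$ padded with 96 zero bits (the padding choice is an assumption; any fixed injective encoding of $y$ into one block would do, since AES is a permutation and distinct $y$ values then give distinct IVs), the construction becomes:

$$\mathrm{IV} = \mathrm{AES}_{k_2}\big(y \,\|\, 0^{96}\big), \qquad c = y \,\|\, \mathrm{CBC}_{k_2,\,\mathrm{IV}}(x)$$

The receiver recomputes the IV from the transmitted $y$ with $k_2$ before CBC decryption, so nothing beyond $y$ needs to be stored alongside the ciphertext.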
