I'm trying to design a deterministic encryption scheme to enable searching in untrusted databases with the smallest possible IV overhead on ciphertexts. I know it's very bad practice but unfortunately it's a business requirement, so I'm just trying to offer the best security I can under the circumstances.
Right now I'm thinking about a way of using AES-CBC deterministically. My idea is based on a scheme presented in Deterministic and Efficiently Searchable Encryption by Bellare, Boldyreva and O'Neill.
Given a plaintext $x$ that we need to encrypt deterministically, we first compute $MAC_{k_1}(x)$, where $k_1$ is a key used only in this step. Next, we take the bottom four bytes of the computed MAC, call them $y$, and compute $H(y)$, where $H$ is a public hash function (e.g. SHA-256). We truncate this hash to the AES block size (16 bytes) and use it as the IV for CBC mode. The ciphertext is $y||CBC_{k_2, H(y)}(x)$, where $k_2$ is our encryption key and $H(y)$ here denotes the truncated hash.
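To make the IV derivation concrete, here is a minimal sketch of the first two steps using only the Python standard library. It assumes HMAC-SHA256 as the MAC and reads "bottom four bytes" as the last four bytes of the tag; both are illustrative choices, and the function name `derive_tag_and_iv` is hypothetical. The CBC step itself is left to whatever AES library is in use.

```python
import hashlib
import hmac
from typing import Tuple

TAG_LEN = 4   # bytes of the MAC kept as y (sent in the clear with the ciphertext)
IV_LEN = 16   # AES block size in bytes

def derive_tag_and_iv(k1: bytes, x: bytes) -> Tuple[bytes, bytes]:
    """Return (y, iv) for plaintext x under MAC key k1.

    Assumption: the MAC is HMAC-SHA256 and y is its last TAG_LEN bytes.
    """
    mac = hmac.new(k1, x, hashlib.sha256).digest()
    y = mac[-TAG_LEN:]                          # truncated MAC
    iv = hashlib.sha256(y).digest()[:IV_LEN]    # H(y), truncated to one block
    return y, iv

# The ciphertext would then be y || AES-CBC(k2, iv, x),
# with k2 independent of k1.
```

Because $y$ is stored with the ciphertext and $H$ is public, anyone holding the ciphertext can recompute the IV; only the mapping from $x$ to $y$ depends on a secret.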
This may seem convoluted (and it is), but I'm trying to think of a way to get around the equality leakage when a predictable IV is used with CBC. With a fixed IV, different plaintexts with the same first block will encrypt to ciphertexts with identical first blocks. With the above scheme, two different plaintexts that share a first block produce the same first ciphertext block only if their MAC values have the same four low-order bytes. By the birthday bound on a 32-bit value, we expect such a collision to appear only after roughly $2^{16}$ encryptions.
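As a rough check on the $2^{16}$ figure, the standard birthday approximation $1 - e^{-n(n-1)/2^{b+1}}$ can be evaluated directly. This sketch assumes the truncated MAC values behave like uniformly random $b$-bit strings; the function name `collision_probability` is hypothetical.

```python
import math

def collision_probability(n: int, tag_bits: int = 32) -> float:
    """Approximate probability that at least two of n uniformly random
    tag_bits-bit values collide (birthday approximation)."""
    # -expm1(-t) computes 1 - exp(-t) accurately for small t.
    return -math.expm1(-n * (n - 1) / 2 ** (tag_bits + 1))

# With 4-byte tags, n = 2**16 encryptions gives a probability of about 0.39,
# so a collision among the tags is already likely at that scale.
p = collision_probability(2 ** 16)
```

Note that this counts any pair of colliding tags; the leakage described above additionally requires the two colliding plaintexts to share a first block.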
We can of course decrease the likelihood of a collision by using more than four bytes of the MAC, but for length-restricted environments four bytes might be acceptably insecure.
I'm wary of this construction because of how kludgy it seems - if there's an obvious flaw please don't hesitate to point it out.
Thanks for your help, all.