- Does this design provide deterministic authenticated encryption?
This provides reasonable security as long as you limit the total volume of data encrypted to well below $2^{64}$ bytes.
Details.
The SIV theorem[1], roughly, is that if $F_{k_1}$ is a $t$-bit PRF, with PRF distinguisher advantage bounded by $\varepsilon_F$, and if $E_{k_2}$ is a randomized cipher, with CPA distinguisher advantage bounded by $\varepsilon_E$, then $$m \mapsto F_{k_1}(m) \mathbin\| E_{k_2}(F_{k_1}(m), m)$$ with decryption $$t \mathbin\| c \mapsto \begin{cases} m, & \text{if $F_{k_1}(m) = t$;} \\ \bot, & \text{otherwise,} \end{cases} \quad \text{where $m = E_{k_2}^{-1}(t, c)$,}$$ is a deterministic authenticated cipher with distinguisher/forgery advantage bounded by $\varepsilon_F + \varepsilon_E + q^2/2^t$ for any adversary making at most $q$ queries to $E$ or $F$.
- Under standard conjectures about SHA-512, the PRF advantage of a $q$-query cost-limited adversary against a 128-bit truncation of HMAC-SHA512 with a 512-bit key is bounded by about $q/2^{128}$.
- Under standard conjectures about AES, the IND-CPA advantage of a $q$-query cost-limited adversary against AES-OFB with a 256-bit key and messages up to $\ell$ blocks long is bounded by
- $q^2 \ell^2/2^{128}$, if the IVs are unique, and therefore
- $q^2/2^{128} + q^2 \ell^2/2^{128}$, if the IVs are chosen uniformly at random, as SIV hypothesizes.
Summing up, the adversary's advantage is bounded by about $$\frac{q}{2^{128}} + \frac{q^2 \ell^2}{2^{128}} + \frac{q^2}{2^{128}} + \frac{q^2}{2^{128}} = \frac{q + 2 q^2 + q^2 \ell^2}{2^{128}}.$$
I say ‘about’, because this excludes the probability of guessing $k_1$ and $k_2$, which, if they are 256 bits, is minuscule in comparison even in a multi-target attack on $2^{64}$ users; and this excludes the probability of an internal collision in HMAC-SHA512, which is also minuscule in comparison. The dominating term $q^2 \ell^2$ is the square of the total number of blocks encrypted, $q \ell$. Hence: keep the total volume of data encrypted well below $2^{64}$ bytes. A limit of a few gigabytes should be fine.
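To get a feel for the numbers, here is a small sketch that evaluates the bound $(q + 2 q^2 + q^2 \ell^2)/2^{128}$ on a log scale. The function name and the example workloads are mine, chosen only for illustration; $q$ is the number of messages and $\ell$ the maximum message length in 16-byte AES blocks, as above.

```python
from math import log2

BLOCK_BYTES = 16  # AES block size in bytes


def log2_advantage_bound(q: int, ell: int) -> float:
    """Return log2 of (q + 2 q^2 + q^2 ell^2) / 2^128.

    q   -- number of messages encrypted (queries)
    ell -- maximum message length in 16-byte blocks
    """
    numerator = q + 2 * q * q + (q * ell) ** 2
    # math.log2 accepts arbitrarily large Python ints, so no overflow here.
    return log2(numerator) - 128


# Example workload (an assumption, not from the question):
# 2^20 messages of at most 2^8 blocks (4 KiB) each, i.e. 4 GiB in total.
few_gigabytes = log2_advantage_bound(2**20, 2**8)

# For contrast: 2^40 messages of 2^20 blocks each, i.e. 2^64 bytes in total.
at_the_limit = log2_advantage_bound(2**40, 2**20)

print(f"4 GiB total:    advantage <= about 2^{few_gigabytes:.1f}")
print(f"2^64 bytes:     advantage <= about 2^{at_the_limit:.1f}")
```

At 4 GiB the bound is roughly $2^{-72}$, while at $2^{64}$ bytes it has degraded to roughly $2^{-8}$, which is why the limit has to sit well below $2^{64}$ bytes rather than at it.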
The real SIV, as presented in the paper and standardized with CTR mode, also handles associated data with a custom tuple PRF construction, instantiated with AES. The custom tuple PRF doesn't matter if you don't have associated data, and the instantiation with a PRP like AES instead of a PRF like HMAC actually hurts security a little (by the standard PRP/PRF switching lemma).
- Is there a better design that would allow searching on the encrypted data?
This looks to me like the easiest secure option with the tools you have at hand.
You could derive a fresh AES key for each message, and cut it down from 64 bytes of key material to 32 bytes of key material, but there's a cost—switching AES keys is more expensive than switching AES-OFB IVs, and you'd have to call HMAC-SHA512 twice or something—and there's no benefit to security unless you expand the tag from 128 bits to 192 or 256 bits.
You didn't ask, but you perhaps should have asked:
- Are there any implementation pitfalls to be aware of?
- Make sure that you do not touch the unauthenticated plaintext except to compute the authenticator until after you have verified it. Unauthenticated data is pure evil—don't touch it!
- Make sure that you don't use PKCS7 padding or anything. Make sure that the AES-OFB decryption cannot fail, because failure of the decryption distinct from failure of the authentication might leak information about the plaintext in a padding oracle attack like POODLE or Lucky Thirteen.
- Make sure to generate test vectors with an independent implementation in another language, to cross-check yours, and perform self-tests.
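The first two pitfalls can be sketched roughly as follows. This is a minimal illustration of the SIV composition, not a vetted implementation. The tag is HMAC-SHA512 truncated to 128 bits, as in the question. Because the Python standard library has no AES, the keystream here is a stand-in (HMAC-SHA256 in counter mode) that merely occupies the place of AES-256-OFB; in real code you would swap in AES-OFB with the tag as the IV. All function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

TAG_BYTES = 16  # 128-bit truncation of HMAC-SHA512


def tag_prf(k1: bytes, message: bytes) -> bytes:
    """F_{k1}: HMAC-SHA512 truncated to 128 bits."""
    return hmac.new(k1, message, hashlib.sha512).digest()[:TAG_BYTES]


def standin_keystream(k2: bytes, iv: bytes, length: int) -> bytes:
    """STAND-IN for the AES-256-OFB keystream (assumption: stdlib has no AES).

    Producing a keystream and XORing it in means no padding is needed,
    so decryption has no failure mode of its own.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = iv + counter.to_bytes(8, "big")
        out += hmac.new(k2, block_input, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def siv_encrypt(k1: bytes, k2: bytes, message: bytes) -> bytes:
    """m -> F_{k1}(m) || E_{k2}(F_{k1}(m), m)."""
    tag = tag_prf(k1, message)
    return tag + xor_bytes(message, standin_keystream(k2, tag, len(message)))


def siv_decrypt(k1: bytes, k2: bytes, blob: bytes) -> Optional[bytes]:
    """Return the plaintext, or None if authentication fails."""
    if len(blob) < TAG_BYTES:
        return None
    tag, ciphertext = blob[:TAG_BYTES], blob[TAG_BYTES:]
    candidate = xor_bytes(
        ciphertext, standin_keystream(k2, tag, len(ciphertext))
    )
    # The candidate plaintext is used ONLY to recompute the tag, and the
    # comparison is constant-time. Nothing is released before this check.
    if not hmac.compare_digest(tag_prf(k1, candidate), tag):
        return None
    return candidate
```

Since encryption is deterministic, equal plaintexts under the same keys give equal ciphertexts, which is what makes equality search on the encrypted column possible. Decryption has exactly one failure outcome (`None`), so a caller cannot distinguish a decryption failure from an authentication failure.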
Finally, encrypted database search is a goldmine for attackers even if they can't see the content of the encrypted field[2] (popular exposition). Caveat data hoarder!