Which encryption technique should be adopted to keep the search functionalities?

Question

I am in exactly the same situation as described in this post. I must be able to encrypt data from a client workstation then send it to an untrusted server (encrypted at rest), then decrypt it only on the same client workstation. However, the client must be able to continue to do "encrypted" searches based on the ciphertext.

AES-CBC seems to need a unique IV for each encryption, so I can't use this technique. All the others (GCM, ...), and even asymmetric RSA encryption with its padding, behave the same way, which means that the ciphertext is never the same for the same plaintext...

It seems that only AES-ECB can produce the same ciphertext. The important points in my case are that the data I have to encrypt are all strings of at most 1000 characters, and that I have to protect this data from the hoster (at rest); there is no risk in transit and the data is not exposed on the internet. I don't know if I can rely on ECB...

So I ask the question again because 5 years have passed since this post; there may be other options? Can you advise me which technique to use that is reasonably secure, resists brute forcing, and still lets me search (on ciphertext)? Can I use AES-CBC with the same IV? Can I use AES-ECB (256)? Others?

Thank you !

First, note "How can frequency analysis be applied to modern ciphers?". Then see CryptDB. Then consider whether your data is really not exposed. Also, see the Netflix attack. — kelalaka, Feb 21 '21 at 09:43
Note: cross-posted on Information Security. This is not preferable, see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?" — kelalaka, Feb 21 '21 at 11:49
Does this answer your question? Is it possible to match encrypted documents using user-defined search terms? — mentallurg, Feb 21 '21 at 14:45
@mentallurg not really. My question is even simpler. I only want exact-match search, and I would like to use an algorithm that gives the same ciphertext for a given plaintext. So I am looking for some "validation" of using AES-ECB, which seems to be the only one able to provide this behaviour... AES-CBC and others use padding and a random IV, ... I am looking for something "standard" and not too complicated to implement, because I'm not a crypto expert but just a dev. I would also like to validate this model against my use case (string(1000) and untrusted server). Thx — Marc Alves, Feb 21 '21 at 16:27
It is cross-posted, it has an answer in information security. We don't need an answer here, too. — kelalaka, Feb 21 '21 at 18:20
@kelalaka: I don't agree. The topic of this question is much more related to Crypto SE than to Information Security SE. If you find that answer on IS relevant, vote to migrate it to Crypto SE. Then close this question. — mentallurg, Feb 21 '21 at 20:40
@MarcAlves: 1) "not to complicated to implement because I'm not a crypto expert" - this gives the impression that you think cryptographers design complex algorithms just because they like complexity, and that developers can find much simpler cryptographic algorithms with the same strengths :-) The algorithms are as they are. The answers that I linked address your goal. 2) You are free to implement what you want. But you should keep in mind that your approach will very probably have huge security weaknesses that attackers will use to decrypt your data. — mentallurg, Feb 21 '21 at 20:47
@mentallurg Answering this question correctly is only possible if the data is exactly known. There are many pitfalls, some of which the designers of CryptDB were aware of and some of which they were not. To be able to search, the OP needs to tell us what the query is and what the data is. Then we can talk about splitting according to the query and try to minimize the leakage. Even the client's access pattern can leak lots of information to an observer. — kelalaka, Feb 21 '21 at 20:56
@mentallurg Just a suggestion: split the words, eliminate the duplicates, choose a maximum word size, pad the data to the max size, then encrypt the padded words with ECB and store them in the table. That can be a solution for word search only. But what are those search functionalities? Even "untrusted server" is vague: is it semi-honest or malicious? Are the integrity, authenticity, and freshness of the data important? So, if the OP wants a good answer, they need to clarify the case. — kelalaka, Feb 21 '21 at 21:04
@kelalaka: Sorry that my comment was unclear. I meant only, that your suggestion to answer it on IS site instead of answering here is not quite good. I fully agree with your 2 last comments :-) — mentallurg, Feb 21 '21 at 22:02
@mentallurg no sorries, I've got your point and went further to show what the question needs to be clear about. — kelalaka, Feb 21 '21 at 22:08
@mentallurg if I ask for your help, it's clearly because I don't think a dev can do better than cryptographers... and I'm afraid of building something weak. Now that's clear between us. To answer your questions: the server is semi-honest. I need a full-match search from string to string, meaning that if I'm looking for "Hello World" I should get the cell values that match "Hello World". So the case is: the user encrypts data on the client, then sends this ciphertext to the server for storage. Then the user runs a query where the search term can only use the equality operator, and the idea is to search on the ciphertext. — Marc Alves, Feb 22 '21 at 10:57
@kelalaka: And what about this one: 1. Encrypt the data with any strong algorithm with an IV, ... 2. Create another dedicated column for this data and, at the time of encryption, hash the data with HMAC-SHA256 (using the same key), storing the hash value in this column. 3. Perform the search against this new column by hashing the search term with HMAC-SHA256 and checking for equality (instead of searching on the ciphertext). With this design I feel that the data will be strongly encrypted with AES and the search is done thanks to the hash (by definition, a hash is a one-way process...). What do you think? — Marc Alves, Feb 22 '21 at 11:08
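The keyed-hash column proposed in the last comment could be sketched roughly as follows. This is only an illustration, not a vetted design: the names `blind_index`, `index_key`, `rows` and `search` are made up here, the ciphertext column is represented by opaque placeholder bytes (the encryption step itself is left out), and the sketch assumes a dedicated key for the tag rather than reusing the encryption key, which is the more usual recommendation.

```python
import hashlib
import hmac
import secrets


def blind_index(index_key: bytes, value: str) -> str:
    """Deterministic search tag: HMAC-SHA256 of the plaintext under a client-side key."""
    return hmac.new(index_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key that never leaves the client.
index_key = secrets.token_bytes(32)

# What the server would store: (search tag, opaque ciphertext).
# The ciphertext bytes are placeholders; a real design would store the
# output of a randomized authenticated cipher here.
rows = [
    (blind_index(index_key, "Hello World"), b"<ciphertext 1>"),
    (blind_index(index_key, "Goodbye"), b"<ciphertext 2>"),
]


def search(table, tag: str):
    """Server-side exact-match lookup on the tag column only."""
    return [ciphertext for stored_tag, ciphertext in table
            if hmac.compare_digest(stored_tag, tag)]


# The client hashes the search term and sends only the tag.
matches = search(rows, blind_index(index_key, "Hello World"))
```

Equal plaintexts still produce equal tags, so the server learns which rows hold the same value and how often each value is queried, just as it would with ECB.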

score 0 · Answer 1 · answered Feb 21 '21 at 23:11

There are two possibilities related to your "search" (partial or exact) and you do not specify which you're using.

Let us imagine the plaintext is ATTACK AT DAWN. Let us suppose that the encryption translates this to QWWQTHXQWXBQNO: each letter is encoded as a different letter regardless of its position (so, A is always encrypted as Q and so on). However it is obtained, this is a simple (monoalphabetic) substitution cipher, the family the Caesar cipher belongs to, and it is weak. But it allows you to simply search for DAWN because it will always be BQNO.
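A small sketch of why the letter-for-letter scheme above stays searchable: the search term is encoded with the same table and looked up as an ordinary substring. The table below is reconstructed from the example strings and only covers the characters that occur in them; `TABLE` and `substitute` are names made up for this illustration.

```python
# Substitution table read off the example: ATTACK AT DAWN -> QWWQTHXQWXBQNO.
# Only the characters that occur in the example are defined.
TABLE = {"A": "Q", "T": "W", "C": "T", "K": "H", " ": "X",
         "D": "B", "W": "N", "N": "O"}


def substitute(text: str) -> str:
    """Encode each character independently of its position."""
    return "".join(TABLE[ch] for ch in text)


ciphertext = substitute("ATTACK AT DAWN")
needle = substitute("DAWN")

# Because the mapping ignores position, a plain substring search works
# directly on the ciphertext.
position = ciphertext.find(needle)
```

The same property that makes the search trivial is what makes frequency analysis trivial.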

A different cipher scheme (for example the Vernam cipher) would encode each letter as a different letter depending on its position. So to be sure to find DAWN inside a 1000-character string, you would need to look for 997 different strings: DAWN encoded starting at offset 0, 1, 2, ... 996. This is more secure, and even completely secure if you encode only one string. If you encode lots of different records with the same key, then this devolves into a Vigenère cipher, which can be attacked. The more records there are, the easier the attack.
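To make the offset-by-offset search concrete, here is a rough sketch using XOR with a random pad as a stand-in for a position-dependent cipher. The helper names `xor_bytes` and `find_with_pad` are invented for this example, and the search has to be done by someone who holds the pad, since the word must be re-encoded for every candidate offset.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two byte strings position by position (truncates to the shorter one)."""
    return bytes(d ^ k for d, k in zip(data, key))


def find_with_pad(ciphertext: bytes, pad: bytes, word: bytes) -> list:
    """Try the word at every offset, encoding it with the pad bytes for that offset."""
    hits = []
    for offset in range(len(ciphertext) - len(word) + 1):
        window_key = pad[offset:offset + len(word)]
        if ciphertext[offset:offset + len(word)] == xor_bytes(word, window_key):
            hits.append(offset)
    return hits


plaintext = b"ATTACK AT DAWN"
pad = secrets.token_bytes(len(plaintext))  # one pad byte per position
ciphertext = xor_bytes(plaintext, pad)

# 14 - 4 + 1 = 11 candidate offsets here; 997 for a 1000-character record.
offsets = find_with_pad(ciphertext, pad, b"DAWN")
```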

AES is a block cipher: every byte of a 16-byte ciphertext block depends on all the bytes of the corresponding plaintext block (and, in chained modes, on the IV and the preceding blocks as well). This makes it impossible to look for DAWN because you would need to look for a jillion possible sequences (the encrypted version of DAWN in ATTACK AT DAWN would be different from the DAWN of RETIRE AT DAWN, even if the word starts at the tenth position in each case).

So, use of a strong cipher means you cannot do partial matches but only full matches.

You can very slightly ameliorate this situation by encrypting shorter chunks independently and only looking for matches wider than twice a single chunk. With chunks of 2 characters [only an example; AES won't (efficiently) do that], AW will always be encoded as, say, QR. If you look for DAWN, the plaintext string, once divided into 2-char chunks, will contain either ?D AW N? or DA WN. So you look for DA, AW and WN; if you find the encrypted form of AW, either you retrieve the record, decrypt it, and do a plaintext search for confirmation, or you also need to look for 676 additional combinations (26 × 26, from AD AW NA to ZD AW NZ). With bytes, that's actually 65,536 combinations (256 × 256). And that's with 2-byte chunks; with N-byte chunks each edge can hide up to N-1 unknown bytes, so the count grows exponentially, up to 256^(2(N-1)), which quickly becomes prohibitive.
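The alignment bookkeeping described above can be sketched as follows. For each possible position of the search term inside a chunk, the hypothetical helper `alignments` lists the chunks that the term covers completely (those can be matched on ciphertext directly) and counts how many edge-filler combinations would have to be tried to confirm a match without decrypting. The function name and the `alphabet_size` parameter are assumptions made for this illustration.

```python
def alignments(term: str, n: int, alphabet_size: int) -> list:
    """Enumerate the ways `term` can sit on a grid of n-character chunks.

    Returns one tuple per alignment:
    (offset inside the first chunk, fully covered chunks, edge combinations).
    """
    result = []
    for k in range(n):
        # Index in `term` where the first fully covered chunk begins.
        start = (n - k) % n
        full = [term[i:i + n] for i in range(start, len(term) - n + 1, n)]
        # Unknown characters: k before the term, plus whatever is needed
        # after it to complete the last chunk.
        unknown = k + (-(k + len(term))) % n
        result.append((k, full, alphabet_size ** unknown))
    return result


letters = alignments("DAWN", 2, 26)
as_bytes = alignments("DAWN", 2, 256)
```

The aligned case needs no guessing at all, while every misaligned case multiplies the work by the alphabet size once per unknown edge character.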

And, of course, using short chunks over many records weakens the encryption. In general, the ability to match (arbitrarily small) fragments leaks information.

different approach #1

You can perhaps filter the records, without breaking the encryption, and restrict the returned records to a reasonable number. Then have them transmitted in encrypted form, decrypted on the client, and searched again for the string. This lets you place zero trust in the server; the plaintext never exists on the server in any form. You could use stronger encryption on the records, since you're not going to search the encrypted records that are returned and you have to decrypt them all anyway.
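A minimal sketch of the client-side half of this approach, under loud assumptions: the server has already returned a reduced set of encrypted blobs, and `decrypt` is whatever routine the client really uses. The `stand_in_decrypt` function below is base64 decoding, which is NOT encryption; it is there only so the example runs with the standard library.

```python
import base64
from typing import Callable, Iterable, List


def search_after_decrypt(candidates: Iterable[bytes],
                         decrypt: Callable[[bytes], str],
                         needle: str) -> List[str]:
    """Decrypt each candidate locally and keep those containing the needle."""
    hits = []
    for blob in candidates:
        plaintext = decrypt(blob)
        if needle in plaintext:
            hits.append(plaintext)
    return hits


def stand_in_decrypt(blob: bytes) -> str:
    # Placeholder only: base64 is an encoding, not a cipher.
    return base64.b64decode(blob).decode("utf-8")


# Pretend these blobs came back from the server's coarse filter.
returned_by_server = [
    base64.b64encode(text.encode("utf-8"))
    for text in ("ATTACK AT DAWN", "RETIRE AT DUSK")
]

hits = search_after_decrypt(returned_by_server, stand_in_decrypt, "DAWN")
```

Because matching happens after decryption, partial matches work here even though the stored ciphertext reveals nothing about substrings.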

different approach #2

With some databases (e.g. MySQL), you can have memory storage transient tablespace, or RAM disks, where the information can be stored in plaintext form.

You can decrypt all the information upon user login and store it in a RAM disk or tablespace, thereby rendering it searchable.
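As a rough illustration of the idea, the sketch below uses an SQLite in-memory database from the Python standard library as a stand-in for a MySQL memory tablespace or a RAM disk; that substitution, the table layout, and the name `build_session_index` are all assumptions made for this example. The records passed in are assumed to have been decrypted already at login.

```python
import sqlite3


def build_session_index(decrypted_records):
    """Load already-decrypted (id, body) pairs into a RAM-only database."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO records (id, body) VALUES (?, ?)", decrypted_records
    )
    return conn


conn = build_session_index([(1, "ATTACK AT DAWN"), (2, "RETIRE AT DUSK")])

# Ordinary plaintext search, including partial matches.
rows = conn.execute(
    "SELECT id FROM records WHERE body LIKE ?", ("%DAWN%",)
).fetchall()

# Closing the connection discards the in-memory database.
conn.close()
```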

When the system is powered off or brought to rest, that information is lost, unless specific physical attacks are carried out (so-called "cold boot" or "cryo" attacks, involving the rapid refrigeration of RAM chips to preserve their content).

This leaves you vulnerable to the host system being subverted, but if the client's code is sent by the host system (as it usually happens in Web applications), a host subversion will be enough to break the encryption by simply accessing the user's credentials on the client side.

You do have, therefore, to (limitedly) "trust" the server, albeit not at rest.

If the server is just data storage, and the client code is not hosted there, then you can't go this way without implicitly adding trust in the server security.
