Which tamper-protection algorithm provides the shortest output?

Question

Background

When making web applications you sometimes need to pass data along with the form which isn't visible to the user. Database IDs are the most common, but texts and IDs are sometimes necessary too. For this reason there is a hidden field (<input type="hidden">) in HTML, but it has a drawback: any hacker with a basic understanding of what he is doing can modify the value. It would be better if such values were tamper-protected, if not outright encrypted.

Question

I need to pass a value through hostile territory. The value is most often an integer (usually less than 65'000, so mostly just two non-zero bytes, rarely three, almost never four). However, sometimes the value can also be a string (short, a few hundred characters at most). I'm OK with using different algorithms for different kinds of data.

I want to tamper-protect the value, so that it cannot be changed en route without detection. Hiding the data isn't necessary, but doing so would be a bonus. So either tamper-protection or encryption would work.

Whatever the algorithm, I want the output to be as short as possible without sacrificing too much security. This is because it's a webpage and shorter data means faster response times. Also, the HTML code becomes easier to read if there aren't lengthy random strings in it.

What algorithms would you suggest for this purpose?

Note that "output as short as possible" and "faster response times" may be competing objectives. – May 21 '14 at 09:28

score 8 · Accepted Answer · answered May 21 '14 at 09:45

You are writing out data and reading it back on the same server. You want to ensure that the data that you read back is the same as the data that was written out.

For this use case, symmetric cryptography seems appropriate. Have a single symmetric key that doesn't leave the server. You need to rotate the key only if the server is compromised; this will invalidate sessions in progress, which is likely to happen for other reasons in case of a server compromise as well.

You can add an HMAC checksum of the data; with HMAC-SHA-256 (a strong, recommended choice), this adds 64 bytes if you encode the checksum in hexadecimal or 44 if you encode it in base64 (or 32 if you include it in binary form, but that tends to be hard to parse). There'll probably be a few more bytes' overhead to put the string somewhere. Given the context, a web page, I don't see any point in weakening the cryptographic strength to save a few more bytes.
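As a rough illustration of those sizes, here is a minimal Python sketch using only the standard library. The key `SERVER_KEY` and the helper names `make_tag` and `verify_tag` are hypothetical, chosen for this example; a real deployment would load the key from protected configuration rather than generate it at import time.

```python
import base64
import hashlib
import hmac
import secrets

# Hypothetical server-side key; assumed to never leave the server.
SERVER_KEY = secrets.token_bytes(32)

def make_tag(value: bytes, key: bytes = SERVER_KEY) -> bytes:
    """Return the raw 32-byte HMAC-SHA-256 tag of value."""
    return hmac.new(key, value, hashlib.sha256).digest()

def verify_tag(value: bytes, tag: bytes, key: bytes = SERVER_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(value, key), tag)

tag = make_tag(b"12345")
print(len(tag))                    # 32 (binary)
print(len(tag.hex()))              # 64 (hexadecimal)
print(len(base64.b64encode(tag)))  # 44 (base64, with padding)
```

The value itself still travels in the hidden field; the tag goes next to it, and the server rejects the form if `verify_tag` returns `False`.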

If you want to encrypt the data as well, use AES-128 with an authenticated encryption mode (CCM or GCM are common choices). There will be an overhead of up to 16 bytes of IV (12 bytes is the usual nonce size for GCM), 16 bytes of MAC, up to 16 bytes of padding if the mode needs it (CCM and GCM do not), and a multiplication factor due to the encoding requirement, since the encrypted data can use all byte values (4/3 factor for base64). You'll need to decrypt the data before doing any processing with it; in particular, if some of that data is used on the client side, you'll need either to duplicate it or to combine authenticated encryption with a MAC of that plus the client-readable data.
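To put numbers on that overhead, the following sketch only does the size arithmetic; it performs no encryption. The function names and the two layouts are assumptions for illustration: one uses the worst-case figures quoted above (16-byte IV, padding up to a 16-byte block, 16-byte MAC), the other an assumed GCM-style layout with a 12-byte nonce and no padding.

```python
import math

def base64_len(n_bytes: int) -> int:
    """Length of standard padded base64 for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)

def encoded_size(plaintext_len: int, iv_len: int, tag_len: int, block: int = 0) -> int:
    """Base64 length of IV + ciphertext + tag.

    block=0 models a mode without padding; block=16 pads the plaintext up
    to the next multiple of 16 bytes, always adding at least one byte.
    """
    if block:
        ct_len = (plaintext_len // block + 1) * block
    else:
        ct_len = plaintext_len
    return base64_len(iv_len + ct_len + tag_len)

# A 2-byte value, 16-byte IV, one padded block, 16-byte MAC: 48 bytes.
print(encoded_size(2, iv_len=16, tag_len=16, block=16))  # 64

# The same value with a 12-byte nonce and no padding: 30 bytes.
print(encoded_size(2, iv_len=12, tag_len=16))            # 40
```

Either way, the fixed overhead dominates for a two-byte integer, which is why the choice of mode matters less for size than the decision to encrypt at all.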

Note that protecting the data in this way only authenticates it — it proves that the data was produced by your server. An attacker can still take the data from one page and send it with another page, or a later version of that page. How to avoid this depends on when the data is legitimate. You may want to include the URL or the user ID in the hashed data, for example, to avoid the data being used out of context.
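One possible way to bind a value to its context is to feed the user ID and URL into the MAC together with the value. The sketch below assumes a hypothetical `SERVER_KEY` and hypothetical helper names; the length prefix is one simple way to keep the field boundaries unambiguous, not the only one.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)  # hypothetical server-side key

def encode_fields(*fields: str) -> bytes:
    """Length-prefix each field so that ("ab", "c") and ("a", "bc") differ."""
    out = b""
    for field in fields:
        data = field.encode("utf-8")
        out += len(data).to_bytes(4, "big") + data
    return out

def tag_in_context(value: str, user_id: str, url: str,
                   key: bytes = SERVER_KEY) -> bytes:
    """HMAC-SHA-256 over the context (user ID, URL) and the value."""
    return hmac.new(key, encode_fields(user_id, url, value),
                    hashlib.sha256).digest()

def verify_in_context(value: str, user_id: str, url: str, tag: bytes,
                      key: bytes = SERVER_KEY) -> bool:
    """Accept the tag only for the same value in the same context."""
    expected = tag_in_context(value, user_id, url, key)
    return hmac.compare_digest(expected, tag)

tag = tag_in_context("42", "alice", "/documents/edit")
print(verify_in_context("42", "alice", "/documents/edit", tag))    # True
print(verify_in_context("42", "alice", "/documents/delete", tag))  # False
```

This stops a value from being moved between pages or users, but it does not stop the same user from resubmitting an old value on the same page.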

answered May 21 '14 at 09:45

Gilles 'SO- stop being evil'


Good point about the reuse. I'll keep that in mind. And... I guess you're right about the saving a few bytes too. I just thought that a SHA-1 seems so... unsightly. :) – Vilx- May 21 '14 at 10:12
The question asks for "the shortest output"; hence the result of HMAC should be at least truncated; 64 bits seems appropriate in the context. – fgrieu May 21 '14 at 11:21
I'd go with truncated HMAC-SHA-2 with a length somewhere between 64 and 128 bits. Truncation is fine with HMAC, but do not truncate a GCM / GHash MAC. – CodesInChaos May 21 '14 at 11:52
@CodesInChaos - Truncating an HMAC does not severely affect its security? – Vilx- May 21 '14 at 13:07
@Vilx- With HMAC the chance of accepting a forgery is $2^{-n}$ for an $n$ bit MAC. With GCM every successful forgery reveals a big part of the key whereas for HMAC the only effect is that you accept a forgery. So if you really need to go below 128 bit MACs, I recommend against GCM. Another issue with GCM is that it needs a nonce and reuse is fatal, whereas HMAC doesn't need a nonce. – CodesInChaos May 21 '14 at 13:09
If you encrypted with AES-128, would you need HMAC too? – Bohemian May 21 '14 at 15:02
@Bohemian: if you care about integrity, you need to add something that actually gives an integrity guarantee. Common encryption modes, such as CBC or CTR, do not. – poncho May 21 '14 at 15:13
This answer is not sufficient, because it doesn't provide freshness (it doesn't prevent replaying of old values). Also, in practice you probably want to encrypt by default, too, because if you make encryption optional it is too easy for there to be some value that was confidential but where you forgot to enable encryption. – D.W. May 22 '14 at 22:31
@D.W. Given the information in the question, there's a generic approach to authentication and confidentiality, but not to freshness. That's why I explain in my last paragraph that replays, and more generally uses out of context, are possible. – Gilles 'SO- stop being evil' May 22 '14 at 22:38
@Gilles, as you say, there are two separate issues: (1) uses out of context, and (2) replays. In my opinion, a good solution needs to solve both ("you may want to" is not enough). In my opinion, this is something that is not optional; it's mandatory, if we want to deploy this in practice and be secure. So, I think this answer would benefit from more on how to handle the practical security challenges (the hard parts of this problem are not the crypto algorithms but how to ensure they'll be used appropriately in practice). Optional security will often fall short, because it will be left disabled. – D.W. May 22 '14 at 22:42
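The truncation discussed in the comments above can be sketched as follows. This is a minimal example, assuming a hypothetical `SERVER_KEY` and a 64-bit tag (the low end of the 64 to 128 bit range suggested); the helper names are made up for illustration.

```python
import base64
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)  # hypothetical server-side key
TAG_BYTES = 8                         # 64-bit tag; 16 would give 128 bits

def short_tag(value: bytes, key: bytes = SERVER_KEY, n: int = TAG_BYTES) -> str:
    """Leftmost n bytes of HMAC-SHA-256, as unpadded URL-safe base64."""
    full = hmac.new(key, value, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(full[:n]).rstrip(b"=").decode("ascii")

def verify_short(value: bytes, tag: str, key: bytes = SERVER_KEY,
                 n: int = TAG_BYTES) -> bool:
    """Recompute the truncated tag and compare it in constant time."""
    expected = short_tag(value, key, n)
    return hmac.compare_digest(expected.encode("ascii"), tag.encode("utf-8"))

tag = short_tag(b"12345")
print(len(tag))                      # 11 characters for a 64-bit tag
print(verify_short(b"12345", tag))   # True
```

With an 8-byte tag, a blind forgery attempt succeeds with probability $2^{-64}$ per try, so rate-limiting failed verifications remains a sensible precaution.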

score 3 · Answer 2 · answered May 21 '14 at 15:11

I wrote this response while thinking I was on Information Security. Oops. Anyway, I think it may be helpful, so sorry if this is not exactly a "cryptography" POV. The point of this response is: in this case encryption or hashing is not a good solution. It has a lot of problems, because there is very little entropy and it has to work over HTTP.

Full programming-and-security POV response:

While this does not answer the question, consider avoiding "hidden" values that can be tampered with altogether.

If this is a resource ID, simply check whether the user has the right to operate on that resource after receiving the form. Otherwise, read on.

Give each form instance a random token (long enough that it cannot be easily guessed by brute force), and store the "secret" data associated with the form token somewhere server-side. Upon receiving the form, pull up the secret data associated with the token, and then delete the token and data on the server side.

How to do that:

If you are using sessions, put the sensitive data in an associative array/map/hash (or whatever name of that data structure appeals to you) inside the session.
Otherwise, put it in a database (remember to delete old unused entries every hour or so).
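The token scheme described above can be sketched as an in-memory store. Everything here is hypothetical (the class `FormTokenStore`, its method names, the one-hour lifetime); a real application would keep the entries in the session or a database as described, and the `now` parameter exists only to make the behaviour easy to test.

```python
import secrets
import time

class FormTokenStore:
    """One-time form tokens mapped to server-side secret data."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self._ttl = ttl_seconds
        self._entries = {}  # token -> (expiry time, data)

    def issue(self, data, now=None) -> str:
        """Store data and return a fresh random token for the form."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)  # 128 random bits
        self._entries[token] = (now + self._ttl, data)
        return token

    def redeem(self, token: str, now=None):
        """Return the data once and delete it; None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._entries.pop(token, None)
        if entry is None:
            return None
        expiry, data = entry
        return data if now <= expiry else None

    def purge(self, now=None) -> int:
        """Delete expired entries; meant to be called periodically."""
        now = time.time() if now is None else now
        stale = [t for t, (expiry, _) in self._entries.items() if expiry < now]
        for token in stale:
            del self._entries[token]
        return len(stale)

store = FormTokenStore()
token = store.issue({"document_id": 42})
print(store.redeem(token))  # {'document_id': 42}
print(store.redeem(token))  # None, because the token was already used
```

Because `redeem` deletes the entry, a second submission of the same form is rejected, which is what gives the replay protection listed below.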

Why you should do that:

Protection against CSRF, which is the SQL Injection of the modern day
Your "tamper proof" data is quite short, so there is not a lot of entropy in it. If all you did was hash the data using a secret key, a determined attacker may be able to build a "rainbow table" of all possible values.
Protection against replay attacks (sending the same data twice)

Things to consider:

Birthday paradox: if you are using a single DB to hold all the data, the tokens have to be extra long to avoid collisions.
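To get a feel for how long "extra long" needs to be, the standard birthday approximation $p \approx 1 - e^{-n(n-1)/(2N)}$ can be evaluated for $n$ live tokens drawn from $N = 2^{b}$ possibilities. The function name and the example figure of one million tokens are assumptions for illustration.

```python
import math

def collision_probability(n_tokens: int, token_bits: int) -> float:
    """Birthday approximation: p = 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** token_bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n_tokens * (n_tokens - 1) / (2 * space))

for bits in (32, 64, 128):
    p = collision_probability(1_000_000, bits)
    print(f"{bits}-bit tokens, 1e6 entries: p = {p:.3g}")
```

With a million entries, 32-bit tokens collide almost surely, 64-bit tokens collide with probability on the order of $10^{-8}$, and 128-bit tokens make collisions negligible.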

Last but not least, I have no idea what you are doing, but saving 32 bytes from each request (the SHA-256 length) seems a bit pointless. You can achieve a lot more by optimizing images, removing whitespace from HTML, minifying JavaScript and CSS, using compression... 32 bytes is not that much in terms of webpages.

edited May 22 '14 at 06:25

answered May 21 '14 at 15:11

OhJeez


I wrote this response while thinking I was on Information Security. Oops. – Well, in that case: please feel very welcome to Crypto.SE… ;) – e-sushi May 21 '14 at 15:15
This seems like the best answer in this situation: access should be controlled by permissions, and once you have permissions in place it shouldn't really matter how the request came about - either the user is allowed to see the record or they're not. – nobody May 21 '14 at 15:25
This has less to do with visibility and more with tampering and the stateless nature of The Web. The current mantra is to make your webpages as stateless as possible. So, basically, when I open a document for editing in my browser, all the data should be on client side, and not server. Then, when I save the document, all the data should be posted back. The problems arise when permissions only allow a person to edit some parts of the document, but not others. Without a copy of the original document, I cannot tell on server side which fields were modified. – Vilx- May 21 '14 at 16:11
Storing a copy on the server side however becomes pretty tedious pretty quickly, especially since a person can open multiple browser tabs and edit several similar documents at once. It also leaks memory, because it's hard to tell when a browser tab has closed and when I can dispose of the server-side copy. Therefore a more elegant solution is to always pass all data to the client, but protect the sensitive parts. – Vilx- May 21 '14 at 16:16
The attacks you mention however can all be mitigated by using frequently changing keys and tokens (on opening of a document). Thank you for pointing them all out, I will definitely keep them in mind when designing my tamper-protection! – Vilx- May 21 '14 at 16:20
@Vilx- You cannot do everything client-side, because of CSRF and replay attacks. Without server-side state you cannot tell a replayed request from the first one. Changing keys and tokens on opening of a document is nothing else than having to update server state on each client request, which kind of defeats the purpose. You are trying to offload security to the client, which is a very bad idea. Send everything, but accept only allowed changes, and what is allowed should be based on server state. – OhJeez May 21 '14 at 17:37
Hmm... true. Well, how about this - I don't want to save the document on the server side, but I can generate a new key/salt/whatever when the user logs on and save that in the session. Then another key/salt/whatever is generated for each opening of the form. BOTH are used to encrypt or hash the values. That way you can't replay or CSRF anything, even within the same session. – Vilx- May 21 '14 at 21:35
@Vilx- then just store in the session the thing you want to encrypt and don't send it to the user. Much simpler. – OhJeez May 22 '14 at 06:23
@OhJeez - True, but this presents me with a memory leak. How do I know when I can remove it from the session? Also I need to differentiate between different open browser tabs. This gets pretty hairy pretty quickly. Encryption is actually simpler (if I have the proper primitives, of course). – Vilx- May 22 '14 at 07:49
1. No, it does not; sessions are gc'ed after they expire. 2. You don't need to differentiate in any way. 3. Getting data from the server, encrypting it, deleting it (?) and sending it to the user, just so that the user sends it back, it gets decrypted and saved again: THAT is hairy. What if the user does not send the data back? What if he tries to open it twice? Stick to proven solutions. Don't experiment with such things if strangers on the internet can point out a few common fallacies in your experiments at first glance. – OhJeez May 22 '14 at 17:05

score 2 · Answer 3 · answered May 21 '14 at 12:10

I second Gilles' suggestion of using an authenticated encryption mode. In particular, if your crypto library provides it, this sounds like a perfect job for SIV mode (RFC 5297).

SIV is designed to be a "maximally misuse-resistant" authenticated encryption (AEAD) mode for securely encrypting and tamper-proofing (relatively) short messages (such as private keys for other cryptosystems; hence it is often described as a key-wrapping mode). It provides the following features:

As an authenticated encryption mode, SIV both encrypts the message and protects it from tampering. In fact, it is an AEAD mode, meaning that it can also, optionally, be used to tamper-protect unencrypted messages, or messages that are only partially encrypted (e.g. unencrypted headers + encrypted data).
SIV can be used with or without a per-message nonce. When used with a nonce, it provides full IND-CCA2 security; when used without one, it still retains almost all of this security, except for the fact that identical plaintexts can be detected by the fact that they will encrypt to identical ciphertexts. (Contrast this with many AEAD modes like GCM, where a missing or reused nonce can completely destroy the security of the scheme.)

(Of course, like most encryption modes, SIV also leaks the length of the plaintext, unless the plaintext messages are padded to a fixed length before encryption. This is universal enough that cryptographers tend to assume it without explicitly stating it, but it can sometimes be an important issue in practice.)
Like most authenticated encryption modes, SIV is based on a block cipher. In principle, any secure block cipher may be used, but AES is generally recommended. Unlike, say, GCM, SIV does not require any other low-level operations such as (generic) Galois field multiplication.
When SIV-AES is used without a nonce, the ciphertext is 128 bits (= 16 bytes) longer than the plaintext. (Since the ciphertext will, as usual, look like random binary data, you will have to e.g. base64-encode it to embed it in an HTML form, increasing its length by a further 33%.) If used with a nonce, the nonce will, of course, have to be somehow transmitted alongside the ciphertext as well.

(Technically, it would be possible to modify SIV to shorten the padding down to something like 64 bits (= 8 bytes), but I would not recommend this without a careful security analysis. 128 bits provides a comfortable security level, and the few extra bytes per message are rarely prohibitively expensive.)

The one major disadvantage of SIV, as compared to other AEAD modes like GCM, is that it's an inherently two-pass mode: the plaintext must be processed twice from beginning to end, which reduces its efficiency for long messages. (This is an unavoidable consequence of the maximal misuse-resistance goal, since it requires that every part of the plaintext must affect the encryption of every other part.) Fortunately, for your purposes, this does not seem a major issue.

Another, hopefully temporary, problem for SIV is that it's not yet as widely implemented in crypto libraries as older AEAD modes. While it's possible to implement SIV yourself based on CMAC and CTR mode (or even just a raw block cipher), this is not completely trivial. Hopefully, this issue will disappear over time, as more crypto libraries add support for SIV.

I'm using .NET (C#). Usually it's possible to find libraries for most things, but if not - .NET provides a good set of cryptographic functions (primitives?) to build upon. I'll read your links and see which method suits me best. — Vilx-, May 21 '14 at 13:06
This answer is not sufficient, because it doesn't provide freshness (it doesn't prevent replaying of old values). The fact that your scheme provides both confidentiality and integrity/authenticity for all values is a good thing, though. — D.W., May 22 '14 at 22:34
@D.W.: Good point, although it really goes beyond algorithm choice, and into threat modelling. The general answer, with an AEAD algorithm, is to pass any extra (meta)data, which you want to tie the protected data to, into the algorithm as Associated Data. This might include e.g. a form ID, a timestamp, the user ID and any other hidden fields on the form. Note that all of those effectively become integrity-protected too: the authentication tag will only match if all of them are unchanged (although an attacker might still be able to replace all of them, together, with an earlier set of values). — Ilmari Karonen, May 23 '14 at 17:23
... A convenient feature of SIV mode, in this regard, is that it can take a tuple of strings as associated data, rather than just a single string, so you don't have to worry about details like unambiguous encoding. — Ilmari Karonen, May 23 '14 at 17:29
@D.W.: Also worth noting is that, sometimes, you might not care about replay attacks: for example, I once worked on an in-house web app framework that supported passing arbitrary serialized data structures as hidden variables. On the framework level, we did not care about replay or pick-and-mix attacks -- those, if they were an issue, would be checked for at the application level. But we did want to stop untrusted data from going into the deserialization library, since it wasn't designed with security in mind, and had "features" that allowed e.g. arbitrary code execution. ... — Ilmari Karonen, May 23 '14 at 17:35
... Hence, we just slapped a MAC onto any serialized data items, without any associated data; if users wanted to poke into the HTML code and replace one serialized string with another, that was fine with us, as long as we knew they couldn't pwn the server by feeding in some serialized code and telling the deserializer to overwrite a common library function with it. — Ilmari Karonen, May 23 '14 at 17:38

score 2 · Answer 4 · answered May 22 '14 at 22:30

You need integrity/authenticity and freshness (replay prevention), and in many contexts confidentiality (depending upon the situation). Mere encryption is not enough, because it doesn't provide integrity/authenticity and doesn't prevent replays. A better scheme would be authenticated encryption, with a nonce that is checked on the server and verified not to repeat.

In practice you probably want to encrypt by default (i.e., always provide confidentiality by default), because if you make encryption optional it is too easy for there to be some value that was confidential but where you forgot to enable encryption.

You also need the value to be bound to this particular session and this particular context (e.g., this particular form and form attribute). Otherwise, an attacker can cut-and-paste a hidden value from one form into another form, and that might enable attacks.
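A minimal sketch of how binding and freshness might fit together is shown below. It uses a plain MAC rather than authenticated encryption, so it gives no confidentiality, and every name in it (`SERVER_KEY`, `protect`, `accept`, the in-memory `_seen_nonces` set, the 900-second lifetime) is an assumption for illustration. Note that replay detection needs server-side state: the set of nonces already seen.

```python
import hashlib
import hmac
import secrets
import time

SERVER_KEY = secrets.token_bytes(32)  # hypothetical server-side key
_seen_nonces = set()                  # hypothetical store; a real one would expire entries

def _mac(key: bytes, *fields: str) -> bytes:
    """HMAC-SHA-256 over length-prefixed fields (unambiguous encoding)."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    for field in fields:
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()

def protect(value: str, session_id: str, form_id: str, field: str,
            ttl: int = 900, now=None, key: bytes = SERVER_KEY) -> dict:
    """Return the hidden-field contents: value, nonce, expiry and tag."""
    now = time.time() if now is None else now
    nonce = secrets.token_hex(8)
    expires = str(int(now + ttl))
    tag = _mac(key, session_id, form_id, field, nonce, expires, value)
    return {"value": value, "nonce": nonce, "expires": expires, "tag": tag.hex()}

def accept(fields: dict, session_id: str, form_id: str, field: str,
           now=None, key: bytes = SERVER_KEY) -> bool:
    """Accept only an authentic, unexpired, not yet seen value for this context."""
    now = time.time() if now is None else now
    try:
        tag = bytes.fromhex(fields["tag"])
        expires = int(fields["expires"])
        expected = _mac(key, session_id, form_id, field,
                        fields["nonce"], fields["expires"], fields["value"])
    except (KeyError, ValueError, TypeError, AttributeError):
        return False
    if not hmac.compare_digest(expected, tag):
        return False  # tampered, or moved to another session, form or field
    if now > expires or fields["nonce"] in _seen_nonces:
        return False  # stale or replayed
    _seen_nonces.add(fields["nonce"])
    return True

hidden = protect("42", "session-A", "edit-document", "owner_id")
print(accept(hidden, "session-A", "edit-document", "owner_id"))  # True
print(accept(hidden, "session-A", "edit-document", "owner_id"))  # False (replay)
```

Swapping the MAC for an AEAD mode, with the session, form and field names passed as associated data, would add confidentiality on top of the same structure.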

Many of these considerations are security considerations, more so than crypto algorithm issues. For that reason, the primary challenges here are IT Security challenges, rather than Crypto challenges -- and you might want to flag your question to have it migrated over to Security.SE.

Look up ASP.NET view state encryption for a real-world example of such a mechanism.
