Why AEAD instead of encrypting with a simple hash?

Question

I hear a lot about AEAD encryption (GCM, EAX, ...). Why is it insecure (which it seems to be, since AEAD exists) to do the following: take a block of data, hash it, append the hash to the data, then encrypt the result (data + hash)? Intuitively, I would think that it is not possible to corrupt the data and then corrupt the hash to hide the changes, as both are encrypted.

EDIT: I found something that made me realise the utility of the header in AEAD: it stores information such as the algorithm used. However, I don't need a header in my case, so is it OK to use encrypt(data + hash(data))?

"the header in AEAD"; the concept of an AEAD doesn't imply a 'header'. Are you talking about the AAD information. That's not (necessarily) a header; instead, it is actually 'encryption context', there so that an attacker cannot present one valid ciphertext he found into a different context. This context information can be included with the ciphertext (and checked by the receiver), however it can also be included implicitly (e.g. this is the seventh encrypted TLS record received on this connection). — poncho, Jun 21 '20 at 12:15

hakoja · Accepted Answer · 2020-06-21T12:18:20.597

So suppose we define our encryption scheme as follows:

$E(K, M) = \operatorname{CTR}(K, M || H(M))$,

where $H$ is a hash function (e.g., SHA2-256), and $\text{CTR}$ is the counter mode-of-operation of some underlying blockcipher (e.g., AES-128). Now suppose we observe the ciphertext $C = C_M || C_T $ of a known message $M$ and want to modify some bits in $C$ so that it decrypts to some other message $M'$. Here $C_M$ denotes the part of the ciphertext which contains the encrypted part of the message itself, while $C_T$ contains the encrypted part of the hash of the message. In more detail:

$C = \overbrace{10000110100011}^{C_M} || \overbrace{1100010}^{C_T}\\ \phantom{C} = \overbrace{00100011110010}^{M} || \overbrace{0010100}^{T = H(M)} \\ \hspace{3.5cm} \oplus \\ \phantom{C =}\ \underbrace{10100101010001 || 1110110}_{\text{CTR keystream}}$

For simplicity, suppose we want to create $M'$ by flipping bits 1, 3, and 13 in the original message $M$. First we start by simply flipping bits 1, 3, and 13 in $C_M$. This gives

$C' = \overbrace{\color{red}{0}0\color{red}{1}001101000\color{red}{0}1}^{C_{M'}} || \overbrace{1100010}^{C_T}$

When the $C_{M'}$ part is decrypted, this will yield $M'$ due to the properties of the CTR mode-of-operation:

$\overbrace{\color{red}{0}0\color{red}{1}001101000\color{red}{0}1}^{C_{M'}}\\ \hspace{1.5cm} \oplus \\ 10100101010001 \dots \quad (\text{CTR keystream})\\ \color{red}{1}0\color{red}{0}000111100\color{red}{0}0 \quad = M'$

However, now the hash won't match anymore. So we also need to modify $C_T$ into $C_{T'}$ such that when $C_{T'}$ is decrypted it yields $T' = H(M')$, i.e., the correct hash of our modified message $M'$. But this is easy since we know $M$ and $C_T$: first compute $T' = H(M')$ and suppose $T$ and $T'$ differ in bits, say, 2, 3, and 7, i.e., $T' = H(M') = 0\color{red}{10}010\color{red}{1}$. Now we simply flip bits 2, 3, and 7 in $C_T$ to get $C_{T'}$, and this will decrypt to $T'$. Thus our full ciphertext is:

$C' = C_{M'} || C_{T'}$,

which when decrypted yields:

$C' \oplus \text{CTR keystream} = M' || T' = M' || H(M')$.
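The whole attack can be written compactly as $C' = C \oplus \big((M \oplus M') || (H(M) \oplus H(M'))\big)$. Below is a minimal Python sketch of it, assuming a toy keystream built from SHA-256 in counter fashion as a stand-in for AES-CTR (an assumption made only so the example runs with the standard library; the attack uses nothing but the XOR structure, so it should carry over to real CTR mode). The names `keystream`, `encrypt`, `decrypt`, and `forge` are hypothetical helpers, not part of any library.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy CTR-style keystream: SHA-256(key || nonce || counter) blocks.
    # Stand-in for AES-CTR, used here only to avoid external dependencies.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
    # E(K, M) = CTR(K, M || H(M))
    plaintext = message + hashlib.sha256(message).digest()
    return xor(plaintext, keystream(key, nonce, len(plaintext)))

def decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    plaintext = xor(ciphertext, keystream(key, nonce, len(ciphertext)))
    message, tag = plaintext[:-32], plaintext[-32:]
    if hashlib.sha256(message).digest() != tag:
        raise ValueError("hash mismatch")
    return message

def forge(ciphertext: bytes, known: bytes, target: bytes) -> bytes:
    # The attacker knows M and C but NOT the key.
    assert len(known) == len(target)
    delta_m = xor(known, target)                      # M xor M'
    delta_t = xor(hashlib.sha256(known).digest(),     # H(M) xor H(M')
                  hashlib.sha256(target).digest())
    return xor(ciphertext, delta_m + delta_t)

key, nonce = b"k" * 16, b"n" * 8
m = b"pay Bob  10 USD"
m_prime = b"pay Eve 999 USD"

c = encrypt(key, nonce, m)
c_prime = forge(c, m, m_prime)        # no key needed
print(decrypt(key, nonce, c_prime))   # prints b'pay Eve 999 USD'
```

The receiver's hash check passes on the forged ciphertext because anyone can compute $H(M')$: the hash involves no secret.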

Note that this exact attack won't work as-is against other modes of operation that lack integrity protection. However, analogous attacks are usually easy to come up with. For example, see here for the analogous attack on CBC-mode.

In conclusion: your suggested scheme, albeit natural, fails to provide integrity. That is why modes like GCM, CCM, and EAX exist.
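For contrast, here is a hedged sketch of what changes once the tag depends on a secret key. It is not GCM, CCM, or EAX themselves; it is a generic encrypt-then-MAC construction using HMAC-SHA256 from Python's standard library, with the same toy SHA-256-based keystream standing in for AES-CTR (both are assumptions chosen for illustration, and `seal` and `open_box` are hypothetical names). In practice you would use a vetted AEAD implementation rather than rolling your own.

```python
import hashlib
import hmac

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy CTR-style keystream (stand-in for AES-CTR, illustration only).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, message: bytes) -> bytes:
    ciphertext = xor(message, keystream(enc_key, nonce, len(message)))
    # The tag is keyed and covers the nonce and the ciphertext.
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def open_box(enc_key: bytes, mac_key: bytes, nonce: bytes, sealed: bytes) -> bytes:
    ciphertext, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return xor(ciphertext, keystream(enc_key, nonce, len(ciphertext)))

enc_key, mac_key, nonce = b"e" * 16, b"m" * 16, b"n" * 8
m = b"pay Bob  10 USD"
m_prime = b"pay Eve 999 USD"

sealed = seal(enc_key, mac_key, nonce, m)

# Same bit-flipping trick on the message part; the attacker cannot
# recompute the tag for the new ciphertext without mac_key.
tampered = xor(sealed[:-32], xor(m, m_prime)) + sealed[-32:]
try:
    open_box(enc_key, mac_key, nonce, tampered)
except ValueError as err:
    print("rejected:", err)
```

The difference from the hash-based scheme is that the attacker can still flip ciphertext bits but has no way to produce a matching tag.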

Sorry, I am new to cryptography. Could you explain the meaning of the letters you used (T, T', Cm, Ct, ...)? — moutonlapin28, Jun 21 '20 at 11:13
