I only understand how to assure integrity using a hash function. How can cryptography be used to assure data integrity?
1 Answer
If I want to ask a potentially compromised server to remember a file that I don't have room to store myself, I can pick and remember a 256-bit secret uniformly at random, and compute a short—say, 128-bit—authenticator (or MAC, message authentication code) for the file under the secret key. I keep the key on my person; I affix the authenticator to the file.
- Standard MAC algorithms include Poly1305, which is very fast but can handle only one file per secret key, and HMAC-SHA256, which is much slower but can handle many files per secret key.
If, when I ask the server to retrieve my file, the server tries to fool me into accepting a file that is different from the one I stored, I can recompute the authenticator using the secret key, and compare it to the one that was stored alongside the file. If they match, then it is almost certainly the file I stored. If they don't match, then the file was modified.
- The technical property that a MAC has is existential unforgeability under chosen-message attack: we conjecture, or prove in the case of one-time authenticators like Poly1305, that an adversary who can learn the authenticator for one or many messages of their choice has only negligible probability of finding the true authenticator themselves for any other message. That is, we consider a game where the adversary can query you, the bearer of the secret key, for the authenticator for one or many messages of their choice; then the adversary wins if they can find, without asking you, the correct authenticator on any other message.
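The store-and-verify flow above can be sketched with Python's standard library. This is only a minimal illustration under assumptions of mine: HMAC-SHA256 as the MAC, truncated to 128 bits to match the authenticator size mentioned above, and the hypothetical helper names `make_tag` and `verify_tag`.

```python
import hashlib
import hmac
import secrets

def make_tag(key: bytes, data: bytes) -> bytes:
    """Return a 128-bit authenticator: HMAC-SHA256 truncated to 16 bytes."""
    return hmac.new(key, data, hashlib.sha256).digest()[:16]

def verify_tag(key: bytes, data: bytes, tag: bytes) -> bool:
    """Recompute the authenticator and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, data), tag)

key = secrets.token_bytes(32)  # 256-bit secret, kept off the server
stored_file = b"contents of the file I ask the server to remember"
stored_tag = make_tag(key, stored_file)  # affixed to the file on the server

# Later: the server returns a file and a tag; accept only if they verify.
print(verify_tag(key, stored_file, stored_tag))
print(verify_tag(key, stored_file + b"!", stored_tag))
```

Because HMAC-SHA256 is a many-time MAC, the same `key` could be reused for other files in this sketch; a one-time authenticator like Poly1305 would need a fresh key per file.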
What if I want someone else to be able to verify integrity, without their being able to forge the messages? For example, I want to make a promise in a contract, and publish the contract so that anyone can read it, but I don't want to let anyone else alter the contract. I first share a public key with everyone, and then use the corresponding private key to digitally sign the contract. Anyone can use the public key to verify the signature. Only I, with secret knowledge of the private key, can make a signature that will pass verification. So anyone can verify, but only I can sign.
Standard signature algorithms include RSASSA-PSS, which is based on the mathematical magic of the RSA trapdoor permutation $m \mapsto m^3 \bmod n$ for $n = pq$ a product of large randomly chosen primes, and Ed25519, which is based on arithmetic in the scalar ring of the twisted Edwards elliptic curve $-x^2 + y^2 = 1 - \frac{121665}{121666} x^2 y^2$ over the finite field $\mathbb Z/(2^{255} - 19)\mathbb Z$.
The technical property that a digital signature scheme has is also called existential unforgeability under chosen-message attack, but in the public-key setting where the adversary also has access to the public key in addition to being able to query you for the signature on any message of their choice.
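To make "anyone can verify, but only I can sign" concrete, here is a deliberately insecure toy built on the cube map $m \mapsto m^3 \bmod n$. It is not RSASSA-PSS and not Ed25519: the primes are tiny, there is no padding or randomization, and the message is figured in through a plain hash. The names `sign`, `verify`, and `digest_to_int` are hypothetical; a real application should use a vetted library.

```python
import hashlib

# Toy parameters, an assumption for illustration only: these primes are far
# too small to be secure, and this is NOT RSASSA-PSS (no padding, no salt).
P = 1_000_000_007
Q = 998_244_353
N = P * Q                          # public modulus n = pq
E = 3                              # public exponent, as in m -> m^3 mod n
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def digest_to_int(message: bytes) -> int:
    """Hash the message and reduce it into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Invert the cube map on the digest; this requires the private D."""
    return pow(digest_to_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Cube the signature with the public (N, E) and compare to the digest."""
    return pow(signature, E, N) == digest_to_int(message)

contract = b"I promise to pay the bearer on demand."
signature = sign(contract)
print(verify(contract, signature))
print(verify(contract, (signature + 1) % N))
```

Verification uses only `N` and `E`, which are public; signing needs `D`, which in a real scheme is infeasible to recover without the factors of `N`.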

-
Using a hash or cryptography, if the data was corrupted, is retransmission necessary, or is the data recoverable? – Ed S Jul 14 '18 at 15:02
-
@EdS An authenticator or signature only detects errors. It cannot correct them. Whether your protocol can recover from a failure like this, or can retransmit, depends on your protocol. – Squeamish Ossifrage Jul 14 '18 at 15:11
-
A MAC (message authentication code) is similar to a hash. What are the advantages of using a MAC instead of a hash? – Ed S Jul 14 '18 at 15:26
-
@EdS ‘Hash’ is a very general term. If you mean a fixed public hash function like SHA-256, anyone can evaluate it, but only you can evaluate HMAC-SHA256 under your secret key. If, when storing a file $m$, you use $\operatorname{SHA256}(m)$ instead of $\operatorname{HMAC-SHA256}_k(m)$ as the ‘authenticator’ (where $k$ is your secret key), the server could replace it by a different file $m'$ and hand you $\operatorname{SHA256}(m')$ and you would be none the wiser. But the server can't compute $\operatorname{HMAC-SHA256}_k(m')$ without $k$. – Squeamish Ossifrage Jul 14 '18 at 15:29
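The substitution described in this comment can be simulated in a few lines. This is a sketch, not a real protocol: the variable names are hypothetical, and the "server" is just code that knows everything except the key `k`.

```python
import hashlib
import hmac
import secrets

def sha256_tag(data: bytes) -> bytes:
    """Unkeyed 'authenticator': anyone can compute it."""
    return hashlib.sha256(data).digest()

def hmac_tag(key: bytes, data: bytes) -> bytes:
    """Keyed authenticator: needs the secret key to compute."""
    return hmac.new(key, data, hashlib.sha256).digest()

k = secrets.token_bytes(32)      # known to me, not to the server
m = b"original file"
m_forged = b"substituted file"

# Unkeyed: the server computes a matching SHA-256 value for m' itself.
returned_file, returned_tag = m_forged, sha256_tag(m_forged)
sha_accepts_forgery = hmac.compare_digest(sha256_tag(returned_file), returned_tag)

# Keyed: without k, the server can only replay the old tag (or guess).
returned_file, returned_tag = m_forged, hmac_tag(k, m)
hmac_accepts_forgery = hmac.compare_digest(hmac_tag(k, returned_file), returned_tag)

print(sha_accepts_forgery, hmac_accepts_forgery)
```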
-
It may also be worth noting that both Poly1305 and digital signatures can be computed over a hash of the message instead of the message itself. In the case of Poly1305 this can still be a bit faster than HMAC-SHA256, though slower than something like KMAC.
Since the hash function used is presumably immune to second-preimage attacks, the server can't find a new message m' that has the same output as Hash(m). So signing or MACing Hash(m) is just as good as signing or MACing m itself, and for digital signatures is often much faster.
– SAI Peregrinus Jul 14 '18 at 19:06
-
@SAIPeregrinus Signing or authenticating $H(m)$ instead of $m$ itself makes you vulnerable to collisions in $H$. This mistake can be put into the design of the signature scheme itself too, like the standard and widely used RSASSA-PSS, which, despite having a randomization $r$, figures the message $m$ into the signature via $H(r \mathbin| H(m))$ rather than $H(r \mathbin| m)$. This mistake was actually exploited in an international incident of industrial espionage by the governments of the United States and Israel against Iran. – Squeamish Ossifrage Jul 14 '18 at 20:23
-
@SAIPeregrinus For a one-time authenticator, there's no reason to use $\operatorname{Poly1305}_k(H(m))$ instead of $\operatorname{Poly1305}_k(m)$, except to waste computation. For a many-time authenticator, you can [use $H_{k_0}(\operatorname{Poly1305}_{k_1}(m))$](https://crypto.stackexchange.com/a/59218/49826), which is faster than $\operatorname{Poly1305}_{k_1}(H_{k_0}(m))$ because the slow PRF $H_{k_0}$ need be evaluated only on a 128-bit input, while the fast universal hash Poly1305 can handle the long message. – Squeamish Ossifrage Jul 14 '18 at 20:27
-
@SAIPeregrinus So, in general, no, it is not a good idea to figure the message $m$ into a signature via $H(m)$ for some fixed function $H$: it's either slower or less secure or both. In contrast, for example, Ed25519 figures the message in via $H(r \mathbin| m)$ where $r$ is a pseudorandom function of the message under the long-term private key, and NaCl crypto_secretbox_xsalsa20poly1305 derives a per-message one-time authenticator key for Poly1305 from the PRF XSalsa20: $\operatorname{Poly1305}_{H_k(\mathit{nonce})}(\mathit{ciphertext})$. – Squeamish Ossifrage Jul 14 '18 at 20:31
-
It is possible to use a plain hash (e.g., SHA-128) of the file, if you store the hash locally. The compromised server can hand you back the modified file, but when you recompute the hash and compare it to your saved value, there will be a mismatch. – Martin Bonner supports Monica Jul 14 '18 at 22:16
-
Given that Poly1305 requires you to save a different secret key for each file, why not just save the hash? (The hash may be slightly bigger, but on the other hand, an attacker wanting to compromise you would have to alter it - with a secret key, they just need to read it). – Martin Bonner supports Monica Jul 14 '18 at 22:17
-
@MartinBonner It depends on the application. Maybe each file has a unique number, in which case you can derive the one-time Poly1305 key for each file as a PRF of its file number under a long-term secret. My point in mentioning both Poly1305 and HMAC-SHA256 is that there are multiple different ways to use cryptography for data integrity with qualitatively different specific security goals and performance characteristics: one-time authenticators, many-time authenticators, digital signatures. You need to be clear on the goals in order to use them. – Squeamish Ossifrage Jul 15 '18 at 00:10
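One way to picture the per-file key derivation mentioned here, as a sketch only: I am assuming HMAC-SHA256 as the PRF and an 8-byte big-endian encoding of the file number, neither of which is specified in the comment. The helper `one_time_key` is hypothetical, and the Poly1305 evaluation itself is omitted because it is not in Python's standard library.

```python
import hashlib
import hmac
import secrets

def one_time_key(long_term_secret: bytes, file_number: int) -> bytes:
    """Derive a 32-byte per-file key as PRF(long-term secret, file number)."""
    encoded = file_number.to_bytes(8, "big")  # fixed-width, unambiguous encoding
    return hmac.new(long_term_secret, encoded, hashlib.sha256).digest()

long_term_secret = secrets.token_bytes(32)  # the only thing I must remember
key_for_file_7 = one_time_key(long_term_secret, 7)
key_for_file_8 = one_time_key(long_term_secret, 8)

print(len(key_for_file_7), key_for_file_7 != key_for_file_8)
```

The output is 32 bytes, which is the size of a Poly1305 key. The scheme only works if each file number is used for exactly one file, since the derived key must authenticate only one message.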
-
@MartinBonner By the way, I've never heard of a SHA-128, but if it were what its name suggests—a 128-bit ‘collision-resistant’ hash—it would not be up to modern standards for security: a birthday attack costs $2^{64}$ hash evaluations, which is not only within the realm of feasibility for human engineering, but is actually done every second by the Bitcoin network today. Storing a known-good SHA-256 hash would work, though. – Squeamish Ossifrage Jul 15 '18 at 00:13
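The square-root cost of a birthday attack can be seen at small scale. The sketch below truncates SHA-256 to 24 bits as a stand-in for a short hash (an assumption chosen so that it runs in a moment) and counts evaluations until two inputs collide; the function names are hypothetical.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256, standing in for a short hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int):
    """Hash distinct counter values until two share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        data = counter.to_bytes(8, "big")
        h = truncated_hash(data, bits)
        if h in seen:
            return seen[h], data, counter + 1
        seen[h] = data
        counter += 1

first, second, evaluations = find_collision(24)
print(first.hex(), second.hex(), evaluations)
```

For a $b$-bit hash, a collision is expected after on the order of $2^{b/2}$ evaluations, so this 24-bit search should take thousands of hashes rather than millions. The same scaling gives the $2^{64}$ figure for a 128-bit hash.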
-
@MartinBonner The reason I am reluctant to comment on storing a known-good SHA-256 hash is that it is tempting to take that advice and translate it into, e.g., placing a SHA-256 hash next to a download link for a file, which a priori does nothing to thwart forgery, unless the host of the web page with the download link and the SHA-256 hash is meaningfully separated from the download server. This story is more complicated and more delicate to convey. – Squeamish Ossifrage Jul 15 '18 at 00:19
-
Yes, SHA-128 was a brain-fart. I meant SHA-256. (I think I was thinking of AES-128 which people often forget is perfectly adequate in a pre-quantum world, except for stuff you are planning to keep secret for decades.) – Martin Bonner supports Monica Jul 15 '18 at 05:56
-
@MartinBonner (Note that AES-128 should not be considered to have even a 128-bit security level because the cost of a successful multi-target attack is much less than $2^{128}$ AES-128 evaluations, which means AES-128 provides qualitatively worse security than other cryptosystems like the X25519 DH function advertised to have a 128-bit security level. I recommend AES-256 over AES-128 even if you don't believe in quantum computers.) – Squeamish Ossifrage Jul 15 '18 at 06:02
-
Comments are not for extended discussion; this conversation has been moved to chat. – e-sushi Jul 15 '18 at 12:19
[…] use cryptography. From that perspective, your “hash vs cryptography” doesn't really make sense. Maybe you could edit your question to clarify it a bit more? For example, by defining what kind of hashes you're talking about and/or by describing a detailed scenario/problem you're trying to handle cryptographically? That would be great... thanks in advance for your related efforts. – e-sushi Jul 15 '18 at 12:23