byte significance with hash functions

Question

I am using the python hmac module to generate an authentication code. Given a key, message, and hash function (I am using md5), the library returns an authentication code as a python bytes object. I am using the first N bytes of the code, i.e. digest[0:N] if digest is the variable storing the code.
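For concreteness, here is a minimal Python sketch of the setup described above. The key, message and helper name `truncated_hmac_md5` are hypothetical. RFC 2104 (Section 5) itself describes truncation as keeping the leftmost t bits of the HMAC output, written HMAC-H-t (e.g. HMAC-SHA1-80), which matches slicing from the start of the bytes object.

```python
import hashlib
import hmac


def truncated_hmac_md5(key: bytes, message: bytes, n: int) -> bytes:
    """Return the first n bytes of HMAC-MD5(key, message) (hypothetical helper)."""
    if not 1 <= n <= hashlib.md5().digest_size:  # MD5 digest_size is 16
        raise ValueError("n must be between 1 and 16 for MD5")
    digest = hmac.new(key, message, hashlib.md5).digest()  # 16-byte bytes object
    return digest[:n]  # identical to digest[0:n]


key = b"example key"      # hypothetical key
msg = b"example message"  # hypothetical message
tag = truncated_hmac_md5(key, msg, 8)
print(tag.hex())
```

With the RFC 2104 test vector (key = sixteen 0x0b bytes, data = "Hi There"), the full HMAC-MD5 is 9294727a3638bb1c13f48ef8158bfc9d, so truncating to 4 bytes gives 9294727a.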

How do I unambiguously (i.e. in a library-independent fashion) describe the truncation I am using? Would it be the N most-significant bytes? Least-significant? If I declare that I am using the HMAC algorithm (RFC 2104) with md5 hash, is it sufficient to say that I take the "first N bytes" of the MAC?

It is not a number, so there is no LSB or MSB to speak of. You just take the first N elements of the byte array. That's all! — kelalaka, Nov 30 '20 at 22:43

fgrieu · Accepted Answer · 2020-12-01T10:22:13.783

I'd confidently go for "first N bytes". It's quite universally recognized that the output of a practical cryptographic hash is a bitstring or bytestring; where a bytestring starts is subject to little ambiguity, and counting from that start is unambiguous.
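To see why "most-significant" vs "least-significant" is the wrong question for a bytestring, this small Python sketch (hypothetical key and message) reads the same digest as an integer both ways. The first N bytes come out as the most significant bytes under a big-endian reading and as the least significant bytes under a little-endian reading, while `digest[:N]` itself never changes.

```python
import hashlib
import hmac

N = 4
digest = hmac.new(b"example key", b"example message", hashlib.md5).digest()
first_n = digest[:N]

# Big-endian reading: byte 0 is the most significant byte of a 128-bit integer.
as_big = int.from_bytes(digest, "big")
top_bits = as_big >> (8 * (len(digest) - N))  # keep the N most significant bytes
assert top_bits.to_bytes(N, "big") == first_n

# Little-endian reading: the same byte 0 is now the least significant byte.
as_little = int.from_bytes(digest, "little")
low_bits = as_little & ((1 << (8 * N)) - 1)  # keep the N least significant bytes
assert low_bits.to_bytes(N, "little") == first_n
```

Both assertions hold, so "MSB" or "LSB" only acquires a meaning once someone picks an integer encoding, whereas "first N bytes" does not depend on one.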

That's not to say an error is impossible. In particular, the MD5 output is 128 bits that internally form four 32-bit words, and MD5 converts those words to bytes using the little-endian convention. That has caused headaches, including for researchers working on MD5; see this anecdote. Later hashes often used with HMAC (e.g. SHA-1, SHA-256 and SHA-512) use big-endian instead.
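A minimal Python sketch of that MD5 convention. RFC 1321 specifies that register A = 0x67452301 is written as the bytes 01 23 45 67, low-order byte first. Splitting a real MD5 digest into its four little-endian words and then mistakenly re-serializing them big-endian yields a different byte string, which is exactly the kind of slip that would change what "first N bytes" refers to. This only looks at the final output words, not at MD5's internal rounds.

```python
import hashlib
import struct

# RFC 1321: word A = 0x67452301 is stored low-order byte first.
assert struct.pack("<I", 0x67452301) == bytes.fromhex("01234567")

digest = hashlib.md5(b"").digest()    # d41d8cd98f00b204e9800998ecf8427e
words = struct.unpack("<4I", digest)  # the four 32-bit output words A, B, C, D

# Correct: re-serializing little-endian reproduces the standard digest bytes.
assert struct.pack("<4I", *words) == digest

# Mistake: serializing the same words big-endian (the SHA-2 convention)
# byte-swaps each word, so the "first 4 bytes" would come out reversed.
wrong = struct.pack(">4I", *words)
assert wrong != digest
assert wrong[:4] == digest[3::-1]
```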

Note: even though HMAC-MD5 is still standing strong when the key is secret, is the small speed gain compared to HMAC-SHA-256 or HMAC-SHA-512 worth exposing a design to criticism?
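Switching to HMAC-SHA-256 does not change how the truncation is described. Below is a sketch of HMAC-SHA-256 truncated to its first 16 bytes, with verification through `hmac.compare_digest`. The 16-byte length and the names `make_tag`/`verify_tag` are hypothetical choices for illustration, not something prescribed by the answer.

```python
import hashlib
import hmac

TAG_LEN = 16  # hypothetical choice: keep the first 16 of SHA-256's 32 bytes


def make_tag(key: bytes, message: bytes) -> bytes:
    # HMAC-SHA-256, truncated to its first TAG_LEN bytes
    return hmac.new(key, message, hashlib.sha256).digest()[:TAG_LEN]


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking the position of a mismatch through timing
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"example key"  # hypothetical key
tag = make_tag(key, b"example message")
print(verify_tag(key, b"example message", tag))  # True
print(verify_tag(key, b"tampered message", tag))
```

Recomputing the tag and comparing it in constant time is the usual pattern; a truncated tag is checked exactly like a full one.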

I'll note that Blake3's MAC mode (keyed hash mode) is faster than MD5 on many systems, let alone HMAC-MD5. — SAI Peregrinus, Nov 30 '20 at 23:46
