Calculating the amount of zero bits to be appended to the message

Question

From FIPS 180-3

Suppose that the length of the message, $M$, is $l$ bits. Append the bit $1$ to the end of the message, followed by $k$ zero bits, where $k$ is the smallest, non-negative solution to the equation $l + 1 + k \equiv 448 \pmod{512}$. [...] For example, the (8-bit ASCII) message “abc” has length $8 \times 3 = 24$, so the message is padded with a one bit, then $448 − (24 + 1) = 423$ zero bits [...].

The example only works for $0 \leq l < 448$, so I've come up with this formula for calculating $k$:

$$k = 448 - (l + 1) + 512 \cdot \left\lfloor \frac{l + 64}{512} \right\rfloor $$

or shorter

$$k = 447 - l + 512 \cdot \left\lfloor \frac{l + 64}{512} \right\rfloor $$
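To show how the floor formula evaluates, here is the arithmetic for $l = 24$ and $l = 448$, where $\lfloor x \rfloor$ denotes the largest integer not greater than $x$:

$$
\begin{aligned}
l = 24:\quad k &= 447 - 24 + 512 \cdot \left\lfloor \tfrac{24 + 64}{512} \right\rfloor = 423 + 512 \cdot 0 = 423,\\
l = 448:\quad k &= 447 - 448 + 512 \cdot \left\lfloor \tfrac{448 + 64}{512} \right\rfloor = -1 + 512 \cdot 1 = 511.
\end{aligned}
$$

In both cases $l + 1 + k \equiv 448 \pmod{512}$: $24 + 1 + 423 = 448$ and $448 + 1 + 511 = 960 = 448 + 512$.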

Is this the shortest/most performant way to calculate $k$?

Do you really need this? You just have to make sure that the total length is a multiple of 512 bits (= 64 bytes), so allocate an array with enough space, initialize it with zero, and put the length in the last 64 bits. — Paŭlo Ebermann, Aug 31 '11 at 16:59
I haven't gone through the rest of the document yet, but I assume I do because it specifically says k should be the smallest, non-negative solution. — Stijn, Aug 31 '11 at 17:05
"enough space" means here "just the right number of 512-bit-blocks", of course. I wanted to say that you really want to know how many such blocks to allocate, not how many bits of padding to use. The padding is simply there to fill the last block. — Paŭlo Ebermann, Aug 31 '11 at 17:07
I might have been trying to solve a non-existent problem then. — Stijn, Aug 31 '11 at 17:12
I added the way of calculating the number of blocks to my answer. — Paŭlo Ebermann, Aug 31 '11 at 17:26
What is the meaning of flooring, and how does one evaluate k = 447 - l + 512 * floor((l + 64) / 512) for l=24,448? — , Nov 21 '11 at 16:36
@praveen it is a formula I had constructed by trial and error. Please see the answers for a better formula, and also for why it's not necessary to calculate k. — Stijn, Nov 22 '11 at 00:34

Paŭlo Ebermann · Accepted Answer · 2011-09-01T10:17:14.073

If you have an implementation of an integer modulo operator, then

$$k = (447 - l) \bmod 512$$

should be the right solution. If your modulo operator can return negative results, do this:

$$k = ((447 - l) \bmod 512 + 512) \bmod 512$$

This seems simpler than using your division and flooring.
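A minimal C sketch of the modulo formula, assuming the bit length fits in a uint64_t (the function name pad_zero_bits is hypothetical). C's % operator truncates toward zero, so it is exactly the kind of modulo that can return negative results; the sketch applies the "+ 512, then mod 512" correction:

```c
#include <stdint.h>

/* Number of zero bits k for a message of l bits, i.e. the smallest
 * k >= 0 with l + 1 + k == 448 (mod 512). (Hypothetical helper name.) */
static unsigned pad_zero_bits(uint64_t l)
{
    /* Reduce l first so the signed subtraction stays small: r is in [-64, 447]. */
    int64_t r = 447 - (int64_t)(l % 512);
    /* C's % can be negative here, so shift into [0, 511] as in the answer. */
    return (unsigned)(((r % 512) + 512) % 512);
}
```

As a design note: with purely unsigned 64-bit arithmetic the correction is unnecessary, because 447 - l wraps modulo 2^64 and 512 divides 2^64, so (uint64_t)(447 - l) % 512 already lies in [0, 511].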

That said, you don't actually need the number of zero bits for the padding; what you need to know is how many 512-bit (i.e. 64-byte) blocks to hash. These blocks are then filled with:

the data
one 1 bit
a number of 0 bits (between 0 and 511, to fill the block)
the length, encoded as a 64-bit number.

We need just enough blocks ($b$) to make room for the data, the single 1 bit, and the 64-bit length:

$$ b = \left\lceil \frac{l + 1 + 64}{512} \right\rceil $$

If you hash only data in whole bytes (as usual), the formula becomes

$$ b = \left\lceil \frac{l_8 + 1 + 8}{64} \right\rceil, $$

with $l_8 = \frac l8$ the length of the data in bytes.
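Both block-count formulas can be sketched in C with the usual integer ceiling division (n + d - 1) / d. The function names are hypothetical, and the sketch assumes l is small enough that l + 576 does not overflow a uint64_t:

```c
#include <stdint.h>

/* b = ceil((l + 1 + 64) / 512): room for l data bits, the single 1 bit
 * and the 64-bit length field. (Hypothetical helper name.) */
static uint64_t blocks_for_bits(uint64_t l)
{
    return (l + 1 + 64 + 511) / 512;
}

/* b = ceil((l8 + 1 + 8) / 64) for l8 whole bytes: the 1 bit becomes
 * a full 0x80 byte and the length takes 8 bytes. (Hypothetical name.) */
static uint64_t blocks_for_bytes(uint64_t l8)
{
    return (l8 + 1 + 8 + 63) / 64;
}
```

The two are linked to $k$ by $512\,b = l + 1 + k + 64$; for example, $l = 448$ gives $b = 2$ and $k = 1024 - 448 - 1 - 64 = 511$.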

score 7 · Answer 2 · answered Sep 01 '11 at 07:25

In practice (i.e. when actually implementing the function), you do not really calculate $k$. Instead, things work like this: you have a 64-byte buffer. You pass incoming data through that buffer; whenever it is full, you apply the compression function, which updates the internal state (five 32-bit words for SHA-1, eight for SHA-256), and start again at offset 0 in the buffer. In other words, you keep an internal pointer (or index) to the first free byte in the buffer. In C notation, let's call buf the buffer and ptr the index. ptr always has a value between 0 (buffer is empty) and 63 (buffer is almost full and needs only one more byte). (Note: I am assuming that you are processing byte-oriented data, i.e. you never have individual bits, only bytes; I have yet to encounter a practical situation where this is not true.)
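The buffering loop described above can be sketched as follows. Everything here is hypothetical: struct hash_ctx, hash_update and compress_stub are illustrative names, and the real compression function is replaced by a stub that only counts processed blocks:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hashing context: the 64-byte buffer, the index of the
 * first free byte, and the total byte count (needed for the length). */
struct hash_ctx {
    unsigned char buf[64];
    size_t ptr;            /* 0..63 between calls to hash_update */
    uint64_t total_bytes;
    unsigned blocks_done;  /* stands in for the real internal state */
};

/* Stub: a real SHA-1/SHA-256 would mix buf into its chaining state here. */
static void compress_stub(struct hash_ctx *ctx)
{
    ctx->blocks_done++;
}

static void hash_update(struct hash_ctx *ctx, const unsigned char *data, size_t len)
{
    ctx->total_bytes += len;
    while (len > 0) {
        size_t n = 64 - ctx->ptr;   /* free space left in the buffer */
        if (n > len)
            n = len;
        memcpy(ctx->buf + ctx->ptr, data, n);
        ctx->ptr += n;
        data += n;
        len -= n;
        if (ctx->ptr == 64) {       /* buffer full: process it */
            compress_stub(ctx);
            ctx->ptr = 0;           /* start again at offset 0 */
        }
    }
}
```

For example, feeding 100 bytes processes one full block and leaves ptr at 36; another 28 bytes fill the buffer exactly, so a second block is processed and ptr returns to 0.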

When it comes to finishing the computation, you have, at that point, between 0 and 63 bytes of unprocessed data in your buffer. You add the extra "1" bit as an extra byte of value 0x80, which in C is:

buf[ptr++] = 0x80;

Then you have to add "enough zeros, then the length over 64 bits", so the code should look like this:

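if (ptr <= 56) {
    /* The length still fits in this block: zero-fill up to offset 56. */
    memset(buf + ptr, 0, 56 - ptr);
} else {
    /* No room for the length: zero-fill and process this block... */
    memset(buf + ptr, 0, 64 - ptr);
    call_compression_function();
    /* ...then start a block holding only zeros and the length. */
    memset(buf, 0, 56);
}
encode_length(buf + 56);    /* 64-bit bit length in bytes 56..63 */
call_compression_function();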

where encode_length() writes out the 64-bit bit length of the input message at the specified address (big-endian convention), and call_compression_function() invokes the compression function over the data in buf[].
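A self-contained version of the finishing step, under stated assumptions: pad_final, compress_fn and count_blocks are hypothetical names, the compression function is passed in as a callback, and encode_length here is just one possible implementation of the helper described above, writing the bit length big-endian:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: a real hash would update its state from blk. */
typedef void (*compress_fn)(const unsigned char blk[64], void *state);

/* Write the 64-bit bit length big-endian into p[0..7]. */
static void encode_length(unsigned char *p, uint64_t bit_len)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (unsigned char)(bit_len & 0xFF);
        bit_len >>= 8;
    }
}

/* buf holds ptr (0..63) unprocessed bytes; total_bytes is the message length. */
static void pad_final(unsigned char buf[64], size_t ptr, uint64_t total_bytes,
                      compress_fn compress, void *state)
{
    buf[ptr++] = 0x80;                     /* the 1 bit plus seven 0 bits */
    if (ptr <= 56) {
        memset(buf + ptr, 0, 56 - ptr);    /* length fits in this block */
    } else {
        memset(buf + ptr, 0, 64 - ptr);    /* close this block... */
        compress(buf, state);
        memset(buf, 0, 56);                /* ...and start a padding-only one */
    }
    encode_length(buf + 56, total_bytes * 8);
    compress(buf, state);
}

/* Example callback for trying it out: counts blocks, keeps the last one. */
struct count_state { unsigned calls; unsigned char last[64]; };

static void count_blocks(const unsigned char blk[64], void *state)
{
    struct count_state *s = state;
    s->calls++;
    memcpy(s->last, blk, 64);
}
```

With "abc" (ptr = 3) a single final block is hashed, ending in the length byte 24 (0x18); with 56 pending bytes the 0x80 lands at offset 56, so a second, padding-only block carrying the length 448 (bytes 0x01 0xC0) is needed.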

So the value of $k$ is never really computed. Since memset() counts bytes, $k$ is eight times the sum of the third arguments to the calls to memset(), plus 7 for the seven zero bits implied in the extra 0x80 byte.
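As a check on that relationship, this sketch (hypothetical function name, byte-oriented data of l8 bytes assumed) computes $k$ from the memset() sizes used in the finishing code and can be compared with $(447 - l) \bmod 512$ for $l = 8 \cdot l_8$:

```c
#include <stdint.h>

/* k implied by the byte-oriented padding for a message of l8 bytes:
 * 8 * (bytes zeroed by memset) + 7 zero bits inside the 0x80 byte. */
static unsigned k_from_memsets(uint64_t l8)
{
    unsigned p = (unsigned)(l8 % 64) + 1;   /* ptr after writing 0x80 */
    unsigned zero_bytes = (p <= 56) ? 56 - p : (64 - p) + 56;
    return 8 * zero_bytes + 7;
}
```

For example, "abc" ($l_8 = 3$) gives $8 \cdot 52 + 7 = 423$, and $l_8 = 56$ gives $8 \cdot 63 + 7 = 511$, matching $(447 - 24) \bmod 512$ and $(447 - 448) \bmod 512$.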

Self-promotion: you can see what such code looks like, in C and Java, in the open-source library sphlib.

score 3 · Answer 3 · answered Sep 01 '11 at 05:55

You don't need to compute this at all; it is just a mathematical description that defines the padding of the message as a whole.

This is because you don't need to process the whole message at once, which is useful if the message is very large (say a very large file), or if you are on a small system with little RAM (say a microcontroller).

You can fill a 512-bit buffer with message data and do the hashing block by block until you reach the end of the data. Then append the 1 bit, pad the last block with zeros up to and including bit position 447, put the 64 length bits in positions 448 to 511, and hash that final block.

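A slight complication occurs when the last block already holds 448 or more data bits ($l \bmod 512 \geq 448$), so there is no room for the 1 bit and the padding before the length field. Then you have to complete that block with the 1 bit and any zero padding, hash it, and hash one more block containing only zero padding and the length.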
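The case split can be expressed as a tiny C sketch (hypothetical function name) that returns how many blocks the final step hashes for an l-bit message, using the number of data bits left in the last, partial block:

```c
#include <stdint.h>

/* Blocks hashed in the final step for an l-bit message. r is the number
 * of data bits in the last, partial block (0 if l is a multiple of 512).
 * r < 448: the 1 bit, the zeros and the 64 length bits fit in one block.
 * r >= 448: the current block is closed with the 1 bit and zeros, and a
 * second block carries only zeros and the length. */
static unsigned final_blocks(uint64_t l)
{
    uint64_t r = l % 512;
    return r < 448 ? 1u : 2u;
}
```

For example, $l = 447$ still fits in one final block (the 1 bit lands at position 447), while $l = 448$ needs two.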
