SHAKE-128 on a 32-bit platform

Question

I am implementing SHAKE on a 32-bit microcontroller, so I would like to use bit interleaving. I looked at the code in https://github.com/floodyberry/supercop/blob/master/crypto_hash/keccak/simple32bi/Keccak-simple32BI.c but I do not understand how the bit-interleaving transformation is done there. Could anyone give some insight into how it works?

Also, I compared the output of the KeccakF function from that file with the output of the 64-bit implementation (https://github.com/floodyberry/supercop/blob/master/crypto_hash/keccak/simple/Keccak-simple.c), and the results are not the same. Are they supposed to be the same?

Thank you

I'm not answering the question, but if it's arm then check out https://github.com/XKCP/XKCP/tree/master/lib/low/KeccakP-1600/Optimized32biAsmARM — Ruggero, Aug 27 '18 at 15:12

score 2 · Accepted Answer · answered Aug 27 '18 at 20:29

The bit interleaving is applied as the input lanes are XORed into the state. The code looks like this:

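{
    const UINT32 * pI = (const UINT32 *)in;   /* input, read as 32-bit words */
    UINT32 * pS = state;                      /* interleaved state: two words per lane */
    UINT32 t, x0, x1;
    int i;
    for (i = laneCount-1; i >= 0; --i)
    {
        /* First 32 bits of the lane (the low half on a little-endian target):
           four swap steps gather its even-indexed bits into bits 0..15
           and its odd-indexed bits into bits 16..31. */
        x0 = *(pI++);
        t = (x0 ^ (x0 >>  1)) & 0x22222222UL;  x0 = x0 ^ t ^ (t <<  1);
        t = (x0 ^ (x0 >>  2)) & 0x0C0C0C0CUL;  x0 = x0 ^ t ^ (t <<  2);
        t = (x0 ^ (x0 >>  4)) & 0x00F000F0UL;  x0 = x0 ^ t ^ (t <<  4);
        t = (x0 ^ (x0 >>  8)) & 0x0000FF00UL;  x0 = x0 ^ t ^ (t <<  8);
        /* Second 32 bits of the lane: same four steps. */
        x1 = *(pI++);
        t = (x1 ^ (x1 >>  1)) & 0x22222222UL;  x1 = x1 ^ t ^ (t <<  1);
        t = (x1 ^ (x1 >>  2)) & 0x0C0C0C0CUL;  x1 = x1 ^ t ^ (t <<  2);
        t = (x1 ^ (x1 >>  4)) & 0x00F000F0UL;  x1 = x1 ^ t ^ (t <<  4);
        t = (x1 ^ (x1 >>  8)) & 0x0000FF00UL;  x1 = x1 ^ t ^ (t <<  8);
        /* First state word of the lane: all even-indexed bits of the 64-bit lane. */
        *(pS++) ^= (x0 & 0x0000FFFF) | (x1 << 16);
        /* Second state word of the lane: all odd-indexed bits. */
        *(pS++) ^= (x0 >> 16) | (x1 & 0xFFFF0000);
    }
}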

There is also the matching deinterleave code in the extract function.
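To make the transformation easier to see, here is a minimal sketch for a single 64-bit lane. It is not taken from the linked file: the function names are my own, and I use uint32_t/uint64_t in place of the UINT32 type. The bit-by-bit version shows what the result is supposed to be (even-indexed bits in one word, odd-indexed bits in the other), the swap-based version repeats the four steps from the loop above, and the last function undoes them.

```c
#include <stdint.h>

/* Bit-by-bit reference (hypothetical helper): bit 2*i of the lane goes to
   bit i of even_word, bit 2*i+1 goes to bit i of odd_word. Slow but obvious. */
static void interleave_lane_ref(uint64_t lane, uint32_t *even_word, uint32_t *odd_word)
{
    uint32_t e = 0, o = 0;
    for (int i = 0; i < 32; i++) {
        e |= (uint32_t)((lane >> (2 * i)) & 1u) << i;
        o |= (uint32_t)((lane >> (2 * i + 1)) & 1u) << i;
    }
    *even_word = e;
    *odd_word = o;
}

/* The same four swap steps as in the quoted loop. Each step exchanges the bits
   selected by the mask with the bits 'shift' positions above them. Afterwards
   bits 0..15 hold the even-indexed bits of x and bits 16..31 the odd ones. */
static uint32_t shuffle_half(uint32_t x)
{
    uint32_t t;
    t = (x ^ (x >> 1)) & 0x22222222UL; x = x ^ t ^ (t << 1);
    t = (x ^ (x >> 2)) & 0x0C0C0C0CUL; x = x ^ t ^ (t << 2);
    t = (x ^ (x >> 4)) & 0x00F000F0UL; x = x ^ t ^ (t << 4);
    t = (x ^ (x >> 8)) & 0x0000FF00UL; x = x ^ t ^ (t << 8);
    return x;
}

/* Each swap step is its own inverse, so undoing them means applying the
   same steps in the opposite order. */
static uint32_t unshuffle_half(uint32_t x)
{
    uint32_t t;
    t = (x ^ (x >> 8)) & 0x0000FF00UL; x = x ^ t ^ (t << 8);
    t = (x ^ (x >> 4)) & 0x00F000F0UL; x = x ^ t ^ (t << 4);
    t = (x ^ (x >> 2)) & 0x0C0C0C0CUL; x = x ^ t ^ (t << 2);
    t = (x ^ (x >> 1)) & 0x22222222UL; x = x ^ t ^ (t << 1);
    return x;
}

/* Fast interleave of one lane, combining the two halves as the quoted loop does. */
static void interleave_lane(uint64_t lane, uint32_t *even_word, uint32_t *odd_word)
{
    uint32_t x0 = shuffle_half((uint32_t)lane);         /* low 32 bits  */
    uint32_t x1 = shuffle_half((uint32_t)(lane >> 32)); /* high 32 bits */
    *even_word = (x0 & 0x0000FFFFUL) | (x1 << 16);
    *odd_word  = (x0 >> 16) | (x1 & 0xFFFF0000UL);
}

/* Inverse of interleave_lane: regroup the 16-bit pieces, then unshuffle. */
static uint64_t deinterleave_lane(uint32_t even_word, uint32_t odd_word)
{
    uint32_t x0 = unshuffle_half((even_word & 0x0000FFFFUL) | (odd_word << 16));
    uint32_t x1 = unshuffle_half((even_word >> 16) | (odd_word & 0xFFFF0000UL));
    return ((uint64_t)x1 << 32) | x0;
}
```

For example, a lane equal to 2 (only bit 1 set) comes out as even_word = 0 and odd_word = 1. The reason for splitting the lane this way is that a 64-bit rotation turns into 32-bit rotations of the two words: rotating the lane left by one bit gives a new even word equal to the old odd word rotated left by one, and a new odd word equal to the old even word.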

I also wrote a Keccak/SHA-3 class using that code (or something similar) as a base. I did not like the way it looked and ended up writing my own interleave routine. Compiled, it ran 1.45x faster than my implementation of this interleave code (the interleave only, not the whole hash), so there are faster ways to do it.

I also found that, during development and testing, it is easier to interleave or deinterleave the entire state whenever you absorb or extract. That way you can view the intermediate values of the state working variables and compare them to a reference implementation.
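This is probably also why a direct comparison with the 64-bit code does not match: if the 32-bit KeccakF works on the interleaved representation, its 50 state words are not simply the 25 lanes cut in half, so the state has to be converted before and after the permutation. A minimal debugging sketch follows, assuming the state layout used in the loop above (for each lane, the even-bit word followed by the odd-bit word). The names state_to_lanes, lanes_to_state and LANE_COUNT are mine, not from the linked code, and the bit-by-bit loops are deliberately slow and simple since this is only meant for testing.

```c
#include <stdint.h>

#define LANE_COUNT 25  /* Keccak-f[1600]: 25 lanes of 64 bits */

/* Debug helper (hypothetical name): convert an interleaved state of 50 32-bit
   words, stored as even word then odd word for each lane, into 25 plain
   64-bit lanes that can be compared with a 64-bit implementation. */
static void state_to_lanes(const uint32_t interleaved[2 * LANE_COUNT],
                           uint64_t lanes[LANE_COUNT])
{
    for (int lane = 0; lane < LANE_COUNT; lane++) {
        uint32_t even_word = interleaved[2 * lane];
        uint32_t odd_word  = interleaved[2 * lane + 1];
        uint64_t v = 0;
        for (int i = 0; i < 32; i++) {
            v |= (uint64_t)((even_word >> i) & 1u) << (2 * i);     /* even bits */
            v |= (uint64_t)((odd_word  >> i) & 1u) << (2 * i + 1); /* odd bits  */
        }
        lanes[lane] = v;
    }
}

/* Debug helper (hypothetical name): the opposite direction, for loading a
   known 64-bit test state into the interleaved representation. */
static void lanes_to_state(const uint64_t lanes[LANE_COUNT],
                           uint32_t interleaved[2 * LANE_COUNT])
{
    for (int lane = 0; lane < LANE_COUNT; lane++) {
        uint32_t even_word = 0, odd_word = 0;
        for (int i = 0; i < 32; i++) {
            even_word |= (uint32_t)((lanes[lane] >> (2 * i)) & 1u) << i;
            odd_word  |= (uint32_t)((lanes[lane] >> (2 * i + 1)) & 1u) << i;
        }
        interleaved[2 * lane]     = even_word;
        interleaved[2 * lane + 1] = odd_word;
    }
}
```

To compare the two implementations, the idea would be to fill the same 25 lanes for both, call lanes_to_state before the 32-bit permutation, call state_to_lanes on its result, and then compare lane by lane with the 64-bit output.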
