
On Wikipedia:

https://en.wikipedia.org/wiki/Salsa20#cite_note-5

The article states that Salsa20 offers speeds of around 4–14 cycles per byte in software on modern x86 processors. Is it implemented so that it performs some operations in parallel? If so, how many cores are used: four?

Tom

1 Answer


> But is it implemented that way, it performs some operations in parallel?

The speed tests used with Salsa20 assume a single core.
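To put the quoted cycles-per-byte range in more familiar units, here is a rough conversion to single-core throughput. The 3.0 GHz clock is purely an assumption for illustration, and `throughput_mb_per_s` is a hypothetical helper, not part of any benchmark suite.

```python
def throughput_mb_per_s(clock_hz: float, cycles_per_byte: float) -> float:
    """Bytes per second = cycles per second / cycles per byte; reported in MB/s."""
    return clock_hz / cycles_per_byte / 1e6


ASSUMED_CLOCK_HZ = 3.0e9  # assumption: a 3.0 GHz core, chosen only for illustration

for cpb in (4, 14):  # the two ends of the range quoted in the question
    mbps = throughput_mb_per_s(ASSUMED_CLOCK_HZ, cpb)
    print(f"{cpb} cycles/byte -> about {mbps:.0f} MB/s on one core")
```

Under that assumed clock, the range corresponds to roughly 214 to 750 MB/s on one core.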

On the other hand, those speed tests predate AVX512. I expect that, if the code was rewritten to take advantage of those instructions, it should go significantly faster.

In addition, Salsa20 generates its keystream in counter-mode fashion: each 64-byte block depends only on the key, the nonce, and a block counter. Hence it can be parallelized, with separate cores encrypting separate parts of the plaintext; given a long plaintext message, you can use as many cores as you think appropriate. I personally suspect that, unless the message is absolutely huge, the time needed to synchronize the threads would defeat the parallelization gain, and you'd be better off using those threads for different tasks.
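As an illustration of why the counter-based design parallelizes, here is a minimal pure-Python sketch of Salsa20/20 with a 256-bit key. The names `xor_range` and `xor_parallel`, the chunk size, and the worker count are my own choices for the example. It is meant to show that any chunk can be processed knowing only its starting block number, not to be fast or production-ready: in CPython the threads will not give a real speedup for pure-Python code.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

MASK = 0xFFFFFFFF
SIGMA = struct.unpack("<4I", b"expand 32-byte k")  # constants for 256-bit keys


def _rotl(v, n):
    return ((v << n) & MASK) | (v >> (32 - n))


def _qr(x, a, b, c, d):
    # Salsa20 quarter-round on four words of the state, in place.
    x[b] ^= _rotl((x[a] + x[d]) & MASK, 7)
    x[c] ^= _rotl((x[b] + x[a]) & MASK, 9)
    x[d] ^= _rotl((x[c] + x[b]) & MASK, 13)
    x[a] ^= _rotl((x[d] + x[c]) & MASK, 18)


def salsa20_block(key, nonce, counter):
    """Return keystream block number `counter` (64 bytes); 32-byte key, 8-byte nonce."""
    k = struct.unpack("<8I", key)
    n = struct.unpack("<2I", nonce)
    init = [SIGMA[0], k[0], k[1], k[2], k[3], SIGMA[1], n[0], n[1],
            counter & MASK, (counter >> 32) & MASK, SIGMA[2],
            k[4], k[5], k[6], k[7], SIGMA[3]]
    x = init[:]
    for _ in range(10):  # 10 double-rounds = 20 rounds
        _qr(x, 0, 4, 8, 12); _qr(x, 5, 9, 13, 1)      # column round
        _qr(x, 10, 14, 2, 6); _qr(x, 15, 3, 7, 11)
        _qr(x, 0, 1, 2, 3); _qr(x, 5, 6, 7, 4)        # row round
        _qr(x, 10, 11, 8, 9); _qr(x, 15, 12, 13, 14)
    return struct.pack("<16I", *((a + b) & MASK for a, b in zip(x, init)))


def xor_range(key, nonce, data, first_block=0):
    """XOR `data` with the keystream starting at block `first_block`."""
    out = bytearray()
    for i in range(0, len(data), 64):
        ks = salsa20_block(key, nonce, first_block + i // 64)
        out += bytes(p ^ s for p, s in zip(data[i:i + 64], ks))
    return bytes(out)


def xor_parallel(key, nonce, data, workers=4, blocks_per_chunk=16):
    """Split `data` into block-aligned chunks and process them independently."""
    step = 64 * blocks_per_chunk
    offsets = range(0, len(data), step)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda off: xor_range(key, nonce, data[off:off + step], off // 64),
            offsets)
        return b"".join(parts)
```

The only coordination the workers need is the chunk's starting block number; the cost the answer worries about is in starting the workers and joining their output, which for short messages can easily exceed the encryption time itself.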

poncho
  • Beware that the newer Intel CPUs have AVX512 removed. Funnily enough, the latest AMD 7000 series CPUs have added the Intel-defined AVX512 instructions. So AVX512 can provide a speedup, but it is rather uncertain whether you get it. AVX2 is much more prevalent anyway (but it of course depends on the crypto library / runtime whether the instructions are used). – Maarten Bodewes Jan 30 '23 at 21:22