18

There are state-of-the-art encryption algorithms, like AES, which are currently considered absolutely safe. Their speed is in the 100MB/sec ballpark on current PCs (note: this is the speed when the AES instruction set is not used; I'm interested in the general case, not the case where HW acceleration is available).

And there are fast, unsafe, obfuscation-only algorithms, for example XORing the data with the output of some fast, general-purpose, non-cryptographic random number generator. These can be very fast, possibly in the 10GB/sec range, but they are easily broken (if a small part of the clear-text is known, they can be broken immediately).
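To make the "broken immediately" claim concrete, here is a minimal sketch. It assumes the obfuscation XORs the data with successive 32-bit outputs of Marsaglia's xorshift32 generator; that generator and the function names are illustrative choices, not something named in the question. Because each output word is the generator's whole internal state, four known plaintext bytes are enough to reproduce the rest of the keystream.

```python
import struct

MASK32 = 0xFFFFFFFF

def xorshift32(state: int) -> int:
    """One step of the (non-cryptographic) xorshift32 generator."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state

def xor_obfuscate(seed: int, data: bytes) -> bytes:
    """XOR data with the generator output; calling it again undoes it."""
    out = bytearray()
    state = seed  # must be non-zero, otherwise the generator stays at zero
    for pos in range(0, len(data), 4):
        state = xorshift32(state)
        word = struct.pack("<I", state)
        out += bytes(d ^ k for d, k in zip(data[pos:pos + 4], word))
    return bytes(out)

def recover_from_known_prefix(ciphertext: bytes, known_prefix: bytes) -> bytes:
    """Recover the full plaintext from its first 4 known bytes, without the seed."""
    # The first keystream word equals the generator state after one step.
    first_word = bytes(c ^ p for c, p in zip(ciphertext[:4], known_prefix[:4]))
    state = struct.unpack("<I", first_word)[0]
    out = bytearray(known_prefix[:4])
    for pos in range(4, len(ciphertext), 4):
        state = xorshift32(state)
        word = struct.pack("<I", state)
        out += bytes(c ^ k for c, k in zip(ciphertext[pos:pos + 4], word))
    return bytes(out)

if __name__ == "__main__":
    message = b"%PDF-1.7 example document body"
    hidden = xor_obfuscate(0x1234ABCD, message)
    # The attacker only guesses the file magic "%PDF".
    print(recover_from_known_prefix(hidden, b"%PDF") == message)
```

A generator with a larger state needs more known bytes, but the attack keeps the same shape as long as the state can be derived from the output.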

Are there any algorithms in between these? For example, one that runs at ~1GB/sec and can be broken, but only with some computational effort (say, the key can be found in a week/month/year of brute-force searching on a current PC)?
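As a rough back-of-the-envelope calculation, the week/month/year figures can be translated into key lengths. The sketch below assumes a purely hypothetical rate of 10^9 key trials per second on one PC and that the key is found, on average, after searching half the key space; both the rate and the helper name are assumptions for illustration.

```python
import math

TRIALS_PER_SECOND = 1e9  # assumed, hypothetical brute-force rate of one PC
DAY = 86_400             # seconds per day

def key_bits_for(seconds: float, rate: float = TRIALS_PER_SECOND) -> float:
    """Key length in bits whose expected brute-force time is `seconds`.

    On average half the key space is searched: 2**bits / 2 = rate * seconds.
    """
    return math.log2(2 * rate * seconds)

for label, seconds in [("week", 7 * DAY), ("month", 30 * DAY), ("year", 365 * DAY)]:
    print(f"{label}: about {key_bits_for(seconds):.1f} key bits")
```

Under these assumptions, the three targets differ by only a few key bits.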

Such an algorithm could be used in a scenario where speed matters but the data is not that sensitive, so breaking the encryption would not be worth the effort.

geza
  • 2
    Maybe I am wrong, but 100MB/sec sounds low for a current AES-NI enabled CPU. – Guut Boy Nov 08 '17 at 12:16
  • 9
    According to the Crypto++ benchmarks, AES is more in the 4GB/s ballpark than in the 100MB/s one. On my laptop, which is just a normal laptop with AES-NI instructions, running openssl speed -elapsed -evp aes-128-ccm is giving me values in the range of 300+GB/s (and it seems to be using just one core). – Lery Nov 08 '17 at 12:19
  • 2
    @GuutBoy: sure. But if AES-NI is not available, it is ~100MB/sec. But that's not the point here, it was just an example. All safe encryption algorithms (known to me) are around X*100MB/sec (where X is a small number). This question is not about which is the fastest possible AES implementation, but whether there is an algorithm which is faster than the safe ones while having weaker safety guarantees. – geza Nov 08 '17 at 12:23
  • @geza maybe you should rephrase the question then. As it stands now it sounds like you are looking for a cipher in the ~1GB/s range. – Guut Boy Nov 08 '17 at 12:37
  • @Lery: that number seems pretty weird. For my machine, I got 600GB/sec, which means that each cycle, it processes 146 bytes. It seems impossible... ccm is 600GB/sec, while cbc is 940MB/sec. There is a 640x difference between them. But, as I've said to Guut Boy, that's not the point here. (I've edited my question a little bit, to make this clear). – geza Nov 08 '17 at 12:55
  • @geza True, I tried different options and took one of the fastest. It makes sense to me that CBC would be way slower, since it cannot parallelize the computations, but I've just tested with CTR and I obtained a 3GB/s rate, which is strangely far from the CCM ones... I guess it might be a bug in OpenSSL's benchmarking tools. – Lery Nov 08 '17 at 13:15
  • "The AES instruction set" is hardware acceleration. – chrylis -cautiouslyoptimistic- Nov 08 '17 at 19:33
  • @chrylis: That's right. What's the reason of your comment? – geza Nov 08 '17 at 20:03
  • It's probably better to talk about "cycles per byte" (or the inverse) rather than raw bandwidth that will be impacted by the clock if you want a more objective comparison. – Nick T Nov 09 '17 at 00:07
  • @Lery: I think as CCM is only defined for 16 byte blocks, for larger blocks, openssl simply upscales the numbers, presenting completely invalid benchmark results. – geza Nov 09 '17 at 01:24
  • rc4? that might be slow but it's very simple – anna328p Nov 09 '17 at 03:58
  • @geza it ended up being a known bug of OpenSSL – Lery Nov 09 '17 at 08:40
  • One category of encryption algorithms that needs to run fast on low-cost hardware even if it makes it easier to break, but not "too easy" to break, is "television encryption". Perhaps you would be interested in some of the algorithms developed for it? – David Cary Nov 11 '17 at 16:10
  • @DavidCary: do you have a specific algorithm in mind? I've checked some of them, and all of them are proprietary – geza Nov 11 '17 at 17:17
  • Another category of encryption algorithms that needs to run fast on low-power hardware is mobile phone encryption. KASUMI, A5/2, and A5/1. – David Cary Nov 17 '17 at 03:59
  • You may also be interested in the candidates for the eSTREAM collection of ciphers, such as Salsa20 and Trivium. Most are "free for any use"; all are "faster than AES-128". – David Cary Nov 17 '17 at 04:12
  • "If a little part of the clear-text is known, they can be broken immediately". Sometimes a weakness like this is found, but that RNG then gets thrown away, I think. None of the eSTREAM finalists have this weakness, right? – bobuhito Dec 29 '19 at 20:09

2 Answers

27

A classical table-based AES implementation would achieve about 160 MB/s on my current computer (a fairly recent MacBook Pro). However, one can do better; of course there are the AES-NI instructions, that easily bump up speed on that machine to the 5 GB/s mark (with a parallel mode such as AES-CTR; AES-CBC encryption is much slower). But even without these instructions, the Käsper-Schwabe implementation of AES-CTR would offer more than 400 MB/s, a substantial improvement.

Looking outside of AES, there is ChaCha20, as specified in RFC 7539. Using my own implementations, the purely generic, 32-bit plain C code (chacha20_ct) encrypts or decrypts data at 385 MB/s on my laptop; the SSE2-enhanced implementation (chacha20_sse2) offers 584 MB/s.

Generally speaking, block ciphers like AES are versatile primitives, and it can be argued that, by forfeiting versatility and concentrating on the encryption/decryption role, better performance may be achieved. This is what stream ciphers like ChaCha20 are about.
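To illustrate what "concentrating on the encryption/decryption role" looks like in practice, here is a minimal Python sketch of the ChaCha20 keystream-and-XOR structure, following the state layout of RFC 7539. It is not the chacha20_ct or chacha20_sse2 code mentioned above; the function names are chosen for this example, and pure Python shows only the structure, not the throughput, and makes no constant-time guarantee.

```python
import struct

MASK32 = 0xFFFFFFFF

def rotl32(value: int, count: int) -> int:
    """Rotate a 32-bit word left by `count` bits."""
    return ((value << count) & MASK32) | (value >> (32 - count))

def quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    """ChaCha quarter round on four words of the state, in place."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)

def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """Produce one 64-byte keystream block (32-byte key, 12-byte nonce)."""
    if len(key) != 32 or len(nonce) != 12:
        raise ValueError("key must be 32 bytes and nonce 12 bytes")
    init = list(struct.unpack("<4I", b"expand 32-byte k"))
    init += list(struct.unpack("<8I", key))
    init += [counter & MASK32]
    init += list(struct.unpack("<3I", nonce))
    s = init[:]
    for _ in range(10):  # 10 double rounds = 20 rounds
        quarter_round(s, 0, 4, 8, 12)   # column rounds
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        quarter_round(s, 0, 5, 10, 15)  # diagonal rounds
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(s, init)))

def chacha20_xor(key: bytes, nonce: bytes, data: bytes, counter: int = 1) -> bytes:
    """Encrypt or decrypt: XOR data with the keystream, block by block."""
    out = bytearray()
    for offset in range(0, len(data), 64):
        block = chacha20_block(key, counter + offset // 64, nonce)
        out += bytes(d ^ k for d, k in zip(data[offset:offset + 64], block))
    return bytes(out)
```

Encryption and decryption are the same call, and each keystream block depends only on the key, nonce, and block counter, so blocks can be computed independently. A (key, nonce) pair must never be reused for two different messages.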

About ten years ago, there was the eSTREAM project which resulted in a portfolio of stream ciphers. On my laptop, SOSEMANUK achieves about 1.64 GB/s, which is not bad for a design from ten years ago. Notably, it is 10 times faster than the table-based AES. (I wrote part of the code; I don't know who packaged it as a Zip archive with modified file names that break compilation.)

Among more modern designs, one may cite NORX. I encountered an implementation on small ARM systems that was consistently trouncing ChaCha20. I suppose it would also clear the 1 GB/s mark on a modern PC.

Summary: 1 GB/s is actually highly feasible with existing algorithms, on standard hardware, without using the AES instructions, and without sacrificing security: all of the above are currently unbroken, despite extensive exposure to vindictive cryptographers.

Of course, excluding the AES-NI instructions is rather artificial: it makes relatively little sense to make benchmarks on a modern CPU without using the features of that CPU. Performance on smaller, embedded systems without a hardware AES implementation may be more relevant.

Thomas Pornin
  • 1
    Thanks for the answer! SOSEMANUK seems very viable. I excluded AES-NI because I cannot depend on it. I need a cross-platform solution, which performs equally well on all platforms (I need to use the same algorithm on all platforms). – geza Nov 08 '17 at 14:08
  • Thomas, does your answer mean that no "not totally secure but hard enough to be safe" ciphers have been developed? So researchers only develop ciphers which aim at complete safety? – geza Nov 08 '17 at 16:09
  • 1
    @geza Yes, that's about it. In fact, there is no obvious way to make a cipher that is not immediately breakable, and yet faster than a safe one. Thus we simply aim at really secure algorithms. – Thomas Pornin Nov 08 '17 at 16:30
  • 3
    “performs equally well on all platforms” cannot be possible, as all platforms do not perform equally to start with – OrangeDog Nov 08 '17 at 16:38
  • @OrangeDog: I meant that compared to a non-HW-accelerated cipher, not that the performance should be the same on all platforms. HW-accelerated AES is not an option, because it has good performance only on platforms where the acceleration is available. But, for example, SOSEMANUK should be faster on all platforms compared to a non-HW-accelerated AES. – geza Nov 08 '17 at 16:44
  • @geza What platforms are you targeting? Practically all Intel/AMD chips have accelerated AES instructions, most ARMs as well, even embedded chips like Atmel XMEGAs have AES accelerators. – Nick T Nov 09 '17 at 00:11
  • @NickT: Thanks for the suggestion, I'll consider it. I'm targeting armv7 for example, which doesn't have it, if I'm not mistaken. But maybe that platform is not that important any more, so a slow AES is acceptable there. I like accelerated AES, as it is very fast compared to any other (non-accelerated) cipher. I'm currently experimenting with ISAAC, it is very fast too. I'd like to pick a solution which doesn't need special HW, even if it is widely available (if it were available 100%, I'd pick AES without hesitating). – geza Nov 09 '17 at 00:31
  • @ThomasPornin How did NORX win against ChaCha in a software implementation (assuming same number of rounds)? After all they're nearly identical, with NORX replacing addition with a multi instruction limited carry addition. – CodesInChaos Nov 14 '17 at 15:33
  • Out of curiosity, do you know how NSA's Simon / SPECK stack up against the AES and ChaCha20 performance numbers? – Mike Ounsworth Nov 14 '17 at 17:22
  • @CodesInChaos The NORX implementation I had under my hand used much fewer rounds than ChaCha20. – Thomas Pornin Nov 14 '17 at 18:20
5

There are some lightweight stream ciphers that might be 10x faster than some implementations of AES. Achterbahn is apparently 1 cycle per byte, Salsa20 might be as low as 4, compared to AES which might be 18.
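For comparison with the MB/s figures used elsewhere in this thread, cycles per byte can be converted to throughput once a clock frequency is fixed. The sketch below assumes a hypothetical 3 GHz single core and simply reuses the cycle counts quoted above; it says nothing about which hardware those counts were measured on.

```python
ASSUMED_CLOCK_HZ = 3.0e9  # hypothetical 3 GHz core, an assumption for illustration

def throughput_mb_per_s(cycles_per_byte: float, clock_hz: float = ASSUMED_CLOCK_HZ) -> float:
    """Convert a cycles-per-byte figure into MB/s (1 MB = 10**6 bytes)."""
    return clock_hz / cycles_per_byte / 1e6

# Cycle counts as quoted in the answer above.
for name, cpb in [("Achterbahn", 1), ("Salsa20", 4), ("AES", 18)]:
    print(f"{name}: {cpb} cycles/byte is about {throughput_mb_per_s(cpb):.0f} MB/s at 3 GHz")
```

The same formula works in reverse: dividing the clock frequency by a measured throughput in bytes per second gives cycles per byte, which is independent of the clock speed of the test machine.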

In theory you could make something secure as fast as your lower bound (just XOR the data with CSPRNG output that you generated earlier).
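A minimal sketch of that idea, assuming SHAKE-256 from Python's hashlib as a stand-in keystream generator (the answer does not name a specific CSPRNG, and the function names here are made up for the example). The keystream is produced ahead of time, so the time-critical path is only a XOR. The key and nonce are assumed to have fixed lengths, since they are simply concatenated.

```python
import hashlib

def precompute_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Generate `length` keystream bytes ahead of time.

    SHAKE-256 is used here only as an assumed stand-in generator.
    """
    return hashlib.shake_256(key + nonce).digest(length)

def xor_with_keystream(data: bytes, keystream: bytes) -> bytes:
    """Hot path: XOR data with already generated keystream bytes."""
    if len(keystream) < len(data):
        raise ValueError("not enough precomputed keystream")
    n = len(data)
    mixed = int.from_bytes(data, "little") ^ int.from_bytes(keystream[:n], "little")
    return mixed.to_bytes(n, "little")

if __name__ == "__main__":
    key = bytes(range(32))
    nonce = b"message-0001"
    stream = precompute_keystream(key, nonce, 1 << 16)  # done before data arrives
    message = b"data that arrives later"
    hidden = xor_with_keystream(message, stream)
    print(xor_with_keystream(hidden, stream) == message)
```

Each stretch of keystream may be used for one message only; reusing it leaks the XOR of the two plaintexts.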

daniel
  • Thanks for the answer, quite useful; I'm now checking out the various ciphers. Where did you find the information that Achterbahn is 1 cycle/byte? Btw, I found this: http://www.ecrypt.eu.org/stream/ciphers/abc/abc.pdf. The authors claim that it can do 850MB/sec on a Pentium 4, so on current CPUs it could be even faster. – geza Nov 08 '17 at 13:35
  • https://en.wikipedia.org/wiki/Stream_cipher#Comparison_of_stream_ciphers for the 1 cpb claim. – daniel Nov 08 '17 at 14:42
  • What hardware are those cycle counts on? The paper you linked only mentions CPB for AES on 8-bit RISC AVR microcontrollers, which is obviously going to be very different from a 32-bit ARM, or a superscalar / out-of-order x86-64 (even without using AES-NI). – Peter Cordes Nov 09 '17 at 03:31
  • @PeterCordes I don't know; it's just from the Wikipedia table – daniel Nov 09 '17 at 08:08
  • Most of the CPB entries have a note that links to Pentium 4, Pentium III, ARMv7TDMI, or whatever. Achterbahn just says "hardware", so presumably they're talking about an FPGA or ASIC implementation. Hardly surprising that makes it very fast... – Peter Cordes Nov 09 '17 at 14:02