Are there any signature schemes for underpowered devices (8-bit microcontroller)?

Question

I am currently researching a small-scale home automation system, aiming for low cost. The system architecture is basically one master and several slaves, which are connected in parallel.

Recently I've bumped into the natural question of system security. I'm mainly concerned with authentication, that is, I need the slave to be able to tell whether a command it receives is really trustworthy. In other words, it must verify that a command was really sent by the master, and not by some impersonator who has connected to the communication bus.

I've started researching and found out about several implementations aimed at low-end microcontrollers, including elliptic curve cryptography.

That is all very fine indeed, but I am only concerned with signing in a public-key scheme. I don't need to do actual data encryption and decryption.

I am aiming for really low cost, so the space available for code is rather small.

I accept all kinds of suggestions, but here comes my real question:

Would it be unsafe to use a scheme like Rabin signatures in an awkward way like this:

- Use a very small key: 160 bits.
- Frequently generate a new key pair and send the public key to the slaves, so that even if factoring the public key takes relatively little time, the attack is impractical because the key has already changed by the time the attacker is ready.
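For reference, here is a textbook Rabin signature in Python, at toy sizes that are far too small to be secure. It shows why the scheme appeals on the slave side: verification is a single modular squaring using public data only. Rabin's original scheme appends random padding until the hash is a square; the 4-byte counter and the SHA-256-mod-n mapping used here are simplifying assumptions, not a standardized encoding such as ISO/IEC 9796-2.

```python
import hashlib


def _h(message: bytes, counter: int, n: int) -> int:
    # Toy hash-to-integer: SHA-256(message || counter) reduced mod n (assumption, not a standard encoding)
    digest = hashlib.sha256(message + counter.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") % n


def rabin_sign(message: bytes, p: int, q: int):
    """Master side (the PC): needs the secret primes p, q with p = q = 3 (mod 4)."""
    n = p * q
    for counter in range(1 << 16):
        h = _h(message, counter, n)
        # Only squares have square roots: try counters until h is a square
        # modulo both p and q (Euler's criterion); about 1 in 4 succeeds.
        if pow(h, (p - 1) // 2, p) in (0, 1) and pow(h, (q - 1) // 2, q) in (0, 1):
            sp = pow(h, (p + 1) // 4, p)  # square root mod p, valid since p = 3 (mod 4)
            sq = pow(h, (q + 1) // 4, q)  # square root mod q
            s = sp + p * ((sq - sp) * pow(p, -1, q) % q)  # CRT: s = sp (mod p), s = sq (mod q)
            return counter, s
    raise ValueError("no suitable counter found")


def rabin_verify(message: bytes, counter: int, s: int, n: int) -> bool:
    """Slave side: one modular squaring and a comparison."""
    return 0 <= s < n and s * s % n == _h(message, counter, n)
```

Anyone who factors n can run `rabin_sign` too, which is why the modulus size, not the verification code, determines security.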

Mind that this is mostly a theoretical experiment: my aim is only to learn about cryptography on small devices. Also, this security would merely be an extra layer in my system and not critical. As I only need signing and verification and not encryption/decryption, I can do without a lot of code. The master is a PC, so key generation is easy. The slaves do NOT need to authenticate themselves to the master.

Also, note that the signature is computed over a small block of data which is sent along with the main command message. This block, when it verifies correctly, guarantees the identity of the master. As for sending a new key to the slaves, it too would be accompanied by a block which must verify correctly under the old key, or else an attacker could take over the communication bus.
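A minimal sketch of the slave-side logic, assuming (hypothetically) that the signature covers a 4-byte counter plus the whole command rather than a separate block, and that a key update is just a command with a hypothetical `KEY:` prefix carrying the new modulus. The signature check itself is passed in as `verify`, so any scheme (Rabin, RSA) could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable

# verify(public_key, payload, signature) -> bool, e.g. a Rabin check s*s % n == H(payload).
Verifier = Callable[[int, bytes, bytes], bool]


@dataclass
class Slave:
    public_key: int         # master's current public modulus
    verify: Verifier
    last_counter: int = -1  # should live in nonvolatile memory to survive power loss

    def handle(self, counter: int, command: bytes, signature: bytes) -> bool:
        """Accept a command only if the signature covers counter || command
        and the counter is higher than any previously accepted one."""
        if counter <= self.last_counter:
            return False  # replayed or reordered message
        payload = counter.to_bytes(4, "big") + command
        if not self.verify(self.public_key, payload, signature):
            return False  # not signed by the holder of the current key
        self.last_counter = counter
        if command.startswith(b"KEY:"):
            # Key update (hypothetical format): the new modulus follows the prefix,
            # and it was accepted only because the old key signed it.
            self.public_key = int.from_bytes(command[4:], "big")
        return True
```

Binding the counter and the command into one signed payload is what stops an attacker from pairing a captured block with a different command.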

Update:

Hardware specs

The target platform is, for now, the S08 family of Freescale's 8-bit MCU line. Considering the low cost, the most likely candidate in this family would be an MCU with 32K/2K FLASH/RAM. The CPU can reach a maximum operating frequency of 40 MHz. Anything under one second for verification is acceptable.

I also have access to a 128K/8K MCU of the same family. However, as a learning exercise I am trying to cut down everything possible and get the same code to run on a scaled-down, cheaper version of these MCUs: 8K/768B FLASH/RAM.

In my tests, even the ultra-low-spec one is able to verify a 320-bit Rabin-signed block. Of course I could throw away the low-spec parts and concentrate on the bigger ones, but this is really meant to be a learning experiment and I would like to push the boundaries as far as they can go.

Final Update

For those interested, my chosen solution is to abandon authentication on very small devices. These will be able to connect to the bus but are insecure, and so must not take part in critical tasks. The slaves which must support authentication will be built on MCUs with more RAM, to support the suggested key size of 1536 bits or more. Most likely the Rabin cryptosystem will be used, for its simpler maths. As before, only authentication matters at this point.

Does your attack scenario contain the case where the attacker can block the line and intercept the key update? Should the slaves then continue to use the old key (which soon will be insecure) or refuse any commands? Also, you should really authenticate the whole message, not just accompany it with an authenticated block (otherwise an attacker can replay that block with a different command, or, in case of a man-in-the-middle, substitute the original command with a wrong one). — Paŭlo Ebermann, Apr 09 '13 at 18:22
Well, first, the attacker couldn't replay a block and get away with it: the internal structure of this block would keep changing, and the slave ignores a command if its block is the same as the last one. — Bruno Morais, Apr 09 '13 at 20:07
Secondly, I can't see the attacker blocking the line in this fashion, because intercepting the key doesn't serve any purpose, as it is only a public key. Also, the slaves are connected in parallel (RS485), so it is impossible to block part of the bus from listening. One option is for the slave to use a timer and refuse commands once its key has expired; the master can then signal an intrusion alert. — Bruno Morais, Apr 09 '13 at 20:15
I didn't totally understand how your signature idea would work, but just "ignore if same as the previous block" as a defense against replay attacks sounds a bit weak. For the Man-in-the-middle, that depends on your attack scenario. If the attacker can't remove the existing cables, it might be impossible, yes. — Paŭlo Ebermann, Apr 10 '13 at 17:34
@brunosmmm: I'm dubious about your rationale for ditching authentication on the basis of available RAM. See revisited update 4 in my answer, with tighter RAM estimates, boiling down to 224 bytes in addition to the signature itself if that can be assumed to be at a fixed address, and the public modulus is in Flash in suitable form. And again, with many embedded development tools, it is very easy to share temporary RAM across non-concurrent uses, without having to handle overlays manually, simply by declaring these temporaries as local variables. — fgrieu, Apr 13 '13 at 16:25
@fgrieu: The main problem is not the cryptographic stack itself, but the other functions the device must support in order to be compliant with this communication bus I am developing. RAM isn't the only problem; FLASH space itself is a problem with this setup. For now, I'm putting the smaller devices aside, as I am not quite ready to reimplement several blocks in assembly. Remember that I have a communications protocol and other tasks running simultaneously, and there is a lot of code. — Bruno Morais, Apr 13 '13 at 17:17
ECC uses a lot less CPU than RSA, especially with curves like secp192k1... — dandavis, May 26 '16 at 21:26

score 12 · Accepted Answer · edited Apr 13 '17 at 12:48

For your application: "I need the (underpowered 8-bit) slave to be able to tell if a command issued is really trustable", RSA signature with low public exponent ($e=3$), or Rabin (an analog with $e=2$), is likely the most appropriate, assuming you can't trust the slaves to keep a key secret, which is the only realistic assumption unless that slave uses hardware with a level of physical security comparable to a Smart Card, like this 8-bit IC (link to TOE).

For example, a standard 8051 core running at 4 MIPS peak (48 MHz crystal) or a 68HC11 at 6 MHz E clock (24 MHz crystal) can verify a 1536-bit RSA signature ($e=3$) per PKCS#1 or ISO/IEC 9796-2 in 1 second, or about half that for Rabin ($e=2$), with well under 4kB of code and little RAM. There are even faster (though less common) Rabin-like schemes.
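A minimal Python sketch of what the slave computes for RSA with $e=3$ and PKCS#1 v1.5 encoding with SHA-256 (ISO/IEC 9796-2 is not shown). It makes the cost comparison concrete: two modular multiplications for $e=3$, one for Rabin. `toy_keygen` and `_is_probable_prime` are hypothetical test helpers that the master, not the slave, would need.

```python
import hashlib
import secrets

# DER-encoded DigestInfo prefix for SHA-256, as listed in RFC 8017 (PKCS#1 v2.2)
SHA256_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")


def emsa_pkcs1_v15(message: bytes, k: int) -> bytes:
    """EMSA-PKCS1-v1_5 encoding of SHA-256(message) into k bytes."""
    t = SHA256_PREFIX + hashlib.sha256(message).digest()
    if k < len(t) + 11:
        raise ValueError("modulus too short")
    return b"\x00\x01" + b"\xff" * (k - len(t) - 3) + b"\x00" + t


def verify_e3(message: bytes, signature: int, n: int) -> bool:
    """What the slave runs: two modular multiplications and a comparison."""
    k = (n.bit_length() + 7) // 8
    if not 0 <= signature < n:
        return False
    m = signature * signature % n   # first modular multiplication (Rabin stops here)
    m = m * signature % n           # second one, needed for e = 3
    # Compare the complete encoded block instead of parsing it: lenient
    # parsing is what enabled known forgeries against e = 3 verifiers.
    return m.to_bytes(k, "big") == emsa_pkcs1_v15(message, k)


def _is_probable_prime(c: int, rounds: int = 32) -> bool:
    """Miller-Rabin test (test helper only; the slave never runs this)."""
    if c < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if c % sp == 0:
            return c == sp
    d, r = c - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(c - 3) + 2, d, c)
        if x in (1, c - 1):
            continue
        for _ in range(r - 1):
            x = x * x % c
            if x == c - 1:
                break
        else:
            return False
    return True


def toy_keygen(bits: int = 1024):
    """Hypothetical helper: returns (n, d) for e = 3; 1024 bits keeps the test fast."""
    def prime() -> int:
        while True:
            c = secrets.randbits(bits // 2) | (1 << (bits // 2 - 1)) | 1
            if c % 3 == 2 and _is_probable_prime(c):  # p = 2 (mod 3) makes 3 invertible
                return c
    p = prime()
    q = prime()
    while q == p:
        q = prime()
    return p * q, pow(3, -1, (p - 1) * (q - 1))
```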

One problem is the size of the signature: it is the same as the modulus, that is, in my example, 1536 bits (192 bytes) for PKCS#1. But thanks to message recovery, the overhead can be much less with ISO/IEC 9796-2: max(34, 192-M) bytes, where M is the size in bytes of the message/command, using a 256-bit hash like SHA-256.
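A quick way to compare total transmitted bytes, taking the figures above as given (192-byte signature for a 1536-bit modulus, minimum overhead of 34 bytes for ISO/IEC 9796-2 with a 256-bit hash):

```python
def pkcs1_total(m: int, k: int = 192) -> int:
    """PKCS#1: the m-byte message travels in clear, followed by a k-byte signature."""
    return m + k


def iso9796_2_total(m: int, k: int = 192, overhead: int = 34) -> int:
    """ISO/IEC 9796-2 with message recovery: up to k - overhead message bytes
    ride inside the signature; only the remainder is sent in clear."""
    return m + max(overhead, k - m)
```

For any command of up to 158 bytes, the whole transmission with message recovery is then just the 192-byte signature, versus 192 + M bytes with PKCS#1.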

Update: Do not use Rabin with a 160-bit public modulus; this is entirely unsafe, see this. Such a modulus can be factored in seconds with public-domain tools like GMP-ECM, or perhaps even (with more time) an applet.

Update 2: The 8051 code quoted above uses a minor variation of straight textbook modular arithmetic, without Montgomery (which does not help for modular squaring or cubing, at least in standard RSA/Rabin schemes) or Karatsuba, but with extensive low-level optimization in assembly. The variation (relatively straightforward but unpublished AFAIK) is that modular reduction occurs during the multiplication, rather than after, and within the same scan of the intermediate result. It works with no restriction on the modulus bit size or value. It has been used in a commercial implementation, developed and sold over a decade ago for POST of several brands and CPUs, and was (perhaps still is) used for verification of Smart Card certificates and/or signed code updates. There were strong code-size constraints, so the code is not optimized to the max for speed (no loop unrolling), but rather for a balance of simplicity and reliability, size and speed. Here is an old commercial doc in French (full disclosure: this is my code & company). If you can impose a few reasonable constraints on the public modulus, there are other optimizations leading to further simplification and moderate speedup.
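To illustrate the general idea of reducing during the multiplication rather than after, here is a textbook interleaved modular multiplication in Python, a radix-256 generalization of Blakley's method. This is NOT the unpublished variant described above, only a sketch of the same principle: the accumulator is reduced after each byte of `a`, so it never grows beyond about 9 bits more than the modulus.

```python
def interleaved_modmul(a: int, b: int, n: int) -> int:
    """Compute a*b mod n, reducing after each byte of a instead of once at the end."""
    assert 0 <= a < n and 0 <= b < n
    acc = 0
    # Scan a from its most significant byte, as an 8-bit CPU would.
    for byte in a.to_bytes((n.bit_length() + 7) // 8, "big"):
        acc = (acc << 8) + byte * b   # acc < 256*n + 255*n, i.e. below 511*n
        q = acc // n                  # q < 511; on an 8-bit CPU it would be estimated
        acc -= q * n                  # from the leading bytes, then corrected
    return acc                        # invariant: 0 <= acc < n
```

For Rabin verification the slave would call it once as `interleaved_modmul(s, s, n)`.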

Update 3: 320-bit RSA modulus was already considered marginally safe 25 years ago, and I wildly guess would be factorisable in minutes on a single modern machine with a good MPQS or SIQS implementation like SIMPQS, or msieve if you find a wizard to get it running. And even with more decent RSA modulus size like 640-bit, the idea of short-lived public keys is hard to get right, especially if the devices must survive power loss. Unless I err, a trusted long-term key is required (chained certificates of short-term keys will not do). Also, your devices need a trusted source of current time (either internal or external). My advice is: forget about it for something secure; a symmetric MAC (as proposed in another answer) might be preferable.

Update 4, revisited: Your S08 CPU has 8x8->16-bit multiplication in 5 cycles. Unless I missed a divisor somewhere, that seems to be 125ns @40MHz. This is over 13 times faster than 10 cycles @6MHz on the HC11 (of 12 years ago) on which my code does 1536-bit RSA, $e=3$, well under 1 second. Bus cycles are also faster (20MHz vs. 6MHz) and I did not see mention of wait states for internal RAM. Despite the lesser instruction set (no 16-bit registers and addition), I guesstimate that a similar highly optimized implementation on that S08 would check a 1536-bit Rabin signature in 0.2 second, give or take a factor of 2, using ~~under 1kByte~~ 416 or 608 bytes of RAM, depending on whether the public key (in a form directly usable for computation) is present in Flash or not. This RAM is needed during signature verification only, and can be shared with other non-concurrent tasks; the 192 bytes of signature are included in this low RAM budget, and could be discounted if the signature is at a fixed address. Again, this is in hand-tuned assembly; C compilers easily lose a factor of 10 or worse on multi-precision arithmetic, and keeping the RAM budget low requires some care.
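The timing and RAM figures above can be cross-checked with simple arithmetic, taking the answer's own assumptions (5 cycles at 40 MHz on the S08, 10 cycles at 6 MHz on the HC11). The 192-byte gap between the two RAM budgets matches one copy of the 1536-bit modulus, and subtracting the 192-byte signature from the smaller budget gives the 224 bytes quoted in fgrieu's comment on the question.

$$
\frac{5\ \text{cycles}}{40\ \text{MHz}} = 125\ \text{ns},\qquad
\frac{10\ \text{cycles}}{6\ \text{MHz}} \approx 1667\ \text{ns},\qquad
\frac{1667\ \text{ns}}{125\ \text{ns}} \approx 13.3
$$

$$
608 - 416 = 192\ \text{bytes} = \frac{1536\ \text{bits}}{8},\qquad
416 - 192 = 224\ \text{bytes}
$$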

Update 5a: A 1536-bit signature is only 192 bytes; and can, as well as the temporary variables necessary to check it, reside in RAM shared with the rest of the application; that is, the stack (which in any half-decent 8-bit development system I use is not the physical stack, but simulated by static analysis of the call tree). Thus you do not face a difficult RAM size issue using a decent RSA modulus key size in your application.

Update 5b: A "chain of certificates" introducing short-term public keys certified by the previous one would not be secure if the slave has to survive power loss, because the adversary can capture the current public key, cut power, factor that, restore power, and then forge a chain of certificates for fake short-term public keys of known factorization, that will be accepted by the slave, and boom the slave is "pwned". This is even if we assume that the slave has access to a trusted source of date/time, and each certificate carries a date/time of expiration.

Update 5c: The question and comments show a tendency to under-size the RSA modulus, under the fallacy: if I can't factor it, surely that's good enough to bother my adversaries. That line of thought is wrong and dangerous, at least until one has carefully researched the public state of the art (which, one must assume, any intelligent adversary will use), has used computing power comparable to that of modern adversaries, who could distribute factorization work over many (perhaps pwned) computers, and can safely assume that none of the adversaries is smart enough to improve on the state of the art. The sources quoted by keylength.com go one step further, and want some confidence that conceivable improvements in the state of the art, and the true computing power available at a price, are unlikely to cause a break within a parameterizable number of years; that's one of the reasons they ask for sizes way above the academic state of the art; perhaps their known state of the art encompasses more than the academic one; and they of course want to be on the safe side. I'm borderline prudent when I even discuss a 640-bit RSA modulus for very short-term keys: 512-bit keys are routinely factored in days by enthusiasts on a shoestring; again see this here, as well as the true tale of banks using 321-bit keys way longer than reason and advice warranted, in a high-visibility, high-stakes application.

Using Montgomery multiplication (without transformations) instead of normal multiplication simplifies the implementation. For obtaining the given running time of 1s did you use the Karatsuba algorithm? — j.p., Apr 09 '13 at 08:10
@jug: In my former comment (now mostly moved in the answer) I probably misunderstood what you meant by "Montgomery multiplication (without transformations)". I'm not aware of a standard signature scheme where the verifier can use Montgomery arithmetic without the initial and final preprocessing, but it likely could be done, and yes that would somewhat simplify the code (but still give little speedup). Any pointer? — fgrieu, Apr 09 '13 at 10:12
You were right. With the tools you mentioned I was able to factor a ~160-bit integer in a matter of seconds, so this is out of the question. However, in my tests, a ~320-bit integer took some 20 minutes of crunching and didn't get factored, so I started thinking along the lines of key renewal again. I could, say, produce a new key every 10 minutes and broadcast it to the slaves. I get the feeling that this is kind of an awkward move, but this really is an exceptional situation where I could indeed issue new keys. The constraints are really stringent; anything more than 320 bits will be difficult. — Bruno Morais, Apr 09 '13 at 14:20
Well, your answer is very complete. Thanks for the ideas. I still fail to see why the "chain of certificates" wouldn't do, as this isn't a really serious application. One of my main concerns is the signature size, which must reside in RAM and so will eat a good chunk of it. Also, the signed messages will get really big. — Bruno Morais, Apr 09 '13 at 22:42
Another question just came to mind. Why should the tendency to exaggerate the key size be so readily accepted? In an application like this, it's not likely an attacker will have access to a supercomputer to factor the key. Thanks again. — Bruno Morais, Apr 09 '13 at 23:10
@fgrieu: Sorry, I don't know of any public reference or standard that simply replaces modular multiplication by Montgomery multiplication. — j.p., Apr 10 '13 at 09:47
@fgrieu: This is getting very interesting and I thank you for your answers. I have begun to see the point in the long-term key, but then there is another issue to address: if the slave is not required to have any previous knowledge of the bus's master it is connecting to when the system is started up, then security is impossible, because an attacker could do the same thing described in your update 5b. To me, this means that the master's public key must be pre-programmed on the device, defeating the purpose of an asymmetric system. — Bruno Morais, Apr 10 '13 at 15:15
@brunosmmm: your new comment is far from the original problem (no more underpowered devices). For a detailed answer I suggest a separate question. In short: this can be solved by a classic PKI, with a hierarchy of long-term keys, rooted to a manufacturer key in every device. E.g. the manufacturer issues a certificate that a given device (identified by serial number) was sold to the holder of the private key matching a certain public key. Similar to certificates in Annex 1B, appendix 11 of that — fgrieu, Apr 10 '13 at 16:30
@brunosmmm: or there is a simpler, thus perhaps better, possibility: assume the genuine owner's public key is injected in the slave once when new, and stored in Flash. — fgrieu, Apr 10 '13 at 19:34
@fgrieu: I have one final comment: if I were to inject a public, long-term key when the slave is new, I might as well have injected a secret key for a symmetric system. So I don't know if staying asymmetric is worth it. — Bruno Morais, Apr 10 '13 at 22:57
@brunosmmm: There is a huge difference: a secret key gives no security when it gets known, and it is almost impossible to keep a secret with a standard CPU, because of debug and factory testing features like JTAG, goofs, and side channels; when by contrast nothing needs to be kept secret when using a public key for integrity verification. We are back to the beginning of my answer. These things apply to almost any embedded device, regardless of CPU power, and should really be asked in a different question. — fgrieu, Apr 11 '13 at 05:34

score 3 · Answer 2 · answered Apr 09 '13 at 02:21

Have you considered using symmetric-key crypto (a MAC) instead? Elliptic curve crypto, or even regular (but costly) modular arithmetic, might be overkill in your case.

As I understand it, you would be able to preload MAC keys into your master and slaves before deployment, and you would be set. You can even generate a different key for each slave, so that the compromise of one slave doesn't impact the whole system.

That way you don't need to embed a PRNG in your slaves or require them to have crazy processing power.
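A minimal sketch of this suggestion with Python's standard `hmac` module, assuming HMAC-SHA256 as the MAC, a monotonic counter against replays, and (hypothetically) per-slave keys derived from one master secret so that the master does not need to store a key table. The 8-byte tag truncation is also an assumption, trading some forgery margin for bus bandwidth.

```python
import hashlib
import hmac


def slave_key(master_key: bytes, slave_id: int) -> bytes:
    # Hypothetical key diversification: a distinct key per slave, derived from one
    # master secret, so extracting one slave's key does not reveal the others.
    return hmac.new(master_key, b"slave" + slave_id.to_bytes(4, "big"), hashlib.sha256).digest()


def tag(key: bytes, counter: int, command: bytes) -> bytes:
    # MAC over a monotonic counter plus the whole command, truncated to 8 bytes
    return hmac.new(key, counter.to_bytes(4, "big") + command, hashlib.sha256).digest()[:8]


def check(key: bytes, counter: int, command: bytes, received: bytes) -> bool:
    # Constant-time comparison avoids leaking how many tag bytes matched
    return hmac.compare_digest(tag(key, counter, command), received)
```

Unlike the signature schemes, the slave's key here must stay secret, which is exactly the trade-off debated in the comments on the accepted answer.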

Alexandre Yamajako

This isn't the case: as I've pointed out, the slaves do not en/decrypt or sign anything; they just verify the master's signature. The trouble is that this setup won't work, as the master isn't required to have any knowledge about a slave that is connected to the system at some point, so this isn't the way for me, but thanks. – Bruno Morais Apr 09 '13 at 14:02
