$K_{pub} = (n, e)$

$K_{pvt} = d$

Then

$E_{K_{pub}}(x) \equiv x^e \mod n$

Practically, when RSA is used to encrypt strings, what is the $x$ here? You cannot take it byte by byte, because the result of the reduction $\bmod n$ does not fit in a byte. So what is done?

fgrieu
user93353
  • Practically RSA is not used to encrypt strings. – Maeher Jan 14 '21 at 09:40
  • @Maeher - but it can be, right? So in case it needs to be done, how will it be done? – user93353 Jan 14 '21 at 09:43
  • @Maeher - and even if it's used to encrypt an AES key - the AES key will be a 256 bit - i.e. 32 bytes. So how will it be encrypted - it can't be done byte by byte because mod n will result in values larger than a byte – user93353 Jan 14 '21 at 09:49
  • What? Almost all encryption algorithms can be considered as encrypting bits and bytes. Computationally, a string is a sequence of bytes. RSA keys are at least 1024 bits, which makes 128 bytes. What is your actual problem? – kelalaka Jan 14 '21 at 10:13
  • @kelalaka - I want to encrypt "ABCD". What would I raise to e to encrypt this? If I raise A (65) to the power e mod n, then the result wouldn't fit in a byte - so A encrypted may take more than 1 byte. It may take up to n/8 bytes. So do I chop the string "ABCD" into n bit size blocks & then iteratively raise each block to power e? – user93353 Jan 14 '21 at 10:46
  • You form an integer, possibly big-endian style (0x41424344 for "ABCD"), then encrypt it, of course using a proper scheme like PKCS#1 v1.5 padding or OAEP – kelalaka Jan 14 '21 at 10:48
  • RSA implementations use big integers, like GNU GMP; you then need to encode your message into a big integer that is smaller than what the padding can support. – kelalaka Jan 14 '21 at 10:50
  • @kelalaka - you form a big integer from what? Each byte? Each set of bytes or what? – user93353 Jan 14 '21 at 10:56
  • Read from here, https://tools.ietf.org/html/rfc8017#section-7.2 – kelalaka Jan 14 '21 at 10:57
  • @kelalaka - thank you. So the max length of the message which can be encrypted is (k-11) and the whole message is treated as one big integer. Is there no standard scheme/algorithm where if the message is longer than (k-11), you treat it as multiple (k-11) size blocks & run the encryption on each (k-11) block? i.e. is it a hard limit - that the max size of message to be encrypted with RSA is (k-11) – user93353 Jan 14 '21 at 11:19
  • It appears that you're expecting encryption to be format preserving. (An encrypted byte will again be a byte.) That is generally not the case. In fact any secure public key encryption scheme is necessarily randomized and thus always expanding. – Maeher Jan 14 '21 at 11:19
  • RSA is not for encryption, rather signatures with RSASSA-PSS or key exchange with RSA-KEM. – kelalaka Jan 14 '21 at 11:20
  • @kelalaka - I understand - my final question was more theoretical than practical - so from your answer, I assume RSA encryption for more than (k-11) size is not supported in any standard way. Thank you. If you can put your original comment(s) as an answer, I can accept it. – user93353 Jan 14 '21 at 11:23
  • Well, for this kind of answer Maarten or fgrieu will be better, since they know the standards way better than me :) – kelalaka Jan 14 '21 at 11:25
  • @Maeher - no I am not expecting it to be format preserving. My question was more about a huge increase in size - i.e. if it's done byte by byte - then each byte would be turned into k bytes. However, from kelalaka's answer, it's clear it's not done byte by byte. But just 1 calculation with a hard limit on the size of the string to be encrypted. – user93353 Jan 14 '21 at 11:32
  • I think this is a dupe of this question. I remember as I answered it :) – Maarten Bodewes Jan 15 '21 at 00:03

2 Answers

Practically, when RSA is used to encrypt strings, what is the $x$ in $x^e\bmod n$?

That depends on the variant of RSA. Among the most common:

  1. Toy-sized textbook RSA, where the public modulus $n$ is small: it is customary to encrypt letter by letter (or pair of letters, as in the original RSA article's small example) and concatenate the RSA cryptograms. Thus $x$ is the rank of the letter in the encoding used (or $x=x_0\,b+x_1$ where $x_0$ and $x_1$ are the ranks of two letters, with $b$ a public constant greater than the maximum value of $x_i$, e.g. $b=100$ in said article). There is no security for small $n$: a toy hammer won't actually nail. Small $n$ is anything up to about a hundred decimal digits. That can be factored quickly, which allows decryption. See this for records.

  2. Textbook RSA with large $n$: it is customary to transform the string into bytes (e.g. per UTF-8, the modern compatible superset of ASCII), then from bytestring to integer $x$ (usually per OS2IP). In Python:

    int.from_bytes(bytes('François wears a !', 'UTF-8'), byteorder='big', signed=False)
    

    There is a size limitation to $k-1$ bytes, where $2^{8(k-1)}<n<2^{8k}$, which ensures $0\le x<n$. On decryption, leading zero bytes are ignored/removed (due to the simplistic conversion from string to bytestring). Variations abound (some encoding of size to allow any bytestring, padding on the right so that $x$ is large even for small strings, endianness…).

    Caution: Textbook RSA is not secure under Chosen Plaintext Attack:

    • An attacker can trivially verify a guess of the plaintext: just encrypt the guess and check against the cryptogram. That attack is devastating for names on the class roll, credit card number…
    • When short strings encode to small integers $x$, several other attacks apply, including
      • when $x$ happens to be writable as $x=x_a\cdot x_b$ for integers $x_a$ and $x_b$ small enough to be found by enumeration, there's a meet-in-the-middle attack
      • when $e<\log_2(n)/\log_2(x)$, it holds that $x^e\bmod n\,=\,x^e$, and thus it's trivial to find $x$ by $e\text{th}$ root extraction.
  3. RSAES-PKCS1-v1_5: similar to 2 plus random padding, and means to remove it on decryption. $x$ is a combination of the string to encode, 3 constant bytes, and at least 8 random (non-zero) bytes. The string is thus limited to $k-11$ bytes (per §7.2.1 step 1). This method is better, but still has serious defects:

    • Implementations of decryption are difficult to protect against side-channel attacks. The first was Daniel Bleichenbacher's Chosen ciphertext attacks against protocols based on the RSA encryption standard PKCS #1, in proceedings of Crypto 1998, and there are many variations.
    • Unless we lower the $k-11$ limit, encryption is inherently vulnerable to an attack under CPA costing $2^{63}$ encryptions.

    For these reasons, RSAES-PKCS1-v1_5 should not be used in any new design.

  4. RSAES-OAEP: this is a major improvement of the above, using a hash. The string is transformed by the padding process into integer $x$ that is sort of random with $0\le x<2^{8(k-1)}$, and that's undone in decryption. Secure implementations of decryption are easier than for RSAES-PKCS1-v1_5. Security is theoretically reducible to that of the hash and of the RSA problem (finding a random $x$ given $x^e\bmod n$). The size limitation becomes $k-2h-2$ bytes (per §7.1.1 step 1.b) where $h$ is the size of the hash (e.g. $h=32$ bytes for SHA-256).

  5. Hybrid encryption, e.g. RSA-KEM. A random value $x$ with $0\le x<n$ is RSA-encrypted with no padding, a symmetric encryption key is derived from $x$, and that key is used to encrypt(-and-MAC) the string. Some avenues of implementation mistakes on decryption that still exist in RSAES-OAEP are gone. Security is theoretically reducible to that of the symmetric encryption and the RSA problem, with a simpler proof and/or quantitatively better assurance than for RSAES-OAEP. There is no size limitation. However the size of the cryptogram is slightly increased, and we need a Key Derivation Function and an authenticated cipher, whereas the equivalent is built into RSAES-OAEP.
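
To make variant 2 and its two cautions concrete, here is a minimal sketch in Python (3.8 or later). Everything named in it is an assumption made for illustration: the demo key is built from two publicly known Mersenne primes (which must never be used for a real key), and the helpers `textbook_encrypt`, `textbook_decrypt` and `integer_root` are hypothetical, not part of any library. It shows the string to integer conversion, then guess verification, then root extraction with a small exponent.

```python
# Hypothetical demo key built from two well-known Mersenne primes.
# Illustration only: such primes are public, so this key offers no security.
p = 2**607 - 1
q = 2**1279 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)
k = (n.bit_length() + 7) // 8      # byte length of the modulus


def os2ip(data: bytes) -> int:
    """Octet string to non-negative integer, big-endian."""
    return int.from_bytes(data, byteorder="big")


def i2osp(x: int, length: int) -> bytes:
    """Non-negative integer to octet string of the given length."""
    return x.to_bytes(length, byteorder="big")


def textbook_encrypt(msg: str, exponent: int = 65537) -> int:
    """Unpadded (textbook) RSA: x is the whole UTF-8 string as one integer."""
    data = msg.encode("utf-8")
    if len(data) > k - 1:
        raise ValueError("string too long: at most k-1 bytes")
    return pow(os2ip(data), exponent, n)


def textbook_decrypt(c: int) -> str:
    """Inverse of textbook_encrypt for exponent e; strips leading zero bytes."""
    x = pow(c, d, n)
    return i2osp(x, k).lstrip(b"\x00").decode("utf-8")


def integer_root(y: int, r: int) -> int:
    """Largest integer z with z**r <= y, found by binary search."""
    lo, hi = 0, 1 << (y.bit_length() // r + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid**r <= y:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Round trip: "ABCD" is one integer x = 0x41424344, not four separate bytes.
c = textbook_encrypt("ABCD")
roundtrip = textbook_decrypt(c)

# Caution 1: encryption is deterministic, so a guess can be checked
# with the public key alone.
candidates = ["ABCA", "ABCD", "ABCE"]
confirmed = [g for g in candidates if textbook_encrypt(g) == c]

# Caution 2: with a small exponent (here 5) and a short string,
# x**5 < n, so the cryptogram is x**5 itself and a 5th root recovers x.
c_small = textbook_encrypt("ABCD", exponent=5)
recovered = i2osp(integer_root(c_small, 5), 4)
```

Neither attack in the sketch uses `d`: both work from the public key and the cryptogram alone, which is why variants 3 to 5 add randomness before the exponentiation.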

fgrieu
  • (cough) using the 8-bit bytes (formally octets) used by PKCS1 (and nearly everyone since about 1980) SHA256 is 32 bytes – dave_thompson_085 Jan 16 '21 at 03:36
  • @dave_thompson_085: thanks, sharp-eyed! Yes I no longer bother distinguishing a byte and an octet; I'm even sometimes assuming a C char is a byte, and occasionally that INT_MAX is at least $2^{31}-1$. Where's that emphasis on portability that I once had? – fgrieu Jan 16 '21 at 11:52

Composed the answer I was looking for from the different comments in response to the question

  • Input is considered as an array of bytes/octets (8 bit).
  • k is the octet length of the RSA modulus (n)
  • Maximum number of octets which can be encrypted with RSA (using RSAES-PKCS1-v1_5 padding) is k - 11
  • The array of octets after padding is considered to be a Big Integer - x
  • The Big Integer x is encrypted using the public key - $E_{K_{pub}}(x) \equiv x^e \mod n$

For more info, look at OS2IP and PKCS1
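
The steps listed above can be sketched in Python as follows. This is an illustrative sketch only, under stated assumptions: the demo key is built from two publicly known Mersenne primes (never do that for a real key), the function names `pkcs1_v15_encrypt` and `pkcs1_v15_decrypt` are hypothetical, and the padding layout `0x00 || 0x02 || PS || 0x00 || M` follows my reading of RFC 8017 §7.2. The decryption shown is not constant-time, so it would be exposed to the side-channel attacks this padding is known for.

```python
import secrets

# Hypothetical demo key from two public Mersenne primes (illustration only).
p, q = 2**607 - 1, 2**1279 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)
k = (n.bit_length() + 7) // 8      # octet length of the modulus n


def pkcs1_v15_encrypt(message: bytes) -> bytes:
    """Pad the octets, view them as one big integer x, compute x^e mod n."""
    if len(message) > k - 11:
        raise ValueError("message too long: at most k-11 octets")
    # PS: at least 8 random non-zero octets, filling the block up to k octets.
    ps = bytes(secrets.randbelow(255) + 1 for _ in range(k - len(message) - 3))
    em = b"\x00\x02" + ps + b"\x00" + message
    x = int.from_bytes(em, "big")           # OS2IP
    return pow(x, e, n).to_bytes(k, "big")  # I2OSP of the cryptogram


def pkcs1_v15_decrypt(ciphertext: bytes) -> bytes:
    """Undo the exponentiation, then strip the padding (not constant-time)."""
    x = pow(int.from_bytes(ciphertext, "big"), d, n)
    em = x.to_bytes(k, "big")
    sep = em.find(b"\x00", 2)  # first zero octet after the 0x00 0x02 header
    if em[:2] != b"\x00\x02" or sep < 10:
        raise ValueError("decryption error")
    return em[sep + 1:]


ciphertext = pkcs1_v15_encrypt(b"ABCD")
plaintext = pkcs1_v15_decrypt(ciphertext)
```

Whatever the length of the input (up to k - 11 octets), the cryptogram is always k octets, and the random padding makes two encryptions of the same string differ.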

user93353
  • The method discussed in this answer is RSAES-PKCS1-v1_5, which has serious defects: implementations of decryption are difficult to protect against side-channel attacks; and (unless we change the $k-11$ limit) encryption is inherently vulnerable to an attack costing $2^{63}$ encryptions under Chosen Plaintext Attack. For this reason the modern options are RSAES-OAEP (same ref), or hybrid encryption, e.g. RSA-KEM. – fgrieu Jan 15 '21 at 08:12
  • @fgrieu - your comment is only as regards the padding, right? Overall, the encryption happens as said in the answer - except for the padding details? – user93353 Jan 15 '21 at 08:14
  • What I had to say no longer fits a comment, so I made an answer. – fgrieu Jan 15 '21 at 10:03