Is injective encoding of a message important for Elliptic Curve ElGamal?

Question

I'm trying to understand if using non-injective encodings for Elliptic Curve ElGamal encryption is dangerous.

A standard probabilistic encoding defined by Koblitz for elliptic curves over $\mathbb{F}_p$ works roughly as follows (see this answer for details):

Fix some value $\ell < bitlength(p)$
Choose a random integer $r \gets \mathbb{Z}_p$
Set $x = r \| m$, compute the corresponding $y^2$, and check whether it is a square. If so, take one of the two possible $y$ values and set $(x,y)$ as the encoding of $m$. Otherwise, sample a new $r$ and start over.
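The steps above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the prime `P`, the coefficients `A` and `B`, and the message length `ELL` are toy values assumed here, and `r` is bounded so that $r \| m < p$.

```python
import secrets

# Toy parameters, assumed for illustration only (not a standardized curve).
P = 2**31 - 1      # prime with P % 4 == 3, so square roots are one exponentiation
A, B = 2, 3        # curve y^2 = x^3 + A*x + B over F_P
ELL = 8            # message length in bits


def koblitz_encode(m):
    """Map an ELL-bit message m to a curve point (x, y) with x = r || m."""
    if not 0 <= m < 2**ELL:
        raise ValueError("message does not fit in ELL bits")
    r_bound = P >> ELL                      # keeps r || m below P
    while True:
        r = secrets.randbelow(r_bound)      # fresh random padding
        x = (r << ELL) | m                  # x = r || m, m in the low bits
        rhs = (pow(x, 3, P) + A * x + B) % P
        y = pow(rhs, (P + 1) // 4, P)       # candidate square root
        if (y * y) % P == rhs:              # rhs is a square: accept
            return x, y
        # otherwise sample a new r and try again


def koblitz_decode(point):
    """Recover m by reading the low ELL bits of the x-coordinate."""
    x, _ = point
    return x & (2**ELL - 1)
```

Calling `koblitz_encode(m)` twice on the same `m` will usually give two different points, but `koblitz_decode` returns `m` for both, because the message sits in the low bits of $x$.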

It's not constant time. So, timing attacks are possible.

It's not an injective encoding. The same message $m$ can be mapped to multiple points with different $r$, so it's not a one-to-one correspondence. However, I don't know if it's possible to exploit point collisions, that is, to encode $r \| m$ to a point $P$ and then decode $P$ as a different message $m^*$.

The Elligator paper justifies the importance of injective encoding by the need to avoid traffic inspection. But that does not apply here: the encoded point will be encrypted.

So, does message encoding have to be injective? If so, why?

EDIT: I was wrong to assume that mapping the message to multiple points means we must have collisions. The message space is small, and the Koblitz algorithm uses a fixed-length $r$, so there are no collisions. Each message can be mapped to several points, but the sets of points for two different messages do not intersect.

My confusion comes from this paper, which states that:

...one can construct a probabilistic injective encoding with $\ell$ equal to about half of the size of $G$, as we show in §2.4, but we do not know of provable constructions achieving a better $\ell$ in general.

I checked the Koblitz algorithm and didn't see any "half of the size of the $G$" restrictions.

Now, it seems that half of the $G$ size might be set due to error probability or the desire to guarantee that all messages up to $\ell$ bits can be encoded.

Anyway, is there any advantage (from a security point of view) of using say Elligator encoding (bijective mapping) instead of Koblitz's probabilistic one for encoding the plaintext?

The question uses "injective" figuratively, since the encoding is not even a function. — fgrieu, Feb 14 '24 at 17:48
Minor correction: "Choose a random integer $r \gets \mathbb{Z}_p$" - actually, the requirement is that $r || m < p$, hence if $0 \le m < 2^{128}$, then $r < p / 2^{128}$ — poncho, Feb 14 '24 at 19:13
Also, when applied to functions, "injective" means "two different inputs are never mapped to the same output"; for encryption algorithms (at least, the ones that are always uniquely decryptable), that's the case. When you say it's not "an injective encoding", what you mean is that it's not deterministic (and my answer explained why, for public key encryption, we almost always go with a nondeterministic option) — poncho, Feb 14 '24 at 21:29
@poncho Do you know where 2^128 is coming from? Negligible probability of failing to encode? — pintor, Feb 15 '24 at 11:13
@poncho Ah, so we cannot have 2 messages mapped to the same point? I've assumed the message space m||r is much larger than the number of points and collisions are unavoidable. I guess I was wrong — pintor, Feb 15 '24 at 11:15

poncho · Answer 1 · 2024-02-14T19:25:51.317

It's not constant time. So, timing attacks are possible.

This is (mostly) not true. Timing attacks are of interest only if the timing is correlated to some secret. After all, practical implementations are usually nonconstant time, because of interrupts, cache accesses by unrelated processes and DMA accesses. This doesn't open up any attacks, because these effects are uncorrelated to any secret the attacker may be interested in (and hence those secrets are not exposed).

In this case, we iterate if $r || m$ happens to not be a possible $x$ value. Now, for any particular $m$ value, it appears likely (I don't know of a proof) that the number of $r || m$ values which are not possible $x$ values is going to be approximately the same. They're not going to be exactly the same (and hence there will be some correlation); however, that correlation will be extremely weak, which gives you, at best, extremely weak probabilistic information.

For example, if $r$ and $m$ are both 128 bits, then the expected bias between 'likely' and 'unlikely' messages would be circa $2^{-64}$. That means an attacker would need about $2^{128}$ encryptions of the same message to determine whether a guessed message was plausibly correct, which is not a real concern.
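A heuristic sketch of where figures of this size can come from, under the unproven assumption that, for a fixed $m$, each of the $2^{128}$ candidates $x = r \| m$ independently has a square right-hand side $x^3 + ax + b$ with probability $1/2$ (here $a, b$ are the coefficients of an assumed short Weierstrass curve, and $N_m$ is the number of valid candidates for $m$):

$$N_m \sim \mathrm{Bin}\!\left(2^{128}, \tfrac12\right), \qquad \mathbb{E}[N_m] = 2^{127}, \qquad \sigma(N_m) = \sqrt{2^{128} \cdot \tfrac14} = 2^{63}$$

$$\varepsilon \approx \frac{\sigma(N_m)}{\mathbb{E}[N_m]} = \frac{2^{63}}{2^{127}} = 2^{-64}, \qquad \text{samples needed} \approx \frac{1}{\varepsilon^2} = 2^{128}$$

The last step uses the usual rule of thumb that detecting a bias of $\varepsilon$ in a success probability takes on the order of $1/\varepsilon^2$ observations.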

(That said, the above argument is not perfectly satisfying. My opinion is that, to do public key encryption with ECC, you're better off with ECIES: we get away from the plausibility argument, and we no longer have any length limitation on the message. Of course, this is off-topic for your question.)

It's not an injective encoding. The same message $m$ can be mapped to multiple points with different $r$, so it's not a one-to-one correspondence.

Actually, with public key encryption, that's a good thing. Consider the opposite: given a message $m$, it would encrypt to only one possible ciphertext $c$. In that case, the adversary could take a guess of the message $m'$, encrypt it, and if it's $c$, he learns his guess was correct.
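The guessing attack can be illustrated with a small sketch. To keep it short, it uses toy ElGamal in the multiplicative group modulo a prime instead of an elliptic curve; the parameters `P` and `G` and the helper `derived_k` (which makes encryption deterministic by deriving the ephemeral key from the message) are assumptions made up for this example and are not secure.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only (not secure).
P = 2**127 - 1     # Mersenne prime; we work in the multiplicative group mod P
G = 3              # base element, chosen arbitrarily for the demo


def keygen():
    x = secrets.randbelow(P - 2) + 1             # private key in [1, P-2]
    return x, pow(G, x, P)                       # (private, public)


def encrypt(h, m, k=None):
    """ElGamal encryption; a fresh random k is used unless one is supplied."""
    if k is None:
        k = secrets.randbelow(P - 2) + 1
    return pow(G, k, P), (m * pow(h, k, P)) % P


def decrypt(x, c):
    c1, c2 = c
    return (c2 * pow(c1, -x, P)) % P             # modular inverse, Python 3.8+


def derived_k(m):
    """Hypothetical deterministic variant: k depends only on the message."""
    digest = hashlib.sha256(m.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 1


def guess_matches(h, c, guess):
    """Adversary re-encrypts a guess using only public data and compares."""
    return encrypt(h, guess, derived_k(guess)) == c
```

With the deterministic variant, `guess_matches(pk, c, m)` confirms a correct guess using only the public key. With a fresh random `k`, re-encrypting the same message gives an unrelated-looking ciphertext, so the comparison tells the adversary nothing.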

Because of this, we add nondeterminism to public key encryption (this is what you are calling a 'non-injective encoding').

I see. You are right! There are no timing attacks for the original Koblitz algorithm. I was thinking about a modified Koblitz, where r is not random but more of a nonce that goes from 1 to some K. — pintor, Feb 15 '24 at 11:18
So we can map the same message to several points, but those points never intersect for any two messages, right? Huh, I've somehow assumed message space was much larger than the point space, and collisions would be inevitable. — pintor, Feb 15 '24 at 11:23
