4

Let's say I'm naive and want to generate a random integer in the range [0, m), so I do the following:

k = cryptographically_secure_rng() % m

where cryptographically_secure_rng returns a uniformly random integer in the range [0, n).
Obviously, assume m <= n. In general, k is no longer uniformly distributed.
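
For concreteness, here is a minimal sketch (Python, with secrets.randbelow standing in for cryptographically_secure_rng; the small n and m are chosen only to make the skew visible) that tallies the resulting distribution:

    # Minimal sketch of the modulo bias (illustration only).
    # secrets.randbelow(n) plays the role of cryptographically_secure_rng() over [0, n).
    from collections import Counter
    import secrets

    n, m = 10, 7
    trials = 100_000

    counts = Counter(secrets.randbelow(n) % m for _ in range(trials))
    for k in range(m):
        print(k, counts[k] / trials)  # 0, 1, 2 land near 0.2; 3..6 near 0.1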

It seems to me that for any reasonably nontrivial values of m and n, this can't possibly cut the attacker's time by more than half -- and in general, it would seem to cut it by a much smaller fraction.
Yet my impression from security/crypto is that such a naive RNG would be catastrophic for the security of the system.

So my question is:
How bad is this from the standpoint of an attacker trying to attack a system by exploiting this function?

Could such a bias be abused and amplified (e.g. exponentially) to attack a system?
If so, in what kind of a system might this occur, and how?
If not, then is the problem worth worrying about, and why?

user541686
  • @those who migrated this question: I wasn't just talking about crypto though... imagine an attacker trying to DDoS a service by making it execute worst-case behavior on a hashtable. Or whatever. – user541686 Jan 30 '16 at 21:34
  • I don't think it is a serious problem. The distribution of k will not be uniform (unless n%m=0), but it will be close to uniform. As an attacker I will start from zero and work up, rather than m-1 and work down. – emory Jan 30 '16 at 21:41
  • I would like to understand how the modulo operation influences the uniformity of k. Can someone link or summarize the information for me? – Ella Rose Jan 30 '16 at 21:43
  • 1
    @E.Rose take for instance $n = 10$ and $m=7$. So, the value 1 has double of the chance to appear, because 1 % 7 = 8 % 7 = 1 ... – Hilder Vitor Lima Pereira Jan 30 '16 at 21:47
  • 1
    I think this question already has an answer. Unless $m$ is a factor of $n$, your RNG is no longer cryptographically secure. – r3mainer Jan 31 '16 at 00:18
  • 1
    @squeamishossifrage : ​ The distinguishing advantage increases by less than m/(2$\hspace{-0.02 in}\cdot$n), so it'll also be secure when that is negligible. ​ ​ ​ ​ –  Jan 31 '16 at 00:26
  • To add to what @RickyDemer said, aren't current cryptographic systems far more difficult to crack than by a factor of 2? What's the problem here? – user541686 Jan 31 '16 at 00:28
  • "the problem here" is that you've assumed m <= n but nothing else about m and n. ​ ​ –  Jan 31 '16 at 00:29
  • @RickyDemer: Can you elaborate? Give me an example? Pretend I'm dumb. That tells me nothing about your thought process. – user541686 Jan 31 '16 at 00:30
  • If m=2 and n=3, then using the simple method on true randomness would have a 2/3 probability of outputting 0 and a 1/3 probability of outputting 1. (That differs by 1/6 from the uniform distribution on {0,1}.) – Jan 31 '16 at 00:33
  • @RickyDemer: Yeah so that's already way less than the one-half I talked about (I think that's the max too? I didn't really think about it). My question stands exactly as-is: why is that a real problem for a system that is otherwise designed to be secure? Who cares if the attacker saves 1/6 of his time? – user541686 Jan 31 '16 at 04:03
  • ... "than" what one-half you talked about? ​ (I only see a mention of "a factor of 2".) ​ For example, if you used that as the keystream for a stream cipher, then an eavesdropper could easily distinguish between [encryptions of long mostly-0 plaintexts] and [encryptions of long mostly-1 plaintexts]. ​ ​ ​ ​ –  Jan 31 '16 at 04:12
  • @RickyDemer: Yeah that factor of 2 is precisely what I was referring to. Your point about the stream cipher could be a good answer if you'd like to write one. – user541686 Jan 31 '16 at 04:15
  • Modulo affects the output distribution whenever the generator's range is not a multiple of the target range. Fortunately, there are many ways to avoid the issue; see Efficiently Generating a Number in a Range for some examples, and the rejection-sampling sketch after this comment thread. – Yann Droneaud Aug 02 '22 at 10:48
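
As a concrete illustration of one such fix, here is a hedged rejection-sampling sketch (Python; secrets.randbelow again stands in for a raw uniform source over [0, n), and the function name is made up for this example):

    # Rejection sampling: discard raw values from the incomplete final bucket,
    # so the modulo reduction is exactly uniform over [0, m).
    import secrets

    def unbiased_below(m: int, n: int) -> int:
        limit = n - (n % m)            # largest multiple of m that is <= n
        while True:
            r = secrets.randbelow(n)   # raw uniform draw from [0, n)
            if r < limit:              # accept only the evenly divisible region
                return r % m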

1 Answer

5

The answer is that it depends on how much larger $n$ is than $m$, and it also depends on the application. If $n$ is much larger than $m$ (say, 64-128 bits longer) then you are fine, as pointed out by @RickyDemer. Otherwise, the output will be biased.

If you are using the result as a key for HMAC, then it doesn't matter. In general, however, it can matter, and can matter a lot. Thus, I strongly recommend against doing this without making $n$ at least 64-128 bits longer than $m$. If you want to see how small biases can be exploited to recover plaintext, then Analysing and Exploiting the Mantin Biases in RC4 is a good place to start.
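
A minimal sketch of the approach recommended above (Python; secrets.randbits and the exact 128-bit margin are illustrative choices): draw a value roughly $2^{128}$ times larger than $m$ before reducing, so the statistical distance from uniform is bounded by roughly $m/(2n) < 2^{-128}$.

    # Oversample-then-reduce: make n about 2^128 times larger than m so the
    # residual modulo bias is cryptographically negligible.
    import secrets

    def almost_uniform_below(m: int) -> int:
        extra_bits = 128                         # safety margin suggested above
        nbits = m.bit_length() + extra_bits      # so n = 2**nbits >= m * 2**128
        return secrets.randbits(nbits) % m       # bias bounded by ~m/(2*n) < 2**-128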

Yehuda Lindell