4

Let's say I'm naive and want to generate a random integer in the range [0, m), so I do the following:

k = cryptographically_secure_rng() % m

where cryptographically_secure_rng returns a uniformly random integer in the range [0, n).
Obviously, assume m <= n. In general, k is no longer uniformly distributed.
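
For concreteness, here is a minimal sketch (Python, with secrets.randbelow standing in for cryptographically_secure_rng; the small n and m are chosen only to make the skew visible) that tallies the resulting distribution:

    # Minimal sketch of the modulo bias (illustration only).
    # secrets.randbelow(n) plays the role of cryptographically_secure_rng() over [0, n).
    from collections import Counter
    import secrets

    n, m = 10, 7
    trials = 100_000

    counts = Counter(secrets.randbelow(n) % m for _ in range(trials))
    for k in range(m):
        print(k, counts[k] / trials)  # 0, 1, 2 land near 0.2; 3..6 near 0.1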

It seems to me that for any reasonably nontrivial values of m and n, this can't possibly cut the attacker's time by more than half -- and in general, it would seem to cut it by a much smaller fraction.
Yet my impression from security/crypto is that such a naive RNG would be catastrophic for the security of the system.

So my question is:
How bad is this from the standpoint of an attacker trying to attack a system by exploiting this function?

Could such a bias be abused and amplified (e.g. exponentially) to attack a system?
If so, in what kind of a system might this occur, and how?
If not, then is the problem worth worrying about, and why?

user541686
  • @those who migrated this question: I wasn't just talking about crypto though... imagine an attacker trying to DDoS a service by making it execute worst-case behavior on a hashtable. Or whatever. – user541686 Jan 30 '16 at 21:34
  • I don't think it is a serious problem. The distribution of k will not be uniform (unless n%m=0), but it will be close to uniform. As an attacker I will start from zero and work up, rather than m-1 and work down. – emory Jan 30 '16 at 21:41
  • I would like to understand how the modulo operation influences the uniformity of k. Can someone link or summarize the information for me? – Ella Rose Jan 30 '16 at 21:43
  • 1
    @E.Rose take for instance $n = 10$ and $m=7$. So, the value 1 has double of the chance to appear, because 1 % 7 = 8 % 7 = 1 ... – Hilder Vitor Lima Pereira Jan 30 '16 at 21:47
  • 1
    I think this question already has an answer. Unless $m$ is a factor of $n$, your RNG is no longer cryptographically secure. – r3mainer Jan 31 '16 at 00:18
  • 1
    @squeamishossifrage : ​ The distinguishing advantage increases by less than m/(2$\hspace{-0.02 in}\cdot$n), so it'll also be secure when that is negligible. ​ ​ ​ ​ –  Jan 31 '16 at 00:26
  • To add to what @RickyDemer said, aren't current cryptographic systems far more difficult to crack than by a factor of 2? What's the problem here? – user541686 Jan 31 '16 at 00:28
  • "the problem here" is that you've assumed m <= n but nothing else about m and n. ​ ​ –  Jan 31 '16 at 00:29
  • @RickyDemer: Can you elaborate? Give me an example? Pretend I'm dumb. That tells me nothing about your thought process. – user541686 Jan 31 '16 at 00:30
  • If m=2 and n=3, then using the simple method on true randomness would have a 2/3 probability of outputting 0 and a 1/3 probability of outputting 1. (That differs by 1/6 from the uniform distribution on {0,1}.) – Jan 31 '16 at 00:33
  • @RickyDemer: Yeah so that's already way less than the one-half I talked about (I think that's the max too? I didn't really think about it). My question stands exactly as-is: why is that a real problem for a system that is otherwise designed to be secure? Who cares if the attacker saves 1/6 of his time? – user541686 Jan 31 '16 at 04:03
  • ... "than" what one-half you talked about? ​ (I only see a mention of "a factor of 2".) ​ For example, if you used that as the keystream for a stream cipher, then an eavesdropper could easily distinguish between [encryptions of long mostly-0 plaintexts] and [encryptions of long mostly-1 plaintexts]. ​ ​ ​ ​ –  Jan 31 '16 at 04:12
  • @RickyDemer: Yeah that factor of 2 is precisely what I was referring to. Your point about the stream cipher could be a good answer if you'd like to write one. – user541686 Jan 31 '16 at 04:15
  • Modulo affects the output distribution whenever the generator's range is not a multiple of the target range. Fortunately, there are many ways to avoid the issue; see Efficiently Generating a Number in a Range for some examples, and the rejection-sampling sketch after this comment thread. – Yann Droneaud Aug 02 '22 at 10:48
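
As a concrete illustration of one such fix, here is a hedged rejection-sampling sketch (Python; secrets.randbelow again stands in for a raw uniform source over [0, n), and the function name is made up for this example):

    # Rejection sampling: discard raw values from the incomplete final bucket,
    # so the modulo reduction is exactly uniform over [0, m).
    import secrets

    def unbiased_below(m: int, n: int) -> int:
        limit = n - (n % m)            # largest multiple of m that is <= n
        while True:
            r = secrets.randbelow(n)   # raw uniform draw from [0, n)
            if r < limit:              # accept only the evenly divisible region
                return r % m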

1 Answer

5

The answer is that it depends on how much larger $n$ is than $m$, and it also depends on the application. If $n$ is much larger than $m$ (say, 64-128 bits longer) then you are fine, as pointed out by @RickyDemer. Otherwise, the output will be biased.

If you are using the result as a key for HMAC, then it doesn't matter. In general, however, it can matter, and can matter a lot. Thus, I strongly recommend against doing this without making $n$ at least 64-128 bits longer than $m$. If you want to see how small biases can be exploited to recover plaintext, then Analysing and Exploiting the Mantin Biases in RC4 is a good place to start.
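
A minimal sketch of the approach recommended above (Python; secrets.randbits and the exact 128-bit margin are illustrative choices): draw a value roughly $2^{128}$ times larger than $m$ before reducing, so the statistical distance from uniform is bounded by roughly $m/(2n) < 2^{-128}$.

    # Oversample-then-reduce: make n about 2^128 times larger than m so the
    # residual modulo bias is cryptographically negligible.
    import secrets

    def almost_uniform_below(m: int) -> int:
        extra_bits = 128                         # safety margin suggested above
        nbits = m.bit_length() + extra_bits      # so n = 2**nbits >= m * 2**128
        return secrets.randbits(nbits) % m       # bias bounded by ~m/(2*n) < 2**-128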

Yehuda Lindell