
I'm sorry if the answer to this is actually simpler than it seems to me.

I'm using AES-GCM to encrypt some data keys, but I don't know how to calculate the probability of collisions for my setup, or how to derive the maximum number of plaintexts I can encrypt without violating NIST's requirements on key/IV reuse. I understand the math behind the birthday problem itself (at least the part where, if no match has occurred yet, the nth student has an (n-1)/365 chance of sharing a birthday with one of the earlier students), but I don't know how to apply it in practice to the case below:
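The successive-student argument can be written out directly as a product: each new draw must avoid every earlier value. A minimal sketch, where the function name is illustrative rather than from any library and the draws are assumed uniform and independent:

```python
def exact_collision_probability(n: int, space: int) -> float:
    """Probability that at least two of n uniform, independent draws from
    `space` possible values coincide (the birthday problem, computed exactly)."""
    if n > space:
        return 1.0  # pigeonhole principle: a repeat is guaranteed
    p_all_distinct = 1.0
    for i in range(n):
        # Draw i (0-based) must miss the i distinct values already seen.
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct

# Classic check: 23 people and 365 days give just over a 50% chance of a match.
print(round(exact_collision_probability(23, 365), 4))  # 0.5073
```

Iterating this product over $2^{32}$ or more draws is impractical, so for cryptographic sizes one normally switches to a closed-form approximation of the same quantity.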

How would I find out the maximum number of plaintexts I can encrypt while remaining below a given collision probability for a scheme such as the following?

  • For each plaintext, generate a 16-byte random value, a 32-byte random salt, and a 12-byte random IV.
  • Hash the 16-byte random value together with the 32-byte salt using something like HKDF (SHA-256).
  • Encrypt the plaintext with AES-GCM, using the hash output as the key and the 12-byte random value as the IV.
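The per-plaintext steps in the list might look roughly like this in Python. This is a sketch under assumptions: HKDF is implemented by hand from RFC 5869 using only the standard library, the HKDF `info` parameter is left empty because the scheme does not specify one, and `hkdf_sha256` and `new_message_material` are hypothetical names.

```python
import hashlib
import hmac
import secrets

HASH_LEN = 32  # SHA-256 output size in bytes

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes = b"", length: int = 32) -> bytes:
    """HKDF with SHA-256 as described in RFC 5869 (extract, then expand)."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF-SHA256 can output at most 8160 bytes")
    # Extract: PRK = HMAC-SHA256(key=salt, msg=ikm)
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC-SHA256(PRK, T(i-1) || info || i), concatenated, then truncated
    okm, block = b"", b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def new_message_material() -> tuple[bytes, bytes, bytes, bytes]:
    """Fresh randomness and derived key for one plaintext (hypothetical helper)."""
    value = secrets.token_bytes(16)  # 16-byte random value (HKDF input keying material)
    salt = secrets.token_bytes(32)   # 32-byte random salt
    iv = secrets.token_bytes(12)     # 12-byte random IV for AES-GCM
    key = hkdf_sha256(value, salt)   # 32-byte output used as an AES-256 key; info empty (assumption)
    return value, salt, iv, key
```

The AES-GCM encryption step itself is not in Python's standard library; for example, the third-party `cryptography` package's `AESGCM(key).encrypt(iv, plaintext, associated_data)` would take the derived key and IV. For the collision question, what this sketch makes visible is that each plaintext ends up with a freshly derived 32-byte key and an independently random 12-byte IV.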

(To clarify, I'm hoping to learn a general method that I can apply to any similar or modified scheme.)
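One commonly used general-purpose form of the birthday bound, assuming the values that must not repeat are drawn uniformly and independently from a space of $N = 2^b$ possibilities, is $p \approx 1 - e^{-n^2/(2N)}$, which can be solved for $n$. A minimal sketch that works in $\log_2$ so that spaces like $2^{256}$ do not overflow (function names are illustrative):

```python
import math

def log2_max_draws(space_bits: float, max_probability: float) -> float:
    """log2 of the largest n keeping the collision probability <= max_probability,
    using p ~= 1 - exp(-n^2 / (2N)) with N = 2**space_bits, solved for n:
    n = sqrt(2 * N * -ln(1 - p))."""
    neg_ln_one_minus_p = -math.log1p(-max_probability)  # accurate even for tiny p
    return 0.5 * (1 + space_bits + math.log2(neg_ln_one_minus_p))

def collision_probability(log2_n: float, space_bits: float) -> float:
    """Approximate probability of at least one repeat among 2**log2_n uniform draws."""
    log2_exponent = 2 * log2_n - space_bits - 1  # log2 of n^2 / (2N)
    if log2_exponent > 10:
        return 1.0  # exp(-2**10) is far below float precision
    return -math.expm1(-(2 ** log2_exponent))  # 1 - exp(-x), accurate for tiny x

# Example: 2**32 uniformly random 96-bit IVs.
print(f"{collision_probability(32, 96):.3e}")  # 1.164e-10, i.e. about 2**-33
```

Because both functions only take the size of the space in bits, they can be reused for any variant of the scheme once the size of the relevant space has been decided; the uniform-and-independent assumption is what has to be checked for each variant.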


I know there are several methods (and even online calculators) for questions like "approximate maximum number of keys given a 12-byte IV and a maximum collision probability of $2^{-32}$" (it's $2^{32}$), and similarly for a 32-byte value (about $2^{112}$). But I get the feeling that, even though both the hash and the IV must match for a collision, simply multiplying numbers won't give the right answer. My gut instinct says to multiply $2^{256}$ by $2^{96}$ to get $2^{352}$ (because both the IV and the hash output must match another IV+hash pair for a collision), and then approximate the chance of a collision using Stirling's or a Taylor approximation with $2^{352}$ as the "number of days in a year", but that can't be right...right?
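The calculator figures quoted above can be reproduced from the small-probability form of the birthday approximation, $p \approx n^2/(2N)$, with $p = 2^{-32}$; the results round down to the quoted $2^{32}$ and $2^{112}$. The last line applies the same arithmetic to the combined $2^{352}$ space from the gut-instinct approach purely for comparison; it does not settle whether that combined space is the right model.

$$
\begin{aligned}
n &\approx \sqrt{2Np} && \text{(from } p \approx n^2/(2N)\text{, valid for small } p\text{)} \\
N = 2^{96}:\quad n &\approx \sqrt{2 \cdot 2^{96} \cdot 2^{-32}} = 2^{32.5} \\
N = 2^{256}:\quad n &\approx \sqrt{2 \cdot 2^{256} \cdot 2^{-32}} = 2^{112.5} \\
N = 2^{352}:\quad n &\approx \sqrt{2 \cdot 2^{352} \cdot 2^{-32}} = 2^{160.5}
\end{aligned}
$$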

Side note: I mention the hashing/salting process because I suspect hashing might increase the chance of collisions due to the pigeonhole principle: I'm feeding in 16 + 32 = 48 bytes of input and getting 32 bytes of output. Could this affect the end result?
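One way to probe this side note empirically is a toy experiment: hash random 48-byte inputs (the 16 + 32 bytes from the scheme) with SHA-256, truncate the output to 2 bytes so collisions become observable, and compare the average number of draws until the first repeat with the birthday prediction for a uniform $2^{16}$-value space, roughly $\sqrt{\pi N / 2}$. This is a sketch that models truncated SHA-256 as behaving like a random function (an assumption), and the 2-byte truncation is chosen only for speed; it is not a statement about full 32-byte outputs.

```python
import hashlib
import math
import random

def draws_until_first_collision(out_bytes: int, rng: random.Random) -> int:
    """Hash fresh 48-byte inputs with SHA-256, keep the first `out_bytes` bytes,
    and return how many draws it takes until a truncated output repeats."""
    seen = set()
    while True:
        # randbytes is NOT cryptographically secure; it is used here only for a reproducible simulation
        digest = hashlib.sha256(rng.randbytes(48)).digest()[:out_bytes]
        if digest in seen:
            return len(seen) + 1  # count the draw that produced the repeat
        seen.add(digest)

rng = random.Random(2024)  # fixed seed so the experiment is repeatable
trials = [draws_until_first_collision(2, rng) for _ in range(300)]
observed = sum(trials) / len(trials)
predicted = math.sqrt(math.pi * 2**16 / 2)  # approximate expected draws for a uniform 16-bit space
print(f"observed {observed:.0f}, predicted {predicted:.0f}")
```

If the observed average lands near the prediction, that is consistent with the collision rate being governed by the size of the output space rather than by the input length, under the random-function assumption.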
