2

I'm trying to create (or find) an algorithm to iterate over a range (e.g.: 1-100, etc) but randomly, without any duplicate values (similar result to random sort, etc) but without actually storing the full range in memory (such as using list(range(1, 100)) in Python, etc).

Problem is, I'm not sure how to tackle this. As mentioned, I know about random sort as mentioned, but I would need to store the full range in memory (or large part of the range). Besides that, there are many other way, but they all depend on the full range being stored in memory too.

Now how random it should be? I don't really mind the quality of the randomness. For comparison’s sake, I want it to be as random as using randrange from the random module in Python. As long as it "look" random and does not have any duplicate.

Is there any way to do this?

  • 1
    Can you tolerate a small amount of omissions or repetitions? Can you tolerate poor-quality randomness? – Andrej Bauer Oct 11 '23 at 19:12
  • 1
    Would it be acceptable to store 1 bit per number? – Andrej Bauer Oct 11 '23 at 19:14
  • I don't really mind if it's truly random or not. I probably should edit my post to reflect what I mean by "random", but basically, what I meant was getting result similar to randrange (in Python), but without any duplicate or storing the full range in memory. So I don't really mind the quality of the randomness. I would prefer if it does not have any omissions or repetitions, but if you think it's an interesting way to tackle this, I don't mind you posting the answer :) @AndrejBauer – Nordine Lotfi Oct 11 '23 at 19:14
  • I don't mind if you store 1 bit per number either. This does sound like an interesting approach @AndrejBauer (sorry for double ping, saw your comment too late) – Nordine Lotfi Oct 11 '23 at 19:15
  • 1
    Look up Reservoir sampling it’s a method for selecting a sample of items from a larger list without loading the entire list. I’ll write some python code if you can’t solve it. – DrJay Oct 11 '23 at 22:46

1 Answers1

4

One approach is to use the sequence of values $E(1), E(2), E(3), E(4), \dots$ where $E$ is a format-preserving encryption cipher with a random key. This ensures the sequence won't repeat and that each element in the sequence will be in the desired range.

See also Pseudo random, unqiue integer numbers in a given range, Shuffling a collection too large for memory.

D.W.
  • 159,275
  • 20
  • 227
  • 470
  • Thanks, this is good. But will it also work if say, the amount of numbers I want to generate is equal to the max range? say in python, range(1, 101) but what I want to generate are also 100 random number, but without duplicate (and without storing the full range)? I recall I tried one such implementation for Feistel cipher before, and it worked as long as I limited the amount of number I wanted to generate (it worked fine for preventing duplicate, etc) but if I tried to do it using the same amount of number as the max range, then it had the "normal" range as output (e.g.: 1, 2, 3...100). – Nordine Lotfi Oct 12 '23 at 07:08
  • @NordineLotfi, yes, it will. I don't understand what problem you were facing before. – D.W. Oct 12 '23 at 08:06
  • I mean, this is extremely intricate to implement, especially based on the two paper I found for it. And existing implementation seems to not support any length either (e.g.: https://github.com/mysto/python-fpe for example does not support anything larger than 48 in length, etc). – Nordine Lotfi Oct 16 '23 at 05:13