You are given a List of numbers (length unknown).
Let's say the length is 10.
GetRandom(List) is called once. If implemented correctly, each number has a 1/10 probability of being returned.
GetRandom(List) is called 100 times. If implemented correctly, each number should appear about 10 times in the result (10 on average, not exactly 10 every time).
Fine?
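A minimal Python sketch of GetRandom(List), assuming Python's `random.choice` as the uniform random source (`get_random` is a hypothetical name). Running it shows why 100 calls give counts near 10 rather than exactly 10: each count follows a binomial distribution with mean 10.

```python
import random
from collections import Counter


def get_random(numbers):
    """Return one element of `numbers`, each with probability 1/len(numbers)."""
    return random.choice(numbers)


numbers = list(range(10))  # a List of length 10
counts = Counter(get_random(numbers) for _ in range(100))
print(sorted(counts.items()))  # counts cluster around 10 but are rarely all exactly 10
```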
You now have to do the same for a Stream of numbers.
GetRandom(Stream, 5) is called. This adds 5 to the Stream. The Stream now has length N = 1, so 5 is returned (probability = 1/N = 1).
GetRandom(Stream, 3) is called. 3 is added to the Stream. N = 2. Either 3 or 5 is returned, each with probability 1/2.
How will this be tested for correctness?
If GetRandom(Stream) is called 10 times (without adding any more numbers) when the length of the Stream is 2, each number (3 and 5) should be returned about 5 times.
GetRandom(Stream, 7) is called. 7 is added to the Stream. N = 3. One of the 3 numbers (5, 3, 7) is returned, each with probability 1/3.
But how will this be tested for correctness?
If GetRandom(Stream) is called 10 times when N = 3, each number should be returned about 3 times (10/3 ≈ 3.3 on average).
So far, so good?
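One way to make "about 5 times" and "about 3 times" testable is a frequency check over many calls. This is a sketch under assumptions, not part of the original question: `frequency_check` is a hypothetical helper that treats each number's count as Binomial(calls, 1/N) and accepts a count within `z` standard deviations of calls/N. With only 10 calls the counts are too noisy to distinguish anything, so the example uses tens of thousands. It looks only at counts, not at the order in which numbers come back.

```python
import math
import random
from collections import Counter


def frequency_check(sampler, items, calls, z=5.0):
    """Call `sampler` `calls` times; accept if every item's count is close to calls/len(items).

    Each count is Binomial(calls, p) with p = 1/len(items), whose standard
    deviation is sqrt(calls * p * (1 - p)); "close" means within z of those.
    """
    p = 1 / len(items)
    expected = calls * p
    sd = math.sqrt(calls * p * (1 - p))
    counts = Counter(sampler() for _ in range(calls))
    if set(counts) - set(items):
        return False  # the sampler returned a number that is not in the stream
    return all(abs(counts[x] - expected) <= z * sd for x in items)


# The N = 3 example: a sampler that draws uniformly with Python's random module.
stream = [5, 3, 7]
rng = random.Random(42)
print(frequency_check(lambda: rng.choice(stream), stream, 30_000))
```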
Alright, here is my algorithm:
N = 0
Pointer = 0
GetRandom(Stream, Number = NULL):
    if Number is NOT NULL:
        Stream.append(Number)  # store the new number
        N += 1
    Pointer += 1
    if Pointer > N:
        Pointer = 1  # Reset: wrap around to the first number
    return Stream[Pointer]  # Assume 1-based indexes
This simply cycles through all the numbers in order (round-robin fashion).
If GetRandom(Stream) is called 1000 times on a Stream with 100 numbers, each number is returned exactly 10 times.
Suppose GetRandom(Stream, 77) is called on a Stream that already holds 100 numbers, so 77 becomes the 101st number, and the Pointer then wraps back to position 1. If GetRandom(Stream) is then called 101 times, 77 is output on the 101st call, which matches the required frequency of 1/101. If it is called 202 times, 77 is output on the 101st and 202nd calls, which matches 2/202.
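The round-robin algorithm, translated into a small runnable Python class (`RoundRobinStream` is a hypothetical name; the 1-based Pointer is kept and converted to a 0-based list index on return). The demo reproduces the two scenarios from the text: 1000 calls over 100 numbers, then adding 77 as the 101st number. The 100 starting numbers are assumed to be 1000 to 1099 so that 77 is not already in the Stream.

```python
from collections import Counter


class RoundRobinStream:
    """Round-robin GetRandom from the pseudocode (hypothetical class name)."""

    def __init__(self):
        self.stream = []  # every number added so far
        self.pointer = 0  # 1-based position of the last number returned; 0 = none yet

    def get_random(self, number=None):
        if number is not None:
            self.stream.append(number)  # store the new number
        if not self.stream:
            raise IndexError("GetRandom called on an empty stream")
        self.pointer += 1
        if self.pointer > len(self.stream):
            self.pointer = 1  # wrap around to the first number
        return self.stream[self.pointer - 1]  # 1-based pointer -> 0-based index


rr = RoundRobinStream()
for n in range(1000, 1100):  # 100 numbers; each add returns the new number
    rr.get_random(n)

# 1000 calls over 100 numbers: every number comes back exactly 10 times.
counts = Counter(rr.get_random() for _ in range(1000))
print(set(counts.values()))  # {10}

# Add 77 as the 101st number; this call itself returns 77.
rr.get_random(77)
calls = [rr.get_random() for _ in range(202)]
print([i + 1 for i, x in enumerate(calls) if x == 77])  # [101, 202]
```

The reset uses `>` rather than `==`, so the last number in the Stream is also returned before wrapping around.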
Why bother with Reservoir Sampling and its k/(k+1) probabilities? Why bother with a random number generator at all?
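For comparison, a minimal sketch of Reservoir Sampling with a reservoir of size 1, assuming Python's `random` module as the random number generator (`ReservoirOfOne` is a hypothetical name). When the n-th number arrives it becomes the pick with probability 1/n, so the previous pick survives with probability k/(k+1), where k = n - 1 is the number of items seen before it; after N numbers each one is the pick with probability 1/N. Unlike the round-robin version, it stores only a counter and the current pick, not the Stream itself. Calling it again without new numbers would just return the same pick, so the demo checks uniformity by replaying the 5, 3, 7 stream with many independent reservoirs.

```python
import random
from collections import Counter


class ReservoirOfOne:
    """Reservoir sampling with a reservoir of size 1 (hypothetical class name)."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.n = 0        # how many numbers have been seen so far
        self.pick = None  # the current sample

    def add(self, number):
        self.n += 1
        # The n-th number replaces the current pick with probability 1/n,
        # so the old pick survives with probability (n-1)/n = k/(k+1) for k = n-1.
        if self.rng.randrange(self.n) == 0:
            self.pick = number
        return self.pick


# Replay the 5, 3, 7 stream many times, each time with a fresh reservoir.
rng = random.Random(0)
finals = Counter()
for _ in range(30_000):
    reservoir = ReservoirOfOne(rng)
    for x in (5, 3, 7):
        reservoir.add(x)
    finals[reservoir.pick] += 1
print(finals)  # each of 5, 3, 7 close to 10,000
```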