
Is there a more general form for the answer to this question, where a random number within any range can be generated from a source with any range while preserving a uniform distribution?

This question, for example, looks familiar; it changes a range of 1-5 to a range of 1-7.

daniel
  • In general: you can use repeated throws of an $N$ sided die to generate a "decimal" in base $N$. Thus you approximate a uniform distribution on $[0,1]$ so you can certainly generate a random integer in any range you like. Of course, with any finite number of throws there is a non-zero chance that you'll fail but in the limit it works. – lulu Apr 24 '17 at 15:56
  • So the result is any random positive integer, and the random number generator outputs a number with a range from 0 to any positive integer. – daniel Apr 24 '17 at 15:57
  • Yes... but keep in mind that it might fail if you limit the number of throws. Say you try to generate $1,2,3$ with two tosses of a fair coin. You can declare that $HH=1, HT=2, TH=3$, but if you throw $TT$ you fail. – lulu Apr 24 '17 at 15:59
  • Yep in general I was thinking if the numbers don't share common factors, then the rejection sampling comes into the answer. – daniel Apr 24 '17 at 16:01

4 Answers


The simplest way to proceed is rejection sampling. For this we first need to assume $M \geq N$. If this isn't the case (maybe you're playing D&D with a d6, so $N=20$, $M=6$), then you should roll at least $k=\lceil \log_M(N) \rceil$ times and label the $k$-tuples of rolls. For the discussion of the rejection approach, let us assume you have already done this, renaming $M^k$ as $M$ if need be.

Now rejection sampling amounts to assigning each of the numbers $1,2,\dots,N$ to $q(M,N)$ of the possible rolls, where $q$ is the quotient when $M$ is divided by $N$. If you got one of the assigned rolls, you terminate, otherwise you start over.

This rejection method takes constant space. It fails with probability $\frac{r(M,N)}{M}$ (where $r$ is the remainder when $M$ is divided by $N$), so the average number of steps taken is $\frac{M}{M-r(M,N)}$. In theory this can run forever but the probability of long runtimes decays exponentially fast.
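As a concrete sketch of this rejection procedure in Python (not part of the original answer; `random.randint` stands in for the physical die, and the function name is made up here):

```python
import random

def roll_uniform(n, m):
    """Uniform integer in 1..n from a fair m-sided die, by rejection
    sampling on k-tuples of rolls as described above."""
    k, big = 1, m                # find the least k with m**k >= n
    while big < n:
        k += 1
        big *= m
    q = big // n                 # each value 1..n claims q of the big tuples
    cutoff = q * n               # tuples >= cutoff are rejected
    while True:
        x = 0
        for _ in range(k):       # read k rolls as a base-m number in 0..big-1
            x = x * m + random.randint(0, m - 1)
        if x < cutoff:           # accepted; failure probability is r(big, n)/big
            return x % n + 1
```

For example, `roll_uniform(20, 6)` uses pairs of d6 rolls ($6^2=36$ tuples), accepts the first $20$ of them, and rejects the remaining $16$.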

A different way to proceed is to simply solve a more general problem and then apply it to this one. Note that a statistically equivalent procedure to rolling a $N$ sided die is to generate a random variable $U$ which is uniform on $(0,1)$ and then return $\lceil NU \rceil$. Such a variable can be obtained using infinitely many rolls of an $M$ sided die: $U=\sum_{n=1}^\infty (X_n-1) M^{-n}$ where $X_n$ are uniform on $\{ 1,2,\dots,M \}$.

Now as it stands, that seems like a bad thing, because you can't do infinitely many rolls. But you don't need full resolution, you only need to resolve which of the intervals $(k/N,(k+1)/N]$ that $U$ will eventually be in. Given $R$ rolls and a current value of $U$, say $U_R$, you know that the final value of $U$ will be somewhere between $U_R$ and $U_R+\sum_{n=R+1}^\infty (M-1) M^{-n}=U_R+M^{-R}$. If these numbers fall in the same interval of the form above then you are done computing.

Again the runtime of this alternative method is random and not bounded, and so is its memory footprint. Its advantage is that it does not throw away entropy, so the probability that you will finish on the next step improves as you go on. Also, if $M<N$, although you need at least $k=\lceil \log_M(N) \rceil$ rolls, you do not have to roll in batches of $k$, which could be good if for some reason $k$ were large.

An example of the latter method: $N=20,M=6$. I roll a 4, putting me in $[3/6,4/6]$, but $\lceil 3 \cdot 20/6 \rceil = 10 \neq 14 = \lceil 4 \cdot 20/6 \rceil$. Then I roll a 6, putting me in $[23/36,24/36]$. I'm still not done because $\lceil 23 \cdot 20/36 \rceil = 13 \neq 14 = \lceil 24 \cdot 20/36 \rceil$. I roll again and get a $1$, and now my roll is understood as a 13.
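The interval-refinement steps above can be sketched with exact integer arithmetic (a hypothetical implementation, not from the original answer; the current interval $[U_R, U_R + M^{-R}]$ is tracked as $[a/\text{den}, (a+1)/\text{den}]$):

```python
import random

def roll_via_interval(n, m):
    """Sample 1..n with an m-sided die by refining the base-m expansion
    of a uniform U in (0,1) until ceil(n*U) is pinned down."""
    a, den = 0, 1                              # interval is [a/den, (a+1)/den]
    while True:
        a = a * m + random.randint(0, m - 1)   # append one base-m digit
        den *= m
        lo = -(-a * n // den)                  # ceil(n * a / den)
        hi = -(-(a + 1) * n // den)            # ceil(n * (a+1) / den)
        if lo == hi:                           # both endpoints in the same cell
            return hi
```

Tracing the worked example: rolling 4 then 6 then 1 with $n=20$, $m=6$ gives the intervals $[3/6,4/6]$, $[23/36,24/36]$, $[138/216,139/216]$, and the last one satisfies $\lceil 20 \cdot 138/216 \rceil = \lceil 20 \cdot 139/216 \rceil = 13$.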

Ian
  • I don't understand how this doesn't throw away entropy, if M and N do not share common factors I thought the answer usually depends on rejection sampling (re rolling) and then there is entropy thrown away. – daniel Apr 24 '17 at 16:23
  • @daniel The first version throws away more and more entropy each time it fails. The second one does not, because it does not truly fail at all, but keeps the information that was obtained so far. – Ian Apr 24 '17 at 16:23
  • I don't think the second (interval) method has an advantage in regards to entropy compared to the first (power/rejection) method (if these two methods have better names, let me know). E.g., rolling a 2-sided die to get 1 out of 3: after 4 dice rolls the first method has a 1/16 chance of needing a further 2 rolls, while the second method has a 3/16 chance of needing a further re-roll. Also the rejection method might have the option to return part of the rejected number back to the entropy pool. – daniel Apr 26 '17 at 07:32
  • If there was a good name for method 1 and 2 I'd accept this as the answer, as the other answers are equivalent to method 1. – daniel Apr 26 '17 at 08:15
  • @daniel Well, method 1 is a kind of rejection method. Method 2 is a way of using the $M$ sided die as an entropy source to implement the probability integral transformation. – Ian Apr 26 '17 at 10:46

I wrote a Python implementation a while back that did this:

from math import ceil, log
from random import randint

def uniform_generator(m, n):  # mimics an m-sided die using an n-sided die
    """
    Expected number of rolls:
    E = r * n^r / m
    where r = int(ceil(log(m, n)))
    """
    r = int(ceil(log(m, n)))
    while True:
        candidate = sum(n**power * randint(0, n-1) for power in range(r)) + 1
        if candidate <= m:
            return candidate

Not optimized or anything, but gets the job done.

How this works (using an $n$-sided die to mimic an $m$-sided die), starting off with two facts:

  1. Generating a number from something like $\{1, 2, 3, ..., n\}$ is the same as generating a number from $\{0, 1, 2, ..., n-1\}$ and adding $1$. We'll focus on this latter version.

  2. If we had $m < n$ (using a bigger die to mimic a smaller one), we could simply continue rolling the $n$-sided die until we got something $\leq m$ and then just take that result. Let's say we wanted to know the probability of rolling a $1$ using this strategy. There is a $1/n$ chance we roll a $1$, and a $(n-m)/n$ chance we roll too high and have to start again. This implies $p = 1/n + (n-m)p/n$, and we get $p = 1/m$, the same as if we had used a $m$-sided die to begin with. The point of this paragraph is to show that you can mimic a smaller uniform distribution by using a larger one and simply trying again if the result is too large.
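The algebra in fact 2 can be checked with a quick simulation (an illustrative sketch, not part of the original answer; the function name `shrink_die` is made up here):

```python
import random

def shrink_die(n, m):
    """Mimic an m-sided die with a larger n-sided die (n >= m):
    reroll anything above m, as in fact 2."""
    while True:
        x = random.randint(1, n)
        if x <= m:
            return x
```

Tallying many calls of, say, `shrink_die(10, 3)` shows each of $1,2,3$ occurring with frequency close to $1/3$, matching the derivation $p = 1/m$.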

Moving on: Instead of generating a number from $\{0, 1, 2, ..., n-1\}$ directly, we're actually building a base-$n$ number that is capable of returning any number in this set (and possibly a few more numbers, which we would ignore).

For instance, if you're familiar with binary (base-$2$), we could write the number $5$ as $(1 \cdot 2^2) + (0 \cdot 2^1) + (1 \cdot 2^0)$. You can think of each term in parentheses as a "bit column" -- we're generating each digit of the expansion independently using our $n$-sided die (any digit in base $n$ ranges from $0$ to $n-1$).

In base $n$, the smallest number that can be generated is $0$, and the largest is $n^r-1$ given $r$ columns. Each result is equally likely since each digit of that result is being generated independently.

We just need to ensure we have enough columns such that we can generate any number from $0$ to $m-1$, i.e. where $n^r \geq m$, and this occurs at $r = \lceil \log_n(m) \rceil$. Then we use the $n$-sided die to generate a digit for each column and then add up the result from the expansion.

If the result plus $1$ (see fact $1$ from earlier) happens to be greater than $m$, we start over (see fact $2$ from earlier).


A general strategy for obtaining a uniform discrete distribution on $1$ to $N$ using an $M$-sided die is to find the least power $r$ such that $M^r\ge N$, then roll the die $r$ times. Assign radix $M$ place-values $0$ to $M-1$ to each face of the die, and compute the radix $M$ value:

$$ m_r m_{r-1} \ldots m_1 $$

using the place-values generated by the rolls. Add $1$ to the result, and if it is not more than $N$, accept that as the answer. Otherwise repeat the process until eventually a value between $1$ and $N$ is obtained.

hardmath

The questions you linked to have strategies that are easily generalized. If $M \gt N$ you can just roll the die, accept any number $\le N$, and roll again if the number is $\gt N$. This is very simple, but may lead to a lot of rolling if $M$ is much larger than $N$. If $M \gt 2N$ you can take the largest multiple of $N$ that is not greater than $M$, accept any roll up to that multiple, and take it $\bmod N$ (mapping a remainder of $0$ to $N$) to get the result, rerolling anything larger.

You can just roll a number of times, add up the sum, and take that $\bmod N$. This will not be exactly uniform, but it will be very close.
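The residual bias can be computed exactly for a small case, say two rolls of a 6-sided die reduced mod $5$ (an illustrative calculation, not from the original answer):

```python
from itertools import product
from fractions import Fraction

# Exact distribution of (sum of two d6 rolls) mod 5.
rolls = 2
counts = {k: 0 for k in range(5)}
for outcome in product(range(1, 7), repeat=rolls):
    counts[sum(outcome) % 5] += 1
probs = {k: Fraction(v, 6 ** rolls) for k, v in counts.items()}
# The residues occur 7, 7, 8, 7, 7 times out of 36: each probability
# is close to 1/5, but they are not all equal, so the distribution
# is only approximately uniform.
```

Summing more rolls shrinks the bias geometrically, but it never vanishes entirely unless $N$ divides the number of equally likely sums.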

Ross Millikan