7

A shuffling algorithm is supposed to generate a random permutation of a given finite set. So, for a set of size $n$, a shuffling algorithm should return any of the $n!$ permutations of the set uniformly at random.

Also, conceptually, a randomized algorithm can be viewed as a deterministic function of its input and a random seed. Let $S$ be any shuffling algorithm. On an input $X$ of size $n$, its output is a function of the random bits it has read. To be able to produce all $n!$ different outputs, $S$ must read at least $\log(n!) = \Omega(n \log n)$ bits of randomness. Hence, any shuffling algorithm must take $\Omega(n \log n)$ time.

On the other hand, the Fisher-Yates shuffle is widely believed to run in $O(n)$ time. Is there something wrong with my argument? If not, why is this belief so widespread?
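
For reference, this is the algorithm I mean (a minimal sketch in Python; `random.randint` stands in for whatever random source the algorithm is given):

```python
import random

def fisher_yates(a):
    """Shuffle the list a in place, drawing one uniform random index per iteration."""
    n = len(a)
    for i in range(n - 1, 0, -1):
        j = random.randint(0, i)  # uniform draw from {0, ..., i}
        a[i], a[j] = a[j], a[i]
    return a
```

The loop body runs $n - 1$ times, with a single random draw per iteration.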


2 Answers

12

Your argument is perfectly valid (Fisher-Yates does indeed require $\log (n!)$ bits of randomness); the discrepancy comes from making different assumptions about the cost of random number generation.

You're assuming that generating a random number between $0$ and $n$ takes $O(\log n)$ time.

But when saying that the Fisher-Yates shuffle runs in $O(n)$ time, one assumes that generating a random number takes $O(1)$ time.

An integer index (which is what the Fisher-Yates shuffle uses) is generally considered to take $O(1)$ space (even though it's technically $O(\log n)$), because a simple 64-bit integer can index far more data than can currently fit into the memory of any computer most of us have access to $^{1}$. And the space complexity per number is the same as the number of random bits that must be generated for that number.

1: To go beyond what a 64-bit integer can index, we would need more than $2^{64} = 18446744073709551616$ elements; even at 1 bit per element (which is obviously very little), that comes to $2^{61}$ bytes, i.e. about 2 exabytes.
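
To make the difference between the two cost models concrete, here is a rough sketch (mine, for illustration; the exact bit count depends on how each index is actually generated):

```python
import math

def shuffle_costs(n):
    # Model A: drawing a random index from {0, ..., i} costs O(1)  ->  n - 1 draws in total.
    draws = n - 1
    # Model B: drawing that index costs ceil(log2(i + 1)) random bits  ->  Theta(n log n) bits in total.
    bits = sum(math.ceil(math.log2(i + 1)) for i in range(1, n))
    return draws, bits

print(shuffle_costs(1000))  # 999 draws vs. roughly 9000 bits (on the order of log2(1000!) ≈ 8530)
```

Under Model A the shuffle is $O(n)$; under Model B (counting random bits, as in the question) it is $\Theta(n \log n)$.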

Bernhard Barker
  • I am not making any assumptions about the complexity of random number generation. Just to read $\Omega(n \log n)$ random bits would still require $\Omega(n \log n)$ time. – Alex Smart Aug 28 '13 at 17:59
  • 2
    The same applies whether we're talking about generating them or reading them. – Bernhard Barker Aug 28 '13 at 18:06
  • @AlexSmart To be more specific: the number of random bits is certainly a lower bound on the (total) time a random bit generator takes to generate them. Of course, reading them yields a lower bound on the whole algorithm -- depending on your cost model. – Raphael Aug 29 '13 at 08:49
  • @AlexSmart: I can read $\Omega(n \log n)$ random bits in $\Omega(n)$ time if I have $\Omega(\log n)$ processors working in parallel. If you're going to count each individual operation on a $\Theta(b)$-bit register as $\Theta(b)$ operations, then you also have to acknowledge that your processor is built out of circuits that do $\Theta(b)$ operations in parallel. –  Mar 03 '18 at 03:56
6

In the analysis of the Fisher-Yates shuffle and similar algorithms, we typically assume that we can draw a number from $[1..n]$ uniformly at random in time $O(1)$. We also usually assume a RAM model with uniform cost, that is, we can read such numbers in time $O(1)$ as well.

If you include random number generation into your runtime analysis (which you probably should do), make sure to be clear about whether you are talking about worst-case or expected time.

Depending on what kind of random source you have at hand, assuming constant-time random number generation may or may not be warranted. For instance, a (black-box, constant-time) source of $\mathcal{U}(0,1)$ (uniform) real numbers can be used in this way, but a $\mathcal{B}(0.5)$ (Bernoulli) bit source cannot. See here for details.
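
For example, here is a sketch (my own illustration, not taken from the linked answer) of the standard rejection-sampling way to draw uniformly from $[1..n]$ using only fair coin flips: it needs $O(\log n)$ flips per draw in expectation, but its worst-case running time is unbounded.

```python
import random

def uniform_from_coin_flips(n):
    """Draw uniformly from {1, ..., n} using only fair coin flips (rejection sampling)."""
    k = (n - 1).bit_length()  # number of flips needed to cover 0 .. n-1
    while True:
        x = 0
        for _ in range(k):
            x = (x << 1) | random.getrandbits(1)  # one fair coin flip
        if x < n:  # accept; otherwise reject and flip again (expected number of rounds < 2)
            return x + 1
```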

Raphael