
In a previous question of mine, I asked how efficient the least-significant-digit-first (LSD) radix sort algorithm is for sorting 32-bit integers. It turns out that the bounds are:

Time: $ \Theta (\frac{32}{k}(n+2^k))$

Space: $\Theta (2n + 2^k)$
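For concreteness, here is a minimal sketch (in Python, with illustrative names of my own choosing) of the LSD variant those bounds describe: the outer loop makes $32/k$ passes, and each pass touches the $n$ inputs plus a $2^k$-entry bucket array, with one extra output array giving the $2n$ space term.

```python
# Sketch of LSD radix sort on 32-bit unsigned integers,
# processing k bits per pass (k is a tuning parameter).
def lsd_radix_sort(a, k=8):
    mask = (1 << k) - 1
    # ceil(32 / k) passes over the input -- the 32/k factor in the time bound
    for shift in range(0, 32, k):
        count = [0] * (1 << k)             # the 2^k bucket array
        for x in a:                        # Theta(n): count digit occurrences
            count[(x >> shift) & mask] += 1
        pos = 0
        for d in range(1 << k):            # Theta(2^k): exclusive prefix sums
            count[d], pos = pos, pos + count[d]
        out = [0] * len(a)                 # the extra temporary array (the 2n term)
        for x in a:                        # Theta(n): stable scatter
            d = (x >> shift) & mask
            out[count[d]] = x
            count[d] += 1
        a = out
    return a
```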

I have read various articles online about the efficiency of sorting the numbers by looking at the first $k$ most significant digits, applying counting sort, and recursively looking at the next $k$ digits of every new group that was just created. Most of the articles suggest that the MSD approach is actually as efficient as the LSD approach, and sometimes even more efficient because it is more cache-friendly. I tried to do the time and space analysis to see whether that is the case, at least theoretically.

MSD is recursive, so in every level we have to work with $n$ elements in total, because the union of all the groups of that level gives back the input set. However, we also have to take the buckets into account, and in this case the total number of bucket arrays that we will need is going to be logarithmic. We know that a bucket array has $2^k$ entries, and now we need to find the total number of levels. In the first level we have a problem of size $n$, in the second level problems of size $\frac{n}{2^k}$, and in the $i$-th level problems of size $\frac{n}{2^{ik}}$. When does the problem size become $1$? When $\frac{n}{2^{ik}} = 1$, so when $i = \frac{\log_2 n}{k}$.
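A minimal sketch of the recursive MSD scheme just described (Python; names are my own, and for brevity the buckets are materialized as lists rather than done via in-place counting sort):

```python
# Sketch of MSD radix sort on 32-bit unsigned integers: bucket on the
# top k bits, then recurse into each bucket on the next k bits.
def msd_radix_sort(a, k=8, shift=None):
    if shift is None:
        shift = 32 - k                     # start at the most significant k bits
    if len(a) <= 1 or shift < 0:
        return a
    mask = (1 << k) - 1
    buckets = [[] for _ in range(1 << k)]  # a fresh 2^k bucket array at this node
    for x in a:                            # Theta(n) distribution at this level
        buckets[(x >> shift) & mask].append(x)
    out = []
    for b in buckets:                      # scans all 2^k slots, even empty ones
        out.extend(msd_radix_sort(b, k, shift - k))
    return out
```

Note the last loop: every recursive call walks its entire $2^k$-slot bucket array, which is exactly the cost the rest of this question tries to account for.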

We have $\frac{\log_2 n}{k}$ levels, and in every level we need a new bucket array plus the input, so the space is $\Theta(2n + \frac{2^k \log_2 n}{k})$. We have $2n$ because during the counting phase we need that extra temporary array.

Now what about the time? The problem with the time is that for every subproblem generated, we will have to scan through the entire bucket array to find the next subproblems!

In every level, however, we will only have to spend $\Theta(n)$ time reading from and writing to the input. But I am really stuck at this point.

In the first level we have one problem, in the second level up to $2^k$ problems, and so on. So in total we have

$\sum_{i=1}^{\frac{\log_2 n}{k}} 2^{ik}$ subproblems? So the time is:

$\Theta(2n + 2^k\sum_{i=1}^{\frac{\log_{2}n}{k}}2^{ik})$
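The sum in that bound is a finite geometric series, which can be checked numerically. Assuming for concreteness that $n$ is an exact power of $2^k$, the closed form is $\frac{2^k(n-1)}{2^k-1}$, which is at most $2n$ for $k \geq 1$:

```python
# Check that sum_{i=1}^{log2(n)/k} 2^(ik) equals 2^k (n-1) / (2^k - 1).
# Assumes n is a power of two and k divides log2(n).
def subproblem_sum(n, k):
    levels = n.bit_length() - 1            # log2(n) for a power of two
    assert levels % k == 0
    return sum(2 ** (i * k) for i in range(1, levels // k + 1))

n, k = 1 << 24, 8
assert subproblem_sum(n, k) == (2 ** k) * (n - 1) // (2 ** k - 1)
```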

This looks very bad compared to LSD radix sort... Am I doing something wrong here?

jsguy
  • Obligatory $\Theta$-police comment: since $32$ is a constant and $k \leq 32$, all you have there is $\Theta(n)$ for both algorithms. So if you want to observe actual differences, you'll have to get rid of that $\Theta$ and look more closely. – Raphael Mar 11 '15 at 18:38
  • you are right that k is constant so it can be removed, but I would really want to see an analysis in terms of both n and k. – jsguy Mar 11 '15 at 18:42
  • I understand that. In that case, you need to do away with $\Theta$, as I said. You say, "oh noes, there's an extra $2^k$ term!" but the other $\Theta$ may hide an extra factor $2^{64}$. Mathematically speaking, of course. You need more rigorous analysis here. Maybe our reference question can be of help. – Raphael Mar 11 '15 at 18:46
  • My analysis was actually wrong in the way that I did it. The number of levels is going to be $32/k$, so there might be an extra constant term in that case, but I tried and cannot figure out how to find the exact form as the one in LSD. The total number of times that we will have to read the bucket array is going to be $\sum_{i=1}^{32/k} 2^{ik}$, which according to Wolfram Alpha is $\frac{2^{32} 2^k}{2^k-1}$, so the time complexity becomes $\Theta(\frac{32}{k} 2n + \frac{2^{32} 2^k}{2^k-1})$. – jsguy Mar 11 '15 at 19:09
  • After thinking a little bit more, I realized that I am actually looking at the very worst-case scenario... I am really stuck at the moment. Why does MSD perform as well as LSD in theory? – jsguy Mar 11 '15 at 19:48
  • @Raphael You can still use asymptotic notation, but you have to replace $32$ with an arbitrary $m$, and state that your asymptotic notations hide constants depending only on $n$. – Yuval Filmus Mar 11 '15 at 19:52
  • @YuvalFilmus You can, but then you have to a) deal with multiple limit processes and b) be careful not to draw wrong conclusions since your input grows differently in the parameters. But yes, it can be done. – Raphael Mar 11 '15 at 20:16
  • @Raphael On the contrary, there are no multiple limit processes. All limits are with respect to $n$. – Yuval Filmus Mar 11 '15 at 20:17
  • @YuvalFilmus Then $m$ is still a constant and my above comment applies. (Of course, one can be careful not hide these $m$-constants in the Landau class, but the result does not say much, strictly speaking.) – Raphael Mar 11 '15 at 20:18
  • @Raphael No, $m$ is not a constant, it's a parameter. For example, $2m = O(m)$ but $2m \neq O(1)$. – Yuval Filmus Mar 11 '15 at 20:20
  • Yuval, with $m$ do you think the analysis would be easier? I have a feeling that when I say "in the second level we will have $2^k$ subproblems", I am looking at the worst-case scenario, because the subproblems can actually be far fewer than that depending on the distribution of the input. But the worst case for LSD seems to be much better than that for MSD... – jsguy Mar 11 '15 at 20:52
  • @YuvalFilmus The only way in which that makes sense if you only allow strict inequalities for bounding $m$, i.e. such that hold for all $n$. Is that what you are saying? Because "$_ = O(m)$" says, by definition, something about $m \to \infty$. – Raphael Mar 11 '15 at 22:06
  • @Raphael That's a very narrow view of big O notation. I say that $f = O(g)$ if there exists $N,C$ such that for $n \geq N$ we have $f(n) \leq Cg(n)$. The functions $f,g$ are, however, allowed to have other variables, which are universally quantified. Under this definition, indeed $2m = O(m)$ while $2m$ is not $O(1)$, since you cannot find $C$ such that $2m \leq C$ for all $m$. – Yuval Filmus Mar 11 '15 at 22:09
  • @YuvalFilmus Fair (as I say above), but if you do that, you should clearly indicate the limit process (which replaces $n \geq N$) you use. $O_{m \to \infty}(m)$ is not the same as $O_{n \to \infty}(m)$. Also, it's not clear to me how your notation distinguishes between $\leq m$ and $= m$ (it probably doesn't, which might make the $O$ weaker than it may need to be). I think the OP (and many others) would be better served by investing some more effort and (if possible) express their result as something of the form $f(n,\dots) + o_{n \to \infty}(f(n,\dots))$. – Raphael Mar 12 '15 at 06:45

0 Answers