Why can't hash tables provide O(n) sorting?

Question

Since a sufficiently large hash table takes constant time to both insert and retrieve data, should it not be possible to sort an array by simply inserting each element into the hash table, and then retrieving them in order?

You just insert each number into the hash table, and remember the lowest and highest number inserted. Then for each number in that range, in order, test if it is present in the hash table.

If the array being sorted contains no gaps between values (i.e. it can be [1,3,2] but NOT [1,3,4]), this should give you O(N) time complexity.
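For concreteness, the procedure described above could be sketched as follows. This is only an illustration: the function name `hash_range_sort` is made up here, Python's built-in `set` stands in for the hash table, and the input is assumed to contain distinct integers (a set would silently drop duplicates).

```python
def hash_range_sort(values):
    """Sort distinct integers by probing every value between min and max."""
    if not values:
        return []
    seen = set(values)                  # expected O(1) insert per element
    lo, hi = min(values), max(values)   # remember lowest and highest value
    # One membership probe per integer in [lo, hi]: hi - lo + 1 probes in total.
    return [x for x in range(lo, hi + 1) if x in seen]


print(hash_range_sort([1, 3, 2]))  # [1, 2, 3]
print(hash_range_sort([1, 3, 4]))  # [1, 3, 4]
```

Note that the loop runs over the range of values, not over the number of elements, so the second call makes 4 probes for 3 elements.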

Is this correct? I don't think I've ever heard of hash tables being used this way - am I missing something? Or are the restrictions (numeric array with no gaps) too much for it to be practically useful?

@Raphael 1) They are not ordered, but if you require that the only data stored in it is a series of numbers, and you know the min and max number, then you know all of the entries present (although you also "know" some that might not be). Right, they have O(N)... wait, I see what you're saying - O(N) N times is N^2... hm. Maybe I did misunderstand that? — Benubird, Jun 18 '15 at 13:22
@Raphael Ok, apparently hash tables CAN have O(1) worst case: http://cs.stackexchange.com/a/43192/34565 — Benubird, Jun 18 '15 at 13:37

score 7 · Answer 1 · answered Jun 22 '15 at 15:14

The reason you've never heard of hash tables being used like this is that hash tables are either "too much" or "not enough" in this situation.

If the range of elements being sorted is small, then you can use counting sort, or something similar. But for counting sort, you would almost certainly want to use a simple array rather than a hash table. If you know the max and min values of the numbers, then the array can be of size max-min+1, and value x would be associated with index x-min. By using an array, you avoid the extra complications of hash tables. Those extra complications are buying you nothing in this application, so hash tables are "too much".
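A minimal sketch of the array-based counting sort described above, assuming integer inputs; the function name `counting_sort` is chosen here for illustration. Unlike a hash set, the count array also handles duplicates.

```python
def counting_sort(values):
    """Counting sort over the range [min, max] using a plain list."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    counts = [0] * (hi - lo + 1)      # array of size max-min+1
    for x in values:
        counts[x - lo] += 1           # value x is stored at index x-min
    result = []
    for offset, count in enumerate(counts):
        result.extend([lo + offset] * count)
    return result


print(counting_sort([3, 1, 2, 3]))  # [1, 2, 3, 3]
```

The running time is proportional to the number of elements plus the size of the range, which is why this is only attractive when the range is small.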

Notice that your "no gaps" restriction ensures that the range of numbers is small (no larger than the number of elements in your original input).
However, if gaps can be present then the range of elements is potentially MUCH larger than the number of elements in your original input. Then your running time is dominated by the size of that range, NOT by the size of your original input. Hash tables do not help you deal with those gaps efficiently, so in this case hash tables are "not enough".

score 5 · Accepted Answer · answered Jun 18 '15 at 10:09

The algorithm you give is exponential time, not linear. If you're given $n$ $b$-bit entries, the size of your input is $nb$ bits but the algorithm takes time $\Theta(2^b)$, which is exponential in the input length. In particular, your algorithm takes $2^k$ steps to sort the roughly $2k$-bit input $\{0, 2^k\}$.
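To make the growth visible, here is a small sketch that counts how many membership tests the min-to-max scan would perform on the two-element input $\{0, 2^k\}$. The helper name `probes_needed` is hypothetical, and the count is one probe per integer in the closed range, so it comes out as $2^k + 1$.

```python
def probes_needed(values):
    """Number of membership tests made when scanning from min to max."""
    return max(values) - min(values) + 1


for k in (4, 8, 16, 32):
    # The input always has just two elements; only their bit length grows.
    print(k, probes_needed([0, 2**k]))
```

Each added bit in the largest value doubles the work, even though the number of elements stays at two.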

answered by David Richerby

Given b bits, the maximum count of numbers that can be represented by it is 2^b - which means if the time complexity is O(2^b), it is at most O(N). Or did you mean it would be O(2^nb)? I don't follow why that would be the case. – Benubird Jun 18 '15 at 13:40
@Benubird Your algorithm is somewhat effective if you restrict yourself to permutations of $\{1, \dots, n\}$ as inputs. If you allow arbitrary numbers, it fails: you need to traverse an array of size approximately $2^b$ as David explains. – Raphael Jun 18 '15 at 14:30
@Benubird That's backwards. $N$ is (if you require all elements to be distinct) at most $2^b$ but $2^b$ is not at most $N$. – Tom van der Zanden Jun 18 '15 at 14:32

score 0 · Answer 3 · answered Jan 17 '22 at 13:49

Kudos. Taken on its own, the title of the question is an ambitious idea, and there is a related research paper that describes a linear-time sort under the constraints of no duplicates and a known input range (gaps are allowed): Hash sort: A linear time complexity multiple-dimensional sort algorithm

However, the steps described in the paper are not as simple as those in the question.

Coming to your last sub-question: "Or are the restrictions (numeric array with no gaps) too much for it to be practically useful?"

Given that we have the range (lowest and highest) and there are no gaps between the numbers, why go to the trouble of sorting at all? The same data can be represented by the interval [low, high], and if the numbers are needed in order, they can be generated by incrementing from low to high, which takes no more than linear time.

The sole purpose of sorting is to obtain an ordered collection. If the resulting ordered collection is already known from the constraints (range and no gaps), sorting is of practically no use.
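As a small illustration of this point, assuming the input consists of distinct integers with no gaps, the sorted output can be produced from the two bounds alone. The function name `sorted_from_bounds` is made up for this sketch.

```python
def sorted_from_bounds(low, high):
    """Generate low, low+1, ..., high without inspecting the other elements."""
    return list(range(low, high + 1))


data = [4, 2, 5, 3]  # distinct values, no gaps between min and max
print(sorted_from_bounds(min(data), max(data)))  # [2, 3, 4, 5]
```

Finding the minimum and maximum is a single linear pass, so no hash table is needed under these constraints.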

The paper you link looks like baloney to me. You can't trust random papers that haven't been peer-reviewed; some are awesome, and some are nonsense. I find the paper hard to follow, but the paper appears to make the additional assumption -- without clearly stating it -- that the data falls into a small range. In that case, counting sort or radix sort is already fine, you don't need a hash function. On a superficial glance, I wonder if that paper is reinventing radix sort. In any case, the claims in the first paragraph of your answer are wrong. — D.W., Jan 17 '22 at 15:44
