To resolve collisions, hash tables use strategies such as chaining or open addressing. Why can't we instead allocate another hash table to each slot of the primary hash table?
-
Your question is quite thin. Please give some detail of the data structure you are proposing here. – Raphael Oct 28 '12 at 10:58
-
duplicate of "Using hash tables instead of lists for buckets in hash tables"? – David Cary Aug 15 '16 at 04:02
2 Answers
The method you propose is, as far as I know, the historically first one for "perfect" hashing in linear space. In perfect hashing, lookup takes $O(1)$ time in the worst-case. (Recall that in most simple hash tables, lookup takes $O(1)$ time only in expectation.)
The idea is to use chaining (rather than open addressing), but make each chain a hash table of size $\Omega(m^2)$ where $m$ is the number of items in the bucket.
This is sometimes called "FKS", after the initials of the inventors. Here are some freely available resources:
- "Universal and Perfect Hashing" by Avrim Blum
- "Storing a Sparse Table with $O(1)$ Worst Case Access Time" by Fredman et al.
- "Dynamic perfect hashing" (Wikipedia) supports dynamic insertion and deletion with expected amortized $O(1)$ time, in addition to (like other perfect hashing algorithms) $O(1)$ worst-case lookup time.
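As an illustration, here is a minimal Python sketch of the two-level construction described above (not the original authors' code; the class name `FKSTable`, the prime modulus, and the multiply-add hash family are assumptions chosen for simplicity, for static non-negative integer keys):

```python
import random

class FKSTable:
    """Static two-level perfect hash table in the FKS style (a sketch)."""

    def __init__(self, keys):
        # Prime larger than every key we expect (an assumption of this sketch).
        self.p = 2_000_003
        self.n = max(1, len(keys))
        # Top-level hash: h(k) = ((a*k + b) mod p) mod n.
        self.a = random.randrange(1, self.p)
        self.b = random.randrange(self.p)
        buckets = [[] for _ in range(self.n)]
        for k in keys:
            buckets[self._h(k)].append(k)
        # Each bucket with m items gets a secondary table of size m^2,
        # rehashed until it is collision-free.
        self.sub = [self._build_secondary(bucket) for bucket in buckets]

    def _h(self, k):
        return ((self.a * k + self.b) % self.p) % self.n

    def _build_secondary(self, bucket):
        size = max(1, len(bucket) ** 2)
        while True:  # size >= m^2 makes a collision-free draw likely
            a = random.randrange(1, self.p)
            b = random.randrange(self.p)
            table = [None] * size
            for k in bucket:
                i = ((a * k + b) % self.p) % size
                if table[i] is not None:
                    break  # collision: draw a new secondary hash
                table[i] = k
            else:
                return (a, b, size, table)

    def __contains__(self, k):
        # Worst-case O(1): one top-level hash, one secondary hash, one compare.
        a, b, size, table = self.sub[self._h(k)]
        return table[((a * k + b) % self.p) % size] == k
```

The key point is that quadratic-size secondary tables make each rebuild succeed with constant probability, while the top level keeps the total space linear in expectation.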

The short answer is that this is, more or less, equivalent to having one larger hash table. Say you're hashing $n$ items into $m$ slots, and each slot has its own hash table of size $c$. You propose first hashing into one of the $m$ slots using some function $h$, then hashing again into one of the $c$ sub-slots using some function $g$. This is equivalent to hashing into any of the $cm$ slots of a single table, with a new hash function that combines your two hash functions. In other words, instead of taking $h(x)$ to pick one of the $m$ slots and then $g(x)$ to find the appropriate slot in the subtable, you can take $f(x) = h(x) \cdot c + g(x)$ and get the same position with the single hash function $f$.
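To illustrate this equivalence, here is a small Python sketch (the sizes `m`, `c` and the particular functions `h` and `g` are illustrative assumptions): assuming $g$ maps into $\{0,\dots,c-1\}$, the flat index $h(x)\cdot c + g(x)$ decomposes uniquely back into the two-level position.

```python
m, c = 7, 5  # outer table size and per-slot inner table size (assumptions)

def h(x):
    """Outer hash: picks one of the m primary slots."""
    return hash(x) % m

def g(x):
    """Inner hash: picks one of the c sub-slots."""
    return (hash(x) // m) % c

def f(x):
    """Combined hash: picks one of the c*m slots of a single flat table."""
    return h(x) * c + g(x)

for x in range(100):
    slot = f(x)
    assert 0 <= slot < c * m
    # The flat index recovers exactly the two-level position:
    assert slot // c == h(x) and slot % c == g(x)
```

Because the decomposition is exact, any behavior of the nested scheme (including its collisions) is reproduced by the single flat table with hash $f$.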
Something else to consider is what you do if there are collisions in the second hash. You need...some sort of methodology to chain the contents! So you're back where you started with linear probing/chaining/etc, and all you've done is increase your hash table size from $m$ to $cm$.

-
If you dynamically allocate the inner hash tables, then nested hashes are different from your suggestion. Certainly it's still true that eventually you need to break the hash-inside-hash recursion. – rrenaud Oct 28 '12 at 16:53
-
@Raphael: One potential reason is that, especially in multi-threaded applications, it's much easier to reason about mutable structures whose identity is immutable than about those whose content and identity are both mutable. Among other things, it's very important to prevent any situation where two methods that think they are working with the same object are in fact modifying different objects, because one holds a reference to an object that has been superseded by the other. Never letting any object get superseded avoids that problem. – supercat Jan 30 '14 at 17:55
-
If hypothetically some system always used the same g(x) secondary hash function independent of the value of h(x), then that system would have the problems you describe. That may be why, when people actually do implement dynamic perfect hashing, they use a different secondary hash function for every one of the secondary hash tables. When the hash functions are chosen and re-chosen appropriately, they don't need any methodology to chain the contents. – David Cary Aug 13 '16 at 03:56