
Will the following simple hash table construction algorithm be able to construct a static hash table in $O(n)$ expected time, and will the worst case access time be $O(1)$? If not, what are the problems, and is there a simple solution?

Construction

Let's assume we have a static set of $n$ keys. We also have a fully random universal hash function $h_i(x)$ (let's say we use SHA-2 with $i$ as the IV). Now we try to partition the set of keys into $m$ buckets, where $m = \lceil n/100 \rceil$. First we try with $i = 0$. If any bucket has more than 1000 entries (this is possible, but extremely unlikely, much less likely than a SHA-2 hash collision), we start from scratch: we rebuild the hash table with a different hash function, $h_1(x)$. We repeat this in a loop until we find an index $i$ that works. Now the largest bucket has at most 1000 entries. Each bucket is stored as a list, sorted by hash value. The index $i$ is also stored.
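A minimal sketch of this construction in Python, under some assumptions: `hashlib` does not expose the SHA-2 IV, so the index $i$ is prepended to the key as a stand-in for "SHA-2 with $i$ as the IV"; keys are `bytes`; and the bucket is taken as the hash value modulo $m$. The names `hash_value` and `build` are made up for illustration.

```python
import hashlib

def hash_value(i, key):
    # Stand-in for h_i(x): SHA-256 over the 8-byte index i followed by the key.
    # (Assumption: prefixing i replaces "i as the IV".)
    data = i.to_bytes(8, "big") + key
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def build(keys, load=100, max_bucket=1000):
    """Return (i, buckets) where no bucket holds more than max_bucket entries."""
    n = len(keys)
    m = max(1, -(-n // load))  # ceil(n / load), at least one bucket
    i = 0
    while True:
        buckets = [[] for _ in range(m)]
        for key in keys:
            hv = hash_value(i, key)
            buckets[hv % m].append((hv, key))
        if all(len(b) <= max_bucket for b in buckets):
            for b in buckets:
                b.sort()  # sorted by hash value (key breaks ties)
            return i, buckets
        i += 1  # some bucket overflowed: start from scratch with h_{i+1}

keys = [str(j).encode() for j in range(5000)]
i, buckets = build(keys)
print(i, len(buckets), max(len(b) for b in buckets))
```

Each attempt is one pass over the keys plus a check of the bucket sizes; sorting happens only for the attempt that succeeds.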

Evaluation

For the static hash table constructed above, with the stored index $i$, calculate $h_i(x)$. From the hash value, calculate the bucket, and in this bucket, do a binary search. I believe (hope) that this is an $O(1)$ operation, because there are at most 1000 entries in this bucket.
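A minimal sketch of the lookup in Python, under the same kind of assumptions: the index $i$ is prepended to the key instead of being used as the SHA-2 IV, the bucket is the hash value modulo the number of buckets, and each bucket is a list of `(hash, key)` pairs sorted by hash. The names `hash_value` and `contains` are hypothetical, and the tiny table is built by hand only to make the example runnable.

```python
import hashlib
from bisect import bisect_left

def hash_value(i, key):
    # Stand-in for h_i(x): SHA-256 over the 8-byte index i followed by the key.
    data = i.to_bytes(8, "big") + key
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def contains(i, buckets, key):
    hv = hash_value(i, key)
    bucket = buckets[hv % len(buckets)]
    # Binary search in a bucket of at most 1000 entries: about 10 comparisons.
    pos = bisect_left(bucket, (hv, key))
    return pos < len(bucket) and bucket[pos] == (hv, key)

# Tiny hand-built table with two buckets and stored index i = 0.
i, m = 0, 2
buckets = [[] for _ in range(m)]
for k in [b"apple", b"banana", b"cherry"]:
    hv = hash_value(i, k)
    buckets[hv % m].append((hv, k))
for b in buckets:
    b.sort()

print(contains(i, buckets, b"apple"), contains(i, buckets, b"durian"))
```

The number of comparisons is bounded by $\lceil \log_2 1001 \rceil = 10$ regardless of $n$, which is where the constant bound on the search comes from.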

Closely related: "(When) is hash table lookup O(1)?" But I have a static set, and I have a fully random universal hash function. I know about FKS hashing, but I can't use it in my case, because it would require too much memory, and I would like to have a simpler algorithm. I understand my algorithm is terribly inefficient, but I'm mostly interested in a guaranteed $O(1)$ worst case access time.

  • You should think about FKS hashing again. 1. It is simple 2. It uses only $O(n)$ space 3. It has guaranteed $O(1)$ access time. – A.Schulz Apr 05 '16 at 16:34
  • @A.Schulz ok I will have a look at that, but I'm afraid I can't use it. I might need to ask more questions... By the way, the O(n) space is not a problem for me. – Thomas Mueller Apr 05 '16 at 17:30
  • What mathematics makes you confident that the likelihood of a single bucket having more than 1000 items is low? Have you looked at "Balls into bins - a simple and tight analysis"? – jbapple Apr 06 '16 at 02:41
  • Have you considered using other non-FKS perfect hashing schemes, like cuckoo hashing? – jbapple Apr 06 '16 at 02:45
  • @jbapple thanks for mentioning "Balls into bins"! Yes, I need to prove that the probability is extremely low. I am writing my own minimal perfect hashing algorithm, so I don't want to (can't) use cuckoo hashing and so on. – Thomas Mueller Apr 06 '16 at 07:02

1 Answer


Yes, your access time is $\mathcal{O}(1)$. Your construction time is a bit more complicated. Let $P(k)$ be the probability that a table built from $k$ elements has a bucket with more than 1000 entries. The initial construction costs $\mathcal{O}(k)$. If a bucket overflows, everything is rebuilt at a cost of $\mathcal{O}(k)$, but the rebuild might overflow again. Assuming the attempts are independent, the $j$-th rebuild is needed only if all $j$ previous attempts failed, which happens with probability $P(k)^j$. Thus you get expected costs of $\mathcal{O}(k) + \sum \limits^\infty_{j=1}P(k)^j\mathcal{O}(k)=\mathcal{O}\left(\frac{k}{1-P(k)}\right)$.
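To get a feeling for how small $P(k)$ is, here is a rough sketch that evaluates a standard Chernoff bound together with a union bound over the buckets. It assumes the hash function is fully random, so that each bucket's load is binomial with mean $\mu = k/m \le 100$; the bound used is $\Pr[X \ge t] \le e^{-\mu}(e\mu/t)^t$ for $t > \mu$. The function name `log10_overflow_bound` is made up, and the result is an upper bound on $P(k)$, not its exact value.

```python
import math

def log10_overflow_bound(k, load=100, limit=1000):
    """log10 of an upper bound on P(k): some bucket gets >= limit entries."""
    m = math.ceil(k / load)   # number of buckets
    mu = k / m                # expected entries per bucket, at most `load`
    # Chernoff bound for one bucket, in natural-log form:
    # ln(e^{-mu} * (e * mu / limit)^limit)
    log_single = -mu + limit * (1 + math.log(mu) - math.log(limit))
    # Union bound over all m buckets adds ln(m).
    return (log_single + math.log(m)) / math.log(10)

print(log10_overflow_bound(10**12))
```

Under these assumptions the bound for $k = 10^{12}$ is below $10^{-590}$, so $1/(1-P(k))$ is indistinguishable from 1 and the expected construction cost is dominated by the single $\mathcal{O}(k)$ pass.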

Martin Glauer
  • Thanks for the edits. The first sentence looks correct, but now I'm afraid the rest of the answer looks wrong, for a different reason. It looks like you've misunderstood the proposal. There is no "insert" operation. As the question says, this is a static set of keys. There is a one-time computation to build the data structure. That computation does not work by checking for bucket-overflow after adding each item (as your analysis seems to assume). Rather, it hashes all $n$, then checks for bucket-overflow. Consequently, the running time of the initial stage is just $O(n/(1-P(n)))$. – D.W. Apr 06 '16 at 15:25