Below is a screenshot from CLRS, 3rd Edition (Section 11.5, "Perfect Hashing"). The last sentence of the last paragraph says that choosing $m_j = n_j^2$ leads to collision-free, constant-time lookup.

Why? Is it because:

  1. It assumes that one could, in theory, sample several hash functions $h \in \mathcal{H}$, checking each for collisions, until we find one, $h'$, that has none?
  2. The probability of sampling an $h$ without collisions is greater than 1/2, so it shouldn't be difficult to find such an $h'$?

If not, why?

[Image: screenshot of CLRS, 3rd Edition, Section 11.5, "Perfect Hashing"]

Josh
  • Read the first sentence of this quoted paragraph again. That must have been explained somewhere. – Pseudonym Mar 10 '20 at 01:50
  • Thanks @Pseudonym I expanded the text to include it all. – Josh Mar 10 '20 at 03:23
  • Excellent. Did that answer your question? – Pseudonym Mar 10 '20 at 03:32
  • Not really, @Pseudonym, thanks. Why would p(collision) < 1/2 lead to collision-free constant-time lookups? The secondary hash table isn't even designed to handle collisions (e.g. by chaining or open addressing); it's just a hash table. And even if it were, p < 1/2 doesn't by itself mean that you get collision-free lookups (even on average), since that would depend on how those collisions are handled. – Josh Mar 10 '20 at 03:35
  • I suppose it doesn't say there exactly how you find those hash functions, only that you can. If so, maybe a brute-force search, sampling from $\mathcal{H}$, does it? – Josh Mar 10 '20 at 03:40
  • The statement that you quote is informal. When the authors actually use Theorem 11.9, you will see how exactly the theorem gets used. Just read on. – Yuval Filmus Mar 10 '20 at 15:06

1 Answer

If $n_j$ keys hash to slot $j$ of the first-level hash table, and $n_j > 1$, we need a second-level hash table for that slot. Theorem 11.9 helps in the following sense: if we make this second-level table of size $m_j = n_j^2$, then more than half of the hash functions in any universal class (all of which map keys to the range $0$ to $n_j^2 - 1$) produce no collisions among the $n_j$ keys.
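As a sketch of where the 1/2 comes from (this is the standard counting argument; $X$ is notation introduced here for the number of colliding pairs among the $n_j$ keys): in a universal class each pair of distinct keys collides with probability at most $1/m_j$, so with $m_j = n_j^2$, linearity of expectation and Markov's inequality give

$$\mathrm{E}[X] \le \binom{n_j}{2} \cdot \frac{1}{n_j^2} = \frac{n_j (n_j - 1)}{2 n_j^2} < \frac{1}{2}, \qquad \Pr\{X \ge 1\} \le \mathrm{E}[X] < \frac{1}{2}.$$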

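We can NOT conclude that an arbitrary randomly picked function from this class will be collision-free. We need to run some trials: pick a function at random, check it against the $n_j$ keys, and retry if any two collide. Since each trial succeeds with probability greater than 1/2 (by Theorem 11.9), the expected number of trials is less than 2. Because the set of keys is static, this search is done once, when the table is built; the collision-free function found is then stored with the slot, and every later lookup is a single evaluation of it, hence collision-free and constant-time.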
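The trial-and-retry procedure described above can be sketched in Python. This is only an illustration under stated assumptions, not code from CLRS: the names `random_hash`, `build_secondary`, and `lookup` are hypothetical, the keys are assumed to be distinct integers in $[0, P)$, and the family $h_{a,b}(k) = ((ak + b) \bmod P) \bmod m$ with the prime $P = 2^{31} - 1$ is one possible universal class.

```python
import random

# Assumption: all keys are integers in [0, P). P = 2^31 - 1 is prime.
P = 2_147_483_647

def random_hash(m, rng):
    """Draw h(k) = ((a*k + b) mod P) mod m with random a in [1, P), b in [0, P)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m

def build_secondary(keys, rng=None):
    """Retry random hash functions until one has no collisions on `keys`.

    Returns (h, table, trials), where the table has len(keys)**2 slots.
    """
    rng = rng or random.Random()
    keys = list(dict.fromkeys(keys))  # drop duplicates so the loop can terminate
    m = len(keys) ** 2
    trials = 0
    while True:
        trials += 1
        h = random_hash(m, rng)
        table = [None] * m
        collision = False
        for k in keys:
            slot = h(k)
            if table[slot] is not None:  # two keys landed in the same slot
                collision = True
                break
            table[slot] = k
        if not collision:
            return h, table, trials

def lookup(h, table, k):
    """Membership test: one hash evaluation and one comparison."""
    return len(table) > 0 and table[h(k)] == k
```

All the retrying happens inside `build_secondary`; `lookup` never probes more than one slot, which is why no collision-resolution scheme such as chaining or open addressing is needed in the second-level table.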

Please also see the following answer for details: https://cs.stackexchange.com/a/134386/123596.

Nitin Verma