In perfect hashing, why does a secondary hash table that is quadratic in size lead to no collisions?

Question

See below a screenshot from CLRS 3rd Edition (Section 11.5, "Perfect Hashing"). The last sentence of the last paragraph says that the choice of $m_j = n^2_j$ leads to collision-free constant-time lookup.

Why? Is it because:

It assumes that one could, in theory, sample several hash functions $h \in \mathcal{H}$, checking each one for collisions, until we find an $h'$ that has none?
The probability of sampling an $h$ without collisions is greater than 1/2, so it shouldn't be that difficult to find $h'$?

If not, why?

Read the first sentence of this quoted paragraph again. That must have been explained somewhere. — Pseudonym, Mar 10 '20 at 01:50
Not really, @Pseudonym thanks. Why would p(collision) < 1/2 lead to collision-free constant-time lookups? The secondary hash table isn't even designed to handle collisions (e.g. chaining or open addressing). It's just a hash table. And even if it did somehow, a p<1/2 doesn't necessarily mean that you get collision-free lookups (even on avg) since it would depend on how you handle those collisions. — Josh, Mar 10 '20 at 03:35
I suppose it doesn't say there exactly how you find those hash functions. It only says that you can. If so, maybe a brute-force search, sampling from $\mathcal{H}$, does it? — Josh, Mar 10 '20 at 03:40
The statement that you quote is informal. When the authors actually use Theorem 11.9, you will see how exactly the theorem gets used. Just read on. — Yuval Filmus, Mar 10 '20 at 15:06

score 1 · Answer 1 · answered Jan 21 '21 at 07:43

If $n_j$ keys hash to slot $j$ of the first-level hash table and $n_j > 1$, we need a second-level hash table for that slot. Theorem 11.9 helps here: if we give this second-level table size $m_j = n_j^2$, then more than half of the hash functions in any universal class (all of which map into the range $0$ to $n_j^2 - 1$) produce no collisions among the $n_j$ keys.
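
For reference, here is a sketch of where the 1/2 comes from. The symbol $X$ is notation introduced here for the number of colliding pairs among the $n_j$ keys. It uses only the defining property of a universal class (any two distinct keys collide with probability at most $1/m_j$) together with Markov's inequality:

$$
\mathrm{E}[X] \;\le\; \binom{n_j}{2} \cdot \frac{1}{n_j^2} \;=\; \frac{n_j (n_j - 1)}{2 n_j^2} \;<\; \frac{1}{2},
\qquad
\Pr\{X \ge 1\} \;\le\; \mathrm{E}[X] \;<\; \frac{1}{2}.
$$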

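This does NOT mean that an arbitrary function picked at random from this class is collision-free. We may need several trials, but because each independent trial succeeds with probability greater than 1/2 by Theorem 11.9, we expect to find a collision-free function for this set of $n_j$ keys after fewer than 2 trials on average. Since the key set is static, this search happens once, when the table is built.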
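
The trial-and-retry idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the book's code: the names `P`, `random_hash`, and `build_secondary_table` are made up here, the keys are assumed to be distinct integers in $[0, P)$, and the family $h_{a,b}(k) = ((ak + b) \bmod P) \bmod m$ is used as the universal class.

```python
import random

# Assumption: all keys are distinct integers in [0, P). P is the Mersenne prime 2**31 - 1.
P = 2_147_483_647

def random_hash(m, rng):
    """Draw h(k) = ((a*k + b) mod P) mod m with random a in [1, P) and b in [0, P)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m

def build_secondary_table(keys, rng=None):
    """Resample h until the n_j keys occupy distinct slots of a table of size n_j**2."""
    rng = rng or random.Random()
    m = len(keys) ** 2
    trials = 0
    while True:
        trials += 1
        h = random_hash(m, rng)
        table = [None] * m
        collided = False
        for k in keys:
            slot = h(k)
            if table[slot] is not None:
                # Two keys share a slot: discard this h and draw another one.
                collided = True
                break
            table[slot] = k
        if not collided:
            return h, table, trials

keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
h, table, trials = build_secondary_table(keys, random.Random(0))
print(f"collision-free h found after {trials} trial(s)")
```

Once such an `h` is stored alongside the table, looking up a key `k` is a single probe: compare `table[h(k)]` with `k`. No chaining or open addressing is needed, because the collisions were eliminated at construction time.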

See also this answer for details: https://cs.stackexchange.com/a/134386/123596.
