I read in our university lecture on hashing that it would be good (even though it is way too space-intensive) if we could use the set of all functions from $U \rightarrow V$, since it satisfies the following universal hashing condition: for all $x,y\in U$ with $x\neq y$, we have $\Pr[h(x)=h(y)]\leq\frac{1}{|V|}$.

I actually don't understand why the set of all functions would satisfy our condition. I can definitely see that there are some functions which fit this condition, but I can think of other functions which don't meet it. So I would have to make a statement about the average probability over the set of all functions, but how could something like this be defined precisely?

Or am I getting something wrong here, and the set of all functions wouldn't meet the condition but would actually be better than the condition, in the case that $h$ is chosen randomly from $H$? Even so, it seems to come down to the same problem.

OuttaSpaceTime

1 Answer

First note that the set of all functions $U\to V$ contains, for every $u\in U$ and every $v\in V$, a function $f$ such that $f(u)=v$. Also note that if you fix a point $u\in U$ and pick a function $f$ uniformly at random from all functions $U\to V$, then $f(u)$ is uniformly distributed in $V$.

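Now apply that: fix any two points $x,y\in U$ with $x\neq y$ and choose $h$ uniformly at random from the set of all functions $U\to V$. Then $h(x)$ and $h(y)$ are each uniformly distributed in $V$, and since $x\neq y$ they are also independent, because the value of a random function at one point says nothing about its value at another point. The probability that $h(x)$ and $h(y)$ are equal is therefore exactly $1/|V|$, which implies the desired universality.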
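
One way to spell out this last step, assuming $h$ is drawn uniformly at random from all functions $U\to V$ (so that $h(x)$ and $h(y)$ are independent and uniform for $x\neq y$):

$$\Pr_h[h(x)=h(y)]=\sum_{v\in V}\Pr_h[h(x)=v]\cdot\Pr_h[h(y)=v]=|V|\cdot\frac{1}{|V|^2}=\frac{1}{|V|}$$

The same value comes out of a direct count: a colliding function is fixed by one shared value for $x$ and $y$ plus arbitrary values on the remaining $|U|-2$ inputs, so $|V|\cdot|V|^{|U|-2}=|V|^{|U|-1}$ of the $|V|^{|U|}$ functions collide on the pair $(x,y)$.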


Given that the above might not be entirely clear, let's do an example: $$U=\{1,2,3\}\quad V=\{1,2,3\}$$ There are $3^3=27$ functions mapping from $U$ to $V$, and there are 6 ordered pairs of unequal values $(x,y)\in U\times U$ for which we need to check whether the bound holds.

  • $(x,y)=(1,2):$ There are three possible values to map $1$ to: 1, 2, or 3; let's denote the actual value as $h(x)$ for the moment. There are also three possible values to map $2$ to: 1, 2, or 3; let's denote the actual value as $h(y)$. Given that these two choices are independent, there are 9 equally likely cases here (each case corresponds to exactly 3 of the 27 functions, one per choice of $h(3)$).

    • If $h(x)=1$ then the tested equation only holds if $h(y)=1$ as well which is a 1/3 chance
    • If $h(x)=2$ then the same argument as for $h(x)=1$ applies with $h(y)=2$
    • $h(x)=3$ goes analogously

Now counting the cases where the equation holds for $(x,y)=(1,2)$, we get $3/9=1/3\leq 1/3$ as demanded.

If we go through the other pairs as well, a very similar argument will ensue and we'll be getting $3/9$ for all of them. Therefore the set of all functions from $U$ to $V$, with $h$ chosen uniformly at random from it, satisfies universality.
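
The example can also be checked by brute force. The following is only a minimal sketch, not part of the original argument: it enumerates all 27 functions for $U=V=\{1,2,3\}$ and, for each fixed ordered pair $x\neq y$, computes the fraction of functions that collide on it. The names `all_functions` and `collision_probability` are made up for this illustration.

```python
from fractions import Fraction
from itertools import permutations, product

U = (1, 2, 3)
V = (1, 2, 3)

# Every function h: U -> V, stored as a dict; there are |V|^|U| = 27 of them.
all_functions = [dict(zip(U, values)) for values in product(V, repeat=len(U))]


def collision_probability(x, y):
    """Fraction of all functions h with h(x) == h(y), for fixed inputs x != y."""
    collisions = sum(1 for h in all_functions if h[x] == h[y])
    return Fraction(collisions, len(all_functions))


# The 6 ordered pairs (x, y) with x != y.
pairs = list(permutations(U, 2))
probabilities = {pair: collision_probability(*pair) for pair in pairs}

for pair, p in probabilities.items():
    print(pair, p)
```

Note that the pair is fixed first and the probability is taken only over the choice of $h$; every one of the 6 pairs comes out at $9/27 = 1/3$.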

SEJPM
  • Can you give me a hint how the uniform distribution can be recognized in this context and how it plays into the probability here? Also, what about functions that map every $x\in U$ to the same $v$? That would give probability 1 for $h(x)=h(y)$ and not $\frac{1}{|V|}$. – OuttaSpaceTime Dec 14 '20 at 11:57
  • @FelixOuttaSpace Note that the probability is taken only over the choice of the function, not over the values of x and y. As for the constant functions, there are "only" $|V|$ of them but $|V|^{|U|}$ total functions. To better understand the set of all functions, it might be helpful to realize that if $|U|=1$ then a random choice of function equals a random choice of an output value which then gets "cartesian multiplied" to larger sizes for $U$. – SEJPM Dec 14 '20 at 13:25
  • The “collision” probability, that $h(x)=h(y)$, is $1/|V|$, not $1/|V|^2$. – Chris Peikert Dec 14 '20 at 13:54
  • @ChrisPeikert yeah, I was a bit unsure and guessed wrong, woops. – SEJPM Dec 14 '20 at 13:55
  • @SEJPM the only part I am confused about is that you say the probability is taken over the choice of the function, though the definition states $\Pr[h(x)=h(y)]$ with the same $h$ and not a different $h$ within it. Wouldn't it then be $\Pr[h(y)=g(x)]$ with $h,g\in H_0$? – OuttaSpaceTime Dec 16 '20 at 13:09
  • @FelixOuttaSpace no, you still pick the same function for both evaluations. Universality then ensures that for every pair of input values, the probability of a collision on that pair is bounded by $1/|V|$. – SEJPM Dec 16 '20 at 13:14
  • @SEJPM okay, but what about the case that $h(x)=v$ for all $x$ in $U$, where $v$ is a fixed value in $V$ and $|V|>1$? Then $\Pr[h(x)=h(y)] = 1 > \frac{1}{|V|}$. Or is such a function not in $H_0$? Then $H_0$ would be the set of all bijective functions. – OuttaSpaceTime Dec 16 '20 at 14:59
  • @FelixOuttaSpace such a function is in the set of all functions. However, randomly choosing such a "bad" function has a probability that is (way) smaller than $1/|V|$. – SEJPM Dec 16 '20 at 15:06
  • @SEJPM but what if I set $U=\{1,2,3\}$ and $V=\{1,2,3\}$? Then there are 3 functions with $\Pr[h(x)=h(y)]=1$ (all three values in image(h) are the same), 18 functions with $\Pr[h(x)=h(y)]=1/2$ (6+6+6, for two equal values in image(h)) and only 6 functions with $\Pr[h(x)=h(y)]=1/3$ ($3!$ permutations), so there is a chance of $21/27 > 1/3$ to pick a function which doesn't have probability $1/|V| = 1/3$. – OuttaSpaceTime Dec 16 '20 at 16:28
  • @FelixOuttaSpace I have added an example. I think the issue here is that you first need to fix the inputs $x,y$ and then check whether at most a $1/|V|$ fraction of all functions produce a collision for that pair. – SEJPM Dec 16 '20 at 19:16
  • Ok nice, what is a little bit confusing for me right now is that you don't specify the probability of which function you pick (as I noted in my last comment, there are 3 different kinds of functions). I imagine this as a tree: the first probability is for picking one of the different kinds of functions, and the second probability is for getting the same outcome on a given second value with the chosen function, so $\frac{3}{27}\cdot 1+\frac{18}{27}\cdot\frac{1}{2}+\frac{6}{27}\cdot\frac{1}{3}=\frac{14}{27} > 1/3$ – OuttaSpaceTime Dec 16 '20 at 19:37
  • @FelixOuttaSpace The issue in your counting is that you allow $x=y$. If you enforce $x\neq y$, the associated probability for the permutations drops to 0 and the associated probability for the non-constant, non-injective ones drops to 1/3, which gives you the desired summed probability. – SEJPM Dec 17 '20 at 08:19
  • A closer look at the permutations for $h(x)$ gave me a better understanding. The corrected probability should be $\frac{3}{27}\cdot 1+\frac{18}{27}\cdot\frac{1}{3}+\frac{6}{27}\cdot 0=\frac{1}{3}$, which gives the desired outcome. – OuttaSpaceTime Dec 18 '20 at 19:11
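
The tally reached in the last comments (constant functions, functions taking exactly two distinct values, permutations) can be reproduced by brute force. This is a hedged sketch only: it assumes the fixed pair $(x,y)=(1,2)$, and the counters `total` and `colliding` are names chosen for this illustration.

```python
from collections import Counter
from itertools import product

U = (1, 2, 3)
V = (1, 2, 3)
x, y = 1, 2  # one fixed pair with x != y (an arbitrary choice for this sketch)

total = Counter()      # number of functions, keyed by image size
colliding = Counter()  # number of those with h(x) == h(y)

for values in product(V, repeat=len(U)):
    h = dict(zip(U, values))
    size = len(set(values))  # 1: constant, 2: two distinct values, 3: permutation
    total[size] += 1
    if h[x] == h[y]:
        colliding[size] += 1

for size in sorted(total):
    print(size, total[size], colliding[size])
```

For this pair, all 3 constant functions collide, 6 of the 18 two-valued functions collide, and none of the 6 permutations do, so $3+6+0=9$ of the 27 functions collide, matching $\frac{3}{27}\cdot 1+\frac{18}{27}\cdot\frac{1}{3}+\frac{6}{27}\cdot 0=\frac{1}{3}$.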