
I've been exploring the idea of using Szudzik's pairing function as a rolling hash. I have a series of integers as in:

$ 1,2,3,4,5 $

The idea is to apply the pairing function (nice online implementation here) to integers, using the result as the first input to the next call. As in (given the input above):

$ f(1,2) = p_{1,2} $

$ f(p_{1,2},3) = p_{2,3}$

$ f(p_{2,3},4) = p_{3,4}$

$...$

The problem is that the interim $p_{i,j}$ values grow very quickly and exceed the limits of 64-bit integers after only a few applications of the pairing function.
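To make the growth concrete, here is a minimal sketch of the chaining described above. It assumes the standard definition of Szudzik's function for non-negative integers ($a^2 + a + b$ if $a \ge b$, otherwise $a + b^2$); the names `szudzik_pair` and `rolling_pair` are made up for illustration and are not taken from the linked implementation.

```python
def szudzik_pair(a: int, b: int) -> int:
    """Szudzik's pairing function, standard form for non-negative a and b."""
    return a * a + a + b if a >= b else a + b * b


def rolling_pair(values):
    """Chain the pairing function over `values`, feeding each result back in
    as the first argument. Returns the list of interim results."""
    interim = []
    acc = values[0]
    for v in values[1:]:
        acc = szudzik_pair(acc, v)
        interim.append(acc)
    return interim


if __name__ == "__main__":
    # Python integers are arbitrary precision, so nothing overflows here;
    # the bit length shows when a signed 64-bit integer would be exceeded.
    inputs = list(range(1, 9))
    for count, p in enumerate(rolling_pair(inputs), start=2):
        print(f"after {count} inputs: {p.bit_length()} bits, value {p}")
```

For the input $1,2,3,4,5$ the interim values are $5, 33, 1126, 1269007$. Once the accumulated value is larger than the next input, each step roughly squares it, so the bit length about doubles per step, and the result no longer fits in a signed 64-bit integer once the seventh input ($7$) is folded in.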

I noticed, though, that if I use a large negative number as the first element of the pair, like the seed value of a random number generator implementation, the growth rate is much, much smaller, and my interim values remain rather close to their initial large negative seed value, if I may call it that. This allows me to hash much larger sets of input. You can use the above link to observe this with a pair such as $(-100000000,1)$ as the starting point.
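A small sketch of what the negative seed does, under the assumption that the implementation applies the usual non-negative formula unchanged to negative input (this may not match the linked implementation; `szudzik_pair` and `rolling_pair_seeded` are illustrative names). With a negative first argument and a non-negative second one, the test $a \ge b$ never succeeds, so only the $a + b^2$ branch is ever taken.

```python
def szudzik_pair(a: int, b: int) -> int:
    """Standard Szudzik formula, applied as-is even when a is negative
    (an assumption about how the implementation behaves)."""
    return a * a + a + b if a >= b else a + b * b


def rolling_pair_seeded(seed: int, values) -> int:
    """Chain the pairing function over `values`, starting from `seed`."""
    acc = seed
    for v in values:
        acc = szudzik_pair(acc, v)
    return acc


if __name__ == "__main__":
    seed = -100_000_000
    # While acc stays negative and the inputs are non-negative, each step
    # only adds v*v to the accumulator.
    print(rolling_pair_seeded(seed, [1, 2, 3]))  # seed + 1 + 4 + 9
    print(rolling_pair_seeded(seed, [3, 2, 1]))  # same value, different order
```

Under that assumption the result is just the seed plus the sum of the squares of the inputs, for as long as the accumulated value stays negative. That explains the slow growth, but it also means that reordered inputs such as $1,2,3$ and $3,2,1$ produce the same value.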

However, I've seen various statements implying that Cantor and Szudzik do not 'work' with negative integers, such as this one, and there are suggested changes to allow the negative axis to be used for output values, but those changes do not change the very quick growth of interim values (on both the positive and negative axes).

So, given that I am using a negative seed in my clueless attempt to use a pairing function as a rolling hash, I'd really like to know what the problem would be in that case. That is, what does 'not working' mean when negative input is used as I've done? Am I losing uniqueness? Is it some other property of the function that no longer holds? I do not have the necessary level of math to figure this out, so some explanation would help me understand whether I can use this function in this very specific scenario: hashing a number of integers to a single one, while using the computer representation of integers efficiently (i.e. not overflowing).

  • About pairing functions: https://en.wikipedia.org/wiki/Pairing_function – Jean Marie Aug 31 '22 at 08:56
  • What's your goal in building a hash function in this way? A hash function that maps arbitrary-length inputs to fixed-length outputs can't be injective (i.e. there will necessarily be collisions) due to the pigeonhole principle – Karl Aug 31 '22 at 14:55
  • @karl It's very fast, so performance is a plus. I also see some value in being able to invert the output of the function to its inputs (the integer pair). In all honesty, it is more of a thought exercise, I think Rabin-Karp would be the obvious alternative here, but I am curious about using a pairing function this way. – GrumpyRodriguez Aug 31 '22 at 15:23
  • I think the main point about using the pairing function in this way is that you can't expect to benefit from its invertibility. You lose invertibility when your output is smaller than your input. – Karl Aug 31 '22 at 15:49
  • Thanks. Let's assume that's not a problem either. The reason I asked this question is to understand (if I can) what is meant by 'not working with negative numbers' and how its properties change when I have a negative argument. Don't get me wrong, I appreciate the input, but "don't use it that way" is not the answer to my question :) I'm asking for help to see if there's a way to avoid the extremely fast growth of intermediate results. That's what'd be most helpful to me. – GrumpyRodriguez Aug 31 '22 at 15:52
  • You could be interested in this question and its answers dealing with spiral traversals of the whole set of points with integer coordinates $\mathbb{Z}^2$. – Jean Marie Sep 01 '22 at 10:20
