I'm partitioning a stream of input data between n servers. The simple way of doing it is to use the hash of some property of each input data packet to assign a single server, using mod or similar, and be done with it.
However, I want some degree of resiliency - if one server goes down, nothing is lost. I want to partition each data packet to m servers, where 1 < m < n, with each data packet guaranteed to go to at least m servers (but it can be more). Furthermore, I want the partitioning to be stateless, deterministic, and well-distributed - the calculation uses only the hash(es) of the input data.
This feels like something that research papers have been written about, but my google-fu has failed me. Are there any existing algorithms which do this, ideally generalisable across n and m?
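One scheme that seems to fit is to score every server against each packet's key and keep the m highest scores (rendezvous / highest-random-weight hashing, if I've understood the term correctly). A minimal Python sketch - the hash choice and server names are purely illustrative:

```python
import hashlib

def top_m_servers(key: bytes, servers: list[str], m: int) -> list[str]:
    # Give every (key, server) pair a deterministic pseudo-random score
    # and keep the m highest. Stateless and well-distributed; if one
    # server dies, only the packets that had it in their top m are
    # affected, and they fall through to the next-highest survivor.
    def score(server: str) -> int:
        return int.from_bytes(
            hashlib.sha256(key + server.encode()).digest()[:8], "big"
        )
    return sorted(servers, key=score, reverse=True)[:m]

# Example: each packet lands on exactly 2 of 4 servers.
print(top_m_servers(b"packet-123", ["s0", "s1", "s2", "s3"], m=2))
```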
mod (ni/m), where ni is the initial number of servers, is probably well-distributed on a large enough time scale. :) But otherwise it doesn't behave like you want a cluster to behave... adding nodes only adds more redundancy. To increase performance (increase n while keeping m constant), you'd have to rebalance the data between nodes. – Kasey Speakman May 18 '16 at 22:56
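A sketch of the grouping scheme that comment describes, assuming groups are made of contiguous server indices (that layout is my assumption, not the commenter's):

```python
import hashlib

def stable_hash(key: bytes) -> int:
    # Stable across processes, unlike Python's built-in hash() on strings.
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def group_servers(key_hash: int, n_initial: int, m: int) -> list[int]:
    # Split the ni initial servers into ni // m fixed groups of m and
    # pick a group by key_hash mod (ni // m). The group count never
    # changes, so added servers can only thicken existing groups
    # (more redundancy); raising throughput means rebalancing.
    groups = n_initial // m
    g = key_hash % groups
    return list(range(g * m, (g + 1) * m))

# Example: 6 initial servers, m = 2 -> 3 fixed groups of 2.
print(group_servers(stable_hash(b"packet-123"), n_initial=6, m=2))
```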