I'm partitioning a stream of input data between n servers. The simple way of doing it is to use the hash of some property of each input data packet to assign a single server, using mod or similar, and be done with it.
However, I want some degree of resiliency - if one server goes down, nothing is lost. I want to partition each data packet to m servers, where 1 < m < n, with each data packet guaranteed to go to at least m servers (but it can be more). Furthermore, I want the partitioning to be stateless, deterministic, and well-distributed - the calculation uses only the hash(es) of the input data.
This feels like something that research papers have been written about, but my google-fu has failed me. Are there any existing algorithms which do this, ideally generalisable across n and m?
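One scheme that seems to fit is to score every server against each packet's key and keep the m highest scores (rendezvous / highest-random-weight hashing, if I've understood the term correctly). A minimal Python sketch - the hash choice and server names are purely illustrative:

```python
import hashlib

def top_m_servers(key: bytes, servers: list[str], m: int) -> list[str]:
    # Give every (key, server) pair a deterministic pseudo-random score
    # and keep the m highest. Stateless and well-distributed; if one
    # server dies, only the packets that had it in their top m are
    # affected, and they fall through to the next-highest survivor.
    def score(server: str) -> int:
        return int.from_bytes(
            hashlib.sha256(key + server.encode()).digest()[:8], "big"
        )
    return sorted(servers, key=score, reverse=True)[:m]

# Example: each packet lands on exactly 2 of 4 servers.
print(top_m_servers(b"packet-123", ["s0", "s1", "s2", "s3"], m=2))
```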
mod (ni/m), where ni is the initial number of servers, is probably well-distributed on a large enough time scale. :) But otherwise it doesn't behave like you want a cluster to behave... adding nodes only adds more redundancy. To increase performance (increase n while keeping m constant), you'd have to rebalance the data between nodes. – Kasey Speakman May 18 '16 at 22:56
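A sketch of the grouping scheme that comment describes, assuming groups are made of contiguous server indices (that layout is my assumption, not the commenter's):

```python
import hashlib

def stable_hash(key: bytes) -> int:
    # Stable across processes, unlike Python's built-in hash() on strings.
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def group_servers(key_hash: int, n_initial: int, m: int) -> list[int]:
    # Split the ni initial servers into ni // m fixed groups of m and
    # pick a group by key_hash mod (ni // m). The group count never
    # changes, so added servers can only thicken existing groups
    # (more redundancy); raising throughput means rebalancing.
    groups = n_initial // m
    g = key_hash % groups
    return list(range(g * m, (g + 1) * m))

# Example: 6 initial servers, m = 2 -> 3 fixed groups of 2.
print(group_servers(stable_hash(b"packet-123"), n_initial=6, m=2))
```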