Rendezvous hashing

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

Key Takeaways (Rendezvous hashing)

Rendezvous hashing, also known as highest random weight (HRW) hashing.
Rendezvous hashing solves a general version of the distributed hash table problem.
Rendezvous hashing distributes keys uniformly across the nodes.
Rendezvous hashing was invented by David Thaler and Chinya Ravishankar at the University of Michigan in 1996.
Apache Ignite distributed database uses Rendezvous hashing

Introduction

Hashing algorithms are very important for efficient mapping and searching of objects to nodes or servers. For example in a library if all the books are arranged randomly then how can anyone search for any book? he have to look each and every book one by one inorder to find the book he want to need. But if we arrange book according to type or first alphabet it will be easy to find book.

Rendezvous hashing was invented by David Thaler and Chinya Ravishankar at the University of Michigan in 1996. The examples of areas Rendevous hashing is used in Github load balancer, the Apache Ignite distributed database, the Tahoe-LAFS file store, the CoBlitz large-file distribution service, Apache Druid, IBM's Cloud Object Store, the Arvados Data Management System, Apache Kafka, and by the Twitter EventBus pub/sub platform.

Rendezvous hashing also known as highest random weight hashing, It is an algorithm that allows objects to achieve distributed agreement on a set of servers out of a possible set of n options.

Properties of Rendezvous hashing

Rendezvous hashing function uses an efficient hash function making the low overhead.
A site can be represented in mulitple, for example if a site has twice the capacity it can be represented twice.
When a server is removed or it fails then only the objects that are mapped to that particular servers are remapped other objects remains same.
The objects are distributed on n number of servers by selecting sites with higher wieghts.

Algorithm

Rendezvous hashing solves a general version of the distributed hash table problem, Unlike traditional hashing methods that use a fixed hash function to map keys to servers, rendezvous hashing allows for dynamic server additions and removals without much remapping of keys to servers.

The problem is that, We are given a set of n servers then how can any set of clients can agree on which server to assign object to? For this Rendezvous hashing comes to play.

The basic idea is that each site is assigned with a score or weight for each object then the sites with highest weight or scored is the one object is assigned to.

Pseudo Code

function RendezvousHash(object, servers):
    maxScore = -Infinity
    selectedServer = null
    
    for server in servers:
        score = hash(object + server) 
        
        if score > maxScore:
            maxScore = score
            selectedServer = server
    
    return selectedServer

servers = [server1, server2, server3]
objectToAssign = "some_object_key"

selectedServer = RendezvousHash(objectToAssign, servers)
print("Object", objectToAssign, "assigned to server", selectedServer)

Time Complexity: O(N) - The algorithm's execution time grows linearly with the number of servers (N).

Space Complexity: O(1) - The algorithm uses a constant amount of memory space, regardless of the input size.

Assign a Unique Weight to Each Server:
Initially each server or node in the pool is assigned with a unique weight value using a hash function they are agreed upon, the weight can be assigned based on the server's capacity or load or any other metric that will determine how servers should be utilized.
Hash the Input Key with Each Server's Weight:
For a given input key and object, calculate the weight using a hash function.
Select the Server with the Highest Weight:
The server or node with the highest score or weight among all the servers is chosen to assign the object.
On New Server Added:
When a new server is added to the pool than the existing servers could now map to this new server by calculating the weights again and assigning according to the server with the highest weight.
On Existing Server Removed
When an existing server is removed then all the keys mapped to that server will again calculate the weight/score and map key the previous second choicec server because now it is new first choice.

Load Balancing:
Rendezvous hashing offers good load balancing because it distributes the keys across the servers based on their weights. Servers with higher weights are more likely to be chosen, but the distribution of keys is still relatively uniform, leading to a balanced load distribution.

Comparison with Consistent Hashing


Parameter	Rendezvous Hashing	Consistent Hashing
Complexity	Relatively simple to implement	More complex
Generality	Works for any number of nodes (servers) and keys.	Adaptable to a variety of distributed system configurations.
Overhead	Lower computational overhead.	Can have large overhead.
Deterministic	Provides deterministic mapping of keys to nodes.	Deterministic mapping, but strategies can reduce predictability issues.
Load Balancing	Limited load awareness; may lead to uneven distribution.	Offers better load balancing, especially with advanced variants.
Failure Tolerance	Have some degree of failure tolerance.	Relatively higher failure tolerance.
Flexibility	Suitable for various applications.	Offers more flexibility.
Remapping	Relatively high number of remapping.	Reduces the number of keys that must be remapped when a node is added or removed.

Advantages

Deterministic Hashing: It means that for a given set of nodes and keys, the same node will always be selected for a particular key. This property is essential for applications where consistency is required.
Load Balancing: Rendezvous hashing distributes keys uniformly across the nodes. This ensures that the load is balanced among the nodes.
No Central Authority: Rendezvous hashing does not require a central authority for mapping keys to nodes, Each node can independently compute the mapping, making it easy to implement in decentralized and distributed systems.
Simple Implementation: The algorithm is relatively simple to implement, making it an attractive choice for various applications.
Low Overhead: The computational overhead of computing the hash function and selecting the appropriate node is relatively low, making it efficient for real-time applications and systems with low-latency requirements.

Disadvantages

Complex Handling of Server Addition/Removal: When a node/server is added or removed the first choice of an existing key may change hence making the remapping of keys complex for large systems.
High Time Complexity for Weight Computation: The time complexity of weight computation is O(N) it means it increases linearly with the increase in nodes.
Scalability Challenges: As this has High Time Complexity so the system may face bottlenecks and other challanges when scaling the system by adding more nodes/servers.
Not Geographically Aware: It does not take into account the geographical location of nodes, which might be important for certain applications like content delivery networks.

With this article at OpenGenus, you must have the complete idea of Rendezvous hashing.