Consistent Hashing Explained & Visualized

What is Consistent Hashing?

Consistent hashing is a technique used in distributed systems (like caches, databases, and load balancers) to assign data or requests to nodes (servers). Its primary goal is to minimize disruption when nodes are added to or removed from the system: only a small fraction of keys should have to move to a different node.

The Problem with Simple Hashing (Modulo N)

Imagine you have N servers and you want to distribute keys (e.g., user IDs, cache keys) among them. A simple approach is:

server_index = hash(key) % N

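This works initially, but what happens when you add or remove a server? Changing N changes hash(key) % N for almost every key, so nearly all keys are reassigned to a different server. Going from N to N + 1 servers, for example, leaves only about 1/(N + 1) of the keys where they were. In a cache, that means a sudden wave of misses as almost every key is looked up on the wrong server.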
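
A small experiment makes the problem concrete. This is a sketch, not production code: it uses MD5 from hashlib as a stable stand-in for hash() (Python's built-in hash() is randomized per process for strings), and the "user-<i>" keys are made up for illustration.

```python
import hashlib

def stable_hash(key: str) -> int:
    # Assumption: any stable hash works; MD5 gives the same value on every run,
    # unlike Python's per-process salted hash() for strings.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

def modulo_server(key: str, n: int) -> int:
    # The simple scheme: server_index = hash(key) % N
    return stable_hash(key) % n

def fraction_moved(keys: list[str], old_n: int, new_n: int) -> float:
    # Share of keys whose server index changes when N changes.
    moved = sum(1 for k in keys if modulo_server(k, old_n) != modulo_server(k, new_n))
    return moved / len(keys)

keys = [f"user-{i}" for i in range(10_000)]
print(f"4 -> 5 servers: {fraction_moved(keys, 4, 5):.1%} of keys moved")
```

With N = 4, the printed fraction should land close to 80%, in line with the 1 - 1/(N + 1) share of keys expected to move when going from N to N + 1 servers.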

How Consistent Hashing Solves It

Consistent hashing uses a different approach:

  1. Imagine a Ring: Think of a conceptual ring or circle representing a large range of hash values (e.g., 0 to 2^32 - 1 for a 32-bit hash, or simply 0 to 359 degrees when drawn as a circle).
  2. Place Servers on the Ring: Hash each server's identifier (like its IP address or name) to get a position on this ring. Place the server at that point.
  3. Place Keys on the Ring: Hash each data key to get its position on the same ring.
  4. Assign Keys to Servers: To find which server should handle a key, start at the key's position on the ring and move clockwise until you encounter the first server, wrapping around past the top of the ring if necessary. That server is responsible for the key.

[Figure: Consistent hashing ring diagram]
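
The four steps can be sketched in Python as a sorted list of server positions searched with binary search. This is a minimal illustration under stated assumptions: MD5 truncated to 32 bits serves as the ring hash, each server gets exactly one point (no virtual nodes), and the class, method, and server names are hypothetical.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    # Stable position on the ring: the first 4 bytes of an MD5 digest
    # give a value in 0 .. 2^32 - 1.
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")

class HashRing:
    """Minimal consistent-hash ring: one point per server, no virtual nodes."""

    def __init__(self, servers=()):
        self._positions: list[int] = []   # sorted server positions
        self._owner: dict[int, str] = {}  # position -> server name
        for s in servers:
            self.add_server(s)

    def add_server(self, server: str) -> None:
        # Step 2: hash the server's identifier and place it on the ring.
        pos = ring_position(server)
        if pos in self._owner:
            raise ValueError(f"position collision for {server!r}")
        bisect.insort(self._positions, pos)
        self._owner[pos] = server

    def remove_server(self, server: str) -> None:
        pos = ring_position(server)
        self._positions.remove(pos)
        del self._owner[pos]

    def server_for(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring has no servers")
        # Steps 3 and 4: find the first server at or after the key's position...
        i = bisect.bisect_left(self._positions, ring_position(key))
        # ...wrapping around to the start of the ring past the last server.
        if i == len(self._positions):
            i = 0
        return self._owner[self._positions[i]]

ring = HashRing(["server-A", "server-B", "server-C"])
print(ring.server_for("user-42"))
```

Keeping the positions sorted means each lookup is a binary search over the ring points rather than a walk around the ring.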

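The Key Benefit: Minimal Re-mapping

A membership change only affects one stretch of the ring. When a server is added, it takes over just the keys between its position and the previous server counter-clockwise; every other key keeps its server. When a server is removed, only its keys move, to the next server clockwise. With N servers and K keys, on the order of K/N keys move, compared with nearly all K under modulo hashing.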
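
The minimal re-mapping property can be checked with a short experiment: adding a server should move keys only onto the new server, and removing a server should move only that server's keys. This is a sketch that assumes MD5 truncated to 32 bits as the ring hash and one point per server; the server and key names are made up.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    # Stable 32-bit ring position from the first 4 bytes of an MD5 digest.
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")

def assign(keys: list[str], servers: list[str]) -> dict[str, str]:
    # Map each key to the first server clockwise from it (with wrap-around).
    ring = sorted((ring_position(s), s) for s in servers)
    points = [pos for pos, _ in ring]
    return {
        k: ring[bisect.bisect_left(points, ring_position(k)) % len(ring)][1]
        for k in keys
    }

keys = [f"user-{i}" for i in range(10_000)]
servers = ["server-A", "server-B", "server-C", "server-D"]
before = assign(keys, servers)

# Adding a server: every key that moves, moves onto the new server.
after_add = assign(keys, servers + ["server-E"])
moved = [k for k in keys if before[k] != after_add[k]]
print(f"added server-E: {len(moved) / len(keys):.1%} of keys moved")
assert all(after_add[k] == "server-E" for k in moved)

# Removing a server: exactly the keys it owned move; all others stay put.
after_remove = assign(keys, ["server-A", "server-B", "server-C"])
moved_r = {k for k in keys if before[k] != after_remove[k]}
assert moved_r == {k for k in keys if before[k] == "server-D"}
```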

Virtual Nodes (Replicas)

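A potential issue is uneven distribution. If servers happen to hash close together, one might get a disproportionately large segment of the ring. To fix this, we use virtual nodes (or replicas): each physical server is hashed many times under slightly different identifiers (for example "serverA#0", "serverA#1", and so on), giving it many points on the ring. A key is still assigned to the first point clockwise, and that point maps back to its physical server. Because each server now owns many small, scattered segments, the share per server evens out, and when a server leaves, its keys spread across many remaining servers instead of all landing on a single neighbour.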
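
Virtual nodes can be sketched by giving each server many ring points and comparing the resulting load. Assumptions: virtual node j of server s is hashed as the string "s#j" (a naming scheme chosen here for illustration), MD5 truncated to 32 bits is the ring hash, and 200 virtual nodes per server is an arbitrary example value.

```python
import bisect
import hashlib
from collections import Counter

def ring_position(name: str) -> int:
    # Stable 32-bit ring position from the first 4 bytes of an MD5 digest.
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")

class VirtualNodeRing:
    def __init__(self, servers: list[str], vnodes: int = 100):
        # Hypothetical naming scheme: virtual node j of server s is "s#j".
        points = sorted(
            (ring_position(f"{s}#{j}"), s) for s in servers for j in range(vnodes)
        )
        self._positions = [pos for pos, _ in points]
        self._servers = [s for _, s in points]

    def server_for(self, key: str) -> str:
        # Same clockwise lookup as before; the point found maps back
        # to its physical server.
        i = bisect.bisect_left(self._positions, ring_position(key))
        return self._servers[i % len(self._positions)]

def load(servers: list[str], vnodes: int, n_keys: int = 10_000) -> Counter:
    # Count how many of n_keys made-up keys each server receives.
    ring = VirtualNodeRing(servers, vnodes)
    return Counter(ring.server_for(f"user-{i}") for i in range(n_keys))

servers = ["server-A", "server-B", "server-C", "server-D"]
for v in (1, 200):
    print(f"vnodes={v}:", dict(sorted(load(servers, v).items())))
```

With a single point per server the counts are typically lopsided; with 200 points each, every server's share should sit close to a quarter of the keys. More virtual nodes smooth the distribution at the cost of a larger ring to store and search.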
