Database Sharding in System Design: Range vs Hash vs Directory, Resharding & Hot Shards (Visualized)

Sharding is the practice of horizontally partitioning a dataset across multiple database nodes, where each node (a shard) holds a disjoint subset of the rows. Instead of one machine storing every row, the data is split by a shard key so that reads and writes for a given key land on exactly one shard.

Sharding is what you reach for once a single database can no longer hold the data or absorb the write load, and a read replica is no longer enough. By spreading rows across N nodes, you roughly multiply your storage and write throughput by N — at the cost of significantly more operational complexity. The art of sharding is almost entirely in two decisions: how you partition and which key you partition on.

Why Shard at All?

Vertical scaling (a bigger box) and read replicas (more read copies) both hit a wall: a single primary still has to absorb every write, and a single disk still has finite size. Sharding breaks that ceiling by giving each shard its own CPU, memory, disk, and write path. Ten shards can hold roughly ten times the data and accept roughly ten times the writes. The trade-off is that your application — or a routing layer in front of it — must now figure out which shard owns each request.

Choosing a Shard Key

The shard key is the column (or set of columns) whose value decides which shard a row lives on. A good shard key has three properties: high cardinality (many distinct values), uniform access (no single value dominates traffic), and alignment with your queries (most lookups already filter by it). Picking user_id for a social app usually satisfies all three; picking country fails on cardinality and uniformity because a few countries hold most users.
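One rough way to sanity-check a candidate key against the first two properties is to sample recent requests and measure how many distinct values appear and how much of the traffic the busiest single value takes. The sketch below is illustrative only: `key_stats` and the sample data are hypothetical, and what counts as "too skewed" depends on your workload.

```python
from collections import Counter

def key_stats(values):
    """Return (distinct value count, share of requests taken by the busiest value)."""
    counts = Counter(values)
    top_share = max(counts.values()) / len(values)
    return len(counts), top_share

# Hypothetical sample of 1,000 requests, tagged with two candidate shard keys.
user_ids = list(range(1000))                            # each request from a different user
countries = ["US"] * 600 + ["IN"] * 300 + ["DE"] * 100  # three values, one dominant

print(key_stats(user_ids))    # (1000, 0.001)
print(key_stats(countries))   # (3, 0.6)
```

In this made-up sample, user_id has a thousand distinct values and no dominant one, while country has three values and one of them carries 60% of the requests.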

A poor shard key is expensive to undo: once data is laid out, changing the key means rewriting every row to a new shard. Treat this decision like a database schema you will live with for years.

Partitioning Strategy 1: Hash Sharding

Hash sharding runs the shard key through a hash function and uses the result to pick a shard, for example shard = hash(user_id) % N. Because a good hash spreads inputs evenly, rows are distributed almost uniformly across shards regardless of the key's natural ordering. This is the default for write-heavy, point-lookup workloads where you almost always know the exact key.

Figure: Rows routed to shards by hash of the key. Each incoming row is hashed on its shard key, then routed to shard = hash(key) % N; counters show how evenly rows spread across the four shards.
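A minimal sketch of the same routing rule, assuming a SHA-256 digest as the hash function (the article does not prescribe one) and four shards. The helper names `stable_hash` and `shard_for` are made up for illustration.

```python
import hashlib
from collections import Counter

N = 4  # number of shards

def stable_hash(key) -> int:
    # Python's built-in hash() is salted per process for strings, so use a digest
    # that gives the same answer on every machine and after every restart.
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def shard_for(user_id, n=N) -> int:
    return stable_hash(user_id) % n

# Sequential IDs still spread out, because the hash ignores the key's natural order.
counts = Counter(shard_for(uid) for uid in range(100_000))
for shard in sorted(counts):
    print(shard, counts[shard])
```

Each of the four counters should land close to 25,000, which is the even spread the visualization illustrates.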

The weakness of plain modulo hashing is resharding: change N from 4 to 5 and hash(key) % N moves about 80% of keys to a different shard, because a key stays put only when its hash leaves the same remainder under both moduli. That is why production systems use consistent hashing (covered below) or a fixed large number of logical partitions that map onto physical shards.
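The cost of changing N can be checked with a few lines of arithmetic. A key keeps its shard only if hash % 4 equals hash % 5, which holds for 4 out of every 20 consecutive hash values, so about 80% of keys should move. The sketch below measures that on sample keys, again assuming a SHA-256 based hash chosen for illustration.

```python
import hashlib

def stable_hash(key) -> int:
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")

keys = range(100_000)
# A key "moves" if its shard under N=4 differs from its shard under N=5.
moved = sum(1 for k in keys if stable_hash(k) % 4 != stable_hash(k) % 5)
fraction_moved = moved / len(keys)
print(f"{fraction_moved:.1%} of keys change shard")
```

The printed fraction should come out near 80%, which is the bulk data movement that consistent hashing and fixed logical partitions are designed to avoid.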

Partitioning Strategy 2: Range Sharding

Range sharding assigns contiguous ranges of the shard key to each shard: users A–F on shard 1, G–M on shard 2, and so on. Because rows stay in key order, range scans are cheap: "all orders from last week" or "users created between two timestamps" can be served from a small number of adjacent shards. The cost is uneven load: if your key is monotonically increasing (like an auto-increment ID or a timestamp), every new write lands on the last shard, creating a hot shard.
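A range router can be sketched as a sorted list of split points plus a binary search. The boundaries and function names below are hypothetical, and shard indices are zero-based here, so index 0 corresponds to the A–F shard in the text.

```python
import bisect

# Hypothetical split points: shard 0 holds keys < "G", shard 1 holds "G" <= key < "N",
# shard 2 holds "N" <= key < "T", and shard 3 holds everything from "T" onward.
BOUNDARIES = ["G", "N", "T"]

def shard_for(key: str) -> int:
    # bisect_right counts how many boundaries are <= key, which is the shard index.
    return bisect.bisect_right(BOUNDARIES, key.upper())

def shards_for_range(lo: str, hi: str) -> list[int]:
    # A range scan touches only the contiguous run of shards covering [lo, hi].
    return list(range(shard_for(lo), shard_for(hi) + 1))

print(shard_for("alice"))            # 0
print(shard_for("greg"))             # 1
print(shards_for_range("h", "p"))    # [1, 2]
print(shard_for("zoe"))              # 3
```

The last call hints at the hot-shard problem: any key that sorts after the final boundary goes to the last shard, and with a monotonically increasing key that describes every new write.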

Partitioning Strategy 3: Directory (Lookup) Sharding

Directory sharding keeps an explicit lookup table that maps each key (or key range) to a shard. The router consults the directory before every request. This gives you maximum flexibility — you can move any tenant to any shard, isolate a noisy customer, or rebalance arbitrarily — but the directory itself becomes a critical, highly-read component you must make fast and highly available, and a potential single point of failure if you do not.
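As a rough illustration, the directory can be modeled as a dictionary from tenant to shard. `ShardDirectory` and the shard names are hypothetical, and an in-memory dict is only a stand-in: a real directory needs durable, replicated storage and usually a cache in the router.

```python
class ShardDirectory:
    """In-memory stand-in for the lookup table that maps tenants to shards."""

    def __init__(self, default_shard: str):
        self._map: dict[str, str] = {}
        self._default = default_shard

    def lookup(self, tenant_id: str) -> str:
        # Every request pays for this lookup, which is why it must be fast.
        return self._map.get(tenant_id, self._default)

    def assign(self, tenant_id: str, shard: str) -> None:
        # Updating the map only changes routing; the tenant's rows still
        # have to be copied to the new shard by a separate process.
        self._map[tenant_id] = shard

directory = ShardDirectory(default_shard="db0")
directory.assign("acme", "db1")
print(directory.lookup("acme"))      # db1
print(directory.lookup("initech"))   # db0

directory.assign("acme", "db7")      # isolate a noisy tenant on its own shard
print(directory.lookup("acme"))      # db7
```

The flexibility comes from `assign` accepting any tenant and any shard; the fragility comes from every `lookup` depending on the directory being reachable.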

Aspect	Range	Hash	Directory
Maps by	Contiguous key ranges	hash(key) % N (or ring)	Explicit lookup table
Range scans	Cheap (adjacent shards)	Expensive (scatter-gather)	Depends on layout
Even distribution	Risky with monotonic keys	Excellent	Manual / tunable
Rebalancing	Split / merge ranges	Move keys on N change	Just update the map
Used by	HBase, Bigtable, MongoDB (ranged)	Cassandra, DynamoDB, MongoDB (hashed)	Many app-level shard routers

Hot Shards: The Number-One Sharding Failure

A hot shard is a single shard that receives a disproportionate share of traffic while its siblings sit idle. It defeats the entire point of sharding: your cluster has plenty of total capacity, but one node is saturated and everything routed there slows down. Hot shards come from skewed keys (a celebrity user, a viral video, a giant tenant), monotonic keys (timestamps, sequential IDs), or low-cardinality keys. The fix is usually a better shard key, a composite key, or salting the key to spread one logical entity across several shards.

Figure: A hot shard overloading while others idle. A skewed shard key sends most traffic to Shard 2; its load bar climbs into the red and requests queue, while the other shards stay nearly idle, which is wasted capacity.
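Salting, mentioned above as one fix, can be sketched as follows. The shard count, the number of salt buckets, and the `entity#salt` key format are all assumptions chosen for illustration.

```python
import hashlib
import random

N_SHARDS = 8
SALT_BUCKETS = 4  # hypothetical: how many sub-keys one hot entity is split into

def stable_hash(key: str) -> int:
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big")

def shard_for_write(entity_id: str) -> int:
    # Each write picks a random salt, so one hot entity fans out over several sub-keys.
    salt = random.randrange(SALT_BUCKETS)
    return stable_hash(f"{entity_id}#{salt}") % N_SHARDS

def shards_for_read(entity_id: str) -> set[int]:
    # Reads must visit every salted sub-key and aggregate the results.
    return {stable_hash(f"{entity_id}#{s}") % N_SHARDS for s in range(SALT_BUCKETS)}

print(shards_for_read("celebrity_42"))
```

The trade-off is visible in the two functions: writes become cheap and spread out, while reads for the salted entity turn into a small scatter-gather. Two salted sub-keys can still hash to the same shard, so the spread is at most SALT_BUCKETS shards, not exactly that many.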

Consistent Hashing and Resharding

Consistent hashing places both shards and keys on a circular hash ring. Each key walks clockwise to the first shard it meets. When you add a shard, only the keys that fall between the new node and its predecessor have to move, roughly 1/N of the data for a cluster that now has N shards, instead of nearly everything as with hash(key) % N. Virtual nodes (placing each physical shard at many points on the ring) smooth out the distribution and make rebalancing fine-grained. Variants of this mechanism underpin Dynamo-style stores such as DynamoDB and Cassandra, as well as many distributed caches.

Figure: Consistent hashing, where adding a node moves only some keys. Keys sit on a ring and belong to the next node clockwise. When a fourth node is added, only the keys in its arc are reassigned (highlighted); everything else stays put.
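A compact ring with virtual nodes might look like the sketch below. It is a simplified illustration rather than how any particular database implements it: `HashRing`, the choice of 100 virtual nodes, and the SHA-256 based hash are all assumptions.

```python
import bisect
import hashlib

def stable_hash(key: str) -> int:
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> physical node
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Place the node at many points on the ring (virtual nodes).
        for i in range(self._vnodes):
            point = stable_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key: str) -> str:
        # Walk clockwise: first ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._points, stable_hash(key))
        return self._owner[self._points[idx % len(self._points)]]

keys = [f"user:{i}" for i in range(20_000)]
ring = HashRing(["db0", "db1", "db2"])
before = {k: ring.node_for(k) for k in keys}

ring.add("db3")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved")
print({after[k] for k in moved})   # {'db3'}: keys only ever move to the new node
```

Going from three to four nodes should move roughly a quarter of the keys, and every moved key lands on the new node; no key is shuffled between the existing ones.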

Even with consistent hashing, resharding is never free: you must copy data, keep the source and destination in sync during the move, cut over atomically, and make sure no write is lost or applied twice along the way. Production tooling like Vitess automates this with online resharding (it copies, tails the binlog to catch up, verifies, then switches traffic), and MongoDB and Citus provide their own balancer and shard-move primitives.

Cross-Shard Queries and Joins

The hardest part of sharding is any operation that touches more than one shard. A query that filters by the shard key hits exactly one node. A query that does not, such as "count all orders over $100 this month", must be sent to every shard and the partial results merged. This is called scatter-gather, and even when the shards are queried in parallel its latency is set by the slowest shard. Joins across shards are worse: either you co-locate related data on the same shard (e.g. store a user and all their orders together by sharding both on user_id), or you denormalize, or you accept an expensive fan-out join in the routing layer.

Distributed transactions across shards (two-phase commit) are possible but slow and failure-prone. The common pattern is to design your shard key so a single business transaction stays within one shard, and use sagas or eventual consistency for anything that spans shards.

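# Application-level shard router: same-shard lookup vs scatter-gather
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]

def query(shard: str, sql: str, *params):
    # Placeholder: run sql on the given shard with your database driver.
    # For the COUNT(*) query below it is assumed to return a single integer.
    raise NotImplementedError

def shard_for(user_id: int) -> str:
    # Stable hash -> one shard owns this user. Python's built-in hash() is
    # salted per process for strings, so it cannot be used for routing.
    digest = hashlib.sha256(f"user:{user_id}".encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

def get_user(user_id: int):
    # Single-shard: fast, routes to exactly one node
    return query(shard_for(user_id), "SELECT * FROM users WHERE id=%s", user_id)

def count_premium_users() -> int:
    # Cross-shard scatter-gather: ask every shard, then merge
    total = 0
    for s in SHARDS:                       # fan out (one shard at a time here)
        total += query(s, "SELECT COUNT(*) FROM users WHERE plan='premium'")
    return total                           # gather: sequential, so latency = sum of all shards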
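The loop in count_premium_users above visits shards one at a time, so its latency is the sum of the per-shard latencies. To get the "slowest shard" behavior described earlier, the fan-out has to run in parallel. The sketch below shows one way to do that with a thread pool; the per-shard latencies and counts are made-up stand-ins for real databases.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SHARDS = ["db0", "db1", "db2", "db3"]

# Hypothetical per-shard latency (seconds) and premium-user counts.
FAKE_LATENCY = {"db0": 0.05, "db1": 0.05, "db2": 0.20, "db3": 0.05}
FAKE_COUNTS = {"db0": 10, "db1": 12, "db2": 7, "db3": 11}

def count_on_shard(shard: str) -> int:
    time.sleep(FAKE_LATENCY[shard])   # simulate the network round trip and query time
    return FAKE_COUNTS[shard]

def count_premium_users_parallel() -> int:
    # Fan out to all shards at once, then sum the partial counts.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        return sum(pool.map(count_on_shard, SHARDS))

start = time.perf_counter()
total = count_premium_users_parallel()
elapsed = time.perf_counter() - start
print(total)               # 40
print(f"{elapsed:.2f}s")   # tracks the slowest shard (0.20s), not the 0.35s sum
```

Even with parallel fan-out, one slow shard (db2 here) sets the response time for the whole query, which is why scatter-gather queries are sensitive to hot or degraded shards.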

Rebalancing and Operational Reality

Shards drift out of balance as data grows unevenly, so you periodically rebalance — splitting a large shard, merging small ones, or moving partitions to new hardware. A practical trick used by Vitess, Citus, and many in-house systems is to over-provision logical partitions (say 1024 of them) up front and map many logical partitions onto each physical node. Rebalancing then just reassigns logical partitions between nodes without rehashing keys, and you can grow the cluster by handing some partitions to new machines. Backups, schema migrations, and monitoring all become per-shard concerns, which is the real ongoing cost of sharding.
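The logical-partition trick can be sketched in a few lines. This is a simplified model, not the mechanism of any specific system: the partition count, the round-robin initial layout, and the `add_node` policy of taking partitions from nodes above the target load are all assumptions.

```python
import hashlib

NUM_PARTITIONS = 1024  # fixed up front; only the partition -> node map changes later

def partition_for(key: str) -> int:
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def build_assignment(nodes):
    # Initial layout: deal partitions out round-robin.
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

def add_node(assignment, new_node):
    # Give the new node an even share by taking partitions from nodes above the target.
    nodes = set(assignment.values()) | {new_node}
    target = NUM_PARTITIONS // len(nodes)
    load = {n: 0 for n in nodes}
    for owner in assignment.values():
        load[owner] += 1
    moved = []
    for p in sorted(assignment):
        if load[new_node] >= target:
            break
        donor = assignment[p]
        if load[donor] > target:
            assignment[p] = new_node
            load[donor] -= 1
            load[new_node] += 1
            moved.append(p)
    return moved

assignment = build_assignment(["db0", "db1", "db2", "db3"])
moved = add_node(assignment, "db4")
print(len(moved))                             # 204
print(assignment[partition_for("user:42")])   # routing: key -> partition -> node
```

Growing from four to five nodes moves 204 of the 1024 partitions, about a fifth of the data, and `partition_for` never changes, so no key is rehashed.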

Real Systems and Their Trade-offs

Vitess shards MySQL transparently behind a query router (used by YouTube and Slack) and excels at online resharding. MongoDB supports both ranged and hashed shard keys with an automatic balancer. Citus turns PostgreSQL into a distributed database using a distribution column and co-location groups. DynamoDB and Cassandra hash a partition key onto a ring and hide shard management entirely, but push the burden of picking a good partition key — and avoiding hot partitions — onto you. The common thread: every system makes you choose a key, and that choice dominates your performance ceiling.

Frequently Asked Questions

What is the difference between sharding and partitioning?

Partitioning is the general idea of splitting a table into pieces; it can happen within a single database (think PostgreSQL table partitions on one server). Sharding is partitioning where the pieces live on separate database nodes, so it scales the hardware as well as organizing the data. All sharding is partitioning, but not all partitioning is sharding.

How do I avoid hot shards?

Pick a high-cardinality shard key with uniform access, avoid monotonically increasing keys like timestamps or auto-increment IDs as the sole key, and use hashing rather than raw ranges for write-heavy workloads. For unavoidable skew (a celebrity account), salt the key — append a small random suffix so one logical entity spreads across several shards — and aggregate on read.

When should I start sharding?

As late as you safely can. Sharding adds permanent complexity to every query, transaction, and migration, so exhaust cheaper options first: indexing, caching, a bigger instance, and read replicas. Shard when a single primary can no longer hold the data or absorb the write volume — and plan the shard key carefully before you do, because changing it later means rewriting your entire dataset.

Sharding multiplies your capacity by N and your operational complexity by more than N. The shard key you pick is the one decision you will not get to cheaply change.
— alokknight Engineering