Cache Stampede (Thundering Herd / Dogpile): Causes & Mitigations (Visualized)
A cache stampede happens when a hot cache key expires and thousands of concurrent requests all miss at once, hammering the database in a thundering herd that can cascade into an outage. This guide explains why it happens and how to stop it โ single-flight locks, probabilistic early expiration, stale-while-revalidate, and TTL jitter โ with live animations.
A cache stampede โ also called a thundering herd or dogpile โ occurs when a single popular cache key expires and many concurrent requests all miss the cache at the same instant, so they all fall through to the database (or origin service) simultaneously to recompute the same value. Instead of one request rebuilding the entry, hundreds or thousands do redundant, identical work at once.
Under normal load the cache absorbs almost all traffic and the database sees a trickle. But the moment a hot key's TTL elapses, the protective layer briefly vanishes for that key and the full firehose of traffic hits the origin at once. The result is a sudden load spike that can overwhelm the database, slow down every request, and โ in the worst case โ cascade into a full outage.
Why a Single Expiring Key Causes a Flood
Caches like Redis and Memcached store values with a time-to-live (TTL). A typical read path is: check the cache; on a hit, return the value; on a miss, query the database, write the result back to the cache, then return it. This works beautifully โ until a key that serves a large fraction of traffic expires.
Consider a key that receives 10,000 requests per second. Recomputing its value takes, say, 200 ms. The instant it expires, every request that arrives during that 200 ms window โ about 2,000 requests โ sees a miss. None of them find the value in the cache yet because nobody has finished writing it back. So all 2,000 issue the same expensive query at the same time. The database, sized for a handful of cache-miss queries per second, is suddenly asked to run thousands of identical ones.
Why It Can Cascade Into an Outage
The danger is not just the spike โ it is the feedback loop. As the database saturates, each query gets slower, so the recompute window stretches from 200 ms to seconds. A longer window means even more requests pile up before the first one finishes, which slows the database further. Connection pools exhaust, request queues back up, latency climbs across the whole service, and timeouts trigger retries that add still more load. A stampede on one key can drag down the entire system โ a classic metastable failure.
Mitigation 1: Request Coalescing (Single-Flight Locks)
The most direct fix is to ensure only one request recomputes a missing key while everyone else waits for the result. This is called request coalescing or a single-flight pattern. On a miss, a request tries to acquire a short-lived lock (for example, SET key:lock 1 NX EX 10 in Redis). The one that wins the lock queries the database and repopulates the cache; the losers either wait briefly and re-read the cache, or serve a stale value. Either way the database sees exactly one query instead of thousands.
import time, redis
r = redis.Redis()
def get_with_singleflight(key, recompute, ttl=60, lock_ttl=10):
val = r.get(key)
if val is not None:
return val # cache hit
lock_key = f"{key}:lock"
# Only ONE request wins the lock and recomputes.
if r.set(lock_key, "1", nx=True, ex=lock_ttl):
try:
val = recompute() # the single DB query
r.set(key, val, ex=ttl)
return val
finally:
r.delete(lock_key)
# Losers: wait briefly, then re-read the now-warm cache.
for _ in range(50):
time.sleep(0.02)
val = r.get(key)
if val is not None:
return val
return recompute() # fallback if the winner stalledMitigation 2: Probabilistic Early Expiration (XFetch)
Instead of waiting for a hard expiry, let each request randomly decide โ with rising probability as the TTL approaches โ to refresh the value early, in the background, while the still-valid cached value keeps being served. This is the XFetch algorithm (probabilistic early recomputation). Because only an occasional request triggers a refresh and it happens before the key actually expires, the value is renewed by a single straggler and the cache never goes empty. There is no synchronized cliff for the whole herd to fall off.
import math, random, time
# XFetch: refresh early with probability rising near expiry.
# delta = how long recompute() took; beta tunes eagerness (~1.0).
def should_refresh_early(expiry, delta, beta=1.0):
now = time.time()
# Negative gap * delta * beta * ln(rand) crosses zero near expiry.
return now - delta * beta * math.log(random.random()) >= expiry
def get_xfetch(key, recompute, ttl=60, beta=1.0):
entry = cache_get(key) # {value, delta, expiry} or None
if entry and not should_refresh_early(entry["expiry"], entry["delta"], beta):
return entry["value"] # serve cached, no refresh yet
start = time.time()
value = recompute()
delta = time.time() - start
cache_set(key, {"value": value, "delta": delta,
"expiry": time.time() + ttl}, ttl)
return valueMitigation 3: Stale-While-Revalidate
With stale-while-revalidate you keep two windows: a fresh window and a longer stale window. After the fresh window ends, requests are served the stale value immediately while a single background task revalidates it. Users never wait on the database, and the origin only ever sees one revalidation per key. This is the same idea HTTP caches use via the stale-while-revalidate Cache-Control directive, and it pairs naturally with a single-flight lock so only one revalidation runs at a time.
Mitigation 4: Staggered TTLs and Jitter
A subtle stampede happens when many keys expire at the same instant โ for example, you warmed 10,000 keys in a tight loop with an identical 600 second TTL, so they all expire together 600 seconds later. The fix is TTL jitter: add a small random offset to each key's TTL (say, ttl + random(0, 60)) so expirations are spread out over time and never line up. The same trick prevents a fleet of clients from all refreshing on the same wall-clock boundary.
Mitigation 5: Cache Warming
Cache warming proactively recomputes and writes hot keys before they expire โ on a schedule or driven by an event โ so the cache is never empty for a popular key. It is especially valuable for known-expensive, known-hot values (a homepage feed, a leaderboard, a pricing table). Combine it with jitter so the warming job itself does not create a synchronized expiry wave.
Comparing the Mitigations
| Strategy | How it helps | Trade-off / cost |
|---|---|---|
| Single-flight lock | Only one request recomputes; others wait or serve stale | Adds lock coordination; waiters add a little latency |
| Probabilistic early expiry (XFetch) | Refresh early at random before the cliff; no empty window | Slightly more total recomputes; needs delta tracking |
| Stale-while-revalidate | Serve stale instantly, revalidate in background | Briefly serves slightly stale data |
| Staggered TTLs / jitter | Spreads expirations so they never align | Trivial to add; does not help a single super-hot key |
| Cache warming | Keys refreshed before they ever expire | Extra background work; must know the hot keys |
In practice these compose well. A robust setup often uses jitter on every key to avoid synchronized expiry, a single-flight lock to coalesce the inevitable concurrent misses, and stale-while-revalidate so users never block on a recompute. XFetch is a strong drop-in for hot keys where you want refreshes to happen smoothly and early.
Frequently Asked Questions
What is the difference between a cache stampede and a thundering herd?
They describe the same failure from different angles. Cache stampede (or dogpile) emphasizes the cause: a popular cache entry expires and many requests pile onto the origin to rebuild it. Thundering herd is the more general term for any situation where many waiters are released at once and all rush a shared resource. A cache stampede is a thundering herd triggered by cache expiry.
Does adding more cache servers prevent a stampede?
No. The bottleneck is the database recomputing the value, not the cache serving it. When a hot key expires, every request misses regardless of how many cache nodes you have, and they all fall through to the same origin. You need coordination โ single-flight, early refresh, stale serving, or warming โ not more cache capacity.
Which mitigation should I reach for first?
Start with TTL jitter โ it is one line and removes synchronized expiry across many keys for almost no cost. Then add a single-flight lock so concurrent misses on any one key collapse to a single origin query. Layer in stale-while-revalidate or XFetch when you have hot keys that must never block users on a recompute, and warm the handful of keys you know are both hot and expensive.
A cache stampede is the cache protecting you right up until the moment it stops. Coalesce the misses, refresh before the cliff, and jitter your TTLs โ so the herd never gathers in the first place.
โ alokknight Engineering
