Cache Stampede (Thundering Herd / Dogpile): Causes & Mitigations (Visualized)

A cache stampede — also called a thundering herd or dogpile — occurs when a single popular cache key expires and many concurrent requests all miss the cache at the same instant, so they all fall through to the database (or origin service) simultaneously to recompute the same value. Instead of one request rebuilding the entry, hundreds or thousands do redundant, identical work at once.

Under normal load the cache absorbs almost all traffic and the database sees a trickle. But the moment a hot key's TTL elapses, the protective layer briefly vanishes for that key and the full firehose of traffic hits the origin at once. The result is a sudden load spike that can overwhelm the database, slow down every request, and — in the worst case — cascade into a full outage.

Why a Single Expiring Key Causes a Flood

Caches like Redis and Memcached store values with a time-to-live (TTL). A typical read path is: check the cache; on a hit, return the value; on a miss, query the database, write the result back to the cache, then return it. This works beautifully — until a key that serves a large fraction of traffic expires.

Consider a key that receives 10,000 requests per second. Recomputing its value takes, say, 200 ms. The instant it expires, every request that arrives during that 200 ms window — about 2,000 requests — sees a miss. None of them find the value in the cache yet because nobody has finished writing it back. So all 2,000 issue the same expensive query at the same time. The database, sized for a handful of cache-miss queries per second, is suddenly asked to run thousands of identical ones.

The stampede: a hot key expires and the herd hits the database

Phase 1 the cache absorbs traffic and DB load is flat. Phase 2 the hot key's TTL expires. Phase 3 every request misses and floods the database at once — the stampede. Watch the DB load meter (req/s) spike red.

Why It Can Cascade Into an Outage

The danger is not just the spike — it is the feedback loop. As the database saturates, each query gets slower, so the recompute window stretches from 200 ms to seconds. A longer window means even more requests pile up before the first one finishes, which slows the database further. Connection pools exhaust, request queues back up, latency climbs across the whole service, and timeouts trigger retries that add still more load. A stampede on one key can drag down the entire system — a classic metastable failure.

Mitigation 1: Request Coalescing (Single-Flight Locks)

The most direct fix is to ensure only one request recomputes a missing key while everyone else waits for the result. This is called request coalescing or a single-flight pattern. On a miss, a request tries to acquire a short-lived lock (for example, SET key:lock 1 NX EX 10 in Redis). The one that wins the lock queries the database and repopulates the cache; the losers either wait briefly and re-read the cache, or serve a stale value. Either way the database sees exactly one query instead of thousands.

import time, redis
r = redis.Redis()

def get_with_singleflight(key, recompute, ttl=60, lock_ttl=10):
    val = r.get(key)
    if val is not None:
        return val  # cache hit

    lock_key = f"{key}:lock"
    # Only ONE request wins the lock and recomputes.
    if r.set(lock_key, "1", nx=True, ex=lock_ttl):
        try:
            val = recompute()           # the single DB query
            r.set(key, val, ex=ttl)
            return val
        finally:
            r.delete(lock_key)

    # Losers: wait briefly, then re-read the now-warm cache.
    for _ in range(50):
        time.sleep(0.02)
        val = r.get(key)
        if val is not None:
            return val
    return recompute()  # fallback if the winner stalled

Same scenario, with a single-flight lock

The hot key expires again — but now the first request grabs the lock and is the ONLY one to query the DB (green). Every other request waits and is served the freshly refreshed value. DB load stays flat instead of spiking.

Mitigation 2: Probabilistic Early Expiration (XFetch)

Instead of waiting for a hard expiry, let each request randomly decide — with rising probability as the TTL approaches — to refresh the value early, in the background, while the still-valid cached value keeps being served. This is the XFetch algorithm (probabilistic early recomputation). Because only an occasional request triggers a refresh and it happens before the key actually expires, the value is renewed by a single straggler and the cache never goes empty. There is no synchronized cliff for the whole herd to fall off.

import math, random, time

# XFetch: refresh early with probability rising near expiry.
# delta = how long recompute() took; beta tunes eagerness (~1.0).
def should_refresh_early(expiry, delta, beta=1.0):
    now = time.time()
    # Negative gap * delta * beta * ln(rand) crosses zero near expiry.
    return now - delta * beta * math.log(random.random()) >= expiry

def get_xfetch(key, recompute, ttl=60, beta=1.0):
    entry = cache_get(key)  # {value, delta, expiry} or None
    if entry and not should_refresh_early(entry["expiry"], entry["delta"], beta):
        return entry["value"]  # serve cached, no refresh yet
    start = time.time()
    value = recompute()
    delta = time.time() - start
    cache_set(key, {"value": value, "delta": delta,
                     "expiry": time.time() + ttl}, ttl)
    return value

Mitigation 3: Stale-While-Revalidate

With stale-while-revalidate you keep two windows: a fresh window and a longer stale window. After the fresh window ends, requests are served the stale value immediately while a single background task revalidates it. Users never wait on the database, and the origin only ever sees one revalidation per key. This is the same idea HTTP caches use via the stale-while-revalidate Cache-Control directive, and it pairs naturally with a single-flight lock so only one revalidation runs at a time.

Mitigation 4: Staggered TTLs and Jitter

A subtle stampede happens when many keys expire at the same instant — for example, you warmed 10,000 keys in a tight loop with an identical 600 second TTL, so they all expire together 600 seconds later. The fix is TTL jitter: add a small random offset to each key's TTL (say, ttl + random(0, 60)) so expirations are spread out over time and never line up. The same trick prevents a fleet of clients from all refreshing on the same wall-clock boundary.

TTL jitter spreads expirations so they never align

Top row: identical TTLs — every key expires on the same tick and the whole herd misses at once (red flash). Bottom row: jittered TTLs — expirations are spread out, so at most one or two keys refresh at a time and the DB stays calm.

Mitigation 5: Cache Warming

Cache warming proactively recomputes and writes hot keys before they expire — on a schedule or driven by an event — so the cache is never empty for a popular key. It is especially valuable for known-expensive, known-hot values (a homepage feed, a leaderboard, a pricing table). Combine it with jitter so the warming job itself does not create a synchronized expiry wave.

Comparing the Mitigations

Strategy	How it helps	Trade-off / cost
Single-flight lock	Only one request recomputes; others wait or serve stale	Adds lock coordination; waiters add a little latency
Probabilistic early expiry (XFetch)	Refresh early at random before the cliff; no empty window	Slightly more total recomputes; needs delta tracking
Stale-while-revalidate	Serve stale instantly, revalidate in background	Briefly serves slightly stale data
Staggered TTLs / jitter	Spreads expirations so they never align	Trivial to add; does not help a single super-hot key
Cache warming	Keys refreshed before they ever expire	Extra background work; must know the hot keys

In practice these compose well. A robust setup often uses jitter on every key to avoid synchronized expiry, a single-flight lock to coalesce the inevitable concurrent misses, and stale-while-revalidate so users never block on a recompute. XFetch is a strong drop-in for hot keys where you want refreshes to happen smoothly and early.

Frequently Asked Questions

What is the difference between a cache stampede and a thundering herd?

They describe the same failure from different angles. Cache stampede (or dogpile) emphasizes the cause: a popular cache entry expires and many requests pile onto the origin to rebuild it. Thundering herd is the more general term for any situation where many waiters are released at once and all rush a shared resource. A cache stampede is a thundering herd triggered by cache expiry.

Does adding more cache servers prevent a stampede?

No. The bottleneck is the database recomputing the value, not the cache serving it. When a hot key expires, every request misses regardless of how many cache nodes you have, and they all fall through to the same origin. You need coordination — single-flight, early refresh, stale serving, or warming — not more cache capacity.

Which mitigation should I reach for first?

Start with TTL jitter — it is one line and removes synchronized expiry across many keys for almost no cost. Then add a single-flight lock so concurrent misses on any one key collapse to a single origin query. Layer in stale-while-revalidate or XFetch when you have hot keys that must never block users on a recompute, and warm the handful of keys you know are both hot and expensive.

A cache stampede is the cache protecting you right up until the moment it stops. Coalesce the misses, refresh before the cliff, and jitter your TTLs — so the herd never gathers in the first place.
— alokknight Engineering