Caching in System Design: Cache-Aside, Write-Through, Eviction Policies & More (Visualized)

Caching is the technique of storing a copy of data in a fast-access storage layer so that future requests for that data can be served without repeating the original, slower computation or network round-trip. The cached copy is called a cache entry; the fast store is the cache; the authoritative source it fronts is the origin or backing store.

Almost every layer of a modern system caches aggressively: the CPU caches RAM, the OS caches disk blocks, the browser caches assets, the CDN caches HTTP responses, the application server caches query results in Redis or Memcached, and the database engine caches disk pages in its buffer pool. Each layer shaves off latency for the layers above it. Understanding when and how to cache — and, crucially, when not to — is one of the most impactful skills in system design.

Cache Hit, Cache Miss, and Hit Ratio

When a request arrives, the system checks the cache first. A cache hit means the data is present and is returned immediately — typically in under a millisecond from an in-memory store like Redis. A cache miss means the data is absent; the system must fetch it from the origin (a database, external API, or computation), pay the full latency penalty, and usually store the result back in the cache for next time.

The hit ratio (also called hit rate) is the fraction of requests served from cache: hits / (hits + misses). A hit ratio of 0.90 means 90 % of requests skip the origin entirely. Even small improvements matter: going from 90 % to 95 % cuts the miss rate from 10 % to 5 %, which halves the load on your database. For a read-heavy workload where every read is cacheable, a 90 % hit ratio lets the same database sit behind roughly ten times the read traffic it could serve unaided, and a 95 % hit ratio roughly twenty times.
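The arithmetic behind these figures can be sketched in a few lines of Python. The function names are illustrative, and the model assumes every request is a cacheable read, which real workloads only approximate.

```python
def origin_load(requests_per_second: float, hit_ratio: float) -> float:
    """Requests per second that miss the cache and reach the origin."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return requests_per_second * (1.0 - hit_ratio)


def traffic_multiplier(hit_ratio: float) -> float:
    """How much total traffic a fixed origin capacity can sit behind."""
    if not 0.0 <= hit_ratio < 1.0:
        raise ValueError("hit_ratio must be in [0, 1)")
    return 1.0 / (1.0 - hit_ratio)


# 10,000 requests/s at 90 % and 95 % hit ratio: the origin load halves.
for ratio in (0.90, 0.95):
    print(ratio, round(origin_load(10_000, ratio)), round(traffic_multiplier(ratio)))
```

At 10,000 requests per second the origin sees about 1,000 requests per second at a 90 % hit ratio and about 500 at 95 %.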

Cache-Aside (Lazy Loading) vs Read-Through

There are two fundamental patterns for populating a cache on reads. In cache-aside (also called lazy loading), the application code owns the logic: check the cache, and if it misses, fetch from the origin, write the result into the cache, then return it. The cache is populated only on demand. In read-through, the cache sits transparently in front of the database; the application always reads from the cache, and the cache itself fetches from the origin on a miss, hiding the complexity from the caller.

Cache-aside is the most widely used pattern (Redis + application code) because it gives you full control: you decide what to cache, how long to cache it, and how to invalidate it. The downside is that cold starts or node failures leave the cache empty, causing a burst of misses — a cold cache problem. Read-through simplifies the application layer but requires a cache library or proxy that supports it (e.g., a database proxy or ORM-level cache).

Cache-Aside Read Flow: Hit vs Miss

Watch a cache HIT return instantly from cache, while a cache MISS falls through to the database and back-fills the cache for next time.

Write Strategies: Write-Through, Write-Back, and Write-Around

How you handle writes is just as important as how you handle reads. The three canonical strategies differ in the order and synchrony of cache and database updates:

Write-through: Every write goes to the cache and to the database synchronously before returning to the caller. The cache is always consistent with the database, so reads after writes are always fresh. The cost is extra write latency (two writes per operation) and cache churn — you cache data that may never be read again.

Write-back (write-behind): Writes go to the cache only and are asynchronously flushed to the database in the background, usually in batches. This makes writes extremely fast and allows write coalescing, but creates a window of data loss if the cache node fails before the flush. Use it for high-write workloads where some data loss is tolerable or where a WAL (write-ahead log) provides durability.

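Write-around: Writes bypass the cache and go directly to the database. The cache is populated only on subsequent reads (lazy). This prevents cache pollution from write-heavy data that is rarely re-read, but the first read of newly written data is always a miss. If the key is already cached, the old entry must be invalidated on write or left to expire via TTL; otherwise reads keep returning the stale value.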
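The three strategies can be compared in a small in-memory sketch. The `Store` class and its method names are hypothetical; a real write-back cache would flush from a background worker rather than an explicit `flush()` call. The write-around method here also deletes any existing cached copy, a choice made in this sketch so that a stale entry does not linger.

```python
from typing import Dict


class Store:
    """Hypothetical cache + database pair, each modelled as a dict."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.db: Dict[str, str] = {}
        self.pending: Dict[str, str] = {}  # dirty keys awaiting flush (write-back)

    def write_through(self, key: str, value: str) -> None:
        # Both writes complete before the caller gets control back.
        self.db[key] = value
        self.cache[key] = value

    def write_back(self, key: str, value: str) -> None:
        # Only the cache is updated now; the database catches up in flush().
        self.cache[key] = value
        self.pending[key] = value  # repeated writes to one key coalesce

    def flush(self) -> int:
        """Persist pending writes; returns how many keys were written."""
        flushed = len(self.pending)
        self.db.update(self.pending)
        self.pending.clear()
        return flushed

    def write_around(self, key: str, value: str) -> None:
        # Write to the database only, and drop any cached copy so the
        # next read misses and reloads the fresh value.
        self.db[key] = value
        self.cache.pop(key, None)
```

Anything still in `pending` when the process dies is lost, which is the data-loss window described for write-back. Two writes to the same key before a flush result in a single database write, which is the coalescing benefit.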

Write-Through vs Write-Back Side by Side

Left panel: write-through writes to cache AND database together. Right panel: write-back writes cache only, then flushes to DB asynchronously in the background.

Strategy	Write path	Consistency	Write latency	Risk
Write-through	Cache + DB (sync)	Always consistent	Higher (2 writes)	Cache churn, unused writes
Write-back	Cache only; DB async	Lag until flush	Very low	Data loss on crash
Write-around	DB only; cache on next read	Consistent if any cached copy is invalidated on write	Normal	First read after a write misses

Eviction Policies: LRU, LFU, and FIFO

A cache has finite memory, so when it fills up it must evict an existing entry to make room for a new one. The eviction policy determines which entry is removed. Choosing the right policy is a major lever on hit ratio.

LRU (Least Recently Used) evicts the entry that has not been accessed for the longest time. The intuition is that if you haven't touched something recently, you probably won't touch it again soon. Redis offers LRU through the allkeys-lru and volatile-lru settings of maxmemory-policy; its default policy is noeviction, so LRU has to be enabled explicitly, and Redis approximates LRU by sampling keys rather than tracking exact access order. LRU works well for most workloads. An exact LRU is typically implemented with a doubly-linked list and a hash map so both access and eviction are O(1).
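A minimal exact LRU can be sketched with Python's `collections.OrderedDict`, which keeps keys in order and supports constant-time reordering, so it plays the role of the hash map plus linked list. The `LRUCache` class is illustrative and not thread-safe.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)     # an update also counts as a use
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now the most recently used
cache.put("c", 3)    # cache is full, so "b" is evicted
```

After these calls the cache holds "a" and "c"; "b" was the least recently used entry when "c" arrived.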

LFU (Least Frequently Used) evicts the entry with the lowest access count. It is more scan-resistant than LRU: a one-off large scan won't flush your hot data, because the scanned keys have low counts. The trade-off is more bookkeeping (access counters that decay over time). Redis supports LFU with the allkeys-lfu and volatile-lfu policies. FIFO evicts the oldest-inserted entry regardless of access pattern; it is simple but rarely optimal.
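The scan-resistance of LFU is easy to see in a toy version. This sketch is deliberately simple: eviction scans all counters (O(n)) and counts never decay, both of which a production LFU would handle differently. The `LFUCache` name is illustrative.

```python
from typing import Any, Dict, Hashable, Optional


class LFUCache:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._values: Dict[Hashable, Any] = {}
        self._counts: Dict[Hashable, int] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._values:
            return None
        self._counts[key] += 1               # every hit raises the frequency
        return self._values[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key not in self._values and len(self._values) >= self.capacity:
            # Evict the key with the lowest access count (ties: oldest inserted).
            victim = min(self._counts, key=self._counts.get)
            del self._values[victim]
            del self._counts[victim]
        self._values[key] = value
        self._counts[key] = self._counts.get(key, 0) + 1


cache = LFUCache(capacity=2)
cache.put("hot", 1)
for _ in range(5):
    cache.get("hot")                         # "hot" builds up a high count
for scanned in ("s1", "s2", "s3"):
    cache.put(scanned, 0)                    # one-off scan of cold keys
```

The scanned keys evict each other while "hot" stays cached. An LRU of the same size would have dropped "hot" after the second scanned key.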

LRU Cache Eviction in Action

Each access moves an entry to the front of the LRU queue. When the cache is full and a new entry arrives, the least-recently-used entry (at the back) is evicted.

Policy	Evicts	Best for	Weakness
LRU	Least recently accessed	General-purpose; most workloads	Scan sensitivity — a full scan evicts hot data
LFU	Least frequently accessed	Skewed access patterns; hot-cold data	More bookkeeping; slow-start for new entries
FIFO	Oldest inserted entry	Simple; predictable memory churn	Ignores access frequency and recency
TTL / Expiry	First entry whose TTL expires	Time-sensitive data (sessions, tokens)	Expiry storms if all keys expire at once

TTL and Cache Expiry

TTL (Time To Live) sets a maximum age on a cache entry. After the TTL elapses the entry is considered stale and is either deleted immediately or served with revalidation. TTL is the primary mechanism for controlling staleness: a short TTL (seconds to minutes) keeps data fresh at the cost of more origin hits; a long TTL (hours to days) maximizes the hit ratio but risks serving outdated data.

In Redis, every key can have an independent TTL set with EXPIRE key seconds or at write time via SET key value EX seconds. For HTTP caching, the Cache-Control: max-age=N header plays the same role — it tells the browser and any intermediate CDN how many seconds the response may be cached. When setting TTLs, add a small random jitter (e.g., TTL = base + rand(0, 30)) to avoid a thundering herd of simultaneous expirations hitting the origin at once.
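Expiry and jitter can be illustrated without a Redis server. The sketch below is an in-memory stand-in: `TTLCache` and `jittered_ttl` are hypothetical names, expiry is checked lazily on read, and the clock is injectable so the behaviour can be tested without sleeping.

```python
import random
import time
from typing import Any, Callable, Dict, Optional, Tuple


def jittered_ttl(base_seconds: float, max_jitter_seconds: float = 30.0,
                 rng: Optional[random.Random] = None) -> float:
    """Return the base TTL plus a random extra in [0, max_jitter_seconds]."""
    uniform = rng.uniform if rng is not None else random.uniform
    return base_seconds + uniform(0.0, max_jitter_seconds)


class TTLCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]   # lazy expiry: remove the entry on access
            return None
        return value


cache = TTLCache()
# Keys written together get slightly different lifetimes, so they do not
# all expire (and hit the origin) at the same instant.
for product_id in ("p1", "p2", "p3"):
    cache.set(product_id, {"price": 10}, ttl_seconds=jittered_ttl(300))
```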

Where Caches Live: The Caching Stack

Modern systems cache at every layer of the stack. Understanding each layer helps you pick the right tool:

Browser cache: The browser stores responses locally for static assets (JS, CSS, images) and even API responses if the server sets appropriate Cache-Control headers. A cache hit here costs zero network latency and zero server load — the best possible outcome.

CDN (Content Delivery Network): A geographically distributed network of edge nodes (Cloudflare, Fastly, AWS CloudFront) caches responses close to the user. A CDN hit avoids the round-trip to your origin server entirely, slashing latency from hundreds of milliseconds to single digits for users far from your data center. CDNs are essential for serving static assets and public API responses at scale.

Application-level cache (Redis / Memcached): The most flexible layer. Your application code stores computed results, database query outputs, session tokens, or rendered HTML fragments keyed by an arbitrary string. Redis adds persistence, pub/sub, and data structures; Memcached is simpler and, being multithreaded, can have slightly lower overhead for pure key-value caching.

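Database buffer pool: Relational databases such as PostgreSQL and MySQL (with the InnoDB storage engine) maintain an in-memory buffer pool that caches frequently accessed disk pages. This is automatic and transparent, and sizing it is one of the highest-ROI database tuning steps. The right size differs by engine: PostgreSQL also relies on the OS page cache, and its documentation suggests starting shared_buffers at about 25 % of RAM, whereas innodb_buffer_pool_size on a dedicated MySQL server is commonly set to a large majority of RAM.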

Cache Pitfalls: Staleness, Invalidation, and Stampede

Phil Karlton's famous quip — "There are only two hard things in Computer Science: cache invalidation and naming things" — remains true. The core challenge is keeping the cache consistent with the origin after data changes.

Staleness: A cache entry can outlive the data it represents. If a product price changes in the database but the cached price has a 1-hour TTL, users see the wrong price for up to an hour. The fix is shorter TTLs, active invalidation on write, or an event-driven approach where the application explicitly deletes or updates cache keys whenever the underlying data changes.

Cache stampede (thundering herd): When a popular cache entry expires, many concurrent requests simultaneously find a miss and all race to the origin to rebuild the entry. The origin is suddenly hit by N requests instead of one, potentially overloading it. Mitigations include: (1) probabilistic early recomputation: each request may, with a probability that rises as expiry approaches, rebuild the entry slightly before it expires, so refreshes are spread out instead of colliding at the expiry instant; (2) mutex / lock: the first request to find a miss acquires a lock and rebuilds; others wait for the lock to release and then read from cache; (3) stale-while-revalidate: serve the stale value while a background task refreshes it.
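Mitigation (2) can be sketched inside a single process with one lock per key. This is an illustration under simplifying assumptions: `SingleFlightCache` and `slow_loader` are hypothetical names, entries never expire, and a multi-server deployment would need a distributed lock instead of `threading.Lock`.

```python
import threading
import time
from typing import Callable, Dict, List


class SingleFlightCache:
    """On a miss, one thread rebuilds the entry; the others wait and reuse it."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self._values: Dict[str, str] = {}
        self._key_locks: Dict[str, object] = {}
        self._guard = threading.Lock()   # protects the dict of per-key locks

    def _lock_for(self, key: str):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        value = self._values.get(key)
        if value is not None:
            return value                    # fast path: hit, no locking
        with self._lock_for(key):
            value = self._values.get(key)   # re-check: someone may have filled it
            if value is None:
                value = self._loader(key)   # only the lock holder hits the origin
                self._values[key] = value
            return value


# Demo: eight threads miss on the same key at the same time.
origin_calls: List[str] = []


def slow_loader(key: str) -> str:
    origin_calls.append(key)   # record each trip to the origin
    time.sleep(0.05)           # simulate a slow query
    return key.upper()


cache = SingleFlightCache(slow_loader)
results: List[str] = []


def worker() -> None:
    results.append(cache.get("home"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(origin_calls), "origin call(s) for", len(results), "requests")
```

The re-check after acquiring the lock is what keeps the waiting threads from repeating the load: all eight requests get a value, but the origin is called only once.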

Cache penetration: Requests for keys that exist in neither the cache nor the database (e.g., invalid IDs) miss every time and hit the origin on every request. Cache a negative result (a sentinel value such as null, with a short TTL so a later-created record becomes visible) or use a Bloom filter to quickly reject requests for non-existent keys before they reach the cache.

Frequently Asked Questions

When should I use Redis vs Memcached?

Choose Redis for almost every new project: it supports richer data structures (hashes, sorted sets, lists, streams), optional persistence to disk, Lua scripting, pub/sub messaging, and Cluster mode for horizontal sharding. Choose Memcached only if you need the absolute simplest possible key-value store and are already running it in production — it has marginally lower overhead for pure set/get workloads but lacks almost every feature Redis offers. For a greenfield system, Redis is the default.

How do I decide what TTL to set?

Start with the business domain's tolerance for staleness: user profile data might be fine stale for 5 minutes (TTL=300); product prices might need 30 seconds; a live sports score might need 5 seconds. Then check your origin's capacity: shorter TTLs mean more misses reaching the origin, so if it has plenty of headroom you can afford short TTLs broadly, and if it is already CPU-bound, keep TTLs long and tighten them only where freshness is truly required. Always add ±10–20 % random jitter to avoid synchronised expiry storms. Static assets can have TTLs of days to months paired with cache-busting URL versioning (main.abc123.js) so invalidation is instant on deploy.

What is a cache stampede and how do I prevent it?

A cache stampede happens when a popular entry expires and many threads simultaneously find a miss, each independently querying the origin. The most practical prevention is a mutex lock: the first thread to detect a miss acquires a distributed lock (e.g., Redis SET lock_key token NX EX seconds, where lock_key and token are placeholders; NX makes the command succeed only if the key does not already exist, and EX makes the lock expire on its own if the holder crashes), rebuilds the cache entry, and releases the lock. All other threads either wait briefly and then read the now-populated cache, or are served a slightly stale value while the rebuild is in progress. The stale-while-revalidate Cache-Control directive implements the second option at the CDN layer: the CDN serves the old cached value to all users while a single background request refreshes it from the origin.

A well-tuned cache is like compound interest — small improvements in hit ratio compound into massive reductions in latency and infrastructure cost. Get the invalidation strategy right from day one, and the rest follows naturally.
— alokknight Engineering