CDN Caching in System Design: Edge Nodes, Cache-Control Headers, Invalidation & Origin Shield (Visualized)

CDN caching is the practice of storing copies of web assets (HTML, images, JavaScript, API responses) at geographically distributed edge nodes so that subsequent requests are served from the nearest node rather than travelling all the way back to the origin server. The result is dramatically lower latency, reduced origin load, and resilience during traffic spikes.

Without a CDN, every request made by a user in Tokyo to an origin server in Virginia crosses roughly 14,000 km of fibre, adding 140–200 ms of round-trip latency before the first byte of the response arrives. A CDN solves this by placing edge nodes (also called Points of Presence, or PoPs) in Tokyo, Singapore, Frankfurt, and dozens of other cities, so the same request can be served from a local cache, often in under 10 ms.
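The lower end of that latency range can be checked with a back-of-the-envelope calculation. The sketch below assumes a propagation speed in fibre of about 200,000 km/s (roughly two-thirds of the speed of light in a vacuum) and ignores routing detours, queuing, and TCP or TLS handshakes; the function name is illustrative.

```python
# Propagation-only round-trip estimate over a fibre path.
# Assumption: light travels at about 200,000 km/s in fibre (roughly 2/3 of c).
FIBRE_SPEED_KM_PER_S = 200_000


def round_trip_ms(path_km: float) -> float:
    """Return the round-trip propagation delay in milliseconds."""
    return 2 * path_km * 1000 / FIBRE_SPEED_KM_PER_S


# 14,000 km is the Tokyo-to-Virginia path length quoted in the text.
print(round_trip_ms(14_000))  # 140.0
```

Real round trips sit above this floor because of router hops and queuing, which is consistent with the 140–200 ms range.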

How CDN Edge Caching Works: HIT vs MISS

When a user requests a resource, the request first reaches the nearest CDN edge node. The edge checks its local cache: if the resource is present and fresh (cache HIT), the edge returns it immediately. If not (cache MISS), the edge forwards the request to the origin server, stores the response in its local cache with a TTL, and then delivers it to the user. Subsequent requests for that resource from any user hitting the same edge node are served from cache until the TTL expires, so, as long as the entry is not evicted, the origin is consulted roughly once per TTL window per edge.
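The HIT and MISS paths can be sketched as a toy single-node cache. This is a minimal illustration, not how any particular CDN is implemented: EdgeCache, fake_origin, and the injectable clock are hypothetical names, and eviction, revalidation, and concurrency are left out.

```python
import time
from typing import Callable, Dict, List, Tuple


class EdgeCache:
    """Toy edge cache with one fixed TTL for every object (illustrative only)."""

    def __init__(self, fetch_origin: Callable[[str], bytes], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_origin = fetch_origin
        self._ttl_s = ttl_s
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expires_at)

    def get(self, url: str) -> Tuple[str, bytes]:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return "HIT", entry[0]               # present and still fresh
        body = self._fetch_origin(url)           # MISS: go back to the origin
        self._store[url] = (body, now + self._ttl_s)
        return "MISS", body


origin_calls: List[str] = []


def fake_origin(url: str) -> bytes:
    origin_calls.append(url)
    return b"body of " + url.encode()


fake_now = [0.0]                                 # controllable clock for the demo
edge = EdgeCache(fake_origin, ttl_s=60, clock=lambda: fake_now[0])

print(edge.get("/logo.png")[0])  # MISS
print(edge.get("/logo.png")[0])  # HIT
fake_now[0] = 61.0               # move past the 60 s TTL
print(edge.get("/logo.png")[0])  # MISS
```

The origin is called twice in this run: once to populate the cache and once after the TTL window has passed.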

Figure (animation): cache HIT vs cache MISS at the edge. A MISS travels to the distant origin and then populates the edge; the next request is a HIT served directly from the edge.

The cache hit ratio — the fraction of requests served from edge cache without hitting the origin — is the single most important CDN metric. A hit ratio of 95 % means your origin only processes 5 % of total traffic. Hit ratio is determined by TTL length, traffic volume (popular content warms fast), cache key specificity, and whether resources vary by query string or cookie.
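As a quick worked example of the arithmetic, the helper names below are illustrative and the request counts are made up for the demonstration:

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of requests served from the edge cache."""
    total = hits + misses
    return hits / total if total else 0.0


def origin_requests(total_requests: int, ratio: float) -> int:
    """Requests that still reach the origin at a given hit ratio."""
    return round(total_requests * (1 - ratio))


print(hit_ratio(9_500, 500))             # 0.95
print(origin_requests(1_000_000, 0.95))  # 50000
print(origin_requests(1_000_000, 0.99))  # 10000
```

Moving from 95 % to 99 % looks like a small change, yet it cuts origin traffic by a factor of five.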

Cache-Control, Expires, and TTL

The origin server tells both browsers and CDN edges how long to keep a response cached via HTTP response headers. The two primary mechanisms are Cache-Control (HTTP/1.1, flexible) and the older Expires (HTTP/1.0, absolute date). When both are present, a max-age or s-maxage directive in Cache-Control takes precedence over Expires. The TTL (time to live) is the number of seconds a cached copy is considered fresh; after expiry the edge must revalidate or re-fetch the asset before serving it, unless a directive such as stale-while-revalidate permits serving it stale.

Key Cache-Control directives include: max-age=N (fresh for N seconds), s-maxage=N (overrides max-age for shared caches like CDN edges), no-cache (may be stored, but must be revalidated with the origin before each reuse), no-store (never cache; used for bank pages and personalised dashboards), public (any cache may store it), private (only the user's own cache, such as the browser, may store it; shared caches like CDN edges must not), and immutable (the response will not change while it is fresh, so clients can skip conditional revalidation during that window, for example on a page reload). ETags (entity tags) are identifiers for a specific version of the response, often derived from a hash of the body: on revalidation the edge sends If-None-Match: <etag> and the origin replies 304 Not Modified (no body transfer) if the content is unchanged, which saves bandwidth.
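To make the precedence between these directives concrete, here is a minimal sketch of how a shared cache might derive its freshness lifetime from a Cache-Control value. It is a simplification: the function names are hypothetical, quoted field-name lists (such as private="Set-Cookie") are not handled, and a response without explicit freshness is treated as immediately stale, whereas real caches may apply heuristic freshness.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def shared_cache_ttl(header: str) -> Optional[int]:
    """Seconds a shared cache may treat the response as fresh.

    Returns None when the response must not be stored by a shared cache.
    """
    d = parse_cache_control(header)
    if "no-store" in d or "private" in d:
        return None
    if "no-cache" in d:
        return 0                      # storable, but revalidate before every reuse
    for name in ("s-maxage", "max-age"):  # s-maxage wins for shared caches
        value = d.get(name)
        if value is not None:
            return int(value)
    return 0                          # simplification: no heuristic freshness


print(shared_cache_ttl("public, s-maxage=60, max-age=0, must-revalidate"))  # 60
print(shared_cache_ttl("public, max-age=31536000, immutable"))              # 31536000
print(shared_cache_ttl("private, no-store"))                                # None
```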

# Typical CDN-friendly response headers sent by the origin (for example, an Nginx server)

# Versioned JS/CSS bundle (hash in filename) — cache forever
Cache-Control: public, max-age=31536000, immutable

# HTML page — short TTL at edge, revalidate with ETag
Cache-Control: public, s-maxage=60, max-age=0, must-revalidate
ETag: "a3f2b9c1"

# API JSON — serve stale for 1 s while revalidating in background
Cache-Control: public, s-maxage=30, stale-while-revalidate=1

# Private user dashboard: never stored, neither at the CDN edge nor in the browser
Cache-Control: private, no-store
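The ETag revalidation exchange (If-None-Match answered with 304 Not Modified) can be sketched from the origin's side. This is an illustrative handler, not a real framework API: respond and make_etag are hypothetical names, the ETag is a truncated SHA-256 of the body, and weak validators (the W/ prefix) are not handled.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body (8 hex characters for readability)."""
    return '"' + hashlib.sha256(body).hexdigest()[:8] + '"'


def respond(request_headers: Dict[str, str], body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the validator matches."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Cache-Control": "public, s-maxage=60, max-age=0, must-revalidate",
    }
    candidates = [t.strip() for t in request_headers.get("If-None-Match", "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, headers, b""      # unchanged: headers only, no body transfer
    return 200, headers, body


status, headers, _ = respond({}, b"<html>v1</html>")
print(status)                                                          # 200
print(respond({"If-None-Match": headers["ETag"]}, b"<html>v1</html>")[0])  # 304
print(respond({"If-None-Match": headers["ETag"]}, b"<html>v2</html>")[0])  # 200
```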

Figure (animation): Cache-Control TTL lifecycle at the edge. The cached copy ages from FRESH to STALE, is revalidated with the origin, and becomes FRESH again.

Cache Keys and Vary

A cache key is the string the CDN uses to look up a stored response. By default the key is the full URL including query string (https://example.com/image.png?v=42). CDNs let you customise the key — stripping UTM parameters that do not affect the response (boosting hit ratio), or adding request headers like Accept-Encoding or Accept-Language to serve the right variant.
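A minimal sketch of cache key normalisation follows. The utm_ prefix list and the decision to sort query parameters are assumptions made for illustration; a real CDN exposes this through its own configuration, and sorting is only safe when the origin does not care about parameter order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of tracking-parameter prefixes that never change the response.
TRACKING_PREFIXES = ("utm_",)


def cache_key(url: str) -> str:
    """Normalise a URL into a cache key: drop tracking params, sort the rest."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
    ]
    kept.sort()
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


print(cache_key("https://example.com/image.png?v=42&utm_source=newsletter"))
# https://example.com/image.png?v=42
print(cache_key("https://example.com/list?b=2&a=1") ==
      cache_key("https://example.com/list?a=1&b=2&utm_campaign=spring"))  # True
```

Both variants of the second URL collapse to one key, so they share one cached copy instead of two.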

The origin can also tell caches to store separate variants of a response, keyed by request header, with the Vary response header: Vary: Accept-Encoding causes the edge to keep a separate copy for each encoding variant (for example, one compressed and one uncompressed). Overusing Vary (e.g., Vary: Cookie) fragments the cache so severely that the hit ratio collapses, because nearly every user sends a different cookie value; avoid it for anything meant to be cacheable at the CDN layer.
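The fragmentation effect can be illustrated by building the secondary key a cache would derive from Vary. This is a simplified sketch: variant_key is a hypothetical helper, header values are compared as raw strings, and real caches often normalise headers such as Accept-Encoding first.

```python
from typing import Dict, Tuple


def variant_key(url: str, vary: str, request_headers: Dict[str, str]) -> Tuple:
    """Combine the URL with the request header values named in Vary."""
    lowered = {name.lower(): value.strip() for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return url, tuple((name, lowered.get(name, "")) for name in names)


# 100 users, same encoding, each with a different session cookie.
requests = [
    {"Accept-Encoding": "gzip", "Cookie": f"session={i}"} for i in range(100)
]

by_encoding = {variant_key("/page", "Accept-Encoding", r) for r in requests}
by_cookie = {variant_key("/page", "Cookie", r) for r in requests}

print(len(by_encoding))  # 1
print(len(by_cookie))    # 100
```

With Vary: Accept-Encoding all 100 users share one stored variant; with Vary: Cookie every user gets a private entry and the shared cache stops being shared.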

Cache Invalidation and Purging

When you deploy new content you sometimes cannot wait for the TTL to expire — you need the edge to serve fresh content right now. Cache purging (also called invalidation) is the act of telling CDN edges to delete their cached copy of a URL, prefix, or tag so the next request goes back to the origin. Most CDN APIs accept a list of URLs or a wildcard path; enterprise CDNs support surrogate keys (cache tags) that let you invalidate by arbitrary dimension — for example, all pages referencing a specific product ID.
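Purging by surrogate key can be sketched with an in-memory index from tag to URLs. TaggedCache and the tag names are hypothetical; an actual CDN exposes this through its own purge API, whose endpoints and parameters differ per provider.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Toy cache that remembers which URLs carry which surrogate keys."""

    def __init__(self) -> None:
        self._bodies: Dict[str, bytes] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: bytes, tags: Iterable[str]) -> None:
        self._bodies[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._bodies.get(url)

    def purge_tag(self, tag: str) -> int:
        """Drop every cached URL carrying the tag; return how many were removed."""
        urls = self._urls_by_tag.pop(tag, set())
        return sum(1 for url in urls if self._bodies.pop(url, None) is not None)


cache = TaggedCache()
cache.put("/products/42", b"<html>42</html>", tags=["product-42"])
cache.put("/category/shoes", b"<html>shoes</html>", tags=["product-42", "product-7"])
cache.put("/products/7", b"<html>7</html>", tags=["product-7"])

print(cache.purge_tag("product-42"))          # 2
print(cache.get("/products/7") is not None)   # True
```

One call removes both the product page and the category page that references it, without the caller having to know either URL.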

Figure (animation): cache purge invalidation flow. A deploy triggers a purge API call; edges clear their cache; the next user request causes a MISS that re-fetches fresh content from origin.

A common best practice is to avoid relying solely on TTL-based expiry for mutable content. Instead, use cache-busting: embed a content hash in asset filenames (main.a3f2b9c1.js) and set an immutable, year-long TTL. Because the URL changes whenever the content changes, new URLs are cached fresh while old ones simply stop being requested and age out; purge APIs are then needed only for HTML and API responses.
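A build step that produces such filenames might look like the sketch below. The helper name and the choice of a truncated SHA-256 digest are assumptions; bundlers typically do this for you with their own hash function and length.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


v1 = hashed_name("static/main.js", b"console.log('v1');")
v2 = hashed_name("static/main.js", b"console.log('v2');")

print(v1)        # static/main.<8 hex chars>.js
print(v1 != v2)  # True: changed content yields a new URL
```

Identical content always maps to the same name, so an unchanged bundle keeps its URL across deploys and stays cached.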

Origin Shield

On a cache MISS, every edge node that lacks the asset independently sends a request to the origin, which produces a thundering herd at deploy time or when a popular resource expires. Origin shield (also called a mid-tier cache, or shield PoP) adds a second caching layer between the edges and the origin: edge MISSes go to the shield first, and, provided the shield coalesces concurrent requests, only about one request per object per TTL window per shield node reaches the true origin. On large CDN networks this can reduce origin request volume by 80–99 %, and it limits the origin load spike that follows a cache purge.
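The effect of the extra tier can be shown with a small simulation in which every edge starts cold and requests the same object. CacheTier and count_origin_fetches are hypothetical names; TTLs are omitted and requests arrive one after another, so concurrent misses at the shield (which need request coalescing) are out of scope here.

```python
from typing import Callable, Dict


class CacheTier:
    """A cache layer that falls through to `upstream` on a miss (no TTL, for brevity)."""

    def __init__(self, upstream: Callable[[str], bytes]) -> None:
        self._upstream = upstream
        self._store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url not in self._store:
            self._store[url] = self._upstream(url)
        return self._store[url]


def count_origin_fetches(num_edges: int, use_shield: bool) -> int:
    """Let every cold edge request one object and count origin hits."""
    fetches = 0

    def origin(url: str) -> bytes:
        nonlocal fetches
        fetches += 1
        return b"payload"

    upstream = CacheTier(origin).get if use_shield else origin
    edges = [CacheTier(upstream) for _ in range(num_edges)]
    for edge in edges:
        edge.get("/popular.json")
    return fetches


print(count_origin_fetches(50, use_shield=False))  # 50
print(count_origin_fetches(50, use_shield=True))   # 1
```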

Caching Static vs Dynamic Content

Static content (images, fonts, JS/CSS bundles, videos) is the easiest to cache: it is identical for all users, changes only on deploy, and can safely have TTLs measured in days or years. Dynamic content (HTML pages, API responses, personalised feeds) is harder — it may vary by user session, geography, A/B test bucket, or query parameter. Strategies here include: short TTLs (5–60 s) to bound staleness; edge-side includes (ESI) to assemble cached fragments; request coalescing so concurrent MISSes share a single origin fetch; and stale-while-revalidate to serve stale content instantly while a background refresh happens.
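Request coalescing, one of the strategies listed above, can be sketched with asyncio: the first caller starts the origin fetch and concurrent callers for the same key wait on that same task. Coalescer and slow_origin are hypothetical names. The sketch does not cache results, and it does not shield the shared task from cancellation by a single waiter.

```python
import asyncio
from typing import Any, Callable, Coroutine, Dict, List


class Coalescer:
    """Collapse concurrent fetches for the same key into one upstream call."""

    def __init__(self, fetch: Callable[[str], Coroutine[Any, Any, bytes]]) -> None:
        self._fetch = fetch
        self._inflight: Dict[str, "asyncio.Task[bytes]"] = {}

    async def get(self, key: str) -> bytes:
        task = self._inflight.get(key)
        if task is None:
            # The first caller becomes the leader and starts the only fetch.
            task = asyncio.create_task(self._fetch(key))
            self._inflight[key] = task
            task.add_done_callback(lambda _done: self._inflight.pop(key, None))
        # Leader and followers all wait on the same task.
        return await task


origin_calls: List[str] = []


async def slow_origin(url: str) -> bytes:
    origin_calls.append(url)
    await asyncio.sleep(0.05)          # simulated origin latency
    return b"fresh " + url.encode()


async def main() -> List[bytes]:
    coalescer = Coalescer(slow_origin)
    return await asyncio.gather(*(coalescer.get("/feed") for _ in range(20)))


results = asyncio.run(main())
print(len(results), len(origin_calls))  # 20 1
```

Twenty concurrent MISSes produce a single origin fetch, and every caller receives the same body.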

Content type	Typical TTL	Cache-Control example	Invalidation strategy
Versioned JS/CSS (hashed)	1 year	public, max-age=31536000, immutable	URL changes on deploy — no purge needed
Images / fonts	30 days	public, max-age=2592000	Purge or version URL on update
HTML pages	60 s	public, s-maxage=60, max-age=0, must-revalidate	Purge API on deploy
REST API (public, slow-changing)	30 s	public, s-maxage=30, stale-while-revalidate=5	Surrogate-key purge on data change
User-personalised API	0	private, no-store	Never cached at CDN edge
WebSocket / long-lived streams	n/a	no-store	Not cacheable; pass through or bypass the CDN

stale-while-revalidate

stale-while-revalidate (SWR) is a Cache-Control extension defined in RFC 5861 that lets an edge serve a stale response immediately — achieving zero added latency — while simultaneously issuing a background revalidation request to the origin. The next request after revalidation completes gets the fresh copy. Example: Cache-Control: public, s-maxage=30, stale-while-revalidate=10 means the edge serves the response as fresh for 30 s; for the next 10 s after expiry it serves stale while refreshing in the background; after 40 s it must synchronously revalidate before responding. SWR dramatically improves p99 latency for semi-dynamic content like news feeds or product listings.
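The three windows in that example can be expressed as a small decision function. The function name and the state labels are illustrative; the age is the number of seconds since the edge stored the response.

```python
def swr_state(age_s: float, s_maxage: int, stale_while_revalidate: int) -> str:
    """Classify a cached response by age for a shared cache."""
    if age_s < s_maxage:
        return "fresh"                      # serve from cache, no origin contact
    if age_s < s_maxage + stale_while_revalidate:
        return "serve-stale-and-refresh"    # serve now, revalidate in background
    return "revalidate-first"               # must contact origin before responding


# Cache-Control: public, s-maxage=30, stale-while-revalidate=10
for age in (10, 35, 45):
    print(age, swr_state(age, s_maxage=30, stale_while_revalidate=10))
# 10 fresh
# 35 serve-stale-and-refresh
# 45 revalidate-first
```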

CDN Caching vs Browser Caching

Both CDN edges and browsers cache HTTP responses, but they serve different roles. The browser cache is private (one user, one machine) and removes origin traffic only for repeat visits from the same user. The CDN edge cache is shared: one cached copy serves all users hitting that edge node. The s-maxage directive targets only shared caches (such as CDN edges), allowing you to set a short browser TTL (so users pick up changes quickly without needing a hard reload) while keeping a longer edge TTL (so the CDN absorbs traffic) that you can cut short with a purge. Use private to prevent CDN caching while still allowing browser caching.

Dimension	CDN Edge Cache	Browser Cache
Scope	Shared — serves all users at that PoP	Private — serves one user on one device
TTL directive	s-maxage (overrides max-age)	max-age
Capacity	Gigabytes to terabytes per PoP	Hundreds of MB per browser profile
Invalidation	Purge API / surrogate keys	Hard reload, Cache-Control: no-cache
When bypassed	Cache-Control: private / no-store	Cache-Control: no-store
Primary benefit	Offloads origin, reduces latency globally	Eliminates network round-trip for repeat visits

Frequently Asked Questions

What is the difference between a CDN cache hit and a cache miss?

A cache hit occurs when a CDN edge node has a fresh, valid copy of the requested resource and serves it directly to the user; no origin request is made. Because the edge is close to the user, the response typically arrives within a few milliseconds to a few tens of milliseconds, regardless of where the origin server is. A cache miss occurs when the edge does not have the resource (or its cached copy has expired and must be revalidated): the edge must forward the request to the origin (or origin shield), wait for the response, cache it, and then deliver it. The miss adds the full round-trip latency to the origin, which is why a high hit ratio is essential to CDN performance.

How do I cache API responses on a CDN without serving stale personalised data?

Separate public from private API endpoints. Public endpoints — product listings, blog posts, currency rates — can safely use Cache-Control: public, s-maxage=30, stale-while-revalidate=5 with short TTLs. Personalised endpoints must return Cache-Control: private, no-store so CDN edges never cache them; they bypass the CDN entirely or are handled at the edge with token-based authentication. A common architecture uses a public API path (/api/v1/products) for cacheable data and an authenticated path (/api/v1/me/cart) with no-store, giving you CDN benefits where safe and privacy where required.

How do CDN providers charge for caching, and does a high hit ratio save money?

Most CDN providers charge primarily for egress bandwidth (data transferred from edge to end users) plus, in some cases, a per-request fee and origin transfer costs. A higher cache hit ratio reduces origin egress (which is typically billed at a higher rate than CDN egress) and can reduce total origin infrastructure costs. However, CDN egress itself is not free, so the economics depend on comparing CDN bandwidth pricing against the combined cost of origin compute, bandwidth, and latency penalty. For most high-traffic sites the math strongly favours a CDN: cloud egress is commonly priced at around $0.05–$0.15 / GB while CDN egress is around $0.008–$0.04 / GB, though actual rates vary by provider, region, and volume, and the CDN brings the added benefit of lower latency and higher availability.
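A rough worked example of that comparison follows. All inputs are assumptions chosen from within the ranges quoted above (100 TB per month, $0.09 / GB origin egress, $0.02 / GB CDN egress, and a 95 % hit ratio measured in bytes); per-request fees and compute costs are ignored.

```python
from typing import Tuple


def monthly_egress_cost(traffic_gb: float, byte_hit_ratio: float,
                        origin_rate: float, cdn_rate: float) -> Tuple[float, float]:
    """Return (cost without CDN, cost with CDN) in dollars per month."""
    without_cdn = traffic_gb * origin_rate
    cdn_egress = traffic_gb * cdn_rate                          # edge to users
    origin_fill = traffic_gb * (1 - byte_hit_ratio) * origin_rate  # misses only
    return without_cdn, cdn_egress + origin_fill


without_cdn, with_cdn = monthly_egress_cost(
    traffic_gb=100_000, byte_hit_ratio=0.95, origin_rate=0.09, cdn_rate=0.02
)
print(f"{without_cdn:,.0f} vs {with_cdn:,.0f}")  # 9,000 vs 2,450
```

Under these assumptions the CDN cuts the egress bill by roughly three quarters; with a low hit ratio or cheap origin bandwidth the gap narrows.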

Your CDN hit ratio is your system's heartbeat — push it above 95 % and your origin becomes a cache-warmer rather than a traffic handler.
— alokknight Engineering