CDN Caching in System Design: Edge Nodes, Cache-Control Headers, Invalidation & Origin Shield (Visualized)
A CDN caches your content at edge nodes worldwide so that most users get fast responses, often under 50 ms, regardless of where your origin server lives. This guide covers how edge caching works, Cache-Control and ETag headers, cache-key design, hit ratio, purging, origin shield, and stale-while-revalidate, with live animations.
CDN caching is the practice of storing copies of web assets (HTML, images, JavaScript, API responses) at geographically distributed edge nodes so that subsequent requests are served from the nearest node rather than travelling all the way back to the origin server. The result is dramatically lower latency, reduced origin load, and resilience during traffic spikes.
Without a CDN, every request made by a user in Tokyo to an origin server in Virginia crosses roughly 14,000 km of fibre, adding 140-200 ms of round-trip latency before the first byte of the response arrives. A CDN solves this by placing edge nodes (also called Points of Presence, or PoPs) in Tokyo, Singapore, Frankfurt, and dozens of other cities, so the same request is served in under 10 ms from a local cache.
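The latency figure above can be sanity-checked with a back-of-the-envelope calculation. This is a rough sketch that assumes light travels through fibre at about 200,000 km/s (roughly two-thirds of its speed in a vacuum) and counts propagation delay only; the function name is hypothetical, and real paths add routing, queuing, and TLS handshake time on top.

```python
# Assumption: signal speed in optical fibre is about 200,000 km/s.
FIBRE_SPEED_KM_PER_S = 200_000


def round_trip_ms(path_km: float) -> float:
    """Propagation-only round-trip time over a fibre path, in milliseconds."""
    return 2 * path_km * 1000 / FIBRE_SPEED_KM_PER_S


print(round_trip_ms(14_000))  # 140.0 (Tokyo to Virginia, per the figure above)
print(round_trip_ms(100))     # 1.0   (user to a nearby edge node)
```

The 140 ms result matches the lower bound quoted above, which suggests the remaining 60 ms or so is overhead beyond pure propagation.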
How CDN Edge Caching Works: HIT vs MISS
When a user requests a resource, the request first reaches the nearest CDN edge node. The edge checks its local cache: if the resource is present and fresh (cache HIT), the edge returns it immediately. If not (cache MISS), the edge forwards the request to the origin server, stores the response in its local cache with a TTL, and then delivers it to the user. Until the TTL expires, every subsequent request from any user hitting that same edge node is served from cache: the origin is consulted only once per TTL window per edge.
The cache hit ratio, the fraction of requests served from edge cache without hitting the origin, is the single most important CDN metric. A hit ratio of 95 % means your origin only processes 5 % of total traffic. Hit ratio is determined by TTL length, traffic volume (popular content warms fast), cache key specificity, and whether resources vary by query string or cookie.
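The HIT/MISS flow described above can be sketched as a toy, single-node cache. The class name, the injected `fetch_origin` callable, and the explicit `now` argument are all assumptions made to keep the example deterministic; a real edge also handles eviction, revalidation, and concurrency.

```python
from typing import Callable, Dict, List, Tuple


class EdgeCache:
    """Toy single-PoP cache that stores (body, expires_at) per URL."""

    def __init__(self, fetch_origin: Callable[[str], str], ttl_s: float) -> None:
        self._fetch_origin = fetch_origin
        self._ttl_s = ttl_s
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, url: str, now: float) -> Tuple[str, str]:
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return "HIT", entry[0]                    # present and still fresh
        body = self._fetch_origin(url)                # MISS: go back to the origin
        self._store[url] = (body, now + self._ttl_s)  # cache it with a TTL
        return "MISS", body


origin_calls: List[str] = []


def fake_origin(url: str) -> str:
    origin_calls.append(url)
    return f"body of {url}"


edge = EdgeCache(fake_origin, ttl_s=60)
statuses = [edge.get("/logo.png", now=t)[0] for t in (0, 10, 59, 60, 61)]
print(statuses)           # ['MISS', 'HIT', 'HIT', 'MISS', 'HIT']
print(len(origin_calls))  # 2
```

Five requests reach the edge, but the origin is contacted only twice: once per 60-second TTL window.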
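To make the effect of hit ratio on origin load concrete, here is a small illustrative calculation. The function names and the 10,000 requests-per-second figure are hypothetical.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of requests answered from the edge cache."""
    total = hits + misses
    return hits / total if total else 0.0


def origin_rps(total_rps: float, ratio: float) -> float:
    """Requests per second that still reach the origin."""
    return total_rps * (1.0 - ratio)


print(hit_ratio(hits=9_500, misses=500))  # 0.95
print(round(origin_rps(10_000, 0.95)))    # 500
print(round(origin_rps(10_000, 0.99)))    # 100
```

Moving from 95 % to 99 % looks like a small gain, yet it cuts origin traffic by a factor of five.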
Cache-Control, Expires, and TTL
The origin server instructs both browsers and CDN edges how long to keep a response cached via HTTP response headers. The two primary mechanisms are Cache-Control (HTTP/1.1, flexible) and the older Expires (HTTP/1.0, absolute date). When both are present, Cache-Control wins. The TTL (Time-to-Live) dictates how many seconds a cached copy is considered fresh; after expiry the edge must revalidate or re-fetch the asset.
Key Cache-Control directives include: max-age=N (fresh for N seconds), s-maxage=N (overrides max-age for shared caches like CDN edges), no-cache (may be stored, but must be revalidated before each reuse), no-store (never cache; use for bank pages and personalised dashboards), public (any cache may store it), private (browser-only, no CDN), and immutable (the asset will not change during its freshness lifetime, so clients can skip revalidation while it is fresh, even on reload). ETags (entity tags) are opaque validators, typically fingerprints of the response body: on revalidation the edge sends If-None-Match: <etag> and the origin replies 304 Not Modified (no body transfer) if the content is unchanged, saving the bandwidth of resending the body.
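The ETag revalidation exchange can be sketched from the origin's side as follows. This is a simplified illustration: `make_etag` and `origin_respond` are hypothetical helpers, the ETag scheme (a truncated SHA-256 of the body) is just one possible choice, and a real server would also handle weak validators and comma-separated lists in If-None-Match.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Assumed scheme: a quoted, truncated SHA-256 hex digest of the body.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_respond(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a conditional or plain GET."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Cache-Control": "public, s-maxage=60, max-age=0, must-revalidate",
    }
    if if_none_match == etag:
        return 304, headers, b""  # unchanged: no body is transferred
    return 200, headers, body


page_v1 = b"<html>version 1</html>"
status, headers, payload = origin_respond(page_v1, None)
print(status, len(payload))  # 200 22

# The edge revalidates with the ETag it stored earlier.
status, _, payload = origin_respond(page_v1, headers["ETag"])
print(status, len(payload))  # 304 0

# After a deploy the body changes, so the stored ETag no longer matches.
status, _, payload = origin_respond(b"<html>version 2</html>", headers["ETag"])
print(status, len(payload))  # 200 22
```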
# Nginx origin: typical CDN-friendly response headers
# Versioned JS/CSS bundle (hash in filename): cache for a year
Cache-Control: public, max-age=31536000, immutable
# HTML page: short TTL at edge, revalidate with ETag
Cache-Control: public, s-maxage=60, max-age=0, must-revalidate
ETag: "a3f2b9c1"
# API JSON: serve stale for up to 1 s while revalidating in background
Cache-Control: public, s-maxage=30, stale-while-revalidate=1
# Private user dashboard: never stored at the CDN edge (no-store also keeps it out of the browser cache)
Cache-Control: private, no-store
Cache Keys and Vary
A cache key is the string the CDN uses to look up a stored response. By default the key is the full URL including query string (https://example.com/image.png?v=42). CDNs let you customise the key, for example by stripping UTM parameters that do not affect the response (boosting hit ratio), or by adding request headers like Accept-Encoding or Accept-Language to serve the right variant.
The origin can also instruct caches to vary the stored copy by header with the Vary response header: Vary: Accept-Encoding causes the edge to store a separate copy per encoding (for example gzip, Brotli, and uncompressed). Overusing Vary (e.g., Vary: Cookie) fragments the cache so severely that the hit ratio collapses, so avoid it for anything cacheable at the CDN layer.
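A cache-key normaliser along these lines might look like the sketch below. The `TRACKING_PARAMS` set and the choice to sort the remaining parameters are assumptions for illustration; sorting is only safe when parameter order does not change the response, and each CDN exposes its own configuration for this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that never change the response body.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def cache_key(url: str) -> str:
    """Drop tracking parameters and sort the rest so equivalent URLs share a key."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


print(cache_key("https://example.com/image.png?v=42&utm_source=newsletter"))
# https://example.com/image.png?v=42
print(cache_key("https://Example.com/list?b=2&a=1&utm_campaign=spring"))
# https://example.com/list?a=1&b=2
```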
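The fragmentation caused by Vary can be shown by building the secondary key a cache would derive from it. This is a simplified sketch with a hypothetical `variant_key` helper; real CDNs usually normalise header values such as Accept-Encoding before using them in the key.

```python
from typing import Dict, Tuple

VariantKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def variant_key(url: str, vary: str, request_headers: Dict[str, str]) -> VariantKey:
    """Combine the URL with the request's values for every header named in Vary."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return url, tuple((name, lowered.get(name, "")) for name in names)


by_encoding = {
    variant_key("/app.js", "Accept-Encoding", {"Accept-Encoding": enc})
    for enc in ("gzip", "br", "gzip")
}
by_cookie = {
    variant_key("/app.js", "Cookie", {"Cookie": f"session={i}"})
    for i in range(1000)
}
print(len(by_encoding), len(by_cookie))  # 2 1000
```

Varying on encoding yields a handful of copies, while varying on a session cookie yields one copy per user, which is effectively no shared cache at all.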
Cache Invalidation and Purging
When you deploy new content you sometimes cannot wait for the TTL to expire: you need the edge to serve fresh content right now. Cache purging (also called invalidation) is the act of telling CDN edges to delete their cached copy of a URL, prefix, or tag so the next request goes back to the origin. Most CDN APIs accept a list of URLs or a wildcard path; enterprise CDNs support surrogate keys (cache tags) that let you invalidate by arbitrary dimension, for example all pages referencing a specific product ID.
A common best practice is to avoid relying solely on TTL-based expiry for mutable content. Instead, use cache-busting: embed a content hash in asset filenames (main.a3f2b9c1.js) and set an immutable, year-long TTL. Because the URL changes whenever the content changes, old URLs simply stop being requested and new ones are cached fresh, so purge APIs are only needed for HTML and API responses.
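Surrogate-key purging can be pictured as a reverse index from tag to cached URLs. The sketch below is an in-memory illustration only: `TaggedCache` and its methods are hypothetical and do not correspond to any particular CDN's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TaggedCache:
    """Toy cache that remembers which URLs were stored under which tags."""

    def __init__(self) -> None:
        self._bodies: Dict[str, str] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: str, tags: Iterable[str]) -> None:
        self._bodies[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def has(self, url: str) -> bool:
        return url in self._bodies

    def purge_tag(self, tag: str) -> int:
        """Evict every cached URL carrying this tag; return how many were evicted."""
        urls = self._urls_by_tag.pop(tag, set())
        return sum(1 for url in urls if self._bodies.pop(url, None) is not None)


cache = TaggedCache()
cache.put("/products/42", "product page", ["product-42"])
cache.put("/category/shoes", "listing page", ["product-42", "product-7"])
cache.put("/products/7", "other product", ["product-7"])

print(cache.purge_tag("product-42"))  # 2
print(cache.has("/category/shoes"))   # False
print(cache.has("/products/7"))       # True
```

One purge call removes both the product page and the listing that references it, without the caller having to enumerate URLs.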
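Build tools normally generate hashed filenames for you; the sketch below only illustrates the idea. `hashed_name` is a hypothetical helper, and it assumes a bare filename (no directory part) and an 8-character SHA-256 prefix.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension: main.js -> main.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return f"{path.stem}.{digest}{path.suffix}"


old = hashed_name("main.js", b"console.log('v1');")
new = hashed_name("main.js", b"console.log('v2');")
print(old != new)                                             # True
print(old == hashed_name("main.js", b"console.log('v1');"))  # True
```

Changed content produces a new URL, while an unchanged file keeps its name and stays cached across deploys.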
Origin Shield
During a cache MISS, every edge node that does not have the asset independently fires a request to the origin, which produces a thundering herd at deploy time or when a popular resource expires. Origin shield (also called a mid-tier cache, or shield PoP) adds a second caching layer between the edges and the origin: edge MISSes hit the shield first, and ideally only one request per object per shield node reaches the true origin in each TTL window. This can reduce origin request volume by 80-99 % on large CDN networks and dramatically reduces the blast radius of a cache purge.
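The effect of a shield tier can be illustrated with a small simulation. All names here are hypothetical, TTLs are omitted for brevity, and the model assumes a single shield node in front of the origin.

```python
from typing import Callable, Dict


class Origin:
    def __init__(self) -> None:
        self.requests = 0

    def get(self, url: str) -> str:
        self.requests += 1
        return f"body of {url}"


class CacheTier:
    """Toy cache layer that forwards misses to an upstream callable."""

    def __init__(self, upstream: Callable[[str], str]) -> None:
        self._upstream = upstream
        self._store: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if url not in self._store:
            self._store[url] = self._upstream(url)
        return self._store[url]


def origin_requests(num_edges: int, use_shield: bool) -> int:
    """Count origin fetches when every cold edge receives its first request."""
    origin = Origin()
    upstream = CacheTier(origin.get).get if use_shield else origin.get
    edges = [CacheTier(upstream) for _ in range(num_edges)]
    for edge in edges:
        edge.get("/app.js")
    return origin.requests


print(origin_requests(200, use_shield=False))  # 200
print(origin_requests(200, use_shield=True))   # 1
```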
Caching Static vs Dynamic Content
Static content (images, fonts, JS/CSS bundles, videos) is the easiest to cache: it is identical for all users, changes only on deploy, and can safely have TTLs measured in days or years. Dynamic content (HTML pages, API responses, personalised feeds) is harder, because it may vary by user session, geography, A/B test bucket, or query parameter. Strategies here include: short TTLs (5-60 s) to bound staleness; edge-side includes (ESI) to assemble cached fragments; request coalescing so concurrent MISSes share a single origin fetch; and stale-while-revalidate to serve stale content instantly while a background refresh happens.
| Content type | Typical TTL | Cache-Control example | Invalidation strategy |
|---|---|---|---|
| Versioned JS/CSS (hashed) | 1 year | public, max-age=31536000, immutable | URL changes on deploy, so no purge needed |
| Images / fonts | 30 days | public, max-age=2592000 | Purge or version URL on update |
| HTML pages | 60 s | public, s-maxage=60, must-revalidate | Purge API on deploy |
| REST API (public, slow-changing) | 30 s | public, s-maxage=30, stale-while-revalidate=5 | Surrogate-key purge on data change |
| User-personalised API | 0 | private, no-store | Never cached at CDN edge |
| Streaming / WebSocket | n/a | no-store | Bypass CDN entirely |
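Request coalescing, one of the dynamic-content strategies listed in this section, can be sketched with asyncio: concurrent MISSes for the same URL await one shared in-flight fetch instead of each contacting the origin. The `Coalescer` class and the simulated `slow_origin` are hypothetical, and the sketch leaves out the cache itself to focus on the single-flight behaviour.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class Coalescer:
    """Share one in-flight origin fetch among concurrent requests for a URL."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]]) -> None:
        self._fetch = fetch
        self._in_flight: Dict[str, "asyncio.Task[str]"] = {}

    async def get(self, url: str) -> str:
        task = self._in_flight.get(url)
        if task is None:
            # First requester starts the fetch; later ones reuse the same task.
            task = asyncio.create_task(self._fetch(url))
            self._in_flight[url] = task
            task.add_done_callback(lambda _: self._in_flight.pop(url, None))
        return await task


async def demo(concurrent_requests: int = 50) -> int:
    calls = 0

    async def slow_origin(url: str) -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated origin latency
        return f"body of {url}"

    coalescer = Coalescer(slow_origin)
    await asyncio.gather(*(coalescer.get("/feed") for _ in range(concurrent_requests)))
    return calls


print(asyncio.run(demo()))  # 1
```

Fifty simultaneous requests result in a single origin call.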
stale-while-revalidate
stale-while-revalidate (SWR) is a Cache-Control extension defined in RFC 5861 that lets an edge serve a stale response immediately, adding no origin latency to that request, while issuing a background revalidation request to the origin. Requests that arrive after revalidation completes get the fresh copy. Example: Cache-Control: public, s-maxage=30, stale-while-revalidate=10 means the edge serves the response as fresh for 30 s; for the next 10 s after expiry it serves stale while refreshing in the background; after 40 s it must synchronously revalidate before responding. SWR dramatically improves p99 latency for semi-dynamic content like news feeds or product listings.
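The three windows in that example can be written as a small decision function. This is an illustrative sketch; the function name and the state labels are hypothetical.

```python
def swr_state(age_s: float, s_maxage: float, swr: float) -> str:
    """Classify a cached response by its age, in seconds."""
    if age_s < s_maxage:
        return "fresh"                    # serve from cache, no origin contact
    if age_s < s_maxage + swr:
        return "serve-stale-and-refresh"  # serve now, revalidate in the background
    return "expired"                      # revalidate synchronously before responding


# Cache-Control: public, s-maxage=30, stale-while-revalidate=10
for age in (0, 29, 30, 39, 40):
    print(age, swr_state(age, s_maxage=30, swr=10))
# 0 fresh
# 29 fresh
# 30 serve-stale-and-refresh
# 39 serve-stale-and-refresh
# 40 expired
```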
CDN Caching vs Browser Caching
Both CDN edges and browsers cache HTTP responses, but they serve different roles. The browser cache is private (one user, one machine) and removes origin traffic only for repeat visits from the same user. The CDN edge cache is shared: one cached copy serves all users hitting that edge node. The s-maxage directive targets only shared caches (CDN), allowing you to set a short browser TTL (so users pick up changes quickly) while keeping a longer edge TTL (so the CDN absorbs traffic). Use private to prevent CDN caching while still allowing browser caching.
| Dimension | CDN Edge Cache | Browser Cache |
|---|---|---|
| Scope | Shared: serves all users at that PoP | Private: serves one user on one device |
| TTL directive | s-maxage (overrides max-age) | max-age |
| Capacity | Gigabytes to terabytes per PoP | Hundreds of MB per browser profile |
| Invalidation | Purge API / surrogate keys | Hard reload, Cache-Control: no-cache |
| When bypassed | Cache-Control: private / no-store | Cache-Control: no-store |
| Primary benefit | Offloads origin, reduces latency globally | Eliminates network round-trip for repeat visits |
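How the same header yields different lifetimes in the two caches can be sketched with a simplified Cache-Control reader. Both functions are hypothetical; the parser splits naively on commas, and the fallback of 0 stands in for the Expires header and heuristic freshness that real caches may apply.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(value: str, shared: bool) -> Optional[int]:
    """Seconds the response stays fresh in this cache, or None if it must not be stored."""
    d = parse_cache_control(value)
    if "no-store" in d or (shared and "private" in d):
        return None
    s_maxage, max_age = d.get("s-maxage"), d.get("max-age")
    if shared and s_maxage is not None:
        return int(s_maxage)  # s-maxage overrides max-age for shared caches
    if max_age is not None:
        return int(max_age)
    return 0  # simplification: no explicit lifetime means revalidate


html = "public, s-maxage=60, max-age=0, must-revalidate"
print(freshness_lifetime(html, shared=True))                     # 60
print(freshness_lifetime(html, shared=False))                    # 0
print(freshness_lifetime("private, max-age=300", shared=True))   # None
print(freshness_lifetime("private, max-age=300", shared=False))  # 300
```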
Frequently Asked Questions
What is the difference between a CDN cache hit and a cache miss?
A cache hit occurs when a CDN edge node has a fresh, valid copy of the requested resource and serves it directly to the user, so no origin request is made. The response arrives in single-digit milliseconds regardless of where the origin server is. A cache miss occurs when the edge does not have the resource (or its cached copy has expired): the edge must forward the request to the origin (or origin shield), wait for the response, cache it, and then deliver it. The miss adds the full round-trip latency to the origin, which is why a high hit ratio is essential to CDN performance.
How do I cache API responses on a CDN without serving stale personalised data?
Separate public from private API endpoints. Public endpoints (product listings, blog posts, currency rates) can safely use Cache-Control: public, s-maxage=30, stale-while-revalidate=5 with short TTLs. Personalised endpoints must return Cache-Control: private, no-store so CDN edges never cache them; they bypass the CDN entirely or are handled at the edge with token-based authentication. A common architecture uses a public API path (/api/v1/products) for cacheable data and an authenticated path (/api/v1/me/cart) with no-store, giving you CDN benefits where safe and privacy where required.
How do CDN providers charge for caching, and does a high hit ratio save money?
Most CDN providers charge primarily for egress bandwidth (data transferred from edge to end users) plus, in some cases, a per-request fee and origin transfer costs. A higher cache hit ratio reduces origin egress (which is typically billed at a higher rate than CDN egress) and can reduce total origin infrastructure costs. However, CDN egress itself is not free, so the economics depend on comparing CDN bandwidth pricing against the combined cost of origin compute, origin bandwidth, and the latency penalty of serving from the origin. For most high-traffic sites the math strongly favours a CDN: cloud egress typically costs around $0.05-$0.15 / GB while CDN egress is around $0.008-$0.04 / GB, with the added benefit of lower latency and higher availability.
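A rough worked comparison, using hypothetical numbers chosen from within the price ranges quoted above (100 TB of monthly traffic, $0.09 / GB cloud egress, $0.02 / GB CDN egress, 95 % hit ratio). It counts bandwidth only and ignores per-request fees and origin compute.

```python
def cost_without_cdn(total_gb: float, origin_per_gb: float) -> float:
    """All traffic is served, and billed, as origin egress."""
    return total_gb * origin_per_gb


def cost_with_cdn(
    total_gb: float, hit_ratio: float, origin_per_gb: float, cdn_per_gb: float
) -> float:
    """CDN egress for all traffic, plus origin egress for the cache misses."""
    miss_gb = total_gb * (1.0 - hit_ratio)
    return total_gb * cdn_per_gb + miss_gb * origin_per_gb


direct = cost_without_cdn(100_000, origin_per_gb=0.09)
via_cdn = cost_with_cdn(100_000, hit_ratio=0.95, origin_per_gb=0.09, cdn_per_gb=0.02)
print(round(direct, 2), round(via_cdn, 2))  # 9000.0 2450.0
```

Under these assumptions the CDN path costs a little over a quarter of the direct path; with a lower hit ratio or pricier CDN egress the gap narrows.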
Your CDN hit ratio is your system's heartbeat: push it above 95 % and your origin becomes a cache-warmer rather than a traffic handler.
- alokknight Engineering
