DNS in System Design: How Domain Name Resolution Works (Visualized)

DNS (Domain Name System) is the globally distributed, hierarchical naming system that translates human-readable domain names — such as api.example.com — into the IP addresses (93.184.216.34) that routers use to move packets across the internet. Without DNS every user would need to memorise numeric addresses; with it, the entire internet hides behind memorable names.

DNS is not a single server: it is a tree-shaped namespace replicated across hundreds of thousands of servers worldwide. A full lookup walks that tree from the root down, but caching means most queries are answered long before they reach the root. At massive scale, the same DNS infrastructure that resolves names also drives traffic-routing strategies like GeoDNS and anycast, making DNS a core pillar of reliability engineering.

The DNS Resolution Chain

When your browser visits www.example.com for the first time, no single server holds the whole answer. Your machine asks a recursive resolver, which then walks the namespace hierarchy one layer at a time. Four actors take part:

1. Recursive Resolver — operated by your ISP or a public provider like Google (8.8.8.8) or Cloudflare (1.1.1.1). It does the legwork on your behalf, querying other servers and assembling the final answer. Your OS sends the query here first.

2. Root Nameserver — there are 13 logical root server addresses (A–M) mirrored to thousands of physical locations via anycast. They do not know where www.example.com lives, but they know who manages .com, so they refer the resolver to the right TLD nameserver.

3. TLD Nameserver — each Top-Level Domain (.com, .org, .io, and so on) has its own set of nameservers, which are authoritative for the delegations under that TLD. The .com servers refer the resolver to the nameservers that hold the actual zone for example.com.

4. Authoritative Nameserver — owned by the domain registrant (or their DNS provider like Route 53, Cloudflare DNS, or NS1). It holds the zone file with all records for example.com and returns the final answer: an IP address.

DNS Resolution Chain — Walking the Hierarchy

Watch a DNS query walk resolver → root → TLD → authoritative nameserver and return an IP address to the client. The active node is highlighted; the narration caption updates each phase.

This full walk, in which the resolver issues iterative queries down the hierarchy, happens only when the browser, the OS, and the resolver all lack a cached answer. In practice, cached results short-circuit most of this work, which is why DNS feels instantaneous despite spanning the globe.
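The walk through the hierarchy can be sketched as a toy simulation. The server names (`root`, `tld-com`, `ns-example`) and the in-memory `SERVERS` table are invented for illustration; a real resolver sends network queries and also handles glue records, timeouts, and caching.

```python
# Toy model: each "server" maps a name suffix either to a referral
# (the next server to ask) or to a final answer. All data is made up.
SERVERS = {
    "root": {"com.": ("referral", "tld-com")},
    "tld-com": {"example.com.": ("referral", "ns-example")},
    "ns-example": {"www.example.com.": ("answer", "93.184.216.34")},
}


def longest_match(zone_data, qname):
    """Return the entry whose key is the longest suffix of qname, or None."""
    best = None
    for suffix, entry in zone_data.items():
        if qname == suffix or qname.endswith("." + suffix):
            if best is None or len(suffix) > len(best[0]):
                best = (suffix, entry)
    return best[1] if best else None


def resolve_iteratively(qname, start="root", max_hops=10):
    """Follow referrals from the root until a server returns an answer."""
    server = start
    path = [server]
    for _ in range(max_hops):
        entry = longest_match(SERVERS[server], qname)
        if entry is None:
            return None, path  # this server knows nothing about the name
        kind, value = entry
        if kind == "answer":
            return value, path
        server = value  # referral: ask the next server down the tree
        path.append(server)
    raise RuntimeError("too many referrals")


answer, path = resolve_iteratively("www.example.com.")
print(answer, path)
```

The returned `path` lists the servers contacted in order, which mirrors the root, TLD, authoritative sequence in the list above.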

Caching and TTLs

Every DNS answer comes with a TTL (Time to Live) — a number of seconds the recipient is allowed to cache it. A TTL of 300 means any resolver (or browser) that receives the answer may reuse it for five minutes without asking again. DNS caching is layered: the browser has its own cache, the OS has one, and the recursive resolver has its own much larger cache shared across all its clients.
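A single cache layer can be sketched as a dictionary that stores an expiry time next to each answer. This is a minimal illustration, not how any particular browser or resolver implements its cache; the `TtlCache` class and its injectable `clock` parameter are assumptions made so the behaviour can be shown without waiting in real time.

```python
import time


class TtlCache:
    """Minimal TTL-respecting cache for DNS-style answers."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (value, expires_at)

    def put(self, name, value, ttl_seconds):
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # never cached: caller must do a real lookup
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # TTL elapsed: drop the stale answer
            return None
        return value


# Drive the cache with a fake clock so expiry is visible immediately.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com.", "93.184.216.34", ttl_seconds=300)
now[0] = 299.0
print(cache.get("www.example.com."))  # still cached
now[0] = 300.0
print(cache.get("www.example.com."))  # expired, returns None
```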

When you lower a TTL before a planned migration — say from 86400 (24 h) to 60 — you shrink the blast radius of an IP change. When the new IP is confirmed healthy you raise the TTL again to reduce resolver load. This dance is standard ops practice for zero-downtime migrations.
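The timing of that dance can be made concrete with a little arithmetic. The function below is a sketch that assumes every resolver honours TTLs exactly; its name and parameters are illustrative. It estimates how long the old IP may still be served after the switch, given how far in advance the TTL was lowered.

```python
def worst_case_stale_seconds(old_ttl, lowered_ttl, lead_time):
    """Longest time a TTL-honouring resolver may keep serving the old IP
    after the record changes.

    old_ttl     -- TTL in effect before it was lowered (seconds)
    lowered_ttl -- TTL in effect when the IP change is made (seconds)
    lead_time   -- seconds between lowering the TTL and changing the IP
    """
    # A resolver that cached the record just before the TTL was lowered
    # still holds it for (old_ttl - lead_time) seconds at change time.
    leftover_from_old_ttl = max(0, old_ttl - lead_time)
    # A resolver that refreshed after the lowering holds at most lowered_ttl.
    return max(lowered_ttl, leftover_from_old_ttl)


print(worst_case_stale_seconds(86400, 60, lead_time=86400))  # 60
print(worst_case_stale_seconds(86400, 60, lead_time=3600))   # 82800
```

The second call shows why lowering the TTL only an hour ahead barely helps: caches filled under the old 24-hour TTL have not drained yet.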

DNS Cache Hit vs. TTL Expiry

See how a cached answer short-circuits the full resolution chain until its TTL reaches zero, then triggers a fresh lookup. The TTL counter ticks down in real time.

Setting TTLs is a tradeoff: short TTLs (60–300 s) let changes take effect quickly but increase query load on resolvers and authoritative servers. Long TTLs (3600–86400 s) reduce that load but mean a bad record stays alive in caches for a long time. A common compromise is around 300 s for records that may change and 86400 s for static infrastructure that rarely changes.

DNS Record Types

A DNS zone file is a list of resource records, each associating a name with a value and a type. The type tells the resolver what the value means. The most important record types for system design are:

A record — maps a hostname to an IPv4 address. The most fundamental record; every hostname you reach over IPv4 ultimately resolves to one or more A records.

AAAA record — same as A but for IPv6 addresses (2001:db8::1). Modern dual-stack clients prefer AAAA when available.

CNAME record — an alias from one name to another. www.example.com CNAME example.com means resolvers look up example.com next. A CNAME cannot coexist with other records at the same name, and the zone apex (example.com itself) must carry SOA and NS records, so you cannot place a CNAME at the apex, only on subdomains. Some providers offer proprietary ALIAS or ANAME records that behave like a CNAME at the apex.
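Alias following can be sketched against a toy zone. The `ZONE` dictionary and the `resolve_a` helper are hypothetical; real resolvers may follow aliases across zones and servers, and apply their own limits on chain length.

```python
# Hypothetical zone data keyed by (name, record type).
ZONE = {
    ("www.example.com.", "CNAME"): ["example.com."],
    ("example.com.", "A"): ["93.184.216.34"],
}


def resolve_a(name, zone, max_depth=8):
    """Follow CNAME aliases until an A record is found.

    Returns (addresses, chain), where chain lists every name visited.
    """
    chain = [name]
    for _ in range(max_depth):
        if (name, "A") in zone:
            return zone[(name, "A")], chain
        target = zone.get((name, "CNAME"))
        if target is None:
            return [], chain  # neither an A record nor an alias exists
        name = target[0]
        chain.append(name)
    raise RuntimeError("CNAME chain too long or looping")


print(resolve_a("www.example.com.", ZONE))
```

The depth limit guards against alias loops, which would otherwise never terminate.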

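MX record — identifies the mail server(s) that accept email for the domain. Each MX record has a priority number; lower is preferred. If a domain has no MX records, sending servers fall back to its A/AAAA address, which is rarely where you want mail delivered, so incorrect or missing MX records usually mean lost or bounced email.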

NS record — delegates a zone to a set of authoritative nameservers. Changing NS records is how you transfer DNS hosting between providers.

TXT record — arbitrary text attached to a name. Used for SPF (email sender policy), DKIM keys, DMARC policy, domain ownership verification (Google Search Console, AWS Certificate Manager), and more.

DNS Record Types — What Each One Maps

Each record type cycles into focus showing its name, what it maps FROM and TO, and a real-world example. The highlighted row is the currently active type.

Record Type	Maps	Typical Use
A	Hostname → IPv4 address	Point a domain/subdomain to a server
AAAA	Hostname → IPv6 address	IPv6-capable endpoints, dual-stack
CNAME	Alias → canonical hostname	www subdomain, CDN endpoints
MX	Domain → mail server (+ priority)	Email delivery routing
NS	Zone → authoritative nameserver	Delegate DNS hosting to a provider
TXT	Name → arbitrary text	SPF, DKIM, DMARC, domain verification
SRV	Service → host + port + priority	SIP, XMPP, Kubernetes service discovery
PTR	IP address → hostname	Reverse DNS, spam filtering
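For the PTR row, the reverse-lookup name is derived mechanically from the address. Python's standard `ipaddress` module exposes this as the `reverse_pointer` attribute; the `ptr_name` wrapper below is just an illustrative helper.

```python
import ipaddress


def ptr_name(ip_text):
    """Return the DNS name that a PTR query for this address would use."""
    return ipaddress.ip_address(ip_text).reverse_pointer


print(ptr_name("93.184.216.34"))  # 34.216.184.93.in-addr.arpa
print(ptr_name("2001:db8::1"))    # nibble-reversed name under ip6.arpa
```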

DNS in Load Balancing and CDNs

DNS is not just for name resolution — it is also a powerful traffic-steering layer. Because the authoritative server can return different answers to different resolvers, you can implement sophisticated routing without changing anything on the client.

Round-robin DNS is the simplest form: multiple A records for the same name, each with a different IP address. Each resolver receives the list in a different rotation, spreading requests across multiple servers. It is stateless and easy to configure, but it has no health checking: a server can go down and DNS will keep sending traffic to it until the record is changed and cached answers expire.
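The rotation itself is simple to sketch. The `RoundRobinAnswer` class below is an illustration of the idea rather than the behaviour of any specific DNS server, and the addresses come from documentation ranges.

```python
from collections import deque


class RoundRobinAnswer:
    """Hand out the same address set, rotated by one on every query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def next_answer(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # move the first address to the end
        return answer


rr = RoundRobinAnswer(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
for _ in range(3):
    print(rr.next_answer())
```

Clients that simply take the first address in the list end up spread across all three servers.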

GeoDNS extends this by inspecting the source IP of the resolver (or, with EDNS Client Subnet enabled, the approximate client IP). A query from Europe gets a European data-centre IP; a query from Asia gets an Asian one. This dramatically cuts latency because the client connects to the nearest PoP without any application-level knowledge of geography.
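A GeoDNS decision reduces to mapping the source address to a region and the region to an answer. In the sketch below, the prefix table, region names, and answer IPs are all hypothetical (documentation address ranges); production systems consult a GeoIP database rather than a hand-written list.

```python
import ipaddress

# Hypothetical mapping from resolver (or client-subnet) prefix to region.
PREFIX_TO_REGION = [
    (ipaddress.ip_network("198.51.100.0/24"), "eu"),
    (ipaddress.ip_network("203.0.113.0/24"), "asia"),
]

# Hypothetical data-centre address per region.
REGION_TO_ANSWER = {
    "eu": "192.0.2.10",
    "asia": "192.0.2.20",
    "default": "192.0.2.30",
}


def geo_answer(source_ip):
    """Pick the answer for the region that contains source_ip."""
    addr = ipaddress.ip_address(source_ip)
    for network, region in PREFIX_TO_REGION:
        if addr in network:
            return REGION_TO_ANSWER[region]
    return REGION_TO_ANSWER["default"]  # unknown location: fall back


print(geo_answer("198.51.100.7"))
print(geo_answer("203.0.113.9"))
```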

Weighted DNS lets you return IP A 90% of the time and IP B 10%, enabling canary deployments or blue-green traffic shifts at the DNS layer — no load balancer reconfig needed.
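The 90/10 split can be sketched with a weighted random choice. The `weighted_answer` function and the sample addresses are illustrative; managed DNS providers implement weighting in their own ways.

```python
import random


def weighted_answer(weights, rng=random):
    """Pick one IP with probability proportional to its weight.

    weights -- dict mapping IP address to a non-negative weight
    rng     -- the random module or a random.Random instance
    """
    ips = list(weights)
    return rng.choices(ips, weights=[weights[ip] for ip in ips], k=1)[0]


# Hypothetical canary split: 90% to the stable pool, 10% to the canary.
split = {"192.0.2.1": 90, "192.0.2.2": 10}
sample = [weighted_answer(split) for _ in range(10_000)]
print(sample.count("192.0.2.2") / len(sample))  # close to 0.10
```

Because resolvers cache each answer for its TTL, the split observed at the servers follows the weights only approximately.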

Anycast takes a different approach: the same IP address is announced from multiple locations simultaneously by BGP. Routers naturally direct each packet to the topologically nearest site. Cloudflare and Google Public DNS operate entirely this way — 1.1.1.1 is not one server; it is hundreds, all sharing the same IP.

Technique	How it works	Health-aware?	Typical use
Round-robin DNS	Multiple IPs rotated in answer	No	Simple horizontal scaling
GeoDNS	Different IPs by resolver geography	Optional	Multi-region latency routing
Weighted DNS	IPs returned with probabilistic weights	Optional	Canary / blue-green deploys
Anycast	Same IP announced from many sites via BGP	Yes (BGP withdrawal)	Public DNS, DDoS mitigation, CDN PoPs
DNS + Health checks	Authoritative removes unhealthy IPs from answer	Yes	High-availability primary DNS
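The health-aware row of the table amounts to filtering the candidate list before answering. The sketch below assumes a caller-supplied `is_healthy` probe and, as a design choice made here for illustration, fails open by returning every address when all probes fail, so the answer is never empty.

```python
def healthy_answers(candidates, is_healthy):
    """Return only the addresses whose health probe passes.

    candidates -- iterable of IP address strings
    is_healthy -- callable taking an IP and returning True or False
                  (hypothetical; stands in for a real health checker)
    """
    candidates = list(candidates)
    healthy = [ip for ip in candidates if is_healthy(ip)]
    # Fail open (an assumption of this sketch): if every probe fails,
    # serve the full set rather than an empty answer.
    return healthy or candidates


down = {"192.0.2.2"}
print(healthy_answers(["192.0.2.1", "192.0.2.2"], lambda ip: ip not in down))
```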

Practical DNS at Scale: What Engineers Must Know

"DNS propagation" is a misleading term, because nothing actually propagates. DNS records are not pushed out to resolvers; resolvers pull them when their cached copy expires. What you wait for is cached TTLs to drain from every resolver in the world. Lowering your TTL at least one full old-TTL period before a change (24 hours ahead for an 86400 s TTL) is the only reliable way to control this window.

DNSSEC adds cryptographic signatures to DNS responses, so resolvers can verify that an answer was not tampered with in transit. It defends against cache poisoning attacks (the Kaminsky attack) but adds operational complexity. Major registries and resolvers now broadly support it.

DNS over HTTPS (DoH) and DNS over TLS (DoT) encrypt the DNS query itself, hiding it from network eavesdroppers and ISPs. Browsers like Chrome and Firefox can bypass the OS resolver entirely and query DoH endpoints directly.

Negative caching: an NXDOMAIN (non-existent domain) response is also cached. Its TTL is the smaller of the zone's SOA record TTL and the SOA minimum field. This prevents storms of repeated lookups for typos or deleted records from flooding authoritative servers.
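The negative-caching rule from RFC 2308 is a one-line calculation. The function name and the sample values below are illustrative only.

```python
def negative_cache_ttl(soa_ttl, soa_minimum):
    """TTL for a cached NXDOMAIN answer, following RFC 2308: the smaller
    of the SOA record's own TTL and its MINIMUM field."""
    return min(soa_ttl, soa_minimum)


# Hypothetical zone: SOA record TTL of 3600 s, MINIMUM field of 300 s.
print(negative_cache_ttl(soa_ttl=3600, soa_minimum=300))  # 300
```

Resolvers may additionally cap this value with their own configured maximum.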

# Query a specific record type
dig A www.example.com @8.8.8.8

# Trace the full resolution chain manually
dig +trace www.example.com

# Show the remaining TTL on a cached record (second column; it counts down between runs)
dig +noall +answer A www.example.com @1.1.1.1

# Reverse DNS lookup (PTR record)
dig -x 93.184.216.34

# Check MX records for email routing
dig MX example.com

# Verify SPF TXT record
dig TXT example.com | grep spf

Frequently Asked Questions

Why does DNS propagation take up to 48 hours?

The 48-hour figure comes from the old convention of setting record TTLs to 86400 seconds (24 hours) or higher. When you change a record, every recursive resolver that has the old answer cached will keep serving it until its locally stored TTL expires. There is no mechanism to push an invalidation to all resolvers simultaneously; you can only wait for their caches to drain. If you lower your TTL to 300 s (5 minutes) at least one old-TTL period (24 hours in this case) before making a change, the propagation window shrinks dramatically. After the change is stable, raise the TTL back to reduce load.

What is the difference between a recursive resolver and an authoritative nameserver?

A recursive resolver (like 8.8.8.8 or 1.1.1.1) is a general-purpose query engine. It accepts questions from clients, does not usually hold zone data, and assembles answers by querying the hierarchy on the client's behalf — caching results along the way. An authoritative nameserver is the definitive source of truth for a specific domain zone. It holds the actual records (A, MX, etc.) and answers with authority for that zone only, without forwarding or caching. Most domains use a managed authoritative DNS service (Route 53, Cloudflare DNS, NS1) rather than self-hosting their authoritative NS.

Can DNS alone provide high availability?

DNS can contribute significantly to availability but is not sufficient on its own. DNS health-checking (as offered by Route 53, Cloudflare, or NS1) can detect a failed endpoint and remove its IP from answers within seconds, but clients only see the change once the old answer's TTL expires in their caches. Anycast helps because a BGP route withdrawal is faster than a TTL drain. In practice, production systems layer DNS-level failover with application-layer load balancers and health checks: DNS routes between data centres or regions, while a load balancer within each region handles individual server failures. DNS alone struggles with fast (<10 s) failover because of this TTL-bound delay and because some clients (especially mobile devices and corporate proxies) ignore TTLs.

DNS is the silent first responder of every internet request: it runs before a single byte of your application is touched, and its caching, routing, and security properties determine whether your system is fast, resilient, or compromised. Treat your DNS configuration with the same care as your database schema.
— alokknight Engineering