Service Discovery in System Design: Registry, Health Checks & Client vs Server-Side Discovery (Visualized)

Service discovery is the mechanism by which services in a distributed system automatically locate the network addresses of other services they need to communicate with, without relying on hardcoded IPs or manual configuration. In modern cloud and microservice architectures, instances are ephemeral: they scale up and down, crash and restart, and migrate across hosts, so any address you record today may be stale in minutes. Service discovery keeps the mapping from service names to live addresses current as instances come and go, usually within a few health-check intervals of each change.

The problem is deceptively simple on a single server but explodes in complexity at scale. Imagine 50 microservices, each running 3–10 replicas, deployed across an auto-scaling cluster. Every replica has a different IP. When order-service needs to call inventory-service, which IP should it use? What if the only healthy instance just moved to a different node? Without service discovery, every deploy and every scaling event requires a manual DNS or configuration update, which is fragile, error-prone, and unworkable at this rate of change.

The Service Registry: The Central Source of Truth

At the heart of every service discovery system is a service registry — a database that maps service names to the IP addresses and ports of their live instances. When a new service instance boots, it registers itself with the registry, advertising its name, address, and port. When it shuts down gracefully, it deregisters. If it crashes, the registry detects the absence via health checks and removes the entry automatically.

A service registry entry typically stores: the service name (e.g., inventory-service), the host IP and port, optional metadata tags (version, region, environment), and a health status field. Well-known registry implementations include Consul (HashiCorp), etcd (CNCF, also powers Kubernetes), Apache Zookeeper, and Netflix Eureka. Kubernetes builds its own registry abstraction on top of etcd: Service and Endpoints objects are stored through the API server, and CoreDNS serves DNS records derived from them.
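The register, deregister, and lookup cycle described above can be sketched as a toy in-memory registry. This is a minimal illustration, not the API of Consul, etcd, or Eureka; the class names, the status strings, and the sample addresses are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instance:
    """One registered instance of a service (hypothetical record shape)."""
    service: str
    host: str
    port: int
    tags: tuple = ()


@dataclass
class Registry:
    """Toy in-memory registry: service name -> {instance: health status}."""
    _entries: dict = field(default_factory=dict)

    def register(self, inst: Instance) -> None:
        # A freshly registered instance is assumed healthy until a check says otherwise.
        self._entries.setdefault(inst.service, {})[inst] = "healthy"

    def deregister(self, inst: Instance) -> None:
        self._entries.get(inst.service, {}).pop(inst, None)

    def set_status(self, inst: Instance, status: str) -> None:
        if inst in self._entries.get(inst.service, {}):
            self._entries[inst.service][inst] = status

    def lookup(self, service: str) -> list:
        # Only instances currently marked healthy are returned to callers.
        return [i for i, s in self._entries.get(service, {}).items() if s == "healthy"]


registry = Registry()
a = Instance("inventory-service", "10.0.1.5", 8080, ("v2", "us-east"))
b = Instance("inventory-service", "10.0.1.6", 8080, ("v2", "us-east"))
registry.register(a)
registry.register(b)
registry.set_status(b, "unhealthy")   # e.g. a health check failed
healthy = registry.lookup("inventory-service")   # only instance a remains visible
```

A production registry adds replication, persistence, and change notifications on top of this mapping, but the contract seen by callers is essentially the same.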

Figure: Service Registration and Client Query. A new instance boots, registers in the registry, and a client queries the registry to find it.

Health Checks and Automatic Deregistration

A service registry is only as useful as the accuracy of its data. An instance can die without sending a deregistration message — the process crashes, the network partitions, or the host goes offline. To handle this, registries use health checks: periodic probes sent by the registry (or by the instance itself, via heartbeats) to verify an instance is still alive. Common probe types include:

HTTP health endpoint: the registry calls GET /health and expects a 200 OK.

TCP check: the registry opens a TCP socket; if it connects, the instance is alive.

Heartbeat / TTL: the instance sends a keep-alive ping to the registry every N seconds; if the registry does not receive a ping within the TTL window, it marks the instance dead and removes it. This is the model used by Consul (with TTL checks) and Eureka (with 30-second heartbeats and a 90-second eviction timeout).
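The heartbeat / TTL model can be sketched in a few lines. This is an illustrative toy, not Eureka's or Consul's implementation: the class name, the instance ids, and the injectable clock (used here so the example runs without sleeping) are assumptions.

```python
import time


class TTLRegistry:
    """Toy heartbeat registry: an instance stays listed only while heartbeats keep arriving."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so behaviour can be checked without sleeping
        self._last_seen = {}      # instance id -> timestamp of the last heartbeat

    def heartbeat(self, instance_id: str) -> None:
        # Registration and renewal are the same operation in this sketch.
        self._last_seen[instance_id] = self.clock()

    def evict_expired(self) -> list:
        """Remove and return every instance whose last heartbeat is older than the TTL."""
        now = self.clock()
        expired = [i for i, seen in self._last_seen.items() if now - seen > self.ttl]
        for instance_id in expired:
            del self._last_seen[instance_id]
        return expired

    def alive(self) -> list:
        return sorted(self._last_seen)


now = [0.0]                                   # fake clock, in seconds
reg = TTLRegistry(ttl_seconds=90, clock=lambda: now[0])
reg.heartbeat("inventory-1")
reg.heartbeat("inventory-2")
now[0] = 60.0
reg.heartbeat("inventory-1")                  # inventory-2 has stopped sending heartbeats
now[0] = 100.0
evicted = reg.evict_expired()                 # inventory-2 was last seen 100 s ago, past the 90 s TTL
```

In a real registry the eviction pass runs on a timer; here it is called explicitly so the effect of the TTL is easy to see.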

Figure: Health Check Failure and Automatic Deregistration. An instance stops responding. After missed health checks it is marked DEAD and removed, and traffic reroutes to healthy peers.

The key insight is that health checks create a self-healing registry. Engineers do not need to manually remove failed instances from a load balancer config; the registry does it automatically when the instance stops passing probes. In Consul, you configure a health check interval (e.g., every 10 seconds) and a deregister_critical_service_after timeout (e.g., 1 minute, which is the minimum Consul accepts; critical services are reaped on a periodic pass, so removal can take slightly longer than the configured value). In Kubernetes, readiness probes serve the equivalent function: a Pod that fails its readiness probe is removed from the Service's Endpoints, so kube-proxy and CoreDNS stop directing traffic to it. Liveness probes are complementary: they restart a container that has hung rather than taking it out of rotation.
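The "missed checks, then marked dead" behaviour can be sketched as a consecutive-failure counter. This is a simplified illustration: the probe is a hypothetical callable standing in for an HTTP or TCP check, and the status names and threshold are assumptions rather than any registry's actual states.

```python
from typing import Callable


class HealthTracker:
    """Marks an instance DEAD after N consecutive failed probes."""

    def __init__(self, probe: Callable[[str], bool], failure_threshold: int = 3):
        self.probe = probe                    # hypothetical check, e.g. GET /health returned 200
        self.failure_threshold = failure_threshold
        self._failures = {}                   # address -> consecutive failure count

    def check(self, address: str) -> str:
        try:
            ok = self.probe(address)
        except Exception:
            ok = False                        # a probe that raises (timeout, refused) counts as a failure
        if ok:
            self._failures[address] = 0       # any success resets the counter
            return "HEALTHY"
        self._failures[address] = self._failures.get(address, 0) + 1
        if self._failures[address] >= self.failure_threshold:
            return "DEAD"
        return "SUSPECT"


# Scripted probe results: one success, three failures, then a recovery.
responses = iter([True, False, False, False, True])
tracker = HealthTracker(probe=lambda addr: next(responses), failure_threshold=3)
history = [tracker.check("10.0.1.5:8080") for _ in range(5)]
```

Requiring several consecutive failures trades slower detection for fewer false evictions when a single probe is lost.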

Client-Side vs Server-Side Discovery

There are two fundamental patterns for how a calling service finds the right instance to contact. The choice affects where intelligence lives — in the client or in the infrastructure.

Client-Side Discovery

In client-side discovery, the calling service (client) queries the registry directly, retrieves the full list of healthy instances, and picks one itself using a load-balancing algorithm embedded in the client, usually round-robin or random. This is the model used by Netflix Ribbon (with Eureka) and by service mesh sidecar proxies such as Envoy, which perform the lookup and instance selection on the application's behalf. The client caches the instance list and refreshes it periodically to avoid hammering the registry on every call.
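A client-side discovery library along these lines might look like the sketch below. It is not Ribbon's API: the fetch callable is a hypothetical stand-in for a registry query, and the refresh interval, class name, and addresses are assumptions.

```python
import time
from typing import Callable, List


class DiscoveryClient:
    """Caches the instance list per service and picks instances round-robin."""

    def __init__(self, fetch: Callable[[str], List[str]], refresh_seconds: float = 10.0,
                 clock=time.monotonic):
        self.fetch = fetch                    # hypothetical registry query: service -> addresses
        self.refresh_seconds = refresh_seconds
        self.clock = clock
        self._cache = {}                      # service -> (fetched_at, instances)
        self._counters = {}                   # service -> round-robin counter

    def _instances(self, service: str) -> List[str]:
        cached = self._cache.get(service)
        if cached is None or self.clock() - cached[0] >= self.refresh_seconds:
            try:
                self._cache[service] = (self.clock(), list(self.fetch(service)))
            except Exception:
                if cached is None:
                    raise                     # nothing cached to fall back on
                # Registry unreachable: keep serving the stale list.
        return self._cache[service][1]

    def pick(self, service: str) -> str:
        instances = self._instances(service)
        if not instances:
            raise LookupError(f"no healthy instances for {service}")
        n = self._counters.get(service, 0)
        self._counters[service] = n + 1
        return instances[n % len(instances)]


calls = []


def fake_registry(service):
    calls.append(service)                     # record how often the registry is queried
    return ["10.0.1.5:8080", "10.0.1.6:8080"]


now = [0.0]
client = DiscoveryClient(fake_registry, refresh_seconds=10, clock=lambda: now[0])
picks = [client.pick("inventory-service") for _ in range(3)]
```

Three picks cost a single registry query here; the list is only fetched again once the refresh interval has elapsed.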

Server-Side Discovery

In server-side discovery, the client sends a request to a well-known stable address — a load balancer or a DNS name — and the infrastructure picks which instance to route to. The client does not know or care about individual instance IPs. This is the pattern used by Kubernetes Services (which give every service a stable cluster IP backed by kube-proxy) and by AWS Elastic Load Balancers integrated with ECS service discovery. The routing intelligence lives in the infrastructure, not the client library.

Figure: Client-Side vs Server-Side Discovery. Left: client queries registry and picks an instance itself. Right: client calls a load balancer which picks the instance.

Comparing Client-Side vs Server-Side Discovery

	Client-Side Discovery	Server-Side Discovery
Who picks the instance	Client library (e.g., Ribbon)	Load balancer / kube-proxy / DNS
Registry awareness	Client queries registry directly	Client knows only a stable VIP or DNS name
Load balancing logic	In the client	In the infrastructure
Flexibility	High — client can apply custom algorithms	Lower — LB algorithm is centrally managed
Operational complexity	Each language/framework needs a registry client	Simple clients; complexity in infra
Examples	Netflix Ribbon + Eureka, Envoy sidecar	Kubernetes Services + CoreDNS, AWS ELB + ECS
Failure surface	Client cache may be stale briefly	LB is a central choke point (must be HA)

DNS-Based Service Discovery

DNS-based discovery leverages the Domain Name System to resolve a service name to one or more instance IPs. Instead of a proprietary registry API, services simply do a standard DNS lookup. The registry (or control plane) keeps the DNS records updated as instances come and go. Low TTLs (often 5–30 seconds) ensure stale records are not cached too long.

In Kubernetes, CoreDNS is the cluster DNS server. Every Service object automatically gets a DNS entry: inventory-service.default.svc.cluster.local resolves to the Service's stable cluster IP. For headless services (clusterIP: None), CoreDNS returns A records for every ready Pod IP directly — giving clients the full instance list for client-side selection. Consul also supports DNS-based discovery, letting any service call inventory-service.service.consul to get a healthy instance.

# Kubernetes: standard Service (server-side — CoreDNS resolves to stable ClusterIP)
apiVersion: v1
kind: Service
metadata:
  name: inventory-service
spec:
  selector:
    app: inventory
  ports:
    - port: 80
      targetPort: 8080

---
# Kubernetes: headless Service (client-side — DNS returns all Pod IPs)
apiVersion: v1
kind: Service
metadata:
  name: inventory-service-headless
spec:
  clusterIP: None          # no VIP; returns Pod A records
  selector:
    app: inventory
  ports:
    - port: 8080
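From the application's side, DNS-based discovery is just a name lookup. The sketch below uses Python's standard resolver; the helper name is an assumption, and the literal address 127.0.0.1 is used only so the example runs anywhere. Inside a cluster you would pass a service name such as inventory-service-headless instead, and a headless Service would be expected to yield one address per ready Pod.

```python
import socket


def resolve_instances(name: str, port: int) -> list:
    """Return the distinct IP addresses that a DNS name currently resolves to."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


addresses = resolve_instances("127.0.0.1", 8080)
```

Note that the result reflects whatever the resolver and any intermediate caches return, so it can lag the registry by up to the record's TTL.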

Named Implementations at a Glance

Tool	Discovery Style	Health Checks	Notable Feature
Consul	Client-side (API) + DNS	HTTP, TCP, script, TTL heartbeat	Service mesh, KV store, multi-datacenter
etcd	Client-side (via lease/watch API)	TTL-based leases (self-managed)	Strong consistency (Raft), powers Kubernetes
Eureka (Netflix)	Client-side (Ribbon)	Heartbeat every 30 s; eviction at 90 s	Java-first; peer-to-peer registry replication
Kubernetes Services + CoreDNS	Server-side (kube-proxy/iptables) + DNS	Readiness probes on Pods	Native k8s; headless mode for client-side
AWS Cloud Map	Client-side (API) + Route 53 DNS	Route 53 health checks	Native AWS; integrates with ECS, EKS, Lambda
Zookeeper	Client-side (Curator recipes)	Ephemeral nodes (session-based)	Strong consistency; used by Kafka, Hadoop

Self-Registration vs Third-Party Registration

A subtle but important design choice is who registers a service instance. In self-registration, the instance calls the registry on startup (e.g., via a Consul agent sidecar or a Spring Cloud Netflix library). This is simple but couples the application code to the registry. In third-party registration, an external orchestration layer (the Kubernetes controller, a Nomad scheduler, or an AWS ECS agent) registers and deregisters instances on behalf of the service, so the application code is registry-agnostic. Kubernetes uses third-party registration: when a Pod matching a Service's selector becomes ready, the control plane adds it to the Service's Endpoints object; the Pod never calls a registry itself.
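Third-party registration usually comes down to a reconciliation step: compare what the orchestrator observes running with what the registry lists, then register or deregister the difference. The sketch below shows only that comparison; the function names and addresses are hypothetical, and a real registrar would run this in a loop against live APIs.

```python
def reconcile(observed: set, registered: set) -> tuple:
    """Compare what the orchestrator sees running with what the registry lists."""
    to_register = observed - registered       # running but not yet in the registry
    to_deregister = registered - observed     # in the registry but no longer running
    return to_register, to_deregister


def apply_changes(registered: set, observed: set) -> set:
    """Return the registry contents after one reconciliation pass."""
    add, remove = reconcile(observed, registered)
    return (registered | add) - remove


registered = {"10.0.1.5:8080", "10.0.1.6:8080"}
observed = {"10.0.1.6:8080", "10.0.1.7:8080"}   # .5 was terminated, .7 was just scheduled
to_register, to_deregister = reconcile(observed, registered)
updated = apply_changes(registered, observed)
```

Because the registrar derives everything from observed state, the application itself needs no registry client at all.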

Common Pitfalls in Service Discovery

Stale instance cache: clients that cache registry results too aggressively will route to dead instances after a crash. Always set a short cache TTL and implement retry-with-jitter on connection failure.

Registry as a single point of failure: the registry must itself be highly available (Consul runs a 3- or 5-node Raft cluster; etcd runs a similar quorum). A registry outage does not mean immediate service disruption, because clients can serve from cache, but no new registrations or deregistrations will propagate.

Health check amplification: in large clusters, every registry node probing every service instance can create significant network overhead; use agent-based checks (each host's local Consul agent probes its own services) rather than centralised checks.

DNS caching and negative TTLs: JVM-based services are notorious for aggressive DNS caching; set networkaddress.cache.ttl to a low value such as 5 in JVM deployments talking to CoreDNS or Consul DNS. Note that this is a Java security property, set in the java.security file or with Security.setProperty at startup, rather than an ordinary -D system property. The companion property networkaddress.cache.negative.ttl controls how long failed lookups are cached, which matters when a service name briefly has no records.
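The retry-with-jitter advice for stale caches can be sketched as follows. This is one common variant (exponential backoff with "full jitter"), not a prescribed algorithm; the function name, parameters, and the simulated send callable are assumptions, and attempts is assumed to be at least 1.

```python
import random
import time


def call_with_retry(instances, send, attempts=3, base_delay=0.1, max_delay=2.0,
                    sleep=time.sleep, rng=random.random):
    """Try instances in turn, sleeping a jittered, exponentially growing delay between failures."""
    if not instances:
        raise LookupError("no instances to call")
    last_error = None
    for attempt in range(attempts):
        target = instances[attempt % len(instances)]   # move to a different instance each try
        try:
            return send(target)
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                cap = min(max_delay, base_delay * 2 ** attempt)
                sleep(rng() * cap)                     # full jitter: uniform in [0, cap)
    raise last_error


def flaky_send(target):
    # Simulates a dead instance that is still present in the client's cache.
    if target == "10.0.1.5:8080":
        raise ConnectionError("connection refused")
    return f"ok from {target}"


delays = []
result = call_with_retry(["10.0.1.5:8080", "10.0.1.6:8080"], flaky_send,
                         sleep=delays.append, rng=lambda: 0.5)
```

The jitter spreads retries from many clients over time, so a crashed instance does not trigger a synchronized burst against its surviving peers.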

Frequently Asked Questions

What is the difference between service discovery and load balancing?

Service discovery answers the question "where are the healthy instances of service X right now?" — it returns a list of addresses. Load balancing answers a follow-on question: "which one of those instances should I send this request to?" The two are complementary: discovery populates the pool, load balancing selects from it. In server-side discovery they are often bundled together (a load balancer both consults the registry and picks an instance), while in client-side discovery they are separate concerns handled by the client library.

How does Kubernetes handle service discovery without Consul or Eureka?

Kubernetes uses etcd as its backing store for all cluster state, including which Pods are healthy. The control plane maintains Endpoints objects that list the IPs of ready Pods for each Service. CoreDNS watches these Endpoints and serves DNS queries for <svc>.<ns>.svc.cluster.local. kube-proxy programs iptables or IPVS rules on each node to NAT traffic from the Service's stable virtual ClusterIP to one of the ready Pod IPs. From the application's perspective, a simple DNS lookup or a call to the ClusterIP is all that is needed — the entire discovery and routing mechanism is invisible to the application code.

Should I use client-side or server-side discovery for a new microservice project?

For greenfield projects running on Kubernetes, server-side discovery via Kubernetes Services and CoreDNS is the right default: it requires zero application code changes, works with any language, and is already battle-tested at massive scale. Client-side discovery is worth considering when you need custom load-balancing logic (e.g., latency-aware routing, circuit-breaking per instance) or when you are not on Kubernetes and want fine-grained control. If you want that control without burdening your application code, a service mesh such as Istio (which uses Envoy sidecars) or Linkerd (which uses its own lightweight proxy) moves the client-side routing into a sidecar proxy.

A service registry is the phone book of your microservice fleet — keep it accurate, keep it highly available, and let health checks do the dirty work of evicting dead entries automatically.
— alokknight Engineering
