Load Balancing in System Design: Algorithms, Health Checks & Failover (Visualized)
A load balancer distributes incoming traffic across multiple servers so no single machine is overwhelmed, improving reliability and scalability. This guide covers the core algorithms, health checks, failover, and L4 vs L7, with live animations of each.
Load balancing is the practice of distributing incoming network requests across a pool of backend servers so that no single server becomes a bottleneck. It is the foundation of horizontal scaling: instead of buying one bigger machine, you add more machines and spread the work across them.
A load balancer sits between your clients and your servers. Clients connect to a single address; the balancer forwards each request to one healthy backend. The result is higher throughput, better fault tolerance, and the ability to add or remove servers without downtime.
How a Load Balancer Works
Every load balancer does three jobs: (1) accept client connections at a single virtual address, (2) choose a backend server using a scheduling algorithm, and (3) continuously check backend health so dead servers are skipped. The algorithm in step 2 is where most of the interesting decisions live.
Algorithm 1: Round-Robin
Round-robin hands each new request to the next server in rotation: server 1, then 2, then 3, then back to 1. It is the simplest strategy and works well when all servers are identical and every request costs roughly the same amount of work.
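The rotation described above can be sketched in a few lines. This is a minimal illustration, not a production balancer; the class name `RoundRobinBalancer` and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("need at least one server")
        # cycle() repeats the list forever: 1, 2, 3, 1, 2, 3, ...
        self._rotation = cycle(servers)

    def pick(self):
        """Return the next server in rotation."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
picks = [balancer.pick() for _ in range(4)]
print(picks)  # ['server-1', 'server-2', 'server-3', 'server-1']
```

Note that the balancer keeps no information about how busy each server is, which is why this strategy only suits uniform servers and similar request costs.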
Weighted round-robin is a common variation: each server is assigned a weight proportional to its capacity, so a server with weight 3 receives three times as many requests as a server with weight 1. Use it when your fleet has mixed hardware.
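One way to honor weights without sending bursts to the heaviest server is a "smooth" weighted rotation, sketched below. This is one possible approach under assumed names (`WeightedRoundRobin`, `big`, `small`), not the only way to implement weighting.

```python
from collections import Counter


class WeightedRoundRobin:
    """Smooth weighted rotation: picks are proportional to weight and interleaved."""

    def __init__(self, weights):
        # weights: mapping of server name -> positive integer weight
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be positive and non-empty")
        self._weights = dict(weights)
        self._current = {server: 0 for server in self._weights}
        self._total = sum(self._weights.values())

    def pick(self):
        # Every server earns credit equal to its weight on each pick...
        for server, weight in self._weights.items():
            self._current[server] += weight
        # ...the server with the most credit is chosen and pays back the total.
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen


wrr = WeightedRoundRobin({"big": 3, "small": 1})
counts = Counter(wrr.pick() for _ in range(8))
print(counts)  # big is picked three times as often as small
```

Over any full cycle of picks (here, every 4 requests), each server receives exactly its weight's share.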
Algorithm 2: Least Connections
Round-robin breaks down when requests take wildly different amounts of time: slow requests can pile up on one server while round-robin keeps sending it more. Least-connections fixes this by routing each new request to the server with the fewest active connections, so load naturally evens out.
Use least-connections for long-lived or uneven workloads (WebSocket connections, streaming, slow database-backed endpoints) where request duration varies a lot.
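A minimal sketch of the bookkeeping involved: the balancer counts active connections per server, increments on dispatch, and decrements when a request finishes. The class and method names (`LeastConnectionsBalancer`, `acquire`, `release`) are hypothetical.

```python
class LeastConnectionsBalancer:
    """Routes each new request to the server with the fewest active connections."""

    def __init__(self, servers):
        self._active = {server: 0 for server in servers}
        if not self._active:
            raise ValueError("need at least one server")

    def acquire(self):
        """Pick the least-loaded server and count the new connection."""
        # Ties go to the server listed first.
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server):
        """Call when a request completes or a connection closes."""
        if self._active[server] > 0:
            self._active[server] -= 1


lb = LeastConnectionsBalancer(["a", "b"])
slow = lb.acquire()   # goes to "a" and stays open
fast = lb.acquire()   # goes to "b"
lb.release(fast)      # the fast request finishes
nxt = lb.acquire()    # "a" is still busy, so this goes to "b" again
print(slow, fast, nxt)
```

Plain round-robin would have sent the third request to the server still holding the slow connection.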
Algorithm 3: Hash-Based (Sticky) Routing
Sometimes you want the same client to always reach the same server, for in-memory session state or cache locality. IP hashing or consistent hashing routes based on a hash of the client IP (or another key), so the same input always maps to the same server. With plain modulo hashing, changing the pool size remaps most keys; consistent hashing moves only roughly the share of keys owned by the server that was added or removed.
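The sketch below shows one common way to build a consistent hash ring, using virtual nodes to spread each server around the ring. The class name `ConsistentHashRing`, the `vnodes` parameter, and the choice of SHA-256 are assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to servers via a sorted ring of hashed virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # hash position -> server name
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, server):
        for i in range(self._vnodes):
            point = self._hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        """Return the server owning the first ring point at or after the key's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing(["s1", "s2", "s3"])
keys = [f"10.0.0.{i}" for i in range(200)]
before = {k: ring.lookup(k) for k in keys}
ring.add("s4")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding s4")
```

Every key that moves lands on the newly added server; keys never shuffle between the servers that were already present.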
Health Checks & Automatic Failover
A load balancer continuously probes each backend (an HTTP GET /health, a TCP connect, etc.). If a server stops responding, it is pulled out of rotation automatically, so a single dead machine never takes down the service. When the server recovers, it is added back. This is what turns a pile of servers into a self-healing service.
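The probe-and-eject loop can be sketched as follows. The failure threshold (ejecting a backend only after consecutive failed probes) is an assumption added here to avoid flapping on a single dropped probe; the names `HealthCheckedPool`, `tcp_probe`, and `fail_threshold` are hypothetical. The demo injects a fake probe so it runs without a network.

```python
import socket


def tcp_probe(address, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False


class HealthCheckedPool:
    """Tracks which backends are in rotation based on probe results."""

    def __init__(self, backends, probe=tcp_probe, fail_threshold=2):
        self._backends = list(backends)
        self._probe = probe
        self._fail_threshold = fail_threshold
        self._failures = {b: 0 for b in self._backends}

    def run_checks(self):
        """Probe every backend once; a real balancer would run this on a timer."""
        for backend in self._backends:
            if self._probe(backend):
                self._failures[backend] = 0   # recovered: back in rotation
            else:
                self._failures[backend] += 1

    def healthy(self):
        """Backends that have not reached the consecutive-failure threshold."""
        return [b for b in self._backends
                if self._failures[b] < self._fail_threshold]


# Demo with a fake probe backed by a status table (no real network calls).
status = {("10.0.0.1", 80): True, ("10.0.0.2", 80): True}
pool = HealthCheckedPool(status, probe=lambda b: status[b])

status[("10.0.0.2", 80)] = False   # second backend goes down
pool.run_checks()
pool.run_checks()                  # second consecutive failure: ejected
during_outage = pool.healthy()

status[("10.0.0.2", 80)] = True    # backend recovers
pool.run_checks()
after_recovery = pool.healthy()
print(during_outage, after_recovery)
```

A scheduling algorithm would then choose only from `pool.healthy()`, which is what gives the service automatic failover.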
Layer 4 vs Layer 7
Layer 4 (transport) load balancers route by IP and port. They are extremely fast and protocol-agnostic, but blind to request content. Layer 7 (application) load balancers understand HTTP, so they can route by URL path (/api vs /static), host header, cookies, or other headers, at a small CPU cost.
| | Layer 4 | Layer 7 |
|---|---|---|
| Routes by | IP + port | HTTP path, host, headers, cookies |
| Speed | Very fast | Slightly slower (parses HTTP) |
| Features | Basic forwarding | Path routing, TLS termination, rewrites |
| Examples | AWS NLB, IPVS | Nginx, HAProxy, AWS ALB, Envoy |
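To make the Layer 7 column concrete, here is a minimal sketch of path-based routing: the balancer inspects the request path and selects a backend pool by longest matching prefix. The function name `choose_pool`, the route table, and the pool names are hypothetical; real proxies such as those listed above have their own configuration syntax.

```python
def choose_pool(path, routes, default_pool):
    """Return the pool whose prefix is the longest match for the request path.

    routes: list of (prefix, pool) pairs; prefixes have no trailing slash.
    """
    best_prefix, best_pool = "", default_pool
    for prefix, pool in routes:
        # Match whole path segments only, so "/api" does not match "/apiary".
        matches = path == prefix or path.startswith(prefix + "/")
        if matches and len(prefix) > len(best_prefix):
            best_prefix, best_pool = prefix, pool
    return best_pool


ROUTES = [
    ("/api", ["api-1", "api-2"]),
    ("/api/v2", ["api-v2-1"]),
    ("/static", ["static-1"]),
]
DEFAULT_POOL = ["web-1", "web-2"]

print(choose_pool("/api/users", ROUTES, DEFAULT_POOL))
print(choose_pool("/static/app.js", ROUTES, DEFAULT_POOL))
print(choose_pool("/", ROUTES, DEFAULT_POOL))
```

A Layer 4 balancer cannot make this decision because it never parses the HTTP request; it sees only addresses and ports.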
Choosing an Algorithm
| Algorithm | How it picks | Best for |
|---|---|---|
| Round-robin | Next server in rotation | Uniform servers, similar request cost |
| Weighted round-robin | Proportional to capacity | Mixed hardware sizes |
| Least connections | Fewest active connections | Long-lived / uneven requests |
| IP / consistent hash | Hash of client or key | Sticky sessions, cache locality |
Common Pitfalls
Single point of failure: the load balancer itself must be redundant (active-passive or active-active), or it becomes the thing that takes you down.
Sticky sessions everywhere: they undermine even load distribution and make failover lose state. Prefer stateless servers with shared session storage.
No health checks: without them, the balancer happily forwards traffic into a black hole.
Frequently Asked Questions
What is the difference between a load balancer and a reverse proxy?
A reverse proxy forwards client requests to backend servers and can add caching, TLS termination, and compression. A load balancer is a reverse proxy whose primary job is distributing traffic across many backends. In practice, tools like Nginx and HAProxy fill both roles.
Which load balancing algorithm is best?
There is no single best algorithm; it depends on your traffic. Use round-robin for uniform servers and short requests, least-connections for long-lived or variable requests, and hash-based routing when you need stickiness. Weighted variants handle mixed hardware.
Does a load balancer improve availability?
Yes. By spreading traffic across redundant servers and removing unhealthy ones via health checks, a load balancer lets the service survive individual server failures, provided the balancer itself is made redundant.
A load balancer turns a pile of servers into a single, resilient service. Add health checks and the whole thing heals itself.
- alokknight Engineering
