WebSockets in System Design: The Handshake, Full-Duplex Frames & Scaling (Visualized)
WebSockets give you a single, persistent, full-duplex connection between a client and a server so both sides can push data the instant it happens. This guide covers the HTTP Upgrade handshake, frames, ws:// vs wss://, how WebSockets beat polling, and the real challenges of scaling them, with live animations of each idea.
A WebSocket is a persistent, bidirectional communication channel between a client and a server that runs over a single long-lived TCP connection. Once it is open, either side can send a message at any time without making a new request: the server can push data to the browser the instant something changes, instead of waiting to be asked.
This is what makes WebSockets the backbone of real-time features: chat, live dashboards, multiplayer games, collaborative editors, and trading tickers. Plain HTTP is request/response: the client always speaks first. WebSockets break that rule, giving you a symmetric pipe where both ends are equals.
The Problem: HTTP Has to Keep Asking
Before WebSockets, the only way to get "live" data in a browser was to fake it. With short polling, the client sends a request every few seconds asking "any updates?" Most of those requests come back empty, wasting bandwidth, and a new update can wait up to a full polling interval before the client sees it. Long polling improves this: the server holds the request open until it has something to say, then responds and the client immediately reconnects. It works, but every message still costs a full HTTP round trip and a new request.
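To put rough numbers on the waste, here is a back-of-the-envelope sketch. The function name pollingCost and the sample figures (10,000 clients, a 5-second interval, one real update per client per minute) are assumptions chosen for illustration, not measurements.

```javascript
// Hypothetical estimate of short-polling cost; all inputs are illustrative.
function pollingCost({ clients, pollIntervalSec, updatesPerClientPerMin }) {
  const requestsPerSec = clients / pollIntervalSec;
  const pollsPerClientPerMin = 60 / pollIntervalSec;
  // At most one poll per update can return fresh data; the rest come back empty.
  const usefulPolls = Math.min(updatesPerClientPerMin, pollsPerClientPerMin);
  const emptyFraction = 1 - usefulPolls / pollsPerClientPerMin;
  return { requestsPerSec, emptyFraction, worstCaseLatencySec: pollIntervalSec };
}

const cost = pollingCost({
  clients: 10000,
  pollIntervalSec: 5,
  updatesPerClientPerMin: 1,
});
console.log(cost);
```

Under these assumed inputs the server absorbs 2,000 requests per second, and 11 of every 12 polls return nothing new.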
The Solution: The HTTP Upgrade Handshake
A WebSocket connection does not start as something exotic: it starts as an ordinary HTTP GET request that politely asks to be upgraded. The client sends headers like Upgrade: websocket, Connection: Upgrade, and a random Sec-WebSocket-Key. If the server agrees, it replies with status 101 Switching Protocols and a matching Sec-WebSocket-Accept. From that moment the TCP connection stops speaking HTTP and starts speaking the WebSocket framing protocol: the same socket, a new language.
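The "matching" Sec-WebSocket-Accept is not arbitrary. RFC 6455 defines it as the base64-encoded SHA-1 hash of the client's key concatenated with a fixed GUID. A minimal Node.js sketch of that derivation follows; the function name computeAccept is made up for this example.

```javascript
const crypto = require('node:crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Sec-WebSocket-Accept = base64( SHA-1( Sec-WebSocket-Key + GUID ) )
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WS_GUID)
    .digest('base64');
}

// The key is the sample value used in this section's handshake exchange.
console.log(computeAccept('dGhlIHNhbXBsZSBub25jZQ=='));
```

The client performs the same computation and rejects the connection if the server's value differs, which proves the server actually understood the WebSocket request rather than echoing a cached HTTP response.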
GET /chat HTTP/1.1
Host: app.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
--- server replies ---
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Frames, Full-Duplex, and ws:// vs wss://
After the handshake, data travels as frames rather than HTTP requests. A frame has a tiny header, made up of an opcode (text, binary, ping, pong, close), a length field, and (from the client) a masking key, followed by the payload. Because there is no per-message HTTP overhead, frames are cheap: the header adds only 2 to 14 bytes per message. The connection is full-duplex, meaning the client and server can send frames simultaneously without taking turns.
Just as HTTP has HTTPS, WebSockets come in two flavours. ws:// is plaintext; wss:// runs the same protocol over TLS, encrypting every frame. In production you should always use wss://, not only for confidentiality, but because corporate proxies and firewalls are far more likely to let an encrypted upgrade pass through cleanly.
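To make the header layout concrete, here is a minimal sketch that builds a masked client frame and parses it back, following the RFC 6455 layout. The names encodeClientFrame and decodeFrame are hypothetical, and the sketch assumes a single unfragmented frame with a payload under 64 KiB (it deliberately omits the 64-bit length form).

```javascript
const crypto = require('node:crypto');

const OPCODES = { text: 0x1, binary: 0x2, close: 0x8, ping: 0x9, pong: 0xa };

// Build one unfragmented, masked frame, as a client would send it.
function encodeClientFrame(opcode, payload) {
  const body = Buffer.from(payload);
  if (body.length > 0xffff) throw new RangeError('sketch omits 64-bit lengths');
  const extended = body.length >= 126;
  const header = Buffer.alloc(extended ? 4 : 2);
  header[0] = 0x80 | opcode; // FIN bit + opcode
  header[1] = 0x80 | (extended ? 126 : body.length); // MASK bit + length
  if (extended) header.writeUInt16BE(body.length, 2);
  const maskKey = crypto.randomBytes(4);
  const masked = Buffer.alloc(body.length);
  for (let i = 0; i < body.length; i++) masked[i] = body[i] ^ maskKey[i % 4];
  return Buffer.concat([header, maskKey, masked]);
}

// Parse a single complete frame back into its parts.
function decodeFrame(frame) {
  const fin = (frame[0] & 0x80) !== 0;
  const opcode = frame[0] & 0x0f;
  const isMasked = (frame[1] & 0x80) !== 0;
  let length = frame[1] & 0x7f;
  let offset = 2;
  if (length === 127) throw new RangeError('sketch omits 64-bit lengths');
  if (length === 126) {
    length = frame.readUInt16BE(2);
    offset = 4;
  }
  let maskKey = null;
  if (isMasked) {
    maskKey = frame.subarray(offset, offset + 4);
    offset += 4;
  }
  // Copy the payload so unmasking does not mutate the input frame.
  const payload = Buffer.from(frame.subarray(offset, offset + length));
  if (maskKey) {
    for (let i = 0; i < payload.length; i++) payload[i] ^= maskKey[i % 4];
  }
  return { fin, opcode, headerBytes: offset, payload };
}

const frame = encodeClientFrame(OPCODES.text, 'hello');
const decoded = decodeFrame(frame);
console.log(decoded.headerBytes, decoded.payload.toString());
```

For the five-byte payload above, the header is six bytes: two base bytes plus the four-byte masking key. A server-to-client frame carries no mask, so the same message would need only two.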
WebSockets vs Polling, Long-Polling, and SSE
WebSockets are not the only way to get data to a browser quickly. Server-Sent Events (SSE) keep one HTTP connection open and stream events from server to client, but they are one-directional (server to client only) and text-only. WebSockets are bidirectional and support binary. The table below compares the four common approaches.
| | Short polling | Long polling | SSE | WebSocket |
|---|---|---|---|---|
| Direction | Client pull | Client pull | Server to client | Bidirectional |
| Connection | New per request | Re-opened each msg | One held HTTP stream | One persistent socket |
| Latency | Up to poll interval | Low | Low | Lowest |
| Overhead | High (full HTTP each time) | Medium | Low | Lowest (tiny frames) |
| Payload | Text/binary | Text/binary | Text only | Text + binary |
| Best for | Rare updates | Occasional updates | Live feeds, notifications | Chat, games, trading |
Keeping Connections Alive: Heartbeats
A persistent connection can silently die (a laptop sleeps, a proxy times out an idle socket, a phone switches from Wi-Fi to cellular) without either side noticing. The WebSocket protocol solves this with ping/pong control frames: the server periodically sends a ping and expects a pong back within a timeout. Miss too many pongs and the connection is declared dead and closed, freeing resources and triggering the client to reconnect. These heartbeats are essential for detecting half-open connections that would otherwise leak forever.
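The bookkeeping behind a heartbeat can be sketched without any network code. The HeartbeatMonitor class below is a hypothetical helper, and it assumes each connection is any object exposing ping() and terminate() methods; the fake connections exist only so the example runs on its own.

```javascript
// Tracks liveness for a set of connections. Each connection is assumed to be
// any object exposing ping() and terminate() (a hypothetical interface).
class HeartbeatMonitor {
  constructor(maxMissedPongs = 2) {
    this.maxMissedPongs = maxMissedPongs;
    this.missed = new Map(); // connection -> pings sent since the last pong
  }

  add(conn) {
    this.missed.set(conn, 0);
  }

  // Call this whenever a pong arrives from conn.
  onPong(conn) {
    if (this.missed.has(conn)) this.missed.set(conn, 0);
  }

  // Call this on a timer: drop silent connections, ping the rest.
  sweep() {
    for (const [conn, missedCount] of this.missed) {
      if (missedCount >= this.maxMissedPongs) {
        this.missed.delete(conn);
        conn.terminate();
      } else {
        this.missed.set(conn, missedCount + 1);
        conn.ping();
      }
    }
  }
}

// Fake connections that only record what happened to them.
const makeFakeConn = () => ({
  pings: 0,
  closed: false,
  ping() { this.pings += 1; },
  terminate() { this.closed = true; },
});

const monitor = new HeartbeatMonitor(2);
const healthy = makeFakeConn();
const silent = makeFakeConn();
monitor.add(healthy);
monitor.add(silent);

for (let tick = 0; tick < 3; tick++) {
  monitor.sweep();
  monitor.onPong(healthy); // only the healthy peer answers its ping
}
console.log({ healthyClosed: healthy.closed, silentClosed: silent.closed });
```

In a real server, sweep() would run from something like setInterval at a fixed period (30 seconds is a common choice, not a requirement), and onPong would be wired to the library's pong notification.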
Scaling WebSockets: The Hard Part
HTTP requests are stateless and short, so any server can handle any request. WebSockets are the opposite: a connection is pinned to one server for its entire lifetime, sometimes for hours. This creates three distinct scaling problems.
Sticky sessions: the load balancer must keep routing each client to the server holding its socket, since the connection state lives in that one process.
Connection limits: each open socket consumes a file descriptor and memory, so a single box tops out at tens or hundreds of thousands of connections. You scale out, not up.
Fan-out: the hardest one. If 50,000 chat users are spread across 10 servers and someone posts a message, the server that received it holds only a fraction of the recipients. It must broadcast to the other servers, usually through a publish/subscribe backplane like Redis Pub/Sub or Kafka, so every server can deliver to its own local connections.
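The fan-out pattern can be sketched in a few lines. In this illustration an in-process EventEmitter stands in for the backplane so the example runs by itself; in production that role would be played by something like Redis Pub/Sub. The ChatServer class, the 'chat' channel name, and the fake sockets are all hypothetical.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for the backplane (an assumption made to keep the sketch runnable).
const backplane = new EventEmitter();

class ChatServer {
  constructor(name) {
    this.name = name;
    this.localSockets = new Set(); // connections pinned to this server
    // Every server subscribes, then delivers only to its own sockets.
    backplane.on('chat', (message) => {
      for (const socket of this.localSockets) socket.send(message);
    });
  }

  accept(socket) {
    this.localSockets.add(socket);
  }

  // A message from a local client is published, not sent to peers directly.
  handleIncoming(message) {
    backplane.emit('chat', message);
  }
}

// Fake sockets that just record what they were sent.
const makeFakeSocket = () => ({
  inbox: [],
  send(message) { this.inbox.push(message); },
});

const serverA = new ChatServer('A');
const serverB = new ChatServer('B');
const alice = makeFakeSocket();
const bob = makeFakeSocket();
serverA.accept(alice);
serverB.accept(bob);

// Alice is connected to server A, yet Bob on server B still gets the message.
serverA.handleIncoming('hello from alice');
console.log(bob.inbox);
```

The point of the design is that no server needs to know where any recipient is connected: it publishes once, and each subscriber handles delivery to the sockets it holds.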
Tooling: Socket.IO and ws
In Node.js, ws is a thin, fast library that gives you the raw WebSocket protocol and nothing more, which is ideal when you want full control. Socket.IO is a higher-level library that adds the features real apps need: automatic reconnection, heartbeats, rooms and namespaces for organizing connections, an event-based API, a Redis adapter for the multi-server fan-out described above, and a long-polling fallback for environments where WebSockets are blocked. Note that Socket.IO layers its own protocol on top of the transport, so a Socket.IO server expects a Socket.IO client rather than a plain WebSocket one. Browsers also ship a native WebSocket object, so a simple client talking to a plain WebSocket server needs no library at all.
// Native browser client, no library needed
let retryDelay = 1000; // ms; doubles after each drop, capped at 30 s

function connect() {
  const socket = new WebSocket('wss://app.example.com/chat');
  socket.addEventListener('open', () => {
    retryDelay = 1000; // reset the backoff once connected
    socket.send(JSON.stringify({ type: 'join', room: 'general' }));
  });
  socket.addEventListener('message', (event) => {
    const msg = JSON.parse(event.data);
    console.log('pushed from server:', msg);
  });
  socket.addEventListener('close', () => {
    // connection dropped: reconnect with exponential backoff
    setTimeout(connect, retryDelay);
    retryDelay = Math.min(retryDelay * 2, 30000);
  });
}

connect();
When to Reach for WebSockets
Use WebSockets when you need low-latency, two-way, frequent communication: chat and messaging, multiplayer game state, collaborative editors, live trading and price feeds, and interactive dashboards. If your updates flow only from server to client and are infrequent, SSE is simpler. If updates are rare, plain polling may be all you need. The cost of a WebSocket is operational complexity (sticky routing, fan-out, and connection management), so reach for it when the real-time payoff justifies that cost.
Frequently Asked Questions
Are WebSockets faster than HTTP?
For repeated, real-time messaging, yes. After the one-time handshake, each message is a tiny frame with almost no overhead, and there is no connection setup per message. For a single one-off request, plain HTTP is simpler and just as fast. WebSockets win when many messages flow over the same connection.
What is the difference between WebSockets and SSE?
Server-Sent Events stream data in one direction only (server to client) over a held HTTP connection, and carry text only. WebSockets are full-duplex (both directions) and support binary data. Choose SSE for simple live feeds and notifications; choose WebSockets when the client also needs to send frequently, as in chat or games.
Why do WebSockets need sticky sessions?
Because the open connection's state lives inside one specific server process for the connection's whole lifetime. The load balancer must keep routing that client to the same server, or its frames would arrive at a server that knows nothing about it. To share messages across servers you add a Pub/Sub backplane for fan-out rather than trying to move the connection.
HTTP makes the client ask again and again. A WebSocket opens the pipe once and lets the server speak the moment it has something to say. The hard part isn't the socket, it's fanning one message out to a million of them.
- alokknight Engineering
