WebRTC in System Design: Signaling, NAT Traversal, STUN/TURN & SFU (Visualized)
WebRTC lets browsers exchange audio, video, and data directly, peer-to-peer, with no plugins. This guide covers signaling (SDP offer/answer), NAT traversal with STUN and TURN, ICE candidate gathering, data channels, DTLS/SRTP security, and multi-party topologies (mesh vs SFU vs MCU).
WebRTC (Web Real-Time Communication) is a browser and mobile technology that lets two endpoints exchange audio, video, and arbitrary data directly with each other, peer-to-peer, without plugins or a media server sitting in the middle. It is the engine behind in-browser video calls, screen sharing, live game streaming, and low-latency file transfer.
The promise is simple: two browsers send media straight to each other so latency stays low and your servers stay cheap. The reality is harder, because both peers usually sit behind firewalls and NATs that block unsolicited inbound traffic. WebRTC's design is mostly about solving that problem: discovering a path between two machines that were never meant to talk directly.
The Three Pillars: Signaling, Connecting, Securing
Every WebRTC session goes through three stages. (1) Signaling: the peers exchange session metadata (codecs, network candidates) over a channel you provide. (2) Connecting: ICE uses STUN and TURN to find a working network path through NATs. (3) Securing: the media path is encrypted end-to-end with DTLS and SRTP before any audio or video flows. WebRTC standardizes pillars 2 and 3; pillar 1 is left entirely to you.
Signaling: The SDP Offer / Answer Handshake
Before two peers can connect, they must agree on what they are sending: codecs, media formats, the DTLS certificate fingerprints used to secure the session, and the network addresses to try. This negotiation is carried in SDP (Session Description Protocol) blobs. The caller creates an offer and applies it as its local description; the callee applies that offer as its remote description, replies with an answer, and the caller applies the answer as its remote description.
Crucially, WebRTC does not define how those blobs get from one peer to the other. You ship them over a channel you already control: a WebSocket, HTTP, or even a QR code. In practice this channel is usually a small signaling server. It only brokers the introduction; once the peers are connected, media never touches it.
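Because the transport is left open, the relay logic itself can be very small. The sketch below is a hypothetical in-memory room that forwards each signaling message to the other members. `SignalingRoom`, `join`, `leave`, and `relay` are illustrative names, not part of WebRTC, and the SDP strings are placeholders; a real deployment would put a WebSocket or HTTP endpoint in front of something like this.

```javascript
// Hypothetical in-memory signaling room: it forwards opaque messages
// (SDP offers/answers, ICE candidates) and never sees any media.
class SignalingRoom {
  constructor() {
    this.members = new Map(); // peerId -> deliver(message) callback
  }

  // Register a peer; `deliver` stands in for "write to that peer's socket".
  join(peerId, deliver) {
    this.members.set(peerId, deliver);
  }

  leave(peerId) {
    this.members.delete(peerId);
  }

  // Forward a message from one peer to every other member of the room.
  // Returns the number of peers the message was delivered to.
  relay(fromPeerId, message) {
    let delivered = 0;
    for (const [peerId, deliver] of this.members) {
      if (peerId === fromPeerId) continue;
      deliver({ from: fromPeerId, ...message });
      delivered += 1;
    }
    return delivered;
  }
}

// Usage: two peers exchange an offer and an answer through the room.
const room = new SignalingRoom();
const aliceInbox = [];
const bobInbox = [];
room.join('alice', msg => aliceInbox.push(msg));
room.join('bob', msg => bobInbox.push(msg));
room.relay('alice', { sdp: { type: 'offer', sdp: 'placeholder offer SDP' } });
room.relay('bob', { sdp: { type: 'answer', sdp: 'placeholder answer SDP' } });
```

The room treats every message as opaque, which matches the point above: the signaling path only introduces the peers and carries no audio or video.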

    // Caller side: build an offer and ship it over your own signaling channel
    const pc = new RTCPeerConnection({
      iceServers: [
        { urls: 'stun:stun.l.google.com:19302' },
        { urls: 'turn:turn.example.com', username: 'u', credential: 'p' }
      ]
    });

    // Local media -> peer connection
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
    stream.getTracks().forEach(t => pc.addTrack(t, stream));

    // Trickle ICE candidates to the other peer as they are found
    pc.onicecandidate = e => { if (e.candidate) signaling.send({ ice: e.candidate }); };

    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    signaling.send({ sdp: offer }); // <- your WebSocket, not WebRTC

NAT Traversal: STUN, TURN, and ICE
Most devices live behind a NAT (Network Address Translation) that hides them behind a single public IP and rewrites ports. A peer does not even know its own public address, let alone how to reach another peer behind a different NAT. WebRTC solves this with two helper services and an algorithm that ties them together.
A STUN (Session Traversal Utilities for NAT) server answers one question: "what public IP and port does the world see me as?" The peer asks STUN, learns its public mapping, and shares it as a candidate. If both NATs cooperate, the peers can then send packets to each other's mapped addresses and "hole-punch" a direct path, with no relay needed.
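The mapping a peer learns from STUN is shared as a server-reflexive (`srflx`) candidate. The sketch below parses the standard `candidate:` attribute string to show what such a candidate carries. `parseCandidate` is a hypothetical helper, and the addresses and ports are documentation examples rather than real hosts.

```javascript
// Parse an ICE candidate attribute of the form
//   candidate:<foundation> <component> <transport> <priority> <ip> <port> typ <type> [raddr <ip> rport <port>]
// Hypothetical helper for illustration; browsers expose comparable fields on RTCIceCandidate.
function parseCandidate(line) {
  const parts = line.trim().replace(/^a=/, '').split(/\s+/);
  if (parts.length < 8 || !parts[0].startsWith('candidate:') || parts[6] !== 'typ') {
    throw new Error('not an ICE candidate line');
  }
  const candidate = {
    foundation: parts[0].slice('candidate:'.length),
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // host | srflx | prflx | relay
  };
  // Optional extension pairs such as "raddr 192.168.1.10 rport 50000".
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === 'raddr') candidate.relatedAddress = parts[i + 1];
    if (parts[i] === 'rport') candidate.relatedPort = Number(parts[i + 1]);
  }
  return candidate;
}

// The public mapping (203.0.113.7:54321) is what the remote peer will try;
// raddr/rport record the private address it was derived from.
const srflx = parseCandidate(
  'candidate:842163049 1 udp 1694498815 203.0.113.7 54321 typ srflx raddr 192.168.1.10 rport 50000'
);
```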
When NATs are too strict for hole-punching (symmetric NATs, corporate firewalls), there is a fallback: a TURN (Traversal Using Relays around NAT) server. Both peers connect out to TURN, which relays all their media. It always works but costs bandwidth and adds a hop, so it is the last resort, typically needed for 10-20% of connections.
ICE (Interactive Connectivity Establishment) is the algorithm that orchestrates all of this. Each peer gathers candidates (its local address, its STUN-discovered public address, and a TURN relay address), exchanges them via signaling, then pairs them up and runs connectivity checks. ICE picks the best pair that actually works, preferring direct paths over relays.
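Since the relay is only used when direct paths fail, it can be hard to tell whether your TURN server works at all. One way to check is to restrict ICE to relay candidates with the standard `iceTransportPolicy` setting. The sketch below builds such a configuration; `buildIceConfig` and the shape of its `turn` argument are assumptions made for illustration, and the server URLs are placeholders.

```javascript
// Build an RTCPeerConnection configuration object.
// `turn` is a hypothetical { url, username, credential } record from your own infrastructure.
function buildIceConfig({ stunUrl, turn, forceRelay = false }) {
  if (forceRelay && !turn) {
    throw new Error('relay-only policy needs a TURN server');
  }
  const iceServers = [];
  if (stunUrl) iceServers.push({ urls: stunUrl });
  if (turn) {
    iceServers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  }
  return {
    iceServers,
    // 'all' lets ICE try direct paths first; 'relay' restricts it to TURN candidates.
    iceTransportPolicy: forceRelay ? 'relay' : 'all',
  };
}

const relayOnlyConfig = buildIceConfig({
  stunUrl: 'stun:stun.l.google.com:19302',
  turn: { url: 'turn:turn.example.com', username: 'u', credential: 'p' },
  forceRelay: true,
});
// In a browser: const pc = new RTCPeerConnection(relayOnlyConfig);
```

If a call connects under the relay-only policy, the TURN path is working; production traffic would normally use the default `'all'` policy so that direct paths are still preferred.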
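The preference for direct paths comes from how candidates are prioritized. The ICE specification (RFC 8445) recommends a formula that combines a type preference, a local preference, and the component ID. The sketch below applies that formula with the recommended type preferences; `candidatePriority` and `rankCandidates` are hypothetical helper names, and a real agent goes further by ranking candidate pairs and running connectivity checks on them.

```javascript
// Recommended type preferences from the ICE specification (RFC 8445): higher is preferred.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  const typePreference = TYPE_PREFERENCE[type];
  if (typeof typePreference !== 'number') {
    throw new Error(`unknown candidate type: ${type}`);
  }
  return 2 ** 24 * typePreference + 2 ** 8 * localPreference + (256 - componentId);
}

// Simplified ranking of one peer's own candidates, highest priority first.
function rankCandidates(candidates) {
  return [...candidates].sort(
    (a, b) => candidatePriority(b.type) - candidatePriority(a.type)
  );
}

const ranked = rankCandidates([
  { type: 'relay', address: '198.51.100.20' },
  { type: 'host', address: '192.168.1.10' },
  { type: 'srflx', address: '203.0.113.7' },
]);
// ranked order by type: host, srflx, relay
```

Because the relay type preference is 0, a TURN candidate always ranks below host and server-reflexive candidates, which is why the relay is only selected when nothing more direct passes its checks.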
STUN vs TURN at a Glance
| | STUN | TURN |
|---|---|---|
| Role | Discovers your public IP/port | Relays all media between peers |
| Media path | Direct peer-to-peer | Through the relay server |
| Bandwidth cost | Tiny (one query) | High (carries every byte) |
| Success rate | Most NATs | Always (last-resort fallback) |
The Peer Connection and Data Channels
The central object is the RTCPeerConnection. It manages ICE, encryption, and the media transports. On top of it you attach media tracks (camera, mic, screen) carried over SRTP, and data channels (RTCDataChannel) for arbitrary bytes. Data channels run over SCTP and can be configured as reliable/ordered (like TCP) or unreliable/unordered (like UDP), which makes them a good fit for game state, file transfer, or chat.
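The reliability mode is chosen when the channel is created, through the options passed to `pc.createDataChannel`. The sketch below maps a few use cases to those options. The preset names and the `openChannel` helper are hypothetical, and the 150 ms lifetime is an arbitrary example value; `ordered`, `maxRetransmits`, and `maxPacketLifeTime` are the standard option names.

```javascript
// Hypothetical presets mapping a use case to RTCDataChannel options.
// Note: maxRetransmits and maxPacketLifeTime must not be set together.
const CHANNEL_PRESETS = {
  // Reliable and ordered (the default behaviour): chat, file transfer.
  reliable: { ordered: true },
  // No retransmissions, no ordering: frequently refreshed game state.
  lossy: { ordered: false, maxRetransmits: 0 },
  // Retry for at most 150 ms, then give up: stale input events are useless.
  timed: { ordered: false, maxPacketLifeTime: 150 },
};

// `pc` is an RTCPeerConnection (or anything with a compatible createDataChannel).
function openChannel(pc, label, preset = 'reliable') {
  if (!Object.prototype.hasOwnProperty.call(CHANNEL_PRESETS, preset)) {
    throw new Error(`unknown preset: ${preset}`);
  }
  return pc.createDataChannel(label, { ...CHANNEL_PRESETS[preset] });
}

// Usage in a browser:
//   const state = openChannel(pc, 'game-state', 'lossy');
//   state.onopen = () => state.send(JSON.stringify({ x: 10, y: 4 }));
```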
Security: DTLS and SRTP
WebRTC encryption is mandatory; there is no unencrypted mode. Once ICE finds a path, the peers run a DTLS (Datagram TLS) handshake to authenticate each other and derive keys. Media is then protected with SRTP (Secure RTP) and data channels ride over DTLS. Because the keys are negotiated directly between peers, even a TURN relay only ever sees ciphertext.
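The DTLS handshake is tied back to signaling through certificate fingerprints: each session description carries an `a=fingerprint` line, and a peer only accepts a DTLS certificate whose hash matches the fingerprint it received. The sketch below pulls those lines out of an SDP string. `extractFingerprints` is a hypothetical helper, and the sample SDP is heavily shortened with a made-up hash value.

```javascript
// Collect every "a=fingerprint:<algorithm> <hex pairs>" line from an SDP blob.
function extractFingerprints(sdp) {
  const fingerprints = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=fingerprint:(\S+)\s+([0-9A-Fa-f:]+)\s*$/.exec(line);
    if (match) {
      fingerprints.push({
        algorithm: match[1].toLowerCase(),
        value: match[2].toUpperCase(),
      });
    }
  }
  return fingerprints;
}

// Shortened, made-up SDP for illustration only.
const sampleSdp = [
  'v=0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=setup:actpass',
  'a=fingerprint:sha-256 12:34:56:78:9a:bc:de:f0:12:34:56:78:9a:bc:de:f0:12:34:56:78:9a:bc:de:f0:12:34:56:78:9a:bc:de:f0',
].join('\r\n');

const [fingerprint] = extractFingerprints(sampleSdp);
```

This is also why the signaling channel should itself be protected (for example with TLS): whoever can rewrite the fingerprint in transit can insert themselves into the handshake.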
Multi-Party Topologies: Mesh vs SFU vs MCU
Pure peer-to-peer is perfect for one-to-one calls, but it does not scale to a group. With N participants in a full mesh, every peer sends its stream to every other peer: that is N×(N−1)/2 peer connections carrying N×(N−1) one-way streams, with N−1 uploads per person. A laptop on home WiFi typically falls over at around 4-5 participants. Group calling needs a server in the media path.
An SFU (Selective Forwarding Unit) is the modern answer: each peer uploads its stream once to the SFU, which forwards copies to everyone else. It does not decode or mix; it just routes, so it stays cheap and keeps streams separate (good for adaptive quality). An MCU (Multipoint Control Unit) goes further: it decodes every stream, composites them into a single mixed stream, and sends one stream back to each peer, which is easy on clients but CPU-heavy on the server.
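The growth is easy to see with numbers. Counting each pair of peers once gives N(N−1)/2 peer connections, which together carry N(N−1) one-way streams. The sketch below computes those figures; `meshLoad` is a hypothetical helper, and the 1500 kbps per-stream bitrate is an assumed value for illustration.

```javascript
// Full-mesh cost for n participants, each sending one stream of `kbpsPerStream`.
function meshLoad(n, kbpsPerStream) {
  if (!Number.isInteger(n) || n < 2) {
    throw new Error('need at least 2 participants');
  }
  return {
    peerConnections: (n * (n - 1)) / 2, // one bidirectional connection per pair
    totalStreams: n * (n - 1),          // each peer sends to every other peer
    uploadsPerPeer: n - 1,
    uploadKbpsPerPeer: (n - 1) * kbpsPerStream,
  };
}

const fivePeers = meshLoad(5, 1500);
// fivePeers.peerConnections === 10
// fivePeers.totalStreams === 20
// fivePeers.uploadKbpsPerPeer === 6000
```

Under that assumed bitrate, each of five participants has to sustain 6 Mbps of upload while also encoding and sending four copies of its stream, which is where consumer uplinks start to struggle.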
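The routing role of an SFU can be sketched in a few lines. In the hypothetical `SelectiveForwarder` below, each receiver subscribes to the senders it wants, and a published packet is passed through untouched to the receivers that asked for it. The class, its method names, and the string packets are all assumptions made for illustration; a production SFU also handles RTP, congestion feedback, and encryption.

```javascript
// Hypothetical selective forwarding core: packets are routed, never decoded or mixed.
class SelectiveForwarder {
  constructor() {
    this.receivers = new Map(); // receiverId -> { onPacket, wanted: Set<senderId> }
  }

  join(receiverId, onPacket) {
    this.receivers.set(receiverId, { onPacket, wanted: new Set() });
  }

  // A receiver chooses which senders it wants, e.g. only the visible video tiles.
  subscribe(receiverId, senderId) {
    const receiver = this.receivers.get(receiverId);
    if (!receiver) throw new Error(`unknown receiver: ${receiverId}`);
    receiver.wanted.add(senderId);
  }

  // One upload from the sender fans out to every subscribed receiver.
  // Returns the number of copies forwarded.
  publish(senderId, packet) {
    let forwarded = 0;
    for (const [receiverId, receiver] of this.receivers) {
      if (receiverId === senderId || !receiver.wanted.has(senderId)) continue;
      receiver.onPacket(senderId, packet); // same bytes, untouched
      forwarded += 1;
    }
    return forwarded;
  }
}

const sfu = new SelectiveForwarder();
const received = { a: [], b: [], c: [] };
for (const id of ['a', 'b', 'c']) {
  sfu.join(id, (senderId, packet) => received[id].push({ senderId, packet }));
}
sfu.subscribe('b', 'a');
sfu.subscribe('c', 'a');
sfu.subscribe('a', 'b');
const fanOut = sfu.publish('a', 'frame-1'); // a uploads once; b and c each get a copy
```

The sender's cost stays at one upload no matter how many receivers subscribe; the fan-out happens on the server's bandwidth instead of the client's.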
| Topology | Server work | Client upload | Best for |
|---|---|---|---|
| Mesh (P2P) | None (just signaling) | N-1 streams (heavy) | 1:1 and tiny groups (<=4) |
| SFU | Forward streams (light) | 1 stream | Most group calls, adaptive quality |
| MCU | Decode + mix (heavy) | 1 stream | Weak clients, recording, legacy bridges |
Common Use Cases
WebRTC powers browser video calls and conferencing, screen sharing, live customer-support and telehealth, cloud gaming and remote desktop (low-latency video plus an input data channel), peer-to-peer file transfer, and IoT/camera streaming. Anywhere you need sub-second, encrypted media in a browser without a plugin, WebRTC is the standard.
Frequently Asked Questions
Does WebRTC need a server?
Media can flow directly between peers, but you always need a signaling server to exchange the initial SDP offer/answer and ICE candidates, and you almost always need STUN (cheap) plus a TURN relay (for the ~10-20% of connections that cannot be made directly). Group calls additionally need an SFU or MCU in the media path.
What is the difference between STUN and TURN?
STUN only tells a peer its public IP/port so the two peers can connect directly; it carries no media. TURN actually relays the media when a direct path is impossible, so it consumes real bandwidth. ICE tries STUN-based direct paths first and only falls back to TURN as a last resort.
Is WebRTC encrypted by default?
Yes. Encryption is mandatory and cannot be turned off. Peers run a DTLS handshake to derive keys, then protect media with SRTP and data channels over DTLS. Even when traffic passes through a TURN relay, the relay only sees encrypted bytes, never the plaintext audio or video.
WebRTC's hard part was never sending the video; it was finding a path between two machines the internet was designed to keep apart. Signaling introduces them, ICE finds the route, and DTLS keeps it private.
- alokknight Engineering
