API Gateway in System Design: Routing, Auth, Rate Limiting & More (Visualized)
An API gateway is the single front door to your microservices architecture, handling routing, authentication, rate limiting, TLS termination, and response aggregation in one place so every downstream service stays lean. This guide covers how it works, when to use it, and how it differs from a load balancer or BFF.
An API gateway is a server that acts as the single entry point for all client requests in a microservices system, centralising cross-cutting concerns (authentication, authorisation, rate limiting, TLS termination, request/response transformation, and response aggregation) so that individual backend services do not have to implement them independently.
Without a gateway, every service in your fleet must solve the same hard problems: validating JWTs, enforcing quotas, stripping internal headers, and stitching together data from sibling services. As the number of services grows, that duplicated logic becomes a maintenance nightmare. The gateway eliminates that duplication by handling every shared concern once, in one place, before a request ever reaches a service.
How a Request Flows Through an API Gateway
Every inbound request travels through an ordered pipeline of middleware plugins before reaching its target service. A typical pipeline looks like this: (1) TLS termination: the gateway decrypts HTTPS so backends can communicate over plain HTTP internally; (2) Authentication: the JWT or API key is validated and the caller identity is attached to the request; (3) Rate limiting: a counter per caller, IP, or plan is checked and, if the limit is exceeded, a 429 Too Many Requests is returned immediately; (4) Request transformation: headers are rewritten, payloads translated between formats, or query parameters normalised; (5) Routing: the gateway inspects the URL path or host header and forwards the request to the correct upstream microservice; (6) Response transformation: the upstream response is reshaped, compressed, or filtered before being returned to the client.
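The ordering matters because any stage can short-circuit the rest. The sketch below is a minimal illustration of that idea, not how any particular gateway is implemented: the `Request`, `Response`, and stage names are hypothetical, the "authentication" is a placeholder string check, and TLS and the transformation stages are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)
    client_id: Optional[str] = None


@dataclass
class Response:
    status: int
    body: str = ""


# A stage returns a Response to stop the pipeline, or None to continue.
Stage = Callable[[Request], Optional[Response]]


def authenticate(req: Request) -> Optional[Response]:
    # Placeholder check (assumption): a real gateway verifies a JWT or API key.
    token = req.headers.get("Authorization", "")
    if not token.startswith("Bearer "):
        return Response(401, "missing or malformed token")
    req.client_id = token[len("Bearer "):]
    return None


def make_rate_limiter(limit: int) -> Stage:
    counts: dict = {}  # in-memory stand-in for a shared store

    def rate_limit(req: Request) -> Optional[Response]:
        counts[req.client_id] = counts.get(req.client_id, 0) + 1
        if counts[req.client_id] > limit:
            return Response(429, "Too Many Requests")
        return None

    return rate_limit


def route(req: Request) -> Optional[Response]:
    # Hypothetical route table; forwarding is simulated with a string.
    upstreams = {"/users": "user-service", "/orders": "order-service"}
    for prefix, upstream in upstreams.items():
        if req.path.startswith(prefix):
            return Response(200, f"forwarded to {upstream}")
    return Response(404, "no route")


def handle(req: Request, pipeline: list) -> Response:
    # Run stages in order; the first one that answers ends the request.
    for stage in pipeline:
        resp = stage(req)
        if resp is not None:
            return resp
    return Response(500, "pipeline produced no response")
```

With `pipeline = [authenticate, make_rate_limiter(2), route]`, an unauthenticated request is rejected before it ever consumes rate-limit budget, and a rate-limited request never reaches the routing stage.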
API Gateway vs Load Balancer vs Backend-for-Frontend (BFF)
These three patterns are often confused because they all sit between clients and backends, yet they solve fundamentally different problems. A load balancer distributes incoming requests across identical replicas of a single service; in its basic form it has no concept of routing by URL path or business logic. An API gateway is aware of your entire service graph: it routes /users/* to the User service, /orders/* to the Order service, and enforces cross-cutting policies on every route. A Backend-for-Frontend (BFF) goes one step further and creates a purpose-built adapter layer per client type (mobile BFF, web BFF) that shapes responses specifically for that consumer's needs, often living behind the gateway itself.
| | Load Balancer | API Gateway | BFF |
|---|---|---|---|
| Primary job | Spread traffic across identical replicas | Single entry point for all microservices | Tailor API shape per client type |
| Routing logic | IP/port or round-robin | URL path, host header, method, headers | Client-specific aggregation & filtering |
| Auth / rate limit | No | Yes, centrally enforced | Sometimes (or delegates to gateway) |
| Protocol awareness | L4 or L7 | L7 (HTTP/gRPC) | L7, full application logic |
| Examples | Nginx, AWS NLB, HAProxy | Kong, AWS API GW, Apigee, Envoy | GraphQL BFF, Next.js API routes |
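The difference between the first two columns can be made concrete in a few lines. This is a simplified sketch under stated assumptions: the class names and replica addresses are hypothetical, and real products blur the line (many load balancers also offer L7 path routing). The balancer only answers "which replica?", while the gateway first answers "which service?" and then delegates.

```python
import itertools


class RoundRobinBalancer:
    """Load balancer: picks the next replica of ONE service."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)

    def pick(self):
        return next(self._cycle)


class Gateway:
    """Gateway: chooses the service by path, then balances across its replicas."""

    def __init__(self, routes):
        # Longest prefix first so a more specific route wins over a shorter one.
        self._routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)

    def resolve(self, path):
        for prefix, balancer in self._routes:
            # Match on whole path segments so "/usersx" does not hit "/users".
            if path == prefix or path.startswith(prefix + "/"):
                return balancer.pick()
        return None  # no route: the gateway would answer 404
```

For example, a gateway built with `{"/users": RoundRobinBalancer(["users-1:8080", "users-2:8080"])}` alternates between the two user replicas for `/users/...` paths and returns `None` for anything it has no route for.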
Key Responsibilities of an API Gateway
1. Authentication and Authorisation
The gateway verifies every inbound token (JWT, OAuth 2 access token, or API key) once, before the request touches a service. Downstream services receive a pre-validated caller identity (often injected as a trusted header such as X-User-Id), which means they no longer need to import an auth library, manage JWKS endpoints, or handle token refresh logic just to identify the caller. Centralising auth eliminates entire classes of security bugs that arise when each service reimplements token validation slightly differently.
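As a rough illustration of "verify once, then pass a trusted header", the sketch below validates an HS256-signed JWT using only the Python standard library. It is an assumption-laden teaching example, not production code: real gateways typically rely on a maintained JWT library and often on asymmetric keys fetched from a JWKS endpoint, and `sign_hs256` exists here only to mint demo tokens. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Demo helper only: mint an HS256 token to feed the verifier."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now=None):
    """Return the claims if the signature and exp check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        # Refuse "none" and any algorithm other than the one we expect.
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
        current = time.time() if now is None else now
        if not isinstance(claims, dict) or float(claims["exp"]) <= current:
            return None
        return claims
    except (ValueError, TypeError, KeyError):
        return None


def forward_headers(inbound: dict, token: str, secret: bytes):
    """Build upstream headers, or return None if the gateway should answer 401."""
    claims = verify_hs256(token, secret)
    if claims is None or claims.get("sub") is None:
        return None
    # Drop any client-supplied X-User-Id so the identity cannot be spoofed.
    headers = {k: v for k, v in inbound.items() if k.lower() != "x-user-id"}
    headers["X-User-Id"] = str(claims["sub"])
    return headers
```

Stripping the inbound `X-User-Id` before injecting the verified one is the detail that makes the header trustworthy: a downstream service can only rely on it if clients cannot set it themselves.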
2. Rate Limiting and Quotas
Rate limiting protects backend services from being overwhelmed by a single noisy client or a denial-of-service spike. The gateway maintains counters in a fast store (usually Redis) keyed on caller identity, IP address, or API key. When the counter exceeds the configured limit within the time window, the gateway immediately returns 429 Too Many Requests without ever touching the upstream service. Graduated response (warning headers at 80% usage, hard rejection at 100%) lets legitimate callers back off gracefully. Plan-based quotas let you offer free tiers with lower limits alongside paid tiers with higher ones.
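A fixed-window counter is one simple way to implement the behaviour described above; sliding-window and token-bucket algorithms are common alternatives. In this sketch a Python dict stands in for Redis, the class name is hypothetical, and the `X-RateLimit-*` names follow a widespread convention rather than a standard (`X-RateLimit-Warning` in particular is invented for illustration).

```python
import time


class FixedWindowLimiter:
    def __init__(self, limit, window_seconds=60, warn_ratio=0.8, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.warn_at = int(limit * warn_ratio)  # e.g. 80% of the limit
        self.clock = clock                      # injectable for testing
        # In-memory stand-in for Redis; a real store would expire old keys.
        self._counters = {}

    def check(self, caller):
        """Count one request and return (status_code, headers)."""
        now = self.clock()
        key = (caller, int(now // self.window))  # one counter per caller per window
        count = self._counters.get(key, 0) + 1
        self._counters[key] = count

        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(self.limit - count, 0)),
        }
        if count > self.limit:
            # Tell the caller how long until the current window rolls over.
            headers["Retry-After"] = str(self.window - int(now % self.window))
            return 429, headers
        if count >= self.warn_at:
            headers["X-RateLimit-Warning"] = "approaching limit"
        return 200, headers
```

Plan-based quotas fall out naturally from this shape: keep one limiter per plan (or look up `limit` from the caller's plan) and key the counter on the caller identity as before.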
3. TLS Termination
TLS termination at the gateway means the gateway handles the expensive TLS handshake with external clients, then forwards traffic over plain HTTP (or mTLS) on the private network. This removes CPU-heavy cryptographic work from every service, centralises certificate management in one place, and lets you rotate or upgrade TLS versions without touching any service code. Modern gateways support automatic certificate renewal via ACME/Let's Encrypt.
4. Request and Response Transformation
Gateways can rewrite request paths (/v1/users → /users), add or strip headers, translate between JSON and XML, inject correlation IDs, and decompress or recompress payloads. Response transformation lets you version your public API contract independently of your internal service API: you can evolve the service without breaking external clients by adapting the shape at the gateway layer.
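Three of these transformations (path rewrite, header stripping, correlation ID injection) fit in one small function. The sketch below is illustrative only; the function name and the internal header names are hypothetical, and a real gateway would drive this from per-route configuration.

```python
import uuid

# Hypothetical header names that should never cross the gateway boundary.
INTERNAL_HEADERS = {"x-internal-debug", "x-upstream-host"}


def transform_request(path, headers):
    """Return (upstream_path, upstream_headers) for an inbound request."""
    # Rewrite the public /v1/... prefix to the internal unversioned path.
    # Matching on "/v1/" avoids accidentally rewriting "/v10/...".
    if path == "/v1" or path.startswith("/v1/"):
        path = path[len("/v1"):] or "/"

    # Strip headers the client must not be able to set.
    out = {k: v for k, v in headers.items() if k.lower() not in INTERNAL_HEADERS}

    # Keep an existing correlation ID so traces span hops; otherwise mint one.
    if not any(k.lower() == "x-correlation-id" for k in out):
        out["X-Correlation-Id"] = str(uuid.uuid4())
    return path, out
```

Calling `transform_request("/v1/users/42", {"X-Internal-Debug": "1"})` yields the path `/users/42` and a header set that contains a fresh `X-Correlation-Id` but no `X-Internal-Debug`.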
5. Response Aggregation
A powerful gateway feature is response aggregation: the gateway fans out a single client request to multiple upstream services in parallel, waits for all responses, and stitches them into one composite response before replying to the client. This eliminates the need for clients to make many sequential round trips, dramatically reducing latency on mobile connections. Without aggregation, a product detail page might require separate calls to the User service, Inventory service, Pricing service, and Review service: four sequential round trips. With aggregation at the gateway, all four run in parallel and the client makes exactly one request.
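The fan-out can be sketched with `asyncio.gather`. Here `call_upstream` is a hypothetical stand-in that sleeps instead of making an HTTP call, and the payloads are invented; the point is only that four 100 ms calls complete in roughly 100 ms total rather than 400 ms.

```python
import asyncio
import time


async def call_upstream(service, delay, payload):
    """Stand-in for an HTTP call to an upstream; sleeps instead of doing I/O."""
    await asyncio.sleep(delay)
    return service, payload


async def product_page(product_id):
    # Fan out to all four upstreams at once instead of awaiting them one by one.
    results = await asyncio.gather(
        call_upstream("user", 0.1, {"name": "demo-user"}),
        call_upstream("inventory", 0.1, {"in_stock": 3}),
        call_upstream("pricing", 0.1, {"price": "19.99"}),
        call_upstream("reviews", 0.1, {"average": 4.5}),
    )
    # Stitch the (service, payload) pairs into one composite response.
    return {"product_id": product_id, **dict(results)}


def timed_product_page(product_id):
    """Run the aggregation and report how long the whole fan-out took."""
    start = time.perf_counter()
    page = asyncio.run(product_page(product_id))
    return page, time.perf_counter() - start
```

A production aggregator would also need per-upstream timeouts and a policy for partial failure (for example, returning the page without reviews if the Review service is slow); this sketch leaves both out.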
Cross-Cutting Concerns: Centralised vs Distributed
The central argument for an API gateway is the elimination of cross-cutting concern duplication. Without a gateway, every service in your fleet must independently implement authentication, logging, tracing, rate limiting, and CORS handling. With five services that is annoying; with fifty it is operationally catastrophic, because a single security patch must be applied to fifty code bases, tested independently, and deployed in coordination. The gateway collapses all of that into one place: fix it once, ship it once, and every service benefits immediately.
Popular API Gateway Implementations
The ecosystem has consolidated around a handful of mature options, each with different trade-offs in operational complexity, extensibility, and cost.
Kong is an open-source, Nginx/OpenResty-based gateway with a rich plugin ecosystem covering auth, rate limiting, transforms, and observability. It can be deployed on bare metal, Kubernetes (Kong Ingress Controller), or as a managed cloud service. Its declarative configuration and Admin API make it popular for platform teams that need fine-grained control.
AWS API Gateway is a fully managed service tightly integrated with the AWS ecosystem. It offers REST and HTTP API flavours with built-in Lambda integration, IAM and Cognito authorisers, usage plans, and CloudWatch metrics. The HTTP API flavour is lower-latency and cheaper for simple proxy use cases; the REST API flavour adds more transformation power. It scales automatically and is billed per request (priced per million requests), so cost grows in direct proportion to traffic and can become significant at high volumes.
Apigee (Google Cloud) targets enterprises that need advanced API lifecycle management: developer portals, monetisation, fine-grained analytics, and policy-based governance across hybrid clouds. It is the most feature-rich option but also the most operationally complex and expensive.
Envoy is a high-performance L7 proxy written in C++ and used as the data plane in service meshes (Istio, Consul Connect). Unlike Kong or AWS API Gateway, Envoy is a building block rather than a complete gateway product: you configure it via xDS APIs from a control plane. It is commonly used as the edge proxy in large Kubernetes clusters where its dynamic configuration and Wasm extension model provide maximum flexibility.
| Gateway | Deployment | Best for | Extension model | Pricing model |
|---|---|---|---|---|
| Kong | Self-hosted or managed | Platform teams, K8s | Plugins (Lua/Go/Wasm) | Open-source + enterprise tiers |
| AWS API Gateway | Fully managed (AWS) | AWS-native serverless | Lambda authorisers | Per-request + data transfer |
| Apigee (Google) | Managed / hybrid | Enterprise, monetisation | Policies + JavaScript | Subscription (expensive) |
| Envoy | Self-hosted (K8s) | Service mesh edge proxy | Wasm + xDS control plane | Open-source (infra cost) |
| Traefik | Self-hosted (K8s / Docker) | Cloud-native auto-discovery | Middlewares | Open-source + enterprise |
Gateway Configuration Example (Kong Declarative)
Below is a minimal Kong declarative configuration that exposes an /orders route, enforces JWT auth, and caps each consumer at 100 requests per minute.

    _format_version: "3.0"
    services:
      - name: order-service
        url: http://order-svc.internal:8080
        routes:
          - name: orders-route
            paths: ["/orders"]
            methods: ["GET", "POST", "PATCH"]
            strip_path: false
        plugins:
          # 1. Authenticate the caller via JWT
          - name: jwt
            config:
              claims_to_verify: ["exp"]
              key_claim_name: sub
          # 2. Rate-limit per consumer: 100 req/min
          - name: rate-limiting
            config:
              minute: 100
              policy: redis
              redis_host: redis.internal
              redis_port: 6379
          # 3. Inject correlation ID for tracing
          - name: correlation-id
            config:
              header_name: X-Correlation-Id
              generator: uuid#counter
              echo_downstream: true

Benefits and Risks
An API gateway delivers substantial benefits: a single place to enforce security policies, a stable public API surface that can evolve independently of internal services, consistent observability (every request logged and traced in one layer), and dramatically simplified service code because cross-cutting concerns are no longer each service's problem. API versioning becomes manageable: the gateway can route /v1/* to an old cluster and /v2/* to a new one with zero changes to downstream services.
However, the gateway also introduces risks that must be managed deliberately. Single point of failure: because all traffic passes through it, a gateway outage is a total system outage, so it must be deployed in a highly available, multi-instance configuration with health checks and automatic failover. Bottleneck: at very high throughput (millions of requests per second), a misconfigured or under-provisioned gateway becomes the performance ceiling of your entire architecture; horizontal scaling, connection pooling, and keep-alive tuning are essential. God-object creep: the biggest long-term risk is letting business logic accumulate in the gateway, such as routing rules that encode domain knowledge, data transformation that belongs in services, or orchestration that belongs in an application layer. When the gateway starts implementing use-case-specific logic it becomes the tightly coupled monolith you were trying to escape. Keep the gateway in its lane: routing, auth, rate limiting, and protocol translation only.
Frequently Asked Questions
What is the difference between an API gateway and a load balancer?
A load balancer distributes incoming requests across identical replicas of one service; in its basic form it knows nothing about URL paths, auth tokens, or business logic. An API gateway understands your entire service graph: it routes different URL paths to different microservices, enforces authentication and rate limiting, transforms payloads, and can aggregate responses from multiple services. In practice most API gateways also perform load balancing across the replicas of each upstream service, so either the gateway sits in front of a load balancer tier or the gateway itself handles both concerns.
Can an API gateway become a performance bottleneck?
Yes, and this is one of the most common production problems with gateways. Because every request passes through a single tier, a gateway that is under-provisioned or misconfigured (e.g., using synchronous Redis calls for rate limiting without connection pooling) will cap your system throughput. The mitigation is to run multiple gateway instances behind a network load balancer, enable HTTP keep-alive and upstream connection pools, offload TLS to a dedicated terminator if possible, and use async-safe plugins. Published benchmarks for Kong and Envoy report up to hundreds of thousands of requests per second per instance on modern hardware when configured correctly, though real throughput depends on the plugins enabled and on payload sizes.
When should I use a BFF instead of a single API gateway?
Use a Backend-for-Frontend when different clients (mobile app, web SPA, third-party integrators) need significantly different response shapes, field sets, or aggregation strategies. A single generic gateway will either over-fetch (returning all fields even though mobile only needs three) or require complex per-route transformation logic that quickly becomes unmaintainable. The typical pattern is a generic API gateway handling auth and rate limiting at the edge, with thin BFFs behind it per client type that perform client-specific aggregation and shaping. The BFF then only needs to worry about client-specific business rules, not security or quotas.
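The over-fetching problem is easiest to see with per-client field sets. The sketch below is a deliberately small illustration: the field lists and the `shape_for` helper are hypothetical, and a real BFF would typically also aggregate several upstream calls before shaping the result.

```python
# Hypothetical per-client field sets; each BFF would own its own list.
CLIENT_FIELDS = {
    "mobile": ("id", "name", "price"),
    "web": ("id", "name", "price", "description", "reviews"),
}


def shape_for(client, resource):
    """Return only the fields this client type needs from a full resource."""
    fields = CLIENT_FIELDS[client]
    return {key: resource[key] for key in fields if key in resource}
```

Given a full product record, `shape_for("mobile", product)` returns just three fields, while `shape_for("web", product)` keeps the description and reviews, so neither client pays for data it does not render.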
The API gateway is the front door to your microservices; keep it focused on routing, auth, and policy. The moment it starts encoding business logic, you have traded one monolith for another.
- alokknight Engineering
