API Gateway in System Design: Routing, Auth, Rate Limiting & More (Visualized)

An API gateway is a server that acts as the single entry point for all client requests in a microservices system, centralising cross-cutting concerns — authentication, authorisation, rate limiting, TLS termination, request/response transformation, and response aggregation — so that individual backend services never have to implement them independently.

Without a gateway, every service in your fleet must solve the same hard problems: validating JWTs, enforcing quotas, stripping internal headers, and stitching together data from sibling services. As the number of services grows, that duplicated logic becomes a maintenance nightmare. The gateway eliminates that duplication by handling every shared concern once, in one place, before a request ever reaches a service.

How a Request Flows Through an API Gateway

Every inbound request travels through an ordered pipeline of middleware plugins before reaching its target service. A typical pipeline looks like this:

(1) TLS termination: the gateway decrypts HTTPS, so backends can communicate over plain HTTP (or mTLS) internally.
(2) Authentication: the JWT or API key is validated and the caller's identity is attached to the request.
(3) Rate limiting: a counter keyed by caller, IP, or plan is checked; if the limit is exceeded, the gateway immediately returns 429 Too Many Requests.
(4) Request transformation: headers are rewritten, payloads translated between formats, or query parameters normalised.
(5) Routing: the gateway forwards the request to the upstream microservice matched by URL path or host header.
(6) Response transformation: the upstream response is reshaped, compressed, or filtered before being returned to the client.

The exact order varies by product. Most gateways match the route early, because plugins such as auth and rate limiting are usually attached per route or per service; only the forwarding itself happens at step 5.
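
A minimal Python sketch of the ordered pipeline, assuming a toy request model: each stage either lets the request continue or returns a response that short-circuits everything after it, which is how a 401 or 429 is returned without the upstream ever being called. The `Request`/`Response` classes, the fixed `Bearer valid-token` check, and the window-less counter are hypothetical stand-ins, not any gateway's plugin API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""


# A stage may mutate the request. Returning a Response stops the pipeline
# (e.g. 401 or 429); returning None hands the request to the next stage.
Stage = Callable[[Request], Optional[Response]]


def run_pipeline(req: Request, stages: List[Stage],
                 forward: Callable[[Request], Response]) -> Response:
    for stage in stages:
        early = stage(req)
        if early is not None:
            return early              # short-circuit: upstream never called
    return forward(req)               # every stage passed: proxy upstream


# --- Hypothetical toy stages, for illustration only ---

def authenticate(req: Request) -> Optional[Response]:
    if req.headers.get("Authorization") != "Bearer valid-token":
        return Response(401, "unauthorized")   # stand-in for JWT validation
    req.headers["X-User-Id"] = "user-42"       # attach caller identity
    return None


def make_rate_limiter(limit: int) -> Stage:
    counts: Dict[str, int] = {}                # toy: no time window at all

    def rate_limit(req: Request) -> Optional[Response]:
        user = req.headers["X-User-Id"]        # set by authenticate
        counts[user] = counts.get(user, 0) + 1
        if counts[user] > limit:
            return Response(429, "too many requests")
        return None

    return rate_limit


def forward(req: Request) -> Response:
    # Stand-in for proxying to the upstream matched by path.
    return Response(200, f"upstream handled {req.path}")


if __name__ == "__main__":
    stages = [authenticate, make_rate_limiter(limit=2)]
    for _ in range(3):
        r = run_pipeline(Request("/orders", {"Authorization": "Bearer valid-token"}),
                         stages, forward)
        print(r.status)               # 200, 200, then 429
    print(run_pipeline(Request("/orders"), stages, forward).status)  # 401
```

Because authentication runs before rate limiting, an unauthenticated request is rejected without consuming anyone's quota.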

Figure: API Gateway request pipeline (Auth → Rate Limit → Route); a rejected request illustrates rate-limit enforcement.

API Gateway vs Load Balancer vs Backend-for-Frontend (BFF)

These three patterns are often confused because they all sit between clients and backends, yet they solve fundamentally different problems. A load balancer distributes requests across identical replicas of a single service. An L4 balancer sees only IPs and ports; L7 balancers (such as AWS ALB or HAProxy) can match on paths and hosts, but they generally do not authenticate callers, enforce per-client quotas, or transform payloads. An API gateway is aware of your entire service graph: it routes /users/* to the User service, /orders/* to the Order service, and enforces cross-cutting policies on every route. A Backend-for-Frontend (BFF) goes one step further and creates a purpose-built adapter layer per client type (mobile BFF, web BFF) that shapes responses specifically for that consumer's needs, often living behind the gateway itself.

	Load Balancer	API Gateway	BFF
Primary job	Spread traffic across identical replicas	Single entry point for all microservices	Tailor API shape per client type
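Routing logic	IP/port (L4) or simple host/path rules (L7), plus a balancing algorithm such as round-robin	URL path, host header, method, headers	Client-specific aggregation & filtering
Auth / rate limit	Not typically (L7 balancers may offer basic rate limiting)	Yes, centrally enforced	Sometimes (or delegates to gateway)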
Protocol awareness	L4 or L7	L7 (HTTP/gRPC)	L7 — full application logic
Examples	Nginx, AWS NLB, HAProxy	Kong, AWS API GW, Apigee, Envoy	GraphQL BFF, Next.js API routes
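
To make the distinction concrete, here is a small Python sketch with hypothetical hostnames and prefixes: a round-robin balancer only rotates across identical replicas of one service, while the gateway first chooses the service by path prefix and then delegates replica choice to a per-service balancer.

```python
import itertools
from typing import Dict, List, Optional

# Hypothetical upstream pools: each service runs identical replicas.
UPSTREAMS: Dict[str, List[str]] = {
    "/users/": ["users-1:8080", "users-2:8080"],
    "/orders/": ["orders-1:8080", "orders-2:8080", "orders-3:8080"],
}


class RoundRobin:
    """Load-balancer behaviour: rotate across identical replicas of ONE service."""

    def __init__(self, replicas: List[str]):
        self._cycle = itertools.cycle(replicas)

    def pick(self) -> str:
        return next(self._cycle)


class Gateway:
    """Gateway behaviour: pick the service by path prefix, then let that
    service's balancer pick the replica."""

    def __init__(self, upstreams: Dict[str, List[str]]):
        # Longest prefix first, so a more specific route wins.
        self._prefixes = sorted(upstreams, key=len, reverse=True)
        self._balancers = {p: RoundRobin(r) for p, r in upstreams.items()}

    def route(self, path: str) -> Optional[str]:
        for prefix in self._prefixes:
            if path.startswith(prefix):
                return self._balancers[prefix].pick()
        return None                   # no match: a real gateway answers 404


if __name__ == "__main__":
    gw = Gateway(UPSTREAMS)
    for p in ["/users/7", "/users/8", "/orders/1", "/payments/1"]:
        print(p, "->", gw.route(p))
```

Longest-prefix matching is one common convention; real gateways also match on host, method, and headers, as the table notes.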

Key Responsibilities of an API Gateway

1. Authentication and Authorisation

The gateway verifies every inbound token (JWT, OAuth 2 access token, or API key) once, before the request touches a service. Downstream services receive a pre-validated caller identity, often injected as a trusted header such as X-User-Id, so they never need to import an auth library or manage JWKS endpoints. That trust only holds if the gateway strips any client-supplied copy of the header and services cannot be reached except through the gateway. Centralising auth eliminates entire classes of security bugs that arise when each service reimplements token validation slightly differently.
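
A stdlib-only Python sketch of what the gateway does with an HS256-signed JWT: check the algorithm, verify the HMAC signature in constant time, reject expired tokens, strip any client-supplied X-User-Id, and inject the verified `sub` claim in its place. It assumes a shared secret and the header names shown; production gateways more often verify RS256/ES256 tokens against keys fetched from a JWKS endpoint, using a vetted library rather than hand-rolled parsing. `make_hs256` is a hypothetical test helper standing in for the identity provider.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Dict, Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims of a valid HS256 JWT, or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":       # reject "none" and algorithm swaps
        raise ValueError("unexpected alg")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):  # constant time
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:    # a missing exp counts as expired
        raise ValueError("expired")
    return claims


def authenticate(headers: Dict[str, str], secret: bytes) -> Dict[str, str]:
    """Gateway step: validate the bearer token, return headers to forward."""
    # Drop any client-supplied identity header first, so it cannot be forged.
    forwarded = {k: v for k, v in headers.items() if k.lower() != "x-user-id"}
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        raise ValueError("missing bearer token")
    claims = verify_hs256(auth[len("Bearer "):], secret)
    forwarded["X-User-Id"] = str(claims["sub"])   # trusted, pre-validated identity
    return forwarded


def make_hs256(claims: dict, secret: bytes) -> str:
    """Hypothetical test helper; normally the identity provider mints tokens."""
    h = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    p = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{h}.{p}".encode(), hashlib.sha256).digest()
    return f"{h}.{p}.{_b64url_encode(sig)}"
```

A request carrying a valid token plus a forged `X-User-Id: mallory` header reaches the service with `X-User-Id` set to the token's `sub` claim instead.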

2. Rate Limiting and Quotas

Rate limiting protects backend services from being overwhelmed by a single noisy client or a denial-of-service spike. The gateway maintains counters in a fast store (usually Redis) keyed on caller identity, IP address, or API key. When the counter exceeds the configured limit within the time window, the gateway immediately returns 429 Too Many Requests without ever touching the upstream service. Graduated response — warning headers at 80% usage, hard rejection at 100% — lets legitimate callers back off gracefully. Plan-based quotas let you offer free tiers with lower limits alongside paid tiers with higher ones.
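
A fixed-window counter is the simplest way to implement the scheme above. This Python sketch assumes an in-memory dict in place of Redis (with Redis, the same logic maps to INCR on the key plus EXPIRE for the window length), uses the widely seen but non-standard X-RateLimit-* header names, and invents X-RateLimit-Warning for the 80% signal.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Decision:
    status: int                       # 200 = forward upstream, 429 = reject
    headers: Dict[str, str]


class FixedWindowLimiter:
    """Fixed-window rate limiter keyed on caller identity.

    A plain dict stands in for Redis so the sketch runs anywhere. With Redis,
    the same key would be INCRemented and given an EXPIRE of one window; here
    old windows are never evicted.
    """

    def __init__(self, limit: int, window_s: int = 60, warn_ratio: float = 0.8,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window_s = window_s
        self.warn_ratio = warn_ratio
        self.clock = clock
        self.counts: Dict[str, int] = {}

    def check(self, caller_id: str) -> Decision:
        now = self.clock()
        window = int(now // self.window_s)
        key = f"rl:{caller_id}:{window}"           # one counter per caller per window
        count = self.counts.get(key, 0) + 1
        self.counts[key] = count
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(self.limit - count, 0)),
        }
        if count > self.limit:                     # hard rejection at 100%
            headers["Retry-After"] = str(self.window_s * (window + 1) - int(now))
            return Decision(429, headers)
        if count >= self.warn_ratio * self.limit:  # soft warning from 80%
            headers["X-RateLimit-Warning"] = "approaching limit"  # hypothetical name
        return Decision(200, headers)
```

For plan-based quotas, keep one limit per plan and look it up from the caller's plan before calling `check`. Fixed windows allow a burst of up to twice the limit across a window boundary; sliding-window or token-bucket variants smooth that out at the cost of a little more state.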

3. TLS Termination

TLS termination at the gateway means the gateway handles the expensive TLS handshake with external clients, then forwards traffic over plain HTTP (or mTLS) on the private network. This removes CPU-heavy cryptographic work from every service, centralises certificate management in one place, and lets you rotate certificates or upgrade TLS versions without touching any service code. Many gateways can also renew certificates automatically via ACME/Let's Encrypt, either built in (as in Traefik) or through a plugin (as with Kong's ACME plugin).
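
With Python's standard `ssl` module, the termination point boils down to one server-side context that owns the certificate and protocol policy. The sketch below assumes PEM certificate and key paths that you supply; the second function shows the mTLS variant for the gateway-to-service hop. Both function names are hypothetical.

```python
import ssl
from typing import Optional


def make_edge_tls_context(certfile: Optional[str] = None,
                          keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Server-side context for the gateway's public listener.

    All certificate and protocol policy lives here, so rotating a certificate
    or raising the minimum TLS version is a gateway change, not a service change.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse TLS 1.0/1.1 clients
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)     # PEM cert chain + private key
    return ctx


def make_upstream_mtls_context(cafile: Optional[str] = None,
                               certfile: Optional[str] = None,
                               keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context for gateway-to-service hops when the private
    network uses mTLS instead of plain HTTP."""
    # Verify the service's certificate against the internal CA.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)     # present the gateway's own cert
    return ctx
```

The gateway would wrap its listening socket with `ctx.wrap_socket(sock, server_side=True)`, read the decrypted HTTP request, and forward it upstream; building a new context from new files is how a certificate rotation or minimum-version bump happens without touching any service.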

4. Request and Response Transformation

Gateways can rewrite request paths (/v1/users → /users), add or strip headers, translate between JSON and XML, inject correlation IDs, and decompress or recompress payloads. Response transformation lets you version your public API contract independently of your internal service API — you can evolve the service without breaking external clients by adapting the shape at the gateway layer.
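
A Python sketch of these transformations, assuming hypothetical rules: a /v1/users to /users prefix rewrite, a denylist of headers to strip in each direction, correlation-ID injection, and a field rename that keeps the public contract stable after an internal change.

```python
import uuid
from typing import Dict, Tuple

# Hypothetical transformation rules.
PATH_REWRITES = {"/v1/users": "/users"}               # public prefix -> internal prefix
STRIP_REQUEST_HEADERS = {"x-internal-debug"}          # never accepted from clients
STRIP_RESPONSE_HEADERS = {"server", "x-powered-by"}   # don't leak backend details


def rewrite_path(path: str) -> str:
    for public, internal in PATH_REWRITES.items():
        # Match only on a segment boundary, so "/v1/usersX" is left untouched.
        if path == public or path.startswith(public + "/"):
            return internal + path[len(public):]
    return path


def transform_request(path: str, headers: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    out = {k: v for k, v in headers.items() if k.lower() not in STRIP_REQUEST_HEADERS}
    # Keep an incoming correlation ID; otherwise mint one for tracing.
    if not any(k.lower() == "x-correlation-id" for k in out):
        out["X-Correlation-Id"] = str(uuid.uuid4())
    return rewrite_path(path), out


def transform_response(body: dict, headers: Dict[str, str]) -> Tuple[dict, Dict[str, str]]:
    out = {k: v for k, v in headers.items() if k.lower() not in STRIP_RESPONSE_HEADERS}
    # The service renamed "name" to "full_name" internally (hypothetical);
    # map it back so the public contract does not change.
    if "full_name" in body:
        body = dict(body)                 # don't mutate the upstream object
        body["name"] = body.pop("full_name")
    return body, out
```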

5. Response Aggregation

A powerful gateway feature is response aggregation: the gateway fans out a single client request to multiple upstream services in parallel, waits for all responses, and stitches them into one composite response before replying to the client. This spares clients many round trips over slow networks, dramatically reducing latency on mobile connections. Without aggregation, a product detail page might require separate calls to the User, Inventory, Pricing, and Review services: up to four sequential round trips over the client's connection. With aggregation at the gateway, the client makes exactly one request, and the four upstream calls run in parallel on the low-latency internal network, so the page waits roughly as long as the slowest of them.
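
A minimal asyncio sketch of fan-out and merge, assuming four hypothetical upstream calls whose network round trips are simulated with sleeps. Because the calls are awaited together with `asyncio.gather`, the aggregate takes roughly as long as the slowest call rather than the sum of all four.

```python
import asyncio
import time


async def call_upstream(name: str, delay_s: float, payload: dict) -> dict:
    """Stand-in for an HTTP call to an internal service (hypothetical)."""
    await asyncio.sleep(delay_s)          # simulated network round trip
    return {name: payload}


async def product_page(product_id: str) -> dict:
    # Fan out: the four calls run concurrently instead of one after another.
    parts = await asyncio.gather(
        call_upstream("user", 0.05, {"seller": "acme"}),
        call_upstream("inventory", 0.08, {"in_stock": 12}),
        call_upstream("pricing", 0.03, {"price": 19.99}),
        call_upstream("reviews", 0.06, {"rating": 4.6}),
    )
    # Merge: stitch the partial responses into one composite body.
    merged = {"product_id": product_id}
    for part in parts:
        merged.update(part)
    return merged


if __name__ == "__main__":
    start = time.perf_counter()
    page = asyncio.run(product_page("p-1"))
    elapsed = time.perf_counter() - start
    print(page)
    # Roughly the slowest call (0.08 s), not the sum of all four (0.22 s).
    print(f"took {elapsed:.2f}s")
```

A production aggregator also needs per-call timeouts and a policy for partial failure; as written, `gather` propagates the first exception, so one failing service fails the whole page.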

Figure: Response aggregation (one client request fans out to three upstream services in parallel; the responses are merged into a single reply).

Cross-Cutting Concerns: Centralised vs Distributed

The central argument for an API gateway is the elimination of duplicated cross-cutting concerns. Without a gateway, every service in your fleet must independently implement authentication, logging, tracing, rate limiting, and CORS handling. With five services that is annoying; with fifty it is operationally catastrophic, because a single security patch must be applied to fifty codebases, tested independently, and deployed in coordination. The gateway collapses all of that into one place: fix it once, ship it once, and every service benefits immediately. Services still need to propagate trace headers on their own outbound calls, but they no longer own the policy.

Figure: Cross-cutting concerns duplicated in every service without a gateway (left) versus centralised at the gateway (right).

Popular API Gateway Implementations

The ecosystem has consolidated around a handful of mature options, each with different trade-offs in operational complexity, extensibility, and cost.

Kong is an open-source, Nginx/OpenResty-based gateway with a rich plugin ecosystem covering auth, rate limiting, transforms, and observability. It can be deployed on bare metal, Kubernetes (Kong Ingress Controller), or as a managed cloud service. Its declarative configuration and Admin API make it popular for platform teams that need fine-grained control.

AWS API Gateway is a fully managed service tightly integrated with the AWS ecosystem. It offers REST and HTTP API flavours with built-in Lambda integration, IAM and Cognito authorisers, usage plans, and CloudWatch metrics. The HTTP API flavour is lower-latency and cheaper for simple proxy use cases; the REST API flavour adds more transformation power. It scales automatically and is billed per request (priced per million), so cost grows linearly with traffic and can become significant at high volumes.

Apigee (Google Cloud) targets enterprises that need advanced API lifecycle management — developer portals, monetisation, fine-grained analytics, and policy-based governance across hybrid clouds. It is the most feature-rich option but also the most operationally complex and expensive.

Envoy is a high-performance L7 proxy written in C++ and used as the data plane in service meshes (Istio, Consul Connect). Unlike Kong or AWS API Gateway, Envoy is a building block rather than a complete gateway product: it can be configured with static YAML, but at scale it is usually driven dynamically via xDS APIs from a control plane. It is commonly used as the edge proxy in large Kubernetes clusters, where its dynamic configuration and Wasm extension model provide maximum flexibility.

Gateway	Deployment	Best for	Extension model	Pricing model
Kong	Self-hosted or managed	Platform teams, K8s	Plugins (Lua/Go/Wasm)	Open-source + enterprise tiers
AWS API Gateway	Fully managed (AWS)	AWS-native serverless	Lambda authorisers	Per-request + data transfer
Apigee (Google)	Managed / hybrid	Enterprise, monetisation	Policies + JavaScript	Subscription (expensive)
Envoy	Self-hosted (K8s)	Service mesh edge proxy	Wasm + xDS control plane	Open-source (infra cost)
Traefik	Self-hosted (K8s / Docker)	Cloud-native auto-discovery	Middlewares	Open-source + enterprise

Gateway Configuration Example (Kong Declarative)

Below is a minimal Kong declarative configuration that exposes an /orders route, enforces JWT auth, and caps each consumer at 100 requests per minute.

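```yaml
_format_version: "3.0"

services:
  - name: order-service
    url: http://order-svc.internal:8080
    routes:
      - name: orders-route
        paths: ["/orders"]
        methods: ["GET", "POST", "PATCH"]
        strip_path: false
    plugins:
      # 1. Authenticate the caller via JWT; the token's `sub` claim selects
      #    the consumer credential whose `key` matches it
      - name: jwt
        config:
          claims_to_verify: ["exp"]
          key_claim_name: sub
      # 2. Rate-limit per consumer: 100 req/min, counters shared via Redis
      - name: rate-limiting
        config:
          minute: 100
          limit_by: consumer
          policy: redis
          redis_host: redis.internal
          redis_port: 6379
      # 3. Inject correlation ID for tracing
      - name: correlation-id
        config:
          header_name: X-Correlation-Id
          generator: uuid#counter
          echo_downstream: true

# The jwt plugin rejects every request unless some consumer owns a matching
# credential. Example consumer (username and secret are placeholders):
consumers:
  - username: example-client
    jwt_secrets:
      - key: example-client   # must equal the token's `sub` claim
        algorithm: HS256
        secret: replace-with-a-strong-secret
```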

Benefits and Risks

An API gateway delivers substantial benefits: a single place to enforce security policies, a stable public API surface that can evolve independently of internal services, consistent observability (every request logged and traced in one layer), and dramatically simplified service code because cross-cutting concerns are no longer each service's problem. API versioning becomes manageable — the gateway can route /v1/* to an old cluster and /v2/* to a new one with zero changes to downstream services.

However, the gateway also introduces risks that must be managed deliberately.

Single point of failure: because all traffic passes through it, a gateway outage is a total system outage. It must be deployed in a highly available, multi-instance configuration with health checks and automatic failover.

Bottleneck: at very high throughput (millions of requests per second), a misconfigured or under-provisioned gateway becomes the performance ceiling of your entire architecture; horizontal scaling, connection pooling, and keep-alive tuning are essential.

God-object creep: the biggest long-term risk is letting business logic accumulate in the gateway, such as routing rules that encode domain knowledge, data transformation that belongs in services, or orchestration that belongs in an application layer. When the gateway starts implementing use-case-specific logic it becomes the tightly coupled monolith you were trying to escape. Keep the gateway in its lane: routing, auth, rate limiting, and protocol translation only.

Frequently Asked Questions

What is the difference between an API gateway and a load balancer?

A load balancer distributes requests across identical replicas of one service; it does not validate auth tokens or apply business logic, and an L4 balancer does not even see URL paths. An API gateway understands your entire service graph: it routes different URL paths to different microservices, enforces authentication and rate limiting, transforms payloads, and can aggregate responses from multiple services. In practice most API gateways also load-balance across the replicas of each upstream service, so a separate internal load-balancer tier is often unnecessary, while a network load balancer commonly sits in front of the gateway instances themselves.

Can an API gateway become a performance bottleneck?

Yes, and this is one of the most common production problems with gateways. Because every request passes through a single tier, a gateway that is under-provisioned or misconfigured (e.g., making a synchronous Redis call for every rate-limit check without connection pooling) will cap your system throughput. The mitigation is to run multiple gateway instances behind a network load balancer, enable HTTP keep-alive and upstream connection pools, offload TLS to a dedicated terminator if possible, and avoid plugins that block the request path on slow I/O. Kong and Envoy can sustain high per-instance throughput on modern hardware when configured correctly, but the real figure depends heavily on the enabled plugins, payload sizes, and TLS settings, so benchmark your own configuration before sizing the fleet.
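
Sizing the gateway tier is simple arithmetic once you have measured per-instance throughput for your own plugin chain: divide peak load by the capacity you are willing to use per instance, round up, and add spare instances so that losing one does not overload the rest. The target utilisation and spare count below are assumptions to tune, and `gateway_instances` is a hypothetical helper.

```python
import math


def gateway_instances(peak_rps: float, per_instance_rps: float,
                      target_utilisation: float = 0.6, spare: int = 1) -> int:
    """Gateway instances needed for peak load, with headroom and N+spare redundancy.

    per_instance_rps must come from benchmarking your own plugin chain;
    target_utilisation and spare are assumptions to tune, not recommendations.
    """
    if per_instance_rps <= 0 or not 0 < target_utilisation <= 1:
        raise ValueError("need per_instance_rps > 0 and 0 < target_utilisation <= 1")
    usable_rps = per_instance_rps * target_utilisation   # capacity you are willing to use
    return math.ceil(peak_rps / usable_rps) + spare      # + spare survives instance loss
```

For example, a hypothetical 100,000 req/s peak at a measured 25,000 req/s per instance and 60% target utilisation gives ceil(100000 / 15000) + 1 = 7 + 1 = 8 instances. These numbers are illustrative, not benchmark results.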

When should I use a BFF instead of a single API gateway?

Use a Backend-for-Frontend when different clients (mobile app, web SPA, third-party integrators) need significantly different response shapes, field sets, or aggregation strategies. A single generic gateway will either over-fetch (returning all fields even though mobile only needs three) or require complex per-route transformation logic that quickly becomes unmaintainable. The typical pattern is a generic API gateway handling auth and rate limiting at the edge, with thin BFFs behind it per client type that perform client-specific aggregation and shaping. The BFF then only needs to worry about client-specific business rules, not security or quotas.
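
A Python sketch of two thin BFFs over the same generic product payload, assuming hypothetical field names: the mobile BFF returns only the three fields the app renders, while the web BFF keeps the full object and embeds the top reviews. Auth and rate limiting are deliberately absent because, in the pattern described above, the gateway in front already handles them.

```python
from typing import Dict, List

# Hypothetical payload from the generic product API behind the gateway.
FULL_PRODUCT = {
    "id": "p-1",
    "title": "Trail Shoe",
    "description": "Lightweight trail running shoe with a grippy outsole.",
    "pricing": {"amount": 89.0, "currency": "EUR"},
    "images": ["front.jpg", "side.jpg", "sole.jpg"],
    "specs": {"weight_g": 280, "drop_mm": 6},
}
REVIEWS: List[Dict] = [
    {"rating": 5, "text": "Great grip"},
    {"rating": 3, "text": "Runs small"},
    {"rating": 4, "text": "Comfortable"},
    {"rating": 2, "text": "Wore out fast"},
]


def mobile_bff(product: dict) -> dict:
    """Mobile renders three fields: trim and flatten for small screens and slow links."""
    return {
        "id": product["id"],
        "title": product["title"],
        "price": product["pricing"]["amount"],
    }


def web_bff(product: dict, reviews: List[Dict]) -> dict:
    """Web renders the full page: keep everything and embed the top three reviews."""
    top = sorted(reviews, key=lambda r: r["rating"], reverse=True)[:3]
    return {**product, "top_reviews": top}
```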

The API gateway is the front door to your microservices — keep it focused on routing, auth, and policy. The moment it starts encoding business logic, you have traded one monolith for another.
— alokknight Engineering