GraphQL in System Design: Schema, Resolvers, N+1, and When to Use It (Visualized)
GraphQL is a query language for APIs that lets clients request exactly the fields they need, no more and no less, from a single endpoint. This guide covers typed schemas, queries, mutations, subscriptions, resolvers, the N+1 problem, DataLoader batching, caching trade-offs, and when GraphQL beats REST, with live animations of each concept.
GraphQL is a strongly-typed query language and runtime for APIs, developed by Facebook in 2012 and open-sourced in 2015, that replaces a constellation of REST endpoints with a single endpoint where the client declares exactly which fields it needs and the server returns precisely those fields, nothing more and nothing less. The contract between client and server is a schema: a type system written in the GraphQL Schema Definition Language (SDL) that describes every object, every field, and every relationship available in the API.
Where a REST API might expose dozens of endpoints (/users, /users/:id/posts, /users/:id/followers, …), a GraphQL API exposes one endpoint, typically POST /graphql, and clients compose queries that traverse the type graph to assemble exactly the response shape they need. This single-endpoint, client-driven model is the core mental shift GraphQL demands.
The Typed Schema
Every GraphQL API starts with a schema. The schema defines the shape of all data the API can serve, using object types, scalar types (String, Int, Boolean, ID, …), enums, interfaces, and unions. Three special root types (Query, Mutation, and Subscription) declare the operations clients can perform. A schema is introspectable at runtime: clients can query __schema to discover all available types and fields, which powers IDE auto-completion and documentation tools such as GraphiQL and Apollo Studio.
# Schema Definition Language (SDL)
type User {
  id: ID!
  name: String!
  email: String!
  posts: [Post!]!
}
type Post {
  id: ID!
  title: String!
  body: String!
  author: User!
  tags: [String!]!
}
type Query {
  user(id: ID!): User
  posts(limit: Int = 10): [Post!]!
}
type Mutation {
  createPost(title: String!, body: String!, authorId: ID!): Post!
}
type Subscription {
  postCreated: Post!
}

With this schema in place, a client can write a query that requests only the fields it cares about. Here a mobile feed screen asks for a user's name and the titles and tags of their posts, skipping the email and full body text it does not need:
# Client query: requests exactly three field paths
query GetUserFeed($userId: ID!) {
  user(id: $userId) {
    name
    posts {
      title
      tags
    }
  }
}
# Response contains ONLY name, posts[].title, posts[].tags
# Fields like email, body, id are NOT included in the response
# (whether the server still reads them from the DB depends on its resolvers)

No Over-Fetching, No Under-Fetching (Visualized)
Over-fetching happens when a REST endpoint returns more data than the client needs: bandwidth is wasted on fields that will be discarded. Under-fetching is the opposite: a single endpoint does not include enough data, so the client fires multiple sequential requests (the waterfall problem). GraphQL eliminates both by letting the client declare the exact shape of its response in one query sent to one endpoint.
Queries, Mutations, and Subscriptions
GraphQL defines three root operation types. A query is a read-only fetch, the most common operation and analogous to HTTP GET. A mutation is a write that changes server state (create, update, delete), analogous to POST/PUT/DELETE; it can also return data so the client receives the newly created or updated object in one round-trip. A subscription maintains a long-lived connection (typically over WebSockets) and pushes new data to the client whenever a server-side event occurs: think real-time notifications, live comment feeds, or stock tickers. Apollo Client and Relay are the dominant client-side libraries that manage query caching, subscriptions, and optimistic updates in the browser or React Native.
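To make the relationship between a mutation and a subscription concrete, here is a minimal in-process sketch in Python. It is not how any particular GraphQL server implements subscriptions: the PostCreatedTopic class and the create_post function are hypothetical names, and an asyncio.Queue stands in for the WebSocket connection a real subscriber would hold.

```python
import asyncio
from typing import Dict, List


class PostCreatedTopic:
    """Hypothetical pub/sub channel behind Subscription.postCreated."""

    def __init__(self) -> None:
        self._subscribers: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        # Each subscriber gets its own queue (a stand-in for one open connection).
        inbox: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(inbox)
        return inbox

    def publish(self, post: Dict) -> None:
        # Push the event to every connected subscriber.
        for inbox in self._subscribers:
            inbox.put_nowait(post)


def create_post(topic: PostCreatedTopic, posts: List[Dict],
                title: str, body: str, author_id: str) -> Dict:
    """Mutation-style write: changes state, returns the new object, emits an event."""
    post = {"id": str(len(posts) + 1), "title": title, "body": body, "authorId": author_id}
    posts.append(post)
    topic.publish(post)
    return post


async def demo():
    topic = PostCreatedTopic()
    posts: List[Dict] = []
    inbox = topic.subscribe()                                # client subscribes first
    created = create_post(topic, posts, "Hello", "First post", "1")
    pushed = await inbox.get()                               # subscriber receives the push
    return created, pushed


created, pushed = asyncio.run(demo())
```

The mutation returns the new post to its caller in the same round-trip, while every subscriber receives the same object without asking for it.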
Resolvers: How the Schema Comes to Life
Each field in a GraphQL schema is backed by a resolver: a function that knows how to fetch or compute the value for that field. The GraphQL execution engine walks the query's selection set top-down, calling the resolver for each field. The root resolver for Query.user fetches a user from the database; the resolver for User.posts then fetches that user's posts; the resolver for Post.tags fetches tags for each post. This tree-walking is elegant and composable, but it conceals a notorious performance trap.
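The top-down walk can be sketched in a few lines of Python. This is a toy executor, not a real GraphQL engine: it skips parsing, validation, and argument handling below the root, and the RESOLVERS and FIELD_TYPES tables, the in-memory data, and the nested-dict selection format are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory data standing in for a real database.
USERS = {"1": {"id": "1", "name": "Ada", "email": "ada@example.com"}}
POSTS = [
    {"id": "10", "title": "Hello", "body": "...", "tags": ["intro"], "author_id": "1"},
    {"id": "11", "title": "Graphs", "body": "...", "tags": ["graphql"], "author_id": "1"},
]

Resolver = Callable[[Any, Dict[str, Any]], Any]

# One resolver per (type, field). Fields without an entry fall back to a dict lookup.
RESOLVERS: Dict[str, Dict[str, Resolver]] = {
    "Query": {"user": lambda _root, args: USERS.get(args["id"])},
    "User": {"posts": lambda user, _args: [p for p in POSTS if p["author_id"] == user["id"]]},
}

# Type returned by each object-valued field, so the walker knows which resolvers come next.
FIELD_TYPES = {("Query", "user"): "User", ("User", "posts"): "Post"}


def execute(type_name, parent, selection, args=None):
    """Walk the selection set top-down, resolving only the fields that were requested."""
    result = {}
    for field, sub_selection in selection.items():
        resolver = RESOLVERS.get(type_name, {}).get(field)
        value = resolver(parent, args or {}) if resolver else parent[field]
        child_type = FIELD_TYPES.get((type_name, field))
        if sub_selection and value is not None:
            if isinstance(value, list):
                value = [execute(child_type, item, sub_selection) for item in value]
            else:
                value = execute(child_type, value, sub_selection)
        result[field] = value
    return result


# Selection mirroring GetUserFeed: nested dicts for objects, None for leaf fields.
selection = {"user": {"name": None, "posts": {"title": None, "tags": None}}}
response = execute("Query", None, selection, {"id": "1"})
```

Because the walker only visits fields named in the selection, email, body, and id never reach the response, even though the underlying rows contain them.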
The N+1 Problem and DataLoader
The N+1 problem is the most common GraphQL performance pitfall. Suppose a query asks for 5 posts and the author of each post. The root resolver fetches 5 posts in one query (1 database call). Then the Post.author resolver runs once per post, issuing 5 separate database lookups: 6 queries total instead of 2. At scale: 100 posts → 101 database calls, 1,000 posts → 1,001 calls.
The standard solution is DataLoader, a batching and caching utility (originally by Facebook, now ported to many languages). DataLoader collects all individual keys requested within a single event-loop tick and issues one batched query, such as SELECT * FROM users WHERE id IN (1, 2, 3, …). The result is also cached for the request lifetime, so the same user fetched by two different resolvers triggers only one database call. N+1 collapses into 2 queries regardless of list size.
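The batching idea can be illustrated with a small asyncio sketch. TinyLoader is a hypothetical stand-in written for this example, not the API of the DataLoader library; the fetch_* functions only append SQL-like strings to a log instead of talking to a database, and error handling is omitted.

```python
import asyncio
from typing import Dict, List

USERS = {i: {"id": i, "name": f"user{i}"} for i in range(1, 4)}
POSTS = [{"id": 100 + i, "author_id": (i % 3) + 1} for i in range(5)]
QUERY_LOG: List[str] = []  # every simulated database call is recorded here


async def fetch_posts():
    QUERY_LOG.append("SELECT * FROM posts LIMIT 5")
    return POSTS


async def fetch_user(user_id):
    QUERY_LOG.append(f"SELECT * FROM users WHERE id = {user_id}")
    return USERS[user_id]


async def fetch_users(user_ids):
    id_list = ", ".join(map(str, user_ids))
    QUERY_LOG.append(f"SELECT * FROM users WHERE id IN ({id_list})")
    return [USERS[uid] for uid in user_ids]  # results in the same order as the keys


class TinyLoader:
    """Collects keys requested in one event-loop pass, then issues one batched call."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn
        self._cache: Dict[int, asyncio.Future] = {}  # per-request cache of futures
        self._queue: List[int] = []
        self._task = None

    def load(self, key):
        if key in self._cache:  # duplicate key: reuse the pending or finished future
            return self._cache[key]
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._cache[key] = future
        if not self._queue:  # first key of a batch: schedule a dispatch for later
            self._task = loop.create_task(self._dispatch())
        self._queue.append(key)
        return future

    async def _dispatch(self):
        keys, self._queue = self._queue, []
        rows = await self._batch_fn(keys)
        for key, row in zip(keys, rows):
            self._cache[key].set_result(row)


async def run_naive():
    QUERY_LOG.clear()
    posts = await fetch_posts()
    authors = [await fetch_user(p["author_id"]) for p in posts]  # one call per post
    return len(QUERY_LOG), authors


async def resolve_author(post, loader):
    # Stand-in for a Post.author resolver: asks for one key at a time.
    return await loader.load(post["author_id"])


async def run_batched():
    QUERY_LOG.clear()
    loader = TinyLoader(fetch_users)  # one loader per request
    posts = await fetch_posts()
    authors = await asyncio.gather(*(resolve_author(p, loader) for p in posts))
    return len(QUERY_LOG), list(authors)


naive_count, naive_authors = asyncio.run(run_naive())
batched_count, batched_authors = asyncio.run(run_batched())
```

With 5 posts the naive path logs 6 calls and the batched path logs 2, and both return the same authors. The loader is created per request so that its cache never leaks data between users.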
GraphQL vs REST
REST and GraphQL are not universally better or worse than each other; they make different trade-offs. REST is simpler to cache at the HTTP layer (GET responses map directly to cache keys), easier to expose publicly with standard HTTP semantics, and carries less client-side complexity. GraphQL shines when multiple clients (web, iOS, Android) need different data shapes from the same backend, when you want to reduce round-trips in high-latency mobile environments, or when rapid product iteration requires the frontend to evolve without backend changes.
| Dimension | REST | GraphQL |
|---|---|---|
| Endpoints | Many (one per resource) | Single (/graphql) |
| Response shape | Fixed by server | Declared by client |
| Over/under-fetching | Common | Eliminated |
| HTTP caching | Simple: GET maps to cache key | Hard: POST body varies per query |
| Schema and types | Optional (OpenAPI) | Built-in, introspectable at runtime |
| Real-time push | Polling or WebSocket add-on | First-class Subscriptions |
| Error handling | HTTP status codes (404, 500, …) | Typically HTTP 200; errors array in body |
| Learning curve | Low | Moderate |
| Tooling | curl, Postman, browser DevTools | GraphiQL, Apollo Studio, Relay DevTools |
| Best for | Public APIs, simple CRUD, edge caching | Complex clients, mobile, BFF pattern |
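The "Error handling" row has a practical consequence for clients: a successful HTTP status does not mean the operation succeeded, so the body has to be inspected. The sketch below assumes the standard response shape with top-level data and errors keys; split_response and the sample body are hypothetical.

```python
import json
from typing import List, Optional, Tuple


def split_response(body: str) -> Tuple[Optional[dict], List[str]]:
    """Separate partial data from error messages in a GraphQL response body."""
    payload = json.loads(body)
    messages = [err.get("message", "unknown error") for err in payload.get("errors", [])]
    return payload.get("data"), messages


# Hypothetical body that could arrive with HTTP 200 even though the lookup failed.
BODY = '{"data": {"user": null}, "errors": [{"message": "User not found", "path": ["user"]}]}'
data, messages = split_response(BODY)
```

A response can carry both usable data and errors at once, which is why the two are returned separately rather than raising on the first error.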
Caching Difficulty
HTTP caching is the biggest operational downside of GraphQL. Because requests are typically a POST to a single endpoint with the query in the body, CDNs and proxies cannot cache responses by URL alone. Workarounds exist: Persisted Queries (the client sends a hash of the query string, which the server resolves to the full query before executing, turning the effective call into a GET with a stable cache key), Automatic Persisted Queries (APQ) in Apollo, and server-side caching driven by cache hints set with Apollo Server's @cacheControl directive, with cached results held in a store such as Redis. None is as frictionless as REST's native HTTP caching, so teams must plan for this cost explicitly.
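The hash-then-lookup idea behind persisted queries can be sketched as follows. This is a simplified illustration loosely modeled on the automatic variant, where an unknown hash is retried together with the full query text; PersistedQueryStore and its lookup method are hypothetical names, and the real wire format is not shown.

```python
import hashlib
from typing import Dict, Optional


def sha256_of(query: str) -> str:
    """Stable identifier for a query string, usable as part of a cache key."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


class PersistedQueryStore:
    """Hypothetical server-side registry mapping query hashes to query text."""

    def __init__(self) -> None:
        self._queries: Dict[str, str] = {}

    def lookup(self, query_hash: str, query: Optional[str] = None) -> Optional[str]:
        if query is not None:
            # Retry path: the client sent hash and text together, so verify and register.
            if sha256_of(query) != query_hash:
                raise ValueError("hash does not match query text")
            self._queries[query_hash] = query
        # Returns None when the hash is unknown, signalling the client to retry with text.
        return self._queries.get(query_hash)


QUERY = "query GetUserFeed($userId: ID!) { user(id: $userId) { name posts { title tags } } }"
store = PersistedQueryStore()
digest = sha256_of(QUERY)

first_try = store.lookup(digest)          # hash only, not yet known to the server
second_try = store.lookup(digest, QUERY)  # hash plus text: registered
third_try = store.lookup(digest)          # hash only again: now resolvable
```

Once the hash is registered, later requests can carry only the short digest, which is what gives a CDN a stable key to cache on.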
When NOT to Use GraphQL
GraphQL adds complexity that is not always worth it. Avoid it when: (1) you have a simple CRUD API with few resource types and a single client, where REST is faster to build and easier to cache; (2) you need aggressive HTTP edge caching without investing in persisted queries; (3) your team is small and the operational overhead of schema management, DataLoader patterns, and subscription infrastructure is prohibitive; (4) you are building a public API consumed by third parties who expect standard REST semantics and HTTP status codes. The Backend-for-Frontend (BFF) pattern is where GraphQL most consistently pays off: a GraphQL gateway sits in front of multiple microservices and lets each client compose exactly the query it needs, and with Apollo Federation, each microservice can own its own subgraph of the shared schema.
Frequently Asked Questions
Is GraphQL always faster than REST?
Not necessarily. GraphQL reduces network round-trips and payload size on the wire, which helps mobile clients on high-latency connections. However, naïve resolver implementations trigger the N+1 problem and can make server-side database load far heavier than an equivalent REST endpoint with a hand-tuned SQL JOIN. With DataLoader, proper indexing, and query depth limiting, GraphQL can match or beat REST in throughput, but it requires more deliberate performance engineering. REST GET responses also benefit from transparent HTTP caching that GraphQL must replicate at the application layer.
What are Apollo and Relay, and do I need them?
Apollo is a full ecosystem: Apollo Client (React hooks for queries, mutations, subscriptions, a normalized in-memory cache) and Apollo Server (a Node.js GraphQL server with plugin support, APQ, and caching directives). It is the most widely adopted choice and supports Apollo Federation for splitting a schema across microservices. Relay is Facebook's own client library, highly opinionated about pagination (the Connections spec) and co-locating fragments with components; it provides compiler-enforced guarantees and a very predictable normalized cache, but carries a steeper learning curve. For most teams Apollo Client is the pragmatic starting point; Relay is worth evaluating for large React applications with complex data requirements and teams willing to follow its strict conventions.
How do you secure a GraphQL API?
GraphQL's flexibility introduces security risks REST does not face by default. Apply depth limiting to prevent deeply nested queries from triggering recursive resolver chains that exhaust memory. Use query complexity analysis to assign a cost to each field and reject queries exceeding a budget. Disable introspection in production to prevent schema harvesting by attackers. Deploy persisted queries so only pre-approved query strings can execute. Enforce authentication (JWT or session) in the context object and apply authorization in every resolver or via a directive layer such as GraphQL Shield. Rate-limit at the operation level, not just by IP, and log all queries with their complexity scores for audit purposes.
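Depth limiting and complexity budgeting can both be expressed as a recursive walk over the requested selection before any resolver runs. The sketch below assumes the selection has already been parsed into nested dicts (leaf fields map to None); the limits, the per-field costs, and the QueryRejected exception are hypothetical values chosen for illustration.

```python
from typing import Dict, Optional


class QueryRejected(ValueError):
    """Raised before execution when a query exceeds a configured limit."""


def selection_depth(selection: Optional[dict]) -> int:
    """Number of nested field levels; a leaf field contributes one level."""
    if not selection:
        return 0
    return 1 + max(selection_depth(child) for child in selection.values())


def complexity(selection: Optional[dict], field_costs: Dict[str, int]) -> int:
    """Sum of per-field costs; fields without an explicit cost count as 1."""
    total = 0
    for field, child in (selection or {}).items():
        total += field_costs.get(field, 1) + complexity(child, field_costs)
    return total


def validate(selection: dict, field_costs: Dict[str, int],
             max_depth: int = 4, max_cost: int = 50) -> int:
    depth = selection_depth(selection)
    if depth > max_depth:
        raise QueryRejected(f"depth {depth} exceeds limit {max_depth}")
    cost = complexity(selection, field_costs)
    if cost > max_cost:
        raise QueryRejected(f"cost {cost} exceeds budget {max_cost}")
    return cost  # worth logging alongside the query for audit purposes


FIELD_COSTS = {"posts": 10}  # list fields are priced higher than scalar fields

feed = {"user": {"name": None, "posts": {"title": None, "tags": None}}}
feed_cost = validate(feed, FIELD_COSTS)

# A cyclic traversal (user -> posts -> author -> posts -> author) that should be refused.
abusive = {"user": {"posts": {"author": {"posts": {"author": {"name": None}}}}}}
try:
    validate(abusive, FIELD_COSTS)
    abusive_rejected = False
except QueryRejected:
    abusive_rejected = True
```

Running both checks before execution means an abusive query is refused without a single database call.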
GraphQL doesn't make your API simpler; it moves the complexity from the server to the contract. Design the schema as carefully as you would a database schema, because every client will depend on it for years.
- alokknight Engineering
