Database Indexing in System Design: B-Trees, Hash, LSM & Composite Indexes (Visualized)

A database index is an auxiliary data structure that stores a sorted (or hashed) copy of one or more columns plus pointers back to the table rows, so the engine can locate matching rows without scanning every record. The classic analogy is the index at the back of a book: instead of reading all 800 pages to find every mention of "replication", you flip to the index, jump to the term, and follow the page numbers straight to the answer.

Without an index, a query like SELECT * FROM users WHERE email = '[email protected]' forces a full table scan — the engine reads and tests every row, which is O(n) and gets linearly slower as the table grows. With a B-tree index on email, the same lookup becomes O(log n): a handful of pointer hops down a tree, regardless of whether the table has a thousand rows or a billion.

Why Indexes Turn O(n) Scans into O(log n) Lookups

A full table scan must touch every row because the data is unordered with respect to your search key — the engine has no way to know a match isn't lurking in the very last block. An index imposes order. Because the keys are kept sorted, the engine can binary-search them: each comparison halves the remaining search space, so finding one key among a billion takes roughly 30 comparisons instead of a billion reads. In a balanced tree of fan-out f, the depth is log_f(n), which is why even huge tables resolve a point lookup in three or four page reads.

Full table scan vs index lookup

Top: a full table scan checks every row one by one (O(n)). Bottom: a B-tree index descends to the matching row in a few hops (O(log n)). Both hunt the same target value.

B-Tree, Hash and LSM Indexes

Most relational databases default to a B-tree (really a B+tree) index. Keys are kept sorted in wide, shallow nodes; internal nodes hold separator keys and child pointers, while leaf nodes hold the keys plus row pointers and are linked together. This layout is the workhorse of indexing because it supports both equality lookups (=) and range queries (<, BETWEEN, ORDER BY) by walking the linked leaves.

A hash index stores keys in a hash table, giving O(1) average equality lookups — but it cannot do ranges or sorted scans, because hashing destroys order. PostgreSQL offers explicit USING hash indexes; MySQL's Memory engine and InnoDB's adaptive hash index use them internally. An LSM-tree (log-structured merge tree) buffers writes in memory and flushes sorted runs to disk that are later compacted. LSM trades read amplification for blazing write throughput, which is why it backs write-heavy stores like Cassandra, RocksDB and ScyllaDB.

Descending a B-tree to a key

A search key descends the B-tree: at each node the engine picks the child whose key range contains the target, reaching the leaf row pointer in just a few hops.

Clustered vs Non-Clustered Indexes

A clustered index determines the physical order of the rows on disk — the table is the index, with the actual row data living in the leaf nodes. There can be only one per table. In MySQL InnoDB, the primary key is always the clustered index, so primary-key lookups and primary-key range scans are exceptionally fast.

A non-clustered (secondary) index is a separate structure whose leaves hold the key plus a pointer back to the row (in InnoDB, the pointer is the primary-key value). Looking up a row through a secondary index therefore costs an extra step: find the key in the secondary index, then follow the pointer into the clustered index — a "bookmark lookup". PostgreSQL takes a different route: all indexes are secondary and reference a physical row location (a TID) into a separate heap, so even the primary key is a non-clustered index over heap-stored rows.

Composite Indexes & the Leftmost-Prefix Rule

A composite index covers multiple columns, e.g. (last_name, first_name, birth_date). The keys are sorted by the first column, then by the second within equal firsts, and so on — like a phone book ordered by surname, then given name. This ordering creates the leftmost-prefix rule: the index can satisfy a query only if the query constrains a contiguous prefix of the index columns starting from the left.

So an index on (a, b, c) helps queries filtering on a, on a AND b, or on a AND b AND c. It does not help a query filtering only on b, or only on c, because the engine cannot binary-search a column whose preceding sort keys are unknown — just as you can't find everyone named "James" in a phone book sorted by surname first. Order your composite columns by selectivity and query shape, putting equality-filtered columns before range-filtered ones.

Composite index & the leftmost-prefix rule

An index on (a, b, c). Queries cycle through; a query can use the index only when it constrains a contiguous prefix from the left — the usable part lights green, the unusable suffix stays grey.

-- Composite index: keys sorted by a, then b, then c
CREATE INDEX idx_abc ON events (a, b, c);

-- USES the index (leftmost prefix satisfied)
SELECT * FROM events WHERE a = 1;
SELECT * FROM events WHERE a = 1 AND b = 2;
SELECT * FROM events WHERE a = 1 AND b = 2 AND c = 3;

-- CANNOT use idx_abc efficiently (prefix 'a' is missing)
SELECT * FROM events WHERE b = 2;
SELECT * FROM events WHERE c = 3;

-- Range on the last used column is fine; columns after it can't be seeked
SELECT * FROM events WHERE a = 1 AND b > 2 AND c = 3;  -- c not used for seeking

Covering Indexes

A covering index contains every column a query needs, so the engine answers entirely from the index and never touches the table heap — an "index-only scan". If a query selects order_id, total filtered by customer_id, an index on (customer_id, order_id, total) covers it. PostgreSQL formalizes this with INCLUDE columns, which ride along in the leaf without becoming part of the sort key:

-- PostgreSQL covering index: key on customer_id, payload columns INCLUDEd
CREATE INDEX idx_cust_cover
  ON orders (customer_id)
  INCLUDE (order_id, total);

-- Answered from the index alone (index-only scan), no heap fetch:
SELECT order_id, total FROM orders WHERE customer_id = 42;

The Write & Space Cost of Indexes

Indexes are not free. Every index is a second copy of its columns, so it consumes disk and memory. More importantly, every INSERT, UPDATE, or DELETE must update every affected index, adding write amplification and lock/latch contention. A table with eight indexes can spend most of an insert's cost maintaining indexes rather than writing the row. The rule of thumb: index to make reads fast, but each index you add taxes every write. Drop indexes that the query planner never chooses.

Partial Indexes & When NOT to Index

A partial index covers only the rows matching a predicate, e.g. CREATE INDEX ... WHERE status = 'active'. If 99% of orders are archived but you only ever query active ones, a partial index is tiny, cheap to maintain, and highly effective. It is the surgical alternative to indexing the whole column.

Skip an index when it won't pay off: on low-cardinality columns (a boolean or a status with three values) the planner often prefers a scan because the index points at huge fractions of the table; on small tables a full scan is already a couple of pages; and on write-heavy tables with rare reads, the maintenance cost outweighs the lookup savings. Indexing is a trade-off, not a default — measure with EXPLAIN ANALYZE before adding one.

Choosing an Index Type

Index type	Structure	Strengths	Best for
B-tree (B+tree)	Sorted balanced tree	Equality + range + sorting, O(log n)	General purpose; the default in PostgreSQL & MySQL
Hash	Hash table	O(1) equality, no ranges	Exact-match lookups on a single key
GIN / inverted	Term → row-list postings	Multi-value membership search	Full-text search, JSONB, arrays, tags
LSM-tree	In-memory buffer + sorted runs	Very high write throughput	Write-heavy stores (Cassandra, RocksDB)

Frequently Asked Questions

What is the difference between a clustered and a non-clustered index?

A clustered index defines the physical order of the rows on disk — the table data lives in the index leaves, so there can be only one. In MySQL InnoDB the primary key is the clustered index. A non-clustered (secondary) index is a separate structure that stores the key plus a pointer to the row, so a lookup through it costs an extra hop to fetch the full row. PostgreSQL stores all rows in a heap and treats every index, including the primary key, as non-clustered.

Why isn't my composite index being used?

Almost always the leftmost-prefix rule. An index on (a, b, c) can only be seeked when your query constrains a contiguous prefix starting at a. A query filtering only on b or c cannot use it, because the keys are sorted by a first. Reorder the index columns, or add an index whose leading column matches your filter. Use EXPLAIN to confirm which index the planner picks.

Do indexes slow down writes?

Yes. Every insert, update, or delete must also update each affected index, so more indexes mean slower writes, more disk and memory use, and more contention. Indexes speed up reads at the expense of writes and space, so add them deliberately, prefer partial or covering indexes where they fit, and drop indexes the planner never uses.

An index is a deal: you pay on every write and in disk to make the right reads fast. Index for the queries you actually run, measure with EXPLAIN, and drop the ones the planner ignores.
— alokknight Engineering