Consistency Models in System Design: Strong, Eventual, Causal & More (Visualized)
Consistency models define the rules a distributed system promises about the order and visibility of reads and writes. Choosing the right model is the central trade-off in distributed systems: stronger consistency is easier to reason about but costs latency and availability; weaker consistency is faster but shifts complexity to the application.
A consistency model is a contract between a distributed system and its clients that specifies exactly which values a read is allowed to return given the history of preceding writes. It is not about preventing bugs; it is about specifying which behaviours are legal, so engineers can reason correctly about their applications without reading the source code of every replica.
In a single-machine database a write is instantly visible to the next read. In a distributed system, data lives on multiple replicas spread across data centres. A write on Replica A must travel over the network to Replica B, and until it arrives, Replica B serves stale data. The consistency model defines how long that staleness is allowed to persist, and what ordering guarantees the system makes across concurrent operations.
The Consistency Spectrum
Consistency models form a hierarchy from strongest to weakest. Stronger models are easier to program against, because they make the distributed system look like a single machine, but they require coordination that adds latency and limits availability. Weaker models are faster and more available, but they expose anomalies that the application must handle explicitly.
The five most important points on the spectrum, from strongest to weakest, are: linearizability (strict real-time ordering), sequential consistency (global order respected, real-time not required), causal consistency (causally related ops ordered, concurrent ops may diverge), client-centric guarantees (per-session promises like read-your-writes), and eventual consistency (replicas converge eventually, no ordering promise). Each step down trades safety for performance.
Linearizability: The Strongest Guarantee
Linearizability (also called atomic consistency, and often loosely called strict or strong consistency) is the gold standard of distributed consistency. It requires that every operation appears to take effect instantaneously at some single point in real time between its invocation and its response. Once a write completes, every subsequent read, from any client and on any replica, must see that write or a later one. The system appears as a single, lock-step machine.
Linearizability requires tight coordination. To confirm a write is durably ordered, a leader must get acknowledgement from a quorum of replicas before responding to the client (as in Raft or Paxos). Reads must also go through the leader or include a quorum read to avoid serving stale data. The result is elevated latency, typically proportional to the round-trip time across your replica set, and reduced availability: if the quorum cannot be formed (during a network partition), the system blocks rather than returning a stale answer.
Sequential Consistency
Sequential consistency relaxes real-time ordering while keeping a global total order. All operations appear to execute in some single sequential order that is consistent with the program order of each individual process, but that order does not have to match wall-clock time. Every client observes the same order of writes; the difference from linearizability is that the agreed order may place an operation before one that had already completed in real time. A read may therefore return a stale value even though a newer write has already finished on another client, provided each client's own operations still appear in program order.
Sequential consistency was originally defined for multi-processor shared memory (most modern CPUs provide it only when memory fences are used) and is offered by systems like Zookeeper for reads. It is weaker than linearizability (so you cannot safely use it to implement distributed locks without extra care), but strong enough for many coordination scenarios. It still requires global coordination, so latency is not dramatically better than linearizability in practice.
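The gap between the two models can be made concrete with a brute-force checker for a single register. This is only a sketch for tiny histories: the `Op` record and all function names are hypothetical, and trying every permutation is exponential, so it illustrates the definitions rather than a practical verification tool.

```python
from itertools import permutations
from typing import NamedTuple, Optional


class Op(NamedTuple):
    client: str
    kind: str              # "w" for a write, "r" for a read
    value: Optional[int]   # value written, or value the read returned
    start: float           # invocation time
    end: float             # response time


def legal_register_order(order, initial=None):
    """True if every read returns the most recent preceding write."""
    current = initial
    for op in order:
        if op.kind == "w":
            current = op.value
        elif op.value != current:
            return False
    return True


def respects_program_order(order):
    """Each client's operations appear in the order that client issued them."""
    last_start = {}
    for op in order:
        if op.start < last_start.get(op.client, float("-inf")):
            return False
        last_start[op.client] = op.start
    return True


def respects_real_time(order):
    """If one operation finished before another started, it must come first."""
    for i, later in enumerate(order):
        for earlier in order[:i]:
            if later.end < earlier.start:
                return False
    return True


def is_sequentially_consistent(history):
    return any(legal_register_order(p) and respects_program_order(p)
               for p in permutations(history))


def is_linearizable(history):
    return any(legal_register_order(p) and respects_real_time(p)
               for p in permutations(history))


# Client A's write completes at t=1; client B's later read still sees the
# initial (empty) value.
stale_read = [Op("A", "w", 1, 0, 1), Op("B", "r", None, 2, 3)]
print(is_sequentially_consistent(stale_read))  # True
print(is_linearizable(stale_read))             # False
```

The stale read is acceptable under sequential consistency because the read can be ordered before the write without violating any client's program order. Linearizability forbids it because the write had already completed in real time.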
Causal Consistency
Causal consistency tracks cause-and-effect relationships between operations using mechanisms like vector clocks or Lamport timestamps. If operation A causally precedes operation B (because B read or depended on A's result), then every replica must apply A before B. Concurrent operations, those with no causal relationship, can appear in any order on different replicas. Causal consistency is generally regarded as the strongest model that can be implemented without coordination during writes, which lets it stay available during network partitions.
Real-world systems using causal consistency include MongoDB's causally consistent sessions (introduced in version 3.6) and Amazon's Dynamo-style systems augmented with causal metadata. The challenge is tracking causal dependencies: each write carries a version vector, and replicas must defer serving a read until all causally prior writes are applied locally.
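As a rough illustration of how causal dependencies can be tracked, here is a minimal vector-clock sketch. The function names and the dictionary representation are assumptions made for this example and do not reflect the internals of MongoDB or any Dynamo-style store.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> number of events seen from that node


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with one more local event on `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the knowledge of both clocks combined."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` causally precedes `b`."""
    nodes = a.keys() | b.keys()
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if each clock has seen something the other has not."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    return a_ahead and b_ahead


post = increment({}, "A")                    # node A writes a post
reply = increment(merge({}, post), "B")      # node B reads the post, then replies
unrelated = increment({}, "C")               # node C writes independently

print(happened_before(post, reply))    # True: replicas must apply post first
print(concurrent(reply, unrelated))    # True: either order is acceptable
```

A replica that receives `reply` before `post` can tell from the clocks that a dependency is missing and hold the reply back until the post arrives.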
Client-Centric Guarantees
Rather than making global ordering promises, client-centric consistency models make per-session promises that are sufficient for most user-facing applications. There are four classic session guarantees, originally defined by Terry et al. (1994):
Read-your-writes (RYW): After a client writes a value, all subsequent reads by that same client will see that value or a later one. Without this guarantee, a user posts a comment and then immediately refreshes to find it missing, which is a jarring experience.
Monotonic reads: Once a client has read a value at version V, it will never see an older version. Without this, a client might see a newer post, then switch replicas and see an older state, so time appears to run backwards.
Monotonic writes: Writes from a single client are applied in order across all replicas.
Writes-follow-reads: If a client reads a value and then writes, the write is ordered after the read's version on all replicas, preventing a situation where a reply to a post appears before the post itself.
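One common way to provide read-your-writes and monotonic reads is for the session to remember the highest version it has written or read, and to refuse replicas that are behind that version. The sketch below assumes a single key with a simple integer version; `Replica`, `Session`, and `StaleReplicaError` are hypothetical names for illustration only.

```python
class StaleReplicaError(Exception):
    """Raised when a replica is behind what this session has already seen."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Ignore updates that are older than what the replica already holds.
        if version > self.version:
            self.version, self.value = version, value


class Session:
    def __init__(self):
        self.last_written = 0   # supports read-your-writes
        self.last_read = 0      # supports monotonic reads

    def write(self, replica, value):
        version = replica.version + 1
        replica.apply(version, value)
        self.last_written = version

    def read(self, replica):
        floor = max(self.last_written, self.last_read)
        if replica.version < floor:
            raise StaleReplicaError(
                f"{replica.name} is at v{replica.version}, session needs v{floor}")
        self.last_read = replica.version
        return replica.value


primary, lagging = Replica("primary"), Replica("lagging")
session = Session()
session.write(primary, "my comment")

print(session.read(primary))        # the session sees its own write
try:
    session.read(lagging)           # the lagging replica has not caught up yet
except StaleReplicaError as err:
    print("retry elsewhere:", err)
```

In practice the client would retry on another replica or wait for the lagging one to catch up. Sticky sessions achieve a similar effect by always routing a client to the same replica.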
Eventual Consistency
Eventual consistency is the weakest useful model: if no new writes occur, all replicas will eventually converge to the same value, but there is no bound on how long convergence takes, and reads may return any previously written value in the interim. It is the model used by DNS (updates propagate over minutes to hours), many NoSQL stores (Apache Cassandra in its default configuration, Amazon DynamoDB in eventually consistent read mode), and social-media feeds.
Eventual consistency maximises write availability and minimises latency: a write succeeds on the local replica immediately and propagates asynchronously. The application must tolerate reading stale values. Common techniques for handling staleness include conflict-free replicated data types (CRDTs), which merge concurrent writes automatically, and last-write-wins (LWW), which uses timestamps to pick a winner, at the risk of discarding concurrent updates.
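To show how the two techniques differ, here is a small sketch of a grow-only counter (one of the simplest CRDTs) next to a last-write-wins merge. The function names and the `(timestamp, node_id, value)` tuple layout are assumptions chosen for this example.

```python
def gcounter_increment(state, node, amount=1):
    """Each node only ever increases its own slot."""
    updated = dict(state)
    updated[node] = updated.get(node, 0) + amount
    return updated


def gcounter_merge(a, b):
    """Per-node maximum: commutative, associative, and idempotent."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def gcounter_value(state):
    return sum(state.values())


def lww_merge(a, b):
    """Keep the write with the higher (timestamp, node_id); drop the other."""
    return max(a, b, key=lambda write: (write[0], write[1]))


# Two replicas count likes concurrently, then exchange state.
replica_a = gcounter_increment(gcounter_increment({}, "A"), "A")
replica_b = gcounter_increment({}, "B")
merged = gcounter_merge(replica_a, replica_b)
print(gcounter_value(merged))   # 3: no increment is lost

# Two replicas update a cart concurrently under last-write-wins.
cart_a = (10, "A", ["book"])
cart_b = (11, "B", ["pen"])
print(lww_merge(cart_a, cart_b))   # (11, 'B', ['pen']): the book is discarded
```

The counter converges to the same total regardless of merge order, while LWW converges by silently dropping one of the concurrent updates.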
The Latency and Availability Cost of Strong Consistency
Every step toward stronger consistency requires more coordination. A linearizable write must wait for acknowledgements from a quorum before responding; in a geo-distributed cluster, this means the write latency is bounded below by the round-trip time between data centres. A cross-region quorum write (e.g., US-East to EU-West) easily adds 80 to 150 ms per write, compared to under 5 ms for a locally acknowledged eventual write.
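A back-of-the-envelope model makes the lower bound concrete. The sketch assumes a leader-based protocol in which the leader counts as one acknowledgement and contacts all followers in parallel, and it ignores disk and processing time. The function name and the round-trip figures are illustrative, not measurements.

```python
def quorum_write_latency_ms(follower_rtts_ms):
    """Time until a majority has acknowledged, given leader-to-follower RTTs."""
    cluster_size = len(follower_rtts_ms) + 1   # followers plus the leader
    majority = cluster_size // 2 + 1
    acks_needed = majority - 1                 # the leader acknowledges itself
    if acks_needed == 0:
        return 0.0
    # Requests go out in parallel, so the write completes when the
    # acks_needed-th fastest follower has replied.
    return sorted(follower_rtts_ms)[acks_needed - 1]


# One replica per region: the nearest remote region sets the floor.
print(quorum_write_latency_ms([90, 140]))           # 90

# Five replicas, only one follower local: still one cross-region round trip.
print(quorum_write_latency_ms([2, 90, 140, 150]))   # 90

# Five replicas with a local majority: the quorum never leaves the region.
print(quorum_write_latency_ms([2, 3, 90, 140]))     # 3
```

The third case shows why replica placement matters as much as the protocol: if a majority lives close to the leader, the quorum forms quickly, at the cost of concentrating failure risk in one region.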
Availability is also impacted. Consensus protocols like Raft require a majority (quorum) of replicas to be reachable. If half or more of your replicas are partitioned away or fail simultaneously, no majority can be formed and a linearizable system will refuse writes, returning an error rather than allowing a potentially inconsistent update. This is the unavailability side of the CAP theorem.
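The majority arithmetic is simple enough to sketch directly. The helper names below are hypothetical and not taken from any particular Raft implementation.

```python
def majority(cluster_size):
    """Smallest number of replicas that forms a quorum."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many replicas can be lost while a quorum still exists."""
    return cluster_size - majority(cluster_size)


def can_accept_writes(cluster_size, reachable):
    return reachable >= majority(cluster_size)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))

print(can_accept_writes(5, 3))   # True: three of five is a majority
print(can_accept_writes(4, 2))   # False: exactly half is not enough
```

A four-node cluster tolerates the same single failure as a three-node one, which is why consensus clusters are usually sized with an odd number of replicas.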
Relationship to CAP and PACELC
The CAP theorem (Brewer, 2000) states that during a network partition, a distributed system must choose between consistency (every read returns the latest write) and availability (every request receives a response). Linearizability is the precise formalization of the C in CAP. Systems that prioritize C during a partition (CP systems) include HBase, Zookeeper, and Etcd. Systems that prioritize A (AP systems) include Cassandra and CouchDB.
CAP is often misread as a binary choice. PACELC (Daniel Abadi, 2012) extends it by noting that even in the absence of partitions, there is a trade-off between latency (L) and consistency (C): to get stronger consistency, you must do more coordination, which takes more time. PACELC makes the latency cost explicit and is more useful for day-to-day system design decisions where partitions are rare but latency sensitivity is constant.
Consistency Model Comparison
| Model | Ordering Guarantee | Partition Behaviour | Typical Latency | Example Systems |
|---|---|---|---|---|
| Linearizability | Strict real-time global order | Returns error (blocks) | High (quorum RTT) | Etcd, Zookeeper, Google Spanner, CockroachDB |
| Sequential Consistency | Global order, not real-time | Blocks or errors | High | Zookeeper (reads), many CPUs |
| Causal Consistency | Causal order preserved | Available | Medium | MongoDB causal sessions, COPS |
| Read-Your-Writes | Session: own writes visible | Available (per session) | Low to Medium | DynamoDB strong reads, MongoDB sessions |
| Monotonic Reads | Session: no time travel | Available (per session) | Low to Medium | Sticky-session systems, Redis Sentinel |
| Eventual Consistency | Converges eventually | Fully available | Very Low | Cassandra, DynamoDB (eventual), DNS |
Choosing the Right Model in Practice
The right consistency model depends on what incorrect behaviour the application can tolerate. For financial transactions, inventory counts, and distributed locks, linearizability is required: double-spending or overselling is unacceptable. For user profiles, product catalogues, and social feeds, eventual consistency is usually fine: a stale follower count for a few seconds is invisible to users and the latency saving is enormous. For collaborative documents and chat systems, causal consistency gives a natural model: replies always appear after the messages they reply to, without requiring global coordination.
A pragmatic approach: start with the weakest model that makes your application correct, not the strongest one. Use strong consistency surgically, only for the specific operations that need it (e.g., decrement stock, compare-and-swap a lock), and default to causal or eventual consistency everywhere else. Many databases let you select per-operation (DynamoDB's ConsistentRead flag, MongoDB's read concern levels, Cassandra's per-query consistency levels).
Frequently Asked Questions
What is the difference between linearizability and serializability?
Serializability is an isolation level for transactions: concurrent transactions produce a result equivalent to some serial (one-at-a-time) execution. It says nothing about real time. Linearizability is a consistency model for single operations: each operation appears atomic at some point between its start and end in real time. A database can be serializable but not linearizable (transactions are ordered but may lag real time), or linearizable but not serializable (single-key atomic, but no multi-key transactions). Strict serializability combines both: serializable transactions that are also ordered in real time. It is the gold standard, used by Google Spanner.
Is eventual consistency safe for financial systems?
Generally no, without additional safeguards. A payment debit and a balance check must be linearizable (or strictly serializable) to prevent double-spending. However, many parts of a financial system can tolerate eventual consistency: transaction history feeds, fraud-score displays, reporting dashboards. The pattern is to use strong consistency for the authoritative ledger and eventual consistency for derived read views. Systems like PayPal use Paxos-based consensus for the core account balance and eventually consistent replicas for everything else.
How does Cassandra's consistency level setting work?
Cassandra lets you specify a consistency level per read and per write operation. QUORUM reads and writes require acknowledgement from a majority of replicas. If you use QUORUM for both, you get strong consistency (every read sees the latest write) because the read and write quorums are guaranteed to overlap. ONE or LOCAL_ONE means only one replica must respond, giving eventual consistency with the lowest latency. The rule of thumb: if W + R > N (write replicas + read replicas exceeds replication factor), reads and writes overlap and you get strong consistency. Cassandra surfaces this trade-off explicitly, giving you fine-grained control per operation.
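The W + R > N rule can be checked with a few lines of arithmetic. This sketch assumes a single data centre and covers only the ONE, TWO, THREE, QUORUM, and ALL levels; the helper names are hypothetical and are not part of any Cassandra driver.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True if every read set must overlap every write set (W + R > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


print(reads_see_latest_write("QUORUM", "QUORUM", 3))   # True:  2 + 2 > 3
print(reads_see_latest_write("ONE", "ONE", 3))         # False: 1 + 1 <= 3
print(reads_see_latest_write("ONE", "ALL", 3))         # True:  1 + 3 > 3
```

The last case shows that the overlap can be bought on either side: cheap writes with expensive reads, or the reverse, depending on which path is hotter.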
Pick the weakest consistency model that still makes your application correct. Consistency is not free: every step stronger costs latency, availability, or both. Spend that budget only where your users would notice the anomaly.
alokknight Engineering
