Event Sourcing in System Design: Immutable Logs, Snapshots & CQRS (Visualized)
Event Sourcing stores every change to application state as an immutable, append-only sequence of events, so you can always reconstruct what happened and why. This guide covers the event store, state replay, snapshots, CQRS pairing, and real trade-offs, with live animations.
Event Sourcing is an architectural pattern in which the state of a system is derived entirely from an ordered, append-only log of immutable events. Instead of storing only the current state, every change that ever happened is persisted as a first-class record that can be replayed at any time.
In a traditional CRUD system, updating a bank account balance means overwriting a single row in a database. Event Sourcing flips that model: instead of UPDATE accounts SET balance = 950 WHERE id = 42, you append an event { type: 'MoneyWithdrawn', amount: 50 } to a log. The current balance is always derived by replaying every event from the beginning. This seemingly simple shift unlocks audit trails, time-travel debugging, and a natural fit for event-driven microservices.
State as a Sequence of Events
Think of every object in your system (a bank account, an order, a user profile) as an entity whose current state is a function of its entire history. Each mutation is captured as a named, past-tense domain event: AccountOpened, MoneyDeposited, MoneyWithdrawn. To answer "what is the balance right now?", the system replays the event stream and applies each event in order, arriving at the current state with full traceability.
This is not a new idea: it mirrors how accounting ledgers have worked for centuries. Double-entry bookkeeping never erases a line; it appends corrective entries. Git is another everyday example: a repository is defined by its commit log, not by a single snapshot of the working tree. Event Sourcing brings this discipline to application data.
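The replay described above is a left fold over the stream. The following is a minimal sketch, not a definitive implementation: the `Event` class, the `apply` and `replay` helpers, and the rule that AccountOpened starts the balance at zero are assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    type: str
    amount: int = 0


def apply(balance: int, event: Event) -> int:
    """Return the balance that results from applying one event."""
    if event.type == "AccountOpened":
        return 0  # assumption: a new account starts empty
    if event.type == "MoneyDeposited":
        return balance + event.amount
    if event.type == "MoneyWithdrawn":
        return balance - event.amount
    return balance  # unknown event types are ignored in this sketch


def replay(events: list[Event]) -> int:
    """Fold the stream, in order, into the current balance."""
    return reduce(apply, events, 0)


stream = [
    Event("AccountOpened"),
    Event("MoneyDeposited", 1000),
    Event("MoneyWithdrawn", 50),
]
balance = replay(stream)
print(balance)  # 950
```

Because `apply` is a pure function of the previous state and one event, replaying the same stream always yields the same balance.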
The Event Store
The event store is the heart of an event-sourced system. It is a specialised database optimised for two operations: appending new events to the tail of a stream, and reading a stream from a given position. Events are uniquely identified by stream ID (e.g., account-42) and a monotonically increasing sequence number. Popular implementations include EventStoreDB (purpose-built), Apache Kafka (durable log), and PostgreSQL with a single events table guarded by optimistic concurrency checks.
A minimal event record carries stream_id, sequence, event_type, payload (JSON), and occurred_at. Because events are immutable once written, no row ever needs to be updated or deleted. Only INSERT and SELECT are needed, which simplifies replication and makes the data a good fit for write-once archival storage such as S3 Glacier.
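To make the two core operations concrete, here is a hedged in-memory sketch of an append with an expected-version check and a read from a given position. `InMemoryEventStore`, `StoredEvent`, `ConcurrencyError`, and the `expected_version` parameter are hypothetical names chosen for this example; they are not the API of any real event store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


class ConcurrencyError(Exception):
    """Raised when the stream has moved on since the writer last read it."""


@dataclass(frozen=True)
class StoredEvent:
    stream_id: str
    sequence: int
    event_type: str
    payload: dict
    occurred_at: datetime


class InMemoryEventStore:
    def __init__(self) -> None:
        self._streams: dict[str, list[StoredEvent]] = {}

    def append(self, stream_id: str, expected_version: int,
               event_type: str, payload: dict) -> StoredEvent:
        """Append one event if the stream is still at expected_version."""
        stream = self._streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"{stream_id} is at version {len(stream)}, "
                f"expected {expected_version}"
            )
        event = StoredEvent(stream_id, expected_version + 1, event_type,
                            payload, datetime.now(timezone.utc))
        stream.append(event)
        return event

    def read(self, stream_id: str, from_sequence: int = 1) -> list[StoredEvent]:
        """Return the stream's events with sequence >= from_sequence."""
        return [e for e in self._streams.get(stream_id, [])
                if e.sequence >= from_sequence]


store = InMemoryEventStore()
store.append("account-42", 0, "AccountOpened", {})
store.append("account-42", 1, "MoneyDeposited", {"amount": 200})

# A second writer that still believes the stream is at version 1 is rejected.
try:
    store.append("account-42", 1, "MoneyWithdrawn", {"amount": 50})
    conflict = False
except ConcurrencyError:
    conflict = True
```

The rejected writer would normally reload the stream, re-validate its command against the new state, and retry.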
-- Minimal event store table (PostgreSQL)
CREATE TABLE events (
    id          BIGSERIAL PRIMARY KEY,
    stream_id   TEXT NOT NULL,                      -- e.g. 'account-42'
    sequence    INT NOT NULL,                       -- per-stream monotonic counter
    event_type  TEXT NOT NULL,                      -- e.g. 'MoneyDeposited'
    payload     JSONB NOT NULL,                     -- event data
    occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (stream_id, sequence)                    -- optimistic concurrency guard
);
-- Appending a new event (fails with a unique violation if sequence 3 already exists)
INSERT INTO events (stream_id, sequence, event_type, payload)
VALUES ('account-42', 3, 'MoneyDeposited', '{"amount": 200}');
-- Replaying all events for one stream
SELECT event_type, payload
FROM events
WHERE stream_id = 'account-42'
ORDER BY sequence;
Snapshots: Speeding Up Replay
Replaying every event from the very beginning is correct, but it grows expensive as event streams age. An account opened five years ago may have hundreds of thousands of events. The solution is a snapshot: a periodic checkpoint that captures the full aggregate state at a given sequence number. On the next load, the system fetches the most recent snapshot and only replays the events after it, reducing replay cost from O(all events) to O(events since last snapshot).
Snapshots are a pure performance optimisation and do not change the semantics of the system. They can be generated asynchronously (after every N events, or on a schedule) and stored in a separate snapshot store. If all snapshots were deleted, the system would still produce identical answers by replaying from event 1. This is a key correctness guarantee: the event log is the ground truth; snapshots are disposable caches.
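A small sketch can illustrate both claims: loading from a snapshot replays fewer events, and it returns the same answer as a full replay. The `Snapshot` class, the `load` helper, and the tuple-shaped events are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    sequence: int  # sequence number of the last event folded into the snapshot
    balance: int


def apply(balance: int, event: tuple[str, int]) -> int:
    kind, amount = event
    if kind == "MoneyDeposited":
        return balance + amount
    if kind == "MoneyWithdrawn":
        return balance - amount
    return balance


def load(events: list[tuple[str, int]],
         snapshot: Optional[Snapshot] = None) -> tuple[int, int]:
    """Return (balance, number_of_events_replayed)."""
    balance = snapshot.balance if snapshot else 0
    start = snapshot.sequence if snapshot else 0
    # Sequence numbers start at 1, so the event with sequence n sits at index n - 1
    # and the events after the snapshot begin at index `start`.
    tail = events[start:]
    for event in tail:
        balance = apply(balance, event)
    return balance, len(tail)


events = [("MoneyDeposited", 10)] * 1000 + [("MoneyWithdrawn", 3)] * 5
snapshot = Snapshot(sequence=1000, balance=load(events[:1000])[0])

full = load(events)            # replays all 1005 events
fast = load(events, snapshot)  # replays only the 5 events after the snapshot
```

Both calls return the same balance; only the number of replayed events differs, which is why a snapshot can be deleted without affecting correctness.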
Audit Trails and Time-Travel Debugging
Because every state change is an immutable, timestamped event in the log, Event Sourcing provides an audit trail as a by-product of normal operation rather than as an add-on. For regulated industries such as banking, healthcare, and e-commerce, this is transformative. "Who changed this record, when, and to what value?" becomes a simple query against the event stream rather than a forensic exercise across change-log tables or backup restores, provided each event also records who issued it (for example, a user ID in the payload or metadata).
Time-travel debugging follows naturally: to understand the state of an account at any point in the past, replay events up to that timestamp and stop. There is no need for point-in-time backup restores or temporal tables: the event log is the temporal record. This also enables powerful "what if" scenarios: replay a stream with a bug-fix applied to older events to verify the corrected behaviour without touching production data.
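As a rough illustration of replaying up to a timestamp, the sketch below stops folding at the first event that occurred after the requested moment. The `balance_as_of` helper, the `ts` shorthand, and the sample dates are hypothetical, and the sketch assumes events are already ordered so that timestamps never decrease.

```python
from datetime import datetime, timezone


def ts(day: int) -> datetime:
    """Shorthand for a UTC timestamp in January 2024 (sample data only)."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)


events = [
    (ts(1), "MoneyDeposited", 1000),
    (ts(5), "MoneyWithdrawn", 50),
    (ts(9), "MoneyWithdrawn", 200),
]


def balance_as_of(events, moment: datetime) -> int:
    """Replay events in order and stop at the first one after `moment`."""
    balance = 0
    for occurred_at, kind, amount in events:
        if occurred_at > moment:
            break  # events are ordered, so nothing later can apply
        balance += amount if kind == "MoneyDeposited" else -amount
    return balance


on_day_6 = balance_as_of(events, ts(6))    # 950
at_month_end = balance_as_of(events, ts(31))  # 750
```

The same function answers "what is the balance now?" and "what was it last week?"; only the cut-off changes.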
Event Sourcing vs Traditional State Storage
The difference between state-oriented and event-sourced storage is best seen side by side. In a traditional system, only the latest value of each field is kept, and history is implicitly discarded on every update. In an event-sourced system, the history is the primary record and the current state is always computable from it.
| Dimension | Traditional (State-Oriented) | Event Sourcing |
|---|---|---|
| Storage unit | Current state row (overwritten) | Immutable event records (appended) |
| History | Lost on each update | Full history always available |
| Audit trail | Requires extra CDC / changelog tables | Built-in at zero extra cost |
| Time-travel | Requires point-in-time backups | Replay log to any past timestamp |
| Read performance | Direct row lookup, very fast | Requires replay or read model |
| Write pattern | UPDATE / upsert | INSERT only (append) |
| Schema changes | ALTER TABLE, migration scripts | Add new event types; versioning needed |
| Complexity | Low for simple CRUD | Higher: projections, versioning, snapshots |
| Best for | Simple CRUD, low-change data | Complex domains, audit, event-driven systems |
Pairing Event Sourcing with CQRS
CQRS (Command Query Responsibility Segregation) separates the write side of your application (commands that mutate state) from the read side (queries that return data). Event Sourcing and CQRS are distinct patterns but pair exceptionally well because they solve complementary problems.
On the write side, a command (e.g., WithdrawMoney) is validated against the current aggregate state (reconstructed by replaying its event stream), then one or more new events are appended to the store. On the read side, projections (also called read models or view models) subscribe to the event stream and maintain denormalised, query-optimised data structures such as Redis hashes, Elasticsearch indices, or materialised SQL views. These answer queries without any replay, typically with a single key lookup or index query, so read cost depends on the read model rather than on how deep the event history goes.
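The write side can be sketched as a function that rebuilds state from the stream, validates the command, and returns the events to append. This is an illustrative sketch only: `handle_withdraw`, `current_balance`, `InsufficientFunds`, and the rule that an account may not be overdrawn are assumptions, not something the pattern prescribes.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal would overdraw the account."""


def current_balance(events: list[dict]) -> int:
    """Reconstruct the aggregate state by replaying its stream."""
    balance = 0
    for event in events:
        if event["type"] == "MoneyDeposited":
            balance += event["amount"]
        elif event["type"] == "MoneyWithdrawn":
            balance -= event["amount"]
    return balance


def handle_withdraw(events: list[dict], amount: int) -> list[dict]:
    """Validate a WithdrawMoney command and return the events to append."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if current_balance(events) < amount:
        raise InsufficientFunds(f"cannot withdraw {amount}")
    return [{"type": "MoneyWithdrawn", "amount": amount}]


stream = [{"type": "MoneyDeposited", "amount": 100}]
stream += handle_withdraw(stream, 30)  # accepted: one new event is appended

try:
    handle_withdraw(stream, 500)  # rejected: nothing is appended
    rejected = False
except InsufficientFunds:
    rejected = True
```

A rejected command leaves the stream untouched, so the log only ever contains facts that passed validation.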
The key insight is that projections are derived and therefore disposable: if your read model schema needs to change, you can delete the projection and rebuild it by replaying the entire event history from scratch. This makes schema evolution on the read side much safer than migrating the only copy of your data in place, although rebuilding a large projection can take considerable time.
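The disposability of projections can be shown with two read models built from the same log. In this hedged sketch, `BalanceView`, `WithdrawalCountView`, and `rebuild` are hypothetical names, and the log is a plain list standing in for a subscription to the event store.

```python
log = [
    {"stream_id": "account-42", "type": "MoneyDeposited", "amount": 1000},
    {"stream_id": "account-42", "type": "MoneyWithdrawn", "amount": 50},
    {"stream_id": "account-7", "type": "MoneyDeposited", "amount": 20},
    {"stream_id": "account-42", "type": "MoneyWithdrawn", "amount": 25},
]


class BalanceView:
    """Read model: current balance per account."""

    def __init__(self) -> None:
        self.rows: dict[str, int] = {}

    def handle(self, event: dict) -> None:
        # The sample log contains only deposits and withdrawals.
        sign = 1 if event["type"] == "MoneyDeposited" else -1
        sid = event["stream_id"]
        self.rows[sid] = self.rows.get(sid, 0) + sign * event["amount"]


class WithdrawalCountView:
    """A later, differently shaped read model over the same events."""

    def __init__(self) -> None:
        self.rows: dict[str, int] = {}

    def handle(self, event: dict) -> None:
        if event["type"] == "MoneyWithdrawn":
            sid = event["stream_id"]
            self.rows[sid] = self.rows.get(sid, 0) + 1


def rebuild(view_cls, log):
    """Throw the old projection away and replay the whole log into a new one."""
    view = view_cls()
    for event in log:
        view.handle(event)
    return view


balances = rebuild(BalanceView, log)
withdrawals = rebuild(WithdrawalCountView, log)
```

Adding `WithdrawalCountView` required no change to the stored events; the new shape is derived entirely from history.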
Trade-Offs and When Not to Use Event Sourcing
Event Sourcing is not a free lunch. The added power comes with real costs that must be weighed against the benefits.
Complexity: you must implement and maintain the event store, projections, snapshot logic, and command handlers. Teams unfamiliar with the pattern face a steep learning curve. For simple CRUD apps (a settings page, a static product catalogue), Event Sourcing adds overhead with no meaningful benefit. It shines in domains with complex business logic, high regulatory requirements, or collaborative workflows where conflict resolution from history is valuable.
Schema evolution of events is harder than altering a table column. Once an event type is published, it may be replayed years later by code that no longer exists. You must adopt an event versioning strategy from day one: upcasting (transforming old events to a newer schema on read), event version fields, or maintaining separate event type variants.
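Upcasting can be sketched as a chain of small functions, each lifting an event from one version to the next at read time. Everything specific here is an assumption for illustration: the `version` field, the idea that version 2 added a `currency` field, and the choice of "USD" as the default for old events.

```python
LATEST_VERSION = 2


def v1_to_v2(event: dict) -> dict:
    """Hypothetical change: v2 added an explicit currency to the payload."""
    payload = {**event["payload"], "currency": "USD"}  # assumed default
    return {**event, "version": 2, "payload": payload}


# Maps a version number to the function that lifts it to the next version.
UPCASTERS = {1: v1_to_v2}


def upcast(event: dict) -> dict:
    """Lift an event to the latest schema without modifying the stored record."""
    # Events written before versioning existed are treated as version 1.
    while event.get("version", 1) < LATEST_VERSION:
        event = UPCASTERS[event.get("version", 1)](event)
    return event


old = {"type": "MoneyDeposited", "payload": {"amount": 200}}
new = upcast(old)
```

The stored event stays as it was written; only the in-memory copy handed to current code takes the new shape.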
Eventual consistency: because projections are updated asynchronously after events are published, reads from a projection may lag behind the latest write. Systems that require strong read-your-own-writes consistency must either read directly from the event log (expensive) or implement synchronous projection updates (negating some CQRS benefits). This is a fundamental tension that must be designed around, not ignored.
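One way to design around this lag, sketched below under stated assumptions, is for the writer to remember the sequence number it just wrote and for the read path to fall back to the log when the projection has not caught up yet. The `Projection` class, the `read_balance` helper, and the `min_position` parameter are hypothetical.

```python
class Projection:
    """Read model that remembers the last sequence number it has applied."""

    def __init__(self) -> None:
        self.position = 0
        self.balance = 0

    def apply(self, sequence: int, delta: int) -> None:
        self.balance += delta
        self.position = sequence


def read_balance(projection: Projection, log: list[tuple[int, int]],
                 min_position: int) -> tuple[int, str]:
    """Serve from the projection once it has caught up; otherwise replay the log."""
    if projection.position >= min_position:
        return projection.balance, "projection"
    return sum(delta for _, delta in log), "log"  # expensive fallback


log = [(1, 100), (2, -30)]  # (sequence, balance delta)
projection = Projection()
projection.apply(1, 100)    # the projection has not yet seen sequence 2

lagging = read_balance(projection, log, min_position=2)
projection.apply(2, -30)    # the projection catches up
caught_up = read_balance(projection, log, min_position=2)
```

Both reads return the correct balance; the difference is which path pays for it, which is the trade-off described above.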
GDPR and data deletion: because events are immutable, deleting personal data is non-trivial. Common approaches include encrypting personal data and discarding the encryption key ("cryptographic erasure"), or storing personal data references that are nullified outside the event log. These must be planned for before the system goes live.
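The reference-based approach can be sketched as follows: events carry only an opaque reference, while the personal data lives in a separate, mutable store that can be erased. The `pii_vault` dictionary, the `customer_ref` field, and the helper names are assumptions for this example; a real system would use a durable store and would also need to cover projections and backups.

```python
import uuid

# Mutable store for personal data, kept outside the immutable event log.
pii_vault: dict[str, dict] = {}


def register_customer(name: str, email: str) -> dict:
    """Store personal data in the vault and return an event that only references it."""
    ref = str(uuid.uuid4())
    pii_vault[ref] = {"name": name, "email": email}
    return {"type": "CustomerRegistered", "customer_ref": ref}


def erase_customer(ref: str) -> None:
    """Honour a deletion request without touching the event log."""
    pii_vault.pop(ref, None)


def describe(event: dict) -> str:
    person = pii_vault.get(event["customer_ref"])
    return person["name"] if person else "[erased]"


event_log = [register_customer("Ada Example", "ada@example.com")]
before = describe(event_log[0])
erase_customer(event_log[0]["customer_ref"])
after = describe(event_log[0])
```

After erasure the event is still in the log and still replayable, but it no longer resolves to any personal data.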
Frequently Asked Questions
Is Event Sourcing the same as an event-driven architecture?
No. They are related but distinct. Event-driven architecture is a broad style where services communicate by emitting and consuming events (e.g., via Kafka). Event Sourcing is a specific persistence pattern where an aggregate's state is stored as a sequence of events rather than as a current-value row. You can use one without the other: you can publish domain events for inter-service communication without event-sourcing your internal state, or you can event-source your state without broadcasting those events externally. In practice, they compose well: event-sourced aggregates are a natural source for the integration events an event-driven architecture depends on.
How often should I take snapshots?
There is no universal rule; it depends on event volume and acceptable load latency. A common heuristic is to snapshot every 50 to 200 events, or whenever a load operation exceeds a target latency threshold (e.g., > 10 ms). Because snapshots are disposable caches, taking them too frequently wastes storage but is otherwise safe; taking them too infrequently means load operations replay many events. Start without snapshots, measure replay latency under realistic load, then add snapshotting only when you can observe a problem to solve.
Can I use Event Sourcing with a standard relational database?
Yes, and it is a perfectly reasonable starting point. A PostgreSQL events table with a UNIQUE(stream_id, sequence) constraint provides optimistic concurrency control out of the box. You get ACID guarantees on individual stream appends, straightforward backups, and familiar tooling. Purpose-built stores like EventStoreDB offer built-in subscriptions, projections, and better throughput at high event rates, but a well-provisioned PostgreSQL instance can handle streams with millions of events and, depending on hardware, indexing, and batching, often tens of thousands of appends per second. That is sufficient for most production workloads. Migrate to a specialised store only when you can measure that you need to.
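The way a unique constraint acts as a concurrency guard can be demonstrated end to end with Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here purely so the sketch is self-contained; the column types are simplified, and the `append` helper is a hypothetical name.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id         INTEGER PRIMARY KEY,
        stream_id  TEXT NOT NULL,
        sequence   INTEGER NOT NULL,
        event_type TEXT NOT NULL,
        payload    TEXT NOT NULL,
        UNIQUE (stream_id, sequence)
    )
""")


def append(conn, stream_id: str, sequence: int,
           event_type: str, payload: dict) -> bool:
    """Try to append; return False if another writer already took this sequence."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events (stream_id, sequence, event_type, payload) "
                "VALUES (?, ?, ?, ?)",
                (stream_id, sequence, event_type, json.dumps(payload)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


opened = append(conn, "account-42", 1, "AccountOpened", {})
writer_a = append(conn, "account-42", 2, "MoneyDeposited", {"amount": 200})
# Writer B read the stream before A committed and also targets sequence 2.
writer_b = append(conn, "account-42", 2, "MoneyWithdrawn", {"amount": 50})

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Writer B's insert is rejected by the constraint, so it must reload the stream and retry with sequence 3 if its command is still valid.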
The event log is the ground truth. Every read model, every projection, every snapshot is a disposable cache: delete them all and replay from event 1 to get everything back. That durability is what makes Event Sourcing worth its complexity.
- alokknight Engineering
