Stream Processing in System Design: Windows, Watermarks & Exactly-Once (Visualized)
Stream processing computes results over unbounded event streams in near real time, instead of waiting to batch data on a schedule. This guide covers windowing, event time vs processing time, watermarks, stateful exactly-once operations, backpressure, and Lambda vs Kappa โ with live animations of each core idea.
Stream processing is the practice of computing results continuously over an unbounded sequence of events as they arrive, rather than collecting data into fixed files and processing it on a schedule. Each event โ a click, a payment, a sensor reading โ is handled within milliseconds to seconds of being produced, so dashboards, alerts, and downstream systems reflect reality in near real time.
A stream processor reads from a durable log (such as Apache Kafka), applies a chain of operators โ filters, maps, joins, aggregations โ and writes results to sinks like databases, caches, or other topics. Unlike a request/response service, the program runs forever: it maintains state across events and emits new output every time fresh data flows through.
Stream Processing vs Batch Processing
Batch processing runs over a bounded dataset โ a day's worth of logs, a table snapshot โ on a schedule (hourly, nightly). It is simple and throughput-efficient, but results are always stale by at least one batch interval. Stream processing trades that simplicity for freshness: it processes events one at a time (or in tiny micro-batches) and updates results incrementally, so latency drops from hours to milliseconds.
| Batch | Stream | |
|---|---|---|
| Input | Bounded dataset (file, snapshot) | Unbounded, never-ending stream |
| Latency | Minutes to hours (per interval) | Milliseconds to seconds |
| Execution | Runs to completion on a schedule | Runs forever, incrementally |
| State | Re-derived each run | Maintained continuously across events |
| Tools | Spark, Hadoop MapReduce, Airflow | Flink, Kafka Streams, Spark Structured Streaming |
Windowing: Tumbling, Sliding & Session
An unbounded stream never ends, so aggregates like "count per minute" need a way to scope events into finite chunks. That scoping is windowing. A tumbling window divides time into fixed, non-overlapping intervals (e.g., every 60 seconds), emitting one result per interval. A sliding window also has a fixed size but advances by a smaller step, so windows overlap and each event lands in several of them. A session window has no fixed boundaries at all: it groups bursts of activity and closes after a configurable gap of inactivity, which is ideal for user sessions.
| Window type | Overlap | Boundary | Typical use |
|---|---|---|---|
| Tumbling | None | Fixed size, fixed start | Per-minute counts, billing buckets |
| Sliding | Yes (size > step) | Fixed size, advances by step | Moving averages, rolling alerts |
| Session | None | Data-driven, closes after a gap | User sessions, activity bursts |
Event Time vs Processing Time & Watermarks
Event time is when an event actually happened (a timestamp baked into the record). Processing time is when the stream processor sees it. They diverge constantly: a mobile client goes offline, a network hiccups, a partition lags โ so events arrive late and out of order. If you window by processing time, a sale that happened at 11:59 but arrived at 12:01 lands in the wrong bucket.
A watermark is the processor's assertion that "event time has advanced to T; I do not expect events older than T anymore." It lets the engine decide when a window is complete enough to emit while bounding how long it waits. Events that arrive after the watermark has passed their window are late: they can be dropped, sent to a side output, or used to update a previously emitted result, depending on the configured allowed lateness.
Stateful Operations & Exactly-Once
Most useful stream jobs are stateful: they keep running totals, dedup sets, window contents, or join buffers. That state must survive failures. Engines like Flink achieve this with checkpoints โ periodic, consistent snapshots of all operator state plus the input offsets that produced it. On failure, the job rewinds to the last checkpoint and replays from those offsets.
This enables exactly-once processing: every event affects the result exactly once, even across crashes and retries. It is usually implemented as effectively-once via idempotent or transactional sinks (e.g., Kafka transactions) combined with checkpointed offsets. Contrast with at-most-once (may lose events) and at-least-once (may double-count). Exactly-once is the gold standard for billing, counting, and financial pipelines.
# Stateful streaming word count (PyFlink, event-time + checkpointing)
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.common import Time
from pyflink.datastream.window import TumblingEventTimeWindows
env = StreamExecutionEnvironment.get_execution_environment()
# Checkpoint every 5s -> consistent snapshot of state + source offsets.
# This is what makes recovery exactly-once.
env.enable_checkpointing(5000)
stream = env.from_source(kafka_source, watermark_strategy, "events")
result = (
stream
.map(lambda line: (line.word, 1))
.key_by(lambda kv: kv[0])
# Window by EVENT time; the watermark decides when each window fires.
.window(TumblingEventTimeWindows.of(Time.minutes(1)))
.reduce(lambda a, b: (a[0], a[1] + b[1])) # running count = state
)
result.sink_to(transactional_sink) # commit aligned with checkpoints
env.execute("word-count")Backpressure
Producers do not slow down just because a downstream operator is busy. When a slow operator (say, one writing to an overloaded database) cannot keep up, its input buffers fill. Backpressure is the mechanism that propagates this fullness upstream: each stage signals "slow down" to the stage feeding it, all the way back to the source, which then reads from the log more slowly. Because the source is a durable log, unconsumed events simply wait โ nothing is lost. Without backpressure, buffers overflow and the job crashes (OOM) or silently drops data.
Lambda vs Kappa Architecture
The Lambda architecture runs two parallel paths: a batch layer that recomputes accurate results over all historical data, and a speed layer that uses stream processing to fill the gap for recent data. A serving layer merges them. It is robust but doubles the code: the same logic must be written twice, in two systems, and kept consistent.
The Kappa architecture drops the batch layer entirely. Everything is a stream: there is one code path, one engine. To reprocess history (e.g., after a bug fix), you simply replay the durable log from the beginning through the same streaming job. Modern engines with strong event-time semantics and exactly-once make Kappa practical, and it has become the default for most new real-time systems.
| Lambda | Kappa | |
|---|---|---|
| Layers | Batch + speed + serving | Single streaming layer |
| Code paths | Two (batch + stream logic) | One |
| Reprocessing | Re-run batch job | Replay the log through the stream |
| Complexity | Higher (keep two paths in sync) | Lower (one system to operate) |
| Best when | Heavy historical recompute needs | Most modern real-time pipelines |
The Tooling Landscape
Kafka Streams is a lightweight Java library embedded in your application โ no separate cluster โ great for per-topic transformations and joins. Apache Flink is a dedicated, distributed engine with the richest event-time, windowing, and exactly-once semantics, built for large stateful jobs. Spark Structured Streaming extends Spark's batch API to streams via micro-batches (and a low-latency continuous mode), letting teams reuse one framework for both batch and stream. All three read from Kafka and support windowing, state, and at-least-once or exactly-once delivery.
Frequently Asked Questions
What is the difference between stream and batch processing?
Batch processing runs over a bounded, complete dataset on a schedule and finishes, so results lag by at least one interval. Stream processing runs forever over an unbounded flow of events, updating results incrementally within milliseconds to seconds. Choose batch for large periodic recomputes where latency does not matter, and stream when you need fresh results continuously.
Why do watermarks matter in stream processing?
Events arrive late and out of order, so a window can never be sure it has seen everything. A watermark is the engine's estimate of how far event time has progressed; it lets the engine decide when to fire a window while bounding how long it waits for stragglers. Events arriving after the watermark passes their window are treated as late and dropped, side-output, or used to amend earlier results based on the allowed-lateness setting.
How does exactly-once processing work?
The engine periodically checkpoints all operator state together with the source offsets that produced it. On failure it restores the latest checkpoint and replays inputs from those offsets. Combined with idempotent or transactional sinks (such as Kafka transactions), each event affects the final result exactly once despite retries โ distinct from at-most-once (may lose data) and at-least-once (may double-count).
Batch asks what happened yesterday; streams answer what is happening right now. Windows give the stream a shape, watermarks give it a deadline, and checkpoints give it a memory.
โ alokknight Engineering
