Stream Processing in System Design: Windows, Watermarks & Exactly-Once (Visualized)

Stream processing is the practice of computing results continuously over an unbounded sequence of events as they arrive, rather than collecting data into fixed files and processing it on a schedule. Each event — a click, a payment, a sensor reading — is handled within milliseconds to seconds of being produced, so dashboards, alerts, and downstream systems reflect reality in near real time.

A stream processor reads from a durable log (such as Apache Kafka), applies a chain of operators — filters, maps, joins, aggregations — and writes results to sinks like databases, caches, or other topics. Unlike a request/response service, the program runs forever: it maintains state across events and emits new output every time fresh data flows through.

Stream Processing vs Batch Processing

Batch processing runs over a bounded dataset (a day's worth of logs, a table snapshot) on a schedule (hourly, nightly). It is simple and throughput-efficient, but a result can be stale by up to one batch interval plus the job's run time. Stream processing trades that simplicity for freshness: it processes events one at a time (or in tiny micro-batches) and updates results incrementally, so latency drops from minutes or hours to milliseconds or seconds.

Batch vs stream: latency to a result

Batch waits for the window to fill, then processes in one burst (stale until it fires). The stream operator updates a running aggregate on every event in real time.

Aspect	Batch	Stream
Input	Bounded dataset (file, snapshot)	Unbounded, never-ending stream
Latency	Minutes to hours (per interval)	Milliseconds to seconds
Execution	Runs to completion on a schedule	Runs forever, incrementally
State	Re-derived each run	Maintained continuously across events
Tools	Spark, Hadoop MapReduce, Airflow	Flink, Kafka Streams, Spark Structured Streaming
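
To make the contrast concrete, here is a minimal Python sketch, not tied to any engine. The function names `batch_total` and `stream_totals` and the sample payments are illustrative assumptions. The batch version produces one result after reading the whole input, while the streaming version keeps a running total and emits an updated result per event.

```python
from typing import Iterable, Iterator


def batch_total(events: Iterable[int]) -> int:
    """Batch: read the whole bounded dataset, then compute one result."""
    return sum(events)


def stream_totals(events: Iterable[int]) -> Iterator[int]:
    """Stream: keep a running total (state) and emit an update per event."""
    total = 0
    for amount in events:
        total += amount
        yield total


payments = [20, 5, 10]  # hypothetical payment amounts
print(batch_total(payments))          # 35
print(list(stream_totals(payments)))  # [20, 25, 35]
```

The final streaming value equals the batch result; the difference is that intermediate values are available as soon as each event arrives.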

Windowing: Tumbling, Sliding & Session

An unbounded stream never ends, so aggregates like "count per minute" need a way to scope events into finite chunks. That scoping is windowing. A tumbling window divides time into fixed, non-overlapping intervals (e.g., every 60 seconds), emitting one result per interval. A sliding window also has a fixed size but advances by a smaller step, so windows overlap and each event lands in several of them. A session window has no fixed boundaries at all: it groups bursts of activity and closes after a configurable gap of inactivity, which is ideal for user sessions.

Tumbling window: collect, close, emit, repeat

Events flow into the active window. When the interval ends the window closes, emits its count, and a fresh empty window opens — forever.

Window type	Overlap	Boundary	Typical use
Tumbling	None	Fixed size, fixed start	Per-minute counts, billing buckets
Sliding	Yes (size > step)	Fixed size, advances by step	Moving averages, rolling alerts
Session	None	Data-driven, closes after a gap	User sessions, activity bursts
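
The three window types can be sketched as plain assignment functions. This is a simplified illustration rather than any engine's implementation: timestamps are integer seconds, windows are half-open `[start, end)` intervals, and the helper names are assumptions made for this example.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) in seconds


def tumbling_window(ts: int, size: int) -> Window:
    """The single fixed, non-overlapping window that contains ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts: int, size: int, step: int) -> List[Window]:
    """Every window of length `size`, starting on a multiple of `step`,
    that contains ts. With size > step an event lands in several windows."""
    windows = []
    start = ts - ts % step          # latest window start at or before ts
    while start > ts - size:        # stop once the window no longer covers ts
        windows.append((start, start + size))
        start -= step
    return sorted(windows)


def session_windows(timestamps: List[int], gap: int) -> List[Window]:
    """Group events into sessions; a session closes after `gap` seconds
    without activity, so the boundaries come from the data itself."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)  # still inside the current burst
        else:
            sessions.append([ts])    # silence exceeded the gap: new session
    return [(s[0], s[-1] + gap) for s in sessions]


print(tumbling_window(125, 60))                # (120, 180)
print(sliding_windows(125, 60, 30))            # [(90, 150), (120, 180)]
print(session_windows([1, 3, 4, 20, 22], 10))  # [(1, 14), (20, 32)]
```

Real engines differ in details such as whether two sessions that exactly touch are merged, so treat the boundary handling here as one possible choice.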

Event Time vs Processing Time & Watermarks

Event time is when an event actually happened (a timestamp baked into the record). Processing time is when the stream processor sees it. They diverge constantly: a mobile client goes offline, a network hiccups, a partition lags — so events arrive late and out of order. If you window by processing time, a sale that happened at 11:59 but arrived at 12:01 lands in the wrong bucket.

A watermark is the processor's assertion that "event time has advanced to T; I do not expect events older than T anymore." It lets the engine decide when a window is complete enough to emit while bounding how long it waits. Events that arrive after the watermark has passed their window are late: they can be dropped, sent to a side output, or used to update a previously emitted result, depending on the configured allowed lateness.

Watermarks and late events

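Events are placed by event time. The watermark sweeps right; when it passes a window's end, the window fires. An out-of-order event that arrives before the watermark has passed its window's end is still counted; one that arrives afterward is late and, in this illustration, is dropped.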
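
The behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an engine's implementation: the watermark trails the largest event time seen by a fixed `max_out_of_orderness`, windows are tumbling, and late events are collected in a list (standing in for a side output) instead of amending earlier results. The class name `WatermarkedCounter` is hypothetical.

```python
from typing import Dict, List, Tuple


class WatermarkedCounter:
    """Counts events per tumbling event-time window, firing on the watermark."""

    def __init__(self, window_size: int, max_out_of_orderness: int) -> None:
        self.window_size = window_size
        self.max_out_of_orderness = max_out_of_orderness
        self.watermark = float("-inf")
        self.open_windows: Dict[int, int] = {}    # window start -> count
        self.fired: List[Tuple[int, int]] = []    # (window start, final count)
        self.late: List[int] = []                 # event times that missed their window

    def on_event(self, event_time: int) -> None:
        start = event_time - event_time % self.window_size
        if start + self.window_size <= self.watermark:
            # The watermark already passed this window's end: the event is late.
            self.late.append(event_time)
        else:
            self.open_windows[start] = self.open_windows.get(start, 0) + 1

        # The watermark trails the newest event time by the allowed disorder.
        self.watermark = max(self.watermark, event_time - self.max_out_of_orderness)

        # Fire every open window whose end the watermark has reached.
        for s in sorted(self.open_windows):
            if s + self.window_size <= self.watermark:
                self.fired.append((s, self.open_windows.pop(s)))


counter = WatermarkedCounter(window_size=10, max_out_of_orderness=3)
for t in [1, 4, 12, 8, 14, 5, 25]:  # arrival order, not event-time order
    counter.on_event(t)

print(counter.fired)  # [(0, 3), (10, 2)]
print(counter.late)   # [5]
```

The event at time 8 arrives out of order but before window `[0, 10)` fires, so it is counted. The event at time 5 arrives after the watermark has reached 11, so it is late.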

Stateful Operations & Exactly-Once

Most useful stream jobs are stateful: they keep running totals, dedup sets, window contents, or join buffers. That state must survive failures. Engines like Flink achieve this with checkpoints — periodic, consistent snapshots of all operator state plus the input offsets that produced it. On failure, the job rewinds to the last checkpoint and replays from those offsets.

This enables exactly-once processing: every event affects the result exactly once, even across crashes and retries. During recovery an event may physically be processed more than once, so the guarantee is more accurately called effectively-once; it is usually implemented with idempotent or transactional sinks (e.g., Kafka transactions) combined with checkpointed offsets. Contrast with at-most-once (may lose events) and at-least-once (may double-count). Exactly-once is the gold standard for billing, counting, and financial pipelines.

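# Stateful streaming word count (PyFlink, event-time + checkpointing)
# NOTE: kafka_source, watermark_strategy, and transactional_sink are
# placeholders that must be constructed elsewhere before this runs.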
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.common import Time
from pyflink.datastream.window import TumblingEventTimeWindows

env = StreamExecutionEnvironment.get_execution_environment()

# Checkpoint every 5s -> consistent snapshot of state + source offsets.
# This makes recovery of internal state exactly-once; end-to-end
# exactly-once also depends on the transactional sink below.
env.enable_checkpointing(5000)

stream = env.from_source(kafka_source, watermark_strategy, "events")

result = (
    stream
    # Each record is assumed to expose a `word` attribute.
    .map(lambda event: (event.word, 1))
    .key_by(lambda kv: kv[0])
    # Window by EVENT time; the watermark decides when each window fires.
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .reduce(lambda a, b: (a[0], a[1] + b[1]))  # running count = state
)

result.sink_to(transactional_sink)  # commit aligned with checkpoints
env.execute("word-count")
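
The checkpoint-and-replay mechanism itself can be illustrated without Flink. The following is a deliberately simplified, single-process sketch: the function `run`, its parameters, and the in-memory "log" and "sink" are all assumptions made for illustration. State and the next input offset are snapshotted together, a simulated crash rewinds to the last snapshot, and the sink is an idempotent upsert, so replayed writes overwrite earlier ones instead of double-counting.

```python
import copy
from typing import Dict, List, Optional, Tuple


def run(log: List[str], checkpoint_every: int,
        crash_at: Optional[int] = None) -> Dict[str, int]:
    """Word count over `log`, optionally crashing once at offset `crash_at`."""
    state: Dict[str, int] = {}   # operator state: running counts
    sink: Dict[str, int] = {}    # idempotent sink: upsert keyed by word
    # A checkpoint pairs a state snapshot with the offset to resume from.
    checkpoint: Tuple[Dict[str, int], int] = ({}, 0)
    offset = 0
    crashed = False

    while offset < len(log):
        if crash_at is not None and offset == crash_at and not crashed:
            # Simulated failure: discard in-memory progress, restore the
            # last consistent snapshot, and replay from its offset.
            crashed = True
            state, offset = copy.deepcopy(checkpoint[0]), checkpoint[1]
            continue

        word = log[offset]
        state[word] = state.get(word, 0) + 1
        sink[word] = state[word]   # upsert: a replay rewrites the same value
        offset += 1

        if offset % checkpoint_every == 0:
            checkpoint = (copy.deepcopy(state), offset)

    return sink


log = ["a", "b", "a", "c", "a", "b"]
print(run(log, checkpoint_every=2))              # {'a': 3, 'b': 2, 'c': 1}
print(run(log, checkpoint_every=2, crash_at=5))  # {'a': 3, 'b': 2, 'c': 1}
```

The event at offset 4 is processed twice in the crashing run, yet the final counts match the failure-free run because state and offset were rolled back together.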

Backpressure

Producers do not slow down just because a downstream operator is busy. When a slow operator (say, one writing to an overloaded database) cannot keep up, its input buffers fill. Backpressure is the mechanism that propagates this fullness upstream: each stage signals "slow down" to the stage feeding it, all the way back to the source, which then reads from the log more slowly. Because the source is a durable log, unconsumed events simply wait — nothing is lost. Without backpressure, buffers overflow and the job crashes (OOM) or silently drops data.
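
A bounded buffer is the simplest way to see backpressure in action. In this sketch (a two-thread toy using only Python's standard library, with hypothetical names `run_pipeline`, `source`, and `slow_operator`), the source blocks on `put` whenever the buffer is full, so it is forced down to the slow operator's pace and no event is lost.

```python
import queue
import threading
import time
from typing import List


def run_pipeline(num_events: int, buffer_size: int) -> List[int]:
    """Run a fast source feeding a slow operator through a bounded buffer."""
    buffer = queue.Queue(maxsize=buffer_size)  # bounded: this enables backpressure
    processed: List[int] = []

    def source() -> None:
        for event in range(num_events):
            # put() blocks while the buffer is full, slowing the source
            # to the pace of the downstream operator.
            buffer.put(event)
        buffer.put(None)  # sentinel: end of input for this demo

    def slow_operator() -> None:
        while True:
            event = buffer.get()
            if event is None:
                break
            time.sleep(0.001)  # simulate slow work, e.g. a database write
            processed.append(event)

    threads = [threading.Thread(target=source),
               threading.Thread(target=slow_operator)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


result = run_pipeline(num_events=20, buffer_size=4)
print(len(result))  # 20
```

Replacing the bounded queue with an unbounded one would let the source run ahead freely, which is the situation where memory grows until the job fails.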

Lambda vs Kappa Architecture

The Lambda architecture runs two parallel paths: a batch layer that recomputes accurate results over all historical data, and a speed layer that uses stream processing to fill the gap for recent data. A serving layer merges them. It is robust but doubles the code: the same logic must be written twice, in two systems, and kept consistent.

The Kappa architecture drops the batch layer entirely. Everything is a stream: there is one code path, one engine. To reprocess history (e.g., after a bug fix), you replay the durable log from the beginning through the same streaming job, which assumes the log retains enough history to replay. Modern engines with strong event-time semantics and exactly-once make Kappa practical, and it is a common starting point for new real-time systems.

Aspect	Lambda	Kappa
Layers	Batch + speed + serving	Single streaming layer
Code paths	Two (batch + stream logic)	One
Reprocessing	Re-run batch job	Replay the log through the stream
Complexity	Higher (keep two paths in sync)	Lower (one system to operate)
Best when	Heavy historical recompute needs	Most modern real-time pipelines
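
Kappa-style reprocessing can be sketched as one job function run twice over the same log. Everything here is a toy assumption: the log is an in-memory list, and `run_stream_job`, `revenue_v1`, and `revenue_v2` are hypothetical names. The point is that fixing a bug means deploying the corrected logic and replaying from offset 0, with no separate batch code path.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    kind: str      # "sale" or "refund"
    amount: float


def run_stream_job(log: List[Event], transform: Callable[[Event], float],
                   from_offset: int = 0) -> float:
    """The single code path: fold events from `from_offset` onward."""
    total = 0.0
    for event in log[from_offset:]:
        total += transform(event)
    return total


def revenue_v1(event: Event) -> float:
    """Original logic with a bug: refunds are counted as revenue."""
    return event.amount


def revenue_v2(event: Event) -> float:
    """Fixed logic: refunds reduce revenue."""
    return -event.amount if event.kind == "refund" else event.amount


log = [Event("sale", 30.0), Event("refund", 10.0), Event("sale", 5.0)]

print(run_stream_job(log, revenue_v1))  # 45.0 (wrong)
# After the fix, replay the same log from the beginning through the same job.
print(run_stream_job(log, revenue_v2, from_offset=0))  # 25.0
```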

The Tooling Landscape

Kafka Streams is a lightweight Java library embedded in your application, with no separate cluster to run, and is well suited to per-topic transformations and joins. Apache Flink is a dedicated, distributed engine with the richest event-time, windowing, and exactly-once semantics, built for large stateful jobs. Spark Structured Streaming extends Spark's batch API to streams via micro-batches (plus an experimental low-latency continuous mode), letting teams reuse one framework for both batch and stream. All three read from Kafka and support windowing, state, and at-least-once or exactly-once delivery.

Frequently Asked Questions

What is the difference between stream and batch processing?

Batch processing runs over a bounded, complete dataset on a schedule and finishes, so results lag by at least one interval. Stream processing runs forever over an unbounded flow of events, updating results incrementally within milliseconds to seconds. Choose batch for large periodic recomputes where latency does not matter, and stream when you need fresh results continuously.

Why do watermarks matter in stream processing?

Events arrive late and out of order, so a window can never be sure it has seen everything. A watermark is the engine's estimate of how far event time has progressed; it lets the engine decide when to fire a window while bounding how long it waits for stragglers. Events arriving after the watermark passes their window are treated as late and dropped, side-output, or used to amend earlier results based on the allowed-lateness setting.

How does exactly-once processing work?

The engine periodically checkpoints all operator state together with the source offsets that produced it. On failure it restores the latest checkpoint and replays inputs from those offsets. Combined with idempotent or transactional sinks (such as Kafka transactions), each event affects the final result exactly once despite retries — distinct from at-most-once (may lose data) and at-least-once (may double-count).

Batch asks what happened yesterday; streams answer what is happening right now. Windows give the stream a shape, watermarks give it a deadline, and checkpoints give it a memory.
— alokknight Engineering