TiCDC architecture model: streaming, semantics, and state (Legacy)
A conceptual view of TiCDC as a streaming system: source/sink abstraction, delivery semantics, and how checkpoint/resolved timestamps are maintained.
This page summarizes TiCDC from a “streaming system” perspective. It focuses on concepts that help you reason about correctness and lag, rather than version-specific configuration details.
1. Streaming vs batch (why CDC is streaming)
Data replication is a form of data processing: moving data from one “end” to another. Processing models are often grouped into:
- Batch processing: bounded datasets; collect first, then process.
- Stream processing: unbounded datasets; event-driven processing with ordering and time semantics.
CDC is naturally a streaming problem: the dataset is unbounded, and correctness often depends on ordering and “time” (e.g., commit timestamps).
2. Source & sink abstraction
TiCDC can be understood via a standard streaming abstraction:
- Source: produces a stream of change events
- Sink: consumes the stream and outputs it to a target system
There are two important source → sink segments in the TiCDC pipeline:
- TiKV's CDC component → TiCDC Capture (a gRPC change event stream)
- TiCDC's sink module → downstream target (MySQL / MQ / etc.)
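The source → sink abstraction can be sketched in a few lines. This is a conceptual model, not TiCDC's actual types: the dict-shaped events with `key`/`value`/`commit_ts` fields are hypothetical stand-ins for real change events.

```python
def source():
    """Produce a stream of change events (truncated here; a real CDC
    source is unbounded)."""
    for ts, (key, value) in enumerate([("a", 1), ("b", 2), ("a", 3)], start=1):
        yield {"key": key, "value": value, "commit_ts": ts}

def sink(events, target):
    """Consume change events and apply them to a target key-value store."""
    for ev in events:
        target[ev["key"]] = ev["value"]
    return target

state = sink(source(), {})
# state == {"a": 3, "b": 2}
```

Everything between the two ends (sorting, batching, retrying) is machinery that preserves this simple contract.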

3. Delivery semantics: at-most-once vs at-least-once vs exactly-once
In streaming systems, delivery semantics define what happens when failures occur:
- At-most-once: events are processed at most once; failures can drop data.
- At-least-once: events can be processed multiple times; retries can cause duplicates.
- Exactly-once: effects are applied exactly once (requires stronger coordination).
TiCDC is typically described as at-least-once. Retries can introduce duplicates, so correctness relies on idempotency at the sink layer (or on downstream consumers that can handle duplicates).
4. System architecture (roles and responsibilities)
TiCDC runs as a set of stateless nodes (Captures) with coordination stored in PD/etcd. A changefeed can be scheduled across multiple Captures.
Key concepts (high-level):
- Owner: a special role elected among the Captures; responsible for global scheduling and coordination of changefeeds.
- Processor: the per-Capture runtime that manages table pipelines for a changefeed.
- Changefeed: a replication task definition (the "pipeline").
- Checkpoint TS: the timestamp up to which all changes have been safely written to the downstream; replication resumes from here after a restart.
- Resolved TS: a watermark guaranteeing that no event with a smaller commit timestamp will arrive afterwards; everything before it is available for processing.
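The relationship between the two timestamps can be sketched as follows. This is a simplified, assumed model (function names and the per-span watermark map are illustrative, not TiCDC internals): resolved-ts is the minimum watermark across all spans, and checkpoint-ts can never pass either resolved-ts or the sink's flushed position.

```python
def resolved_ts(span_watermarks):
    """No event with a smaller commit_ts can still arrive, so the global
    watermark is gated by the slowest span (region/table)."""
    return min(span_watermarks.values())

def advance_checkpoint(checkpoint_ts, flushed_ts, resolved):
    """Checkpoint moves forward monotonically, bounded by both what the
    sink has flushed and what is resolved."""
    return max(checkpoint_ts, min(flushed_ts, resolved))

watermarks = {"region-1": 120, "region-2": 95, "region-3": 130}
r = resolved_ts(watermarks)          # 95: gated by the slowest span
c = advance_checkpoint(80, 110, r)   # 95: sink flushed to 110, but resolved is 95
```

This is why a single slow span holds back the whole changefeed: both metrics are minimums, not averages.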

5. Incremental scan + streaming
When a changefeed starts (or when a table is added or moved), TiCDC and TiKV perform an initialization sequence:
- Incremental scan: scan historical changes from a start timestamp (implementation details vary by version)
- Continuous streaming: after the scan completes, TiKV continuously pushes new changes
Your practical takeaway:
- If initialization for some regions/tables never completes, the global resolved-ts is pinned at the stalled position, and replication lag grows roughly linearly with wall-clock time.
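The "lag grows linearly" behavior falls directly out of the minimum-watermark model. A toy simulation (assumed, simplified; region names and the `lag` helper are illustrative):

```python
def lag(now, span_watermarks):
    """Replication lag is measured against the slowest span's watermark."""
    return now - min(span_watermarks.values())

# One region's incremental scan never finishes, so its watermark stays at 100.
watermarks = {"region-1": 100, "region-2": 100, "region-stuck": 100}
lags = []
for now in (110, 120, 130):
    watermarks["region-1"] = now   # healthy regions keep up with "now"
    watermarks["region-2"] = now
    lags.append(lag(now, watermarks))
# lags == [10, 20, 30]: lag grows linearly while one span stays pinned
```

The signature in monitoring is a resolved-ts that stops moving while the lag metric climbs as a straight line.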
6. State and fault tolerance (why duplicates can happen)
With at-least-once semantics, failures can cause retries and duplicates. Two common patterns:
- Upstream (TiKV) reconnects and re-sends changes after a leader change.
- A capture failure causes ownership/scheduling changes and re-creation of processors/table pipelines, which re-pulls changes from the last checkpoint.
As long as applying changes downstream is idempotent (or conflicting writes resolve deterministically), duplicates do not break correctness, but they do increase downstream load.
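The second pattern, replaying from the last checkpoint after a failover, can be sketched as below. The event log and `replay_after` helper are hypothetical; the point is that the final state is unchanged while the write count (load) increases.

```python
def replay_after(events, checkpoint_ts):
    """After a capture failure, the new processor re-pulls everything
    committed after the last checkpoint-ts."""
    return [ev for ev in events if ev["commit_ts"] > checkpoint_ts]

log = [{"key": "a", "value": v, "commit_ts": ts}
       for ts, v in [(100, 1), (110, 2), (120, 3)]]

applied, writes = {}, 0
for ev in log:                      # first delivery
    applied[ev["key"]] = ev["value"]; writes += 1
for ev in replay_after(log, 100):   # failover: re-pull from checkpoint 100
    applied[ev["key"]] = ev["value"]; writes += 1
# applied == {"a": 3} either way, but writes == 5 instead of 3
```

This is the load cost of at-least-once: correctness is preserved by idempotent application, while the duplicated window between checkpoint-ts and the failure point is paid for in extra writes.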
References
- Flink architecture overview (streaming vs batch): https://flink.apache.org/flink-architecture.html
- Distributed Snapshots paper (Chandy–Lamport): https://lamport.azurewebsites.net/pubs/chandy.pdf
- TiCDC overview: https://docs.pingcap.com/tidb/stable/ticdc-overview