Glossary
Common terms and abbreviations for TiDB / TiKV / TiCDC (ongoing)
TiDB Ecosystem
- TiDB: A distributed HTAP database that provides the SQL layer plus transactional and analytical query capabilities.
- TiKV: A distributed key-value storage engine. It uses Raft replication underneath and serves data reads/writes.
- PD (Placement Driver): The cluster metadata and scheduling control plane. It provides TSO and Region scheduling capabilities.
- TSO (Timestamp Oracle): A globally monotonic timestamp service used for MVCC and transaction ordering.
- Region: TiKV’s sharding unit (with a key range boundary). It is also the basic unit for Raft replication and scheduling.
- Raft: The consensus protocol used for multi-replica replication and leader election per Region.
- MVCC: Multi-Version Concurrency Control. A single key can have multiple versions to support snapshot reads and concurrent transactions.
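To make the TSO and MVCC entries above concrete, here is a minimal sketch of a snapshot read over versioned keys. It assumes the commonly documented TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). `compose_ts` and `MVCCStore` are hypothetical names for illustration only, not TiKV APIs, and the toy store ignores locks, deletes, and garbage collection.

```python
import bisect

PHYSICAL_SHIFT = 18  # assumption: low 18 bits hold the logical counter


def compose_ts(physical_ms: int, logical: int) -> int:
    """Pack a physical time (ms) and a logical counter into one timestamp."""
    return (physical_ms << PHYSICAL_SHIFT) | logical


class MVCCStore:
    """Toy in-memory store: each key keeps its versions sorted by commit timestamp."""

    def __init__(self):
        # key -> (sorted list of commit timestamps, parallel list of values)
        self._versions = {}

    def put(self, key, commit_ts, value):
        ts_list, values = self._versions.setdefault(key, ([], []))
        idx = bisect.bisect_left(ts_list, commit_ts)
        ts_list.insert(idx, commit_ts)
        values.insert(idx, value)

    def get(self, key, read_ts):
        """Snapshot read: newest version with commit_ts <= read_ts, else None."""
        if key not in self._versions:
            return None
        ts_list, values = self._versions[key]
        idx = bisect.bisect_right(ts_list, read_ts)
        return values[idx - 1] if idx else None
```

A reader holding a snapshot timestamp between two commits sees only the older version, which is how multiple versions per key allow reads to proceed without blocking concurrent writers.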
TiCDC / TiFlow
- TiCDC: TiDB’s Change Data Capture (CDC) component. It replicates changes from TiKV to downstream systems (MySQL / Kafka / etc.).
- TiFlow: The project that contains TiCDC (and related scheduling/operations capabilities).
- Changefeed: A CDC replication task (“pipeline”) that captures changes from upstream and emits them to downstream.
- Resolved TS: A watermark in CDC. It indicates that all changes committed at or before this timestamp have been fully processed/emitted, so no earlier change is still pending.
- Resolved TS Lag: The gap between the current time and the resolved timestamp (now - resolved_ts, or an equivalent metric), used to measure replication delay.
- Incremental Scan: During CDC initialization, the process of scanning incremental data from a start point (details vary by version/implementation).
- Backpressure: The condition in which slow downstream consumption blocks upstream sending/processing, so the slowdown propagates backward along the pipeline.
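The resolved TS lag arithmetic described above can be sketched as follows. This assumes resolved_ts is a TSO-style timestamp whose physical milliseconds sit above an 18-bit logical counter. The function names are hypothetical, and a real deployment would normally read an equivalent metric from monitoring instead of computing it by hand.

```python
import time
from typing import Optional

PHYSICAL_SHIFT = 18  # assumption: low 18 bits hold the logical counter


def physical_ms(tso: int) -> int:
    """Extract the physical (wall-clock) milliseconds from a TSO-style timestamp."""
    return tso >> PHYSICAL_SHIFT


def resolved_ts_lag_seconds(resolved_ts: int, now_ms: Optional[int] = None) -> float:
    """Return now - resolved_ts in seconds, using the physical part of the timestamp."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return (now_ms - physical_ms(resolved_ts)) / 1000.0
```

For example, a resolved_ts whose physical part is 2,500 ms behind the current time yields a lag of 2.5 seconds. A lag that keeps growing is a typical sign of a slow sink or backpressure somewhere in the pipeline.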
Downstream & Storage
- Sink: The downstream output (e.g., MySQL, Kafka, S3). It is often a major source of latency/throughput bottlenecks.
- Sorter / Sort Engine: The buffering/sorting engine used to preserve ordering and smooth bursts; it can be disk-IO intensive.
- Redo Log: Logs used for disaster recovery / replay (implementation and configuration differ by version). Writing them can also add extra IO pressure.
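As a conceptual illustration of how a sorter and a resolved timestamp work together, the sketch below buffers out-of-order events and releases them in commit-timestamp order only once the watermark has passed them. `ToySorter` is a hypothetical, in-memory stand-in. It is not TiCDC's sort engine, which (as noted above) can spill to disk and be IO intensive.

```python
import heapq


class ToySorter:
    """Buffers events and emits them in commit_ts order up to a resolved timestamp."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: preserves arrival order for equal commit_ts

    def add(self, commit_ts, event):
        """Buffer one event; events may arrive in any order."""
        heapq.heappush(self._heap, (commit_ts, self._seq, event))
        self._seq += 1

    def drain(self, resolved_ts):
        """Pop and return all buffered events with commit_ts <= resolved_ts, sorted."""
        out = []
        while self._heap and self._heap[0][0] <= resolved_ts:
            commit_ts, _, event = heapq.heappop(self._heap)
            out.append((commit_ts, event))
        return out
```

Events newer than the resolved timestamp stay buffered until the watermark advances. This is why a stalled resolved TS or a slow sink can make the buffer grow.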