The Lifecycle of Point Get (TiDB Source Code Reading)
From SQL entering TiDB to TiKV returning results, this article walks through the full execution path of a Point Get query.
1. Preface
Interested readers can pick up the TiDB source code themselves. While operating TiDB I had looked into a few modules in isolation, such as how PD schedules Regions, but I had only ever peeked at individual functions, and was always left with the feeling of "I don't understand how the database's logical concepts map to the code at all!". A Point Get query bypasses most of the complex optimizer code, so it is a good vehicle for connecting the end-to-end execution flow between TiDB and TiKV. Working through it also made me increasingly aware of the difference in value between "research" and "delivery": you cannot claim that reading code alone is impressive. When the documentation is rich and the product is mature, studying the documentation is the faster, more comprehensive, and more valuable approach. See the learning summary for more thoughts.
Tips:
1. To avoid being affected by later code refactoring, the breakpoint debugging in this article was done against TiDB v5.3.0 (both tidb and tikv).
2. To aid reading, a GitHub anchor link is given the first time a function or method appears; repeated occurrences are not linked again.
2. Overview
The main content of this article lives in "3. Point Get execution flow". Section "3.1 TiDB" introduces how the point query SQL flows through TiDB; "3.2 TiKV" introduces how the query is routed, processed, and answered inside TiKV after TiDB sends the request; "3.3 Summary" briefly explains why a point query is fast. The "Module Tips" at the start of each module explain the role that module plays within TiDB. The description tries to stay layered: for example, if the Executor calls the PD Client, then the Executor part only describes the logic that triggers the PD Client call, without detailing how the PD Client works at that level. Finally, "4. Learning summary" gives some personal views on the value of DBAs reading source code.
3. Point Get execution flow
3.1 TiDB
Section 3.1 introduces the processing flow inside the TiDB component, roughly as follows (a minimal sketch of this chain in Go follows the list):
1. First, the client connects to the MySQL Protocol Layer of TiDB; after obtaining a token, it sends SQL for execution.
2. Second, the SQL enters the Parser layer and is parsed into an AST (Abstract Syntax Tree).
3. Third, processing enters the Compile layer, which fetches a TSO and compiles the AST into an execution plan.
4. Then, processing enters the Executor layer (a volcano model that supports batch processing), where the execution plan is driven through Open, Next, and Close to fetch the data from TiKV.
5. Finally, processing returns to the MySQL Protocol Layer, which calls writeChunks to drain the buffered data and return it in the format the client expects.
The call chain func (cc *clientConn) handleQuery(...) --> func (cc *clientConn) handleStmt(...) --> func (tc *TiDBContext) ExecuteStmt(...) serially strings together parsing, compilation, execution, and protocol write-back, so understanding it is crucial to understanding the rest of this article.
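As a mental model only, the whole chain can be flattened into a few lines of Go; every name below is simplified and hypothetical, the real signatures are linked throughout this article:

```go
package main

import "fmt"

// handleQuery sketches TiDB's pipeline end to end:
// protocol layer -> parser -> compiler -> executor -> protocol write-back.
func handleQuery(sql string) []string {
	ast := parse(sql)    // Parser layer: SQL text -> AST
	plan := compile(ast) // Compile layer: AST -> physical plan (FastPlan for a point get)
	return execute(plan) // Executor layer: Open/Next/Close pulls rows from storage
}

func parse(sql string) string      { return "ast(" + sql + ")" }
func compile(ast string) string    { return "PointGetPlan" }
func execute(plan string) []string { return []string{"row: id=1"} }

func main() {
	fmt.Println(handleQuery("select * from t where id = 1"))
}
```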

3.1.1 SQL Protocol Handling
Module Tips: The TiDB SQL Protocol handling layer is the code TiDB implements to be compatible with the MySQL protocol. Its main responsibilities include: listening for client requests, dispatching different SQL statements, invoking the Parser, Compiler, and Executor to complete SQL processing, and writing the result set back to the client.
1. First, when the server starts, the Go standard library net package is wrapped inside struct clientConn for TCP communication, and two listener goroutines are started, each accepting connections in a for loop and handing every new connection to its own goroutine.
```go
// Run runs the server.
func (s *Server) Run() error {
	go s.startNetworkListener(s.listener, false, errChan)
	go s.startNetworkListener(s.socket, true, errChan)
	......
}

func (s *Server) startNetworkListener(listener net.Listener, isUnixSocket bool, errChan chan error) {
	for {
		conn, err := listener.Accept()
		......
		go s.onConn(clientConn)
	}
}
```
2. Second, the dispatch method first obtains a token, which is used to limit the number of sessions a single TiDB instance can handle concurrently while executing requests (a semaphore sketch follows this list); for details, see: token-limit.
3. Third, struct clientConn implements Run, readPacket, writePacket, handshake, openSession, handleQuery, handleStmt, and other methods to realize the MySQL Client/Server Protocol. In this example, the query SQL sent by the MySQL client follows the MySQL protocol and is routed into the handleQuery branch by the dispatch method.
4. Then, handleQuery loops over each SQL statement in the session, parsing, compiling, and executing each one in handleStmt.
5. Finally, handleStmt calls the writeResultset method, which triggers the Next() method of the assembled Executor to fetch data from TiKV.
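The token mechanism in step 2 is essentially a counting semaphore. Below is a minimal sketch of the idea using a buffered channel; the names echo TiDB's token limiter, but this is an illustration, not the actual implementation:

```go
package main

import "fmt"

// TokenLimiter caps how many sessions may execute simultaneously,
// conceptually mirroring TiDB's token-limit option.
type TokenLimiter struct{ ch chan struct{} }

func NewTokenLimiter(n int) *TokenLimiter {
	return &TokenLimiter{ch: make(chan struct{}, n)}
}

// Get blocks until a token slot is free.
func (tl *TokenLimiter) Get() { tl.ch <- struct{}{} }

// Put releases the token, waking one blocked Get.
func (tl *TokenLimiter) Put() { <-tl.ch }

func main() {
	tl := NewTokenLimiter(2) // e.g. token-limit = 2
	tl.Get()                 // acquired on dispatch
	defer tl.Put()           // released when the request finishes
	fmt.Println("request running with a token")
}
```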
3.1.2 SQL Parser Handling
Module Tips: The SQL Parser handling layer wraps yacc-generated code to perform lexical and syntax analysis of MySQL-dialect SQL and convert it into an AST.
1. First, handleQuery calls the func (s *session) Parse(...) method to convert the SQL text and return the AST.
2. Second, digging deeper reveals that Parse calls the func (s *session) ParseSQL(...) function to do the real lexical and syntax analysis, and records whether the parse succeeded and how long it took.
3. Finally, inside ParseSQL a sync.Pool is used as a parserPool to reduce GC pressure from parser structs; the AST result is copied and returned to the caller, and debugging any deeper from here enters the Parser module (a generic sketch of the sync.Pool pattern follows the code below).
```go
func (s *session) ParseSQL(ctx context.Context, sql string, params ...parser.ParseParam) ([]ast.StmtNode, []error, error) {
	......
	p := parserPool.Get().(*parser.Parser)
	defer parserPool.Put(p)
	p.SetSQLMode(s.sessionVars.SQLMode)
	p.SetParserConfig(s.sessionVars.BuildParserConfig())
	tmp, warn, err := p.ParseSQL(sql, params...)
	res := make([]ast.StmtNode, len(tmp))
	copy(res, tmp)
	return res, warn, err
}
```
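The parserPool seen above is a standard sync.Pool reuse pattern: expensive-to-build objects are recycled across requests instead of being re-allocated and garbage-collected each time. A minimal self-contained sketch of the same pattern, with hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

// Parser stands in for parser.Parser; building one is assumed expensive.
type Parser struct{ buf []byte }

// pool hands out reusable Parser instances, easing GC pressure the
// same way TiDB's parserPool does.
var pool = sync.Pool{
	New: func() interface{} { return &Parser{buf: make([]byte, 0, 4096)} },
}

func parseSQL(sql string) string {
	p := pool.Get().(*Parser)
	defer pool.Put(p)                 // return the instance for reuse
	p.buf = append(p.buf[:0], sql...) // reuse the backing buffer
	return fmt.Sprintf("parsed %d bytes", len(p.buf))
}

func main() { fmt.Println(parseSQL("select * from t where id = 1")) }
```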
3.1.3 SQL Compiler Handling
Module Tips: The SQL Compiler handling layer completes SQL semantic checking (Preprocess) and plan construction (Logical Optimize, Physical Optimize), compiling the SQL into an executable physical execution plan.
1. First, stringing together the "parse" and "execute" steps from the MySQL Protocol Layer, func (s *session) ExecuteStmt(...) calls [func (c *Compiler) Compile(...)](https://github.com/pingcap/tidb/blob/27348d67951c5d9e409c84ca095f0e5d3332c1fd/executor/compiler.go#L51-L109), which performs the actual compilation.
2. Second, Compile calls func Preprocess(...) to run pre-checks such as semantic checking. The concrete mechanism is to construct a Visitor and traverse the AST via the AST's Accept method; every Visitor contains Enter and Leave methods, and on Enter or Leave the node is handled according to the SQL type (a minimal sketch of this visitor pattern follows the code below). In this example, the point get jumps to func (n *SelectStmt) Accept(v Visitor) and continues into the matching branch to complete the traversal. For details, see "Compiler of TiDB source code reading", around the 10-minute mark.
3. Finally, the flow enters the Optimizer. Because a point query bypasses the bulk of the optimizer, it goes directly into func TryFastPlan(...), which only performs simple checks such as a privilege check and a database-name check.
```go
// TryFastPlan tries to use the PointGetPlan for the query.
func TryFastPlan(ctx sessionctx.Context, node ast.Node) (p Plan) {
	......
	case *ast.SelectStmt:
		if fp := tryPointGetPlan(ctx, x, isForUpdateReadSelectLock(x.LockInfo)); fp != nil {
			if checkFastPlanPrivilege(ctx, fp.dbName, fp.TblInfo.Name.L, mysql.SelectPriv) != nil {
				return nil
			}
			if tidbutil.IsMemDB(fp.dbName) {
				return nil
			}
			if fp.IsTableDual {
				return
			}
			p = fp
			return
		}
	}
	return nil
}
```
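The Enter/Leave traversal described in step 2 above is the classic visitor pattern. Here is a minimal sketch, independent of TiDB's actual ast package (all types and names are hypothetical):

```go
package main

import "fmt"

// Visitor mirrors the Enter/Leave shape of TiDB's ast.Visitor.
type Visitor interface {
	Enter(n Node) (Node, bool) // second result true means: skip children
	Leave(n Node) (Node, bool)
}

type Node interface{ Accept(v Visitor) (Node, bool) }

type SelectStmt struct{ Where Node }
type ColumnName struct{ Name string }

func (s *SelectStmt) Accept(v Visitor) (Node, bool) {
	n, skip := v.Enter(s)
	if !skip && s.Where != nil {
		s.Where.Accept(v) // recurse into child nodes
	}
	return v.Leave(n)
}

func (c *ColumnName) Accept(v Visitor) (Node, bool) {
	n, _ := v.Enter(c)
	return v.Leave(n)
}

// printVisitor logs every node it enters, standing in for checks
// such as Preprocess's semantic validation.
type printVisitor struct{}

func (printVisitor) Enter(n Node) (Node, bool) { fmt.Printf("enter %T\n", n); return n, false }
func (printVisitor) Leave(n Node) (Node, bool) { return n, true }

func main() {
	stmt := &SelectStmt{Where: &ColumnName{Name: "id"}}
	stmt.Accept(printVisitor{})
}
```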
3.1.4 SQL Executor Handling
Module Tips: The SQL Executor receives the plan that has passed through the Optimizer, builds the volcano-model executor tree, and runs it to obtain the results (a minimal sketch of the volcano interface follows the code below). For details on the volcano model, see Zhihu -- SQL Optimization Volcano Model.
1. First, func (s *session) ExecuteStmt(...) calls func (a *ExecStmt) Exec(...); its Open() completes construction of the (slightly modified) volcano-model executor tree and finally returns a RecordSet.
2. Second, the RecordSet returned by ExecuteStmt contains all the information the Executor needs. At this point the RecordSet can be regarded as the result set, but the execution flow has not yet triggered Next(), i.e. no data has actually been fetched.
3. Finally, Next() is actually triggered from writeResultset: func (cc *clientConn) writeChunks(...) --> func (trs *tidbResultSet) Next(...) --> func (a *recordSet) Next(...) --> func Next(...) --> func (e *PointGetExecutor) Next(...) --> func (e *PointGetExecutor) getAndLock(...) --> func (e *PointGetExecutor) get(...), called layer by layer until the Executor has fetched all the data.
4. In addition, it is worth mentioning that when func (a *ExecStmt) Exec(...) builds the Executor, the query pins down a single row via a primary key or unique index, so the same data cannot be read twice. In the auto-commit case, MaxUint64 (i.e. infinity, +∞) is therefore used directly as the transaction's StartTS (the transaction performs only this single read), and the statement is additionally given PriorityHigh so it is processed preferentially. For details, see force-priority.
```go
func (a *ExecStmt) buildExecutor() (Executor, error) {
	......
	} else {
		// Do not sync transaction for Execute statement, because the real optimization work is done in
		// "ExecuteExec.Build".
		useMaxTS, err := plannercore.IsPointGetWithPKOrUniqueKeyByAutoCommit(ctx, a.Plan)
		if useMaxTS {
			if err := ctx.InitTxnWithStartTS(math.MaxUint64); err != nil {
				return nil, err
			}
		}
		if stmtPri := stmtCtx.Priority; stmtPri == mysql.NoPriority {
			switch {
			case useMaxTS:
				stmtCtx.Priority = kv.PriorityHigh
			case a.LowerPriority:
				stmtCtx.Priority = kv.PriorityLow
			}
		}
	}
	b := newExecutorBuilder(ctx, a.InfoSchema, a.Ti, a.SnapshotTS, a.IsStaleness, a.ReplicaReadScope)
	e := b.build(a.Plan)
	return e, nil
}
```
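The volcano model mentioned throughout this section boils down to one uniform iterator contract. A minimal sketch of Open/Next/Close with a single-row point-get leaf (hypothetical types; TiDB's real Executor interface pulls chunk.Chunk batches, which is the "batch processing" variation):

```go
package main

import "fmt"

// Executor is the volcano-model contract: every operator exposes the
// same three methods, and a parent pulls rows from its children.
type Executor interface {
	Open() error
	Next(batch *[]string) error // fills batch; an empty batch means done
	Close() error
}

// pointGet is a leaf executor that returns at most one row.
type pointGet struct{ done bool }

func (e *pointGet) Open() error  { return nil }
func (e *pointGet) Close() error { return nil }
func (e *pointGet) Next(batch *[]string) error {
	*batch = (*batch)[:0]
	if !e.done {
		*batch = append(*batch, "row: id=1")
		e.done = true
	}
	return nil
}

func main() {
	var e Executor = &pointGet{}
	_ = e.Open()
	defer e.Close()
	batch := make([]string, 0, 1)
	for {
		if err := e.Next(&batch); err != nil || len(batch) == 0 {
			break // drained, mirroring writeChunks driving Next()
		}
		fmt.Println(batch[0])
	}
}
```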
3.1.5 TiKV & PD Client Handling
Module Tips: The TiKV Client is the wrapper on the TiDB side that is mainly responsible for fetching KV data from TiKV.
1. First, tracing the execution flow func (s *tikvSnapshot) Get(...) --> func (i *TemporaryTableSnapshotInterceptor) OnGet(...) leads straight into the Get(...) method of the tikv client's snapshot struct, which fetches the KV data from TiKV.
2. Second, debugging into the TiKV Client internals shows that it first checks whether the data exists in the local cache. If not, it constructs the request, and in a for {} loop calls GetRegionCache() to ask the PD Client's Region Cache where in TiKV the target Region lives, then sends the request to TiKV to fetch the data (a generic sketch of this retry loop follows at the end of this section). A for loop is used because the Region Cache information comes from PD and is therefore not necessarily the latest, most accurate Region location; the loop also covers error-recovery cases such as EpochNotMatch. For details, see: "Region Cache cache and cleanup logic explanation" or "TiDB source code reading series (18): tikv-client (Part 1)".
```go
func (s *KVSnapshot) get(ctx context.Context, bo *retry.Backoffer, k []byte) ([]byte, error) {
	// Check the cached values first.
	s.mu.RLock()
	if s.mu.cached != nil {
		if value, ok := s.mu.cached[string(k)]; ok {
			atomic.AddInt64(&s.mu.hitCnt, 1)
			s.mu.RUnlock()
			return value, nil
		}
	}
	s.mu.RUnlock()
	......
	s.mu.RLock()
	req := tikvrpc.NewReplicaReadRequest(tikvrpc.CmdGet,
		&kvrpcpb.GetRequest{
			Key:     k,
			Version: s.version,
		}, s.mu.replicaRead, &s.replicaReadSeed, kvrpcpb.Context{
			Priority:         s.priority.ToPB(),
			NotFillCache:     s.notFillCache,
			TaskId:           s.mu.taskID,
			ResourceGroupTag: s.resourceGroupTag,
		})
	s.mu.RUnlock()
	for {
		loc, err := s.store.GetRegionCache().LocateKey(bo, k)
		resp, _, _, err := cli.SendReqCtx(bo, req, loc.Region, client.ReadTimeoutShort, tikvrpc.TiKV, "", ops...)
		regionErr, err := resp.GetRegionError()
		val := cmdGetResp.GetValue()
		return val, nil
	}
}
```
3. Finally, inside func (e *PointGetExecutor) Next(...), whether the key comes from a common handle (isCommonHandleRead) or from a unique index, the point executor ultimately fetches the value for a single key directly. Since the main job of the TiDB kv package is to wrap TiKV data access, that module is not broken out and described separately here.
```go
// Next implements the Executor interface.
func (e *PointGetExecutor) Next(ctx context.Context, req *chunk.Chunk) error {
	if e.idxInfo != nil {
		if isCommonHandleRead(e.tblInfo, e.idxInfo) {
			handleBytes, err := EncodeUniqueIndexValuesForKey(e.ctx, e.tblInfo, e.idxInfo, e.idxVals)
			e.handle, err = kv.NewCommonHandle(handleBytes)
		} else {
			e.idxKey, err = EncodeUniqueIndexKey(e.ctx, e.tblInfo, e.idxInfo, e.idxVals, tblID)
			e.handleVal, err = e.get(ctx, e.idxKey)
		}
	}
	......
	return nil
}
```
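The for {} loop described in step 2 of 3.1.5 follows a common "locate, send, refresh on region error" shape. A minimal sketch of that retry pattern (all names hypothetical; the real logic lives in the tikv client and its RegionCache):

```go
package main

import (
	"errors"
	"fmt"
)

// errEpochNotMatch models TiKV rejecting a request whose region info is stale.
var errEpochNotMatch = errors.New("EpochNotMatch")

// regionCache stands in for the PD-fed Region Cache.
type regionCache struct{ stale bool }

func (c *regionCache) locateKey(key string) string {
	if c.stale {
		return "region-old" // cached location may lag behind PD
	}
	return "region-42"
}

// invalidate simulates refreshing the location from PD on the next lookup.
func (c *regionCache) invalidate() { c.stale = false }

func sendReq(region, key string) (string, error) {
	if region == "region-old" {
		return "", errEpochNotMatch
	}
	return "value-for-" + key, nil
}

func get(cache *regionCache, key string) (string, error) {
	for { // retry until the cached region location is accurate
		region := cache.locateKey(key)
		val, err := sendReq(region, key)
		if errors.Is(err, errEpochNotMatch) {
			cache.invalidate() // drop stale info, then retry
			continue
		}
		return val, err
	}
}

func main() {
	cache := &regionCache{stale: true}
	v, _ := get(cache, "k1")
	fmt.Println(v) // value-for-k1, after one retry
}
```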
3.2 TiKV
Section 3.2 introduces the processing flow involved in the TiKV component.

3.2.1 KV gRPC & Service Handling
1. First, after the TiKV process starts, all gRPC request handling is taken over by the Service layer, located in the src/server/service/kv.rs file. For example, this query request is handled by the declarative macro handle_request!(kv_get, future_get, GetRequest, GetResponse);, which causes future_get to be called for asynchronous processing.
2. Second, the key and related information are parsed out of the gRPC request and passed as parameters to the storage engine method storage.get(...), which does the real work, including constructing a snapshot and fetching the value.
3. Finally, once the asynchronous result v (the value) is resolved, the gRPC response is constructed and returned to the requester.
```rust
fn future_get<E: Engine, L: LockManager>(
    storage: &Storage<E, L>,
    mut req: GetRequest,
) -> impl Future<Output = ServerResult<GetResponse>> {
    let v = storage.get(
        req.take_context(),
        Key::from_raw(req.get_key()),
        req.get_version().into(),
    );
    async move {
        let mut resp = GetResponse::default();
        match v {
            Ok((val, stats)) => {
                match val {
                    Some(val) => resp.set_value(val),
                    None => resp.set_not_found(true),
                }
            }
            Err(e) => resp.set_error(extract_key_error(&e)),
        }
        Ok(resp)
    }
}
```
3.2.2 KV Storage ReadPool Handling
1. First, the job of this function is to find, among the MVCC multi-version data in the snapshot, the newest row version that satisfies "row commit timestamp < request start timestamp" and is not blocked by a lock.
2. Second, looking at the function logic: it first reads the priority of the request, then spawns a handle from the read_pool thread pool and executes the storage-layer request on that handle with the parsed priority (a sketch of priority scheduling follows the code below). For details on SQL priority, see the force-priority description.
3. Third, prepare_snap_ctx(...) checks for memory-lock conflicts and constructs a snapshot context from the key, start_ts, and other information.
4. Then, snap_ctx is passed in and a snapshot of the storage engine is constructed in the step let snapshot = Self::with_tls_engine(|engine| Self::snapshot(engine, snap_ctx)).await?;.
5. Finally, the result is obtained in let result = snap_store.get(&key, &mut statistics);.
```rust
pub fn get(
    &self,
    mut ctx: Context,
    key: Key,
    start_ts: TimeStamp,
) -> impl Future<Output = Result<(Option<Value>, KvGetStatistics)>> {
    let priority = ctx.get_priority();
    let res = self.read_pool.spawn_handle(
        async move {
            let snap_ctx = prepare_snap_ctx(
                &ctx,
                iter::once(&key),
                start_ts,
                &bypass_locks,
                &concurrency_manager,
                CMD,
            )?;
            let snapshot = Self::with_tls_engine(|engine| Self::snapshot(engine, snap_ctx)).await?;
            {
                let snap_store = SnapshotStore::new(
                    snapshot,
                    start_ts,
                    ctx.get_isolation_level(),
                    !ctx.get_not_fill_cache(),
                    bypass_locks,
                    access_locks,
                    false,
                );
                let result = snap_store.get(&key, &mut statistics);
                Ok((
                    result?,
                    KvGetStatistics {
                        stats: statistics,
                        perf_stats: delta,
                        latency_stats,
                    },
                ))
            }
        }
        .in_resource_metering_tag(resource_tag),
        priority,
        thread_rng().next_u64(),
    );
}
```
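Conceptually, spawning onto read_pool with a priority just means higher-priority reads are dequeued first. A minimal Go sketch of that scheduling idea (the real TiKV read pool is a Rust thread pool; everything below is a hypothetical simplification):

```go
package main

import "fmt"

type task struct {
	priority int // 2 = high, 1 = normal, 0 = low
	name     string
}

// spawn places a task into the bucket for its priority.
func spawn(buckets map[int][]task, t task) {
	buckets[t.priority] = append(buckets[t.priority], t)
}

// next pops the highest-priority pending task, mimicking a pool that
// prefers PriorityHigh requests such as auto-commit point gets.
func next(buckets map[int][]task) (task, bool) {
	for _, p := range []int{2, 1, 0} {
		if q := buckets[p]; len(q) > 0 {
			buckets[p] = q[1:]
			return q[0], true
		}
	}
	return task{}, false
}

func main() {
	buckets := map[int][]task{}
	spawn(buckets, task{priority: 0, name: "analyze scan"})
	spawn(buckets, task{priority: 2, name: "point get"})
	if t, ok := next(buckets); ok {
		fmt.Println("run:", t.name) // run: point get
	}
}
```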
3.2.3 KV RocksDB Snapshot Handling
1. First, drill down into the snapshot-construction part from step 4 of 3.2.2. Careful debugging shows that fn async_snapshot(...) initiates a ReadIndex before constructing the snapshot, to confirm that the leader is still really the leader at this moment. In the early days Read Index was implemented by sending a heartbeat; for details, see --> TiKV Function Introduction - Lease Read. TiKV later introduced the Lease Read optimization; for the concepts and history, see --> read index and local read scenario analysis. For this example, check the Lease Read part of the logic under the pub fn propose_raft_command function.
2. Second, drill into snap_store.get(...) from step 5 of 3.2.2. For a point query, a point_getter is constructed and get then resolves the value. The logic branches on the transaction's isolation level (all requests at this stage fall into the SI category). In the SI branch, the Lock CF is scanned for the user key; for details, see the code --> impl<S: Snapshot> PointGetter<S>. Because the auto-commit point query uses max_ts, this step comes back empty, meaning the latest committed data in the Default CF is visible to the query.
3. Finally, after lock information has been ruled out, load_data(user_key) enters a loop that parses a WriteRef from a cursor and continuously scans the Write CF. After a transaction commits, a record with key {user_key}{commit_ts} and value {type}{start_ts} is written to the Write CF, so scanning the Write CF for a PUT-type record is enough to decide whether the queried data exists. As "MVCC data reading" explains, data smaller than 64 bytes is embedded directly into the Lock Info or Write Info; otherwise load_data_from_default_cf(...) is called to fetch the query's result value from the Default CF (a minimal Go sketch of this read path follows the code below).
```rust
fn load_data(&mut self, user_key: &Key) -> Result<Option<Value>> {
    loop {
        let write = WriteRef::parse(self.write_cursor.value(&mut self.statistics.write))?;
        match write.write_type {
            WriteType::Put => {
                match write.short_value {
                    Some(value) => {
                        // Value is carried in `write`.
                        self.statistics.processed_size += user_key.len() + value.len();
                        println!("-short-->{}<---", String::from_utf8_lossy(value));
                        return Ok(Some(value.to_vec()));
                    }
                    None => {
                        let start_ts = write.start_ts;
                        let value = self.load_data_from_default_cf(start_ts, user_key)?;
                        println!("-default-->{:?}<---", String::from_utf8_lossy(&value));
                        self.statistics.processed_size += user_key.len() + value.len();
                        return Ok(Some(value));
                    }
                }
            }
        }
        if !self.write_cursor.next(&mut self.statistics.write) {
            return Ok(None);
        }
    }
}
```
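To make the Write CF layout concrete, here is a small Go simulation of the read path just described: scan the versions of a user key newest-first, take a short value inline if present, otherwise fall back to the Default CF. This illustrates the idea only; it is not TiKV's actual key encoding:

```go
package main

import "fmt"

// writeRecord models one Write CF entry: key = {user_key}{commit_ts},
// value = {type}{start_ts}, optionally carrying a short value inline.
type writeRecord struct {
	commitTS   uint64
	writeType  string // "PUT", "DELETE", ...
	startTS    uint64
	shortValue []byte // inlined when the value is small enough
}

// defaultCF maps {user_key}{start_ts} -> the full value.
var defaultCF = map[string][]byte{"k1@90": []byte("a-long-value")}

// loadData returns the newest PUT version of userKey visible at readTS.
func loadData(userKey string, readTS uint64, writeCF []writeRecord) []byte {
	for _, w := range writeCF { // assumed sorted by commitTS, newest first
		if w.commitTS > readTS || w.writeType != "PUT" {
			continue
		}
		if w.shortValue != nil {
			return w.shortValue // value carried inside the Write CF record
		}
		return defaultCF[fmt.Sprintf("%s@%d", userKey, w.startTS)]
	}
	return nil
}

func main() {
	writeCF := []writeRecord{{commitTS: 100, writeType: "PUT", startTS: 90}}
	fmt.Printf("%s\n", loadData("k1", ^uint64(0), writeCF)) // max_ts read: a-long-value
}
```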
3.3 Summary
Simply put, the main point is that a point query can locate the required value directly from a unique key: very little data is read, and there is no need for a secondary-index range scan followed by table lookups to locate the rows. Second, PointGet skips a large number of optimizer rules and goes straight to FastPlan, saving the time the optimizer would otherwise take. Together, this makes the point query a highly efficient execution plan.
4. Learning summary
4.1 Prerequisites for reading the source code
First of all, you need to be clear about your purpose in reading the source code. As a DBA who wants to understand the database product, I feel you can basically start reading the source code once the following points are met. Points 1 and 2 are the foundation; point 3 is the source of motivation and the efficient way to keep reading.
- Understand the basic syntax, language features, and surrounding tooling of the languages involved, such as: Rust futures, Rust ownership, Go's GMP scheduler, go mod, cargo, Makefiles, etc.;
- Master the functions, flows, and concepts of the database components, such as: parsing, compilation, execution, the volcano model, the vectorized model, the storage model, etc.;
- Maintain enthusiasm for continuous exploration and communicate often with the community and fellow enthusiasts; one day, the modules you do not understand will become clear.
Finally, many design ideas cannot be recovered from the code alone. You need to combine the articles on the official website, the concepts described by the authors, and industry experience; otherwise it is easy to get lost jumping around in the code.
4.2 The value of reading code
First of all, I personally feel that reading code brings different value to different roles, for example:
1. For a database kernel developer, reading database code is a basic skill, along with understanding different product features in daily work...
2. For a DBA, you can see more of the details beneath the logical concepts, especially while the product is relatively immature (less documentation, more bugs, fewer best practices); when you hit a problem, you can dig into the details for new ideas...
3. For an application developer, you can discover "black technology" in the implementation and make better use of different database products based on their characteristics...
Second, when it comes to value, the factor I personally cannot ignore is the "return on investment". Once that measure is added, the following questions arise in my mind:
- If an activity merely feels impressive, is it really worth doing? Doing it may only satisfy an inner curiosity, which means the value of the activity was never measured at all and it was the result of acting on impulse. Of course, the outcome may be good or bad.
- How much benefit does this bring to the DBA role? Reading more code may give a more accurate and solid grasp of logical concepts and product features; however, compared with studying failure cases, whether the same time investment yields the same benefit is an open question ❓. Furthermore, if a product problem arises, would I really dare to change the code or draw a conclusion without the support and confirmation of the product's authors?
- How do the industry and employers position code-reading skills for the DBA role? Work involving code-level issues can mostly be handled by specialists, and it may be more efficient for professionals to do professional work. If that premise is correct, then a DBA reading code is arguably an inefficient activity.
- If you want to build the ability to write code yourself, why not go into development? Hahahaha...
Finally, for now I can only raise these questions; I cannot answer them, nor am I qualified to. I am still exploring, learning, and making choices... Perhaps there is no absolute answer.
To summarize: at the TiDB product level, this article connects query behavior with the code that implements the database's logical concepts, which further improves the author's understanding of the product; at the level of code-reading as an activity, it briefly records one person's weighing of the value that activity brings. Perhaps these are the only two values this lengthy article can offer.
5. References
- TiDB Blog -- TiDB source code reading series (2): First look at the TiDB source code
- TiDB Blog -- TiDB source code reading series (13): Introduction to index range calculation
- TiDB Blog -- TiDB source code reading series (3): The life of a SQL statement
- TiDB Blog -- TiKV source code analysis series (19): read index and local read scenario analysis
- TiDB Blog -- TiKV Function Introduction - Lease Read
- TiDB Blog -- TiKV source code analysis series (13): MVCC data reading
- TiDB Blog -- TiDB source code reading series (18): tikv-client (Part 1)
- Jack Yu Blog -- How to read the source code of TiDB (1)
- Jan Su Blog -- TiDB run and debug on M1
- MySQL Doc -- MySQL Client/Server Protocol
- MySQL Doc -- MySQL protocol analysis
- Talkgo -- Compiler of TiDB source code reading [with Easter eggs] [Go Night Reading]
- AskTUG -- tidb sql execution
- Zhihu Blog -- RocksDB transaction implementation: TransactionDB analysis
- Zhihu Blog -- SQL Optimization Volcano Model