Kafka KRaft Core Implementation
Author: Clarke
March 25, 2022
Word count: 2,250
Reading time: about 7 minutes
https://blog.csdn.net/weixin_45505313/article/details/122642581
1. The Kafka Cluster Election Flow
In Kafka 3.0 Source Notes (1) - The Kafka Broker's Network Communication Architecture, I noted that in KRaft mode the Kafka cluster's metadata is managed autonomously by the Controller quorum. In a distributed environment this necessarily involves interaction among cluster nodes, including leader election and metadata replication. The state transitions involved in a Kafka cluster election are shown in the figure below; the key request interactions are:
Vote
Sent by a Candidate node to request votes from the other nodes.
BeginQuorumEpoch
Sent by the Leader node to inform the other nodes of the current Leader.
EndQuorumEpoch
Sent when the current Leader resigns, triggering a new election.
Fetch
Sent by a Follower to replicate the Leader's log; a Fetch request also doubles as the Follower's liveness probe of the Leader.
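The Vote decision above follows the standard Raft rule: a voter grants its vote only if the candidate's log is at least as up-to-date as its own. As a rough sketch (class and method names are my own, not Kafka's), the comparison looks like this:

```java
// Hypothetical sketch of the vote-granting check: the candidate wins on a
// strictly higher last epoch, or on an end offset no smaller than the
// voter's when the last epochs are equal.
public class VoteCheck {
    /** Returns true if the voter should grant its vote to the candidate. */
    public static boolean shouldGrant(int candidateLastEpoch, long candidateEndOffset,
                                      int voterLastEpoch, long voterEndOffset) {
        if (candidateLastEpoch != voterLastEpoch) {
            return candidateLastEpoch > voterLastEpoch;
        }
        return candidateEndOffset >= voterEndOffset;
    }

    public static void main(String[] args) {
        // Same epoch, candidate's log is longer: grant.
        System.out.println(shouldGrant(5, 100, 5, 90));
        // Candidate's last epoch is stale: reject.
        System.out.println(shouldGrant(4, 200, 5, 90));
    }
}
```

This is why the VoteRequest described below carries the candidate's last log offset: the voter needs it to make exactly this comparison.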
————————————————
Copyright notice: this is an original article by CSDN blogger 谈谈 1974, licensed under CC 4.0 BY-SA. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/weixin_45505313/article/details/122642581
/**
* This class implements a Kafkaesque version of the Raft protocol. Leader election
* is more or less pure Raft, but replication is driven by replica fetching and we use Kafka's
* log reconciliation protocol to truncate the log to a common point following each leader
* election.
*
* Like Zookeeper, this protocol distinguishes between voters and observers. Voters are
* the only ones who are eligible to handle protocol requests and they are the only ones
* who take part in elections. The protocol does not yet support dynamic quorum changes.
*
* These are the APIs in this protocol:
*
* 1) {@link VoteRequestData}: Sent by valid voters when their election timeout expires and they
* become a candidate. This request includes the last offset in the log which electors use
* to tell whether or not to grant the vote.
*
* 2) {@link BeginQuorumEpochRequestData}: Sent by the leader of an epoch only to valid voters to
* assert its leadership of the new epoch. This request will be retried indefinitely for
* each voter until it acknowledges the request or a new election occurs.
*
* This is not needed in usual Raft because the leader can use an empty data push
* to achieve the same purpose. The Kafka Raft implementation, however, is driven by
* fetch requests from followers, so there must be a way to find the new leader after
* an election has completed.
*
* 3) {@link EndQuorumEpochRequestData}: Sent by the leader of an epoch to valid voters in order to
* gracefully resign from the current epoch. This causes remaining voters to immediately
* begin a new election.
*
* 4) {@link FetchRequestData}: This is the same as the usual Fetch API in Kafka, but we add snapshot
* check before responding, and we also piggyback some additional metadata on responses (i.e. current
* leader and epoch). Unlike partition replication, we also piggyback truncation detection on this API
* rather than through a separate truncation state.
*
* 5) {@link FetchSnapshotRequestData}: Sent by the follower to the epoch leader in order to fetch a snapshot.
* This happens when a FetchResponse includes a snapshot ID due to the follower's log end offset being less
* than the leader's log start offset. This API is similar to the Fetch API since the snapshot is stored
* as FileRecords, but we use {@link UnalignedRecords} in FetchSnapshotResponse because the records
 * are not necessarily offset-aligned.
 */
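Point 2 in the Javadoc above is the interesting deviation from textbook Raft: because replication is fetch-driven, the leader cannot push an empty AppendEntries to announce itself, so leader identity is piggybacked on fetch responses instead. A minimal sketch of how a follower might converge on the new leader from that metadata (names are my own, not Kafka's):

```java
import java.util.Optional;

// Hypothetical sketch of fetch-driven leader discovery: each fetch
// response carries the current leader id and epoch, so a node that
// missed BeginQuorumEpoch still learns who the new leader is.
public class FetchLeaderDiscovery {
    private int currentEpoch;
    private Optional<Integer> leaderId = Optional.empty();

    public FetchLeaderDiscovery(int epoch) {
        this.currentEpoch = epoch;
    }

    /** Apply the (leaderId, leaderEpoch) metadata piggybacked on a fetch response. */
    public boolean maybeTransition(int responseLeaderId, int responseEpoch) {
        if (responseEpoch < currentEpoch) {
            return false;  // stale response from an old epoch: ignore it
        }
        boolean changed = responseEpoch > currentEpoch
                || !leaderId.equals(Optional.of(responseLeaderId));
        currentEpoch = responseEpoch;
        leaderId = Optional.of(responseLeaderId);
        return changed;
    }

    public int epoch() { return currentEpoch; }
    public Optional<Integer> leader() { return leaderId; }
}
```

The stale-epoch guard mirrors Raft's general rule that messages from older epochs (terms) are never allowed to roll state backwards.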
/**
* QuorumController implements the main logic of the KRaft (Kafka Raft Metadata) mode controller.
*
* The node which is the leader of the metadata log becomes the active controller. All
* other nodes remain in standby mode. Standby controllers cannot create new metadata log
* entries. They just replay the metadata log entries that the current active controller
* has created.
*
* The QuorumController is single-threaded. A single event handler thread performs most
* operations. This avoids the need for complex locking.
*
* The controller exposes an asynchronous, futures-based API to the world. This reflects
* the fact that the controller may have several operations in progress at any given
* point. The future associated with each operation will not be completed until the
* results of the operation have been made durable to the metadata log.
*/
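The single-threaded, futures-based design the QuorumController Javadoc describes can be illustrated with a small sketch (this is my own simplification, not Kafka's code): every mutation is an event on one executor thread, and the caller's future completes only after the event has been applied, standing in for "made durable to the metadata log".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch of the controller's event-queue pattern: all state
// lives on one event thread, so no locking is needed, and callers observe
// results only through futures completed after the event runs.
public class SingleThreadedController implements AutoCloseable {
    private final ExecutorService eventThread = Executors.newSingleThreadExecutor();
    private long committedOffset = 0;  // mutated only on the event thread

    /** Enqueue a write; the future completes after the event thread applies it. */
    public CompletableFuture<Long> appendWriteEvent(Supplier<Long> recordCount) {
        CompletableFuture<Long> future = new CompletableFuture<>();
        eventThread.submit(() -> {
            try {
                committedOffset += recordCount.get();  // single writer: no races
                future.complete(committedOffset);
            } catch (Exception e) {
                future.completeExceptionally(e);
            }
        });
        return future;
    }

    @Override
    public void close() {
        eventThread.shutdown();
    }
}
```

Because every event runs on the same thread in submission order, callers can pipeline many operations and still get a consistent, ordered view through the returned futures.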
/**
* The ReplicationControlManager is the part of the controller which deals with topics
* and partitions. It is responsible for managing the in-sync replica set and leader
* of each partition, as well as administrative tasks like creating or deleting topics.
*/
public class ReplicationControlManager
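To make the ReplicationControlManager's responsibilities concrete, here is a sketch of the kind of decision it owns (my own simplified illustration, not Kafka's actual election logic): when a partition leader fails, pick the first replica in assignment order that is both alive and still in the in-sync replica (ISR) set.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.Set;

// Hypothetical sketch of clean leader election for one partition:
// only live replicas that are still in the ISR are eligible.
public class IsrLeaderElection {
    /** Pick a new leader, or empty if no eligible replica remains. */
    public static OptionalInt electLeader(List<Integer> replicas,
                                          Set<Integer> isr,
                                          Set<Integer> liveBrokers) {
        for (int replica : replicas) {
            if (isr.contains(replica) && liveBrokers.contains(replica)) {
                return OptionalInt.of(replica);
            }
        }
        return OptionalInt.empty();  // no clean candidate: partition stays offline
    }
}
```

Returning empty rather than falling back to an out-of-sync replica corresponds to clean leader election, which trades availability for not losing committed records.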