
Kafka KRaft Core Implementation

Author: Clarke
  • March 25, 2022
  • Word count: 2,250

    Reading time: about 7 minutes


1. The Kafka Cluster Election Process

In Kafka 3.0 Source Notes (1) - The Network Communication Architecture of the Kafka Server, the author noted that in KRaft mode the Kafka cluster's metadata is managed autonomously by the Controller quorum. In a distributed environment this necessarily involves interaction among cluster nodes, including leader election and metadata synchronization. The state transitions involved in Kafka cluster elections are shown in the figure below; the key request interactions are as follows:

Vote

Sent by a Candidate node to ask the other nodes to vote for it

BeginQuorumEpoch

Sent by the Leader node to inform the other nodes of the current Leader

EndQuorumEpoch

Sent when the current Leader resigns, triggering a new election

Fetch

Sent by a Follower to replicate the Leader's log; Fetch requests also double as the Follower's liveness check on the Leader
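The four request types above drive a state machine on each voter. The following is a minimal, hypothetical sketch of those transitions; the class, method names, and simplified epoch handling are illustrative and do not correspond to Kafka's actual `QuorumState` implementation.

```java
// Simplified sketch of a KRaft voter's election state machine (illustrative only).
public class QuorumStateSketch {
    public enum State { UNATTACHED, FOLLOWER, CANDIDATE, LEADER }

    private State state = State.UNATTACHED;
    private int epoch = 0;

    // Election timeout fired without hearing from a leader: bump the epoch,
    // become a candidate, and (in the real protocol) send Vote requests.
    public void onElectionTimeout() {
        if (state != State.LEADER) {
            epoch++;
            state = State.CANDIDATE;
        }
    }

    // A BeginQuorumEpoch request tells this node who won the election.
    public void onBeginQuorumEpoch(int leaderEpoch) {
        if (leaderEpoch >= epoch) {
            epoch = leaderEpoch;
            state = State.FOLLOWER; // start fetching from the new leader
        }
    }

    // A majority of Vote requests were granted: become leader for this epoch.
    public void onMajorityGranted() {
        if (state == State.CANDIDATE) state = State.LEADER;
    }

    // An EndQuorumEpoch from the resigning leader triggers an immediate re-election.
    public void onEndQuorumEpoch(int leaderEpoch) {
        if (leaderEpoch >= epoch) onElectionTimeout();
    }

    public State state() { return state; }
    public int epoch() { return epoch; }
}
```

Note that, unlike textbook Raft, the Leader here does not push heartbeats: liveness flows in the opposite direction via Follower Fetch requests, which is why BeginQuorumEpoch exists as a separate RPC.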

————————————————

Copyright notice: This is an original article by CSDN blogger "谈谈 1974", licensed under CC 4.0 BY-SA. Please include the original source link and this notice when reposting.

Original link: https://blog.csdn.net/weixin_45505313/article/details/122642581


/**
 * This class implements a Kafkaesque version of the Raft protocol. Leader election
 * is more or less pure Raft, but replication is driven by replica fetching and we use Kafka's
 * log reconciliation protocol to truncate the log to a common point following each leader
 * election.
 *
 * Like Zookeeper, this protocol distinguishes between voters and observers. Voters are
 * the only ones who are eligible to handle protocol requests and they are the only ones
 * who take part in elections. The protocol does not yet support dynamic quorum changes.
 *
 * These are the APIs in this protocol:
 *
 * 1) {@link VoteRequestData}: Sent by valid voters when their election timeout expires and they
 *    become a candidate. This request includes the last offset in the log which electors use
 *    to tell whether or not to grant the vote.
 *
 * 2) {@link BeginQuorumEpochRequestData}: Sent by the leader of an epoch only to valid voters to
 *    assert its leadership of the new epoch. This request will be retried indefinitely for
 *    each voter until it acknowledges the request or a new election occurs.
 *
 *    This is not needed in usual Raft because the leader can use an empty data push
 *    to achieve the same purpose. The Kafka Raft implementation, however, is driven by
 *    fetch requests from followers, so there must be a way to find the new leader after
 *    an election has completed.
 *
 * 3) {@link EndQuorumEpochRequestData}: Sent by the leader of an epoch to valid voters in order to
 *    gracefully resign from the current epoch. This causes remaining voters to immediately
 *    begin a new election.
 *
 * 4) {@link FetchRequestData}: This is the same as the usual Fetch API in Kafka, but we add snapshot
 *    check before responding, and we also piggyback some additional metadata on responses (i.e. current
 *    leader and epoch). Unlike partition replication, we also piggyback truncation detection on this API
 *    rather than through a separate truncation state.
 *
 * 5) {@link FetchSnapshotRequestData}: Sent by the follower to the epoch leader in order to fetch a snapshot.
 *    This happens when a FetchResponse includes a snapshot ID due to the follower's log end offset being less
 *    than the leader's log start offset. This API is similar to the Fetch API since the snapshot is stored
 *    as FileRecords, but we use {@link UnalignedRecords} in FetchSnapshotResponse because the records
 *    are not necessarily offset-aligned.
 */
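Point 1 of the Javadoc above says a voter uses the last offset in the candidate's log to decide whether to grant a vote. The sketch below illustrates that rule; it is a hypothetical simplification (class and method names invented here), not Kafka's actual vote-handling code.

```java
// Illustrative vote-granting rule: at most one vote per epoch, and only for a
// candidate whose log is at least as up-to-date as the voter's own.
public class VoteGrantSketch {
    private int votedEpoch = -1;
    private int votedCandidateId = -1;

    public boolean maybeGrantVote(int candidateId, int candidateEpoch,
                                  long candidateLastOffset, int candidateLastEpoch,
                                  long myLastOffset, int myLastEpoch) {
        // Reject candidates from an older epoch than one we already voted in.
        if (candidateEpoch < votedEpoch) return false;
        // At most one vote per epoch (re-votes for the same candidate are fine).
        if (candidateEpoch == votedEpoch && votedCandidateId != candidateId) return false;
        // Log comparison: higher last-entry epoch wins; ties broken by offset.
        boolean upToDate = candidateLastEpoch > myLastEpoch
                || (candidateLastEpoch == myLastEpoch && candidateLastOffset >= myLastOffset);
        if (!upToDate) return false;
        votedEpoch = candidateEpoch;
        votedCandidateId = candidateId;
        return true;
    }
}
```

This is the standard Raft safety argument: a candidate missing committed entries cannot gather a majority, because every committed entry lives on a majority of logs.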


/**
 * QuorumController implements the main logic of the KRaft (Kafka Raft Metadata) mode controller.
 *
 * The node which is the leader of the metadata log becomes the active controller.  All
 * other nodes remain in standby mode.  Standby controllers cannot create new metadata log
 * entries.  They just replay the metadata log entries that the current active controller
 * has created.
 *
 * The QuorumController is single-threaded.  A single event handler thread performs most
 * operations.  This avoids the need for complex locking.
 *
 * The controller exposes an asynchronous, futures-based API to the world.  This reflects
 * the fact that the controller may have several operations in progress at any given
 * point.  The future associated with each operation will not be completed until the
 * results of the operation have been made durable to the metadata log.
 */
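The single-threaded, futures-based pattern this Javadoc describes can be sketched as follows. Everything here is illustrative (the class and method names are invented for this example, and durability to the metadata log is faked with a counter); the real QuorumController's event queue and purgatory logic are considerably richer.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Sketch: all controller state is mutated on one event-handler thread, so no
// locks are needed; callers interact only through CompletableFutures.
public class SingleThreadedControllerSketch implements AutoCloseable {
    private final ExecutorService eventQueue = Executors.newSingleThreadExecutor();
    private long nextOffset = 0; // touched only on the event thread

    public <T> CompletableFuture<T> appendEvent(Supplier<T> op) {
        CompletableFuture<T> future = new CompletableFuture<>();
        eventQueue.submit(() -> {
            try {
                // In the real controller the future completes only after the
                // generated records are durable in the metadata log.
                future.complete(op.get());
            } catch (Throwable t) {
                future.completeExceptionally(t);
            }
        });
        return future;
    }

    // Sketch of one operation: "write" a metadata record, return its offset.
    public CompletableFuture<Long> createTopicRecord() {
        return appendEvent(() -> nextOffset++);
    }

    @Override
    public void close() { eventQueue.shutdown(); }
}
```

The design trade-off is throughput for simplicity: serializing every mutation through one thread makes the controller's state transitions trivially race-free, while long-running work and I/O completion are still expressed asynchronously through the returned futures.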


/**
 * The ReplicationControlManager is the part of the controller which deals with topics
 * and partitions. It is responsible for managing the in-sync replica set and leader
 * of each partition, as well as administrative tasks like creating or deleting topics.
 */
public class ReplicationControlManager
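A core piece of the leader-management responsibility mentioned above is choosing a partition leader from the in-sync replica set. The sketch below shows the basic clean-election rule in simplified form; it is a hypothetical illustration, not Kafka's actual `ReplicationControlManager` code, and it ignores unclean leader election and preferred-leader logic.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.Set;

// Sketch: pick the first assigned replica (in assignment order) that is both
// in the ISR and on a live broker; otherwise the partition has no leader.
public class IsrLeaderSketch {
    public static OptionalInt electLeader(List<Integer> assignedReplicas,
                                          Set<Integer> isr,
                                          Set<Integer> aliveBrokers) {
        for (int replica : assignedReplicas) {
            if (isr.contains(replica) && aliveBrokers.contains(replica)) {
                return OptionalInt.of(replica);
            }
        }
        return OptionalInt.empty(); // partition goes offline rather than lose data
    }
}
```

Restricting the choice to the ISR is what preserves committed writes: any in-sync replica is guaranteed to hold every record acknowledged to producers.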

