ARTS Check-in, Week 3
1. Algorithm
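The problem comes from LeetCode: https://leetcode.cn/problems/daily-temperatures/. The solution uses a monotonic stack: the stack holds indices into the temperature array, and each answer is the difference between two indices.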
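The original code listing did not survive on this page, so the following is a minimal Python sketch of the monotonic-stack idea described above, not the author's own solution. The function name `daily_temperatures` is an assumption chosen for illustration.

```python
def daily_temperatures(temperatures):
    """For each day, return how many days pass until a warmer one (0 if none)."""
    answer = [0] * len(temperatures)
    # Indices of days still waiting for a warmer day.
    # Their temperatures are non-increasing from bottom to top of the stack.
    stack = []
    for i, temp in enumerate(temperatures):
        # Today is warmer than the day on top of the stack: resolve that day.
        while stack and temperatures[stack[-1]] < temp:
            j = stack.pop()
            answer[j] = i - j  # the result is the difference between the indices
        stack.append(i)
    return answer
```

Each index is pushed and popped at most once, so the loop runs in linear time over the input.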
2. Review
This week continues the translation of the Redis Cluster specification. Source: https://redis.io/docs/reference/cluster-spec/, section "Configuration handling, propagation, and failovers".
Masters reply to replica vote request
Translation: How masters respond to a replica's vote request
In the previous section, we discussed how replicas try to get elected. This section explains what happens from the point of view of a master that is requested to vote for a given replica.
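Translation: The previous section covered how replicas try to get elected. This section explains, from the master's point of view, what happens when it receives a vote request from a replica.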
Masters receive requests for votes in form of FAILOVER_AUTH_REQUEST requests from replicas.
Translation: Masters receive vote requests from replicas in the form of FAILOVER_AUTH_REQUEST packets.
For a vote to be granted the following conditions need to be met:
Translation: For a vote to be granted, the following conditions must be met:
1. A master only votes a single time for a given epoch, and refuses to vote for older epochs: every master has a lastVoteEpoch field and will refuse to vote again as long as the currentEpoch in the auth request packet is not greater than the lastVoteEpoch. When a master replies positively to a vote request, the lastVoteEpoch is updated accordingly, and safely stored on disk.
Translation: For a given epoch a master votes only once, and it refuses to vote for older epochs. Every master has a lastVoteEpoch field and refuses to vote again until the currentEpoch in the auth request packet is greater than lastVoteEpoch. When a master grants a vote, lastVoteEpoch is updated accordingly and safely stored on disk.
2. A master votes for a replica only if the replica's master is flagged as FAIL.
Translation: A master votes only for replicas whose own master is flagged as FAIL.
3. Auth requests with a currentEpoch that is less than the master currentEpoch are ignored. Because of this the master reply will always have the same currentEpoch as the auth request. If the same replica asks again to be voted, incrementing the currentEpoch, it is guaranteed that an old delayed reply from the master can not be accepted for the new vote.
Translation: An auth request whose currentEpoch is less than the master's currentEpoch is ignored. As a result, the master's reply always carries the same currentEpoch as the auth request. If the same replica asks for votes again with an incremented currentEpoch, an old, delayed reply from the master is guaranteed not to be accepted for the new vote.
Example of the issue caused by not using rule number 3:
Translation: Example of the problem that arises when rule 3 is not applied:
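Master currentEpoch is 5, lastVoteEpoch is 1 (this may happen after a few failed elections)
Translation: The master's currentEpoch is 5 and its lastVoteEpoch is 1 (this can happen after a few failed elections).
Replica currentEpoch is 3.
Translation: The replica's currentEpoch is 3.
Replica tries to be elected with epoch 4 (3+1), master replies with an ok with currentEpoch 5, however the reply is delayed.
Translation: The replica starts an election with epoch 4 (3+1: starting a new vote request increments the original 3 by one). The master replies OK with currentEpoch 5, but the reply is delayed.
Note: the spec's example ends with the replica retrying later with epoch 5 (4+1). The delayed reply, which carries currentEpoch 5, then arrives and would be accepted as valid for the new vote.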
4. Masters don't vote for a replica of the same master before NODE_TIMEOUT * 2 has elapsed if a replica of that master was already voted for. This is not strictly required as it is not possible for two replicas to win the election in the same epoch. However, in practical terms it ensures that when a replica is elected it has plenty of time to inform the other replicas and avoid the possibility that another replica will win a new election, performing an unnecessary second failover.
Translation: If a master has already voted for one replica of a given master, it will not vote for another replica of that same master until NODE_TIMEOUT * 2 has elapsed. This is not strictly required, because two replicas cannot both win the election in the same epoch. In practice, however, it gives the elected replica enough time to inform the other replicas, so that another replica does not win a new election and trigger an unnecessary second failover.
5. Masters make no effort to select the best replica in any way. If the replica's master is in FAIL state and the master did not vote in the current term, a positive vote is granted. The best replica is the most likely to start an election and win it before the other replicas, since it will usually be able to start the voting process earlier because of its higher rank as explained in the previous section.
Translation: Masters do nothing to pick the best replica. If the replica's master is in FAIL state and the voting master has not yet voted in the current term, it grants the vote. The best replica is the one most likely to start an election and win it before the others, because its higher rank (explained in the previous section) usually lets it start the voting process earlier.
6. When a master refuses to vote for a given replica there is no negative response, the request is simply ignored.
Translation: When a master refuses to vote for a replica, the request is simply ignored and no response of any kind is returned.
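7. Masters don't vote for replicas sending a configEpoch that is less than any configEpoch in the master table for the slots claimed by the replica. Remember that the replica sends the configEpoch of its master, and the bitmap of the slots served by its master. This means that the replica requesting the vote must have a configuration for the slots it wants to failover that is newer or equal the one of the master granting the vote.
Translation: A master does not vote for a replica whose configEpoch is lower than any configEpoch recorded in the master's own table for the slots that the replica claims. Remember that the replica sends the configEpoch of its master together with the bitmap of the slots its master serves. In other words, the replica's configuration for the slots it wants to fail over must be at least as new as the configuration held by the master granting the vote.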
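That, as I see it, essentially completes the translation of the failover-related parts of the Redis Cluster specification. The mechanism is in effect an adaptation of the Raft protocol, and it is an implementation we can borrow from later.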
3. Technique/Tips
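A scenario-driven modification of Redis Cluster. In the Redis Cluster implementation, a replica that requests votes can start the failover only after it has received responses from more than half of the masters. Our scenario assumes the following:
Two host environments form a master-master (active-active) architecture
Three masters, each with one replica, i.e. three replicas in total
The six Redis nodes are deployed three per environment across the two host environments
The two host environments sit in data centers in different regions, so split-brain is likely
When the two environments stop synchronizing because of a network failure, service must not stop
Persisted business data changes rarely, and the cached data has loose consistency requirements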
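To see why the stock majority rule conflicts with these assumptions, the arithmetic can be sketched in Python. The function names are hypothetical, and so is the layout used in the comments (two masters in one environment, one in the other); the article only states that the nodes are split three and three.

```python
def failover_quorum(total_masters: int) -> int:
    """Votes a replica needs: more than half of all masters."""
    return total_masters // 2 + 1


def can_failover(reachable_masters: int, total_masters: int) -> bool:
    """True if enough masters are reachable to grant a majority of votes."""
    return reachable_masters >= failover_quorum(total_masters)


# Hypothetical layout after a partition between the two environments:
# environment A still sees 2 of the 3 masters, environment B sees only 1.
TOTAL_MASTERS = 3
side_a_ok = can_failover(2, TOTAL_MASTERS)  # replicas in A can be promoted
side_b_ok = can_failover(1, TOTAL_MASTERS)  # replicas in B cannot reach a majority
```

Under this layout the minority side can never promote its replicas, which contradicts the requirement that neither environment may stop serving during a partition.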
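The modification has been implemented and tested against the conditions above; other scenarios may need further changes. Its most important point is that the node must call clusterBumpConfigEpochWithoutConsensus to update its own configEpoch and the cluster's currentEpoch. Otherwise, configEpoch conflicts will cause problems once the two environments reconnect (the function clusterHandleConfigEpochCollision handles that situation).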
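The role of the two functions named above can be illustrated with a toy Python model. It is not the Redis C code: the `Node` class, the `known_epochs` parameter, and the exact conditions are assumptions made for illustration. The collision rule follows the cluster specification, where the node with the lexicographically smaller node ID is the one that moves to a new epoch.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    config_epoch: int
    current_epoch: int


def bump_config_epoch_without_consensus(myself: Node, known_epochs: list) -> bool:
    """Model: take a fresh configEpoch without asking other masters for votes."""
    max_epoch = max([myself.current_epoch, *known_epochs])
    if myself.config_epoch != 0 and myself.config_epoch == max_epoch:
        return False  # already holds the highest epoch, nothing to do
    myself.current_epoch = max_epoch + 1
    myself.config_epoch = myself.current_epoch
    return True


def handle_config_epoch_collision(myself: Node, sender: Node) -> bool:
    """Model: resolve two masters advertising the same configEpoch."""
    if sender.config_epoch != myself.config_epoch:
        return False  # no collision
    if sender.node_id <= myself.node_id:
        return False  # the other node has the smaller ID, so it is the one to move
    myself.current_epoch += 1
    myself.config_epoch = myself.current_epoch
    return True
```

In this model, two nodes that bump independently on either side of a partition can end up with the same configEpoch. When the link is restored, only the node with the smaller ID moves on to a new epoch, so the two configurations become distinguishable again.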
4. Share
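I recently changed jobs and moved to a new living environment, which has made the following reflections hit home even more.
Life is a constant cycle of partings and encounters
Life is unfair; adapt to it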
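Copyright notice: This is an original article by InfoQ author [Geek_wu].
Original link: [http://xie.infoq.cn/article/787d5775e6466c250229b9b4d].
This article is licensed under [CC-BY 4.0]. When republishing, please retain the original source and this copyright notice.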