
ARTS Check-in, Week 1

Author: Geek_wu
  • 2023-08-20, Guangdong


1. Algorithm

The problem comes from an interview question I was asked; it goes as follows:

  • Given an array of length n, find, in O(n) time, every element that is greater than all elements to its left and smaller than all elements to its right. For example, for the input array [1,8,6,9,10,15,12,20], the correct output is [9,10].

#include <iostream>
#include <vector>
#include <climits>
using namespace std;

int findPivotNum(vector<int>& nums) {
    int len = nums.size();
    // With fewer than 3 elements, nothing has both a left and a right side.
    if (len < 3)
        return 0;

    // Pass 1, right to left: vecMin[i] = minimum of all elements to the right of position i.
    vector<int> vecMin(len, INT_MAX);
    for (int i = len - 2; i >= 0; --i) {
        if (nums[i + 1] < vecMin[i + 1]) {
            vecMin[i] = nums[i + 1];
            continue;
        }
        vecMin[i] = vecMin[i + 1];
    }

    // Pass 2, left to right: l_max = maximum of all elements to the left of position i.
    // An element qualifies when it is greater than l_max and smaller than vecMin[i].
    int l_max = nums[0];
    for (int i = 1; i < len - 1; ++i) {
        if (nums[i] > l_max) {
            if (nums[i] < vecMin[i])
                cout << nums[i] << endl;
            l_max = nums[i];
        }
    }
    return 0;
}
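
A short driver for the function above, reusing the headers already included there (the sample array is the one from the problem statement):

int main() {
    vector<int> nums = {1, 8, 6, 9, 10, 15, 12, 20};
    findPivotNum(nums);   // prints 9 and 10, one per line
    return 0;
}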


2. Review

I have recently been adapting Redis Cluster failover for our business scenarios, so I am taking the opportunity to translate the failover-related part of the Redis Cluster spec, mixing in my own understanding (some passages may not correspond exactly to the English, please bear with me). Original: https://redis.io/docs/reference/cluster-spec/, the "Configuration handling, propagation, and failovers" section. (To be continued)


Configuration handling, propagation, and failovers

Cluster current epoch:

Redis Cluster uses a concept similar to the Raft algorithm "term". In Redis Cluster the term is called epoch instead, and it is used in order to give incremental versioning to events. When multiple nodes provide conflicting information, it becomes possible for another node to understand which state is the most up to date.

Redis Cluster uses a concept similar to the term in the Raft consensus protocol. In Redis Cluster the term is called an epoch instead, and it is used to give events incrementing version numbers. When multiple nodes provide conflicting information, this makes it possible for another node to work out which state is the most up to date.


The currentEpoch is a 64 bit unsigned number.

The currentEpoch is a 64-bit unsigned integer.


At node creation every Redis Cluster node, both replicas and master nodes, set the currentEpoch to 0.

When a Redis Cluster node is created, whether it is a master or a replica, its currentEpoch is set to 0.


Every time a packet is received from another node, if the epoch of the sender (part of the cluster bus messages header) is greater than the local node epoch, the currentEpoch is updated to the sender epoch.

Whenever a packet is received from another node, if the sender's epoch (part of the cluster bus message header, clusterMsg) is greater than the local node's epoch, the local currentEpoch is updated to the sender's epoch.
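
A minimal sketch of that rule (the function and parameter names are made up for illustration; the real cluster-bus handling in Redis is more involved):

#include <cstdint>

// On every cluster-bus packet: adopt the sender's currentEpoch if it is newer.
void updateCurrentEpochOnPacket(uint64_t senderCurrentEpoch, uint64_t& localCurrentEpoch) {
    if (senderCurrentEpoch > localCurrentEpoch)
        localCurrentEpoch = senderCurrentEpoch;   // converge toward the greatest epoch seen in the cluster
}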


Because of these semantics, eventually all the nodes will agree to the greatest currentEpoch in the cluster.

Because of these semantics, all nodes will eventually agree on the greatest currentEpoch in the cluster.


This information is used when the state of the cluster is changed and a node seeks agreement in order to perform some action.

This information is used when the cluster state changes and a node needs the agreement of the other nodes in order to perform some action.


Currently this happens only during replica promotion, as described in the next section. Basically the epoch is a logical clock for the cluster and dictates that given information wins over one with a smaller epoch.

Currently this only happens during replica promotion, as described in the next section. Essentially, the epoch acts as a logical clock for the cluster, and it dictates that information with a greater epoch wins over information with a smaller one.


Configuration epoch

Every master always advertises its configEpoch in ping and pong packets along with a bitmap advertising the set of slots it serves.

Every master advertises its configEpoch in its PING and PONG packets, together with a bitmap describing the set of slots it serves.
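
Roughly, the advertised state can be pictured like this (a simplified sketch, not the actual clusterMsg layout defined in cluster.h):

#include <cstdint>

constexpr int CLUSTER_SLOTS = 16384;

// Simplified view of what a master advertises in every PING/PONG header.
struct ClusterMsgHeaderSketch {
    uint64_t      currentEpoch;                 // cluster-wide logical clock
    uint64_t      configEpoch;                  // epoch of the sender's own slot configuration
    unsigned char myslots[CLUSTER_SLOTS / 8];   // bitmap: one bit per hash slot served by the sender
};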


The configEpoch is set to zero in masters when a new node is created.

When a new node is created, its configEpoch as a master is set to zero.


A new configEpoch is created during replica election. Replicas trying to replace failing masters increment their epoch and try to get authorization from a majority of masters. When a replica is authorized, a new unique configEpoch is created and the replica turns into a master using the new configEpoch.

A new configEpoch is created during replica election. A replica trying to replace a failing master increments its epoch and tries to get the votes of a majority of the masters. When a replica obtains that majority, a new unique configEpoch is created, and the replica turns into a master using this new configEpoch.


As explained in the next sections the configEpoch helps to resolve conflicts when different nodes claim divergent configurations (a condition that may happen because of network partitions and node failures).

As explained in the following sections, the configEpoch helps resolve conflicts when different nodes claim divergent configurations (a situation that can arise from network partitions and node failures).


Replica nodes also advertise the configEpoch field in ping and pong packets, but in the case of replicas the field represents the configEpoch of its master as of the last time they exchanged packets. This allows other instances to detect when a replica has an old configuration that needs to be updated (master nodes will not grant votes to replicas with an old configuration).

Replica nodes also advertise the configEpoch field in their ping and pong packets, but in their case the field holds the configEpoch of their master as of the last packet exchange. This lets other instances detect that a replica has an old configuration that needs to be updated (master nodes will not grant their vote to a replica with an old configuration).


Every time the configEpoch changes for some known node, it is permanently stored in the nodes.conf file by all the nodes that receive this information. The same also happens for the currentEpoch value. These two variables are guaranteed to be saved and fsync-ed to disk when updated before a node continues its operations.

Every time the configEpoch of some known node changes, every node that receives this information stores it permanently in its nodes.conf file. The same happens for the currentEpoch value. When these two values are updated, they are guaranteed to be saved and fsync-ed to disk before the node continues with its other operations.
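
In spirit, the persistence step looks like the sketch below (plain POSIX I/O; the real nodes.conf written by Redis contains the whole node table, and this only illustrates the "write, fsync, then continue" ordering):

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>   // fsync

// Sketch: persist the epoch values and fsync them before the node moves on.
bool persistEpochs(const char* path, uint64_t currentEpoch, uint64_t configEpoch) {
    FILE* fp = fopen(path, "w");
    if (!fp) return false;
    fprintf(fp, "currentEpoch %llu configEpoch %llu\n",
            (unsigned long long)currentEpoch, (unsigned long long)configEpoch);
    fflush(fp);                          // push the stdio buffer to the kernel
    bool ok = (fsync(fileno(fp)) == 0);  // force the kernel to write the data to disk
    fclose(fp);
    return ok;
}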


The configEpoch values generated using a simple algorithm during failovers are guaranteed to be new, incremental, and unique.

The configEpoch values generated by a simple algorithm during failovers are guaranteed to be new, incremental, and unique.


Replica election and promotion

Replica election and promotion to master.


Replica election and promotion is handled by replica nodes, with the help of master nodes that vote for the replica to promote. A replica election happens when a master is in FAIL state from the point of view of at least one of its replicas that has the prerequisites in order to become a master.

Replica election and promotion are handled by the replica nodes themselves, with the master nodes voting for the replica to promote. A replica election takes place when a master is in FAIL state from the point of view of at least one of its replicas that meets the prerequisites for becoming a master.


In order for a replica to promote itself to master, it needs to start an election and win it. All the replicas for a given master can start an election if the master is in FAIL state, however only one replica will win the election and promote itself to master.

For a replica to promote itself to master, it has to start an election and win it. All the replicas of a given master can start an election once the master is in FAIL state, but only one replica will win the election and be promoted to master.


A replica starts an election when the following conditions are met:

A replica starts an election when all of the following conditions are met:


  • The replica's master is in FAIL state.

  • The replica's master is in FAIL state.

  • The master was serving a non-zero number of slots.

  • The master was serving a non-zero number of slots.

  • The replica replication link was disconnected from the master for no longer than a given amount of time, in order to ensure the promoted replica's data is reasonably fresh. This time is user configurable.

  • The replication link between the replica and the master has not been disconnected for longer than a given amount of time, to make sure the promoted replica's data is reasonably fresh. This time is user configurable (via cluster_node_timeout and cluster_slave_validity_factor); see the sketch after this list.
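
Putting the three preconditions together, the gate a replica has to pass before it even starts an election looks roughly like this (the field and function names are illustrative, and the data-age limit is shown as a single configurable value rather than Redis' exact formula):

#include <cstdint>

struct ReplicaViewSketch {
    bool     masterInFailState;    // our master is flagged FAIL
    int      masterSlotCount;      // number of hash slots the master was serving
    uint64_t linkDisconnectedMs;   // how long our replication link has been down
    uint64_t maxDataAgeMs;         // user-configurable limit, derived from
                                   // cluster_node_timeout and cluster_slave_validity_factor
};

bool mayStartElection(const ReplicaViewSketch& v) {
    return v.masterInFailState                      // 1. the master is in FAIL state
        && v.masterSlotCount > 0                    // 2. it was serving a non-zero number of slots
        && v.linkDisconnectedMs <= v.maxDataAgeMs;  // 3. our copy of the data is not too stale
}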


In order to be elected, the first step for a replica is to increment its currentEpoch counter, and request votes from master instances.

To get elected, the first step for a replica is to increment its currentEpoch counter and request votes from the master instances.


Votes are requested by the replica by broadcasting a FAILOVER_AUTH_REQUEST packet to every master node of the cluster. Then it waits for a maximum time of two times the NODE_TIMEOUT for replies to arrive (but always for at least 2 seconds).

The replica requests votes by broadcasting a FAILOVER_AUTH_REQUEST packet to every master in the cluster. It then waits at most 2 * NODE_TIMEOUT for replies to arrive (but always at least 2 seconds).
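
The increment-and-request flow described above, together with the reply window, can be sketched as follows (the broadcast helper is a made-up name and is left as a comment; times are in milliseconds):

#include <cstdint>
#include <algorithm>

// Sketch of a replica kicking off an election.
struct ElectionSketch {
    uint64_t currentEpoch = 0;
    uint64_t requestEpoch = 0;   // epoch the vote request was sent with
    uint64_t deadlineMs   = 0;   // when we stop waiting for FAILOVER_AUTH_ACK replies
};

void startElection(ElectionSketch& e, uint64_t nowMs, uint64_t nodeTimeoutMs) {
    e.currentEpoch += 1;                // bump our currentEpoch first
    e.requestEpoch  = e.currentEpoch;
    // broadcastToAllMasters(FAILOVER_AUTH_REQUEST, e.requestEpoch);   // hypothetical helper
    // Wait for replies for at most 2 * NODE_TIMEOUT, but never less than 2 seconds.
    e.deadlineMs = nowMs + std::max<uint64_t>(2 * nodeTimeoutMs, 2000);
}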


Once a master has voted for a given replica, replying positively with a FAILOVER_AUTH_ACK, it can no longer vote for another replica of the same master for a period of NODE_TIMEOUT * 2. In this period it will not be able to reply to other authorization requests for the same master. This is not needed to guarantee safety, but useful for preventing multiple replicas from getting elected (even if with a different configEpoch) at around the same time, which is usually not wanted.

Once a master has voted for a given replica by replying positively with a FAILOVER_AUTH_ACK, it cannot vote for another replica of the same master for the next NODE_TIMEOUT * 2. During this period it will not answer other authorization requests for the same master. This is not required for safety, but it is useful for preventing multiple replicas from getting elected (even with different configEpochs) at roughly the same time, which is usually not what we want.
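
From the voting master's side, that cooldown can be sketched like this (a per-failing-master record of the last vote time is assumed; this is illustrative, not the actual cluster.c logic):

#include <cstdint>

// State a voting master keeps per failing master whose replicas may request votes.
struct VoteStateSketch {
    uint64_t lastVoteTimeMs = 0;   // when we last granted a FAILOVER_AUTH_ACK for this master
};

// Grant at most one vote per failing master every NODE_TIMEOUT * 2 milliseconds.
bool mayGrantVote(VoteStateSketch& s, uint64_t nowMs, uint64_t nodeTimeoutMs) {
    if (nowMs - s.lastVoteTimeMs < nodeTimeoutMs * 2)
        return false;              // still inside the cooldown window: ignore the request
    s.lastVoteTimeMs = nowMs;      // record the vote we are about to grant
    return true;
}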


A replica discards any AUTH_ACK replies with an epoch that is less than the currentEpoch at the time the vote request was sent. This ensures it doesn't count votes intended for a previous election.

A replica discards any AUTH_ACK reply whose epoch is smaller than its currentEpoch at the time the vote request was sent. This ensures that it does not count votes meant for a previous election.


Once the replica receives ACKs from the majority of masters, it wins the election. Otherwise if the majority is not reached within the period of two times NODE_TIMEOUT (but always at least 2 seconds), the election is aborted and a new one will be tried again after NODE_TIMEOUT * 4 (and always at least 4 seconds).

Once the replica has received ACKs from a majority of the masters, it wins the election. Otherwise, if the majority is not reached within 2 * NODE_TIMEOUT (but always at least 2 seconds), the election is aborted and a new one is attempted after NODE_TIMEOUT * 4 (and always at least 4 seconds).
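
Combining the last two rules, counting the replies might look like the following sketch (the epoch filter is the "discard stale AUTH_ACK" rule from above; all names are illustrative):

#include <cstdint>
#include <algorithm>

struct AuthAckSketch {
    uint64_t epoch;   // epoch carried by the FAILOVER_AUTH_ACK
};

// Count a reply only if it was cast for the current vote request.
bool countAck(const AuthAckSketch& ack, uint64_t requestEpoch, int& votes) {
    if (ack.epoch < requestEpoch)
        return false;                      // stale ACK from a previous election: discard it
    ++votes;
    return true;
}

bool wonElection(int votes, int masterCount) {
    return votes >= masterCount / 2 + 1;   // majority of the masters
}

// If the deadline passes without a majority, retry no sooner than
// NODE_TIMEOUT * 4 later (and never less than 4 seconds).
uint64_t nextAttemptMs(uint64_t nowMs, uint64_t nodeTimeoutMs) {
    return nowMs + std::max<uint64_t>(4 * nodeTimeoutMs, 4000);
}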


Practical example of configuration epoch usefulness during partitions

A practical example of how the configuration epoch helps during network partitions.


This section illustrates how the epoch concept is used to make the replica promotion process more resistant to partitions.

This section illustrates how the epoch concept is used to make the replica promotion process more resilient to network partitions.


  • A master is no longer reachable indefinitely. The master has three replicas A, B, C.

  • A master becomes unreachable indefinitely. It has three replicas: A, B and C.

  • Replica A wins the election and is promoted to master.

  • Replica A wins the election and is promoted to master.

  • A network partition makes A not available for the majority of the cluster.

  • A network partition makes A unavailable to the majority of the cluster.

  • Replica B wins the election and is promoted as master.

  • Replica B wins the election and is promoted to master.

  • A partition makes B not available for the majority of the cluster.

  • Another partition makes B unavailable to the majority of the cluster.

  • The previous partition is fixed, and A is available again.

  • The earlier partition is fixed, and A is available again.


At this point B is down and A is available again with a role of master (actually UPDATE messages would reconfigure it promptly, but here we assume all UPDATE messages were lost). At the same time, replica C will try to get elected in order to fail over B. This is what happens:

At this point B is down and A is available again with the role of master (in practice, UPDATE messages would quickly reconfigure it, but here we assume all UPDATE messages were lost). Meanwhile, replica C tries to get elected in order to fail over B. This is what happens:


  1. C will try to get elected and will succeed, since for the majority of masters its master is actually down. It will obtain a new incremental configEpoch.

Replica C will try to get elected and will succeed, because for the majority of masters its master really is down. It obtains a new, incremented configEpoch.

2. A will not be able to claim to be the master for its hash slots, because the other nodes already have the same hash slots associated with a higher configuration epoch (the one of B) compared to the one published by A.

Node A will not be able to claim to be the master of its hash slots, because the other nodes already associate those hash slots with a higher configuration epoch (B's) than the one A is advertising.

3. So, all the nodes will upgrade their table to assign the hash slots to C, and the cluster will continue its operations.

So all nodes update their slot table to assign those hash slots to C, and the cluster keeps operating normally (a sketch of the comparison rule used here follows the list).
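
The comparison rule these steps rely on can be sketched as follows (a per-slot table of owner and configEpoch is assumed; the types are illustrative, not the actual Redis data structures):

#include <cstdint>
#include <string>
#include <array>

constexpr int CLUSTER_SLOTS = 16384;

struct SlotOwnerSketch {
    std::string owner;            // node id currently believed to own the slot
    uint64_t    configEpoch = 0;  // configEpoch the current owner claimed it with
};

// A claim on a slot wins only if it carries a strictly greater configEpoch.
// This is why A (older epoch) cannot take its slots back, while C (newest epoch) can.
bool applySlotClaim(std::array<SlotOwnerSketch, CLUSTER_SLOTS>& slots,
                    int slot, const std::string& claimer, uint64_t claimEpoch) {
    if (claimEpoch <= slots[slot].configEpoch)
        return false;                       // stale claim: keep the current owner
    slots[slot].owner       = claimer;      // newer configuration wins
    slots[slot].configEpoch = claimEpoch;
    return true;
}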


As you'll see in the next sections, a stale node rejoining a cluster will usually get notified as soon as possible about the configuration change because as soon as it pings any other node, the receiver will detect it has stale information and will send an UPDATE message.

As you will see in the following sections, a stale node that rejoins the cluster is usually notified of the configuration change as soon as possible: as soon as it pings any other node, the receiver will detect that it has stale information and will send it an UPDATE message.

3. Technique/Tips

spurious wakeup

  • Scenario: a wait on a condition variable is woken up even though the condition it is waiting for has not actually become true.

  • Example:

// Shared state, assumed to be declared elsewhere:
//   pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
//   pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
//   std::queue<int> queue;

// 1. A scenario where a spurious wakeup causes trouble
/**
 * If the notify happens before the wait, waiting unconditionally would block forever,
 * so the boolean condition (here: whether the queue is empty) has to be checked first.
 */
pthread_mutex_lock(&mutex);
if (queue.empty())
    /**
     * A signal may arrive, the underlying call may return -1 with errno set to EINTR,
     * and the condition variable wakes up even though nothing was produced:
     * this is a spurious wakeup, and the "if" lets it slip through.
     */
    pthread_cond_wait(&cond, &mutex);
pthread_mutex_unlock(&mutex);

// 2. The fix for spurious wakeups
pthread_mutex_lock(&mutex);
while (queue.empty())
    /**
     * Even if a signal arrives and the wait wakes up spuriously,
     * the boolean condition is checked again; if it still does not hold,
     * the wakeup was spurious and we simply go back to waiting.
     */
    pthread_cond_wait(&cond, &mutex);
pthread_mutex_unlock(&mutex);
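
For reference, the same predicate-in-a-loop idea in modern C++ (a minimal sketch using std::condition_variable; the wait overload that takes a predicate performs the while-loop re-check internally):

#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex mtx;
std::condition_variable cv;
std::queue<int> items;

int consumeOne() {
    std::unique_lock<std::mutex> lock(mtx);
    // Equivalent to: while (items.empty()) cv.wait(lock);
    cv.wait(lock, [] { return !items.empty(); });   // re-checks the predicate on every wakeup
    int value = items.front();
    items.pop();
    return value;
}

void produceOne(int value) {
    {
        std::lock_guard<std::mutex> lock(mtx);
        items.push(value);
    }
    cv.notify_one();   // wake a waiting consumer after the state change is visible
}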

4. Share

  • For technical problems, 95% of what you run into has already been run into by someone else. Make good use of search engines, or of the currently popular AIGC products, to find the answer you need.

  • Always keep digging to the root of technical questions; while chasing that root, you may well open the door to a new world.

