架构师训练营第 5 周作业
发布于: 2020 年 10 月 25 日
作业一
- 用你熟悉的编程语言实现一致性 hash 算法。 
- 编写测试用例测试这个算法,测试 100 万 KV 数据,10 个服务器节点的情况下,计算这些 KV 数据在服务器上分布数量的标准差,以评估算法的存储负载不均衡性。 
生成随机 word 程序,文件名:generate_random_string.py
import stringimport random
def unique_strings(k: int, ntokens: int,               pool: str=string.ascii_letters) -> set:    """Generate a set of unique string tokens.
    k: Length of each token    ntokens: Number of tokens    pool: Iterable of characters to choose from
    For a highly optimized version:    https://stackoverflow.com/a/48421303/7954504    """
    seen = set()
    # An optimization for tightly-bound loops:    # Bind these methods outside of a loop    join = ''.join    add = seen.add
    while len(seen) < ntokens:        token = join(random.choices(pool, k=k))        add(token)    return seen
if __name__ == '__main__':    result = unique_strings(5, 1000000)    print(result)    print(len(result))复制代码
 主程序
# -*- coding: utf-8 -*-import hashlib
class HashRing:    def __init__(self, nodes=None, replicas=3):        self.replicas = replicas        self.ring = dict()        self._sorted_keys = []
        if nodes:            for node in nodes:                self.add_node(node)
    def add_node(self, node):        """        Adds a `node` to the hash ring (including a number of replicas)        """        for i in range(self.replicas):            virtual_node = f"{node}#{i}"            key = self.gen_key(virtual_node)            self.ring[key] = node            self._sorted_keys.append(key)            # print(f"{virtual_node} --> {key} --> {node}")
        self._sorted_keys.sort()        # print([self.ring[key] for key in self._sorted_keys])
    def remove_node(self, node):        """        Removes `node` from the hash ring and its replicas        """        for i in range(self.replicas):            key = self.gen_key(f"{node}#{i}")            del self.ring[key]            self._sorted_keys.remove(key)
    def get_node(self, string_key):        """        Given a string key a corresponding node in the hash ring is returned.
        If the hash ring is empty, `None` is returned.        """        return self.get_node_pos(string_key)[0]
    def get_node_pos(self, string_key):        """        Given a string key a corresponding node in the hash ring is returned        along with it's position in the ring.
        If the hash ring is empty, (`None`, `None`) is returned.        """        if not self.ring:            return None, None
        key = self.gen_key(string_key)        nodes = self._sorted_keys        # print('nodes # is', len(nodes))        for i in range(len(nodes)):            node = nodes[i]            if key < node:                return self.ring[node], i
        # 如果key > node,那么让这些key落在第一个node上就形成了闭环        return self.ring[nodes[0]], 0
    def gen_key(self, string_key):        """        Given a string key it returns a long value, this long value represents        a place on the hash ring        """        m = hashlib.md5()        m.update(string_key.encode('utf-8'))        return m.hexdigest()
def consistent_hash(words, servers, replicas):    hr = HashRing(servers, replicas)
    database = {s: [] for s in servers}
    for w in words:        database[hr.get_node(w)].append(w)
    # print(f"words={len(words)}\n")
    host2num = dict()    for node, result in database.items():        host2num[node] = len(result)        # print(f"{node}={len(result)}")        # print(f"result={result}")
    return host2num
if __name__ == '__main__':    from generate_random_string import unique_strings    import json    import numpy as np
    # content = """In computer science, consistent hashing is a special kind of    # hashing such that when a hash table is resized, only K/n keys need to be    # remapped on average, where K is the number of keys, and n is the number of    # slots. In contrast, in most traditional hash tables, a change in the number    # of array slots causes nearly all keys to be remapped because the mapping    # between the keys and the slots is defined by a modular operation."""
    # word_list = content.split()
    # generate unique string set    word_list = unique_strings(10, 1000000)
    cluster_servers = ['192.168.1.' + str(i + 1) for i in range(10)]
    # get every server and the # of data it holds in dictionary flavor.    host2num = consistent_hash(word_list, cluster_servers, 150)    print(json.dumps(host2num, indent=2))
    # calculate np.std    arr = [i for i in host2num.values()]    print('Standard Deviation:', np.std(arr, ddof=1))
复制代码
 结果
{  "192.168.1.1": 110046,  "192.168.1.2": 98939,  "192.168.1.3": 96354,  "192.168.1.4": 90668,  "192.168.1.5": 97363,  "192.168.1.6": 100693,  "192.168.1.7": 92113,  "192.168.1.8": 100883,  "192.168.1.9": 106599,  "192.168.1.10": 106342}Standard Deviation: 6294.738702546225复制代码
 作业二:本周总结
日常工作中没有立即使用上,这是一个劣势。我应该多挤挤时间自个实践巩固下。老师讲的这些架构知识,工作中没有涉及到,组内同事也不会讲到,但这些都是干货!
划线
评论
复制
发布于: 2020 年 10 月 25 日阅读数: 41
版权声明: 本文为 InfoQ 作者【TheSRE】的原创文章。
原文链接:【http://xie.infoq.cn/article/bc0ed7ca93e68aa726527bbc0】。文章转载请联系作者。

TheSRE
关注
The SRE. 2019.06.25 加入
A SRE engineer.











 
    
评论