写点什么

架构师训练营第 5 周作业

用户头像
TheSRE
关注
发布于: 2020 年 10 月 25 日

作业一


  1. 用你熟悉的编程语言实现一致性 hash 算法。

  2. 编写测试用例测试这个算法,测试 100 万 KV 数据,10 个服务器节点的情况下,计算这些 KV 数据在服务器上分布数量的标准差,以评估算法的存储负载不均衡性。


生成随机 word 程序,文件名:generate_random_string.py

import stringimport random

def unique_strings(k: int, ntokens: int, pool: str=string.ascii_letters) -> set: """Generate a set of unique string tokens.
k: Length of each token ntokens: Number of tokens pool: Iterable of characters to choose from
For a highly optimized version: https://stackoverflow.com/a/48421303/7954504 """
seen = set()
# An optimization for tightly-bound loops: # Bind these methods outside of a loop join = ''.join add = seen.add
while len(seen) < ntokens: token = join(random.choices(pool, k=k)) add(token) return seen

if __name__ == '__main__': result = unique_strings(5, 1000000) print(result) print(len(result))
复制代码


主程序

# -*- coding: utf-8 -*-import hashlib

class HashRing: def __init__(self, nodes=None, replicas=3): self.replicas = replicas self.ring = dict() self._sorted_keys = []
if nodes: for node in nodes: self.add_node(node)
def add_node(self, node): """ Adds a `node` to the hash ring (including a number of replicas) """ for i in range(self.replicas): virtual_node = f"{node}#{i}" key = self.gen_key(virtual_node) self.ring[key] = node self._sorted_keys.append(key) # print(f"{virtual_node} --> {key} --> {node}")
self._sorted_keys.sort() # print([self.ring[key] for key in self._sorted_keys])
def remove_node(self, node): """ Removes `node` from the hash ring and its replicas """ for i in range(self.replicas): key = self.gen_key(f"{node}#{i}") del self.ring[key] self._sorted_keys.remove(key)
def get_node(self, string_key): """ Given a string key a corresponding node in the hash ring is returned.
If the hash ring is empty, `None` is returned. """ return self.get_node_pos(string_key)[0]
def get_node_pos(self, string_key): """ Given a string key a corresponding node in the hash ring is returned along with it's position in the ring.
If the hash ring is empty, (`None`, `None`) is returned. """ if not self.ring: return None, None
key = self.gen_key(string_key) nodes = self._sorted_keys # print('nodes # is', len(nodes)) for i in range(len(nodes)): node = nodes[i] if key < node: return self.ring[node], i
# 如果key > node,那么让这些key落在第一个node上就形成了闭环 return self.ring[nodes[0]], 0
def gen_key(self, string_key): """ Given a string key it returns a long value, this long value represents a place on the hash ring """ m = hashlib.md5() m.update(string_key.encode('utf-8')) return m.hexdigest()

def consistent_hash(words, servers, replicas): hr = HashRing(servers, replicas)
database = {s: [] for s in servers}
for w in words: database[hr.get_node(w)].append(w)
# print(f"words={len(words)}\n")
host2num = dict() for node, result in database.items(): host2num[node] = len(result) # print(f"{node}={len(result)}") # print(f"result={result}")
return host2num

if __name__ == '__main__': from generate_random_string import unique_strings import json import numpy as np
# content = """In computer science, consistent hashing is a special kind of # hashing such that when a hash table is resized, only K/n keys need to be # remapped on average, where K is the number of keys, and n is the number of # slots. In contrast, in most traditional hash tables, a change in the number # of array slots causes nearly all keys to be remapped because the mapping # between the keys and the slots is defined by a modular operation."""
# word_list = content.split()
# generate unique string set word_list = unique_strings(10, 1000000)
cluster_servers = ['192.168.1.' + str(i + 1) for i in range(10)]
# get every server and the # of data it holds in dictionary flavor. host2num = consistent_hash(word_list, cluster_servers, 150) print(json.dumps(host2num, indent=2))
# calculate np.std arr = [i for i in host2num.values()] print('Standard Deviation:', np.std(arr, ddof=1))
复制代码


结果

{  "192.168.1.1": 110046,  "192.168.1.2": 98939,  "192.168.1.3": 96354,  "192.168.1.4": 90668,  "192.168.1.5": 97363,  "192.168.1.6": 100693,  "192.168.1.7": 92113,  "192.168.1.8": 100883,  "192.168.1.9": 106599,  "192.168.1.10": 106342}Standard Deviation: 6294.738702546225
复制代码


作业二:本周总结

日常工作中没有立即使用上,这是一个劣势。我应该多挤挤时间自个实践巩固下。老师讲的这些架构知识,工作中没有涉及到,组内同事也不会讲到,但这些都是干货!

发布于: 2020 年 10 月 25 日阅读数: 41
用户头像

TheSRE

关注

The SRE. 2019.06.25 加入

A SRE engineer.

评论

发布
暂无评论
架构师训练营第5周作业