架构师训练营第 5 周作业
发布于: 2020 年 10 月 25 日
作业一
用你熟悉的编程语言实现一致性 hash 算法。
编写测试用例测试这个算法,测试 100 万 KV 数据,10 个服务器节点的情况下,计算这些 KV 数据在服务器上分布数量的标准差,以评估算法的存储负载不均衡性。
生成随机 word 程序,文件名:generate_random_string.py
import string
import random
def unique_strings(k: int, ntokens: int,
pool: str=string.ascii_letters) -> set:
"""Generate a set of unique string tokens.
k: Length of each token
ntokens: Number of tokens
pool: Iterable of characters to choose from
For a highly optimized version:
https://stackoverflow.com/a/48421303/7954504
"""
seen = set()
# An optimization for tightly-bound loops:
# Bind these methods outside of a loop
join = ''.join
add = seen.add
while len(seen) < ntokens:
token = join(random.choices(pool, k=k))
add(token)
return seen
if __name__ == '__main__':
result = unique_strings(5, 1000000)
print(result)
print(len(result))
复制代码
主程序
# -*- coding: utf-8 -*-
import hashlib
class HashRing:
def __init__(self, nodes=None, replicas=3):
self.replicas = replicas
self.ring = dict()
self._sorted_keys = []
if nodes:
for node in nodes:
self.add_node(node)
def add_node(self, node):
"""
Adds a `node` to the hash ring (including a number of replicas)
"""
for i in range(self.replicas):
virtual_node = f"{node}#{i}"
key = self.gen_key(virtual_node)
self.ring[key] = node
self._sorted_keys.append(key)
# print(f"{virtual_node} --> {key} --> {node}")
self._sorted_keys.sort()
# print([self.ring[key] for key in self._sorted_keys])
def remove_node(self, node):
"""
Removes `node` from the hash ring and its replicas
"""
for i in range(self.replicas):
key = self.gen_key(f"{node}#{i}")
del self.ring[key]
self._sorted_keys.remove(key)
def get_node(self, string_key):
"""
Given a string key a corresponding node in the hash ring is returned.
If the hash ring is empty, `None` is returned.
"""
return self.get_node_pos(string_key)[0]
def get_node_pos(self, string_key):
"""
Given a string key a corresponding node in the hash ring is returned
along with it's position in the ring.
If the hash ring is empty, (`None`, `None`) is returned.
"""
if not self.ring:
return None, None
key = self.gen_key(string_key)
nodes = self._sorted_keys
# print('nodes # is', len(nodes))
for i in range(len(nodes)):
node = nodes[i]
if key < node:
return self.ring[node], i
# 如果key > node,那么让这些key落在第一个node上就形成了闭环
return self.ring[nodes[0]], 0
def gen_key(self, string_key):
"""
Given a string key it returns a long value, this long value represents
a place on the hash ring
"""
m = hashlib.md5()
m.update(string_key.encode('utf-8'))
return m.hexdigest()
def consistent_hash(words, servers, replicas):
hr = HashRing(servers, replicas)
database = {s: [] for s in servers}
for w in words:
database[hr.get_node(w)].append(w)
# print(f"words={len(words)}\n")
host2num = dict()
for node, result in database.items():
host2num[node] = len(result)
# print(f"{node}={len(result)}")
# print(f"result={result}")
return host2num
if __name__ == '__main__':
from generate_random_string import unique_strings
import json
import numpy as np
# content = """In computer science, consistent hashing is a special kind of
# hashing such that when a hash table is resized, only K/n keys need to be
# remapped on average, where K is the number of keys, and n is the number of
# slots. In contrast, in most traditional hash tables, a change in the number
# of array slots causes nearly all keys to be remapped because the mapping
# between the keys and the slots is defined by a modular operation."""
# word_list = content.split()
# generate unique string set
word_list = unique_strings(10, 1000000)
cluster_servers = ['192.168.1.' + str(i + 1) for i in range(10)]
# get every server and the # of data it holds in dictionary flavor.
host2num = consistent_hash(word_list, cluster_servers, 150)
print(json.dumps(host2num, indent=2))
# calculate np.std
arr = [i for i in host2num.values()]
print('Standard Deviation:', np.std(arr, ddof=1))
复制代码
结果
{
"192.168.1.1": 110046,
"192.168.1.2": 98939,
"192.168.1.3": 96354,
"192.168.1.4": 90668,
"192.168.1.5": 97363,
"192.168.1.6": 100693,
"192.168.1.7": 92113,
"192.168.1.8": 100883,
"192.168.1.9": 106599,
"192.168.1.10": 106342
}
Standard Deviation: 6294.738702546225
复制代码
作业二:本周总结
日常工作中没有立即使用上,这是一个劣势。我应该多挤挤时间自个实践巩固下。老师讲的这些架构知识,工作中没有涉及到,组内同事也不会讲到,但这些都是干货!
划线
评论
复制
发布于: 2020 年 10 月 25 日阅读数: 41
版权声明: 本文为 InfoQ 作者【TheSRE】的原创文章。
原文链接:【http://xie.infoq.cn/article/bc0ed7ca93e68aa726527bbc0】。文章转载请联系作者。
TheSRE
关注
The SRE. 2019.06.25 加入
A SRE engineer.
评论