Architect Training Camp (架构师训练营) Cohort 2, Week 5 Assignment
Published: November 22, 2020
Implement the consistent hashing algorithm in a programming language you are familiar with.
Write test cases for the algorithm: with 1,000,000 KV pairs and 10 server nodes, compute the standard deviation of the number of KV pairs stored on each server, to evaluate how unevenly the algorithm distributes the storage load.
```ruby
require 'digest'
require 'forwardable'

# Maps arbitrary strings onto a 2^32-position hash ring via MD5.
class ConsistentHashing
  CIRCLE_LENGTH = 2 ** 32

  def self.hash(string)
    Digest::MD5.hexdigest(string).to_i(16) % CIRCLE_LENGTH
  end
end

# A physical server node holding its share of the KV data.
class Node
  extend Forwardable

  attr_reader :name

  def_delegator :data, :fetch
  def_delegator :data, :store
  def_delegator :data, :delete

  def initialize(name)
    @name = name
    @data = {}
  end

  def count
    data.keys.size
  end

  private

  attr_reader :data
end

# A virtual replica of a physical node on the ring; sorts by ring position.
class VirtualNode
  include Comparable
  extend Forwardable

  attr_reader :key, :name, :node

  def_delegator :node, :fetch
  def_delegator :node, :store
  def_delegator :node, :delete

  def initialize(name, node)
    @name = name
    @node = node
    @key = ConsistentHashing.hash(name)
  end

  def <=>(other)
    key <=> other.key
  end
end

# The ring itself: each physical node appears `weight` times as virtual nodes.
class DistributedHash
  WEIGHT = 150

  attr_reader :nodes, :virtual_nodes, :weight

  def initialize(nodes = [], weight = WEIGHT)
    @virtual_nodes = []
    @nodes = []
    @weight = weight
    nodes.each { |node| add_node(node) }
  end

  def add_node(node)
    weight.times do |i|
      virtual_nodes << VirtualNode.new("N_#{node.name}_#{i}", node)
    end
    virtual_nodes.sort!
    nodes << node
  end

  def remove_node(node_name)
    virtual_nodes.reject! { |vn| vn.node.name == node_name }
    nodes.reject! { |n| n.name == node_name }
  end

  def store(key, value)
    vn = find_nearest_virtual_node(key)
    vn.store(key, value)
  end

  def fetch(key)
    vn = find_nearest_virtual_node(key)
    vn.fetch(key)
  end

  def delete(key)
    vn = find_nearest_virtual_node(key)
    vn.delete(key)
  end

  private

  # Binary-search for the first virtual node clockwise of the key,
  # wrapping around to the start of the ring if none is found.
  def find_nearest_virtual_node(key)
    hashed_key = ConsistentHashing.hash(key)
    virtual_nodes.bsearch { |vn| vn.key >= hashed_key } || virtual_nodes[0]
  end
end

def variance(values)
  m = values.sum.to_f / values.size
  sum = 0.0
  values.each { |v| sum += (v - m) ** 2 }
  sum / values.size
end

def standard_deviation(values)
  Math.sqrt(variance(values))
end

# 1,000,000 keys across 10 nodes, first with the default 150 virtual
# nodes per server, then with 200.
ds = DistributedHash.new(
  [
    Node.new('s1'), Node.new('s2'), Node.new('s3'), Node.new('s4'), Node.new('s5'),
    Node.new('s6'), Node.new('s7'), Node.new('s8'), Node.new('s9'), Node.new('s10'),
  ]
)
1_000_000.times { |i| ds.store(i.to_s, i) }
node_counts = ds.nodes.map(&:count)
puts "Standard deviation for 150 virtual nodes: #{standard_deviation(node_counts)}"

ds = DistributedHash.new(
  [
    Node.new('s1'), Node.new('s2'), Node.new('s3'), Node.new('s4'), Node.new('s5'),
    Node.new('s6'), Node.new('s7'), Node.new('s8'), Node.new('s9'), Node.new('s10'),
  ],
  200
)
1_000_000.times { |i| ds.store(i.to_s, i) }
node_counts = ds.nodes.map(&:count)
puts "Standard deviation for 200 virtual nodes: #{standard_deviation(node_counts)}"
```
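Beyond balancing load, the property that makes consistent hashing worth the trouble is minimal remapping when membership changes. A minimal, self-contained sketch of that property (independent of the classes above; the node names, weight, and key count here are arbitrary choices for illustration) checks that removing one node only remaps the keys that lived on it:

```ruby
require 'digest'

RING_SIZE = 2**32

def ring_hash(s)
  Digest::MD5.hexdigest(s).to_i(16) % RING_SIZE
end

# Build a sorted ring of [position, server] pairs, `weight` virtual
# nodes per server.
def build_ring(node_names, weight = 100)
  node_names.flat_map { |n| Array.new(weight) { |i| [ring_hash("N_#{n}_#{i}"), n] } }
            .sort_by(&:first)
end

# First virtual node clockwise of the key, wrapping around the ring.
def locate(ring, key)
  hk = ring_hash(key)
  entry = ring.bsearch { |(pos, _)| pos >= hk } || ring.first
  entry.last
end

keys = (0...10_000).map(&:to_s)

ring_before = build_ring(%w[s1 s2 s3 s4])
ring_after  = build_ring(%w[s1 s2 s3])      # s4 removed

moved = keys.count { |k| locate(ring_before, k) != locate(ring_after, k) }
on_s4 = keys.count { |k| locate(ring_before, k) == 's4' }

# Only the keys that were on s4 move; every other key keeps its server,
# because its nearest clockwise virtual node is still on the ring.
puts "keys moved: #{moved}, keys formerly on s4: #{on_s4}"  # the two counts are equal
```

This is why dropping or adding a server touches roughly 1/N of the keys instead of reshuffling almost everything, as plain `hash % N` bucketing would.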
Standard deviation results:
```
$ ruby consistent_hashing.rb
Standard deviation for 150 virtual nodes: 5911.8596228259685
Standard deviation for 200 virtual nodes: 3909.191476507642
```
I tested with 100 to 500 virtual nodes per server; in this test case the standard deviation was smallest at 200. That does not necessarily hold for other workloads, since the distribution of the nodes and keys after hashing is unpredictable.
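That trade-off can be spot-checked directly: more virtual nodes per server generally smooths the distribution, at the cost of a larger ring and slower lookups. A rough, self-contained sweep (sampling 100,000 keys instead of a million to keep it quick; the weights chosen are arbitrary):

```ruby
require 'digest'

RING_SIZE = 2**32

def md5_ring_hash(s)
  Digest::MD5.hexdigest(s).to_i(16) % RING_SIZE
end

# Sorted ring of [position, server] pairs for 10 servers.
def ring_for(weight)
  (1..10).flat_map { |n| Array.new(weight) { |i| [md5_ring_hash("N_s#{n}_#{i}"), "s#{n}"] } }
         .sort_by(&:first)
end

# Population standard deviation, as in the test above.
def stddev(counts)
  m = counts.sum.to_f / counts.size
  Math.sqrt(counts.sum { |c| (c - m)**2 } / counts.size)
end

results = [50, 150, 500].map do |weight|
  ring = ring_for(weight)
  tally = Hash.new(0)
  100_000.times do |i|
    hk = md5_ring_hash(i.to_s)
    server = (ring.bsearch { |(pos, _)| pos >= hk } || ring.first).last
    tally[server] += 1
  end
  [weight, stddev(tally.values)]
end

results.each { |weight, sd| puts "weight=#{weight} stddev=#{sd.round(1)}" }
```

Since MD5 is deterministic, the sweep is reproducible for a fixed key set, but the "best" weight shifts with the node names and keys, which matches the observation above.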
bigxiang