写点什么

架构师训练营第五周 - 作业

用户头像
Eric
关注
发布于: 2020 年 07 月 05 日
架构师训练营第五周 - 作业

1 用你熟悉的编程语言实现一致性 hash 算法

Clojure 代码
(def m (math/expt 2 32))

(defn consistent-hash [x number-of-nodes]
(math/ceil (/ (mod x m) (int (/ m number-of-nodes)))))



2 编写测试用例测试这个算法,测试 100 万 KV 数据,10 个服务器节点的情况下,计算这些 KV 数据在服务器上分布数量的标准差,以评估算法的存储负载不均衡性



源代码:

https://github.com/gzeureka/consistent-hash

(ns consistent-hash.core
(:require [clojure.math.numeric-tower :as math]
)
)

(def m (math/expt 2 32))

(defn consistent-hash [x number-of-nodes]
(math/ceil (/ (mod x m) (int (/ m number-of-nodes)))))

(defn average [num-seq]
(/ (apply + num-seq) (count num-seq)))

(defn standard-deviation [num-seq]
(let [avg (average num-seq)]
(->> (map (fn [x]
(let [n (math/abs (- x avg))]
(* n n)
)
) num-seq)
average
math/sqrt
)
)
)

(defn -main [& args]
(dotimes [n 10]
(let [number-of-nodes 10
times 1000000
hash-values (repeatedly times (fn [] (consistent-hash (long (rand m)) number-of-nodes)))
loads (sort (reduce (fn [ret node] (update ret node inc)) (zipmap (range 1 (inc number-of-nodes)) (repeat 0)) hash-values))
]
(println "Test" (inc n))
(println "Server load: " loads)
(println "Standard deviation: " (standard-deviation (map second loads)))
)
)
)




测试运行结果
consistent-hash on  master [!]
➜ lein run
Test 1
Server load: ([1 99935] [2 99672] [3 100057] [4 100364] [5 100368] [6 99835] [7 99740] [8 99855] [9 100260] [10 99914])
Standard deviation: 239.5462377078797
Test 2
Server load: ([1 100346] [2 99479] [3 99432] [4 99978] [5 100362] [6 99822] [7 100190] [8 100146] [9 99852] [10 100393])
Standard deviation: 333.28096255261863
Test 3
Server load: ([1 99890] [2 99895] [3 99926] [4 99823] [5 99989] [6 99659] [7 100480] [8 100141] [9 99821] [10 100376])
Standard deviation: 244.95509792613012
Test 4
Server load: ([1 100170] [2 100411] [3 99720] [4 100021] [5 99680] [6 100136] [7 99722] [8 99967] [9 100201] [10 99972])
Standard deviation: 227.40184695819863
Test 5
Server load: ([1 100093] [2 99987] [3 99735] [4 100110] [5 100498] [6 99870] [7 99784] [8 99398] [9 100470] [10 100055])
Standard deviation: 314.48879153318006
Test 6
Server load: ([1 99654] [2 99906] [3 100346] [4 99983] [5 100660] [6 99370] [7 99595] [8 100176] [9 100805] [10 99505])
Standard deviation: 465.7368355627457
Test 7
Server load: ([1 100073] [2 99934] [3 100243] [4 99690] [5 99684] [6 100088] [7 99906] [8 100358] [9 99759] [10 100265])
Standard deviation: 231.89221634198935
Test 8
Server load: ([1 99933] [2 99388] [3 100096] [4 99626] [5 99927] [6 100204] [7 100054] [8 100202] [9 100186] [10 100384])
Standard deviation: 282.99151930755806
Test 9
Server load: ([1 100429] [2 100245] [3 100177] [4 100295] [5 99943] [6 99687] [7 99213] [8 100452] [9 99910] [10 99649])
Standard deviation: 376.6446601241016
Test 10
Server load: ([1 100118] [2 100691] [3 100076] [4 99777] [5 99853] [6 99581] [7 100174] [8 100247] [9 99562] [10 99921])
Standard deviation: 321.47317150891456



理论上10个节点存储 100万 数据,平均每个节点保存 10万 数据。但从实际测试(测试了10组随机数据)结果看,各个节点的负载不是完全均衡的,标准差在几百之间。



当物理节点数比较少时,数据倾斜现象比较明显。在实践中可以通过引入虚拟节点来解决,用一个比较大的虚拟节点数作为 hash 的节点数,再将虚拟节点映射到少量的物理节点。



节点数为 100 时 10 组测试数据的标准差:

consistent-hash on  master [!] took 16s
➜ lein run
Test 1
Standard deviation: 95.42861206158246
Test 2
Standard deviation: 96.88663478519625
Test 3
Standard deviation: 104.64052752160607
Test 4
Standard deviation: 106.48258073506672
Test 5
Standard deviation: 99.03191404794718
Test 6
Standard deviation: 106.48333202900818
Test 7
Standard deviation: 106.9067818241668
Test 8
Standard deviation: 104.08102612868495
Test 9
Standard deviation: 107.8383976142079
Test 10
Standard deviation: 95.76659125185567



用户头像

Eric

关注

给写代码的人写代码 2017.10.17 加入

Clojure

评论

发布
暂无评论
架构师训练营第五周 - 作业