架构师训练营第五周 - 作业
发布于: 2020 年 07 月 05 日
1 用你熟悉的编程语言实现一致性 hash 算法
Clojure 代码
(def m (math/expt 2 32))(defn consistent-hash [x number-of-nodes] (math/ceil (/ (mod x m) (int (/ m number-of-nodes)))))
2 编写测试用例测试这个算法,测试 100 万 KV 数据,10 个服务器节点的情况下,计算这些 KV 数据在服务器上分布数量的标准差,以评估算法的存储负载不均衡性
源代码:
https://github.com/gzeureka/consistent-hash
(ns consistent-hash.core (:require [clojure.math.numeric-tower :as math] ) )(def m (math/expt 2 32))(defn consistent-hash [x number-of-nodes] (math/ceil (/ (mod x m) (int (/ m number-of-nodes)))))(defn average [num-seq] (/ (apply + num-seq) (count num-seq)))(defn standard-deviation [num-seq] (let [avg (average num-seq)] (->> (map (fn [x] (let [n (math/abs (- x avg))] (* n n) ) ) num-seq) average math/sqrt ) ) )(defn -main [& args] (dotimes [n 10] (let [number-of-nodes 10 times 1000000 hash-values (repeatedly times (fn [] (consistent-hash (long (rand m)) number-of-nodes))) loads (sort (reduce (fn [ret node] (update ret node inc)) (zipmap (range 1 (inc number-of-nodes)) (repeat 0)) hash-values)) ] (println "Test" (inc n)) (println "Server load: " loads) (println "Standard deviation: " (standard-deviation (map second loads))) ) ) )
测试运行结果
consistent-hash on master [!]➜ lein runTest 1Server load: ([1 99935] [2 99672] [3 100057] [4 100364] [5 100368] [6 99835] [7 99740] [8 99855] [9 100260] [10 99914])Standard deviation: 239.5462377078797Test 2Server load: ([1 100346] [2 99479] [3 99432] [4 99978] [5 100362] [6 99822] [7 100190] [8 100146] [9 99852] [10 100393])Standard deviation: 333.28096255261863Test 3Server load: ([1 99890] [2 99895] [3 99926] [4 99823] [5 99989] [6 99659] [7 100480] [8 100141] [9 99821] [10 100376])Standard deviation: 244.95509792613012Test 4Server load: ([1 100170] [2 100411] [3 99720] [4 100021] [5 99680] [6 100136] [7 99722] [8 99967] [9 100201] [10 99972])Standard deviation: 227.40184695819863Test 5Server load: ([1 100093] [2 99987] [3 99735] [4 100110] [5 100498] [6 99870] [7 99784] [8 99398] [9 100470] [10 100055])Standard deviation: 314.48879153318006Test 6Server load: ([1 99654] [2 99906] [3 100346] [4 99983] [5 100660] [6 99370] [7 99595] [8 100176] [9 100805] [10 99505])Standard deviation: 465.7368355627457Test 7Server load: ([1 100073] [2 99934] [3 100243] [4 99690] [5 99684] [6 100088] [7 99906] [8 100358] [9 99759] [10 100265])Standard deviation: 231.89221634198935Test 8Server load: ([1 99933] [2 99388] [3 100096] [4 99626] [5 99927] [6 100204] [7 100054] [8 100202] [9 100186] [10 100384])Standard deviation: 282.99151930755806Test 9Server load: ([1 100429] [2 100245] [3 100177] [4 100295] [5 99943] [6 99687] [7 99213] [8 100452] [9 99910] [10 99649])Standard deviation: 376.6446601241016Test 10Server load: ([1 100118] [2 100691] [3 100076] [4 99777] [5 99853] [6 99581] [7 100174] [8 100247] [9 99562] [10 99921])Standard deviation: 321.47317150891456
理论上10个节点存储 100万 数据,平均每个节点保存 10万 数据。但从实际测试(测试了10组随机数据)结果看,各个节点的负载不是完全均衡的,标准差在几百之间。
当物理节点数比较少时,数据倾斜现象比较明显。在实践中可以通过引入虚拟节点来解决,用一个比较大的虚拟节点数作为 hash 的节点数,再将虚拟节点映射到少量的物理节点。
节点数为 100 时 10 组测试数据的标准差:
consistent-hash on master [!] took 16s➜ lein runTest 1Standard deviation: 95.42861206158246Test 2Standard deviation: 96.88663478519625Test 3Standard deviation: 104.64052752160607Test 4Standard deviation: 106.48258073506672Test 5Standard deviation: 99.03191404794718Test 6Standard deviation: 106.48333202900818Test 7Standard deviation: 106.9067818241668Test 8Standard deviation: 104.08102612868495Test 9Standard deviation: 107.8383976142079Test 10Standard deviation: 95.76659125185567
划线
评论
复制
发布于: 2020 年 07 月 05 日阅读数: 60
Eric
关注
给写代码的人写代码 2017.10.17 加入
Clojure
评论