Week 5 作业
发布于: 2020 年 07 月 07 日
用你熟悉的编程语言实现一致性 hash 算法。
编写测试用例测试这个算法,测试 100 万 KV 数据,10 个服务器节点的情况下,计算这些 KV 数据在服务器上分布数量的标准差,以评估算法的存储负载不均衡性。
java 实现的一致性 hash 算法:
import java.util.*;
import java.util.stream.Collectors;
public class Main {
public static void main(String[] args) {
int nodeCount = 10;
int dataSize = 1_000_000;
Map<String, Integer> data = new HashMap<>();
for (int j = 0; j < dataSize; j++) {
data.put("key-" + Math.random(), j);
}
int base = 500;
for (int i = 10; i <= 1000; i += 5) {
HashRing<String, Integer> hashRing = new HashRing<>(nodeCount, i);
data.forEach(hashRing::put);
int sd = (int) hashRing.calcStandardDeviation();
System.out.printf("%4d => %6d ", i, sd);
int count = sd / base;
for (int j = 0; j < count; j++) {
System.out.print(".");
}
System.out.println();
}
}
}
class HashRing<K, V> {
private static final Random random = new Random();
private int nodeCount;
private int virtualNodeSize;
private TreeMap<Integer, DataNode<K, V>> hashRingMap;
private List<DataNode<K, V>> dataNodes;
public HashRing(int nodeCount, int virtualNodeSize) {
this.nodeCount = nodeCount;
this.virtualNodeSize = virtualNodeSize;
init();
}
private void init() {
hashRingMap = new TreeMap<>();
dataNodes = new ArrayList<>(nodeCount);
for (int i = 0; i < nodeCount; i++) {
addDataNode();
}
}
public void addDataNode() {
DataNode<K, V> dataNode = new DataNode<>();
dataNodes.add(dataNode);
for (int j = 0; j < virtualNodeSize; j++) {
hashRingMap.put(random.nextInt(), dataNode);
}
}
public void removeDataNode(int i){
DataNode<K, V> kvDataNode = dataNodes.get(i);
if (kvDataNode == null){
throw new IllegalArgumentException();
}
dataNodes.remove(i);
hashRingMap.entrySet().removeIf(integerDataNodeEntry -> integerDataNodeEntry.getValue() == kvDataNode);
}
public V get(K k) {
return getDataNode(k).get(k);
}
public V put(K k, V v) {
return getDataNode(k).put(k, v);
}
private DataNode<K, V> getDataNode(K k) {
assert k != null;
return Optional.ofNullable(hashRingMap.ceilingEntry(k.hashCode())).orElse(hashRingMap.firstEntry()).getValue();
}
public double calcStandardDeviation() {
double average = dataNodes.stream().map(DataNode::size).collect(Collectors.summarizingInt(Integer::intValue)).getAverage();
return Math.sqrt(dataNodes.stream().map(DataNode::size).map(size -> Math.pow(size - average, 2)).reduce(0D, Double::sum) / dataNodes.size());
}
private static class DataNode<K, V> extends HashMap<K, V> {
}
}
复制代码
总数据 100 万 KV、10 个数据节点不变,虚拟节点个数从 10 逐渐增加到 1000 步进值为 5,记录标准差的变化,多次运行结果可能不一样,其中之一如下所示,标准差的变化规律为:
10 ~ 150 之间快速下降
150 ~ 400 之间缓慢下降
400 之后随机波动不再下降
因此虚拟节点 400 以上为佳,200 左右亦可,少于 150 则会有较大的数据倾斜。
10 => 30542 .............................................................
15 => 25151 ..................................................
20 => 19787 .......................................
25 => 15100 ..............................
30 => 14915 .............................
35 => 10430 ....................
40 => 10432 ....................
45 => 12076 ........................
50 => 14701 .............................
55 => 12178 ........................
60 => 8165 ................
65 => 13423 ..........................
70 => 10831 .....................
75 => 11920 .......................
80 => 13061 ..........................
85 => 8093 ................
90 => 14513 .............................
95 => 8206 ................
100 => 10173 ....................
105 => 9210 ..................
110 => 9260 ..................
115 => 9755 ...................
120 => 6674 .............
125 => 7931 ...............
130 => 7544 ...............
135 => 7299 ..............
140 => 7296 ..............
145 => 8821 .................
150 => 7742 ...............
155 => 3395 ......
160 => 7306 ..............
165 => 8690 .................
170 => 8565 .................
175 => 7199 ..............
180 => 9408 ..................
185 => 6208 ............
190 => 7257 ..............
195 => 4889 .........
200 => 4636 .........
205 => 8274 ................
210 => 9178 ..................
215 => 7881 ...............
220 => 5861 ...........
225 => 5108 ..........
230 => 8181 ................
235 => 6631 .............
240 => 3819 .......
245 => 6797 .............
250 => 6104 ............
255 => 7939 ...............
260 => 4585 .........
265 => 8550 .................
270 => 5161 ..........
275 => 6196 ............
280 => 3424 ......
285 => 5403 ..........
290 => 3812 .......
295 => 4287 ........
300 => 6187 ............
305 => 5059 ..........
310 => 3897 .......
315 => 5463 ..........
320 => 8230 ................
325 => 4347 ........
330 => 5951 ...........
335 => 4779 .........
340 => 4265 ........
345 => 5768 ...........
350 => 4798 .........
355 => 4406 ........
360 => 3672 .......
365 => 3955 .......
370 => 5607 ...........
375 => 2656 .....
380 => 5713 ...........
385 => 5396 ..........
390 => 5068 ..........
395 => 5720 ...........
400 => 2626 .....
405 => 4581 .........
410 => 6259 ............
415 => 2669 .....
420 => 4049 ........
425 => 3796 .......
430 => 4607 .........
435 => 3564 .......
440 => 2947 .....
445 => 2971 .....
450 => 2459 ....
455 => 2481 ....
460 => 3773 .......
465 => 3989 .......
470 => 6942 .............
475 => 2898 .....
480 => 4617 .........
485 => 6044 ............
490 => 4217 ........
495 => 3374 ......
500 => 5312 ..........
505 => 3115 ......
510 => 4212 ........
515 => 5001 ..........
520 => 3437 ......
525 => 4463 ........
530 => 4394 ........
535 => 5161 ..........
540 => 3811 .......
545 => 2919 .....
550 => 2903 .....
555 => 4181 ........
560 => 3842 .......
565 => 2985 .....
570 => 4257 ........
575 => 2751 .....
580 => 2616 .....
585 => 2275 ....
590 => 3788 .......
595 => 3884 .......
600 => 4610 .........
605 => 2767 .....
610 => 4245 ........
615 => 4664 .........
620 => 4200 ........
625 => 2573 .....
630 => 2156 ....
635 => 3063 ......
640 => 3278 ......
645 => 3342 ......
650 => 3185 ......
655 => 3914 .......
660 => 4025 ........
665 => 3744 .......
670 => 4392 ........
675 => 2866 .....
680 => 3864 .......
685 => 4608 .........
690 => 3334 ......
695 => 3203 ......
700 => 3085 ......
705 => 2755 .....
710 => 4331 ........
715 => 3261 ......
720 => 4452 ........
725 => 3255 ......
730 => 3491 ......
735 => 3768 .......
740 => 4017 ........
745 => 3301 ......
750 => 2200 ....
755 => 4483 ........
760 => 3757 .......
765 => 3059 ......
770 => 4131 ........
775 => 3124 ......
780 => 2648 .....
785 => 3940 .......
790 => 2706 .....
795 => 3488 ......
800 => 3849 .......
805 => 2795 .....
810 => 2568 .....
815 => 2844 .....
820 => 2422 ....
825 => 3518 .......
830 => 2920 .....
835 => 3418 ......
840 => 3818 .......
845 => 3683 .......
850 => 3252 ......
855 => 3200 ......
860 => 3517 .......
865 => 3194 ......
870 => 3241 ......
875 => 2654 .....
880 => 4007 ........
885 => 3835 .......
890 => 2670 .....
895 => 2799 .....
900 => 3591 .......
905 => 2980 .....
910 => 3484 ......
915 => 2956 .....
920 => 2890 .....
925 => 3760 .......
930 => 1691 ...
935 => 2195 ....
940 => 2718 .....
945 => 2243 ....
950 => 1779 ...
955 => 1856 ...
960 => 2641 .....
965 => 3052 ......
970 => 3334 ......
975 => 3005 ......
980 => 2895 .....
985 => 1887 ...
990 => 2617 .....
995 => 2872 .....
1000 => 2608 .....
复制代码
划线
评论
复制
发布于: 2020 年 07 月 07 日阅读数: 60
版权声明: 本文为 InfoQ 作者【Shawn】的原创文章。
原文链接:【http://xie.infoq.cn/article/8261b2ae097373256bf17ffc8】。未经作者许可,禁止转载。
Shawn
关注
还未添加个人签名 2017.10.23 加入
还未添加个人简介
评论