Architect Training Camp - Week 5 Homework
Published: November 25, 2020
Assignment:
Implement the consistent hashing algorithm in a programming language you are familiar with.
Write test cases for the algorithm: with 1,000,000 KV entries and 10 server nodes, compute the standard deviation of the number of KV entries stored on each server, to evaluate how unevenly the algorithm distributes the storage load.
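The imbalance metric the assignment asks for is the population standard deviation of the per-node key counts: with $N = 10$ nodes, $n = 10^6$ keys, and $c_i$ keys landing on node $i$,

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(c_i - \frac{n}{N}\right)^2}$$

A smaller $\sigma$ means a more even distribution; $\sigma = 0$ would mean every node holds exactly $n/N = 100{,}000$ keys.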
Solution:
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class HashUtils {

    /**
     * MurmurHash: a non-cryptographic hash with very good performance.
     * It is much faster than hashes such as CRC32, MD5 and SHA-1 (MD5 and
     * SHA-1 are cryptographic hashes, so their inherent complexity makes
     * them comparatively slow), and it is reported to have a very low
     * collision rate.
     * http://murmurhash.googlepages.com/
     */
    public static Long hash(String key) {
        ByteBuffer buf = ByteBuffer.wrap(key.getBytes());
        int seed = 0x1234ABCD;
        ByteOrder byteOrder = buf.order();
        buf.order(ByteOrder.LITTLE_ENDIAN);
        long m = 0xc6a4a7935bd1e995L;
        int r = 47;
        long h = seed ^ (buf.remaining() * m);
        long k;
        while (buf.remaining() >= 8) {
            k = buf.getLong();
            k *= m;
            k ^= k >>> r;
            k *= m;
            h ^= k;
            h *= m;
        }
        if (buf.remaining() > 0) {
            ByteBuffer finish = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
            finish.put(buf).rewind();
            h ^= finish.getLong();
            h *= m;
        }
        h ^= h >>> r;
        h *= m;
        h ^= h >>> r;
        buf.order(byteOrder);
        return h;
    }
}
```
```java
import java.util.HashMap;
import java.util.Map;

// A physical server node that stores the KV pairs dispatched to it.
public class Node {
    private String ip;
    private Map<String, Object> kvPairs = new HashMap<>();

    public Node(String ip) {
        this.ip = ip;
    }

    public void put(String k, Object v) {
        kvPairs.put(k, v);
    }

    public int size() {
        return kvPairs.size();
    }

    public String getIp() {
        return ip;
    }

    @Override
    public String toString() {
        return this.ip;
    }
}
```
```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;
import java.util.stream.Collectors;

public class ConsistentHash {
    private TreeMap<Long, String> virtualNodes = new TreeMap<>();

    public ConsistentHash(List<Node> nodeList, int virtualCount) {
        for (Node n : nodeList) {
            // i <= virtualCount: every physical node gets at least one
            // position on the ring, so virtualCount == 0 means the node
            // itself only.
            for (int i = 0; i <= virtualCount; i++) {
                long hash = HashUtils.hash(n.getIp() + "#" + i);
                virtualNodes.put(hash, n.getIp());
            }
        }
    }

    public String dispatch(String key) {
        long hash = HashUtils.hash(key);
        // First virtual node clockwise from the key's hash; wrap around to
        // the smallest entry when the key hashes past the end of the ring.
        Map.Entry<Long, String> entry = virtualNodes.ceilingEntry(hash);
        if (entry == null) {
            entry = virtualNodes.firstEntry();
        }
        return entry.getValue();
    }

    public void printAllNodesWithIndexes() {
        for (Iterator<Long> it = virtualNodes.keySet().iterator(); it.hasNext(); ) {
            Long key = it.next();
            System.out.println("hash(" + key + ") maps to host -> " + virtualNodes.get(key));
        }
    }

    public static void main(String[] args) {
        // 10 physical nodes
        int nodeCount = 10;
        // 1,000,000 KV entries
        int kvCount = 1000000;
        // Initial virtual-node count per physical node, increment per round,
        // and maximum virtual-node count
        int virtualCount = 0;
        int increaseInterval = 100;
        int maxVirtualCount = 1000;
        // Runs per round (results are averaged)
        int batchCount = 1;

        for (; virtualCount <= maxVirtualCount; ) {
            double standardDeviation = 0.0;
            for (int b = 0; b < batchCount; b++) {
                List<Node> nodeList = new ArrayList<>();
                for (int i = 1; i <= nodeCount; i++) {
                    nodeList.add(new Node("192.168.1." + i));
                }
                Map<String, Node> nodeMap =
                        nodeList.stream().collect(Collectors.toMap(Node::getIp, node -> node));
                ConsistentHash consistentHash = new ConsistentHash(nodeList, virtualCount);

                for (int i = 0; i < kvCount; i++) {
                    String key = UUID.randomUUID().toString();
                    String value = key;
                    String ip = consistentHash.dispatch(key);
                    nodeMap.get(ip).put(key, value);
                }

                // Cast before dividing to avoid integer division.
                double average = (double) kvCount / nodeCount;
                double variance = 0;
                for (Node node : nodeList) {
                    variance += (node.size() - average) * (node.size() - average);
                }
                standardDeviation += Math.sqrt(variance / nodeCount);
            }
            System.out.println("With " + virtualCount + " virtual nodes per physical node,"
                    + " standard deviation = " + standardDeviation / batchCount);
            virtualCount += increaseInterval;
        }
    }
}
```
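The core of the solution above is the sorted-map ring lookup. Below is a minimal self-contained sketch of just that mechanism, using `String.hashCode()` in place of MurmurHash purely to keep it short; the class `RingSketch` and its method names are my own illustration, not part of the solution above:

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal consistent-hash ring: virtual-node hashes are the keys of a
// TreeMap, and a lookup walks clockwise to the next virtual node.
public class RingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Place one physical node on the ring with `virtualCount` extra replicas.
    public void add(String node, int virtualCount) {
        for (int i = 0; i <= virtualCount; i++) {
            ring.put((long) (node + "#" + i).hashCode(), node);
        }
    }

    // Route a key to the first virtual node at or after its hash,
    // wrapping around to the smallest ring position if none exists.
    public String dispatch(String key) {
        long h = (long) key.hashCode();
        Map.Entry<Long, String> e = ring.ceilingEntry(h);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        RingSketch r = new RingSketch();
        r.add("192.168.1.1", 5);
        r.add("192.168.1.2", 5);
        System.out.println("some-key -> " + r.dispatch("some-key"));
    }
}
```

The same key always lands on the same node, and removing one node only reassigns the keys that fell in its arcs of the ring, which is the property that motivates consistent hashing in the first place.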
Run output:
With 0 virtual nodes per physical node, standard deviation = 63621.32278096707
With 100 virtual nodes per physical node, standard deviation = 8755.287499562764
With 200 virtual nodes per physical node, standard deviation = 6582.103599913936
With 300 virtual nodes per physical node, standard deviation = 6441.640738197063
With 400 virtual nodes per physical node, standard deviation = 5688.685911526492
With 500 virtual nodes per physical node, standard deviation = 5000.89817932739
With 600 virtual nodes per physical node, standard deviation = 4676.9576222155365
With 700 virtual nodes per physical node, standard deviation = 3523.9862939574555
With 800 virtual nodes per physical node, standard deviation = 3149.0620508335496
With 900 virtual nodes per physical node, standard deviation = 2600.461651322703
With 1000 virtual nodes per physical node, standard deviation = 2740.45886668638
The trend is clear: the more virtual nodes, the smaller the standard deviation and the more even the distribution. (The small uptick at 1000 is run-to-run noise, since each data point here is a single run with batchCount = 1.)
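As a sanity check on how far these numbers are from ideal (my own back-of-the-envelope comparison, not part of the assignment): if the 1,000,000 keys were assigned to the 10 nodes independently and uniformly at random, each node's count would be Binomial(n = 10^6, p = 0.1), whose standard deviation is sqrt(n * p * (1 - p)) = 300. So even at 1000 virtual nodes per physical node (sigma around 2740), the ring is still noticeably less uniform than pure random assignment.

```java
public class UniformBaseline {
    // Standard deviation of the per-node key count under independent
    // uniform random assignment: each count is Binomial(n, p = 1/nodes),
    // with standard deviation sqrt(n * p * (1 - p)).
    static double expectedStdDev(long n, int nodes) {
        double p = 1.0 / nodes;
        return Math.sqrt(n * p * (1 - p));
    }

    public static void main(String[] args) {
        // sqrt(1_000_000 * 0.1 * 0.9) = sqrt(90000) = 300.0
        System.out.println(expectedStdDev(1_000_000L, 10));
    }
}
```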
joshuamai