Architecture Training Camp: Week 5 Assignment
Published: July 8, 2020
Implement a consistent hashing algorithm in a programming language you are familiar with.
Write test cases for the algorithm: with 1,000,000 KV entries and 10 server nodes, compute the standard deviation of the number of KV entries that land on each server, to evaluate how unevenly the algorithm distributes storage load.
Hash Algorithm Design
Define the hash algorithm interface:
public interface HashAlgrth {
    int getHashCode(String key);
}
Based on references found online, I adopted the FNV1_32 algorithm to generate a 32-bit hash code:
public class FNVHashAlgrth implements HashAlgrth {
    private final static int PRIME = 16777619;
    private final static long OFFSET_BASIC = 2166136261L;

    @Override
    public int getHashCode(String data) {
        // FNV-1: start from the offset basis and fold in each character
        int hashCode = (int) OFFSET_BASIC;
        for (int i = 0; i < data.length(); i++) {
            hashCode = (hashCode ^ data.charAt(i)) * PRIME;
        }
        // Extra bit mixing to improve dispersion
        hashCode += hashCode << 13;
        hashCode ^= hashCode >> 7;
        hashCode += hashCode << 3;
        hashCode ^= hashCode >> 17;
        hashCode += hashCode << 5;
        // Keep the result non-negative so it can be used as a ring position
        if (hashCode < 0) {
            hashCode = Math.abs(hashCode);
        }
        return hashCode;
    }
}
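As a quick sanity check, the interface can be exercised like this. This is a minimal sketch of my own (the class name HashCheck is illustrative, not part of the assignment code), assuming HashAlgrth and FNVHashAlgrth are in the same package:

public class HashCheck {
    public static void main(String[] args) {
        HashAlgrth hash = new FNVHashAlgrth();
        // The hash codes should be non-negative ints spread widely across the 32-bit range.
        for (String key : new String[] { "192.168.0.1", "192.168.0.2", "hello-world" }) {
            System.out.println(key + " -> " + hash.getHashCode(key));
        }
    }
}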
Define the node class Node. Since we need to record how many objects land on each node, a countNum field tracks the object count:
public class Node {
    private String IpAdress;
    // Records the number of objects mapped to this server
    private int countNum = 0;

    public String getIpAdress() {
        return IpAdress;
    }

    public void setIpAdress(String ipAdress) {
        IpAdress = ipAdress;
    }

    public int getCountNum() {
        return countNum;
    }

    public void setCountNum(int countNum) {
        this.countNum = countNum;
    }

    public void addOne() {
        this.countNum += 1;
    }

    // Virtual IPs take the form "XXX.XXX.XXX.XXX{INT}"
    public String createVirtualIP(int replicaNumber) {
        return IpAdress + Integer.toString(replicaNumber);
    }

    public Node(String ipAdress) {
        IpAdress = ipAdress;
    }
}
The implementation follows the definition of consistent hashing. The key design question is which data structure to use for the hash ring. I initially considered a circular linked list, but lookups turned out to be inefficient when the ring entries are sparse, since the complexity is O(N). Following online references, I switched to a sorted tree: Java's TreeMap is a red-black tree, so lookups take O(log N).
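To illustrate the ring lookup before the full implementation below, here is a minimal sketch of my own using TreeMap directly. It uses ceilingEntry, which is equivalent to the tailMap approach in the actual code: the owner of a key is the first virtual node clockwise from its hash code, wrapping around to the first entry when there is none.

import java.util.Map;
import java.util.TreeMap;

public class RingLookupSketch {
    public static void main(String[] args) {
        // Hash ring as a sorted map: key = position on the ring, value = node label.
        TreeMap<Integer, String> ring = new TreeMap<>();
        ring.put(100, "node-A");
        ring.put(500, "node-B");
        ring.put(900, "node-C");

        int hashCode = 600;
        // ceilingEntry returns the first entry at or after hashCode in O(log N);
        // null means we passed the last position, so wrap around to the first entry.
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hashCode);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        System.out.println(entry.getValue()); // prints node-C
    }
}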
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash {
    // Number of virtual nodes per physical node
    private int replicasCount;
    private HashAlgrth hashAlgrth = new FNVHashAlgrth();
    private SortedMap<Integer, Node> hashCircle = new TreeMap<>();

    public ConsistentHash(int replicasCount, Collection<Node> nodes) {
        this.replicasCount = replicasCount;
        nodes.forEach(n -> addNode(n));
    }

    public void addNewNode(Node node) {
        if (node == null)
            throw new IllegalArgumentException("Node must be not null");
        if (hashCircle.containsKey(hashAlgrth.getHashCode(node.createVirtualIP(0))))
            throw new IllegalArgumentException("Node does exist!");
        addNode(node);
    }

    private void addNode(Node node) {
        // Place replicasCount virtual nodes on the ring for this physical node
        for (int i = 0; i < replicasCount; i++) {
            hashCircle.put(hashAlgrth.getHashCode(node.createVirtualIP(i)), node);
        }
    }

    public Node getNode(String keyData) {
        if (hashCircle.isEmpty())
            return null;
        int hashCode = hashAlgrth.getHashCode(keyData);
        if (!hashCircle.containsKey(hashCode)) {
            // Walk clockwise: take the first virtual node at or after hashCode,
            // wrapping around to the first key when the tail map is empty.
            SortedMap<Integer, Node> tailMap = hashCircle.tailMap(hashCode);
            hashCode = tailMap.isEmpty() ? hashCircle.firstKey() : tailMap.firstKey();
        }
        return hashCircle.get(hashCode);
    }
}
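A short usage sketch of my own (the class name AddNodeSketch and the sample keys are illustrative): route a few keys, then call addNewNode and route them again. Only the keys whose ring positions now fall in front of one of the new node's virtual positions should move, which is the point of consistent hashing.

import java.util.Arrays;
import java.util.List;

public class AddNodeSketch {
    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(new Node("192.168.0.1"), new Node("192.168.0.2"));
        ConsistentHash ch = new ConsistentHash(50, nodes);

        String[] keys = { "user:1", "user:2", "user:3", "user:4" };
        for (String key : keys) {
            System.out.println(key + " -> " + ch.getNode(key).getIpAdress());
        }

        // Add a third node; only some of the keys should be remapped to it.
        ch.addNewNode(new Node("192.168.0.3"));
        for (String key : keys) {
            System.out.println(key + " -> " + ch.getNode(key).getIpAdress());
        }
    }
}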
Putting the assignment together: use 10 machines with IP addresses 192.168.0.1 through 192.168.0.10, generate the lookup keys with Java's built-in UUID, and compute the standard deviation:
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConsistentHashDemo {
    private static final String[] nodesIP = new String[] {
            "192.168.0.1", "192.168.0.2", "192.168.0.3", "192.168.0.4", "192.168.0.5",
            "192.168.0.6", "192.168.0.7", "192.168.0.8", "192.168.0.9", "192.168.0.10" };
    private static final long requestCount = 1000000L;
    private static final int replicasCount = 50;

    public static void main(String[] args) {
        List<Node> nodes = Stream.of(nodesIP).map(s -> new Node(s)).collect(Collectors.toList());
        ConsistentHash consistentHash = new ConsistentHash(replicasCount, nodes);
        // Route one million random UUID keys and count how many land on each node
        for (long i = 0; i < requestCount; i++) {
            consistentHash.getNode(UUID.randomUUID().toString()).addOne();
        }
        long average = requestCount / nodesIP.length;
        double sum = 0;
        for (Node node : nodes) {
            System.out.println(node.getIpAdress() + ": " + node.getCountNum());
            sum += (average - node.getCountNum()) * (average - node.getCountNum());
        }
        // Population standard deviation of the per-node counts
        System.out.println(Math.sqrt(sum / (nodes.size())));
    }
}
With 1,000,000 KV entries, the standard deviation is roughly 15,000 with 50 virtual nodes per physical node, about 7,000 with 100, about 5,600 with 150, and about 4,600 with 200.
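The comparison above can be reproduced with a small sweep over the virtual-node count. This is a sketch of my own adapted from the demo (the class name ReplicaSweep is illustrative); since the keys are random UUIDs, the exact numbers will differ from run to run:

import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReplicaSweep {
    public static void main(String[] args) {
        String[] ips = { "192.168.0.1", "192.168.0.2", "192.168.0.3", "192.168.0.4", "192.168.0.5",
                         "192.168.0.6", "192.168.0.7", "192.168.0.8", "192.168.0.9", "192.168.0.10" };
        long requestCount = 1000000L;
        for (int replicas : new int[] { 50, 100, 150, 200 }) {
            // Fresh nodes for each run so the counters start from zero
            List<Node> nodes = Stream.of(ips).map(Node::new).collect(Collectors.toList());
            ConsistentHash ch = new ConsistentHash(replicas, nodes);
            for (long i = 0; i < requestCount; i++) {
                ch.getNode(UUID.randomUUID().toString()).addOne();
            }
            long average = requestCount / ips.length;
            double sum = 0;
            for (Node node : nodes) {
                sum += (average - node.getCountNum()) * (average - node.getCountNum());
            }
            System.out.println(replicas + " virtual nodes, std dev = " + Math.sqrt(sum / nodes.size()));
        }
    }
}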
The code is available at: https://github.com/Albertsirius/consistenthashingdemo