Architecture Training Camp: Week 5 Assignment

Published: July 8, 2020

  • Implement the consistent hashing algorithm in a programming language you are familiar with.

  • Write a test case for the algorithm: with 1,000,000 KV entries and 10 server nodes, compute the standard deviation of the number of entries that land on each server, to evaluate how unevenly the algorithm distributes storage load.


Hash Algorithm Design

First, define the hash algorithm interface:

public interface HashAlgrth {
    int getHashCode(String key);
}

After consulting online resources, I chose the FNV-1 32-bit algorithm to generate a 32-bit hash code:

public class FNVHashAlgrth implements HashAlgrth {
    private final static int PRIME = 16777619;
    private final static long OFFSET_BASIC = 2166136261L;

    @Override
    public int getHashCode(String data) {
        // FNV-1: start from the offset basis, then multiply and xor per character
        int hashCode = (int) OFFSET_BASIC;
        for (int i = 0; i < data.length(); i++) {
            hashCode = (hashCode ^ data.charAt(i)) * PRIME;
        }
        // Extra avalanche mixing to spread the bits further
        hashCode += hashCode << 13;
        hashCode ^= hashCode >> 7;
        hashCode += hashCode << 3;
        hashCode ^= hashCode >> 17;
        hashCode += hashCode << 5;
        if (hashCode == Integer.MIN_VALUE) {
            // Math.abs(Integer.MIN_VALUE) is still negative, so pin this case to 0
            hashCode = 0;
        } else if (hashCode < 0) {
            hashCode = Math.abs(hashCode);
        }
        return hashCode;
    }
}
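As a quick sanity check, the hash should be deterministic and non-negative for any input. A standalone sketch of the same FNV-1 logic (a variant that uses unsigned shifts and a sign-bit mask instead of Math.abs, so its values differ slightly from the class above):

```java
// Standalone sketch of a 32-bit FNV-1 hash with extra avalanche mixing.
// The sign bit is masked off so results always land in [0, 2^31 - 1].
public class FnvSanityCheck {
    static final int PRIME = 16777619;
    static final long OFFSET_BASIS = 2166136261L;

    static int fnvHash(String data) {
        int h = (int) OFFSET_BASIS;
        for (int i = 0; i < data.length(); i++) {
            h = (h ^ data.charAt(i)) * PRIME;
        }
        // Avalanche mixing, analogous to the implementation above
        h += h << 13;
        h ^= h >>> 7;
        h += h << 3;
        h ^= h >>> 17;
        h += h << 5;
        // Clear the sign bit so the value maps cleanly onto the integer ring
        return h & 0x7FFFFFFF;
    }

    public static void main(String[] args) {
        System.out.println(fnvHash("192.168.0.1") == fnvHash("192.168.0.1")); // same input, same hash
        System.out.println(fnvHash("192.168.0.1") >= 0);                      // always non-negative
    }
}
```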

Next, define a Node class. Since we need to track how many objects land on each node, it carries a countNum field for the object count:

public class Node {
    private String IpAdress;
    // Tracks how many objects currently land on this server
    private int countNum = 0;

    public String getIpAdress() {
        return IpAdress;
    }

    public void setIpAdress(String ipAdress) {
        IpAdress = ipAdress;
    }

    public int getCountNum() {
        return countNum;
    }

    public void setCountNum(int countNum) {
        this.countNum = countNum;
    }

    public void addOne() {
        this.countNum += 1;
    }

    // Virtual IPs take the form "XXX.XXX.XXX.XXX{INT}"
    public String createVirtualIP(int replicaNumber) {
        return IpAdress + Integer.toString(replicaNumber);
    }

    public Node(String ipAdress) {
        IpAdress = ipAdress;
    }
}

The implementation follows the definition of consistent hashing. The key design question is which data structure to use for the hash ring. I first considered a circular linked list, but when the entries are sparse, lookups are inefficient: O(N) per query. Following online references, I switched to a sorted tree instead. Java's sorted-map implementation, TreeMap, is a red-black tree, giving O(log N) lookups.

import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash {
    // Number of virtual nodes per physical node
    private int replicasCount;
    private HashAlgrth hashAlgrth = new FNVHashAlgrth();
    private SortedMap<Integer, Node> hashCircle = new TreeMap<>();

    public ConsistentHash(int replicasCount, Collection<Node> nodes) {
        this.replicasCount = replicasCount;
        nodes.forEach(n -> addNode(n));
    }

    public void addNewNode(Node node) {
        if (node == null)
            throw new IllegalArgumentException("Node must not be null");
        if (hashCircle.containsKey(hashAlgrth.getHashCode(node.createVirtualIP(0))))
            throw new IllegalArgumentException("Node already exists!");
        addNode(node);
    }

    private void addNode(Node node) {
        for (int i = 0; i < replicasCount; i++) {
            hashCircle.put(hashAlgrth.getHashCode(node.createVirtualIP(i)), node);
        }
    }

    public Node getNode(String keyData) {
        if (hashCircle.isEmpty())
            return null;
        int hashCode = hashAlgrth.getHashCode(keyData);
        if (!hashCircle.containsKey(hashCode)) {
            // Walk clockwise: take the first ring position at or after the hash,
            // wrapping around to the smallest position if none exists
            SortedMap<Integer, Node> tailMap = hashCircle.tailMap(hashCode);
            hashCode = tailMap.isEmpty() ? hashCircle.firstKey() : tailMap.firstKey();
        }
        return hashCircle.get(hashCode);
    }
}
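The ring lookup can be exercised in isolation. A minimal sketch of the tailMap trick with plain strings standing in for nodes (it uses TreeMap's ceilingEntry, which is an equivalent, slightly more direct alternative to tailMap):

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of the hash-ring lookup: positions are Integer keys in a
// TreeMap, and a data key is served by the first node clockwise from its hash.
public class RingLookupSketch {
    public static String lookup(TreeMap<Integer, String> ring, int hash) {
        // First node at or after the hash position...
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash);
        // ...or wrap around to the smallest position on the ring
        return e != null ? e.getValue() : ring.firstEntry().getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        ring.put(100, "nodeA");
        ring.put(500, "nodeB");
        ring.put(900, "nodeC");
        System.out.println(lookup(ring, 250)); // -> nodeB (first node clockwise)
        System.out.println(lookup(ring, 950)); // -> nodeA (wraps around the ring)
    }
}
```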

Putting the whole assignment together: 10 machines with IP addresses 192.168.0.1 through 192.168.0.10, query keys generated with Java's built-in UUID, and the standard deviation computed at the end:

import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConsistentHashDemo {
    private static final String[] nodesIP = new String[] {
        "192.168.0.1",
        "192.168.0.2",
        "192.168.0.3",
        "192.168.0.4",
        "192.168.0.5",
        "192.168.0.6",
        "192.168.0.7",
        "192.168.0.8",
        "192.168.0.9",
        "192.168.0.10"
    };
    private static final long requestCount = 1000000L;
    private static final int replicasCount = 50;

    public static void main(String[] args) {
        List<Node> nodes = Stream.of(nodesIP).map(s -> new Node(s)).collect(Collectors.toList());
        ConsistentHash consistentHash = new ConsistentHash(replicasCount, nodes);
        for (long i = 0; i < requestCount; i++) {
            consistentHash.getNode(UUID.randomUUID().toString()).addOne();
        }
        long average = requestCount / nodesIP.length;
        double sum = 0;
        for (Node node : nodes) {
            System.out.println(node.getIpAdress() + ": " + node.getCountNum());
            sum += (average - node.getCountNum()) * (average - node.getCountNum());
        }
        // Population standard deviation of the per-node counts
        System.out.println(Math.sqrt(sum / nodes.size()));
    }
}

With 1,000,000 KV entries, the standard deviation was about 15,000 with 50 virtual nodes, about 7,000 with 100, about 5,600 with 150, and about 4,600 with 200.
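That trend can be reproduced without the full classes above. A self-contained sketch that sweeps the replica count and reports the standard deviation; it uses deterministic keys and String.hashCode for brevity instead of the FNV implementation, so the absolute numbers will differ from those reported:

```java
import java.util.TreeMap;

// Self-contained sweep: distribute deterministic keys over 10 nodes on a
// TreeMap ring and compute the standard deviation for several replica counts.
public class ReplicaSweep {
    static int hash(String s) {
        return s.hashCode() & 0x7FFFFFFF; // non-negative ring position
    }

    static double stdDev(int nodes, int replicas, int keys) {
        TreeMap<Integer, Integer> ring = new TreeMap<>();
        int[] counts = new int[nodes];
        // Place replicas * nodes virtual positions on the ring
        for (int n = 0; n < nodes; n++) {
            for (int r = 0; r < replicas; r++) {
                ring.put(hash("192.168.0." + (n + 1) + "#" + r), n);
            }
        }
        // Route each key to the first virtual position clockwise from its hash
        for (int k = 0; k < keys; k++) {
            Integer pos = ring.ceilingKey(hash("key-" + k));
            counts[ring.get(pos != null ? pos : ring.firstKey())]++;
        }
        double mean = (double) keys / nodes, sum = 0;
        for (int c : counts) sum += (c - mean) * (c - mean);
        return Math.sqrt(sum / nodes);
    }

    public static void main(String[] args) {
        for (int replicas : new int[] {50, 100, 150, 200}) {
            System.out.printf("%d replicas: stddev %.0f%n",
                    replicas, stdDev(10, replicas, 100000));
        }
    }
}
```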

The code is available at: https://github.com/Albertsirius/consistenthashingdemo

子豪sirius
