架构师训练营第五周作业 一致性哈希
发布于: 2020 年 07 月 08 日
- 用你熟悉的编程语言实现一致性 hash 算法。 
- 编写测试用例测试这个算法,测试 100 万 KV 数据,10 个服务器节点的情况下,计算这些 KV 数据在服务器上分布数量的标准差,以评估算法的存储负载不均衡性。 
先写了一个工具类,用于计算标准差:
import java.util.Collection;public class StandardDivision {    public double calculate(Collection<Integer> nums, int average) {        if(nums == null || nums.isEmpty()) throw new IllegalArgumentException();        int n = nums.size();        double variance = 0;        for(int num : nums) {            variance += (num - average) * (num - average);        }        variance /= n;        return Math.sqrt(variance);    }}
主要功能类,构造函数接收两个参数,一个是物理机器数,一个是每个机器创建的虚拟节点数。
generateVirtualNodes() 是主要功能方法,针对每一个物理机器,循环调用生成虚拟节点方法,同时将虚拟节点跟物理节点的关系存在一个Map中。
assignValueToVirtualNode()是用于当接收到一个value后,分配到哪个虚拟节点中,同时将对应的物理节点的映射数据总数加1.
import java.util.*;public class ConsistentHash {    private final Random random = new Random();    private final Map<Integer, Integer> virtualPhysicalMapping = new HashMap<>();//key is virtual node, value is physical node id    private final List<Integer> virtualNodes = new ArrayList<>(); //keep all virtualNodes    private final int machineNumbers;    private final int virtualNodesNumbersForOneMachine;    // key is index of physical machine, value is number of data in this physical machine    Map<Integer, Integer> physicalMachineCountMap = new HashMap<>();    public ConsistentHash(int machineNumbers, int virtualNodesNumbersForOneMachine) {        if(machineNumbers <= 0) throw new IllegalArgumentException("machineNumbers needs to be positive");        if(virtualNodesNumbersForOneMachine <= 0) throw new IllegalArgumentException("virtualNodesNumbersForOneMachine needs to be positive");        this.machineNumbers = machineNumbers;        this.virtualNodesNumbersForOneMachine = virtualNodesNumbersForOneMachine;        generateVirtualNodes();    }    public void assignValueToVirtualNode(int value) {        int index = Collections.binarySearch(virtualNodes, value);        if(index < 0) index = -index - 1; //if index < 0, get the correct insert position        if(index == virtualNodes.size()) index = 0; //reach the end of the hash cycle, use the first virtual node        int physicalMachineId = virtualPhysicalMapping.get(virtualNodes.get(index));        int count = physicalMachineCountMap.getOrDefault(physicalMachineId, 0);        physicalMachineCountMap.put(physicalMachineId, count + 1);    }    public Map<Integer, Integer> getPhysicalMachineCountMap() {        return physicalMachineCountMap;    }    private void generateVirtualNodes() {        for(int i=0; i<machineNumbers; i++) {            List<Integer> virtualNodesForOneMachine = generateVirtualNodesForOneMachine();            for(Integer virtual : virtualNodesForOneMachine) {                virtualPhysicalMapping.put(virtual, i);            }            virtualNodes.addAll(virtualNodesForOneMachine);        }        virtualNodes.sort(Comparator.comparingInt(x -> x));    }    private List<Integer> generateVirtualNodesForOneMachine() {        List<Integer> virtualNodesForOneMachine = new ArrayList<>();        for(int i=0; i<virtualNodesNumbersForOneMachine; i++) {            virtualNodesForOneMachine.add(random.nextInt());        }        return virtualNodesForOneMachine;    }}
最后,就是测试了,简单写了一个main方法用于测试10个机器,100万数据情况下的标准差:
    public static void main(String[] args) {        int numberOfTestData = 1000000;        int numberOfMachine = 10;        int numberOfVirtualNodes = 200;        ConsistentHash consistentHash = new ConsistentHash(numberOfMachine, numberOfVirtualNodes);        Random r = new Random();        for(int i=0; i<numberOfTestData; i++) {            int value = r.nextInt();            consistentHash.assignValueToVirtualNode(value);        }        Map<Integer, Integer> physicalMachineCountMap = consistentHash.getPhysicalMachineCountMap();        StandardDivision standardDivision = new StandardDivision();        double standDivision = standardDivision.calculate(physicalMachineCountMap.values(), numberOfTestData/numberOfMachine);        System.out.println("Standard Division is: " + standDivision);    }
经过10次左右的测试,输出的数值均为6900上下。
划线
评论
复制
发布于: 2020 年 07 月 08 日阅读数: 52
版权声明: 本文为 InfoQ 作者【sunnywhy】的原创文章。
原文链接:【http://xie.infoq.cn/article/735aefcd86f51776bec7c64f1】。文章转载请联系作者。

sunnywhy
关注
还未添加个人签名 2019.04.25 加入
还未添加个人简介











 
    
评论