架构师训练营第五周 - 作业

关注
发布于: 2020 年 07 月 08 日
作业：用你熟悉的编程语言实现一致性 hash 算法。
编写测试用例测试这个算法，测试 100 万 KV 数据，10 个服务器节点的情况下，计算这些 KV 数据在服务器上分布数量的标准差，以评估算法的存储负载不均衡性。
﻿
算法实现及插入测试数据如下：
实例化Cache对象，实例化时传入服务器列表与虚拟节点比例（如100，表示一个节点虚拟出100个）
根据虚拟化比例，对每个节点生成相应的虚拟节点，生成规则如下：在IP后加#和序号，如*#0
对所有节点（真实和虚拟）做sha256散列，生成一个字符串，求字符串hashcode值，这中间，若有相同hashcode值的，去除后来的，只保留第一个加入的。
节点及其虚拟节点对应的服务器是节点代表的服务器
 因为所有节点是近乎均匀的占据0~Integer.MAX_VALUE, 所以可以认为随机的key是均匀的落到服务器上。
随机生成100万个正的Integer，落入代表服务器的容器里
计算容器所含有key的标准差
类关系图如下：
﻿
Cache2类，包含虚拟出的放key的容器数组
package com.zd.cache;
import java.util.*;
import java.util.stream.Collectors;
/**
 * @author zhaodeng
 * @date 2020/07/08 20:00
 */
public class Cache2 {
    private static final int K_V_PAIRS = 1_000_000;
    private int shadeServerCount;
    private Collection<Integer>[] serverCollectors;
    private String[] servers;
    private NodeLocation[] nodeLocations;
    private NodeLocation[] sortedNodeLocations;
    public NodeLocation[] getSortedNodeLocations() {
        return sortedNodeLocations;
    }
    public Collection<Integer>[] getServerCollectors() {
        return serverCollectors;
    }
    public Collection<Integer> get(Integer key) {
        return serverCollectors[sortedNodeLocations[findIndex(key)].getServerIndex()];
    }
    public Integer getIndex(Integer key) {
        return sortedNodeLocations[findIndex(key)].getServerIndex();
    }
    public void set(Integer key) {
        int serverIndex = sortedNodeLocations[findIndex(key)].getServerIndex();
        serverCollectors[serverIndex].add(key);
    }
    private int findIndex(Integer key) {
        int beginIndex = 0;
        int endIndex = sortedNodeLocations.length - 1;
        if (key > sortedNodeLocations[endIndex].getLocation()) {
            return 0;
        }
        while (beginIndex < endIndex) {
            if (sortedNodeLocations[(beginIndex + endIndex) / 2].getLocation() > key) {
                endIndex = (beginIndex + endIndex) / 2;
            } else if (sortedNodeLocations[(beginIndex + endIndex) / 2].getLocation() < key) {
                beginIndex = (beginIndex + endIndex) / 2 + 1;
            } else {
                return (beginIndex + endIndex) / 2;
            }
        }
        return endIndex;
    }
    public Cache2(String[] server, int shadeServerCount) {
        this.shadeServerCount = shadeServerCount;
        this.servers = server;
        int average = K_V_PAIRS / servers.length;
        this.serverCollectors = new Collection[server.length];
        for (int i = 0; i < servers.length; i++) {
            serverCollectors[i] = new ArrayList<>(average);
        }
        hashNodes();
    }
    /**
     * 分散节点
     */
    private void hashNodes() {
        Set<Integer> hashCodeSet = new HashSet<>(servers.length * (1 + shadeServerCount));
        List<NodeLocation> nodeLocationList = new ArrayList<>(servers.length * (1 + shadeServerCount));
        for (int i = 0; i < this.servers.length; i++) {
            String ip = servers[i];
            Integer hashCode = MessageDigestUtil.sha256(ip);
            if (!hashCodeSet.contains(hashCode)) {
                nodeLocationList.add(NodeLocation.builder().location(hashCode).serverIndex(i).build());
                hashCodeSet.add(hashCode);
            }
            for (int j = 0; j < this.shadeServerCount; j++) {
                hashCode = MessageDigestUtil.sha256(ip+"#"+j);
                if (!hashCodeSet.contains(hashCode)) {
                    nodeLocationList.add(NodeLocation.builder().location(hashCode).serverIndex(i).build());
                    hashCodeSet.add(hashCode);
                }
            }
        }
        List<NodeLocation> sortedNodeLocationList = nodeLocationList.stream()
                .sorted(Comparator.comparingInt(NodeLocation::getLocation))
                .collect(Collectors.toList());
        sortedNodeLocations = sortedNodeLocationList.toArray(new NodeLocation[sortedNodeLocationList.size()]);
    }
    public static void main(String[] args) {
        for (int k = 0; k <= 250; k+=10) {
            System.out.print("shadow:" + k + " ");
            compute(k);
        }
    }
    private static void compute(int shadow) {
        int total = 1_000_000;
//        List<Integer> integerList = new ArrayList<>(total);
        Random random = new Random(System.currentTimeMillis());
        String[] server = new String[10];
        for (int i =0; i < 10; i++) {
            server[i] = "192.168.0." + (i + 1);
        }
        Cache2 cache = new Cache2(server, shadow);
        int count = 0;
        while (count++ < total) {
            int num = random.nextInt();
            if (num < 0) {
                count--;
                continue;
            }
            cache.set(num);
//            integerList.add(num);
        }
        long sum = 0;
        for (int i = 0; i < cache.getServerCollectors().length; i++) {
            long gap = cache.getServerCollectors()[i].size() - 100000;
            sum += gap * gap;
        }
        System.out.println("标准差：" + (int)Math.sqrt(sum / 10));
    }
}
NodeLocation类，节点信息，节点在整数环中所占的位置，以及节点所代表的服务器编号
package com.zd.cache;
import lombok.Builder;
import lombok.Data;
/**
 * @author zhaodeng
 * @date 2020/07/06 21:47
 */
@Data
@Builder
public class NodeLocation {
    private int location;
    private int serverIndex;
}
MessageDigestUtil类，给服务器IP信息做hash散列用，对相似的IP可以生成大不相同的字符串，进而得到hashcode值，试了MD5，sha256，sha512，sha1，sha256效果比较好
package com.zd.cache;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
/**
 * @author zhaodeng
 * @date 2020/07/08 20:01
 */
public class MessageDigestUtil {
    public static int sha256(String plainText) {
        byte[] secretBytes;
        try {
            //获取明文字节数组
            secretBytes = MessageDigest.getInstance("SHA-256").digest(plainText.getBytes());
//            secretBytes = MessageDigest.getInstance("MD5").digest(plainText.getBytes());
        }
        catch(NoSuchAlgorithmException e) {
            throw new RuntimeException("No Such Algorithm.");
        }
        int code = new String(secretBytes).hashCode();
        return code > 0 ? code : Math.abs(code+1);
    }
    public static void main(String[] args) {
        String password = "190.168.0.1#0";
        System.out.println(sha256("190.168.0.1#0"));
        System.out.println(sha256("190.168.0.1#1"));
        System.out.println(sha256("190.168.0.1#2"));
        System.out.println(sha256("190.168.0.2#0"));
        System.out.println(sha256("190.168.0.2#1"));
        System.out.println(sha256("190.168.0.2#2"));
    }
}
﻿
最后使用0-250个虚拟节点，跳度为5，得到标准差如下：
shadow:0 标准差：44468
shadow:10 标准差：22388
shadow:20 标准差：18289
shadow:30 标准差：18896
shadow:40 标准差：21324
shadow:50 标准差：12157
shadow:60 标准差：11793
shadow:70 标准差：7396
shadow:80 标准差：8132
shadow:90 标准差：7121
shadow:100 标准差：3324
shadow:110 标准差：4223
shadow:120 标准差：5281
shadow:130 标准差：6633
shadow:140 标准差：6902
shadow:150 标准差：5949
shadow:160 标准差：5307
shadow:170 标准差：4928
shadow:180 标准差：5074
shadow:190 标准差：5171
shadow:200 标准差：5018
shadow:210 标准差：4236
shadow:220 标准差：4485
shadow:230 标准差：4471
shadow:240 标准差：4090
shadow:250 标准差：3939
﻿
从标准差趋势可以看出在虚拟节点比例在100左右时，标准差比较低，表示分布式缓存分布比较均匀。
使用带虚拟节点的一致性hash算法的好处是：新增和删减服务器时，对当前的服务器缓存数据影响较小。
新增时，只会影响新增节点顺时针方向后的节点，且因为节点为均匀分布，且插入的实节点及其虚拟节点数量比较多，则新增节点会比较均匀接收已有节点的缓存key。对已有节点影响程度基本相同，可以缓存的key和已有服务器的数量上也接近。
服务器下线后，服务器缓存的key会在hash环上落入其后节点上，及落入其它服务器上，且数量比较均匀。
综上，可看出，使用虚拟节点一致性hash后，服务器变动，对当前服务器缓存数据影响从理论上来说，接近理论影响最小值。
发布于: 2020 年 07 月 08 日阅读数: 47
草原上的奔跑

关注
喜欢简洁干净的代码 2018.05.04 加入
使用技术，实现业务。思考业务，创新技术。
发布
暂无评论
创作场景

架构师训练营第五周 - 作业

作业：

草原上的奔跑

评论