
Architect Training Camp - Week 5 Exercise

joshuamai
Published on: November 25, 2020

Assignment:

  1. Implement the consistent hashing algorithm in a programming language you are familiar with.

  2. Write test cases for the algorithm: with 1,000,000 KV pairs and 10 server nodes, compute the standard deviation of the number of KV pairs stored on each server, in order to evaluate how unevenly the algorithm spreads the storage load (the standard-deviation formula is sketched right below).
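For reference, the imbalance metric used below is the population standard deviation of the per-node key counts (the symbols c_i and c̄ are my own notation, not part of the assignment):

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(c_i - \bar{c}\right)^2}, \qquad \bar{c} = \frac{1}{N}\sum_{i=1}^{N} c_i$$

where N = 10 is the number of server nodes and c_i is the number of KV pairs placed on node i; the smaller σ is, the more even the distribution.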



Solution:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class HashUtils {
    /**
     * MurmurHash is a non-cryptographic hash function with very high performance.
     * It is much faster than traditional hashes such as CRC32, MD5 and SHA-1
     * (the latter two are cryptographic hashes, whose inherent complexity
     * unavoidably costs performance), and it is reported to have a very low
     * collision rate.
     * http://murmurhash.googlepages.com/
     */
    public static Long hash(String key) {
        ByteBuffer buf = ByteBuffer.wrap(key.getBytes());
        int seed = 0x1234ABCD;
        ByteOrder byteOrder = buf.order();
        buf.order(ByteOrder.LITTLE_ENDIAN);
        long m = 0xc6a4a7935bd1e995L;
        int r = 47;
        long h = seed ^ (buf.remaining() * m);
        long k;
        while (buf.remaining() >= 8) {
            k = buf.getLong();
            k *= m;
            k ^= k >>> r;
            k *= m;
            h ^= k;
            h *= m;
        }
        if (buf.remaining() > 0) {
            // Pad the remaining (< 8) bytes to a full 8-byte block before mixing them in.
            ByteBuffer finish = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
            finish.put(buf).rewind();
            h ^= finish.getLong();
            h *= m;
        }
        h ^= h >>> r;
        h *= m;
        h ^= h >>> r;
        buf.order(byteOrder);
        return h;
    }
}
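A minimal, hypothetical usage check (not part of the homework code; the class name and keys are made up) just to show that the hash is deterministic and produces a 64-bit value usable as a ring position:

public class HashUtilsDemo {
    public static void main(String[] args) {
        long h1 = HashUtils.hash("192.168.1.1#0");
        long h2 = HashUtils.hash("192.168.1.1#0");
        long h3 = HashUtils.hash("192.168.1.2#0");
        System.out.println(h1 == h2);   // true: same key always hashes to the same value
        System.out.println(h1 == h3);   // almost certainly false: different keys
        System.out.println(h1);         // a 64-bit value used as a position on the hash ring
    }
}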



import java.util.HashMap;
import java.util.Map;

public class Node {
    private String ip;
    // KV pairs dispatched to this physical node.
    private Map<String, Object> kvPairs = new HashMap<>();

    public Node(String ip) {
        this.ip = ip;
    }

    public void put(String k, Object v) {
        kvPairs.put(k, v);
    }

    public int size() {
        return kvPairs.size();
    }

    public String getIp() {
        return ip;
    }

    @Override
    public String toString() {
        return this.ip;
    }
}



import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;
import java.util.stream.Collectors;

public class ConsistentHash {

    // The hash ring: virtual node hash -> physical node IP.
    private TreeMap<Long, String> virtualNodes = new TreeMap<>();

    public ConsistentHash(List<Node> nodeList, int virtualCount) {
        for (Node n : nodeList) {
            // Note: "<=" gives each physical node virtualCount + 1 points on the ring,
            // so even with virtualCount = 0 every node is mapped once and the ring is never empty.
            for (int i = 0; i <= virtualCount; i++) {
                long hash = HashUtils.hash(n.getIp() + "#" + i);
                virtualNodes.put(hash, n.getIp());
            }
        }
    }

    public String dispatch(String key) {
        long hash = HashUtils.hash(key);
        // A key belongs to the first virtual node clockwise from its hash;
        // if there is none, wrap around to the first node on the ring.
        Map.Entry<Long, String> entry = virtualNodes.ceilingEntry(hash);
        if (entry == null) {
            entry = virtualNodes.firstEntry();
        }
        return entry.getValue();
    }

    public void printAllNodesWithIndexes() {
        for (Iterator<Long> it = virtualNodes.keySet().iterator(); it.hasNext(); ) {
            Long key = it.next();
            System.out.println("hash(" + key + ") -> host " + virtualNodes.get(key));
        }
    }

    public static void main(String[] args) {
        // 10 physical nodes
        int nodeCount = 10;
        // 1,000,000 KV pairs
        int kvCount = 1000000;
        // Initial number of virtual nodes per physical node, increment per round, and maximum
        int virtualCount = 0;
        int increaseInterval = 100;
        int maxVirtualCount = 1000;
        // Number of runs per round (results are averaged)
        int batchCount = 1;
        for (; virtualCount <= maxVirtualCount; ) {
            double standardDeviation = 0.0;
            for (int b = 0; b < batchCount; b++) {
                //System.out.println("Batch " + (b + 1));
                List<Node> nodeList = new ArrayList<>();

                for (int i = 1; i <= nodeCount; i++) {
                    nodeList.add(new Node("192.168.1." + i));
                }
                Map<String, Node> nodeMap = nodeList.stream()
                        .collect(Collectors.toMap(Node::getIp, node -> node));

                ConsistentHash consistentHash = new ConsistentHash(nodeList, virtualCount);
                //consistentHash.printAllNodesWithIndexes();
                for (int i = 0; i < kvCount; i++) {
                    String key = UUID.randomUUID().toString();
                    String value = key;
                    String ip = consistentHash.dispatch(key);
                    nodeMap.get(ip).put(key, value);
                }

                // Standard deviation of the per-node KV counts around the ideal average.
                double average = (double) kvCount / nodeCount;
                double variance = 0;
                for (Node node : nodeList) {
                    variance += (node.size() - average) * (node.size() - average);
                }
                //System.out.println("standardDeviation: " + Math.sqrt(variance / nodeCount));
                standardDeviation += Math.sqrt(variance / nodeCount);
            }
            System.out.println("When the virtual node count is " + virtualCount
                    + ", the standard deviation is " + standardDeviation / batchCount);
            virtualCount += increaseInterval;
        }
    }
}
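As a small illustration of how the ring is used outside the benchmark above (a hypothetical snippet; the node IPs, keys, and the choice of 150 virtual nodes are made up for the demo):

import java.util.Arrays;
import java.util.List;

public class DispatchDemo {
    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("192.168.1.1"),
                new Node("192.168.1.2"),
                new Node("192.168.1.3"));
        // 150 virtual nodes per physical node (an arbitrary choice for the demo).
        ConsistentHash ring = new ConsistentHash(nodes, 150);

        // The same key always lands on the same node; different keys spread across nodes.
        System.out.println(ring.dispatch("user:42"));
        System.out.println(ring.dispatch("user:42"));   // same node as the previous line
        System.out.println(ring.dispatch("order:2020-11-25"));
    }
}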



Run output:

When the virtual node count is 0, the standard deviation is 63621.32278096707
When the virtual node count is 100, the standard deviation is 8755.287499562764
When the virtual node count is 200, the standard deviation is 6582.103599913936
When the virtual node count is 300, the standard deviation is 6441.640738197063
When the virtual node count is 400, the standard deviation is 5688.685911526492
When the virtual node count is 500, the standard deviation is 5000.89817932739
When the virtual node count is 600, the standard deviation is 4676.9576222155365
When the virtual node count is 700, the standard deviation is 3523.9862939574555
When the virtual node count is 800, the standard deviation is 3149.0620508335496
When the virtual node count is 900, the standard deviation is 2600.461651322703
When the virtual node count is 1000, the standard deviation is 2740.45886668638



The trend shows that the more virtual nodes there are, the smaller the standard deviation and the more even the distribution: with more virtual nodes, each physical node's share of the ring is made up of many small arcs scattered around the ring, so the random variation in how much of the ring each node owns averages out.
