
Nacos Source Code — An Analysis of Nacos Cluster High Availability

  • 2025-05-06

1. Several Questions About Nacos Clusters


Question 1: In standalone mode, the Nacos server starts a scheduled heartbeat health-check task. In cluster mode, is it necessary for every cluster node to run this task?

 

Question 2: When the Nacos server's scheduled heartbeat health-check task detects a change in a service instance's health status, how is that status synced to the other Nacos cluster nodes?

 

Question 3: When a new service instance sends a registration request, only one Nacos cluster node handles it. After the request is processed, how should the cluster nodes sync the instance data among themselves?

 

Question 4: Suppose a Nacos cluster has three nodes and a new node is added. How should the new node sync the existing service instance data from the cluster?

 

Question 5: Do Nacos cluster nodes have a heartbeat mechanism among themselves to detect whether peer nodes are alive?

 

2. A Single Node Runs Heartbeat Health Checks and Syncs the Results


(1) Architecture for Cluster Heartbeat Health Checks


Suppose a Nacos cluster has three nodes. We already know that in standalone mode the Nacos server starts a scheduled heartbeat health-check task. With three cluster nodes, should every node run this task?

 

Option 1: all three nodes run the heartbeat health-check task. But if each node's result differs, which one wins?

 

Option 2: only one node runs the heartbeat health-check task and then syncs the result to the other nodes.

 

Option 2 is clearly simpler and cleaner, and it is the one Nacos chose. In cluster mode, all three nodes start the scheduled heartbeat health-check task, but only one node actually executes the check logic for a given service. After the check completes, a scheduled task syncs the result to the other nodes.
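The core idea of option 2 can be sketched as follows: every node hashes the service name in the same way, so they all agree on which single node is responsible, without any coordination. This is a minimal, hypothetical sketch (class and method names are invented for illustration), not the actual Nacos implementation, which is shown later in this section.

```java
import java.util.List;

// Hypothetical sketch of option 2: every node computes the same hash over the
// service name, and only the node whose index matches runs the health check.
public class ResponsibleNodeSketch {
    // Deterministically pick the index of the node responsible for a service.
    public static int responsibleIndex(String serviceName, int clusterSize) {
        int hash = Math.abs(serviceName.hashCode() % Integer.MAX_VALUE);
        return hash % clusterSize;
    }

    // A node runs the check only if its own index is the responsible one.
    public static boolean isResponsible(String serviceName, List<String> servers, String self) {
        return servers.indexOf(self) == responsibleIndex(serviceName, servers.size());
    }
}
```

Because every node evaluates the same pure function over the same (ordered) node list, exactly one node elects itself for each service, with no leader election or locking required.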

 

(2) Source Code: Selecting One Node to Health-Check a Service


The heartbeat health-check task is the ClientBeatCheckTask. The Nacos server starts this task while handling a service instance registration request, as shown below:



ClientBeatCheckTask is a Runnable. At the start of its run() method there are two if checks. The first checks whether, in cluster mode, the current node is responsible for health-checking this Service. The second checks whether health checking is enabled (it is by default). Note: ClientBeatProcessor handles the heartbeats of service instances; both service instances and services need heartbeat health checks.

 

In cluster mode, ensuring that only one node health-checks a given Service is the job of DistroMapper's responsible() method in the first if check. From responsible() we can see that exactly one cluster node will run the health check for a given Service; the other cluster nodes simply return without checking it.


// Check and update statuses of ephemeral instances, remove them if they have expired.
public class ClientBeatCheckTask implements Runnable {
    private Service service;  // each ClientBeatCheckTask corresponds to one Service
    ...
    @JsonIgnore
    public DistroMapper getDistroMapper() {
        return ApplicationUtils.getBean(DistroMapper.class);
    }

    @Override
    public void run() {
        try {
            // First if check: DistroMapper.responsible() decides whether this node
            // is responsible for health-checking this Service in cluster mode
            if (!getDistroMapper().responsible(service.getName())) {
                return;
            }
            // Second if check: whether health checking is enabled (it is by default)
            if (!getSwitchDomain().isHealthCheckEnabled()) {
                return;
            }
            List<Instance> instances = service.allIPs(true);

            // first set health status of instances:
            for (Instance instance : instances) {
                if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
                    if (!instance.isMarked()) {
                        if (instance.isHealthy()) {
                            instance.setHealthy(false);
                            getPushService().serviceChanged(service);
                            ApplicationUtils.publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
                        }
                    }
                }
            }

            if (!getGlobalConfig().isExpireInstance()) {
                return;
            }

            // then remove obsolete instances:
            for (Instance instance : instances) {
                if (instance.isMarked()) {
                    continue;
                }
                if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
                    // delete instance
                    deleteIp(instance);
                }
            }
        } catch (Exception e) {
            Loggers.SRV_LOG.warn("Exception while processing client beat time out.", e);
        }
    }
    ...
}

// Distro mapper, judges which server responds to the input service.
@Component("distroMapper")
public class DistroMapper extends MemberChangeListener {
    // List of service nodes; the order of healthyList must be the same on all nodes.
    private volatile List<String> healthyList = new ArrayList<>();

    // init server list
    @PostConstruct
    public void init() {
        NotifyCenter.registerSubscriber(this);  // register this subscriber
        this.healthyList = MemberUtil.simpleMembers(memberManager.allMembers());
    }
    ...
    // Judge whether the current server is responsible for the input service.
    public boolean responsible(String serviceName) {
        // snapshot of the cluster node list (three nodes in our example)
        final List<String> servers = healthyList;
        // in standalone mode, return true directly
        if (!switchDomain.isDistroEnabled() || EnvUtil.getStandaloneMode()) {
            return true;
        }
        // if there are no healthy cluster nodes, return false
        if (CollectionUtils.isEmpty(servers)) {
            // means distro config is not ready yet
            return false;
        }
        int index = servers.indexOf(EnvUtil.getLocalAddress());
        int lastIndex = servers.lastIndexOf(EnvUtil.getLocalAddress());
        if (lastIndex < 0 || index < 0) {
            return true;
        }
        // hash the serviceName, then take it modulo servers.size() to get
        // the index of the node responsible for the health-check task
        int target = distroHash(serviceName) % servers.size();
        return target >= index && target <= lastIndex;
    }

    private int distroHash(String serviceName) {
        return Math.abs(serviceName.hashCode() % Integer.MAX_VALUE);
    }
    ...
}


(3) Source Code: Syncing Service Health Status Across the Cluster


Since only one node in the cluster runs the heartbeat health check for a given Service, how is the check result synced to the other cluster nodes?

 

I. How the Cluster Syncs Service Health Status


Every node runs a scheduled task that syncs its heartbeat health-check results to the other nodes. This async task calls an HTTP endpoint on each peer cluster node to perform the data sync.
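The scheduling pattern used here is a one-shot task that re-schedules itself after each run (rather than a fixed-rate schedule). The following is a simplified, hypothetical sketch of that pattern with invented names; the demo stops itself after three runs, whereas the real reporter re-schedules indefinitely.

```java
import java.util.concurrent.*;

// Hypothetical sketch: a one-shot task that re-schedules itself after each run,
// mirroring how the status reporter keeps syncing results to peer nodes.
public class SelfReschedulingReporter implements Runnable {
    static final ScheduledExecutorService EXECUTOR = Executors.newSingleThreadScheduledExecutor();
    static final CountDownLatch RUNS = new CountDownLatch(3);  // stop after 3 runs, for the demo only

    @Override
    public void run() {
        try {
            // here the real task would POST the health-check results to peer nodes over HTTP
            RUNS.countDown();
        } finally {
            if (RUNS.getCount() > 0) {
                // re-schedule in finally, so a slow or failed sync never overlaps the next one
                EXECUTOR.schedule(this, 10, TimeUnit.MILLISECONDS);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        EXECUTOR.schedule(new SelfReschedulingReporter(), 10, TimeUnit.MILLISECONDS);
        RUNS.await();  // each of the 3 runs was scheduled by the previous one
        EXECUTOR.shutdown();
    }
}
```

Re-scheduling from a `finally` block guarantees that runs never overlap even if one sync round is slow, which is exactly why this style is often preferred over `scheduleAtFixedRate` for network I/O.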

 

II. Source Code for Syncing Service Health Status


The ServiceManager class has an init() method annotated with @PostConstruct, so it runs when the ServiceManager bean is created. This method starts the scheduled task that syncs heartbeat health-check results.

 

Two async tasks are involved in syncing instance health status: the first sends the sync requests carrying the health-check results, and the second processes incoming sync requests. The processing design is: an in-memory queue for peak-shaving plus async tasks for throughput.


// Core manager storing all services in Nacos.
@Component
public class ServiceManager implements RecordListener<Service> {
    ...
    // Init service manager.
    @PostConstruct
    public void init() {
        // async task that sends the "sync heartbeat health-check results" requests
        GlobalExecutor.scheduleServiceReporter(new ServiceReporter(), 60000, TimeUnit.MILLISECONDS);
        // async task that processes incoming sync requests: memory queue for peak-shaving + async tasks for speed
        GlobalExecutor.submitServiceUpdateManager(new UpdatedServiceProcessor());

        if (emptyServiceAutoClean) {
            Loggers.SRV_LOG.info("open empty service auto clean job, initialDelay : {} ms, period : {} ms", cleanEmptyServiceDelay, cleanEmptyServicePeriod);
            // delay 60s, period 20s;
            // This task is not recommended to be performed frequently in order to avoid
            // the possibility that the service cache information may just be deleted
            // and then created due to the heartbeat mechanism
            GlobalExecutor.scheduleServiceAutoClean(new EmptyServiceAutoClean(), cleanEmptyServiceDelay, cleanEmptyServicePeriod);
        }
        try {
            Loggers.SRV_LOG.info("listen for service meta change");
            consistencyService.listen(KeyBuilder.SERVICE_META_KEY_PREFIX, this);
        } catch (NacosException e) {
            Loggers.SRV_LOG.error("listen for service meta change failed!");
        }
    }
    ...
}

public class GlobalExecutor {
    private static final ScheduledExecutorService SERVICE_SYNCHRONIZATION_EXECUTOR =
        ExecutorFactory.Managed.newSingleScheduledExecutorService(
            ClassUtils.getCanonicalName(NamingApp.class),
            new NameThreadFactory("com.alibaba.nacos.naming.service.worker"));

    public static final ScheduledExecutorService SERVICE_UPDATE_MANAGER_EXECUTOR =
        ExecutorFactory.Managed.newSingleScheduledExecutorService(
            ClassUtils.getCanonicalName(NamingApp.class),
            new NameThreadFactory("com.alibaba.nacos.naming.service.update.processor"));
    ...
    public static void scheduleServiceReporter(Runnable command, long delay, TimeUnit unit) {
        // run the task once after the given delay
        SERVICE_SYNCHRONIZATION_EXECUTOR.schedule(command, delay, unit);
    }

    public static void submitServiceUpdateManager(Runnable runnable) {
        // submit the task to the thread pool for execution
        SERVICE_UPDATE_MANAGER_EXECUTOR.submit(runnable);
    }
    ...
}

public final class ExecutorFactory {
    ...
    public static final class Managed {
        private static final String DEFAULT_NAMESPACE = "nacos";
        private static final ThreadPoolManager THREAD_POOL_MANAGER = ThreadPoolManager.getInstance();
        ...
        // Create a new single scheduled executor service with input thread factory and register to manager.
        public static ScheduledExecutorService newSingleScheduledExecutorService(final String group, final ThreadFactory threadFactory) {
            ScheduledExecutorService executorService = Executors.newScheduledThreadPool(1, threadFactory);
            THREAD_POOL_MANAGER.register(DEFAULT_NAMESPACE, group, executorService);
            return executorService;
        }
        ...
    }
}

// Thread pool manager
public final class ThreadPoolManager {
    private Map<String, Map<String, Set<ExecutorService>>> resourcesManager;
    private Map<String, Object> lockers = new ConcurrentHashMap<String, Object>(8);
    private static final ThreadPoolManager INSTANCE = new ThreadPoolManager();
    private static final AtomicBoolean CLOSED = new AtomicBoolean(false);

    static {
        INSTANCE.init();
        // add a JVM shutdown hook to release thread resources
        ThreadUtils.addShutdownHook(new Thread(new Runnable() {
            @Override
            public void run() {
                LOGGER.warn("[ThreadPoolManager] Start destroying ThreadPool");
                // shut down the thread pool manager
                shutdown();
                LOGGER.warn("[ThreadPoolManager] Destruction of the end");
            }
        }));
    }

    public static ThreadPoolManager getInstance() {
        return INSTANCE;
    }

    private ThreadPoolManager() {
    }

    private void init() {
        resourcesManager = new ConcurrentHashMap<String, Map<String, Set<ExecutorService>>>(8);
    }

    // Register the thread pool resources with the resource manager.
    public void register(String namespace, String group, ExecutorService executor) {
        if (!resourcesManager.containsKey(namespace)) {
            synchronized (this) {
                lockers.put(namespace, new Object());
            }
        }
        final Object monitor = lockers.get(namespace);
        synchronized (monitor) {
            Map<String, Set<ExecutorService>> map = resourcesManager.get(namespace);
            if (map == null) {
                map = new HashMap<String, Set<ExecutorService>>(8);
                map.put(group, new HashSet<ExecutorService>());
                map.get(group).add(executor);
                resourcesManager.put(namespace, map);
                return;
            }
            if (!map.containsKey(group)) {
                map.put(group, new HashSet<ExecutorService>());
            }
            map.get(group).add(executor);
        }
    }

    // Shutdown thread pool manager.
    public static void shutdown() {
        if (!CLOSED.compareAndSet(false, true)) {
            return;
        }
        Set<String> namespaces = INSTANCE.resourcesManager.keySet();
        for (String namespace : namespaces) {
            // destroy all thread pool resources
            INSTANCE.destroy(namespace);
        }
    }

    // Destroys all thread pool resources under this namespace.
    public void destroy(final String namespace) {
        final Object monitor = lockers.get(namespace);
        if (monitor == null) {
            return;
        }
        synchronized (monitor) {
            Map<String, Set<ExecutorService>> subResource = resourcesManager.get(namespace);
            if (subResource == null) {
                return;
            }
            for (Map.Entry<String, Set<ExecutorService>> entry : subResource.entrySet()) {
                for (ExecutorService executor : entry.getValue()) {
                    // shut down the thread pool
                    ThreadUtils.shutdownThreadPool(executor);
                }
            }
            resourcesManager.get(namespace).clear();
            resourcesManager.remove(namespace);
        }
    }
    ...
}

public final class ThreadUtils {
    ...
    public static void addShutdownHook(Runnable runnable) {
        Runtime.getRuntime().addShutdownHook(new Thread(runnable));
    }

    public static void shutdownThreadPool(ExecutorService executor) {
        shutdownThreadPool(executor, null);
    }

    // Shutdown thread pool.
    public static void shutdownThreadPool(ExecutorService executor, Logger logger) {
        executor.shutdown();
        int retry = 3;
        while (retry > 0) {
            retry--;
            try {
                if (executor.awaitTermination(1, TimeUnit.SECONDS)) {
                    return;
                }
            } catch (InterruptedException e) {
                executor.shutdownNow();
                Thread.interrupted();
            } catch (Throwable ex) {
                if (logger != null) {
                    logger.error("ThreadPoolManager shutdown executor has error : {}", ex);
                }
            }
        }
        executor.shutdownNow();
    }
    ...
}


III. The First Async Task: ServiceReporter


It first fetches all service names from the in-memory registry. ServiceManager's getAllServiceNames() method returns a Map whose key is a namespace ID and whose value is the set of service names in that namespace. It then iterates over allServiceNames with two for loops. When the task finishes, it submits another delayed run of itself to continue the health-status sync.

 

The first for loop iterates over all service names under a namespace ID and builds the request payload.

It first applies the same hash algorithm to decide whether this node is responsible for syncing the health result of the Service being visited. If so, the data is added to a ServiceChecksum object, serialized to JSON via JacksonUtils, and placed into a Message request object.

 

The second for loop iterates over the cluster nodes and sends the sync request to each peer.

It first skips a node if it is the local node; otherwise it calls ServiceStatusSynchronizer's send() method, which issues a request to the peer's endpoint to sync the heartbeat health-check result. The core of the cluster sync lives in this send() method.

 

From the code of ServiceStatusSynchronizer's send() method we can see that the sync ultimately happens over HTTP, against the path "v1/ns/service/status". That path is handled by ServiceController's serviceStatus() method.

 

In ServiceController's serviceStatus() method, if comparing the incoming ServiceChecksum against the registry's shows that a service's status has changed, ServiceManager's addUpdatedServiceToQueue() method is called.
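The checksum trick is worth pausing on: instead of shipping full instance lists, nodes exchange a short digest of each service's instance data and only fetch the details when the digests disagree. Here is a minimal, hypothetical sketch of that idea using an MD5 digest (Nacos computes its checksum differently; the names and format below are invented for illustration).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the checksum idea: hash a service's instance list so two
// nodes can compare a short digest instead of the full data.
public class ChecksumSketch {
    public static String checksum(String instanceListSnapshot) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(instanceListSnapshot.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));  // hex-encode each byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // MD5 is always available in the JDK
        }
    }

    // A node queues an update only when the remote checksum differs from its own.
    public static boolean needsUpdate(String remoteChecksum, String localSnapshot) {
        return !remoteChecksum.equals(checksum(localSnapshot));
    }
}
```

Comparing digests keeps the periodic sync traffic small and proportional to the number of services, not the number of instances.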

 

addUpdatedServiceToQueue() wraps the incoming parameters into a ServiceKey object and offers it into the toBeUpdatedServicesQueue blocking queue.

 

Since ServiceKey objects end up in a blocking queue, there must be an async task that takes them out and processes them. The logic mirrors service instance registration, where Pair objects are placed into a blocking queue; this async task is the second one started in ServiceManager's init() method.


// Core manager storing all services in Nacos.
@Component
public class ServiceManager implements RecordListener<Service> {
    // Map(namespace, Map(group::serviceName, Service)).
    private final Map<String, Map<String, Service>> serviceMap = new ConcurrentHashMap<>();
    private final DistroMapper distroMapper;
    private final Synchronizer synchronizer = new ServiceStatusSynchronizer();
    ...
    public Map<String, Set<String>> getAllServiceNames() {
        Map<String, Set<String>> namesMap = new HashMap<>(16);
        for (String namespaceId : serviceMap.keySet()) {
            namesMap.put(namespaceId, serviceMap.get(namespaceId).keySet());
        }
        return namesMap;
    }

    private class ServiceReporter implements Runnable {
        @Override
        public void run() {
            try {
                // get all service names in the in-memory registry, grouped by namespace
                Map<String, Set<String>> allServiceNames = getAllServiceNames();
                if (allServiceNames.size() <= 0) {
                    // ignore
                    return;
                }
                // iterate over each namespace, build the request payload,
                // then send requests to sync the heartbeat health-check results
                for (String namespaceId : allServiceNames.keySet()) {
                    ServiceChecksum checksum = new ServiceChecksum(namespaceId);
                    // first loop: build the request payload
                    for (String serviceName : allServiceNames.get(namespaceId)) {
                        // using the same algorithm, only sync results for the Services this node is responsible for
                        if (!distroMapper.responsible(serviceName)) {
                            continue;
                        }
                        Service service = getService(namespaceId, serviceName);
                        if (service == null || service.isEmpty()) {
                            continue;
                        }
                        service.recalculateChecksum();
                        // add to the request payload
                        checksum.addItem(serviceName, service.getChecksum());
                    }
                    // build the Message request object for the sync
                    Message msg = new Message();
                    // serialize the payload to JSON
                    msg.setData(JacksonUtils.toJson(checksum));
                    Collection<Member> sameSiteServers = memberManager.allMembers();
                    if (sameSiteServers == null || sameSiteServers.size() <= 0) {
                        return;
                    }
                    // second loop: iterate over all cluster nodes and send the sync request to peers
                    for (Member server : sameSiteServers) {
                        // skip the local node
                        if (server.getAddress().equals(NetUtils.localServer())) {
                            continue;
                        }
                        // sync to the peer node
                        synchronizer.send(server.getAddress(), msg);
                    }
                }
            } catch (Exception e) {
                Loggers.SRV_LOG.error("[DOMAIN-STATUS] Exception while sending service status", e);
            } finally {
                // submit another delayed run of this task
                GlobalExecutor.scheduleServiceReporter(this, switchDomain.getServiceStatusSynchronizationPeriodMillis(), TimeUnit.MILLISECONDS);
            }
        }
    }
    ...
}

public class ServiceStatusSynchronizer implements Synchronizer {
    @Override
    public void send(final String serverIP, Message msg) {
        if (serverIP == null) {
            return;
        }
        // build the request parameters
        Map<String, String> params = new HashMap<String, String>(10);
        params.put("statuses", msg.getData());
        params.put("clientIP", NetUtils.localServer());
        // build the url
        String url = "http://" + serverIP + ":" + EnvUtil.getPort() + EnvUtil.getContextPath() + UtilsAndCommons.NACOS_NAMING_CONTEXT + "/service/status";
        if (IPUtil.containsPort(serverIP)) {
            url = "http://" + serverIP + EnvUtil.getContextPath() + UtilsAndCommons.NACOS_NAMING_CONTEXT + "/service/status";
        }
        try {
            // asynchronously send an HTTP request to http://ip/v1/ns/service/status to sync the health-check results
            HttpClient.asyncHttpPostLarge(url, null, JacksonUtils.toJson(params), new Callback<String>() {
                @Override
                public void onReceive(RestResult<String> result) {
                    if (!result.ok()) {
                        Loggers.SRV_LOG.warn("[STATUS-SYNCHRONIZE] failed to request serviceStatus, remote server: {}", serverIP);
                    }
                }

                @Override
                public void onError(Throwable throwable) {
                    Loggers.SRV_LOG.warn("[STATUS-SYNCHRONIZE] failed to request serviceStatus, remote server: " + serverIP, throwable);
                }

                @Override
                public void onCancel() {
                }
            });
        } catch (Exception e) {
            Loggers.SRV_LOG.warn("[STATUS-SYNCHRONIZE] failed to request serviceStatus, remote server: " + serverIP, e);
        }
    }
    ...
}

// Service operation controller.
@RestController
@RequestMapping(UtilsAndCommons.NACOS_NAMING_CONTEXT + "/service")
public class ServiceController {
    @Autowired
    protected ServiceManager serviceManager;
    ...
    // Check whether service status is the latest.
    @PostMapping("/status")
    public String serviceStatus(HttpServletRequest request) throws Exception {
        String entity = IoUtils.toString(request.getInputStream(), "UTF-8");
        String value = URLDecoder.decode(entity, "UTF-8");
        JsonNode json = JacksonUtils.toObj(value);
        String statuses = json.get("statuses").asText();
        String serverIp = json.get("clientIP").asText();
        if (!memberManager.hasMember(serverIp)) {
            throw new NacosException(NacosException.INVALID_PARAM, "ip: " + serverIp + " is not in serverlist");
        }
        try {
            ServiceManager.ServiceChecksum checksums = JacksonUtils.toObj(statuses, ServiceManager.ServiceChecksum.class);
            if (checksums == null) {
                Loggers.SRV_LOG.warn("[DOMAIN-STATUS] receive malformed data: null");
                return "fail";
            }
            for (Map.Entry<String, String> entry : checksums.serviceName2Checksum.entrySet()) {
                if (entry == null || StringUtils.isEmpty(entry.getKey()) || StringUtils.isEmpty(entry.getValue())) {
                    continue;
                }
                String serviceName = entry.getKey();
                String checksum = entry.getValue();
                Service service = serviceManager.getService(checksums.namespaceId, serviceName);
                if (service == null) {
                    continue;
                }
                service.recalculateChecksum();
                // compare the incoming checksum with the registry's; the service status has changed if they differ
                if (!checksum.equals(service.getChecksum())) {
                    if (Loggers.SRV_LOG.isDebugEnabled()) {
                        Loggers.SRV_LOG.debug("checksum of {} is not consistent, remote: {}, checksum: {}, local: {}", serviceName, serverIp, checksum, service.getChecksum());
                    }
                    // add it to the blocking queue
                    serviceManager.addUpdatedServiceToQueue(checksums.namespaceId, serviceName, serverIp, checksum);
                }
            }
        } catch (Exception e) {
            Loggers.SRV_LOG.warn("[DOMAIN-STATUS] receive malformed data: " + statuses, e);
        }
        return "ok";
    }
    ...
}

// Core manager storing all services in Nacos.
@Component
public class ServiceManager implements RecordListener<Service> {
    private final Lock lock = new ReentrantLock();
    // blocking queue
    private final LinkedBlockingDeque<ServiceKey> toBeUpdatedServicesQueue = new LinkedBlockingDeque<>(1024 * 1024);
    ...
    // Add a service into the queue to be updated.
    public void addUpdatedServiceToQueue(String namespaceId, String serviceName, String serverIP, String checksum) {
        lock.lock();
        try {
            // wrap into a ServiceKey object and offer it into the toBeUpdatedServicesQueue blocking queue
            toBeUpdatedServicesQueue.offer(new ServiceKey(namespaceId, serviceName, serverIP, checksum), 5, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            toBeUpdatedServicesQueue.poll();
            toBeUpdatedServicesQueue.add(new ServiceKey(namespaceId, serviceName, serverIP, checksum));
            Loggers.SRV_LOG.error("[DOMAIN-STATUS] Failed to add service to be updated to queue.", e);
        } finally {
            lock.unlock();
        }
    }
    ...
}


IV. The Second Async Task: UpdatedServiceProcessor


UpdatedServiceProcessor's run() method contains an infinite while loop that keeps taking tasks from the toBeUpdatedServicesQueue blocking queue. Each ServiceKey taken is wrapped into a ServiceUpdater object, which is then submitted as a task to a thread pool.

 

This health-check-result sync logic is similar to the handling of service instance registration: both use the "blocking queue + async task" design. Putting items into the blocking queue shaves traffic peaks; taking them out and submitting them to a thread pool speeds up processing.
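The "blocking queue for peak-shaving + thread pool for speed" pattern can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration (names invented; the demo processes three items and stops), not the actual UpdatedServiceProcessor.

```java
import java.util.concurrent.*;

// Hypothetical sketch of the pattern: a dispatcher thread drains a bounded
// blocking queue and hands each item to a worker pool.
public class QueueAndPoolSketch {
    static final BlockingQueue<String> QUEUE = new LinkedBlockingDeque<>(1024);
    static final ExecutorService WORKERS = Executors.newFixedThreadPool(4);
    static final CountDownLatch DONE = new CountDownLatch(3);  // demo stops after 3 items

    public static void main(String[] args) throws InterruptedException {
        // dispatcher: the equivalent of UpdatedServiceProcessor's while(true) take() loop
        Thread dispatcher = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String serviceKey = QUEUE.take();  // blocks until work arrives
                    WORKERS.submit(() -> {
                        // the real code would update the service's health status here
                        DONE.countDown();
                    });
                }
            } catch (InterruptedException ignored) {
                // normal shutdown path for the demo
            }
        });
        dispatcher.start();

        // producers: bursts of requests are absorbed by the queue instead of hitting workers directly
        QUEUE.offer("ns1##service-a");
        QUEUE.offer("ns1##service-b");
        QUEUE.offer("ns2##service-c");

        DONE.await();  // all three items were processed by the pool
        dispatcher.interrupt();
        WORKERS.shutdown();
    }
}
```

The bounded queue protects the node under bursts (producers fail fast or wait rather than exhausting memory), while the fixed pool bounds how much concurrent work the node takes on.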

 

When the thread pool executes the health-status sync task, i.e. ServiceUpdater's run() method, it calls ServiceManager's updatedHealthStatus() method to update the service's health status.

 

In ServiceManager's updatedHealthStatus() method, the parameters are parsed first; then all Instance objects are fetched from the registry and iterated. If an instance's health status has changed, its healthy property is updated directly, and for any instance whose healthy flag changed, a service-changed event is published to notify clients to refresh.


// Core manager storing all services in Nacos.
@Component
public class ServiceManager implements RecordListener<Service> {
    // blocking queue
    private final LinkedBlockingDeque<ServiceKey> toBeUpdatedServicesQueue = new LinkedBlockingDeque<>(1024 * 1024);
    ...
    private class UpdatedServiceProcessor implements Runnable {
        // get changed services from other servers asynchronously
        @Override
        public void run() {
            ServiceKey serviceKey = null;
            try {
                // infinite loop
                while (true) {
                    try {
                        // take a task from the blocking queue
                        serviceKey = toBeUpdatedServicesQueue.take();
                    } catch (Exception e) {
                        Loggers.EVT_LOG.error("[UPDATE-DOMAIN] Exception while taking item from LinkedBlockingDeque.");
                    }
                    if (serviceKey == null) {
                        continue;
                    }
                    GlobalExecutor.submitServiceUpdate(new ServiceUpdater(serviceKey));
                }
            } catch (Exception e) {
                Loggers.EVT_LOG.error("[UPDATE-DOMAIN] Exception while update service: {}", serviceKey, e);
            }
        }
    }

    private class ServiceUpdater implements Runnable {
        String namespaceId;
        String serviceName;
        String serverIP;

        public ServiceUpdater(ServiceKey serviceKey) {
            this.namespaceId = serviceKey.getNamespaceId();
            this.serviceName = serviceKey.getServiceName();
            this.serverIP = serviceKey.getServerIP();
        }

        @Override
        public void run() {
            try {
                // update the health status of the service's instances
                updatedHealthStatus(namespaceId, serviceName, serverIP);
            } catch (Exception e) {
                Loggers.SRV_LOG.warn("[DOMAIN-UPDATER] Exception while update service: {} from {}, error: {}", serviceName, serverIP, e);
            }
        }
    }

    // Update the health status of instances in the service.
    public void updatedHealthStatus(String namespaceId, String serviceName, String serverIP) {
        Message msg = synchronizer.get(serverIP, UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName));
        // parse the parameters
        JsonNode serviceJson = JacksonUtils.toObj(msg.getData());
        ArrayNode ipList = (ArrayNode) serviceJson.get("ips");
        Map<String, String> ipsMap = new HashMap<>(ipList.size());
        for (int i = 0; i < ipList.size(); i++) {
            String ip = ipList.get(i).asText();
            String[] strings = ip.split("_");
            ipsMap.put(strings[0], strings[1]);
        }
        Service service = getService(namespaceId, serviceName);
        if (service == null) {
            return;
        }
        // changed flag
        boolean changed = false;
        // fetch all instances and iterate over them
        List<Instance> instances = service.allIPs();
        for (Instance instance : instances) {
            // sync the health-check result
            boolean valid = Boolean.parseBoolean(ipsMap.get(instance.toIpAddr()));
            if (valid != instance.isHealthy()) {
                changed = true;
                // update the instance's health status
                instance.setHealthy(valid);
                Loggers.EVT_LOG.info("{} {SYNC} IP-{} : {}:{}@{}", serviceName, (instance.isHealthy() ? "ENABLED" : "DISABLED"), instance.getIp(), instance.getPort(), instance.getClusterName());
            }
        }
        // if any instance's health status changed, publish a "service changed" event to notify clients via UDP
        if (changed) {
            pushService.serviceChanged(service);
            if (Loggers.EVT_LOG.isDebugEnabled()) {
                StringBuilder stringBuilder = new StringBuilder();
                List<Instance> allIps = service.allIPs();
                for (Instance instance : allIps) {
                    stringBuilder.append(instance.toIpAddr()).append("_").append(instance.isHealthy()).append(",");
                }
                Loggers.EVT_LOG.debug("[HEALTH-STATUS-UPDATED] namespace: {}, service: {}, ips: {}", service.getNamespaceId(), service.getName(), stringBuilder.toString());
            }
        }
    }
    ...
}


(4) Summary


Question 1: In standalone mode, the Nacos server starts a scheduled task that heartbeat health-checks services. In cluster mode, is it necessary for every node to run this task?

 

Answer: When the health-check task started by Service's init() method runs, it first performs a check: it hashes the service name and takes the result modulo the number of cluster nodes to pick the one node that should run the health check. So in a cluster, the heartbeat health check for a Service is not executed on every machine; one machine is chosen by the algorithm, and it then syncs the result to the other cluster nodes.

 

Question 2: When the Nacos server's scheduled heartbeat health-check task detects a change in a service's health status, how is that status synced to the other Nacos cluster nodes?

 

Answer: After the health-check task started via Service's init() method completes, a scheduled task registered in ServiceManager's init() method syncs the check result to the other nodes over HTTP. When this scheduled task finishes, it submits another delayed run to keep the sync going.

 

ServiceManager's init() method also starts a task that processes incoming sync requests. Its design uses an in-memory blocking queue plus async tasks: an infinite while loop keeps taking data from the queue and processing it.


 

3. How a Newly Registered Service Instance Is Synced to Other Nodes


(1) Architecture for Syncing a New Service Instance to Other Cluster Nodes


The architecture Nacos uses is: a two-layer in-memory queue plus async tasks.

 

Layer one:

Nacos uses a ConcurrentHashMap as the container for delay tasks: the new service instance's data is wrapped into a DistroDelayTask and put into this Map.

 

DistroTaskEngineHolder has a field of type DistroDelayTaskExecuteEngine, whose parent-class constructor starts an async task that pulls DistroDelayTask objects out of the ConcurrentHashMap.

 

Layer two:

Nacos uses a BlockingQueue as the container for sync tasks: a DistroSyncChangeTask runnable is created from the parameters and placed into the BlockingQueue.

 

Nacos starts an InnerWorker async task that takes DistroSyncChangeTask objects from the BlockingQueue and calls their run() methods.

 

In DistroSyncChangeTask's run() method, the data sync is finally performed by calling an HTTP API on the other cluster nodes.
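The two layers described above can be sketched as follows. Layer one is a map keyed by (resource, target node), so rapid successive changes to the same key merge into a single pending task; layer two is a plain FIFO queue of sync tasks for a worker to drain. This is a simplified, hypothetical sketch with invented names, not the actual Distro engine code shown in the next section.

```java
import java.util.Map;
import java.util.concurrent.*;

// Hypothetical sketch of the two-layer design: a merging map in front of a FIFO queue.
public class TwoLayerSketch {
    // layer one: latest pending change per key (a new change for a key replaces the old one)
    static final ConcurrentHashMap<String, String> DELAY_TASKS = new ConcurrentHashMap<>();
    // layer two: sync tasks ready to be sent to a peer node
    static final BlockingQueue<String> SYNC_QUEUE = new LinkedBlockingQueue<>();

    // layer one entry point: roughly what NacosDelayTaskExecuteEngine.addTask() does
    static void addDelayTask(String distroKey, String change) {
        DELAY_TASKS.put(distroKey, change);  // merging == last write wins per key
    }

    // the periodic processTasks() pass: move each pending change into the sync queue
    static void processTasks() {
        for (Map.Entry<String, String> e : DELAY_TASKS.entrySet()) {
            // remove-then-enqueue, so a task is handed over exactly once
            if (DELAY_TASKS.remove(e.getKey(), e.getValue())) {
                SYNC_QUEUE.offer(e.getKey() + "=" + e.getValue());
            }
        }
    }
}
```

The payoff of the map layer is write coalescing: if an instance flaps three times between two processing passes, peers receive one sync task with the final state instead of three.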

 

(2) Source Code for Syncing a New Service Instance to Other Cluster Nodes


I. Building Delay Tasks Stored in a Map + Async Processing


While handling a service instance registration request, the Nacos server calls DistroConsistencyServiceImpl's onPut() method to trigger the in-memory registry update, and only then calls DistroProtocol's sync() method to sync the data across the cluster.

 

The for loop in DistroProtocol's sync() method iterates over all cluster nodes except the local one. The node list comes from the cluster.conf file configured when the Nacos cluster was set up, so the server knows the full cluster membership. The local node is skipped because it does not need to sync data to itself; it only needs to push data to its peers.

 

At the end of that for loop, a DistroDelayTask is built and added via NacosDelayTaskExecuteEngine's addTask() method to the tasks field, which is a ConcurrentHashMap. DistroDelayTask implements the NacosTask interface.

 

When NacosDelayTaskExecuteEngine is initialized, it starts an async task. This task runs ProcessRunnable's run() method, which in turn calls NacosDelayTaskExecuteEngine's processTasks() method.

 

processTasks() first fetches all keys from the tasks map and iterates over them, calling NacosDelayTaskExecuteEngine's removeTask() method for each key. removeTask() removes the delay task from the tasks map and returns it. processTasks() then looks up the DistroDelayTaskProcessor for the taskKey and calls its process() method, which moves the NacosTask returned by removeTask() into the second-layer in-memory queue.


@DependsOn("ProtocolManager")
@org.springframework.stereotype.Service("distroConsistencyService")
public class DistroConsistencyServiceImpl implements EphemeralConsistencyService, DistroDataProcessor {
    private final DistroProtocol distroProtocol;
    ...
    @Override
    public void put(String key, Record value) throws NacosException {
        // store the latest instance list (including the newly registered instance) into the DataStore,
        // and add an async task to apply that list to the in-memory registry
        onPut(key, value);
        // in cluster mode, DistroProtocol.sync() syncs the instance data to the other cluster nodes
        distroProtocol.sync(new DistroKey(key, KeyBuilder.INSTANCE_LIST_KEY_PREFIX), DataOperation.CHANGE, globalConfig.getTaskDispatchPeriod() / 2);
    }
    ...
}

@Component
public class DistroProtocol {
    private final ServerMemberManager memberManager;
    private final DistroTaskEngineHolder distroTaskEngineHolder;
    ...
    // Start to sync data to all remote servers.
    public void sync(DistroKey distroKey, DataOperation action, long delay) {
        // iterate over all cluster nodes except the local one
        for (Member each : memberManager.allMembersWithoutSelf()) {
            // first wrapper
            DistroKey distroKeyWithTarget = new DistroKey(distroKey.getResourceKey(), distroKey.getResourceType(), each.getAddress());
            // second wrapper
            DistroDelayTask distroDelayTask = new DistroDelayTask(distroKeyWithTarget, action, delay);
            // actually calls NacosDelayTaskExecuteEngine.addTask() to add the task
            distroTaskEngineHolder.getDelayTaskExecuteEngine().addTask(distroKeyWithTarget, distroDelayTask);
            if (Loggers.DISTRO.isDebugEnabled()) {
                Loggers.DISTRO.debug("[DISTRO-SCHEDULE] {} to {}", distroKey, each.getAddress());
            }
        }
    }
    ...
}

public class DistroKey {
    private String resourceKey;
    private String resourceType;
    private String targetServer;

    public DistroKey() {
    }

    public DistroKey(String resourceKey, String resourceType, String targetServer) {
        this.resourceKey = resourceKey;
        this.resourceType = resourceType;
        this.targetServer = targetServer;
    }
    ...
}

// Distro delay task.
public class DistroDelayTask extends AbstractDelayTask {
    private final DistroKey distroKey;
    private DataOperation action;
    private long createTime;

    public DistroDelayTask(DistroKey distroKey, DataOperation action, long delayTime) {
        this.distroKey = distroKey;
        this.action = action;
        this.createTime = System.currentTimeMillis();
        setLastProcessTime(createTime);
        setTaskInterval(delayTime);
    }
    ...
}

// Abstract task which can delay and merge.
public abstract class AbstractDelayTask implements NacosTask {
    // Task time interval between two processings, in milliseconds.
    private long taskInterval;
    // The time of the last processing, in milliseconds.
    private long lastProcessTime;

    public void setTaskInterval(long interval) {
        this.taskInterval = interval;
    }

    public void setLastProcessTime(long lastProcessTime) {
        this.lastProcessTime = lastProcessTime;
    }
    ...
}

// Distro task engine holder.
@Component
public class DistroTaskEngineHolder {
    private final DistroDelayTaskExecuteEngine delayTaskExecuteEngine = new DistroDelayTaskExecuteEngine();

    public DistroDelayTaskExecuteEngine getDelayTaskExecuteEngine() {
        return delayTaskExecuteEngine;
    }
    ...
}

public class DistroDelayTaskExecuteEngine extends NacosDelayTaskExecuteEngine {
    public DistroDelayTaskExecuteEngine() {
        super(DistroDelayTaskExecuteEngine.class.getName(), Loggers.DISTRO);
    }
    ...
}

// Nacos delay task execute engine.
public class NacosDelayTaskExecuteEngine extends AbstractNacosTaskExecuteEngine<AbstractDelayTask> {
    private final ScheduledExecutorService processingExecutor;
    protected final ConcurrentHashMap<Object, AbstractDelayTask> tasks;  // task pool
    protected final ReentrantLock lock = new ReentrantLock();
    ...
    public NacosDelayTaskExecuteEngine(String name, int initCapacity, Logger logger, long processInterval) {
        super(logger);
        tasks = new ConcurrentHashMap<Object, AbstractDelayTask>(initCapacity);
        processingExecutor = ExecutorFactory.newSingleScheduledExecutorService(new NameThreadFactory(name));
        // start the scheduled processing task
        processingExecutor.scheduleWithFixedDelay(new ProcessRunnable(), processInterval, processInterval, TimeUnit.MILLISECONDS);
    }

    @Override
    public void addTask(Object key, AbstractDelayTask newTask) {
        lock.lock();
        try {
            AbstractDelayTask existTask = tasks.get(key);
            if (null != existTask) {
                newTask.merge(existTask);
            }
            // finally put it into the ConcurrentHashMap
            tasks.put(key, newTask);
        } finally {
            lock.unlock();
        }
    }
    ...
    private class ProcessRunnable implements Runnable {
        @Override
        public void run() {
            try {
                processTasks();
            } catch (Throwable e) {
                getEngineLog().error(e.toString(), e);
            }
        }
    }
    ...
    // process tasks in the execute engine
    protected void processTasks() {
        // get all task keys and iterate over them
        Collection<Object> keys = getAllTaskKeys();
        for (Object taskKey : keys) {
            // fetch the task by its key and remove it from the task pool
            AbstractDelayTask task = removeTask(taskKey);
            if (null == task) {
                continue;
            }
            // get the NacosTaskProcessor for the taskKey: DistroDelayTaskProcessor
            NacosTaskProcessor processor = getProcessor(taskKey);
            if (null == processor) {
                getEngineLog().error("processor not found for task, so discarded. " + task);
                continue;
            }
            try {
                // Re-add the task if processing failed.
                // DistroDelayTaskProcessor.process() puts the sync task into the second-layer memory queue
                if (!processor.process(task)) {
                    // on failure, put the task back into the tasks map for retry
                    retryFailedTask(taskKey, task);
                }
            } catch (Throwable e) {
                getEngineLog().error("Nacos task execute error : " + e.toString(), e);
                retryFailedTask(taskKey, task);
            }
        }
    }

    @Override
    public AbstractDelayTask removeTask(Object key) {
        lock.lock();
        try {
            AbstractDelayTask task = tasks.get(key);
            if (null != task && task.shouldProcess()) {
                return tasks.remove(key);
            } else {
                return null;
            }
        } finally {
            lock.unlock();
        }
    }
}


II. Building Sync Tasks Stored in a Queue + Async Processing


In DistroDelayTaskProcessor's process() method, the NacosTask delay task is moved into the second-layer in-memory queue: the NacosTask is first cast to a DistroDelayTask, then wrapped into a DistroSyncChangeTask, and finally added to the queue via NacosExecuteTaskExecuteEngine's addTask() method.

 

Specifically, NacosExecuteTaskExecuteEngine's addTask() calls getWorker() in the same class to pick one TaskExecuteWorker, then calls that worker's process() method to put the DistroSyncChangeTask into the worker's queue.

 

When NacosExecuteTaskExecuteEngine is created, it creates multiple TaskExecuteWorkers, and each TaskExecuteWorker starts an InnerWorker thread during its initialization. The InnerWorker thread keeps taking sync tasks from its blocking queue and processing them: InnerWorker's run() method calls DistroSyncChangeTask's run() method, which performs the cluster sync of the service instance data.


//Distro delay task processor.
public class DistroDelayTaskProcessor implements NacosTaskProcessor {
    private final DistroTaskEngineHolder distroTaskEngineHolder;
    private final DistroComponentHolder distroComponentHolder;

    public DistroDelayTaskProcessor(DistroTaskEngineHolder distroTaskEngineHolder, DistroComponentHolder distroComponentHolder) {
        this.distroTaskEngineHolder = distroTaskEngineHolder;
        this.distroComponentHolder = distroComponentHolder;
    }

    @Override
    public boolean process(NacosTask task) {
        if (!(task instanceof DistroDelayTask)) {
            return true;
        }
        //cast the NacosTask to a DistroDelayTask
        DistroDelayTask distroDelayTask = (DistroDelayTask) task;
        DistroKey distroKey = distroDelayTask.getDistroKey();
        if (DataOperation.CHANGE.equals(distroDelayTask.getAction())) {
            //wrap it into a DistroSyncChangeTask
            DistroSyncChangeTask syncChangeTask = new DistroSyncChangeTask(distroKey, distroComponentHolder);
            //add it to the queue via NacosExecuteTaskExecuteEngine.addTask()
            distroTaskEngineHolder.getExecuteWorkersManager().addTask(distroKey, syncChangeTask);
            return true;
        }
        return false;
    }
}
//Nacos execute task execute engine.
public class NacosExecuteTaskExecuteEngine extends AbstractNacosTaskExecuteEngine<AbstractExecuteTask> {
    private final TaskExecuteWorker[] executeWorkers;

    public NacosExecuteTaskExecuteEngine(String name, Logger logger, int dispatchWorkerCount) {
        super(logger);
        //each TaskExecuteWorker starts a thread on initialization to process its own queue
        executeWorkers = new TaskExecuteWorker[dispatchWorkerCount];
        for (int mod = 0; mod < dispatchWorkerCount; ++mod) {
            executeWorkers[mod] = new TaskExecuteWorker(name, mod, dispatchWorkerCount, getEngineLog());
        }
    }
    ...
    @Override
    public void addTask(Object tag, AbstractExecuteTask task) {
        //resolve the TaskExecuteWorker for this tag
        NacosTaskProcessor processor = getProcessor(tag);
        if (null != processor) {
            processor.process(task);
            return;
        }
        TaskExecuteWorker worker = getWorker(tag);
        //TaskExecuteWorker.process() puts the DistroSyncChangeTask into the worker's queue
        worker.process(task);
    }

    private TaskExecuteWorker getWorker(Object tag) {
        int idx = (tag.hashCode() & Integer.MAX_VALUE) % workersCount();
        return executeWorkers[idx];
    }
    ...
}
//Nacos execute task execute worker.
public final class TaskExecuteWorker implements NacosTaskProcessor, Closeable {
    //task storage container
    private final BlockingQueue<Runnable> queue;

    public TaskExecuteWorker(final String name, final int mod, final int total, final Logger logger) {
        this.name = name + "_" + mod + "%" + total;
        this.queue = new ArrayBlockingQueue<Runnable>(QUEUE_CAPACITY);
        this.closed = new AtomicBoolean(false);
        this.log = null == logger ? LoggerFactory.getLogger(TaskExecuteWorker.class) : logger;
        new InnerWorker(name).start();
    }
    ...
    @Override
    public boolean process(NacosTask task) {
        if (task instanceof AbstractExecuteTask) {
            //put the DistroSyncChangeTask into the queue
            putTask((Runnable) task);
        }
        return true;
    }

    private void putTask(Runnable task) {
        try {
            //put the DistroSyncChangeTask into the queue
            queue.put(task);
        } catch (InterruptedException ire) {
            log.error(ire.toString(), ire);
        }
    }
    ...
    //Inner execute worker.
    private class InnerWorker extends Thread {
        InnerWorker(String name) {
            setDaemon(false);
            setName(name);
        }

        @Override
        public void run() {
            while (!closed.get()) {
                try {
                    //keep taking tasks from the queue; here the task type is DistroSyncChangeTask
                    Runnable task = queue.take();
                    long begin = System.currentTimeMillis();
                    //invoke DistroSyncChangeTask.run()
                    task.run();
                    long duration = System.currentTimeMillis() - begin;
                    if (duration > 1000L) {
                        log.warn("distro task {} takes {}ms", task, duration);
                    }
                } catch (Throwable e) {
                    log.error("[DISTRO-FAILED] " + e.toString(), e);
                }
            }
        }
    }
}
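getWorker() above pins each key to one worker via a hash-mod, so every task for the same DistroKey always lands on the same single-threaded worker and per-key ordering is preserved. That routing can be sketched as follows (WorkerDispatch and its methods are illustrative names, not Nacos APIs; only the index computation mirrors the source):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the second-layer dispatch: tasks are routed to a fixed worker by hashing
// the key, and each worker drains its own blocking queue on a dedicated thread.
public class WorkerDispatch {
    private final BlockingQueue<Runnable>[] queues;

    @SuppressWarnings("unchecked")
    public WorkerDispatch(int workerCount) {
        queues = new BlockingQueue[workerCount];
        for (int i = 0; i < workerCount; i++) {
            queues[i] = new ArrayBlockingQueue<>(1024);
            final BlockingQueue<Runnable> q = queues[i];
            Thread inner = new Thread(() -> {        // InnerWorker-style consumer
                try {
                    while (true) { q.take().run(); }
                } catch (InterruptedException ignored) { }
            });
            inner.setDaemon(true);
            inner.start();
        }
    }

    // Same computation as getWorker(): non-negative hash, mod worker count.
    public int workerIndex(Object tag) {
        return (tag.hashCode() & Integer.MAX_VALUE) % queues.length;
    }

    public void addTask(Object tag, Runnable task) throws InterruptedException {
        queues[workerIndex(tag)].put(task);
    }
}
```

Because `hashCode()` is stable for a given key, two sync tasks for the same service never race on different worker threads.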


Part 3: The Core Method for Syncing Service Instance Data to Cluster Nodes


In DistroSyncChangeTask's run() method, a DistroHttpAgent is obtained first, and its syncData() method is then called to push the newly added service instance data to the other cluster nodes over HTTP. The endpoint used for syncing instance data to a cluster node is /v1/ns/distro/datum, which maps to the onSyncDatum() method of DistroController.

 

The onSyncDatum() method of DistroController iterates over the service instance objects it receives. If ServiceManager's containService() reports that the service does not yet exist, an empty service is first created via ServiceManager's createEmptyService(). Then DistroProtocol's onReceive() is called to register the service instances, which delegates to DistroConsistencyServiceImpl's processData(), and that in turn calls the same DistroConsistencyServiceImpl.onPut() method used during instance registration.


//Distro sync change task.
public class DistroSyncChangeTask extends AbstractDistroExecuteTask {
    private final DistroComponentHolder distroComponentHolder;

    public DistroSyncChangeTask(DistroKey distroKey, DistroComponentHolder distroComponentHolder) {
        super(distroKey);
        this.distroComponentHolder = distroComponentHolder;
    }

    @Override
    public void run() {
        Loggers.DISTRO.info("[DISTRO-START] {}", toString());
        try {
            //build the request payload
            String type = getDistroKey().getResourceType();
            DistroData distroData = distroComponentHolder.findDataStorage(type).getDistroData(getDistroKey());
            distroData.setType(DataOperation.CHANGE);
            //DistroHttpAgent.syncData() syncs the newly added instance data over HTTP
            boolean result = distroComponentHolder.findTransportAgent(type).syncData(distroData, getDistroKey().getTargetServer());
            if (!result) {
                handleFailedTask();
            }
            Loggers.DISTRO.info("[DISTRO-END] {} result: {}", toString(), result);
        } catch (Exception e) {
            Loggers.DISTRO.warn("[DISTRO] Sync data change failed.", e);
            handleFailedTask();
        }
    }
    ...
}
//Distro http agent.
public class DistroHttpAgent implements DistroTransportAgent {
    private final ServerMemberManager memberManager;

    public DistroHttpAgent(ServerMemberManager memberManager) {
        this.memberManager = memberManager;
    }

    @Override
    public boolean syncData(DistroData data, String targetServer) {
        if (!memberManager.hasMember(targetServer)) {
            return true;
        }
        byte[] dataContent = data.getContent();
        //sync the newly added instance data over HTTP
        return NamingProxy.syncData(dataContent, data.getDistroKey().getTargetServer());
    }
    ...
}
public class NamingProxy {
    ...
    //Synchronize datum to target server.
    public static boolean syncData(byte[] data, String curServer) {
        Map<String, String> headers = new HashMap<>(128);
        headers.put(HttpHeaderConsts.CLIENT_VERSION_HEADER, VersionUtils.version);
        headers.put(HttpHeaderConsts.USER_AGENT_HEADER, UtilsAndCommons.SERVER_VERSION);
        headers.put(HttpHeaderConsts.ACCEPT_ENCODING, "gzip,deflate,sdch");
        headers.put(HttpHeaderConsts.CONNECTION, "Keep-Alive");
        headers.put(HttpHeaderConsts.CONTENT_ENCODING, "gzip");
        try {
            //sync the data over HTTP: /v1/ns/distro/datum
            RestResult<String> result = HttpClient.httpPutLarge(
                    "http://" + curServer + EnvUtil.getContextPath() + UtilsAndCommons.NACOS_NAMING_CONTEXT + DATA_ON_SYNC_URL,
                    headers, data);
            if (result.ok()) {
                return true;
            }
            if (HttpURLConnection.HTTP_NOT_MODIFIED == result.getCode()) {
                return true;
            }
            throw new IOException("failed to req API:" + "http://" + curServer + EnvUtil.getContextPath()
                    + UtilsAndCommons.NACOS_NAMING_CONTEXT + DATA_ON_SYNC_URL
                    + ". code:" + result.getCode() + " msg: " + result.getData());
        } catch (Exception e) {
            Loggers.SRV_LOG.warn("NamingProxy", e);
        }
        return false;
    }
    ...
}
//Restful methods for Partition protocol.
@RestController
@RequestMapping(UtilsAndCommons.NACOS_NAMING_CONTEXT + "/distro")
public class DistroController {
    @Autowired
    private DistroProtocol distroProtocol;

    @Autowired
    private ServiceManager serviceManager;
    ...
    //Synchronize datum.
    @PutMapping("/datum")
    public ResponseEntity onSyncDatum(@RequestBody Map<String, Datum<Instances>> dataMap) throws Exception {
        if (dataMap.isEmpty()) {
            Loggers.DISTRO.error("[onSync] receive empty entity!");
            throw new NacosException(NacosException.INVALID_PARAM, "receive empty entity!");
        }
        //iterate over the newly added service instance objects
        for (Map.Entry<String, Datum<Instances>> entry : dataMap.entrySet()) {
            if (KeyBuilder.matchEphemeralInstanceListKey(entry.getKey())) {
                //extract the namespace and service name
                String namespaceId = KeyBuilder.getNamespace(entry.getKey());
                String serviceName = KeyBuilder.getServiceName(entry.getKey());
                if (!serviceManager.containService(namespaceId, serviceName) && switchDomain.isDefaultInstanceEphemeral()) {
                    //create an empty Service, just as in instance registration
                    serviceManager.createEmptyService(namespaceId, serviceName, true);
                }
                DistroHttpData distroHttpData = new DistroHttpData(createDistroKey(entry.getKey()), entry.getValue());
                //register the new service instance objects
                distroProtocol.onReceive(distroHttpData);
            }
        }
        return ResponseEntity.ok("ok");
    }
    ...
}
@Component
public class DistroProtocol {
    ...
    //Receive synced distro data, find processor to process.
    public boolean onReceive(DistroData distroData) {
        String resourceType = distroData.getDistroKey().getResourceType();
        //resolve the DistroConsistencyServiceImpl
        DistroDataProcessor dataProcessor = distroComponentHolder.findDataProcessor(resourceType);
        if (null == dataProcessor) {
            Loggers.DISTRO.warn("[DISTRO] Can't find data process for received data {}", resourceType);
            return false;
        }
        //DistroConsistencyServiceImpl.processData() handles the newly added service instances
        return dataProcessor.processData(distroData);
    }
    ...
}
@DependsOn("ProtocolManager")
@org.springframework.stereotype.Service("distroConsistencyService")
public class DistroConsistencyServiceImpl implements EphemeralConsistencyService, DistroDataProcessor {
    //stores the data of all registered service instances
    private final DataStore dataStore;
    private volatile Notifier notifier = new Notifier();
    ...
    @Override
    public boolean processData(DistroData distroData) {
        DistroHttpData distroHttpData = (DistroHttpData) distroData;
        Datum<Instances> datum = (Datum<Instances>) distroHttpData.getDeserializedContent();
        //this onPut() is the same method invoked during instance registration
        onPut(datum.key, datum.value);
        return true;
    }

    public void onPut(String key, Record value) {
        if (KeyBuilder.matchEphemeralInstanceListKey(key)) {
            //create a Datum holding the service key and all of the service's instances
            Datum<Instances> datum = new Datum<>();
            datum.value = (Instances) value;
            datum.key = key;
            datum.timestamp.incrementAndGet();
            //put it into the DataStore's map
            dataStore.put(key, datum);
        }
        if (!listeners.containsKey(key)) {
            return;
        }
        //add a processing task
        notifier.addTask(key, DataOperation.CHANGE);
    }

    @Override
    public void put(String key, Record value) throws NacosException {
        //store the latest instance list, including the instance being registered, in the DataStore
        onPut(key, value);
        //in cluster mode, DistroProtocol.sync() syncs the instance data across cluster nodes
        distroProtocol.sync(new DistroKey(key, KeyBuilder.INSTANCE_LIST_KEY_PREFIX), DataOperation.CHANGE,
                globalConfig.getTaskDispatchPeriod() / 2);
    }
    ...
}
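DistroProtocol's sync() method, which put() calls above, fans the change out by creating one delayed sync task per remote cluster member, skipping the local node. A simplified sketch of that per-target fan-out — the member list handling and the "serviceKey@@targetServer" composite key are illustrative assumptions; in Nacos the members come from ServerMemberManager and the pairing is a DistroKey object:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the Distro fan-out: for each remote member, build one per-target sync key.
public class DistroFanOut {
    public static List<String> syncTargets(List<String> allMembers, String self, String serviceKey) {
        List<String> targets = new ArrayList<>();
        for (String member : allMembers) {
            if (member.equals(self)) {
                continue;                       // never sync to the local node
            }
            targets.add(serviceKey + "@@" + member);
        }
        return targets;
    }
}
```

In a three-node cluster this yields two sync tasks per change, each carrying its own target server so a failure against one node can be retried independently.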


(4) Summary


The flow starts when DistroConsistencyServiceImpl's put() method is called during instance registration: it invokes DistroProtocol's sync() to propagate the newly added instance to the other cluster nodes. A delay task is first built and stored in a Map for asynchronous processing; that processing then builds a sync task stored in a blocking Queue, again handled asynchronously; finally the async task issues an HTTP request to sync the instance data, and on the receiving node the flow ends up back in DistroConsistencyServiceImpl's onPut() method, which updates that node's registry. As a result, every node in the cluster holds the data of all service instances.

 

The reason for using two layers of in-memory queues, rather than a single queue that hands the sync task for a newly added instance straight to a TaskExecuteWorker, is that the extra buffering layer improves processing throughput. Instance registration can be extremely bursty: if a thousand machines start at the same time, the Nacos server faces a thousand concurrent registration requests. With only one in-memory queue, those thousand sync tasks would all contend for the lock guarding the TaskExecuteWorker's blocking queue, and the Nacos clients that issued the registration requests would wait too long for the server's response.
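The contention argument can be made concrete: with the first-layer map in front, a burst of registrations performs cheap per-key merges, and the periodic drain forwards at most one task per key to the worker queue per cycle, instead of one queue insertion per request. A toy model of that effect (all names hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the two-layer handoff: a burst of same-key changes is absorbed by
// the map (layer 1), and only the drained survivors reach the worker queue (layer 2).
public class TwoLayerBurst {
    public static int tasksReachingWorkers(int burstSize, int distinctServices) {
        ConcurrentHashMap<String, AtomicInteger> layer1 = new ConcurrentHashMap<>();
        for (int i = 0; i < burstSize; i++) {
            String key = "service-" + (i % distinctServices);
            // layer 1: merge instead of enqueue
            layer1.computeIfAbsent(key, k -> new AtomicInteger()).incrementAndGet();
        }
        // the periodic drain forwards at most one task per key to layer 2
        return layer1.size();
    }
}
```

A burst of 1000 registrations across 3 services produces only 3 tasks for the worker queue in one drain cycle, so the lock on the worker's blocking queue is contended far less often.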



Reprinted from: 东阳马生架构

Original link: https://www.cnblogs.com/mjunz/p/18860669

