
RabbitMQ on Docker in Four Parts, Part 4: High Availability in Practice

Author: 程序员欣宸
  • May 29, 2022


Welcome to my GitHub

All of 欣宸's original articles, with companion source code, are categorized and collected here: https://github.com/zq2599/blog_demos


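  • This is the final installment of the "RabbitMQ on Docker in Four Parts" series. Today we try out the high availability of a RabbitMQ cluster and check whether messages can still be produced and consumed when some of the cluster's nodes go down.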

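Previous articles

  • The first three parts of the "RabbitMQ on Docker in Four Parts" series are:


  1. "RabbitMQ on Docker in Four Parts, Part 1: Quick Start (Standalone and Cluster)"

  2. "RabbitMQ on Docker in Four Parts, Part 2: Building the RabbitMQ Image in Detail"

  3. "RabbitMQ on Docker in Four Parts, Part 3: Java Development in Detail"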

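Overview

  • Today's walkthrough has the following steps:


  1. Write the docker-compose.yml file and configure the parameters for each container.

  2. Start all containers: the RabbitMQ cluster, the web application that produces messages, and the web applications that consume them.

  3. Stop the RabbitMQ containers in the cluster one at a time, verifying message production and consumption after each stop.

  4. Restore the RabbitMQ containers in the cluster one at a time, verifying message production and consumption after each start.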

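Writing the docker-compose.yml file

  • This walkthrough creates six containers: three RabbitMQ nodes (rabbit1, rabbit2, rabbit3), one message producer (producer), and two message consumers (consumer1, consumer2).


  • The earlier parts of this series created the same six containers, and their roles are unchanged. The main differences are the following three:


  1. The hacluster_producer_1 container, which produces messages, connected to only one RabbitMQ container in the earlier parts; in this part it connects to all three.

  2. The hacluster_consumer1_1 container, which consumes messages, connected to only one RabbitMQ container in the earlier parts; in this part it connects to all three.

  3. The hacluster_consumer2_1 container, which consumes messages, connected to only one RabbitMQ container in the earlier parts; in this part it connects to all three.


  • With that in mind, the docker-compose.yml file looks like this: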


version: '2'
services:
  rabbit1:
    image: bolingcavalry/rabbitmq-server:0.0.3
    hostname: rabbit1
    ports:
      - "15672:15672"
    environment:
      - RABBITMQ_DEFAULT_USER=admin
      - RABBITMQ_DEFAULT_PASS=888888
  rabbit2:
    image: bolingcavalry/rabbitmq-server:0.0.3
    hostname: rabbit2
    depends_on:
      - rabbit1
    links:
      - rabbit1
    environment:
      - CLUSTERED=true
      - CLUSTER_WITH=rabbit1
      - RAM_NODE=true
      - HA_ENABLE=true
    ports:
      - "15673:15672"
  rabbit3:
    image: bolingcavalry/rabbitmq-server:0.0.3
    hostname: rabbit3
    depends_on:
      - rabbit2
    links:
      - rabbit1
      - rabbit2
    environment:
      - CLUSTERED=true
      - CLUSTER_WITH=rabbit1
    ports:
      - "15675:15672"
  producer:
    image: bolingcavalry/rabbitmqproducer:0.0.2-SNAPSHOT
    hostname: producer
    depends_on:
      - rabbit3
    links:
      - rabbit1:rabbitmqhost1
      - rabbit2:rabbitmqhost2
      - rabbit3:rabbitmqhost3
    ports:
      - "18080:8080"
    environment:
      - mq.rabbit.address=rabbitmqhost1:5672,rabbitmqhost2:5672,rabbitmqhost3:5672
      - mq.rabbit.username=admin
      - mq.rabbit.password=888888
  consumer1:
    image: bolingcavalry/rabbitmqconsumer:0.0.5-SNAPSHOT
    hostname: consumer1
    depends_on:
      - producer
    links:
      - rabbit1:rabbitmqhost1
      - rabbit2:rabbitmqhost2
      - rabbit3:rabbitmqhost3
    environment:
      - mq.rabbit.address=rabbitmqhost1:5672,rabbitmqhost2:5672,rabbitmqhost3:5672
      - mq.rabbit.username=admin
      - mq.rabbit.password=888888
      - mq.rabbit.queue.name=consumer1.queue
  consumer2:
    image: bolingcavalry/rabbitmqconsumer:0.0.5-SNAPSHOT
    hostname: consumer2
    depends_on:
      - consumer1
    links:
      - rabbit1:rabbitmqhost1
      - rabbit2:rabbitmqhost2
      - rabbit3:rabbitmqhost3
    environment:
      - mq.rabbit.address=rabbitmqhost1:5672,rabbitmqhost2:5672,rabbitmqhost3:5672
      - mq.rabbit.username=admin
      - mq.rabbit.password=888888
      - mq.rabbit.queue.name=consumer2.queue


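Two points about this docker-compose.yml file are worth noting:


  • rabbit2 has an extra environment variable, HA_ENABLE=true. As mentioned when analyzing the image build in Part 2 ("Building the RabbitMQ Image in Detail"), the startrabbit.sh script checks this variable when the container is created. If it is true, the script runs rabbitmqctl set_policy HA '^(?!amq.).*' '{"ha-mode": "all"}', which puts the queues into mirrored mode so that they are replicated across the three RabbitMQ nodes.

  • The producer, consumer1, and consumer2 containers all set the environment variable mq.rabbit.address to the addresses and ports of the three RabbitMQ containers: rabbitmqhost1:5672,rabbitmqhost2:5672,rabbitmqhost3:5672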
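To see which names the HA policy pattern selects, here is a minimal sketch that evaluates the same regular expression in Java. The class name HaPolicyPattern and the sample name amq.gen-example are hypothetical, and the sketch assumes that java.util.regex handles this negative lookahead the same way RabbitMQ's own regex engine does for names this simple.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks which names the HA policy pattern would select. */
final class HaPolicyPattern {
    // Pattern copied from the rabbitmqctl set_policy command quoted above.
    // Assumption: Java's regex engine agrees with RabbitMQ's for these names.
    private static final Pattern HA_PATTERN = Pattern.compile("^(?!amq.).*");

    /** Returns true if the policy pattern matches the given queue or exchange name. */
    static boolean isMirrored(String name) {
        return HA_PATTERN.matcher(name).find();
    }

    public static void main(String[] args) {
        // The two queue names come from docker-compose.yml; the third is illustrative.
        String[] names = {"consumer1.queue", "consumer2.queue", "amq.gen-example"};
        for (String name : names) {
            System.out.println(name + " -> mirrored: " + isMirrored(name));
        }
    }
}
```

Because the dot in the pattern is not escaped, it stands for any character, so any name that starts with "amq" followed by at least one more character is excluded from mirroring, not only names that start with "amq.".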

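Starting all containers

  • In the directory that contains the docker-compose.yml file just created, run docker-compose up -d to create all the containers. Once that finishes, confirm that everything started as follows.

  • For example, my machine's IP address is 192.168.119.155, so entering 192.168.119.155:15672 in a browser opens the RabbitMQ management page. Log in with user name admin and password 888888.

  • Open the "Exchanges" tab: the exchange has been created and is in HA mode.

  • Open the "Queues" tab: both queues have been created and are in HA mode.

  • Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to make the hacluster_producer_1 container produce a message.

  • Run docker logs -f hacluster_consumer1_1 in a console to see hacluster_consumer1_1 log the message it consumed: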


2018-05-19 11:21:44.217  INFO 1 --- [           main] c.b.r.RabbitmqconsumerApplication        : Started RabbitmqconsumerApplication in 29.099 seconds (JVM running for 39.398)
2018-05-19 11:36:21.332  INFO 1 --- [cTaskExecutor-1] c.b.r.receiver.FanoutReceiver            : receive message : hello, aaa, bbb, 2018-05-19 11:36:21


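  • Production and consumption across the whole RabbitMQ cluster work as expected. Next, we stop containers to simulate outages in a production environment.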
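The same check (send a message, then look for it in the consumer log) is repeated after every stop and start in the rest of this walkthrough. As a minimal sketch, the marker text "receive message :" from the FanoutReceiver log line above can be extracted programmatically. The class name ReceivedMessageParser is hypothetical, and the sketch assumes the consumer's log format stays as shown.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the consumed message out of one consumer log line. */
final class ReceivedMessageParser {
    // Marker text taken from the FanoutReceiver log line shown above.
    private static final Pattern RECEIVED = Pattern.compile("receive message : (.*)$");

    /** Returns the message text if the line is a "receive message" entry. */
    static Optional<String> parse(String logLine) {
        Matcher matcher = RECEIVED.matcher(logLine);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "2018-05-19 11:36:21.332  INFO 1 --- [cTaskExecutor-1] "
                + "c.b.r.receiver.FanoutReceiver            : "
                + "receive message : hello, aaa, bbb, 2018-05-19 11:36:21";
        System.out.println(parse(line).orElse("<no message>"));
    }
}
```

Lines that are not consumption entries, such as the application startup line, yield an empty result, so the helper can be applied to every line of docker logs output.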

Stopping the RabbitMQ containers one at a time

  • Stop hacluster_rabbit1_1 first by running docker stop hacluster_rabbit1_1:


root@maven:~# docker stop hacluster_rabbit1_1
hacluster_rabbit1_1


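  • Check the management page. Because the hacluster_rabbit1_1 container has stopped, use the web page served by the hacluster_rabbit2_1 container instead: http://192.168.119.155:15673. The page reports the node failure.

  • The exchange and queue pages show nothing abnormal.

  • Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to try sending a message. The response takes noticeably longer, but it still reports success.

  • Run docker logs -f hacluster_producer_1 in a console to view the log of the producer web container: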


2018-05-19 11:43:22.681  WARN 1 --- [172.19.0.2:5672] c.r.c.impl.ForgivingExceptionHandler     : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 11:43:22.703 ERROR 1 --- [172.19.0.2:5672] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
2018-05-19 11:53:31.836  INFO 1 --- [io-8080-exec-10] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 11:53:46.878  INFO 1 --- [io-8080-exec-10] o.s.a.r.c.CachingConnectionFactory       : Created new connection: connectionFactory#4ae3c1cd:1/SimpleConnection@44028da7 [delegate=amqp://admin@172.19.0.3:5672/, localPort= 37818]


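  • The log shows clearly that when the hacluster_rabbit1_1 container stops, the producer reports an exception immediately but does not reconnect on its own. It connects to another RabbitMQ node only when it next sends a message; this time it connects to hacluster_rabbit2_1.

  • Run docker logs -f hacluster_consumer1_1 in a console to view the log of the consumer web container: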


2018-05-19 11:38:14.945  INFO 1 --- [cTaskExecutor-1] c.b.r.receiver.FanoutReceiver            : receive message : hello, aaa, bbb, 2018-05-19 11:38:14
2018-05-19 11:43:22.672  WARN 1 --- [172.19.0.2:5672] c.r.c.impl.ForgivingExceptionHandler     : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 11:43:22.726 ERROR 1 --- [172.19.0.2:5672] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
2018-05-19 11:43:23.163  INFO 1 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer@9f116cc: tags=[{amq.ctag-0csUBn5OQiTGEphcGI2p3A=consumer1.queue}], channel=Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://admin@172.19.0.2:5672/,1), conn: Proxy@42dd311 Shared Rabbit Connection: SimpleConnection@5a1a52da [delegate=amqp://admin@172.19.0.2:5672/, localPort= 34240], acknowledgeMode=AUTO local queue size=0
2018-05-19 11:43:23.181  INFO 1 --- [cTaskExecutor-2] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 11:43:29.042  INFO 1 --- [cTaskExecutor-2] o.s.a.r.c.CachingConnectionFactory       : Created new connection: connectionFactory#45f45fa1:1/SimpleConnection@2b9e231d [delegate=amqp://admin@172.19.0.3:5672/, localPort= 49624]
2018-05-19 11:53:46.899  INFO 1 --- [cTaskExecutor-2] c.b.r.receiver.FanoutReceiver            : receive message : hello, aaa, bbb, 2018-05-19 11:53:31


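  • The log shows that when a RabbitMQ node goes down, the consumer reconnects to another node in the cluster right away (log keyword: Created new connection).

  • Stop the second container in the RabbitMQ cluster by running docker stop hacluster_rabbit2_1.

  • To reach the management page, use the address of the hacluster_rabbit3_1 container: http://192.168.119.155:15675. The page now shows the problems on both stopped nodes.

  • Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to try sending a message. Again the response takes noticeably longer, but it still reports success.

  • Run docker logs -f hacluster_producer_1 in a console to view the log of the producer web container. It shows a successful reconnect, this time to the hacluster_rabbit3_1 container: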


2018-05-19 12:07:45.322  WARN 1 --- [172.19.0.3:5672] c.r.c.impl.ForgivingExceptionHandler     : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 12:07:45.334 ERROR 1 --- [172.19.0.3:5672] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
2018-05-19 12:07:45.336 ERROR 1 --- [172.19.0.3:5672] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
2018-05-19 12:12:06.404  INFO 1 --- [nio-8080-exec-4] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 12:12:41.467  INFO 1 --- [nio-8080-exec-4] o.s.a.r.c.CachingConnectionFactory       : Created new connection: connectionFactory#4ae3c1cd:2/SimpleConnection@6d23e50 [delegate=amqp://admin@172.19.0.4:5672/, localPort= 54310]


  • Run docker logs -f hacluster_consumer1_1 in a console to view the log of the consumer web container:


2018-05-19 12:07:45.327  WARN 1 --- [172.19.0.3:5672] c.r.c.impl.ForgivingExceptionHandler     : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 12:07:45.346 ERROR 1 --- [172.19.0.3:5672] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
2018-05-19 12:07:45.348 ERROR 1 --- [172.19.0.3:5672] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
2018-05-19 12:07:45.427  INFO 1 --- [cTaskExecutor-2] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer@317c5a8a: tags=[{amq.ctag-ZKT8Q4gcU9v7bA-lNOUEFQ=consumer1.queue}], channel=Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://admin@172.19.0.3:5672/,1), conn: Proxy@42dd311 Shared Rabbit Connection: SimpleConnection@2b9e231d [delegate=amqp://admin@172.19.0.3:5672/, localPort= 49624], acknowledgeMode=AUTO local queue size=0
2018-05-19 12:07:45.432  INFO 1 --- [cTaskExecutor-3] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 12:08:07.352  INFO 1 --- [cTaskExecutor-3] o.s.a.r.c.CachingConnectionFactory       : Created new connection: connectionFactory#45f45fa1:2/SimpleConnection@71dadbb0 [delegate=amqp://admin@172.19.0.4:5672/, localPort= 34416]
2018-05-19 12:12:56.869  INFO 1 --- [cTaskExecutor-3] c.b.r.receiver.FanoutReceiver            : receive message : hello, aaa, bbb, 2018-05-19 12:12:06


The log shows that the consumer also connected to the hacluster_rabbit3_1 container and consumed the message successfully.
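In both rounds, the clients log "Attempting to connect to" with the full list of three addresses and then end up on a node that is still running. A minimal sketch of that pattern follows. The class name AddressFailover is hypothetical, trying the addresses in listed order is an assumption rather than confirmed client behavior, and the plain TCP probe only stands in for a real AMQP connection attempt.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical sketch: pick the first reachable entry from an address list. */
final class AddressFailover {
    /** Splits "host1:5672,host2:5672" into unresolved host/port pairs. */
    static List<InetSocketAddress> parse(String addresses) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : addresses.split(",")) {
            String[] hostAndPort = entry.trim().split(":");
            result.add(InetSocketAddress.createUnresolved(
                    hostAndPort[0], Integer.parseInt(hostAndPort[1])));
        }
        return result;
    }

    /** Returns the first address, in list order (an assumption), that the probe accepts. */
    static Optional<InetSocketAddress> firstReachable(
            String addresses, Predicate<InetSocketAddress> probe) {
        return parse(addresses).stream().filter(probe).findFirst();
    }

    /** Plain TCP probe: true if a socket can be opened within the timeout. */
    static boolean tcpProbe(InetSocketAddress address, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Resolve the host name now; an unresolvable name fails with an IOException.
            socket.connect(new InetSocketAddress(
                    address.getHostString(), address.getPort()), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same value as the mq.rabbit.address environment variable in docker-compose.yml.
        String addresses = "rabbitmqhost1:5672,rabbitmqhost2:5672,rabbitmqhost3:5672";
        System.out.println(firstReachable(addresses, a -> tcpProbe(a, 2000)));
    }
}
```

Passing the probe as a parameter keeps the selection logic testable without a running cluster; a stub that rejects rabbitmqhost1 reproduces the first failover seen above.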


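  • Stop the third and last container in the RabbitMQ cluster by running docker stop hacluster_rabbit3_1.

  • This time there is no management page left to look at.

  • Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to try sending a message. After a long wait, the page shows an error.

  • Check the log of the hacluster_producer_1 container: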


2018-05-19 12:18:27.812  WARN 1 --- [172.19.0.4:5672] c.r.c.impl.ForgivingExceptionHandler     : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 12:18:27.813 ERROR 1 --- [172.19.0.4:5672] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
2018-05-19 12:18:27.813 ERROR 1 --- [172.19.0.4:5672] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
2018-05-19 12:18:55.836  INFO 1 --- [nio-8080-exec-7] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 12:19:50.921 ERROR 1 --- [nio-8080-exec-7] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.amqp.AmqpIOException: java.net.UnknownHostException: rabbitmqhost3] with root cause
java.net.UnknownHostException: rabbitmqhost3
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184) ~[na:1.8.0_111]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_111]
    at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_111]
    at com.rabbitmq.client.impl.SocketFrameHandlerFactory.create(SocketFrameHandlerFactory.java:60) ~[amqp-client-5.1.2.jar!/:5.1.2]
    at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:955) ~[amqp-client-5.1.2.jar!/:5.1.2]
    at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:907) ~[amqp-client-5.1.2.jar!/:5.1.2]
    at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:847) ~[amqp-client-5.1.2.jar!/:5.1.2]
    at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:449) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:614) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createBareChannel(CachingConnectionFactory.java:564) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getCachedChannelProxy(CachingConnectionFactory.java:538) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getChannel(CachingConnectionFactory.java:520) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
.........


  • As shown above, the exception indicates that the connection to the RabbitMQ server failed.
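The root cause in the stack trace is java.net.UnknownHostException rather than a refused connection, which suggests that the host name rabbitmqhost3 no longer resolved once its container was stopped. When reading such logs, it can help to reduce a wrapped exception to its root cause and sort it into a rough category. The sketch below does that with JDK types only; the class name ConnectFailure and its categories are hypothetical, and a RuntimeException stands in for the Spring wrapper seen in the log.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

/** Hypothetical helper: sorts a connection error into a rough category. */
final class ConnectFailure {
    enum Kind { UNKNOWN_HOST, CONNECT_FAILED, TIMED_OUT, OTHER }

    /** Follows getCause() links down to the innermost exception. */
    static Throwable rootCause(Throwable error) {
        Throwable current = error;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Classifies an error by the type of its root cause. */
    static Kind classify(Throwable error) {
        Throwable root = rootCause(error);
        if (root instanceof UnknownHostException) {
            return Kind.UNKNOWN_HOST;      // host name did not resolve
        }
        if (root instanceof ConnectException) {
            return Kind.CONNECT_FAILED;    // e.g. nothing listening on the port
        }
        if (root instanceof SocketTimeoutException) {
            return Kind.TIMED_OUT;         // no answer within the timeout
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        // Stand-in for the wrapped exception seen in the producer log.
        Throwable wrapped = new RuntimeException(
                "Request processing failed", new UnknownHostException("rabbitmqhost3"));
        System.out.println(classify(wrapped));
    }
}
```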

  • Check the log of the hacluster_consumer1_1 container:


2018-05-19 12:18:27.815  WARN 1 --- [172.19.0.4:5672] c.r.c.impl.ForgivingExceptionHandler     : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 12:18:27.816 ERROR 1 --- [172.19.0.4:5672] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
2018-05-19 12:18:27.816 ERROR 1 --- [172.19.0.4:5672] o.s.a.r.c.CachingConnectionFactory       : Channel shutdown: connection error
2018-05-19 12:18:28.100  INFO 1 --- [cTaskExecutor-3] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer@5b0307b0: tags=[{amq.ctag-0UhQ6jE-D5Wl2ZPl4EWhDQ=consumer1.queue}], channel=Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://admin@172.19.0.4:5672/,1), conn: Proxy@42dd311 Shared Rabbit Connection: SimpleConnection@71dadbb0 [delegate=amqp://admin@172.19.0.4:5672/, localPort= 34416], acknowledgeMode=AUTO local queue size=0
2018-05-19 12:18:28.104  INFO 1 --- [cTaskExecutor-4] o.s.a.r.c.CachingConnectionFactory       : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 12:19:23.178 ERROR 1 --- [cTaskExecutor-4] o.s.a.r.l.SimpleMessageListenerContainer : Failed to check/redeclare auto-delete queue(s).
org.springframework.amqp.AmqpIOException: java.net.UnknownHostException: rabbitmqhost3
    at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:476) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:614) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]


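  • As shown above, the connection fails here too, and the end of the log shows the application automatically trying to reconnect to RabbitMQ.

  • That completes the simulated RabbitMQ cluster outage. The results show that in HA mode, as long as at least one node is still available, the applications try to connect to it, and once connected, messages are still produced and consumed.

  • All containers in the RabbitMQ cluster are now stopped. Next we bring them back one at a time to see whether service recovers.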
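The consumer's behavior in this section, retrying the connection until a node becomes reachable, can be sketched as a bounded retry loop. The class name ReconnectLoop and its parameters are hypothetical; this illustrates the general pattern under those assumptions and is not the Spring AMQP listener container's actual implementation.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: keep retrying a connection attempt at a fixed interval. */
final class ReconnectLoop {
    /**
     * Calls connect until it returns true or maxAttempts is reached.
     * Returns the 1-based attempt number that succeeded, or -1 if none did.
     */
    static int retry(BooleanSupplier connect, int maxAttempts, long intervalMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (connect.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis); // wait before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated broker that only becomes reachable on the third attempt.
        int[] calls = {0};
        int succeededAt = retry(() -> ++calls[0] >= 3, 5, 100L);
        System.out.println("connected on attempt " + succeededAt);
    }
}
```

In a real client the connect callback would wrap an actual connection attempt against the address list, and the interval would be tuned so that a recovering cluster is picked up quickly without flooding the log with errors.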

Restoring the RabbitMQ containers one at a time

  • Restore hacluster_rabbit1_1 first by running docker start hacluster_rabbit1_1.

  • Run docker logs -f hacluster_rabbit1_1 to view the container log. The output stalls at one point and stops updating.


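  • Open the management page at http://192.168.119.155:15672 in a browser: the page does not load.

  • Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to try sending a message: an error page is shown.

  • The logs of the producer and consumer containers both show errors connecting to RabbitMQ.

  • These symptoms show that restoring only this one node is not enough to bring the cluster's service back. This is consistent with RabbitMQ's clustering guidance that, after a full shutdown, the node that was stopped last (here hacluster_rabbit3_1) should be the first one started, because the other nodes wait for it.

  • Next restore hacluster_rabbit2_1 by running docker start hacluster_rabbit2_1.

  • The management page is still unavailable, sending a message fails, and neither the producer nor the consumer containers can connect to a RabbitMQ container.

  • Finally restore hacluster_rabbit3_1 by running docker start hacluster_rabbit3_1. All containers in the cluster are now restored.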

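  • The log of hacluster_rabbit1_1 starts updating again.


  • The log of hacluster_rabbit2_1 updates as well.


  • The management page opens normally and shows all three nodes as healthy.


  • Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to try sending a message: production and consumption both work again.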
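One possible explanation for the recovery sequence observed above is RabbitMQ's documented restart guidance: after a full shutdown, the node that stopped last should start first, and nodes that stopped earlier wait for a peer that stopped after them. The sketch below is a deliberately simplified model of that rule, not RabbitMQ's implementation; the class name RestartOrderModel and the rule encoded in it are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of which restarted nodes can finish booting. */
final class RestartOrderModel {
    /**
     * stopOrder lists the nodes in the order they were stopped.
     * started lists the nodes that have been started again (all must appear in stopOrder).
     * Assumed rule: a node finishes booting if it was the last to stop, or if
     * some node that stopped after it is already running.
     */
    static Set<String> running(List<String> stopOrder, List<String> started) {
        Set<String> running = new LinkedHashSet<>();
        boolean changed = true;
        while (changed) {                      // repeat until no more nodes can boot
            changed = false;
            for (String node : started) {
                if (running.contains(node)) {
                    continue;
                }
                int stoppedAt = stopOrder.indexOf(node);
                boolean wasLast = stoppedAt == stopOrder.size() - 1;
                boolean laterPeerUp = false;
                for (String peer : stopOrder.subList(stoppedAt + 1, stopOrder.size())) {
                    if (running.contains(peer)) {
                        laterPeerUp = true;
                    }
                }
                if (wasLast || laterPeerUp) {
                    running.add(node);
                    changed = true;
                }
            }
        }
        return running;
    }

    public static void main(String[] args) {
        List<String> stopOrder = Arrays.asList("rabbit1", "rabbit2", "rabbit3");
        System.out.println(running(stopOrder, Arrays.asList("rabbit1")));
        System.out.println(running(stopOrder, Arrays.asList("rabbit1", "rabbit2")));
        System.out.println(running(stopOrder, Arrays.asList("rabbit1", "rabbit2", "rabbit3")));
    }
}
```

Under this model, no node is running after only rabbit1 or only rabbit1 and rabbit2 have been started, and all three come up once rabbit3 is started, which matches what the walkthrough observed.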

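  • That completes the RabbitMQ high-availability walkthrough. We went through the whole cycle from outage to recovery and gained a more hands-on understanding of RabbitMQ clusters.

  • This also concludes the "RabbitMQ on Docker in Four Parts" series. I hope it helps you as you learn RabbitMQ, and that it serves as a reference when you build your own customized RabbitMQ image with Docker.

Follow me on InfoQ: 程序员欣宸

You are not alone on the road of learning; 欣宸's original articles are with you all the way...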
