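Implementation Notes
ChaosBlade-Operator is ChaosBlade's implementation of experiment scenarios for the Kubernetes platform. It defines chaos experiments as standard Kubernetes CRDs, so you can define a ChaosBlade experiment the same way you define a Deployment or StatefulSet. Anyone familiar with kubectl and Kubernetes objects can easily create, update, and delete experiment scenarios. The scenarios can also be operated through the chaosblade CLI tool.
For how ChaosBlade works, you can refer to this article:
Installation
Install with Helm 3: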
    # Download the chart package
    [root@s5 k8s]# wget -qO chaosblade-operator-1.2.0-v3.tgz https://chaosblade.oss-cn-hangzhou.aliyuncs.com/agent/github/1.2.0/chaosblade-operator-1.2.0-v3.tgz

    # Create a namespace for chaosblade
    [root@s5 k8s]# kubectl create namespace chaosblade
    namespace/chaosblade created

    # Install ChaosBlade-Operator
    [root@s5 k8s]# ./helm-darwin-amd64/helm install chaos chaosblade-operator-1.2.0-v3.tgz --set webhook.enable=true --namespace=chaosblade
    W0621 14:39:16.362347 42437 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
    W0621 14:39:16.375507 42437 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
    W0621 14:39:18.394761 42437 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
    W0621 14:39:20.669546 42437 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
    W0621 14:39:20.674105 42437 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
    W0621 14:39:20.687832 42437 warnings.go:70] admissionregistration.k8s.io/v1beta1 MutatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 MutatingWebhookConfiguration
    W0621 14:39:20.734308 42437 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
    W0621 14:39:20.742406 42437 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
    W0621 14:39:20.804103 42437 warnings.go:70] admissionregistration.k8s.io/v1beta1 MutatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 MutatingWebhookConfiguration
    NAME: chaos
    LAST DEPLOYED: Mon Jun 21 14:39:20 2021
    NAMESPACE: chaosblade
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    Thank you for using chaosblade.

    # Check the installation result
    [root@s5 k8s]# kubectl get pod -n chaosblade | grep chaosblade
    chaosblade-operator-67779995db-cs2lv   1/1   Running   0   4m49s
    chaosblade-tool-58ch2                  1/1   Running   0   3m27s
    chaosblade-tool-qdwb6                  1/1   Running   0   2m57s
    chaosblade-tool-z8jds                  1/1   Running   0   2m57s
    [root@s5 k8s]#
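Once ChaosBlade-Operator starts, it deploys one chaosblade-tool Pod on every node plus a single chaosblade-operator Pod. If all of them are Running, the installation succeeded. The --set webhook.enable=true option above is needed for the Pod filesystem I/O fault experiment; if you do not plan to run that experiment, you can omit it.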
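Because one chaosblade-tool Pod is expected per node, a quick sanity check is to count the Running chaosblade-tool Pods and compare that with the number of nodes. The following is a minimal sketch, assuming the default column layout of `kubectl get pod` (NAME, READY, STATUS, ...); `count_running` is a hypothetical helper name, and the sample rows are copied from the listing above.

```shell
# count_running PREFIX: read `kubectl get pod` rows on stdin and print how many
# pods whose name starts with PREFIX have STATUS (column 3) equal to Running.
count_running() {
  awk -v prefix="$1" 'index($1, prefix) == 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Sample rows taken from the installation output above.
sample='chaosblade-operator-67779995db-cs2lv 1/1 Running 0 4m49s
chaosblade-tool-58ch2 1/1 Running 0 3m27s
chaosblade-tool-qdwb6 1/1 Running 0 2m57s
chaosblade-tool-z8jds 1/1 Running 0 2m57s'

printf '%s\n' "$sample" | count_running chaosblade-tool      # prints 3
printf '%s\n' "$sample" | count_running chaosblade-operator  # prints 1
```

On a live cluster the helper can be fed with `kubectl get pod -n chaosblade --no-headers`, and the result compared with `kubectl get nodes --no-headers | wc -l`.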
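Preparing the Sample Application
Set up a target for the experiments. This article uses the guestbook application; if you already have an application of your own, you do not need to install this small sample.
Installing the Sample Application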
    ## Add the Helm repository
    [root@s5 k8s]# helm repo add apphub-incubator https://apphub.aliyuncs.com/incubator/
    "apphub-incubator" has been added to your repositories

    ## Install the sample application
    [root@s5 k8s]# helm install guestbook apphub-incubator/guestbook --set service.type=NodePort --namespace=chaosblade
    NAME: guestbook
    LAST DEPLOYED: Mon Jun 21 22:42:41 2021
    NAMESPACE: chaosblade
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    1. Get the application URL by running these commands:
      export NODE_PORT=$(kubectl get --namespace chaosblade -o jsonpath="{.spec.ports[0].nodePort}" services guestbook)
      export NODE_IP=$(kubectl get nodes --namespace chaosblade -o jsonpath="{.items[0].status.addresses[0].address}")
      echo http://$NODE_IP:$NODE_PORT
    [root@s5 k8s]# echo http://$NODE_IP:$NODE_PORT
    http://172.31.184.225:32310
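The default Service type is LoadBalancer; it is set to NodePort here for easier access.
Verifying the Sample Application
Visit http://nodeip:nodeport, using the node IP and NodePort printed above.
On success you will see the guestbook page: whatever you type and submit is displayed on the page. It is a tiny application and nothing more.
Simulating Pod Network Packet Loss
Goal
Inject a 50% packet-loss fault into the redis-master-b96c9795b-4ghxq Pod for 10 minutes, affecting only traffic to the Pod whose IP is 10.100.53.195. In other words, every Pod other than 10.100.53.195 can still reach redis-master-b96c9795b-4ghxq normally.
Configuration
The current network layout is as follows: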
    [root@s5 chaosblade_scenarios]# kubectl get pods -n chaosblade -o wide
    NAME                                    READY   STATUS    RESTARTS   AGE    IP               NODE   NOMINATED NODE   READINESS GATES
    chaosblade-operator-67779995db-ns4qg    1/1     Running   0          100m   10.100.53.193    s7     <none>           <none>
    chaosblade-tool-bhgbk                   1/1     Running   0          100m   172.31.184.224   s7     <none>           <none>
    chaosblade-tool-mqmrc                   1/1     Running   0          100m   172.31.184.226   s6     <none>           <none>
    chaosblade-tool-xqgl5                   1/1     Running   0          100m   172.31.184.225   s5     <none>           <none>
    guestbook-7fcc447874-q248s              1/1     Running   0          98m    10.100.53.194    s7     <none>           <none>
    guestbook-7fcc447874-zpbn4              1/1     Running   0          98m    10.100.220.67    s6     <none>           <none>
    mall-tiny-deployment-85bdb875cf-zl6jw   1/1     Running   0          54m    10.100.220.71    s6     <none>           <none>
    redis-master-b96c9795b-4ghxq            1/1     Running   0          10m    10.100.53.196    s7     <none>           <none>
    redis-slave-6b8d456947-c6h64            1/1     Running   0          98m    10.100.53.195    s7     <none>           <none>
    redis-slave-6b8d456947-twgk9            1/1     Running   0          98m    10.100.220.68    s6     <none>           <none>
The configuration file is as follows.
    [root@s5 chaosblade_scenarios]# cat loss_pod_network_by_names.yaml
    apiVersion: chaosblade.io/v1alpha1
    kind: ChaosBlade
    metadata:
      name: loss-pod-network-by-names
    spec:
      experiments:
      - scope: pod
        target: network
        action: loss
        desc: "loss pod network by names"
        matchers:
        - name: names
          value:
          - "redis-master-b96c9795b-4ghxq"
        - name: namespace
          value:
          - "chaosblade"
        - name: interface
          value: ["eth0"]
        - name: percent
          value: ["50"]
        - name: timeout
          value: ["600"]
        - name: destination-ip
          value: ["10.100.53.195"]
    [root@s5 chaosblade_scenarios]#
Execution
    [root@s5 chaosblade_scenarios]# kubectl apply -f loss_pod_network_by_names.yaml
    chaosblade.chaosblade.io/loss-pod-network-by-names created
    [root@s5 chaosblade_scenarios]#
Verification
Open a shell in the Pod whose IP is 10.100.53.195 (redis-slave-6b8d456947-c6h64) and ping the redis-master Pod.
    [root@s5 chaosblade_scenarios]# kubectl exec -it redis-slave-6b8d456947-c6h64 bash -n chaosblade
    kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
    [ root@redis-slave-6b8d456947-c6h64:/data ]$ ping 10.100.53.196
    PING 10.100.53.196 (10.100.53.196) 56(84) bytes of data.
    64 bytes from 10.100.53.196: icmp_seq=1 ttl=63 time=0.112 ms
    64 bytes from 10.100.53.196: icmp_seq=2 ttl=63 time=0.096 ms
    64 bytes from 10.100.53.196: icmp_seq=3 ttl=63 time=0.098 ms
    64 bytes from 10.100.53.196: icmp_seq=4 ttl=63 time=0.091 ms
    64 bytes from 10.100.53.196: icmp_seq=7 ttl=63 time=0.092 ms
    64 bytes from 10.100.53.196: icmp_seq=8 ttl=63 time=0.084 ms
    64 bytes from 10.100.53.196: icmp_seq=13 ttl=63 time=0.085 ms
    64 bytes from 10.100.53.196: icmp_seq=14 ttl=63 time=0.088 ms
    64 bytes from 10.100.53.196: icmp_seq=17 ttl=63 time=0.086 ms
    ^C
    --- 10.100.53.196 ping statistics ---
    17 packets transmitted, 9 received, 47% packet loss, time 15999ms
    rtt min/avg/max/mdev = 0.084/0.092/0.112/0.012 ms
    [ root@redis-slave-6b8d456947-c6h64:/data ]$
The packet loss is indeed close to 50%.
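The percentage that ping reports is simply (transmitted - received) / transmitted, and with only 17 probes some deviation from the configured 50% is expected. A small sketch that recomputes the figure from a ping summary line follows; `loss_from_ping` is a hypothetical helper name, and it assumes the Linux ping summary format shown above.

```shell
# loss_from_ping: read ping output on stdin and print the integer loss
# percentage computed from the "N packets transmitted, M received" line.
loss_from_ping() {
  awk '/packets transmitted/ {
    tx = $1; rx = $4   # fields: "17" "packets" "transmitted," "9" "received," ...
    if (tx > 0) printf "%d\n", (tx - rx) * 100 / tx
  }'
}

# Summary line copied from the ping run above: (17 - 9) * 100 / 17 truncates to 47.
echo '17 packets transmitted, 9 received, 47% packet loss, time 15999ms' | loss_from_ping   # prints 47
```

A longer run, for example `ping -c 200 10.100.53.196 | loss_from_ping`, should give a figure closer to the configured rate.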
Enter the master Pod and inspect the queueing disciplines on its network interface.
    [root@s5 chaosblade_scenarios]# kubectl exec -it redis-master-b96c9795b-4ghxq bash -n chaosblade
    kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
    root@redis-master-b96c9795b-4ghxq:/data# tc qdisc ls dev eth0
    qdisc prio 1: root refcnt 2 bands 4 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
    qdisc netem 40: parent 1:4 limit 1000 loss 50%
    root@redis-master-b96c9795b-4ghxq:/data#
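The queueing rule has indeed been created.
As you can see, the simulation is implemented by manipulating qdiscs. If you are interested, look into how traffic control (tc) works on Linux.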
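For readers who want to experiment with tc directly, the `tc qdisc ls` output above can be reproduced by hand: a prio qdisc with a fourth band, and a netem qdisc attached to that band. The sketch below only prints the commands instead of running them, since they need root (or NET_ADMIN) inside the target network namespace. The u32 filter that steers traffic for the destination IP into band 1:4 is an assumption made for illustration, as is the helper name `netem_loss_cmds`; none of this is a statement of the exact commands ChaosBlade runs.

```shell
# netem_loss_cmds DEV PERCENT DST: print tc commands that would apply
# PERCENT% packet loss on DEV to traffic destined for DST only.
netem_loss_cmds() {
  dev=$1
  percent=$2
  dst=$3
  # Root prio qdisc with 4 bands. The default priomap only references
  # bands 1:1 to 1:3, so ordinary traffic never lands in 1:4.
  echo "tc qdisc add dev $dev root handle 1: prio bands 4"
  # netem on the fourth band drops the given share of packets.
  echo "tc qdisc add dev $dev parent 1:4 handle 40: netem loss ${percent}%"
  # Assumed filter: classify packets destined for DST into band 1:4.
  echo "tc filter add dev $dev parent 1: protocol ip prio 4 u32 match ip dst $dst flowid 1:4"
}

netem_loss_cmds eth0 50 10.100.53.195
```

If the printed commands are applied manually, `tc qdisc del dev eth0 root` removes the root qdisc together with the netem child and the filter.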
Recovery
    [root@s5 chaosblade_scenarios]# kubectl delete -f loss_pod_network_by_names.yaml
    chaosblade.chaosblade.io "loss-pod-network-by-names" deleted
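To confirm the fault is really gone, the `tc qdisc ls dev eth0` check from the verification step can be repeated; once the experiment has been deleted, the netem line should no longer appear. A minimal sketch, with `has_netem` as a hypothetical helper that inspects tc output on stdin:

```shell
# has_netem: succeed (exit 0) if the tc output on stdin lists a netem qdisc.
has_netem() {
  grep -q '^qdisc netem '
}

# Output captured while the experiment was running (copied from the verification step).
during='qdisc prio 1: root refcnt 2 bands 4 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc netem 40: parent 1:4 limit 1000 loss 50%'

if printf '%s\n' "$during" | has_netem; then
  echo "netem still active"
else
  echo "netem removed"
fi
```

Against the live Pod, the input would come from `kubectl exec redis-master-b96c9795b-4ghxq -n chaosblade -- tc qdisc ls dev eth0`.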
Something to think about: