[kube 022] Chaos Testing Framework: Litmus

zbyufei
Published: June 6, 2020

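Litmus is an open-source toolset for cloud-native chaos engineering. It provides tools to orchestrate chaos on Kubernetes, helping SREs find weaknesses in their deployments. SREs first run chaos experiments in staging and eventually in production to find bugs and vulnerabilities. Fixing those weaknesses improves the resilience of the system.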

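Litmus takes a cloud-native approach to creating, managing, and monitoring chaos. Chaos is orchestrated with the following Kubernetes Custom Resource Definitions (CRDs):

  • ChaosEngine: a resource that links a Kubernetes application or Kubernetes node to a ChaosExperiment. The Litmus chaos operator watches ChaosEngine resources and then invokes the referenced ChaosExperiments.

  • ChaosExperiment: a resource that groups the configuration parameters of a chaos experiment. ChaosExperiment CRs are created by the operator when a ChaosEngine invokes an experiment.

  • ChaosResult: a resource that holds the result of a chaos experiment. The chaos-exporter reads the results and exports metrics to a configured Prometheus server.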

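Chaos experiments are hosted on hub.litmuschaos.io. It is a central hub where application developers and cloud vendors share chaos experiments so that their users can use them to improve application resilience in production.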

In this article, we will run a chaos experiment to verify the resilience of a system.

Prerequisites

Prepare a Kubernetes cluster, along with kubectl and helm configured to connect to it.

Walkthrough

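The walkthrough consists of the following stages:

  • Install the Litmus Operator

  • Use Chaos Charts

  • Create a pod delete chaos experiment

  • Review the chaos experiment results

  • Review the chaos experiment events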

Install the Litmus Operator

Follow these steps to install Litmus in the cluster:

  1. Install litmus-operator-latest:

❯ kubectl apply -f "https://litmuschaos.github.io/pages/litmus-operator-latest.yaml"
namespace/litmus created
serviceaccount/litmus created
clusterrole.rbac.authorization.k8s.io/litmus created
clusterrolebinding.rbac.authorization.k8s.io/litmus created
deployment.apps/chaos-operator-ce created
customresourcedefinition.apiextensions.k8s.io/chaosengines.litmuschaos.io created
customresourcedefinition.apiextensions.k8s.io/chaosexperiments.litmuschaos.io created
customresourcedefinition.apiextensions.k8s.io/chaosresults.litmuschaos.io created
  2. Confirm that the Litmus chaos operator pod is running:

❯ kubectl get pods -n litmus
NAME READY STATUS RESTARTS AGE
chaos-operator-ce-7c76fc797f-7nm42 1/1 Running 0 67s
  3. Confirm the chaos CRDs:

❯ kubectl get crds -n litmus
chaosengines.litmuschaos.io 2020-06-05T13:08:05Z
chaosexperiments.litmuschaos.io 2020-06-05T13:08:05Z
chaosresults.litmuschaos.io 2020-06-05T13:08:05Z
  4. Confirm the chaos API resources:

❯ kubectl api-resources | grep chaos
chaosengines litmuschaos.io true ChaosEngine
chaosexperiments litmuschaos.io true ChaosExperiment
chaosresults litmuschaos.io true ChaosResult
  5. Confirm the chaos ClusterRole and ClusterRoleBinding:

❯ kubectl get clusterroles,clusterrolebinding | grep "litmus\|chaos"
clusterrole.rbac.authorization.k8s.io/litmus 2020-06-05T13:08:05Z
clusterrolebinding.rbac.authorization.k8s.io/litmus ClusterRole/litmus 6m39s

The Litmus chaos operator is now running in the cluster. Next, we need to deploy chaos experiments to test the resilience of cluster resources.
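
As a rough sketch, the manual checks in this section could be wrapped in two small shell helpers. The function names wait_for_operator and check_litmus_crds are hypothetical and chosen only for this example. The Deployment name, namespace, and CRD names come from the install output shown earlier, and the 120-second default timeout is an arbitrary assumption.

```shell
# Sketch only: helper names are illustrative and not part of Litmus.

# Block until the chaos-operator-ce Deployment finishes rolling out.
# $1 is an optional timeout (assumed default: 120s).
wait_for_operator() {
  kubectl rollout status deployment/chaos-operator-ce -n litmus \
    --timeout="${1:-120s}"
}

# Report whether each of the three Litmus CRDs exists.
# Returns non-zero if at least one is missing.
check_litmus_crds() {
  local crd missing=0
  for crd in chaosengines chaosexperiments chaosresults; do
    # "kubectl get crd <name>" exits non-zero when the CRD does not exist.
    if kubectl get crd "${crd}.litmuschaos.io" >/dev/null 2>&1; then
      echo "ok: ${crd}.litmuschaos.io"
    else
      echo "missing: ${crd}.litmuschaos.io"
      missing=1
    fi
  done
  return "$missing"
}
```

Running wait_for_operator followed by check_litmus_crds in a script makes it easy to stop early when the operator install did not complete.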

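Use Chaos Charts

Litmus Chaos Charts are used to install bundles of chaos experiments. The chaos experiments contain the actual chaos details. Follow these steps to install the Litmus Chaos Charts for the Litmus operator:

  1. In a browser, open Chaos Charts for Kubernetes at https://hub.litmuschaos.io, search for generic in the search field, and click the Generic Chaos chart.

  2. Click the Install All Experiments button.

  3. Copy the install link.

  4. Create a namespace for testing, for example nginx: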

❯ kubectl create namespace nginx
namespace/nginx created
  5. Install the chaos experiments:

❯ kubectl apply -f "https://hub.litmuschaos.io/api/chaos/1.4.0?file=charts/generic/experiments.yaml" -n nginx
chaosexperiment.litmuschaos.io/node-drain created
chaosexperiment.litmuschaos.io/disk-fill created
chaosexperiment.litmuschaos.io/pod-cpu-hog created
chaosexperiment.litmuschaos.io/pod-memory-hog created
chaosexperiment.litmuschaos.io/pod-network-corruption created
chaosexperiment.litmuschaos.io/pod-delete created
chaosexperiment.litmuschaos.io/pod-network-loss created
chaosexperiment.litmuschaos.io/disk-loss created
chaosexperiment.litmuschaos.io/pod-network-latency created
chaosexperiment.litmuschaos.io/node-cpu-hog created
chaosexperiment.litmuschaos.io/node-memory-hog created
chaosexperiment.litmuschaos.io/container-kill created
  6. Confirm that the chaos experiments are installed:

❯ kubectl get chaosexperiments -n nginx
NAME AGE
container-kill 4m6s
disk-fill 4m6s
disk-loss 4m6s
node-cpu-hog 4m6s
node-drain 4m6s
node-memory-hog 4m6s
pod-cpu-hog 4m6s
pod-delete 4m6s
pod-memory-hog 4m6s
pod-network-corruption 4m6s
pod-network-latency 4m6s
pod-network-loss 4m6s

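The generic Chaos Chart provides chaos experiments such as pod delete, network latency, network loss, and container kill. You can also install or build your own application-specific Chaos Charts to run application-specific chaos.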
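
Before relying on a particular experiment, it can help to confirm that it is actually installed in the target namespace. The following is a minimal sketch under that assumption. The helper name missing_experiments is hypothetical, and it depends on kubectl's -o name output, which prints one chaosexperiment.litmuschaos.io/<name> entry per line.

```shell
# Sketch only: print every requested experiment that is NOT installed
# in the given namespace. Prints nothing when all are present.
# Usage: missing_experiments <namespace> <experiment>...
missing_experiments() {
  local ns="$1"
  shift
  local installed exp
  # -o name yields lines such as "chaosexperiment.litmuschaos.io/pod-delete".
  installed="$(kubectl get chaosexperiments -n "$ns" -o name)"
  for exp in "$@"; do
    # -x matches the whole line, -F treats the pattern as a fixed string.
    printf '%s\n' "$installed" \
      | grep -qxF "chaosexperiment.litmuschaos.io/${exp}" \
      || echo "$exp"
  done
}
```

For example, missing_experiments nginx pod-delete container-kill prints nothing when both experiments from the list above are installed in the nginx namespace.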

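Create a Pod Delete Chaos Experiment

We will deploy a sample nginx application and run a Kubernetes chaos experiment against it. Follow these steps to test the effect of pod deletion on the cluster:

  1. Create nginx.yaml and deploy it with kubectl: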

❯ cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-deployment
  namespace: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
❯ kubectl apply -f nginx.yaml
deployment.apps/nginx-deployment created
  2. Confirm that the pods are running:

❯ kubectl get pod -n nginx
NAME READY STATUS RESTARTS AGE
nginx-deployment-558fc78868-269v5 1/1 Running 0 99s
nginx-deployment-558fc78868-cblpc 1/1 Running 0 99s
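  3. Add the annotation. The application must carry the annotation litmuschaos.io/chaos="true". As a safety measure, and as a way to reduce the blast radius, the chaos operator checks for this annotation before invoking a chaos experiment on the application.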

❯ kubectl annotate deploy nginx-deployment litmuschaos.io/chaos="true" -n nginx
deployment.apps/nginx-deployment annotated


Note: Litmus supports chaos on deployments, statefulsets, and daemonsets.



  4. Set up the ServiceAccount and RBAC:

$ cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-delete-sa
  namespace: nginx
  labels:
    name: pod-delete-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-delete-sa
  namespace: nginx
  labels:
    name: pod-delete-sa
rules:
- apiGroups: ["","litmuschaos.io","batch","apps"]
  resources: ["pods","deployments","pods/log","events","jobs","chaosengines","chaosexperiments","chaosresults"]
  verbs: ["create","list","get","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-delete-sa
  namespace: nginx
  labels:
    name: pod-delete-sa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-delete-sa
subjects:
- kind: ServiceAccount
  name: pod-delete-sa
  namespace: nginx
EOF
serviceaccount/pod-delete-sa created
role.rbac.authorization.k8s.io/pod-delete-sa created
rolebinding.rbac.authorization.k8s.io/pod-delete-sa created
  5. Review the chaos parameters in the experiment CR:

❯ kubectl get chaosexperiment pod-delete -o yaml -n nginx
apiVersion: litmuschaos.io/v1alpha1
description:
  message: |
    Deletes a pod belonging to a deployment/statefulset/daemonset
kind: ChaosExperiment
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"litmuschaos.io/v1alpha1","description":{"message":"Deletes a pod belonging to a deployment/statefulset/daemonset\n"},"kind":"ChaosExperiment","metadata":{"annotations":{},"name":"pod-delete","namespace":"nginx","version":"0.1.13"},"spec":{"definition":{"args":["-c","ansible-playbook ./experiments/generic/pod_delete/pod_delete_ansible_logic.yml -i /etc/ansible/hosts -vv; exit 0"],"command":["/bin/bash"],"env":[{"name":"ANSIBLE_STDOUT_CALLBACK","value":"default"},{"name":"TOTAL_CHAOS_DURATION","value":"15"},{"name":"RAMP_TIME","value":""},{"name":"KILL_COUNT","value":""},{"name":"FORCE","value":"true"},{"name":"CHAOS_INTERVAL","value":"5"},{"name":"LIB","value":""}],"image":"litmuschaos/ansible-runner:1.4.0","labels":{"name":"pod-delete"},"permissions":[{"apiGroups":["","apps","batch","litmuschaos.io"],"resources":["deployments","jobs","pods","pods/log","events","configmaps","chaosengines","chaosexperiments","chaosresults"],"verbs":["create","list","get","patch","update","delete"]},{"apiGroups":[""],"resources":["nodes"],"verbs":["get","list"]}],"scope":"Namespaced"}}}
  creationTimestamp: "2020-06-05T13:22:17Z"
  generation: 1
  managedFields:
  - apiVersion: litmuschaos.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:description:
        .: {}
        f:message: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        .: {}
        f:definition:
          .: {}
          f:args: {}
          f:command: {}
          f:env: {}
          f:image: {}
          f:labels:
            .: {}
            f:name: {}
          f:permissions: {}
          f:scope: {}
    manager: kubectl
    operation: Update
    time: "2020-06-05T13:22:17Z"
  name: pod-delete
  namespace: nginx
  resourceVersion: "3465"
  selfLink: /apis/litmuschaos.io/v1alpha1/namespaces/nginx/chaosexperiments/pod-delete
  uid: 1ea49dc0-2e58-41ff-9953-2e4844702aaa
spec:
  definition:
    args:
    - -c
    - ansible-playbook ./experiments/generic/pod_delete/pod_delete_ansible_logic.yml
      -i /etc/ansible/hosts -vv; exit 0
    command:
    - /bin/bash
    env:
    - name: ANSIBLE_STDOUT_CALLBACK
      value: default
    - name: TOTAL_CHAOS_DURATION
      value: "15"
    - name: RAMP_TIME
      value: ""
    - name: KILL_COUNT
      value: ""
    - name: FORCE
      value: "true"
    - name: CHAOS_INTERVAL
      value: "5"
    - name: LIB
      value: ""
    image: litmuschaos/ansible-runner:1.4.0
    labels:
      name: pod-delete
    permissions:
    - apiGroups:
      - ""
      - apps
      - batch
      - litmuschaos.io
      resources:
      - deployments
      - jobs
      - pods
      - pods/log
      - events
      - configmaps
      - chaosengines
      - chaosexperiments
      - chaosresults
      verbs:
      - create
      - list
      - get
      - patch
      - update
      - delete
    - apiGroups:
      - ""
      resources:
      - nodes
      verbs:
      - get
      - list
    scope: Namespaced
  6. Run the chaos:

cat <<EOF | kubectl apply -f -
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: nginx
spec:
  appinfo:
    appns: 'nginx'
    applabel: 'app=nginx'
    appkind: 'deployment'
  # It can be true/false
  annotationCheck: 'true'
  # It can be active/stop
  engineState: 'active'
  #ex. values: ns1:name=percona,ns2:run=nginx
  auxiliaryAppInfo: ''
  chaosServiceAccount: pod-delete-sa
  monitoring: false
  # It can be delete/retain
  jobCleanUpPolicy: 'delete'
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            # set chaos duration (in sec) as desired
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            # set chaos interval (in sec) as desired
            - name: CHAOS_INTERVAL
              value: '10'
            # pod failures without '--force' & default terminationGracePeriodSeconds
            - name: FORCE
              value: 'false'
EOF
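
Since the operator checks the annotation and the experiment runs under the pod-delete-sa ServiceAccount, a small pre-flight check before applying a ChaosEngine can catch both problems early. The sketch below is one way to do this and is not a Litmus feature. The helper names chaos_preflight and abort_chaos are hypothetical. The kubectl auth can-i call impersonates the ServiceAccount, so it assumes the calling user has impersonation rights. The abort helper assumes that patching engineState to stop, the other value listed in the manifest comment above, halts a running engine.

```shell
# Sketch only: helper names are illustrative and not part of Litmus.

# Usage: chaos_preflight <namespace> <deployment> <serviceaccount>
chaos_preflight() {
  local ns="$1" deploy="$2" sa="$3" annotation
  # Dots inside the annotation key must be escaped in a jsonpath expression.
  annotation="$(kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.metadata.annotations.litmuschaos\.io/chaos}')"
  if [ "$annotation" != "true" ]; then
    echo "deployment/${deploy} is not annotated with litmuschaos.io/chaos=true" >&2
    return 1
  fi
  # "kubectl auth can-i" exits non-zero when the permission is denied.
  if ! kubectl auth can-i delete pods -n "$ns" \
      --as="system:serviceaccount:${ns}:${sa}" >/dev/null; then
    echo "serviceaccount/${sa} cannot delete pods in ${ns}" >&2
    return 1
  fi
}

# Usage: abort_chaos <chaosengine> <namespace>
# Assumption: setting engineState to "stop" halts the running experiment.
abort_chaos() {
  local engine="$1" ns="$2"
  kubectl patch chaosengine "$engine" -n "$ns" --type merge \
    -p '{"spec":{"engineState":"stop"}}'
}
```

With the names used in this article, the calls would be chaos_preflight nginx nginx-deployment pod-delete-sa and abort_chaos nginx-chaos nginx.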

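Review the Chaos Experiment Results

Chaos experiments run as Kubernetes Jobs, and the affected pods are deleted by the chaos runner according to the experiment definition.

Follow these steps to review the results of our chaos experiment:

  1. Watch the experiment in progress: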

$ watch -n 1 kubectl get pods -n nginx
Every 1.0s: kubectl get pods -n nginx 192.168.1.102: Sat Jun 6 01:35:22 2020
NAME READY STATUS RESTARTS AGE
nginx-chaos-runner 1/1 Running 0 31s
nginx-deployment-558fc78868-f4tcd 0/1 Terminating 0 3m54s
nginx-deployment-558fc78868-g6wjm 0/1 ContainerCreating 0 1s
nginx-deployment-558fc78868-wbzd2 1/1 Running 0 3m38s
pod-delete-xb472u-rvjc8 1/1 Running 0 24s
  2. List the results:

❯ kubectl get chaosresults -n nginx
NAME AGE
nginx-chaos-pod-delete 11m
  3. View the result of the nginx-chaos-pod-delete experiment:

❯ kubectl describe chaosresults nginx-chaos-pod-delete -n nginx
Name:         nginx-chaos-pod-delete
Namespace:    nginx
Labels:       chaosUID=7181dd32-dcd2-44c8-b9a1-62f76b4426d4
              type=ChaosResult
Annotations:  API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2020-06-05T17:25:49Z
  Generation:          6
  Managed Fields:
    API Version:  litmuschaos.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:chaosUID:
          f:type:
      f:spec:
        .:
        f:engine:
        f:experiment:
      f:status:
        .:
        f:experimentstatus:
          .:
          f:failStep:
          f:phase:
          f:verdict:
    Manager:         kubectl
    Operation:       Update
    Time:            2020-06-05T17:37:39Z
  Resource Version:  8339
  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/nginx/chaosresults/nginx-chaos-pod-delete
  UID:               bffa195d-4bf5-47a3-9a6c-7f2287107ea5
Spec:
  Engine:      nginx-chaos
  Experiment:  pod-delete
Status:
  Experimentstatus:
    Fail Step:  N/A
    Phase:      Completed
    Verdict:    Pass
Events:         <none>
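
For scripted runs, the phase and verdict can be read directly from the ChaosResult instead of scanning the describe output. The following is a minimal sketch. The helper names chaos_result_field and wait_for_chaos_result are hypothetical. The field path status.experimentstatus is taken from the output above, and the result name is assumed to follow the <engine>-<experiment> pattern seen in nginx-chaos-pod-delete. The retry count and delay defaults are arbitrary.

```shell
# Sketch only: helper names are illustrative and not part of Litmus.

# Print one field (for example phase or verdict) from a ChaosResult status.
chaos_result_field() {
  local result="$1" ns="$2" field="$3"
  kubectl get chaosresult "$result" -n "$ns" \
    -o jsonpath="{.status.experimentstatus.${field}}"
}

# Poll until the phase is "Completed", then print the verdict.
# Usage: wait_for_chaos_result <engine> <experiment> <namespace> [tries] [delay_sec]
wait_for_chaos_result() {
  local engine="$1" experiment="$2" ns="$3" tries="${4:-30}" delay="${5:-10}"
  # Assumed naming pattern: <engine>-<experiment>.
  local result="${engine}-${experiment}" i=0 phase
  while [ "$i" -lt "$tries" ]; do
    # The ChaosResult may not exist yet, so errors are tolerated here.
    phase="$(chaos_result_field "$result" "$ns" phase 2>/dev/null || true)"
    if [ "$phase" = "Completed" ]; then
      chaos_result_field "$result" "$ns" verdict
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for ${result}" >&2
  return 1
}
```

A CI job could then fail the build when the value printed by wait_for_chaos_result nginx-chaos pod-delete nginx is anything other than Pass.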

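Review the Chaos Experiment Events

You can inspect the events in the nginx namespace to understand and reconstruct what happened during the chaos experiment: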

❯ kubectl get events -n nginx --sort-by='{.lastTimestamp}'
LAST SEEN TYPE REASON OBJECT MESSAGE
13m Normal ChaosInject chaosengine/nginx-chaos Injecting pod-delete chaos on nginx-deployment-558fc78868-s26cl pod
13m Normal Scheduled pod/nginx-deployment-558fc78868-sgswp Successfully assigned nginx/nginx-deployment-558fc78868-sgswp to minikube
13m Normal Killing pod/nginx-deployment-558fc78868-s26cl Stopping container nginx
13m Normal SuccessfulCreate replicaset/nginx-deployment-558fc78868 (combined from similar events): Created pod: nginx-deployment-558fc78868-sgswp
13m Normal Pulled pod/nginx-deployment-558fc78868-sgswp Container image "nginx" already present on machine
13m Normal Started pod/nginx-deployment-558fc78868-sgswp Started container nginx
13m Normal Created pod/nginx-deployment-558fc78868-sgswp Created container nginx
12m Normal PostChaosCheck chaosengine/nginx-chaos AUT is Running successfully
12m Normal Summary chaosengine/nginx-chaos pod-delete Experiment Passed!
12m Normal Completed job/pod-delete-xb472u Job completed
12m Normal ExperimentJobCleanUp chaosengine/nginx-chaos Experiment Job 'pod-delete-xb472u' is deleted
12m Normal Killing pod/nginx-chaos-runner Stopping container chaos-runner
12m Normal ChaosEngineCompleted chaosengine/nginx-chaos Chaos Engine completed, will delete or retain the resources according to jobCleanUpPolicy
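
In a busy namespace, the chaos-related events are easier to read when filtered down to those emitted for the ChaosEngine itself. The sketch below does this with an event field selector. The helper name chaos_engine_events is hypothetical. It only returns events whose involved object is the named ChaosEngine, so pod and replicaset events like those in the listing above are intentionally left out.

```shell
# Sketch only: list events recorded against one ChaosEngine, oldest first.
# Usage: chaos_engine_events <chaosengine> <namespace>
chaos_engine_events() {
  local engine="$1" ns="$2"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=ChaosEngine,involvedObject.name=${engine}" \
    --sort-by='{.lastTimestamp}'
}
```

For the run in this article, chaos_engine_events nginx-chaos nginx would be expected to show entries such as ChaosInject, PostChaosCheck, and Summary.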
