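The built-in Kubernetes StatefulSet assigns consecutive ordinals to Pods. For example, with 3 replicas the Pods are pod-0, pod-1, and pod-2. When scaling, Pods can only be added or removed at the tail: scaling out to 4 replicas adds pod-3, and scaling in to 2 replicas removes pod-2.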

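When local storage is used, Pods are bound to the storage resources of their Nodes and cannot be scheduled freely. The built-in StatefulSet cannot handle two cases: you want to delete a Pod in the middle of the sequence so that you can maintain its Node, but no other Node is available to migrate it to; or a Pod has failed and you want to delete it and start a replacement Pod with a different ordinal.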

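The enhanced StatefulSet controller (Advanced StatefulSet) is built on the built-in StatefulSet and adds the ability to control Pod ordinals freely. This article describes how to use it with TiDB Operator.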


Load the Advanced StatefulSet CRD file:

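  • Kubernetes versions earlier than 1.16: manifests/advanced-statefulset-crd.v1beta1.yaml

  • Kubernetes 1.16 and later: manifests/advanced-statefulset-crd.v1.yaml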
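The version split above can be sketched as a small selection helper. This is only an illustration: the function name `crd_manifest_url` is hypothetical, and the URL layout is assumed from the v1beta1 URL used in this article plus the manifest file names mentioned in the comments of the chart's values.yaml.

```python
# Hypothetical helper (not part of TiDB Operator): pick the CRD manifest
# that matches the Kubernetes version of the cluster.
BASE_URL = "https://raw.githubusercontent.com/pingcap/tidb-operator"


def crd_manifest_url(k8s_major, k8s_minor, operator_version="v1.3.9"):
    """Return the Advanced StatefulSet CRD manifest URL for a cluster version."""
    # Kubernetes 1.16 and later take the v1 manifest; older versions take v1beta1.
    use_v1 = (k8s_major, k8s_minor) >= (1, 16)
    name = (
        "advanced-statefulset-crd.v1.yaml"
        if use_v1
        else "advanced-statefulset-crd.v1beta1.yaml"
    )
    return f"{BASE_URL}/{operator_version}/manifests/{name}"


# Print the command you would run for two example cluster versions.
print("kubectl apply -f " + crd_manifest_url(1, 15))
print("kubectl apply -f " + crd_manifest_url(1, 22))
```

The deprecation warning printed by kubectl for the v1beta1 manifest says that API is unavailable in v1.22+, which is one reason to select the manifest by cluster version rather than hard-coding it.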


[root@k8s-master tidb]# kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.9/manifests/advanced-statefulset-crd.v1beta1.yaml
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/statefulsets.apps.pingcap.com configured

Enable the AdvancedStatefulSet feature in the values.yaml of the TiDB Operator chart:

## Change 1
features:
- AdvancedStatefulSet=true
advancedStatefulset:
  create: true

## Change 2:
advancedStatefulset:
  create: true
  image: pingcap/advanced-statefulset:v0.4.0
  imagePullPolicy: IfNotPresent
  serviceAccount: advanced-statefulset-controller
  logLevel: 4
  replicas: 1
  resources:
    limits:
      cpu: 500m
      memory: 300Mi
    requests:
      cpu: 200m
      memory: 50Mi



[root@k8s-master tidb]# helm inspect values pingcap/tidb-operator --version=v1.3.9 > /home/tidb/tidb-operator/values-tidb-operator.yaml
[root@k8s-master tidb]# vim tidb-operator/values-tidb-operator.yaml
[root@k8s-master tidb-operator]# cat  values-tidb-operator.yaml
# Default values for tidb-operator

# clusterScoped is whether tidb-operator should manage kubernetes cluster wide tidb clusters
# Also see rbac.create, controllerManager.serviceAccount, scheduler.create and controllerManager.clusterPermissions.
clusterScoped: true

# Also see clusterScoped and controllerManager.serviceAccount
rbac:
  create: true

# timezone is the default system timzone
timezone: UTC

# operatorImage is TiDB Operator image
operatorImage: pingcap/tidb-operator:v1.3.9
imagePullPolicy: IfNotPresent
# imagePullSecrets: []

# tidbBackupManagerImage is tidb backup manager image
tidbBackupManagerImage: pingcap/tidb-backup-manager:v1.3.9

#
# Enable or disable tidb-operator features:
#
#   StableScheduling (default: true)
#     Enable stable scheduling of tidb servers.
#
#   AdvancedStatefulSet (default: false)
#     If enabled, tidb-operator will use AdvancedStatefulSet to manage pods
#     instead of Kubernetes StatefulSet.
#     It's ok to turn it on if this feature is not enabled. However it's not ok
#     to turn it off when the tidb-operator already uses AdvancedStatefulSet to
#     manage pods. This is in alpha phase.
#
features:
- AdvancedStatefulSet=true
advancedStatefulset:
  create: true
# - AdvancedStatefulSet=false
# - StableScheduling=true
# - AutoScaling=false

appendReleaseSuffix: false

controllerManager:
  create: true
  # With rbac.create=false, the user is responsible for creating this account
  # With rbac.create=true, this service account will be created
  # Also see rbac.create and clusterScoped
  serviceAccount: tidb-controller-manager

  # clusterPermissions are some cluster scoped permissions that will be used even if `clusterScoped: false`.
  # the default value of these fields is `true`. if you want them to be `false`, you MUST set them to `false` explicitly.
  clusterPermissions:
    nodes: true
    persistentvolumes: true
    storageclasses: true

  logLevel: 2
  replicas: 1
  resources:
    requests:
      cpu: 80m
      memory: 50Mi
#  # REF: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
#  priorityClassName: system-cluster-critical
#

  # REF: https://pkg.go.dev/k8s.io/client-go/tools/leaderelection#LeaderElectionConfig
  ## leaderLeaseDuration is the duration that non-leader candidates will wait to force acquire leadership
  # leaderLeaseDuration: 15s
  ## leaderRenewDeadline is the duration that the acting master will retry refreshing leadership before giving up
  # leaderRenewDeadline: 10s
  ## leaderRetryPeriod is the duration the LeaderElector clients should wait between tries of actions
  # leaderRetryPeriod: 2s

  ## number of workers that are allowed to sync concurrently. default 5
  # workers: 5

  # autoFailover is whether tidb-operator should auto failover when failure occurs
  autoFailover: true
  # pd failover period default(5m)
  pdFailoverPeriod: 5m
  # tikv failover period default(5m)
  tikvFailoverPeriod: 5m
  # tidb failover period default(5m)
  tidbFailoverPeriod: 5m
  # tiflash failover period default(5m)
  tiflashFailoverPeriod: 5m
  # dm-master failover period default(5m)
  dmMasterFailoverPeriod: 5m
  # dm-worker failover period default(5m)
  dmWorkerFailoverPeriod: 5m
  ## affinity defines pod scheduling rules,affinity default settings is empty.
  ## please read the affinity document before set your scheduling rule:
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}
  ## nodeSelector ensure pods only assigning to nodes which have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}
  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb-operator
  #   effect: "NoSchedule"
  ## Selector (label query) to filter on, make sure that this controller manager only manages the custom resources that match the labels
  ## refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#equality-based-requirement
  selector: []
  # - canary-release=v1
  # - k1==v1
  # - k2!=v2

  # SecurityContext is security config of this component, it will set template.spec.securityContext
  # Refer to https://kubernetes.io/docs/tasks/configure-pod-container/security-context
  securityContext: {}
  # runAsUser: 1000
  # runAsGroup: 2000
  # fsGroup: 2000
  # PodAnnotations will set template.metadata.annotations
  # Refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
  podAnnotations: {}

scheduler:
  create: true
  # With rbac.create=false, the user is responsible for creating this account
  # With rbac.create=true, this service account will be created
  # Also see rbac.create and clusterScoped
  serviceAccount: tidb-scheduler
  logLevel: 2
  replicas: 1
  schedulerName: tidb-scheduler
  resources:
    limits:
      cpu: 250m
      memory: 150Mi
    requests:
      cpu: 80m
      memory: 50Mi
  kubeSchedulerImageName: k8s.gcr.io/kube-scheduler
  # This will default to matching your kubernetes version
  # kubeSchedulerImageTag:
  ## affinity defines pod scheduling rules,affinity default settings is empty.
  ## please read the affinity document before set your scheduling rule:
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}
  ## nodeSelector ensure pods only assigning to nodes which have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}
  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb-operator
  #   effect: "NoSchedule"
  #
  # SecurityContext is security config of this component, it will set template.spec.securityContext
  # Refer to https://kubernetes.io/docs/tasks/configure-pod-container/security-context
  securityContext: {}
  # runAsUser: 1000
  # runAsGroup: 2000
  # fsGroup: 2000
  # PodAnnotations will set template.metadata.annotations
  # Refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
  podAnnotations: {}

  # additional annotations for the configmap, mainly to prevent spinnaker versioning the cm
  configmapAnnotations: {}

# When AdvancedStatefulSet feature is enabled, you must install
# AdvancedStatefulSet controller.
# Note that AdvancedStatefulSet CRD must be installed manually via the following
# command:
#   kubectl apply -f manifests/advanced-statefulset-crd.v1beta1.yaml # k8s version < 1.16.0
#   kubectl apply -f manifests/advanced-statefulset-crd.v1.yaml # k8s version >= 1.16.0
advancedStatefulset:
  create: true
  image: pingcap/advanced-statefulset:v0.4.0
  imagePullPolicy: IfNotPresent
  serviceAccount: advanced-statefulset-controller
  logLevel: 4
  replicas: 1
  resources:
    limits:
      cpu: 500m
      memory: 300Mi
    requests:
      cpu: 200m
      memory: 50Mi
  ## affinity defines pod scheduling rules,affinity default settings is empty.
  ## please read the affinity document before set your scheduling rule:
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}
  ## nodeSelector ensure pods only assigning to nodes which have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}
  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb-operator
  #   effect: "NoSchedule"
  #
  # SecurityContext is security config of this component, it will set template.spec.securityContext
  # Refer to https://kubernetes.io/docs/tasks/configure-pod-container/security-context
  securityContext: {}
  # runAsUser: 1000
  # runAsGroup: 2000
  # fsGroup: 2000

admissionWebhook:
  create: false
  replicas: 1
  serviceAccount: tidb-admission-webhook
  logLevel: 2
  rbac:
    create: true
  ## validation webhook would check the given request for the specific resource and operation
  validation:
    ## statefulsets hook would check requests for updating tidbcluster's statefulsets
    ## If enabled it, the statefulsets of tidbcluseter would update in partition by tidbcluster's annotation
    statefulSets: false
    ## validating hook validates the correctness of the resources under pingcap.com group
    pingcapResources: false
  ## mutation webhook would mutate the given request for the specific resource and operation
  mutation:
    ## defaulting hook set default values for the the resources under pingcap.com group
    pingcapResources: true
  ## failurePolicy are applied to ValidatingWebhookConfiguration which affect tidb-admission-webhook
  ## refer to https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy
  failurePolicy:
    ## the validation webhook would check the request of the given resources.
    ## If the kubernetes api-server version >= 1.15.0, we recommend the failurePolicy as Fail, otherwise, as Ignore.
    validation: Ignore
    ## the mutation webhook would mutate the request of the given resources.
    ## If the kubernetes api-server version >= 1.15.0, we recommend the failurePolicy as Fail, otherwise, as Ignore.
    mutation: Ignore
  ## tidb-admission-webhook deployed as kubernetes apiservice server
  ## refer to https://github.com/openshift/generic-admission-server
  apiservice:
    ## apiservice config
    ## refer to https://kubernetes.io/docs/tasks/access-kubernetes-api/configure-aggregation-layer/#contacting-the-extension-apiserver
    insecureSkipTLSVerify: true
    ## The Secret includes the TLS ca, cert and key for the `tidb-admission-webook.<Release Namespace>.svc` Service.
    ## If insecureSkipTLSVerify is true, this would be ignored.
    ## You can create the tls secret by:
    ## kubectl create secret generic <secret-name> --namespace=<release-namespace> --from-file=tls.crt=<path-to-cert> --from-file=tls.key=<path-to-key> --from-file=ca.crt=<path-to-ca>
    tlsSecret: ""
    ## The caBundle for the webhook apiservice, you could get it by the secret you created previously:
    ## kubectl get secret <secret-name> --namespace=<release-namespace> -o=jsonpath='{.data.ca\.crt}'
    caBundle: ""
  ## certProvider indicate the key and cert for the webhook configuration to communicate with `kubernetes.default` service.
  ## If your kube-apiserver's version >= 1.13.0, you can leave cabundle empty and the kube-apiserver
  ## would trust the roots on the apiserver.
  ## refer to https://github.com/kubernetes/api/blob/master/admissionregistration/v1/types.go#L529
  ## or you can get the cabundle by:
  ## kubectl get configmap -n kube-system extension-apiserver-authentication -o=jsonpath='{.data.client-ca-file}' | base64 | tr -d '\n'
  cabundle: ""
  # SecurityContext is security config of this component, it will set template.spec.securityContext
  # Refer to https://kubernetes.io/docs/tasks/configure-pod-container/security-context
  securityContext: {}
  # runAsUser: 1000
  # runAsGroup: 2000
  # fsGroup: 2000
  ## nodeSelector ensures that pods are only scheduled to nodes that have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}
  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb-operator
  #   effect: "NoSchedule"
  #

Apply the updated TiDB Operator configuration:

[root@k8s-master tidb]# helm upgrade tidb-operator pingcap/tidb-operator --namespace=tidb-admin --version=v1.3.9 -f ./tidb-operator/values-tidb-operator.yaml && kubectl get po -n tidb-admin -l app.kubernetes.io/name=tidb-operator
Release "tidb-operator" has been upgraded. Happy Helming!
NAME: tidb-operator
LAST DEPLOYED: Fri Dec  9 16:31:42 2022
NAMESPACE: tidb-admin
STATUS: deployed
REVISION: 7
TEST SUITE: None
NOTES:
Make sure tidb-operator components are running:

    kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator
NAME                                               READY   STATUS    RESTARTS   AGE
advanced-statefulset-controller-67885c5dd9-lfmbl   1/1     Running   0          27m
tidb-controller-manager-d5fc64f85-nlv4k            1/1     Running   1          34m
tidb-scheduler-566f48d4bd-82rrd                    2/2     Running   0          34m
[root@k8s-master tidb]# kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator
NAME                                               READY   STATUS    RESTARTS   AGE
advanced-statefulset-controller-67885c5dd9-lfmbl   1/1     Running   0          28m
tidb-controller-manager-d5fc64f85-nlv4k            1/1     Running   1          34m
tidb-scheduler-566f48d4bd-82rrd                    2/2     Running   0          34m

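When the AdvancedStatefulSet feature is enabled, TiDB Operator converts the existing StatefulSet objects into AdvancedStatefulSet objects. However, TiDB Operator does not support automatically converting AdvancedStatefulSet objects back into built-in Kubernetes StatefulSet objects after the feature is turned off.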


View AdvancedStatefulSet objects

AdvancedStatefulSet uses exactly the same data format as StatefulSet but is implemented as a CRD, with the short name asts. You can list the objects in a namespace as follows.

[root@k8s-master tidb]# kubectl get asts -ntidb
NAME                                DESIRED   CURRENT   AGE
lqb-pd                              1         1         38m
lqb-tidb                            1         1         38m
lqb-tikv                            1         1         38m
tidbmonitor-monitor                 1                   38m
tidbngmonitoring-yz-ng-monitoring   1                   38m
yz-pd                               1         1         38m
yz-tidb                             2         2         38m
yz-tiflash                          2         2         38m
yz-tikv                             3         3         38m
[root@k8s-master tidb]# kubectl get sts -ntidb
NAME                                READY   AGE
tidbmonitor-monitor                 1/1     46h
tidbngmonitoring-yz-ng-monitoring   1/1     2d

Scale in specific Pods of a cluster

With AdvancedStatefulSet, when you scale in a TidbCluster you can, in addition to reducing the replica count, set annotations that choose which Pod ordinals of the PD, TiKV, or TiDB component are removed:

metadata:
  annotations:
    tikv.tidb.pingcap.com/delete-slots: '[1]'

To scale in, you must change two settings together: replicas and annotations.

The following takes scaling in tikv1 and pd1 (the TiKV and PD Pods with ordinal 1) as an example:
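As a rough sketch of such a change, the snippet below builds one JSON merge patch that lowers replicas and sets the delete-slots annotations together. The helper name `scale_in_patch` and the replica counts are hypothetical; only the annotation keys and the `'[1]'` value format come from this article, and `spec.<component>.replicas` is assumed to be where the TidbCluster keeps the replica count.

```python
import json

SUPPORTED = ("pd", "tidb", "tikv")


def scale_in_patch(changes):
    """Build a JSON merge patch from {component: (new_replicas, delete_slots)}."""
    annotations = {}
    spec = {}
    for component, (new_replicas, delete_slots) in changes.items():
        if component not in SUPPORTED:
            raise ValueError(f"unsupported component: {component}")
        # The annotation value is a JSON integer array stored as a string.
        annotations[f"{component}.tidb.pingcap.com/delete-slots"] = json.dumps(
            sorted(delete_slots), separators=(",", ":")
        )
        spec[component] = {"replicas": new_replicas}
    return {"metadata": {"annotations": annotations}, "spec": spec}


# Hypothetical numbers: TiKV and PD each go from 3 to 2 replicas,
# removing the Pod with ordinal 1 of each component.
patch = scale_in_patch({"tikv": (2, [1]), "pd": (2, [1])})
print(json.dumps(patch, indent=2))
```

The printed JSON could then be passed to something like `kubectl patch tc <cluster-name> -n <namespace> --type merge -p '<json>'`, or the same fields can be edited by hand with `kubectl edit tc`. Sending both fields in one patch avoids a window where replicas has changed but the annotation has not.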




The supported annotations are:

  • pd.tidb.pingcap.com/delete-slots: the ordinals of the PD Pods to delete.

  • tidb.tidb.pingcap.com/delete-slots: the ordinals of the TiDB Pods to delete.

  • tikv.tidb.pingcap.com/delete-slots: the ordinals of the TiKV Pods to delete.

The annotation value is a JSON array of integers, such as [0], [0,1], or [1,3].
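To make the effect of a delete-slots value concrete, here is a minimal sketch that parses the annotation and works out which Pod ordinals remain. It assumes the controller keeps the first `replicas` ordinals counting up from 0 while skipping the deleted slots; that rule is an interpretation for illustration, not code taken from the controller, and both function names are hypothetical.

```python
import json


def parse_delete_slots(annotation_value):
    """Parse a delete-slots value such as '[1,3]' into a set of ordinals."""
    slots = json.loads(annotation_value)
    valid = isinstance(slots, list) and all(
        type(s) is int and s >= 0 for s in slots
    )
    if not valid:
        raise ValueError("delete-slots must be a JSON array of non-negative integers")
    return set(slots)


def pod_ordinals(replicas, delete_slots):
    """Return `replicas` ordinals, counting up from 0 and skipping deleted slots."""
    ordinals = []
    candidate = 0
    while len(ordinals) < replicas:
        if candidate not in delete_slots:
            ordinals.append(candidate)
        candidate += 1
    return ordinals


print(pod_ordinals(3, set()))                      # built-in behaviour: 0, 1, 2
print(pod_ordinals(2, parse_delete_slots("[1]")))  # ordinal 1 removed: 0, 2
print(pod_ordinals(3, parse_delete_slots("[1]")))  # slot 1 stays empty: 0, 2, 3
```

Under this assumption, raising replicas while a slot is still listed adds a Pod at the next free ordinal instead of refilling the gap.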


To scale out, reverse the steps above: increase replicas and remove the ordinals you added to the annotations (the delete-slots annotations can be left empty or deleted entirely).
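The reverse operation can be sketched the same way. The helper `scale_out_patch` is hypothetical, and `spec.<component>.replicas` is again an assumed field path; the snippet relies on the JSON merge patch rule (RFC 7396) that a null value removes a key, which is how the annotation is deleted entirely here.

```python
import json


def scale_out_patch(component, new_replicas, clear_delete_slots=True):
    """Build a JSON merge patch that raises replicas and optionally drops delete-slots."""
    patch = {"spec": {component: {"replicas": new_replicas}}}
    if clear_delete_slots:
        # In a JSON merge patch, setting a key to null deletes that key.
        key = f"{component}.tidb.pingcap.com/delete-slots"
        patch["metadata"] = {"annotations": {key: None}}
    return patch


# Hypothetical numbers: bring TiKV back from 2 to 3 replicas and drop the annotation.
print(json.dumps(scale_out_patch("tikv", 3)))
```

Passing `clear_delete_slots=False` keeps the annotation in place, which matters when you want the new Pod to take a fresh ordinal rather than reuse the removed one.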



