k8s garbage collector analysis (2): processing logic

良凯尔
Published: 2 hours ago

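Introduction to the garbage collector

The Kubernetes garbage collector runs inside kube-controller-manager. It reclaims resource objects in Kubernetes: it watches resource object events, keeps the dependency relationships between objects up to date, and decides from an object's deletion policy whether the objects associated with it should be deleted too.

More precisely, when an owner is deleted with a cascading deletion policy, the dependent objects of that owner are deleted along with it.

To track these relationships, the garbage collector watches resource object events and uses the ownerReferences values in each object to build the dependency relationships between objects, that is, the relationships between owner and dependent.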

About owner and dependent

Take the creation of a deployment object as an example.

After a deployment object is created, kube-controller-manager creates a replicaset object for it and automatically records the deployment's information in the replicaset's ownerReferences. In the example below, the owner of replicaset test-1-59d7f45ffb is deployment test-1, and the dependent of deployment test-1 is replicaset test-1-59d7f45ffb.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-1
  namespace: test
  uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
...

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: test-1-59d7f45ffb
  namespace: test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: test-1
    uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
  uid: 386c380b-490e-470b-a33f-7d5b0bf945fb
...

Likewise, after the replicaset object is created, kube-controller-manager creates pod objects for it, and the replicaset's information is recorded in each pod's ownerReferences: the replicaset is the owner of the pods, and the pods are dependents of the replicaset.

The ownerReferences values in an object define the relationship between owner and dependent.

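garbage collector architecture

The figure below shows the detailed architecture and core processing logic of the garbage collector.

[Figure: garbage collector architecture and core processing logic]

The most important code in the garbage collector lives in two files, garbagecollector.go and graph_builder.go.

The garbage collector is made up of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete, and attemptToOrphan):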


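1 graph

(1) uidToNode: the object dependency graph, maintained by GraphBuilder, which records the dependency relationships between all objects. Every k8s object corresponds to one node in the graph, and every node keeps a list of owners and a list of dependents.

Example: given deployment A, replicaset B (owner: deployment A), and pod C (owner: replicaset B), the dependency graph looks like this:

3 nodes: A, B, C
A: no owner; dependents list contains B
B: owners list contains A; dependents list contains C
C: owners list contains B; no dependents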
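The A/B/C example can be sketched in a few lines of Go. This is only an illustration under simplifying assumptions: the `node` and `graph` types and the `insert` and `dependentUIDs` helpers are hypothetical stand-ins, not the real types from pkg/controller/garbagecollector, and owners are assumed to be inserted before their dependents.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a simplified, hypothetical stand-in for the GC graph node.
type node struct {
	uid        string
	owners     []string           // UIDs of this node's owners
	dependents map[*node]struct{} // nodes that name this node as an owner
}

// graph maps a UID to its node, playing the role of uidToNode.
type graph map[string]*node

// insert adds a node and registers it in the dependents set of every owner
// already present in the graph (this sketch skips owners that are missing).
func (g graph) insert(uid string, owners ...string) *node {
	n := &node{uid: uid, owners: owners, dependents: map[*node]struct{}{}}
	g[uid] = n
	for _, o := range owners {
		if on, ok := g[o]; ok {
			on.dependents[n] = struct{}{}
		}
	}
	return n
}

// dependentUIDs returns the sorted UIDs of the dependents of uid.
func (g graph) dependentUIDs(uid string) []string {
	uids := []string{}
	n, ok := g[uid]
	if !ok {
		return uids
	}
	for d := range n.dependents {
		uids = append(uids, d.uid)
	}
	sort.Strings(uids)
	return uids
}

func main() {
	g := graph{}
	g.insert("A")      // deployment
	g.insert("B", "A") // replicaset owned by A
	g.insert("C", "B") // pod owned by B
	for _, uid := range []string{"A", "B", "C"} {
		fmt.Printf("%s: owners=%v dependents=%v\n", uid, g[uid].owners, g.dependentUIDs(uid))
	}
}
```

Storing dependents as a set keyed by node pointer mirrors the `dependents map[*node]struct{}` field that appears later in the real struct definition.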



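2 processors

(1) GraphBuilder: maintains the dependency graph of all objects and produces the events that trigger GarbageCollector to reclaim objects. GraphBuilder consumes events from the graphChanges queue and, based on the ownerReferences of each resource object, builds, updates, and deletes entries in the dependency graph (the owner/dependent graph). It then acts as a producer, putting events into the attemptToDelete and attemptToOrphan queues so that GarbageCollector can check whether associated objects need to be reclaimed. GarbageCollector relies on the uidToNode graph when it does so.

(2) GarbageCollector: reclaims and deletes objects. As a consumer, GarbageCollector takes events from the attemptToDelete and attemptToOrphan queues and processes them. If an object has been deleted with a cascading deletion policy, its associated objects are reclaimed as well: deleting an owner also deletes that owner's dependents.

3 event queues

(1) graphChanges: holds events obtained by list/watch against apiserver; produced by the informers, consumed by GraphBuilder;

(2) attemptToDelete: the cascading-deletion event queue; produced by GraphBuilder, consumed by GarbageCollector;

(3) attemptToOrphan: the orphan-deletion event queue; produced by GraphBuilder, consumed by GarbageCollector.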

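Object deletion policies

Kubernetes has three object deletion policies: Orphan, Foreground, and Background. A policy can be specified when an object is deleted. The three policies are described below.

Foreground deletion

Foreground is a cascading deletion policy: the garbage collector deletes all dependents of the object.

When an object is deleted with the foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the deletion of the object. Once the garbage collector has deleted all blocking dependents of the object (dependents whose ownerReference.blockOwnerDeletion=true), it removes foregroundDeletion from metadata.finalizers, and the owner object is then deleted.

For a deployment, foreground deletion proceeds in the order Pod->ReplicaSet->Deployment.

Background deletion

Background is also a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.

An object deleted with the Background policy has no related finalizer set (only the Foreground and Orphan policies set one), so it is deleted directly. GraphBuilder then observes the delete event for the object and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim those dependents.

For a deployment, background deletion proceeds in the order Deployment->ReplicaSet->Pod.

Orphan deletion

Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.

When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the deletion of the object until GarbageCollector has removed the owner's entry from the OwnerReferences of every dependent. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only then can the owner object be deleted.

For a deployment, orphan deletion removes only the Deployment; the corresponding ReplicaSet and Pods are kept.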
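As a compact reminder of how the three policies map onto finalizers, here is a small Go sketch. The function name `finalizerFor` is hypothetical; the finalizer strings are the values quoted in the text above.

```go
package main

import "fmt"

// finalizerFor returns the finalizer that blocks deletion of the owner under
// the given propagation policy, or "" when the policy sets none.
// Hypothetical helper for illustration only.
func finalizerFor(policy string) string {
	switch policy {
	case "Foreground":
		return "foregroundDeletion" // owner waits for blocking dependents
	case "Orphan":
		return "orphan" // owner waits until dependents drop their ownerReference
	default:
		return "" // Background: no finalizer, owner is deleted right away
	}
}

func main() {
	for _, p := range []string{"Foreground", "Background", "Orphan"} {
		fmt.Printf("%s -> %q\n", p, finalizerFor(p))
	}
}
```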

Specifying the deletion policy when deleting an object

If no deletion policy is specified when an object is deleted, the default is used: Background.

(1) Background deletion

curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}' \
  -H "Content-Type: application/json"

(2) Foreground deletion

curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
  -H "Content-Type: application/json"

(3) Orphan deletion

curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
  -H "Content-Type: application/json"
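The request body used in the curl commands can also be produced from Go with only the standard library. This is a minimal sketch: the `deleteOptions` struct and `deleteOptionsBody` function are hypothetical names introduced for illustration (a real client would normally use the types from k8s.io/apimachinery), and the sketch only builds the JSON, it does not send the request.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteOptions mirrors only the three fields used in the curl examples.
type deleteOptions struct {
	Kind              string `json:"kind"`
	APIVersion        string `json:"apiVersion"`
	PropagationPolicy string `json:"propagationPolicy"`
}

// deleteOptionsBody returns the JSON body for a DELETE request with the given
// propagation policy, rejecting anything but the three known policies.
func deleteOptionsBody(policy string) (string, error) {
	switch policy {
	case "Background", "Foreground", "Orphan":
	default:
		return "", fmt.Errorf("unknown propagation policy %q", policy)
	}
	b, err := json.Marshal(deleteOptions{
		Kind:              "DeleteOptions",
		APIVersion:        "v1",
		PropagationPolicy: policy,
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := deleteOptionsBody("Foreground")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```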


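The source code analysis of the garbage collector is split into two parts:

(1) startup analysis;

(2) core processing logic analysis.

The previous post analyzed the startup of the garbage collector; this post analyzes its core processing logic.

garbage collector source code analysis: processing logic

Based on tag v1.17.4

https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4

As mentioned above, the most important code in the garbage collector lives in garbagecollector.go and graph_builder.go, that is, in the GarbageCollector struct and the GraphBuilder struct, so the analysis below is split into these two parts.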

1. GraphBuilder

We start with GraphBuilder.

GraphBuilder has 2 main jobs:

(1) maintain the dependency relationships of all objects in its uidToNode field, based on the resource events from the informers;

(2) process the events in graphChanges and, as a producer, put events into the attemptToDelete and attemptToOrphan queues, triggering the consumer GarbageCollector to reclaim objects.

1.1 GraphBuilder struct

A brief look at the GraphBuilder struct. Its key fields and their roles are:

(1) graphChanges: events observed by the informers are put into graphChanges, and GraphBuilder consumes them from this queue;

(2) uidToNode (the object dependency graph): keyed by object uid, it records the dependency relationships of all objects, that is, the owner/dependent relationships described earlier. In other words, GraphBuilder maintains a dependency graph of all objects, and GarbageCollector relies on this graph when it reclaims objects;

(3) attemptToDelete and attemptToOrphan: GraphBuilder is the producer that puts events into these two queues, and GarbageCollector is the consumer that processes them.

// pkg/controller/garbagecollector/graph_builder.go
type GraphBuilder struct {
  ...

  // monitors are the producer of the graphChanges queue, graphBuilder alters
  // the in-memory graph according to the changes.
  graphChanges workqueue.RateLimitingInterface
  // uidToNode doesn't require a lock to protect, because only the
  // single-threaded GraphBuilder.processGraphChanges() reads/writes it.
  uidToNode *concurrentUIDToNode
  // GraphBuilder is the producer of attemptToDelete and attemptToOrphan, GC is the consumer.
  attemptToDelete workqueue.RateLimitingInterface
  attemptToOrphan workqueue.RateLimitingInterface

  ...
}

// pkg/controller/garbagecollector/graph.go
type concurrentUIDToNode struct {
  uidToNodeLock sync.RWMutex
  uidToNode     map[types.UID]*node
}

// pkg/controller/garbagecollector/graph.go
type node struct {
  ...
  dependents map[*node]struct{}
  ...
  owners []metav1.OwnerReference
}

The struct definitions show that each k8s object corresponds to one node in the dependency graph, and that each node keeps a list of owners and a list of dependents.

1.2 GraphBuilder-gb.processGraphChanges

Next comes GraphBuilder's processing logic, with gb.processGraphChanges as the entry point.

As mentioned above, events observed by the informers are put into the graphChanges queue, and processGraphChanges is where GraphBuilder consumes them.

In this method GraphBuilder is therefore both consumer and producer: it consumes and classifies every event in graphChanges, then produces events into the attemptToDelete and attemptToOrphan queues for GarbageCollector to consume.

Main logic:

(1) Take an event from the graphChanges queue;

(2) Read uidToNode to check whether the object already exists in the dependency graph; the handling below depends on whether it exists and on the event type;

(3) If the node is not in uidToNode and the event is addEvent or updateEvent, create a node for the object, call gb.insertNode to add it to uidToNode, and add the node to the dependents of each of its owners;

Then call gb.processTransitions, which checks whether the object is being deleted and, if so, whether it is being deleted in orphan mode or in foreground mode. The mode is determined from the object's finalizers: for example, when a deployment is deleted with a deletion policy, kube-apiserver adds the corresponding finalizer to the deployment object. In orphan mode the node is added to the attemptToOrphan queue; in foreground mode the object and all of its dependents are added to the attemptToDelete queue;

(4) If the node is already in uidToNode and the event is addEvent or updateEvent, call referencesDiffs to check whether the object's OwnerReferences have changed; if so, update the dependency graph accordingly, then call gb.processTransitions;

(5) If the event is a delete event, call gb.removeNode to delete the object from uidToNode and from the dependents of all of its owners, then put the object's dependents into the attemptToDelete queue to trigger GarbageCollector. Finally, check all owners of the node: an owner that is deleting its dependents may be blocked waiting for this node to be deleted, so it is added to the attemptToDelete queue as well.


// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) runProcessGraphChanges() {
  for gb.processGraphChanges() {
  }
}

// Dequeueing an event from graphChanges, updating graph, populating dirty_queue.
func (gb *GraphBuilder) processGraphChanges() bool {
  item, quit := gb.graphChanges.Get()
  if quit {
    return false
  }
  defer gb.graphChanges.Done(item)
  event, ok := item.(*event)
  if !ok {
    utilruntime.HandleError(fmt.Errorf("expect a *event, got %v", item))
    return true
  }
  obj := event.obj
  accessor, err := meta.Accessor(obj)
  if err != nil {
    utilruntime.HandleError(fmt.Errorf("cannot access obj: %v", err))
    return true
  }
  klog.V(5).Infof("GraphBuilder process object: %s/%s, namespace %s, name %s, uid %s, event type %v", event.gvk.GroupVersion().String(), event.gvk.Kind, accessor.GetNamespace(), accessor.GetName(), string(accessor.GetUID()), event.eventType)
  // Check if the node already exists
  existingNode, found := gb.uidToNode.Read(accessor.GetUID())
  if found {
    // this marks the node as having been observed via an informer event
    // 1. this depends on graphChanges only containing add/update events from the actual informer
    // 2. this allows things tracking virtual nodes' existence to stop polling and rely on informer events
    existingNode.markObserved()
  }
  switch {
  case (event.eventType == addEvent || event.eventType == updateEvent) && !found:
    newNode := &node{
      identity: objectReference{
        OwnerReference: metav1.OwnerReference{
          APIVersion: event.gvk.GroupVersion().String(),
          Kind: event.gvk.Kind,
          UID: accessor.GetUID(),
          Name: accessor.GetName(),
        },
        Namespace: accessor.GetNamespace(),
      },
      dependents: make(map[*node]struct{}),
      owners: accessor.GetOwnerReferences(),
      deletingDependents: beingDeleted(accessor) && hasDeleteDependentsFinalizer(accessor),
      beingDeleted: beingDeleted(accessor),
    }
    gb.insertNode(newNode)
    // the underlying delta_fifo may combine a creation and a deletion into
    // one event, so we need to further process the event.
    gb.processTransitions(event.oldObj, accessor, newNode)
  case (event.eventType == addEvent || event.eventType == updateEvent) && found:
    // handle changes in ownerReferences
    added, removed, changed := referencesDiffs(existingNode.owners, accessor.GetOwnerReferences())
    if len(added) != 0 || len(removed) != 0 || len(changed) != 0 {
      // check if the changed dependency graph unblock owners that are
      // waiting for the deletion of their dependents.
      gb.addUnblockedOwnersToDeleteQueue(removed, changed)
      // update the node itself
      existingNode.owners = accessor.GetOwnerReferences()
      // Add the node to its new owners' dependent lists.
      gb.addDependentToOwners(existingNode, added)
      // remove the node from the dependent list of node that are no longer in
      // the node's owners list.
      gb.removeDependentFromOwners(existingNode, removed)
    }

    if beingDeleted(accessor) {
      existingNode.markBeingDeleted()
    }
    gb.processTransitions(event.oldObj, accessor, existingNode)
  case event.eventType == deleteEvent:
    if !found {
      klog.V(5).Infof("%v doesn't exist in the graph, this shouldn't happen", accessor.GetUID())
      return true
    }
    // removeNode updates the graph
    gb.removeNode(existingNode)
    existingNode.dependentsLock.RLock()
    defer existingNode.dependentsLock.RUnlock()
    if len(existingNode.dependents) > 0 {
      gb.absentOwnerCache.Add(accessor.GetUID())
    }
    for dep := range existingNode.dependents {
      gb.attemptToDelete.Add(dep)
    }
    for _, owner := range existingNode.owners {
      ownerNode, found := gb.uidToNode.Read(owner.UID)
      if !found || !ownerNode.isDeletingDependents() {
        continue
      }
      // this is to let attempToDeleteItem check if all the owner's
      // dependents are deleted, if so, the owner will be deleted.
      gb.attemptToDelete.Add(ownerNode)
    }
  }
  return true
}


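The code confirms the Background behavior described earlier: an object deleted with the Background policy has no related finalizer set (only the Foreground and Orphan policies set one), so it is deleted directly. GraphBuilder then observes the delete event for the object and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim those dependents.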

1.2.1 gb.insertNode

gb.insertNode adds the node to uidToNode and then adds the node to the dependents of each of its owners. If an owner is not yet in the graph, a "virtual" node is created for it and put into the attemptToDelete queue, so that GarbageCollector can verify against the API server whether that owner really exists.

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) insertNode(n *node) {
  gb.uidToNode.Write(n)
  gb.addDependentToOwners(n, n.owners)
}

func (gb *GraphBuilder) addDependentToOwners(n *node, owners []metav1.OwnerReference) {
  for _, owner := range owners {
    ownerNode, ok := gb.uidToNode.Read(owner.UID)
    if !ok {
      // Create a "virtual" node in the graph for the owner if it doesn't
      // exist in the graph yet.
      ownerNode = &node{
        identity: objectReference{
          OwnerReference: owner,
          Namespace: n.identity.Namespace,
        },
        dependents: make(map[*node]struct{}),
        virtual: true,
      }
      klog.V(5).Infof("add virtual node.identity: %s\n\n", ownerNode.identity)
      gb.uidToNode.Write(ownerNode)
    }
    ownerNode.addDependent(n)
    if !ok {
      // Enqueue the virtual node into attemptToDelete.
      // The garbage processor will enqueue a virtual delete
      // event to delete it from the graph if API server confirms this
      // owner doesn't exist.
      gb.attemptToDelete.Add(ownerNode)
    }
  }
}

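1.2.2 gb.processTransitions

gb.processTransitions checks whether a k8s object has just entered the deleting state (a non-empty deletionTimestamp means the object is being deleted) while carrying the finalizer of a deletion policy, and handles it accordingly.

Only the Foreground and Orphan policies set such a finalizer on the object, so this method only handles objects deleted with Foreground or Orphan; objects deleted with Background are not processed here.

If the object's deletionTimestamp is non-empty and it carries the finalizer of the Orphan policy, the corresponding node is put into the attemptToOrphan queue for GarbageCollector to consume;

If the object's deletionTimestamp is non-empty and it carries the finalizer of the Foreground policy, n.markDeletingDependents is called to set the node's deletingDependents field to true, meaning that the node's dependents are being deleted, and the node together with its dependents is put into the attemptToDelete queue for GarbageCollector to consume.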


// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) processTransitions(oldObj interface{}, newAccessor metav1.Object, n *node) {
  if startsWaitingForDependentsOrphaned(oldObj, newAccessor) {
    klog.V(5).Infof("add %s to the attemptToOrphan", n.identity)
    gb.attemptToOrphan.Add(n)
    return
  }
  if startsWaitingForDependentsDeleted(oldObj, newAccessor) {
    klog.V(2).Infof("add %s to the attemptToDelete, because it's waiting for its dependents to be deleted", n.identity)
    // if the n is added as a "virtual" node, its deletingDependents field is not properly set, so always set it here.
    n.markDeletingDependents()
    for dep := range n.dependents {
      gb.attemptToDelete.Add(dep)
    }
    gb.attemptToDelete.Add(n)
  }
}

func startsWaitingForDependentsOrphaned(oldObj interface{}, newAccessor metav1.Object) bool {
  return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerOrphanDependents)
}

func startsWaitingForDependentsDeleted(oldObj interface{}, newAccessor metav1.Object) bool {
  return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerDeleteDependents)
}

func deletionStartsWithFinalizer(oldObj interface{}, newAccessor metav1.Object, matchingFinalizer string) bool {
  // if the new object isn't being deleted, or doesn't have the finalizer we're interested in, return false
  if !beingDeleted(newAccessor) || !hasFinalizer(newAccessor, matchingFinalizer) {
    return false
  }

  // if the old object is nil, or wasn't being deleted, or didn't have the finalizer, return true
  if oldObj == nil {
    return true
  }
  oldAccessor, err := meta.Accessor(oldObj)
  if err != nil {
    utilruntime.HandleError(fmt.Errorf("cannot access oldObj: %v", err))
    return false
  }
  return !beingDeleted(oldAccessor) || !hasFinalizer(oldAccessor, matchingFinalizer)
}

func beingDeleted(accessor metav1.Object) bool {
  return accessor.GetDeletionTimestamp() != nil
}

func hasFinalizer(accessor metav1.Object, matchingFinalizer string) bool {
  finalizers := accessor.GetFinalizers()
  for _, finalizer := range finalizers {
    if finalizer == matchingFinalizer {
      return true
    }
  }
  return false
}
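The transition check is easier to see with the Kubernetes types stripped away. The sketch below is a simplified model, not the real implementation: `obj`, `startsDeletionWith`, and `queueFor` are hypothetical names, and a boolean stands in for "deletionTimestamp is non-nil".

```go
package main

import "fmt"

// obj is a hypothetical, minimal view of an object's deletion state.
type obj struct {
	deleting   bool // stands in for deletionTimestamp != nil
	finalizers []string
}

func hasFinalizer(o *obj, f string) bool {
	for _, x := range o.finalizers {
		if x == f {
			return true
		}
	}
	return false
}

// startsDeletionWith reports whether cur is being deleted with finalizer f
// while old (which may be nil) was not yet in that state.
func startsDeletionWith(old, cur *obj, f string) bool {
	if !cur.deleting || !hasFinalizer(cur, f) {
		return false
	}
	if old == nil {
		return true
	}
	return !old.deleting || !hasFinalizer(old, f)
}

// queueFor names the queue the object would be routed to, or "" for none.
func queueFor(old, cur *obj) string {
	switch {
	case startsDeletionWith(old, cur, "orphan"):
		return "attemptToOrphan"
	case startsDeletionWith(old, cur, "foregroundDeletion"):
		return "attemptToDelete"
	default:
		return ""
	}
}

func main() {
	fg := &obj{deleting: true, finalizers: []string{"foregroundDeletion"}}
	or := &obj{deleting: true, finalizers: []string{"orphan"}}
	bg := &obj{deleting: true}
	fmt.Printf("%q %q %q %q\n", queueFor(nil, fg), queueFor(nil, or), queueFor(nil, bg), queueFor(fg, fg))
}
```

Comparing against the old object is what makes the check edge-triggered: an update event for an object that was already deleting with the same finalizer is not enqueued a second time.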

1.2.3 gb.removeNode

gb.removeNode itself only updates the graph: it deletes the object from uidToNode and removes it from the dependents of all of its owners. The remaining delete-event steps happen in the caller, processGraphChanges: the object's dependents are put into the attemptToDelete queue to trigger GarbageCollector, and every owner that is deleting its dependents (and may therefore be blocked waiting for this node) is put into the attemptToDelete queue as well.

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) removeNode(n *node) {
  gb.uidToNode.Delete(n.identity.UID)
  gb.removeDependentFromOwners(n, n.owners)
}

func (gb *GraphBuilder) removeDependentFromOwners(n *node, owners []metav1.OwnerReference) {
  for _, owner := range owners {
    ownerNode, ok := gb.uidToNode.Read(owner.UID)
    if !ok {
      continue
    }
    ownerNode.deleteDependent(n)
  }
}

2. GarbageCollector

Now for GarbageCollector.

GarbageCollector has 2 main jobs:

(1) process the events in the attemptToDelete queue, applying the reclaim logic for the Foreground or Background deletion policy and deleting associated objects;

(2) process the events in the attemptToOrphan queue for the Orphan deletion policy: update every dependent of the owner by removing the owner's entry from its OwnerReferences, then update the owner object itself by removing the finalizer of the Orphan policy.

GarbageCollector has 2 key processing methods:

(1) gc.runAttemptToDeleteWorker: processes the events in the attemptToDelete queue and handles the reclaiming of objects deleted with the Foreground or Background policy;

(2) gc.runAttemptToOrphanWorker: processes the events in the attemptToOrphan queue and handles the reclaiming of objects deleted with the Orphan policy.

2.1 GarbageCollector struct

A brief look at the GarbageCollector struct. Its key fields and their roles are:

(1) attemptToDelete and attemptToOrphan: GraphBuilder is the producer that puts events into these two queues, and GarbageCollector is the consumer that processes them.

// pkg/controller/garbagecollector/garbagecollector.go
type GarbageCollector struct {
  ...
  attemptToDelete workqueue.RateLimitingInterface
  attemptToOrphan workqueue.RateLimitingInterface
  ...
}

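2.2 GarbageCollector-gc.runAttemptToDeleteWorker

Next comes GarbageCollector's processing logic, with gc.runAttemptToDeleteWorker as the entry point.

runAttemptToDeleteWorker simply calls attemptToDeleteWorker in a loop.

Main logic of attemptToDeleteWorker:

(1) Take an object from the attemptToDelete queue;

(2) Call gc.attemptToDeleteItem to try to delete the node;

(3) If that fails, or if the node has not yet been observed through an informer event, put it back into the attemptToDelete queue (rate limited) for a retry.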


// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToDeleteWorker() {
  for gc.attemptToDeleteWorker() {
  }
}

func (gc *GarbageCollector) attemptToDeleteWorker() bool {
  item, quit := gc.attemptToDelete.Get()
  gc.workerLock.RLock()
  defer gc.workerLock.RUnlock()
  if quit {
    return false
  }
  defer gc.attemptToDelete.Done(item)
  n, ok := item.(*node)
  if !ok {
    utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
    return true
  }
  err := gc.attemptToDeleteItem(n)
  if err != nil {
    if _, ok := err.(*restMappingError); ok {
      // There are at least two ways this can happen:
      // 1. The reference is to an object of a custom type that has not yet been
      //    recognized by gc.restMapper (this is a transient error).
      // 2. The reference is to an invalid group/version. We don't currently
      //    have a way to distinguish this from a valid type we will recognize
      //    after the next discovery sync.
      // For now, record the error and retry.
      klog.V(5).Infof("error syncing item %s: %v", n, err)
    } else {
      utilruntime.HandleError(fmt.Errorf("error syncing item %s: %v", n, err))
    }
    // retry if garbage collection of an object failed.
    gc.attemptToDelete.AddRateLimited(item)
  } else if !n.isObserved() {
    // requeue if item hasn't been observed via an informer event yet.
    // otherwise a virtual node for an item added AND removed during watch reestablishment can get stuck in the graph and never removed.
    // see https://issue.k8s.io/56121
    klog.V(5).Infof("item %s hasn't been observed via informer yet", n.identity)
    gc.attemptToDelete.AddRateLimited(item)
  }
  return true
}

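2.2.1 gc.attemptToDeleteItem

Main logic:

(1) Check whether the node is already being deleted: if it is, and it is not waiting for its dependents to be deleted, return at once and wait for the final deletion;

(2) Fetch the object corresponding to the node from apiserver; if it no longer exists, or its UID no longer matches, enqueue a virtual delete event so that GraphBuilder removes the node from the graph;

(3) Call item.isDeletingDependents, which uses the node's deletingDependents field to tell whether the node is currently deleting its dependents. If so, call gc.processDeletingDependentsItem to handle the dependents: check whether the node's blockingDependents have all been deleted; if they have, remove the related finalizer from the object; if not, add the remaining blockingDependents to the attemptToDelete queue;

As noted in the GraphBuilder analysis, when GraphBuilder processes events from graphChanges, processTransitions calls n.markDeletingDependents to set the node's deletingDependents field to true;

(4) Call gc.classifyReferences to sort each of the node's owner references into one of 3 classes: solid (the owner exists and is not waiting for its dependents to be deleted), dangling (the owner does not exist), and waitingForDependentsDeletion (the owner exists, is being deleted, and is waiting for its dependents to be deleted);

(5) The rest of the logic depends on how many references fall into solid, dangling, and waitingForDependentsDeletion;

(6) Case one: solid is not empty, so the node has at least one owner that exists and is not being deleted, and the object must not be reclaimed yet. The owners in the dangling and waitingForDependentsDeletion lists are removed from the node's ownerReferences;

(7) Case two: solid is empty, at least one owner is in the waitingForDependentsDeletion state, and the node still has dependents of its own. The object corresponding to the node is deleted with the Foreground policy;

(8) If neither case applies (there is no solid owner, and either no owner is waiting for its dependents to be deleted or the node has no dependents), the default branch is taken: the object is deleted through the apiserver API with a policy chosen from the finalizers already on it. An existing orphan finalizer selects Orphan, an existing foreground finalizer selects Foreground, and otherwise Background is used.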


// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) attemptToDeleteItem(item *node) error {
  klog.V(2).Infof("processing item %s", item.identity)
  // "being deleted" is an one-way trip to the final deletion. We'll just wait for the final deletion, and then process the object's dependents.
  if item.isBeingDeleted() && !item.isDeletingDependents() {
    klog.V(5).Infof("processing item %s returned at once, because its DeletionTimestamp is non-nil", item.identity)
    return nil
  }
  // TODO: It's only necessary to talk to the API server if this is a
  // "virtual" node. The local graph could lag behind the real status, but in
  // practice, the difference is small.
  latest, err := gc.getObject(item.identity)
  switch {
  case errors.IsNotFound(err):
    // the GraphBuilder can add "virtual" node for an owner that doesn't
    // exist yet, so we need to enqueue a virtual Delete event to remove
    // the virtual node from GraphBuilder.uidToNode.
    klog.V(5).Infof("item %v not found, generating a virtual delete event", item.identity)
    gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
    // since we're manually inserting a delete event to remove this node,
    // we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
    item.markObserved()
    return nil
  case err != nil:
    return err
  }

  if latest.GetUID() != item.identity.UID {
    klog.V(5).Infof("UID doesn't match, item %v not found, generating a virtual delete event", item.identity)
    gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
    // since we're manually inserting a delete event to remove this node,
    // we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
    item.markObserved()
    return nil
  }

  // TODO: attemptToOrphanWorker() routine is similar. Consider merging
  // attemptToOrphanWorker() into attemptToDeleteItem() as well.
  if item.isDeletingDependents() {
    return gc.processDeletingDependentsItem(item)
  }

  // compute if we should delete the item
  ownerReferences := latest.GetOwnerReferences()
  if len(ownerReferences) == 0 {
    klog.V(2).Infof("object %s's doesn't have an owner, continue on next item", item.identity)
    return nil
  }

  solid, dangling, waitingForDependentsDeletion, err := gc.classifyReferences(item, ownerReferences)
  if err != nil {
    return err
  }
  klog.V(5).Infof("classify references of %s.\nsolid: %#v\ndangling: %#v\nwaitingForDependentsDeletion: %#v\n", item.identity, solid, dangling, waitingForDependentsDeletion)

  switch {
  case len(solid) != 0:
    klog.V(2).Infof("object %#v has at least one existing owner: %#v, will not garbage collect", item.identity, solid)
    if len(dangling) == 0 && len(waitingForDependentsDeletion) == 0 {
      return nil
    }
    klog.V(2).Infof("remove dangling references %#v and waiting references %#v for object %s", dangling, waitingForDependentsDeletion, item.identity)
    // waitingForDependentsDeletion needs to be deleted from the
    // ownerReferences, otherwise the referenced objects will be stuck with
    // the FinalizerDeletingDependents and never get deleted.
    ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
    patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
    _, err = gc.patch(item, patch, func(n *node) ([]byte, error) {
      return gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
    })
    return err
  case len(waitingForDependentsDeletion) != 0 && item.dependentsLength() != 0:
    deps := item.getDependents()
    for _, dep := range deps {
      if dep.isDeletingDependents() {
        // this circle detection has false positives, we need to
        // apply a more rigorous detection if this turns out to be a
        // problem.
        // there are multiple workers run attemptToDeleteItem in
        // parallel, the circle detection can fail in a race condition.
        klog.V(2).Infof("processing object %s, some of its owners and its dependent [%s] have FinalizerDeletingDependents, to prevent potential cycle, its ownerReferences are going to be modified to be non-blocking, then the object is going to be deleted with Foreground", item.identity, dep.identity)
        patch, err := item.unblockOwnerReferencesStrategicMergePatch()
        if err != nil {
          return err
        }
        if _, err := gc.patch(item, patch, gc.unblockOwnerReferencesJSONMergePatch); err != nil {
          return err
        }
        break
      }
    }
    klog.V(2).Infof("at least one owner of object %s has FinalizerDeletingDependents, and the object itself has dependents, so it is going to be deleted in Foreground", item.identity)
    // the deletion event will be observed by the graphBuilder, so the item
    // will be processed again in processDeletingDependentsItem. If it
    // doesn't have dependents, the function will remove the
    // FinalizerDeletingDependents from the item, resulting in the final
    // deletion of the item.
    policy := metav1.DeletePropagationForeground
    return gc.deleteObject(item.identity, &policy)
  default:
    // item doesn't have any solid owner, so it needs to be garbage
    // collected. Also, none of item's owners is waiting for the deletion of
    // the dependents, so set propagationPolicy based on existing finalizers.
    var policy metav1.DeletionPropagation
    switch {
    case hasOrphanFinalizer(latest):
      // if an existing orphan finalizer is already on the object, honor it.
      policy = metav1.DeletePropagationOrphan
    case hasDeleteDependentsFinalizer(latest):
      // if an existing foreground finalizer is already on the object, honor it.
      policy = metav1.DeletePropagationForeground
    default:
      // otherwise, default to background.
      policy = metav1.DeletePropagationBackground
    }
    klog.V(2).Infof("delete object %s with propagation policy %s", item.identity, policy)
    return gc.deleteObject(item.identity, &policy)
  }
}
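The three-way decision at the end of attemptToDeleteItem can be condensed into a small pure function. The sketch below is a simplified model under stated assumptions: `ownerState`, `decide`, and the returned action strings are hypothetical, and the API server lookups, patches, and cycle detection of the real method are left out.

```go
package main

import "fmt"

// ownerState is a hypothetical summary of what the GC knows about one owner.
type ownerState int

const (
	ownerAbsent             ownerState = iota // classified as dangling
	ownerPresent                              // classified as solid
	ownerDeletingDependents                   // classified as waitingForDependentsDeletion
)

// decide returns the action taken for an item with the given owners and
// number of dependents, following the switch in attemptToDeleteItem.
func decide(owners []ownerState, dependents int) string {
	if len(owners) == 0 {
		return "skip" // no owner: nothing to garbage collect
	}
	var solid, dangling, waiting int
	for _, o := range owners {
		switch o {
		case ownerPresent:
			solid++
		case ownerAbsent:
			dangling++
		case ownerDeletingDependents:
			waiting++
		}
	}
	switch {
	case solid != 0:
		if dangling == 0 && waiting == 0 {
			return "keep"
		}
		return "keep-and-patch-ownerReferences"
	case waiting != 0 && dependents != 0:
		return "delete-foreground"
	default:
		return "delete-by-existing-finalizer"
	}
}

func main() {
	fmt.Println(decide([]ownerState{ownerPresent, ownerAbsent}, 0))
	fmt.Println(decide([]ownerState{ownerDeletingDependents}, 2))
	fmt.Println(decide([]ownerState{ownerAbsent}, 0))
}
```

In this model a pod whose only owner, a replicaset, was removed by background deletion has one absent owner and no dependents, so it falls through to the default branch.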


gc.processDeletingDependentsItem

Main logic: check whether the node's blockingDependents (the dependents that block the deletion of the owner) have all been deleted. If they have, remove the related finalizer from the object corresponding to the node (once the finalizer is gone, kube-apiserver deletes the object). If not, add the remaining blockingDependents to the attemptToDelete queue.

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) processDeletingDependentsItem(item *node) error {
  blockingDependents := item.blockingDependents()
  if len(blockingDependents) == 0 {
    klog.V(2).Infof("remove DeleteDependents finalizer for item %s", item.identity)
    return gc.removeFinalizer(item, metav1.FinalizerDeleteDependents)
  }
  for _, dep := range blockingDependents {
    if !dep.isDeletingDependents() {
      klog.V(2).Infof("adding %s to attemptToDelete, because its owner %s is deletingDependents", dep.identity, item.identity)
      gc.attemptToDelete.Add(dep)
    }
  }
  return nil
}


item.blockingDependents

item.blockingDependents returns the dependents that block the deletion of the node. Whether a dependent blocks its owner's deletion is determined by the blockOwnerDeletion value in the dependent's ownerReferences entry for that owner: true means the dependent blocks the deletion of the owner.

// pkg/controller/garbagecollector/graph.go
func (n *node) blockingDependents() []*node {
  dependents := n.getDependents()
  var ret []*node
  for _, dep := range dependents {
    for _, owner := range dep.owners {
      if owner.UID == n.identity.UID && owner.BlockOwnerDeletion != nil && *owner.BlockOwnerDeletion {
        ret = append(ret, dep)
      }
    }
  }
  return ret
}
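A small standalone model shows why the nil check on blockOwnerDeletion matters: the field is an optional pointer, and an unset value does not block. The `ownerRef` and `dependent` types and the `blocking` function below are hypothetical simplifications, not the real graph types.

```go
package main

import "fmt"

// ownerRef is a hypothetical, reduced form of an owner reference.
type ownerRef struct {
	uid                string
	blockOwnerDeletion *bool // optional: nil means "not set"
}

// dependent is a hypothetical dependent object with its owner references.
type dependent struct {
	name   string
	owners []ownerRef
}

// blocking returns the names of the dependents that block deletion of the
// owner with the given UID.
func blocking(ownerUID string, deps []dependent) []string {
	var ret []string
	for _, d := range deps {
		for _, o := range d.owners {
			if o.uid == ownerUID && o.blockOwnerDeletion != nil && *o.blockOwnerDeletion {
				ret = append(ret, d.name)
			}
		}
	}
	return ret
}

func main() {
	yes, no := true, false
	deps := []dependent{
		{name: "rs-blocking", owners: []ownerRef{{uid: "deploy-1", blockOwnerDeletion: &yes}}},
		{name: "rs-nonblocking", owners: []ownerRef{{uid: "deploy-1", blockOwnerDeletion: &no}}},
		{name: "rs-unset", owners: []ownerRef{{uid: "deploy-1"}}},
		{name: "rs-other", owners: []ownerRef{{uid: "deploy-2", blockOwnerDeletion: &yes}}},
	}
	fmt.Println(blocking("deploy-1", deps))
}
```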

2.3 GarbageCollector-gc.runAttemptToOrphanWorker

gc.runAttemptToOrphanWorker handles nodes deleted with the orphan deletion policy.

gc.runAttemptToOrphanWorker simply calls gc.attemptToOrphanWorker in a loop.

Main logic of gc.attemptToOrphanWorker:

(1) Take an object from the attemptToOrphan queue;

(2) Call gc.orphanDependents: update every dependent of the owner by removing the owner's entry from its OwnerReferences; on failure, put the owner back into the attemptToOrphan queue;

(3) Call gc.removeFinalizer: update the owner object by removing the finalizer of the Orphan deletion policy.


// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToOrphanWorker() {
  for gc.attemptToOrphanWorker() {
  }
}

func (gc *GarbageCollector) attemptToOrphanWorker() bool {
  item, quit := gc.attemptToOrphan.Get()
  gc.workerLock.RLock()
  defer gc.workerLock.RUnlock()
  if quit {
    return false
  }
  defer gc.attemptToOrphan.Done(item)
  owner, ok := item.(*node)
  if !ok {
    utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
    return true
  }
  // we don't need to lock each element, because they never get updated
  owner.dependentsLock.RLock()
  dependents := make([]*node, 0, len(owner.dependents))
  for dependent := range owner.dependents {
    dependents = append(dependents, dependent)
  }
  owner.dependentsLock.RUnlock()

  err := gc.orphanDependents(owner.identity, dependents)
  if err != nil {
    utilruntime.HandleError(fmt.Errorf("orphanDependents for %s failed with %v", owner.identity, err))
    gc.attemptToOrphan.AddRateLimited(item)
    return true
  }
  // update the owner, remove "orphaningFinalizer" from its finalizers list
  err = gc.removeFinalizer(owner, metav1.FinalizerOrphanDependents)
  if err != nil {
    utilruntime.HandleError(fmt.Errorf("removeOrphanFinalizer for %s failed with %v", owner.identity, err))
    gc.attemptToOrphan.AddRateLimited(item)
  }
  return true
}

2.3.1 gc.orphanDependents

Main logic: update every dependent of the given owner by removing the owner's entry from its OwnerReferences. Each dependent is handled in its own goroutine to speed up processing.

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) orphanDependents(owner objectReference, dependents []*node) error {
  errCh := make(chan error, len(dependents))
  wg := sync.WaitGroup{}
  wg.Add(len(dependents))
  for i := range dependents {
    go func(dependent *node) {
      defer wg.Done()
      // the dependent.identity.UID is used as precondition
      patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
      _, err := gc.patch(dependent, patch, func(n *node) ([]byte, error) {
        return gc.deleteOwnerRefJSONMergePatch(n, owner.UID)
      })
      // note that if the target ownerReference doesn't exist in the
      // dependent, strategic merge patch will NOT return an error.
      if err != nil && !errors.IsNotFound(err) {
        errCh <- fmt.Errorf("orphaning %s failed, %v", dependent.identity, err)
      }
    }(dependents[i])
  }
  wg.Wait()
  close(errCh)

  var errorsSlice []error
  for e := range errCh {
    errorsSlice = append(errorsSlice, e)
  }

  if len(errorsSlice) != 0 {
    return fmt.Errorf("failed to orphan dependents of owner %s, got errors: %s", owner, utilerrors.NewAggregate(errorsSlice).Error())
  }
  klog.V(5).Infof("successfully updated all dependents of owner %s", owner)
  return nil
}

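Summary

To recap the architecture and core processing logic of the garbage collector:

The garbage collector is made up of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete, and attemptToOrphan).

Events obtained by list/watch against apiserver are put into the graphChanges queue. GraphBuilder takes events from graphChanges, builds the object dependency graph, and, depending on the deletion policy of each object, puts the associated objects into the attemptToDelete or attemptToOrphan queue. GarbageCollector then takes events from attemptToDelete and attemptToOrphan, consults the dependency graph, and finally reclaims the objects.

Object deletion policies

The deletion process for a node and its object under each of the 3 deletion policies is summarized below.

Foreground deletion

Foreground is a cascading deletion policy: the garbage collector deletes all dependents of the object.

When an object is deleted with the foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the deletion of the object. Once the garbage collector has deleted all blocking dependents of the object (dependents whose ownerReference.blockOwnerDeletion=true), it removes foregroundDeletion from metadata.finalizers, and the owner object is then deleted.

For a deployment, foreground deletion proceeds in the order Pod->ReplicaSet->Deployment.

Background deletion

Background is also a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.

An object deleted with the Background policy has no related finalizer set (only the Foreground and Orphan policies set one), so it is deleted directly. GraphBuilder then observes the delete event for the object and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim those dependents.

For a deployment, background deletion proceeds in the order Deployment->ReplicaSet->Pod.

Orphan deletion

Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.

When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the deletion of the object until GarbageCollector has removed the owner's entry from the OwnerReferences of every dependent. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only then can the owner object be deleted.

For a deployment, orphan deletion removes only the Deployment; the corresponding ReplicaSet and Pods are kept.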
