Introduction to the garbage collector
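The Kubernetes garbage collector lives in kube-controller-manager. It reclaims resource objects in Kubernetes: it watches resource object events, keeps the dependency relationships between objects up to date, and decides, based on an object's deletion policy, whether the objects associated with it should be deleted as well.
More specifically, when an owner is deleted with a cascading deletion policy, the dependent objects of that owner are deleted along with it.
To track these relationships, the garbage collector watches resource object events and uses the ownerReferences value of each object to build the dependency relationships between objects, that is, the relationships between owner and dependent.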
Owners and dependents
Take the creation of a deployment object as an example.
After a deployment object is created, kube-controller-manager creates a replicaset object for it and automatically records the deployment in the ownerReferences of that replicaset. In the example below, the owner of replicaset test-1-59d7f45ffb is deployment test-1, and the dependent of deployment test-1 is replicaset test-1-59d7f45ffb.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-1
  namespace: test
  uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
...
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: test-1-59d7f45ffb
  namespace: test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: test-1
    uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
  uid: 386c380b-490e-470b-a33f-7d5b0bf945fb
...
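In the same way, after the replicaset object is created, kube-controller-manager creates pod objects for it, and each pod records the replicaset in its ownerReferences. The replicaset is the owner of the pods, and the pods are dependents of the replicaset.
The ownerReferences value of an object is what defines the relationship between owner and dependent.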
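garbage collector architecture
The detailed architecture and core processing logic of the garbage collector are shown in the architecture diagram.
The most important code in the garbage collector lives in two files, garbagecollector.go and graph_builder.go.
The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete and attemptToOrphan):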
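1 graph
(1) uidToNode: the object dependency graph, maintained by GraphBuilder, which records the dependency relationships between all objects. Every k8s object corresponds to one node in the graph, and every node keeps a list of owners and a list of dependents.
Example: given a deployment A, a replicaset B (owner: deployment A) and a pod C (owner: replicaset B), the dependency graph looks like this:
3 nodes: A, B and C
A has one node, no owner, and B in its dependent list;
B has one node, A in its owner list, and C in its dependent list;
C has one node, B in its owner list, and no dependent.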
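The A, B, C example can be sketched in Go as follows. This is a simplified illustration, not the real implementation: the node and graph types and the insert helper are hypothetical stand-ins for the garbage collector's own structures, and plain strings are used in place of real UIDs.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for the graph node.
type node struct {
	name       string
	owners     []string            // UIDs taken from ownerReferences
	dependents map[string]struct{} // UIDs of objects owned by this node
}

// graph maps an object UID to its node, playing the role of uidToNode.
type graph map[string]*node

// insert adds a node and registers it as a dependent of every owner
// that is already present in the graph.
func (g graph) insert(uid, name string, owners ...string) {
	g[uid] = &node{name: name, owners: owners, dependents: map[string]struct{}{}}
	for _, o := range owners {
		if on, ok := g[o]; ok {
			on.dependents[uid] = struct{}{}
		}
	}
}

// buildExample builds the deployment A / replicaset B / pod C graph.
func buildExample() graph {
	g := graph{}
	g.insert("A", "deployment-a")
	g.insert("B", "replicaset-b", "A")
	g.insert("C", "pod-c", "B")
	return g
}

func main() {
	g := buildExample()
	for _, uid := range []string{"A", "B", "C"} {
		n := g[uid]
		fmt.Printf("%s owners=%v dependents=%d\n", n.name, n.owners, len(n.dependents))
	}
}
```

The sketch only handles owners that are already in the graph; the real GraphBuilder also covers the case where a dependent is seen before its owner.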
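2 processors
(1) GraphBuilder: maintains the dependency graph of all objects and produces the events that trigger GarbageCollector to reclaim objects. GraphBuilder consumes events from the graphChanges queue and, based on the ownerReferences of each object, builds, updates and prunes the dependency graph, that is, the graph of owner and dependent relationships. Acting as a producer, it then puts items into the attemptToDelete or attemptToOrphan queue so that GarbageCollector can check whether associated objects need to be reclaimed. GarbageCollector relies on the uidToNode graph when it does so.
(2) GarbageCollector: reclaims objects. As a consumer, GarbageCollector takes items from the attemptToDelete and attemptToOrphan queues and processes them. If an object has been deleted with a cascading deletion policy, its associated objects are reclaimed: deleting an owner with a cascading policy also deletes the dependents of that owner.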
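3 event queues
(1) graphChanges: events obtained by list/watch against the apiserver; produced by the informers and consumed by GraphBuilder;
(2) attemptToDelete: the cascading-deletion queue; produced by GraphBuilder and consumed by GarbageCollector;
(3) attemptToOrphan: the orphan-deletion queue; produced by GraphBuilder and consumed by GarbageCollector.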
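The producer and consumer roles around the three queues can be sketched with Go channels. This is only an illustration under simplifying assumptions: the real code uses rate-limited work queues rather than channels, the event type and the route and run helpers are hypothetical, and only the two finalizer-driven paths are modelled.

```go
package main

import "fmt"

// event is a hypothetical, reduced form of a graph change.
type event struct {
	uid       string
	finalizer string // "", "orphan" or "foregroundDeletion"
}

// route plays the GraphBuilder role: it consumes graphChanges and
// produces into attemptToDelete or attemptToOrphan.
func route(graphChanges <-chan event, attemptToDelete, attemptToOrphan chan<- string) {
	for ev := range graphChanges {
		switch ev.finalizer {
		case "orphan":
			attemptToOrphan <- ev.uid
		case "foregroundDeletion":
			attemptToDelete <- ev.uid
		}
	}
	close(attemptToDelete)
	close(attemptToOrphan)
}

// run feeds events through the pipeline and drains both output queues,
// which is the part GarbageCollector would play.
func run(events []event) (deleted, orphaned []string) {
	graphChanges := make(chan event, len(events))
	attemptToDelete := make(chan string, len(events))
	attemptToOrphan := make(chan string, len(events))
	for _, ev := range events {
		graphChanges <- ev
	}
	close(graphChanges)
	route(graphChanges, attemptToDelete, attemptToOrphan)
	for uid := range attemptToDelete {
		deleted = append(deleted, uid)
	}
	for uid := range attemptToOrphan {
		orphaned = append(orphaned, uid)
	}
	return deleted, orphaned
}

func main() {
	deleted, orphaned := run([]event{
		{uid: "deploy-1", finalizer: "foregroundDeletion"},
		{uid: "deploy-2", finalizer: "orphan"},
		{uid: "deploy-3"},
	})
	fmt.Println("attemptToDelete:", deleted)
	fmt.Println("attemptToOrphan:", orphaned)
}
```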
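Object deletion policies
Kubernetes has three object deletion policies: Orphan, Foreground and Background. A policy can be specified when an object is deleted. The three policies are described below.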
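Foreground deletion
Foreground is a cascading deletion policy: the garbage collector deletes all dependents of the object.
When an object is deleted with the foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the deletion of the object. Once the garbage collector has deleted all dependents that are able to block the owner (dependents whose ownerReference has blockOwnerDeletion=true), it removes foregroundDeletion from the object's metadata.finalizers, and the owner object is then deleted.
Taking a deployment as an example, foreground deletion proceeds in the order Pod->ReplicaSet->Deployment.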
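Background deletion
Background is also a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.
An object deleted with the Background policy carries no related finalizer (only the Foreground and Orphan policies set one), so it is deleted directly. GraphBuilder then observes the delete event for the object and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim them.
Taking a deployment as an example, background deletion proceeds in the order Deployment->ReplicaSet->Pod.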
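Orphan deletion
Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.
When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the deletion of the object until GarbageCollector has removed the entry for this owner from the ownerReferences of all its dependents. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only after that is the owner object deleted.
Taking a deployment as an example, orphan deletion removes only the Deployment; the corresponding ReplicaSet and Pods are kept.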
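The deletion orders described for the three policies can be reproduced with a small Go sketch. It is a toy model, assuming a fixed Deployment, ReplicaSet, Pod chain held in a hypothetical children map; the real garbage collector works asynchronously, so the result is the logical order only.

```go
package main

import "fmt"

// children maps an owner to its dependents (hypothetical example data).
var children = map[string][]string{
	"Deployment": {"ReplicaSet"},
	"ReplicaSet": {"Pod"},
}

// deletionOrder returns the order in which objects disappear when root
// is deleted with the given propagation policy.
func deletionOrder(root, policy string) []string {
	switch policy {
	case "Foreground":
		// Dependents go first, the owner last.
		var order []string
		for _, c := range children[root] {
			order = append(order, deletionOrder(c, policy)...)
		}
		return append(order, root)
	case "Background":
		// The owner goes first, dependents are collected afterwards.
		order := []string{root}
		for _, c := range children[root] {
			order = append(order, deletionOrder(c, policy)...)
		}
		return order
	default:
		// Orphan: only the owner is deleted.
		return []string{root}
	}
}

func main() {
	for _, policy := range []string{"Foreground", "Background", "Orphan"} {
		fmt.Println(policy, deletionOrder("Deployment", policy))
	}
}
```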
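Specifying a deletion policy when deleting an object
If no deletion policy is specified when an object is deleted, the default policy, Background, is used.
(1) Specify the background deletion policy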
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}' \
-H "Content-Type: application/json"
(2) Specify the foreground deletion policy
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
-H "Content-Type: application/json"
(3) Specify the orphan deletion policy
curl -X DELETE localhost:8080/apis/apps/v1/namespaces/default/replicasets/my-repset \
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
-H "Content-Type: application/json"
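The three request bodies differ only in propagationPolicy. As a minimal sketch, the body can be generated in Go with the standard library alone; the deleteOptions struct and the deleteOptionsBody helper are hypothetical and mirror only the three fields used in the curl commands, not the full Kubernetes type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteOptions mirrors only the fields sent in the curl examples.
type deleteOptions struct {
	Kind              string `json:"kind"`
	APIVersion        string `json:"apiVersion"`
	PropagationPolicy string `json:"propagationPolicy"`
}

// deleteOptionsBody returns the JSON body for one of the three policies
// and rejects any other value.
func deleteOptionsBody(policy string) (string, error) {
	switch policy {
	case "Background", "Foreground", "Orphan":
	default:
		return "", fmt.Errorf("unknown propagation policy %q", policy)
	}
	b, err := json.Marshal(deleteOptions{
		Kind:              "DeleteOptions",
		APIVersion:        "v1",
		PropagationPolicy: policy,
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	for _, policy := range []string{"Background", "Foreground", "Orphan"} {
		body, err := deleteOptionsBody(policy)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(body)
	}
}
```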
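The source code analysis of the garbage collector is split into two parts:
(1) startup analysis;
(2) core processing logic analysis.
The previous post analyzed the startup of the garbage collector; this post analyzes its core processing logic.
garbage collector source code analysis: processing logic
Based on tag v1.17.4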
https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4
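As mentioned earlier, the most important code in the garbage collector lives in garbagecollector.go and graph_builder.go, that is, in the GarbageCollector struct and the GraphBuilder struct, so the analysis below is split into two parts.
1.GraphBuilder
We start with GraphBuilder.
GraphBuilder has 2 main functions:
(1) based on the resource events from the informers, it maintains the dependency relationships of all objects in its uidToNode field;
(2) it processes the events in graphChanges and, acting as a producer, puts items into the attemptToDelete and attemptToOrphan queues, which triggers the consumer, GarbageCollector, to reclaim objects.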
1.1 GraphBuilder struct
First, a brief look at the GraphBuilder struct. Its most important fields are:
(1) graphChanges: events observed by the informers are placed in graphChanges, and GraphBuilder consumes them from this queue;
(2) uidToNode (the object dependency graph): keyed by object uid, it records the dependency relationships of all objects, i.e. the owner and dependent relationships described earlier. In other words, GraphBuilder maintains a dependency graph of all objects, and GarbageCollector relies on this graph when reclaiming objects;
(3) attemptToDelete and attemptToOrphan: GraphBuilder, as the producer, puts items into these two queues, and GarbageCollector, as the consumer, processes them.
// pkg/controller/garbagecollector/graph_builder.go
type GraphBuilder struct {
...
// monitors are the producer of the graphChanges queue, graphBuilder alters
// the in-memory graph according to the changes.
graphChanges workqueue.RateLimitingInterface
// uidToNode doesn't require a lock to protect, because only the
// single-threaded GraphBuilder.processGraphChanges() reads/writes it.
uidToNode *concurrentUIDToNode
// GraphBuilder is the producer of attemptToDelete and attemptToOrphan, GC is the consumer.
attemptToDelete workqueue.RateLimitingInterface
attemptToOrphan workqueue.RateLimitingInterface
...
}
// pkg/controller/garbagecollector/graph.go
type concurrentUIDToNode struct {
uidToNodeLock sync.RWMutex
uidToNode map[types.UID]*node
}
// pkg/controller/garbagecollector/graph.go
type node struct {
...
dependents map[*node]struct{}
...
owners []metav1.OwnerReference
}
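As the struct definitions show, each k8s object corresponds to one node in the dependency graph, and each node keeps a list of owners (owners) and a set of dependents (dependents).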
1.2 GraphBuilder-gb.processGraphChanges
Next comes the processing logic of GraphBuilder, with gb.processGraphChanges as the entry point.
As mentioned above, the events observed by the informers are placed in the graphChanges queue and consumed by GraphBuilder; processGraphChanges is the method where that consumption happens.
In this method GraphBuilder is therefore both consumer and producer: it consumes and classifies every event in graphChanges, then puts items into the attemptToDelete and attemptToOrphan queues for GarbageCollector to consume.
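Main logic:
(1) take an event from the graphChanges queue;
(2) read uidToNode to check whether the object already exists in the dependency graph built so far; the handling that follows depends on whether it exists and on the event type;
(3) if the node is not in uidToNode and the event is an addEvent or updateEvent, create a node for the object and call gb.insertNode, which adds the node to uidToNode and then adds it to the dependents of each of its owners;
then call gb.processTransitions, which checks whether the object is being deleted and, if so, whether it is being deleted in orphan or foreground mode. The mode is determined from the finalizers on the object: a delete request for, say, a deployment carries a deletion policy, and kube-apiserver adds the corresponding finalizer to the deployment. In orphan mode the node is added to the attemptToOrphan queue; in foreground mode the object and all of its dependents are added to the attemptToDelete queue;
(4) if the node is in uidToNode and the event is an addEvent or updateEvent, call referencesDiffs to check whether the object's ownerReferences have changed; if so, update the dependency graph accordingly, and finally call gb.processTransitions;
(5) if the event is a delete event, call gb.removeNode to remove the object from uidToNode and from the dependents of all of its owners, then put the object's dependents into the attemptToDelete queue to trigger GarbageCollector. Finally, check the node's owners: an owner that is currently deleting its dependents may be blocked waiting for this node to be deleted, so it is added to the attemptToDelete queue to trigger GarbageCollector.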
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) runProcessGraphChanges() {
for gb.processGraphChanges() {
}
}
// Dequeueing an event from graphChanges, updating graph, populating dirty_queue.
func (gb *GraphBuilder) processGraphChanges() bool {
item, quit := gb.graphChanges.Get()
if quit {
return false
}
defer gb.graphChanges.Done(item)
event, ok := item.(*event)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect a *event, got %v", item))
return true
}
obj := event.obj
accessor, err := meta.Accessor(obj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("cannot access obj: %v", err))
return true
}
klog.V(5).Infof("GraphBuilder process object: %s/%s, namespace %s, name %s, uid %s, event type %v", event.gvk.GroupVersion().String(), event.gvk.Kind, accessor.GetNamespace(), accessor.GetName(), string(accessor.GetUID()), event.eventType)
// Check if the node already exists
existingNode, found := gb.uidToNode.Read(accessor.GetUID())
if found {
// this marks the node as having been observed via an informer event
// 1. this depends on graphChanges only containing add/update events from the actual informer
// 2. this allows things tracking virtual nodes' existence to stop polling and rely on informer events
existingNode.markObserved()
}
switch {
case (event.eventType == addEvent || event.eventType == updateEvent) && !found:
newNode := &node{
identity: objectReference{
OwnerReference: metav1.OwnerReference{
APIVersion: event.gvk.GroupVersion().String(),
Kind: event.gvk.Kind,
UID: accessor.GetUID(),
Name: accessor.GetName(),
},
Namespace: accessor.GetNamespace(),
},
dependents: make(map[*node]struct{}),
owners: accessor.GetOwnerReferences(),
deletingDependents: beingDeleted(accessor) && hasDeleteDependentsFinalizer(accessor),
beingDeleted: beingDeleted(accessor),
}
gb.insertNode(newNode)
// the underlying delta_fifo may combine a creation and a deletion into
// one event, so we need to further process the event.
gb.processTransitions(event.oldObj, accessor, newNode)
case (event.eventType == addEvent || event.eventType == updateEvent) && found:
// handle changes in ownerReferences
added, removed, changed := referencesDiffs(existingNode.owners, accessor.GetOwnerReferences())
if len(added) != 0 || len(removed) != 0 || len(changed) != 0 {
// check if the changed dependency graph unblock owners that are
// waiting for the deletion of their dependents.
gb.addUnblockedOwnersToDeleteQueue(removed, changed)
// update the node itself
existingNode.owners = accessor.GetOwnerReferences()
// Add the node to its new owners' dependent lists.
gb.addDependentToOwners(existingNode, added)
// remove the node from the dependent list of node that are no longer in
// the node's owners list.
gb.removeDependentFromOwners(existingNode, removed)
}
if beingDeleted(accessor) {
existingNode.markBeingDeleted()
}
gb.processTransitions(event.oldObj, accessor, existingNode)
case event.eventType == deleteEvent:
if !found {
klog.V(5).Infof("%v doesn't exist in the graph, this shouldn't happen", accessor.GetUID())
return true
}
// removeNode updates the graph
gb.removeNode(existingNode)
existingNode.dependentsLock.RLock()
defer existingNode.dependentsLock.RUnlock()
if len(existingNode.dependents) > 0 {
gb.absentOwnerCache.Add(accessor.GetUID())
}
for dep := range existingNode.dependents {
gb.attemptToDelete.Add(dep)
}
for _, owner := range existingNode.owners {
ownerNode, found := gb.uidToNode.Read(owner.UID)
if !found || !ownerNode.isDeletingDependents() {
continue
}
// this is to let attempToDeleteItem check if all the owner's
// dependents are deleted, if so, the owner will be deleted.
gb.attemptToDelete.Add(ownerNode)
}
}
return true
}
From the code it follows that an object deleted with the Background policy carries no related finalizer (only Foreground and Orphan set one), so it is deleted directly. GraphBuilder then observes the delete event and puts the object's dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim them.
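1.2.1 gb.insertNode
gb.insertNode adds the node to uidToNode and then adds it to the dependents of each of its owners. If an owner is not in the graph yet, a virtual node is created for it and put into the attemptToDelete queue, so that GarbageCollector can check with the apiserver whether that owner actually exists.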
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) insertNode(n *node) {
gb.uidToNode.Write(n)
gb.addDependentToOwners(n, n.owners)
}
func (gb *GraphBuilder) addDependentToOwners(n *node, owners []metav1.OwnerReference) {
for _, owner := range owners {
ownerNode, ok := gb.uidToNode.Read(owner.UID)
if !ok {
// Create a "virtual" node in the graph for the owner if it doesn't
// exist in the graph yet.
ownerNode = &node{
identity: objectReference{
OwnerReference: owner,
Namespace: n.identity.Namespace,
},
dependents: make(map[*node]struct{}),
virtual: true,
}
klog.V(5).Infof("add virtual node.identity: %s\n\n", ownerNode.identity)
gb.uidToNode.Write(ownerNode)
}
ownerNode.addDependent(n)
if !ok {
// Enqueue the virtual node into attemptToDelete.
// The garbage processor will enqueue a virtual delete
// event to delete it from the graph if API server confirms this
// owner doesn't exist.
gb.attemptToDelete.Add(ownerNode)
}
}
}
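The virtual-node behaviour is easiest to see when a dependent is observed before its owner. The following Go sketch reproduces that case under simplifying assumptions: the node and builder types are hypothetical reductions of the real ones, and the attemptToDelete queue is a plain slice of UIDs.

```go
package main

import "fmt"

// node is a hypothetical, reduced graph node.
type node struct {
	uid        string
	virtual    bool
	dependents map[string]*node
}

// builder holds the graph and a stand-in for the attemptToDelete queue.
type builder struct {
	uidToNode       map[string]*node
	attemptToDelete []string
}

// insert writes the node into the graph and links it to each owner,
// creating and enqueueing a virtual owner node when the owner is unknown.
func (b *builder) insert(uid string, ownerUIDs ...string) {
	n := &node{uid: uid, dependents: map[string]*node{}}
	b.uidToNode[uid] = n
	for _, ownerUID := range ownerUIDs {
		owner, ok := b.uidToNode[ownerUID]
		if !ok {
			owner = &node{uid: ownerUID, virtual: true, dependents: map[string]*node{}}
			b.uidToNode[ownerUID] = owner
			// The collector will later verify that this owner really exists.
			b.attemptToDelete = append(b.attemptToDelete, ownerUID)
		}
		owner.dependents[uid] = n
	}
}

func main() {
	b := &builder{uidToNode: map[string]*node{}}
	b.insert("pod-c", "replicaset-b") // the pod is seen before its owner
	owner := b.uidToNode["replicaset-b"]
	fmt.Println("virtual:", owner.virtual, "dependents:", len(owner.dependents))
	fmt.Println("attemptToDelete:", b.attemptToDelete)
}
```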
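1.2.2 gb.processTransitions
gb.processTransitions checks whether a k8s object is being deleted (its deletionTimestamp is not nil) and carries the finalizer of a deletion policy, and then acts accordingly. The check only fires on the update in which the object first enters that state (see deletionStartsWithFinalizer).
Because only the Foreground and Orphan policies set a finalizer on the object, this method only handles objects deleted with Foreground or Orphan and does nothing for objects deleted with Background.
If the object's deletionTimestamp is not nil and it has the finalizer of the Orphan policy, the node is put into the attemptToOrphan queue for GarbageCollector to consume;
If the object's deletionTimestamp is not nil and it has the finalizer of the Foreground policy, n.markDeletingDependents is called to set the node's deletingDependents field to true, meaning that the dependents of the node are being deleted, and the node and its dependents are put into the attemptToDelete queue for GarbageCollector to consume.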
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) processTransitions(oldObj interface{}, newAccessor metav1.Object, n *node) {
if startsWaitingForDependentsOrphaned(oldObj, newAccessor) {
klog.V(5).Infof("add %s to the attemptToOrphan", n.identity)
gb.attemptToOrphan.Add(n)
return
}
if startsWaitingForDependentsDeleted(oldObj, newAccessor) {
klog.V(2).Infof("add %s to the attemptToDelete, because it's waiting for its dependents to be deleted", n.identity)
// if the n is added as a "virtual" node, its deletingDependents field is not properly set, so always set it here.
n.markDeletingDependents()
for dep := range n.dependents {
gb.attemptToDelete.Add(dep)
}
gb.attemptToDelete.Add(n)
}
}
func startsWaitingForDependentsOrphaned(oldObj interface{}, newAccessor metav1.Object) bool {
return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerOrphanDependents)
}
func startsWaitingForDependentsDeleted(oldObj interface{}, newAccessor metav1.Object) bool {
return deletionStartsWithFinalizer(oldObj, newAccessor, metav1.FinalizerDeleteDependents)
}
func deletionStartsWithFinalizer(oldObj interface{}, newAccessor metav1.Object, matchingFinalizer string) bool {
// if the new object isn't being deleted, or doesn't have the finalizer we're interested in, return false
if !beingDeleted(newAccessor) || !hasFinalizer(newAccessor, matchingFinalizer) {
return false
}
// if the old object is nil, or wasn't being deleted, or didn't have the finalizer, return true
if oldObj == nil {
return true
}
oldAccessor, err := meta.Accessor(oldObj)
if err != nil {
utilruntime.HandleError(fmt.Errorf("cannot access oldObj: %v", err))
return false
}
return !beingDeleted(oldAccessor) || !hasFinalizer(oldAccessor, matchingFinalizer)
}
func beingDeleted(accessor metav1.Object) bool {
return accessor.GetDeletionTimestamp() != nil
}
func hasFinalizer(accessor metav1.Object, matchingFinalizer string) bool {
finalizers := accessor.GetFinalizers()
for _, finalizer := range finalizers {
if finalizer == matchingFinalizer {
return true
}
}
return false
}
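The transition check in deletionStartsWithFinalizer can be exercised in isolation. Below is a small Go sketch that follows the same logic on a hypothetical object type, where a boolean stands in for a non-nil deletionTimestamp; it is meant to show when the check fires, not to replace the real helper.

```go
package main

import "fmt"

// object is a hypothetical reduction of an API object's metadata.
type object struct {
	deleting   bool // stands in for a non-nil deletionTimestamp
	finalizers []string
}

func hasFinalizer(o *object, f string) bool {
	for _, x := range o.finalizers {
		if x == f {
			return true
		}
	}
	return false
}

// startsDeletionWith reports whether cur is being deleted with finalizer f
// and old (the previous state, possibly nil) was not yet in that state.
func startsDeletionWith(old, cur *object, f string) bool {
	if !cur.deleting || !hasFinalizer(cur, f) {
		return false
	}
	if old == nil {
		return true
	}
	return !old.deleting || !hasFinalizer(old, f)
}

func main() {
	const fg = "foregroundDeletion"
	before := &object{}
	after := &object{deleting: true, finalizers: []string{fg}}
	fmt.Println(startsDeletionWith(before, after, fg)) // first update in that state
	fmt.Println(startsDeletionWith(after, after, fg))  // later update, same state
}
```

Firing only on the transition keeps a repeated update event from enqueueing the same node again and again.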
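1.2.3 gb.removeNode
gb.removeNode removes the object from uidToNode and then from the dependents of all of its owners. Its caller, processGraphChanges, then puts the object's dependents into the attemptToDelete queue to trigger GarbageCollector, and finally checks the node's owners: an owner that is currently deleting its dependents may be blocked waiting for this node to be deleted, so it is added to the attemptToDelete queue to trigger GarbageCollector.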
// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) removeNode(n *node) {
gb.uidToNode.Delete(n.identity.UID)
gb.removeDependentFromOwners(n, n.owners)
}
func (gb *GraphBuilder) removeDependentFromOwners(n *node, owners []metav1.OwnerReference) {
for _, owner := range owners {
ownerNode, ok := gb.uidToNode.Read(owner.UID)
if !ok {
continue
}
ownerNode.deleteDependent(n)
}
}
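2.GarbageCollector
Now for GarbageCollector.
GarbageCollector has 2 main functions:
(1) process the items in the attemptToDelete queue, apply the reclamation logic for the foreground or background deletion policy, and delete associated objects;
(2) process the items in the attemptToOrphan queue for the Orphan deletion policy: update all dependents of the owner by removing the entry for that owner from their ownerReferences, then update the owner itself by removing the finalizer of the Orphan policy.
GarbageCollector has 2 key methods:
(1) gc.runAttemptToDeleteWorker: processes the items in the attemptToDelete queue and handles reclamation for objects deleted with the foreground or background policy;
(2) gc.runAttemptToOrphanWorker: processes the items in the attemptToOrphan queue and handles reclamation for objects deleted with the Orphan policy.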
2.1 GarbageCollector struct
First, a brief look at the GarbageCollector struct. Its most important fields are:
(1) attemptToDelete and attemptToOrphan: GraphBuilder, as the producer, puts items into these two queues, and GarbageCollector, as the consumer, processes them.
// pkg/controller/garbagecollector/garbagecollector.go
type GarbageCollector struct {
...
attemptToDelete workqueue.RateLimitingInterface
attemptToOrphan workqueue.RateLimitingInterface
...
}
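2.2 GarbageCollector-gc.runAttemptToDeleteWorker
Next comes the processing logic of GarbageCollector, with gc.runAttemptToDeleteWorker as the entry point.
runAttemptToDeleteWorker simply calls attemptToDeleteWorker in a loop.
Main logic of attemptToDeleteWorker:
(1) take an object from the attemptToDelete queue;
(2) call gc.attemptToDeleteItem to try to delete the node;
(3) if that fails, or if the node has not yet been observed through an informer event, put the item back into the attemptToDelete queue (rate limited) to retry.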
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToDeleteWorker() {
for gc.attemptToDeleteWorker() {
}
}
func (gc *GarbageCollector) attemptToDeleteWorker() bool {
item, quit := gc.attemptToDelete.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToDelete.Done(item)
n, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
err := gc.attemptToDeleteItem(n)
if err != nil {
if _, ok := err.(*restMappingError); ok {
// There are at least two ways this can happen:
// 1. The reference is to an object of a custom type that has not yet been
// recognized by gc.restMapper (this is a transient error).
// 2. The reference is to an invalid group/version. We don't currently
// have a way to distinguish this from a valid type we will recognize
// after the next discovery sync.
// For now, record the error and retry.
klog.V(5).Infof("error syncing item %s: %v", n, err)
} else {
utilruntime.HandleError(fmt.Errorf("error syncing item %s: %v", n, err))
}
// retry if garbage collection of an object failed.
gc.attemptToDelete.AddRateLimited(item)
} else if !n.isObserved() {
// requeue if item hasn't been observed via an informer event yet.
// otherwise a virtual node for an item added AND removed during watch reestablishment can get stuck in the graph and never removed.
// see https://issue.k8s.io/56121
klog.V(5).Infof("item %s hasn't been observed via informer yet", n.identity)
gc.attemptToDelete.AddRateLimited(item)
}
return true
}
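2.2.1 gc.attemptToDeleteItem
Main logic:
(1) check whether the node is being deleted; if it is, and it is not deleting its dependents, return immediately;
(2) fetch the object that corresponds to the node from the apiserver; if it is not found or its UID does not match, enqueue a virtual delete event so that the node is removed from the graph;
(3) call item.isDeletingDependents, which uses the node's deletingDependents field to tell whether the node is currently deleting its dependents. If so, call gc.processDeletingDependentsItem to handle the dependents: check whether the node's blockingDependents have all been deleted; if they have, remove the related finalizer from the node's object; if not, add the remaining blockingDependents to the attemptToDelete queue;
As noted in the GraphBuilder analysis, when GraphBuilder processes events from graphChanges, processTransitions calls n.markDeletingDependents to set the node's deletingDependents field to true;
(4) call gc.classifyReferences to sort the node's owners into 3 groups: solid (the owner exists and is not waiting for its dependents to be deleted), dangling (the owner does not exist), and waitingForDependentsDeletion (the owner exists, is being deleted, and is waiting for its dependents to be deleted);
(5) the handling then depends on the sizes of solid, dangling and waitingForDependentsDeletion;
(6) case 1: solid is not empty, i.e. the node has at least one owner that exists and is not waiting for its dependents to be deleted, so the object cannot be reclaimed yet. The owners in dangling and waitingForDependentsDeletion are removed from the node's ownerReferences;
(7) case 2: solid is empty, at least one owner of the node is in waitingForDependentsDeletion, and the node still has dependents of its own. The node's object is deleted with the foreground policy;
(8) otherwise (solid is empty, and either no owner is waiting for its dependents to be deleted or the node has no dependents), the default branch applies: the propagation policy is chosen from the finalizers already on the object (Orphan for the orphan finalizer, Foreground for the foregroundDeletion finalizer, Background otherwise) and the apiserver is called to delete the object.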
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) attemptToDeleteItem(item *node) error {
klog.V(2).Infof("processing item %s", item.identity)
// "being deleted" is an one-way trip to the final deletion. We'll just wait for the final deletion, and then process the object's dependents.
if item.isBeingDeleted() && !item.isDeletingDependents() {
klog.V(5).Infof("processing item %s returned at once, because its DeletionTimestamp is non-nil", item.identity)
return nil
}
// TODO: It's only necessary to talk to the API server if this is a
// "virtual" node. The local graph could lag behind the real status, but in
// practice, the difference is small.
latest, err := gc.getObject(item.identity)
switch {
case errors.IsNotFound(err):
// the GraphBuilder can add "virtual" node for an owner that doesn't
// exist yet, so we need to enqueue a virtual Delete event to remove
// the virtual node from GraphBuilder.uidToNode.
klog.V(5).Infof("item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
case err != nil:
return err
}
if latest.GetUID() != item.identity.UID {
klog.V(5).Infof("UID doesn't match, item %v not found, generating a virtual delete event", item.identity)
gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
// since we're manually inserting a delete event to remove this node,
// we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
item.markObserved()
return nil
}
// TODO: attemptToOrphanWorker() routine is similar. Consider merging
// attemptToOrphanWorker() into attemptToDeleteItem() as well.
if item.isDeletingDependents() {
return gc.processDeletingDependentsItem(item)
}
// compute if we should delete the item
ownerReferences := latest.GetOwnerReferences()
if len(ownerReferences) == 0 {
klog.V(2).Infof("object %s's doesn't have an owner, continue on next item", item.identity)
return nil
}
solid, dangling, waitingForDependentsDeletion, err := gc.classifyReferences(item, ownerReferences)
if err != nil {
return err
}
klog.V(5).Infof("classify references of %s.\nsolid: %#v\ndangling: %#v\nwaitingForDependentsDeletion: %#v\n", item.identity, solid, dangling, waitingForDependentsDeletion)
switch {
case len(solid) != 0:
klog.V(2).Infof("object %#v has at least one existing owner: %#v, will not garbage collect", item.identity, solid)
if len(dangling) == 0 && len(waitingForDependentsDeletion) == 0 {
return nil
}
klog.V(2).Infof("remove dangling references %#v and waiting references %#v for object %s", dangling, waitingForDependentsDeletion, item.identity)
// waitingForDependentsDeletion needs to be deleted from the
// ownerReferences, otherwise the referenced objects will be stuck with
// the FinalizerDeletingDependents and never get deleted.
ownerUIDs := append(ownerRefsToUIDs(dangling), ownerRefsToUIDs(waitingForDependentsDeletion)...)
patch := deleteOwnerRefStrategicMergePatch(item.identity.UID, ownerUIDs...)
_, err = gc.patch(item, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, ownerUIDs...)
})
return err
case len(waitingForDependentsDeletion) != 0 && item.dependentsLength() != 0:
deps := item.getDependents()
for _, dep := range deps {
if dep.isDeletingDependents() {
// this circle detection has false positives, we need to
// apply a more rigorous detection if this turns out to be a
// problem.
// there are multiple workers run attemptToDeleteItem in
// parallel, the circle detection can fail in a race condition.
klog.V(2).Infof("processing object %s, some of its owners and its dependent [%s] have FinalizerDeletingDependents, to prevent potential cycle, its ownerReferences are going to be modified to be non-blocking, then the object is going to be deleted with Foreground", item.identity, dep.identity)
patch, err := item.unblockOwnerReferencesStrategicMergePatch()
if err != nil {
return err
}
if _, err := gc.patch(item, patch, gc.unblockOwnerReferencesJSONMergePatch); err != nil {
return err
}
break
}
}
klog.V(2).Infof("at least one owner of object %s has FinalizerDeletingDependents, and the object itself has dependents, so it is going to be deleted in Foreground", item.identity)
// the deletion event will be observed by the graphBuilder, so the item
// will be processed again in processDeletingDependentsItem. If it
// doesn't have dependents, the function will remove the
// FinalizerDeletingDependents from the item, resulting in the final
// deletion of the item.
policy := metav1.DeletePropagationForeground
return gc.deleteObject(item.identity, &policy)
default:
// item doesn't have any solid owner, so it needs to be garbage
// collected. Also, none of item's owners is waiting for the deletion of
// the dependents, so set propagationPolicy based on existing finalizers.
var policy metav1.DeletionPropagation
switch {
case hasOrphanFinalizer(latest):
// if an existing orphan finalizer is already on the object, honor it.
policy = metav1.DeletePropagationOrphan
case hasDeleteDependentsFinalizer(latest):
// if an existing foreground finalizer is already on the object, honor it.
policy = metav1.DeletePropagationForeground
default:
// otherwise, default to background.
policy = metav1.DeletePropagationBackground
}
klog.V(2).Infof("delete object %s with propagation policy %s", item.identity, policy)
return gc.deleteObject(item.identity, &policy)
}
}
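The three-way switch at the end of attemptToDeleteItem can be condensed into a small decision function. The Go sketch below takes only the sizes of the three owner groups and the number of dependents; the action type and its constant names are hypothetical labels for the branches, not identifiers from the Kubernetes source.

```go
package main

import "fmt"

// action names the branch taken for an item (hypothetical labels).
type action string

const (
	keepObject           action = "keep"
	removeStaleOwnerRefs action = "remove dangling and waiting ownerReferences"
	deleteForeground     action = "delete with Foreground"
	deleteByFinalizer    action = "delete with policy chosen from finalizers"
)

// decide mirrors the switch in attemptToDeleteItem on the classified owners.
func decide(solid, dangling, waiting, dependents int) action {
	switch {
	case solid != 0:
		// At least one valid owner remains: the object must survive.
		if dangling == 0 && waiting == 0 {
			return keepObject
		}
		return removeStaleOwnerRefs
	case waiting != 0 && dependents != 0:
		// An owner is blocked on this item, which has dependents of its own.
		return deleteForeground
	default:
		return deleteByFinalizer
	}
}

func main() {
	fmt.Println(decide(1, 0, 0, 0))
	fmt.Println(decide(1, 1, 0, 0))
	fmt.Println(decide(0, 0, 1, 2))
	fmt.Println(decide(0, 1, 0, 0))
}
```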
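gc.processDeletingDependentsItem
Main logic: check whether the node's blockingDependents (the dependents that block the deletion of the owner) have all been deleted. If so, remove the related finalizer from the node's object (once the finalizer is removed, kube-apiserver deletes the object); if not, add the remaining blockingDependents to the attemptToDelete queue.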
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) processDeletingDependentsItem(item *node) error {
blockingDependents := item.blockingDependents()
if len(blockingDependents) == 0 {
klog.V(2).Infof("remove DeleteDependents finalizer for item %s", item.identity)
return gc.removeFinalizer(item, metav1.FinalizerDeleteDependents)
}
for _, dep := range blockingDependents {
if !dep.isDeletingDependents() {
klog.V(2).Infof("adding %s to attemptToDelete, because its owner %s is deletingDependents", dep.identity, item.identity)
gc.attemptToDelete.Add(dep)
}
}
return nil
}
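item.blockingDependents
item.blockingDependents returns the dependents that block the deletion of the node. Whether a dependent blocks the deletion of its owner depends on the blockOwnerDeletion field in the dependent's ownerReferences entry for that owner: true means the dependent blocks the deletion of the owner.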
// pkg/controller/garbagecollector/graph.go
func (n *node) blockingDependents() []*node {
dependents := n.getDependents()
var ret []*node
for _, dep := range dependents {
for _, owner := range dep.owners {
if owner.UID == n.identity.UID && owner.BlockOwnerDeletion != nil && *owner.BlockOwnerDeletion {
ret = append(ret, dep)
}
}
}
return ret
}
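A usage-style sketch of the same filter follows, on hypothetical reduced types (ownerRef and dependent are not the real structs). It shows the three possible states of blockOwnerDeletion: true, false and unset. Unlike the original loop, this version stops at the first matching owner entry so that a dependent is listed at most once.

```go
package main

import "fmt"

// ownerRef is a hypothetical reduction of an ownerReferences entry.
type ownerRef struct {
	uid                string
	blockOwnerDeletion *bool
}

// dependent is a hypothetical reduction of a dependent node.
type dependent struct {
	name   string
	owners []ownerRef
}

// blocking returns the names of the dependents whose reference to ownerUID
// has blockOwnerDeletion explicitly set to true.
func blocking(ownerUID string, deps []dependent) []string {
	var out []string
	for _, d := range deps {
		for _, o := range d.owners {
			if o.uid == ownerUID && o.blockOwnerDeletion != nil && *o.blockOwnerDeletion {
				out = append(out, d.name)
				break
			}
		}
	}
	return out
}

func main() {
	yes, no := true, false
	deps := []dependent{
		{name: "pod-a", owners: []ownerRef{{uid: "rs-1", blockOwnerDeletion: &yes}}},
		{name: "pod-b", owners: []ownerRef{{uid: "rs-1", blockOwnerDeletion: &no}}},
		{name: "pod-c", owners: []ownerRef{{uid: "rs-1"}}},
	}
	fmt.Println(blocking("rs-1", deps))
}
```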
2.3 GarbageCollector-gc.runAttemptToOrphanWorker
gc.runAttemptToOrphanWorker handles nodes deleted with the orphan deletion policy.
gc.runAttemptToOrphanWorker simply calls gc.attemptToOrphanWorker in a loop.
Main logic of gc.attemptToOrphanWorker:
(1) take an object from the attemptToOrphan queue;
(2) call gc.orphanDependents: update all dependents of the owner by removing the entry for that owner from their ownerReferences; on failure, put the owner back into the attemptToOrphan queue;
(3) call gc.removeFinalizer: update the owner object by removing the finalizer of the Orphan policy.
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) runAttemptToOrphanWorker() {
for gc.attemptToOrphanWorker() {
}
}
func (gc *GarbageCollector) attemptToOrphanWorker() bool {
item, quit := gc.attemptToOrphan.Get()
gc.workerLock.RLock()
defer gc.workerLock.RUnlock()
if quit {
return false
}
defer gc.attemptToOrphan.Done(item)
owner, ok := item.(*node)
if !ok {
utilruntime.HandleError(fmt.Errorf("expect *node, got %#v", item))
return true
}
// we don't need to lock each element, because they never get updated
owner.dependentsLock.RLock()
dependents := make([]*node, 0, len(owner.dependents))
for dependent := range owner.dependents {
dependents = append(dependents, dependent)
}
owner.dependentsLock.RUnlock()
err := gc.orphanDependents(owner.identity, dependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("orphanDependents for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
return true
}
// update the owner, remove "orphaningFinalizer" from its finalizers list
err = gc.removeFinalizer(owner, metav1.FinalizerOrphanDependents)
if err != nil {
utilruntime.HandleError(fmt.Errorf("removeOrphanFinalizer for %s failed with %v", owner.identity, err))
gc.attemptToOrphan.AddRateLimited(item)
}
return true
}
2.3.1 gc.orphanDependents
Main logic: update all dependents of the given owner by removing the entry for that owner from their ownerReferences. Each dependent is handled in its own goroutine to speed things up.
// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) orphanDependents(owner objectReference, dependents []*node) error {
errCh := make(chan error, len(dependents))
wg := sync.WaitGroup{}
wg.Add(len(dependents))
for i := range dependents {
go func(dependent *node) {
defer wg.Done()
// the dependent.identity.UID is used as precondition
patch := deleteOwnerRefStrategicMergePatch(dependent.identity.UID, owner.UID)
_, err := gc.patch(dependent, patch, func(n *node) ([]byte, error) {
return gc.deleteOwnerRefJSONMergePatch(n, owner.UID)
})
// note that if the target ownerReference doesn't exist in the
// dependent, strategic merge patch will NOT return an error.
if err != nil && !errors.IsNotFound(err) {
errCh <- fmt.Errorf("orphaning %s failed, %v", dependent.identity, err)
}
}(dependents[i])
}
wg.Wait()
close(errCh)
var errorsSlice []error
for e := range errCh {
errorsSlice = append(errorsSlice, e)
}
if len(errorsSlice) != 0 {
return fmt.Errorf("failed to orphan dependents of owner %s, got errors: %s", owner, utilerrors.NewAggregate(errorsSlice).Error())
}
klog.V(5).Infof("successfully updated all dependents of owner %s", owner)
return nil
}
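The concurrency pattern used by orphanDependents (one goroutine per dependent, a WaitGroup, and an error channel buffered to the number of dependents) can be tried out on its own. In the Go sketch below, patchAll, the patch callback and errConflict are hypothetical; the callback stands in for the API call that strips the owner reference.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errConflict is a made-up failure used by the demo callback.
var errConflict = errors.New("conflict")

// patchAll runs patch for every dependent concurrently and reports all
// failures together, or nil if every call succeeded.
func patchAll(dependents []string, patch func(string) error) error {
	// The buffer is as large as the number of senders, so no goroutine blocks.
	errCh := make(chan error, len(dependents))
	var wg sync.WaitGroup
	wg.Add(len(dependents))
	for _, d := range dependents {
		go func(dep string) {
			defer wg.Done()
			if err := patch(dep); err != nil {
				errCh <- fmt.Errorf("orphaning %s failed: %w", dep, err)
			}
		}(d)
	}
	wg.Wait()
	close(errCh)
	var errs []error
	for e := range errCh {
		errs = append(errs, e)
	}
	if len(errs) != 0 {
		return fmt.Errorf("failed to orphan %d dependents: %v", len(errs), errs)
	}
	return nil
}

func main() {
	err := patchAll([]string{"pod-a", "pod-b", "pod-c"}, func(dep string) error {
		if dep == "pod-b" {
			return errConflict
		}
		return nil
	})
	fmt.Println(err)
}
```

Closing the channel only after wg.Wait returns is what makes the final range loop terminate.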
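Summary
First, a recap of the architecture and core processing logic of the garbage collector.
The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete and attemptToOrphan).
Events obtained by list/watch from the apiserver are placed in the graphChanges queue. GraphBuilder takes events from graphChanges, builds the object dependency graph, and, depending on the deletion policy, puts the associated objects into the attemptToDelete or attemptToOrphan queue. GarbageCollector then takes items from those two queues, consults the dependency graph, and finally reclaims the objects.
Object deletion policies
To summarize, here is how a node and its object are deleted under each of the 3 deletion policies.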
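Foreground deletion
Foreground is a cascading deletion policy: the garbage collector deletes all dependents of the object.
When an object is deleted with the foreground policy, its deletionTimestamp field is set and its metadata.finalizers field contains the value foregroundDeletion, which blocks the deletion of the object. Once the garbage collector has deleted all dependents that are able to block the owner (dependents whose ownerReference has blockOwnerDeletion=true), it removes foregroundDeletion from the object's metadata.finalizers, and the owner object is then deleted.
Taking a deployment as an example, foreground deletion proceeds in the order Pod->ReplicaSet->Deployment.
Background deletion
Background is also a cascading deletion policy: Kubernetes deletes the owner object immediately, and the garbage collector then deletes all of its dependents in the background.
An object deleted with the Background policy carries no related finalizer (only the Foreground and Orphan policies set one), so it is deleted directly. GraphBuilder then observes the delete event for the object and puts its dependents into the attemptToDelete queue, which triggers GarbageCollector to reclaim them.
Taking a deployment as an example, background deletion proceeds in the order Deployment->ReplicaSet->Pod.
Orphan deletion
Orphan is a non-cascading deletion policy: deleting an object does not automatically delete its dependents, which are then called orphaned objects.
When an object is deleted with the Orphan policy, its metadata.finalizers field contains the value orphan, which blocks the deletion of the object until GarbageCollector has removed the entry for this owner from the ownerReferences of all its dependents. GarbageCollector then removes orphan from the owner's metadata.finalizers, and only after that is the owner object deleted.
Taking a deployment as an example, orphan deletion removes only the Deployment; the corresponding ReplicaSet and Pods are kept.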