[containerd Series] The Most Complete Guide: A Full Tour of containerd's Plugin Extension Mechanisms

Author: WeChat official account 云原生Serverless

2024-04-26
Zhejiang

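Besides the snapshotter, which of containerd's extension mechanisms are you familiar with?
This article is excerpted from the book 《containerd 原理剖析与实战》 (containerd: Principles and Practice), which is currently offered at a limited-time discount price of 69.9 yuan.

Before getting into the main topic, let's look at the overall architecture of containerd.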

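1. containerd architecture

Figure: containerd architecture

Architecture layers: the overall containerd architecture has three layers: ecosystem, containerd (the internal architecture of containerd), and system.

ecosystem
The ecosystem layer is divided into two sub-layers, Platform and Client:
Platform: the platform layer matches containerd's design philosophy of being embedded into a larger system. As an industry-standard container runtime, containerd hides the underlying differences and supports multiple platforms above it, such as Google GCP, Amazon Fargate, Microsoft Azure, and Rancher.
Client: the client is the adaptation layer through which the ecosystem connects to containerd. Technically, containerd is still a classic client/server architecture: a containerd client calls the API of the containerd server over gRPC. containerd exposes two kinds of interfaces. The first is the CRI interface, which Kubernetes defines as a specification and abstraction for plugging in different container runtimes. containerd implements it through the built-in CRI Plugin, and it mainly serves Kubernetes clusters or crictl above it. The second is containerd's own API, accessed through the Client SDK that containerd provides. It mainly serves non-Kubernetes upper layers such as PaaS platforms or higher-level runtimes, for example Docker, BuildKit, and ctr.

containerd (internal architecture)
This layer is the server implementation of containerd. Logically it has three layers: API, Core, and Backend.
API: the API layer provides the northbound gRPC service interface and the Prometheus metrics collection interface. The API supports two forms: the Kubernetes CRI standard and the containerd client.
Core: the Core layer holds the core logic, including services and metadata.
Backend: the Backend layer connects southbound to the container runtimes of the operating system and can be extended through different plugins. The most important piece here is containerd-shim: containerd uses shims to connect to different container runtimes, such as kata, runc, runhcs, gVisor, and firecracker.

system
The system layer covers the operating systems and architectures that containerd supports. It currently supports Windows and Linux, on x86 and arm.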

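2. containerd Backend

Below the API and Core layers of containerd sits the Backend layer. It connects to the container runtimes of the operating system, and it is also the layer through which containerd integrates external plugins. The Backend consists of two main categories: proxy plugins and containerd shims, as shown in the figure below.

Figure: containerd Backend and extensions

As the figure shows, there are three types of proxy plugin: content, diff, and snapshotter. The containerd snapshotter was covered in the earlier article 《一文了解 containerd 中的 snapshot》.

The following sections introduce the content and diff proxy plugins, as well as the Runtime and shim extension mechanism of containerd.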

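3. containerd proxy plugin

The microservices inside containerd are loosely coupled with each other as plugins, for example service plugins, grpc plugins, and snapshot plugins. Besides its built-in plugins, containerd also offers a way to use external plugins: proxy plugins.

The proxy plugin types supported by containerd are content and snapshot, plus diff (a type added in containerd 1.7.1). Proxy plugins are configured in the containerd configuration file as in the following example: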

    # /etc/containerd/config.toml
    version = 2
    [proxy_plugins]
      [proxy_plugins.<plugin name>]
        type = "snapshot"
        address = "/var/run/mysnapshotter.sock"

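Multiple proxy plugins can be configured under proxy_plugins. Each one is configured as [proxy_plugins.<plugin name>], where <plugin name> is the name of the plugin. A plugin entry has only two parameters:

type: the type of the proxy plugin. The current containerd version (1.7.1) supports three: content, diff, and snapshot.
address: the socket address the proxy plugin listens on. containerd communicates with the proxy plugin over gRPC through this address.

Once registered, a proxy plugin can be used just like a built-in plugin, and registered proxy plugins can be listed with ctr plugin ls. The following sections describe how to configure the snapshotter, content, and diff plugins.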
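To make the two parameters concrete, here is a minimal Go sketch that models one proxy plugin entry and checks it against the rules stated above. The ProxyPlugin type and its Validate method are hypothetical names invented for this illustration; they are not part of containerd.

```go
package main

import (
	"errors"
	"fmt"
)

// ProxyPlugin mirrors the two keys of a [proxy_plugins.<plugin name>] entry.
// The struct and its Validate method are hypothetical, for illustration only.
type ProxyPlugin struct {
	Type    string // "content", "diff" or "snapshot"
	Address string // socket the plugin's gRPC server listens on
}

// Validate checks the entry against the rules described in the text:
// the type must be one of the three supported values and the address
// must not be empty.
func (p ProxyPlugin) Validate() error {
	switch p.Type {
	case "content", "diff", "snapshot":
		// supported type
	default:
		return fmt.Errorf("unsupported proxy plugin type %q", p.Type)
	}
	if p.Address == "" {
		return errors.New("proxy plugin address must not be empty")
	}
	return nil
}

func main() {
	plugins := map[string]ProxyPlugin{
		"mysnapshotter": {Type: "snapshot", Address: "/var/run/mysnapshotter.sock"},
		"broken":        {Type: "image", Address: "/tmp/x.sock"},
	}
	for name, p := range plugins {
		if err := p.Validate(); err != nil {
			fmt.Printf("%s: invalid: %v\n", name, err)
			continue
		}
		fmt.Printf("%s: ok\n", name)
	}
}
```

The sketch only validates values; parsing the TOML file itself would need a TOML library and is left out on purpose.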

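1. Configuring and using a snapshotter plugin

This section uses nydus as an example to show how a snapshotter proxy plugin is configured and used. A snapshotter can be used through ctr, nerdctl, and the cri plugin. The example below uses the cri plugin, where the snapshotter is selected with the configuration parameter snapshotter = "nydus".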

    ...
    [plugins."io.containerd.grpc.v1.cri"]
      [plugins."io.containerd.grpc.v1.cri".containerd]
          snapshotter = "nydus"
          disable_snapshot_annotations = false
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
          runtime_type = "io.containerd.kata.v2"
          privileged_without_host_devices = true
    ...
    [proxy_plugins]
      [proxy_plugins.nydus]
         type = "snapshot"
         address = "/var/lib/containerd/io.containerd.snapshotter.v1.nydus/containerd-nydus-grpc.sock"

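2. Configuring and using a content plugin

The content interface manages data and its associated metadata, for example image data (raw data such as config, manifest, and tar.gz layers). The metadata is saved in the metadata store, while the actual binary data is kept in /var/lib/containerd/io.containerd.content.v1.content. The related ctr command is as follows.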

    ctr content command [command options] [arguments...]

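A typical use of content is storing an image while it is being pulled. For details, see the image pull process described in Section 6.1.3 of the book. This is shown in the figure below.

Figure: containerd from pulling an image to preparing the container rootfs

The content store is used during an image pull as follows:

1) After the image is pulled, its manifest file and the tar.gz files of its layers are written to the host through the Write method of the content API, and the content metadata is updated at the same time.
2) The pull also involves the image API, which is used to record the image's metadata in the metadata store.
3) Unpacking the image into a snapshot calls the image API and the Read method of the content API to read the manifest and the layer tar.gz files, which are then unpacked into the mount directory of the snapshot.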

Unlike snapshotters, containerd supports only one content plugin at a time: either the built-in content plugin or one you implement yourself. A custom content plugin has to implement the ContentServer interface, shown below.

    type ContentServer interface {
        Info(context.Context, *InfoRequest) (*InfoResponse, error)
        Update(context.Context, *UpdateRequest) (*UpdateResponse, error)
        List(*ListContentRequest, Content_ListServer) error
        Delete(context.Context, *DeleteContentRequest) (*types.Empty, error)
        Read(*ReadContentRequest, Content_ReadServer) error
        Status(context.Context, *StatusRequest) (*StatusResponse, error)
        ListStatuses(context.Context, *ListStatusesRequest) (*ListStatusesResponse, error)
        Write(Content_WriteServer) error
        Abort(context.Context, *AbortRequest) (*types.Empty, error)
        mustEmbedUnimplementedContentServer()
    }

An implementation can follow the code below.

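    package main

    import (
        "context"
        "log"
        "net"

        // Generated gRPC API of the content service. The import path is an
        // assumption based on the containerd 1.7 module layout.
        content "github.com/containerd/containerd/api/services/content/v1"
        "google.golang.org/grpc"
    )

    // Mycontent is the custom content server. Embedding UnimplementedContentServer
    // supplies a default for every method that is not overridden.
    type Mycontent struct {
        content.UnimplementedContentServer
    }

    func (m *Mycontent) Info(ctx context.Context, request *content.InfoRequest) (*content.InfoResponse, error) {
        // TODO implement me; until then, fall back to the embedded default.
        return m.UnimplementedContentServer.Info(ctx, request)
    }

    // ... other interface methods omitted

    func main() {
        socket := "/run/containerd/content.sock"
        // 1. implement the content server
        svc := &Mycontent{}
        // 2. register the content server
        rpc := grpc.NewServer()
        content.RegisterContentServer(rpc, svc)
        l, err := net.Listen("unix", socket)
        if err != nil {
            log.Fatalf("listen to address %s failed: %s", socket, err)
        }
        if err := rpc.Serve(l); err != nil {
            log.Fatalf("serve rpc on address %s failed: %s", socket, err)
        }
    }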

The code above listens on /run/containerd/content.sock. To use this content plugin in containerd, the built-in content plugin has to be disabled, with the following configuration.

    ...
    disabled_plugins = ["io.containerd.content.v1.content"]
    ...
    [proxy_plugins]
      [proxy_plugins.mycontent]
         type = "content"
         address = "/run/containerd/content.sock"

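[Note] A proxy content plugin is meant for remote storage scenarios. For remote storage, however, the snapshotter approach is recommended instead, because a containerd proxy content plugin introduces significant overhead.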

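3. Configuring and using a Diff plugin

The diff interface converts between image layer content and the rootfs. The Diff function takes the differences between two mount directories (such as upper and lower in overlay), produces an OCI-compliant tar file from them, and saves it. The Apply function does the opposite: it unpacks the tar file produced by Diff and mounts it at the specified directory. See the figure.

Figure: operations of the containerd diff interface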

The related ctr command is:

    ctr snapshots diff [command options] [flags] <idA> [<idB>]

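Compared with the content plugin, the Diff proxy plugin is more flexible. As with snapshotter plugins, multiple Diff plugins can be configured, and containerd invokes them in order. With the configuration below, containerd calls the corresponding methods of the external proxydiff plugin and the built-in walking plugin in that order.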

    ...
      [plugins."io.containerd.service.v1.diff-service"]
        default = ["proxydiff", "walking"]
    ...
    [proxy_plugins]
      [proxy_plugins."proxydiff"]
        type = "diff"
        address = "/tmp/proxy.sock"

A Diff plugin likewise has to implement a specific interface, DiffServer, shown below.

    type DiffServer interface {
        Apply(context.Context, *ApplyRequest) (*ApplyResponse, error)
        Diff(context.Context, *DiffRequest) (*DiffResponse, error)
        mustEmbedUnimplementedDiffServer()
    }

For a concrete implementation, see the example at github.com/zhaojizhuang/containerd-diff-example

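4. Runtime and Shim in containerd

Besides the three proxy plugins, the containerd Backend has one more extension plugin, the most important one in containerd: the Shim.

When a task is started in containerd, the corresponding Shim is launched to start the container, as shown in the figure below.

Figure: containerd Shim and OCI Runtime

As the figure shows, containerd connects to the underlying OCI Runtime through the Shim. The Runtime V2 module in containerd is responsible for managing shims (the original Runtime V1 is already deprecated in version 1.7.1).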

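1. The Shim mechanism

The Shim mechanism is containerd's way of plugging in different container runtimes: runtime developers can use it to integrate their own runtime into containerd. containerd currently supports the V2 Runtime Shim; the V1 Runtime Shims have been deprecated in version 1.7.1. A specific container runtime is selected by setting the runtime field through ctr, nerdctl, or the CRI Plugin, as follows.

1) Specifying the runtime with ctr

Use ctr run --runtime to start a container with a specific container runtime, as follows.

    ctr run --runtime io.containerd.runc.v2 xxx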

2) Specifying the runtime with nerdctl

Use nerdctl run --runtime to start a container with a specific container runtime, as follows.

    nerdctl run --runtime io.containerd.kata.v2 xxx

3) Specifying the runtime through the runtime_type field of the CRI Plugin

The CRI Plugin is usually used together with a RuntimeClass. For example, the CRI Plugin configuration for kata is as follows.

    [plugins."io.containerd.grpc.v1.cri".containerd]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
          runtime_type = "io.containerd.kata.v2"

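When a user specifies a runtime, containerd resolves the runtime name to a binary name at call time and looks that binary up in $PATH.

For example, the runtime io.containerd.runc.v2 is resolved to the binary containerd-shim-runc-v2. A client can specify which shim to use when creating a container. If none is specified, the default shim is used (the default runtime in containerd is io.containerd.runc.v2).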
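The name resolution described above can be sketched as a small Go function. It assumes the convention visible in the example, namely that the last two dot-separated parts of the runtime name become the <name> and <version> of containerd-shim-<name>-<version>. The shimBinaryName function is a hypothetical helper written for this article, not containerd's own code.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// shimBinaryName maps a runtime name such as "io.containerd.runc.v2" to the
// shim binary name "containerd-shim-runc-v2". It uses the last two
// dot-separated parts as <name> and <version> and returns "" for names it
// cannot parse.
func shimBinaryName(runtime string) string {
	parts := strings.Split(runtime, ".")
	if len(parts) < 2 || parts[0] == "" {
		return ""
	}
	return fmt.Sprintf("containerd-shim-%s-%s", parts[len(parts)-2], parts[len(parts)-1])
}

func main() {
	for _, rt := range []string{"io.containerd.runc.v2", "io.containerd.kata.v2"} {
		bin := shimBinaryName(rt)
		// Search $PATH for the binary, as the text describes.
		path, err := exec.LookPath(bin)
		if err != nil {
			fmt.Printf("%s -> %s (not found in $PATH)\n", rt, bin)
			continue
		}
		fmt.Printf("%s -> %s\n", rt, path)
	}
}
```

This also explains why installing a new runtime usually amounts to dropping a correctly named shim binary into a directory on $PATH.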

2. Shims supported by containerd

containerd can work with any shim that conforms to the containerd Shim API specification. The shims currently supported are listed in Table 7.8.

Table 7.8 Shims supported by containerd

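5. The containerd Shim specification

containerd defines a complete specification for the Shim mechanism to help container runtime authors implement their own shims. This section introduces the Shim API in containerd. The interaction between containerd and a shim is shown in Figure 7.16.

Figure: the two ways containerd calls a shim

The Runtime Shim API defines two ways of calling a shim:

Binary invocation: the shim binary is launched directly with the shim start command. Once launched, it starts the corresponding ttrpc server. An example command is containerd-shim-runc-v2 -namespace xxx -address /run/containerd/containerd.sock -id xxx start.
ttrpc invocation: once the shim process is running it acts as a ttrpc server, and all further interaction between containerd and the shim goes through ttrpc calls.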

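6. Shim workflow walkthrough

The following concrete example shows how the Shim and containerd interact when a container is started.

Take starting an nginx container with ctr as the example. The commands are as follows.

    ctr image pull docker.io/library/nginx:latest
    ctr run docker.io/library/nginx:latest nginx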

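Note that when ctr run starts the container here, containerd uses its default runtime, io.containerd.runc.v2.

The interaction between containerd and the shim during container start is shown in the figure below.

Figure: interaction between containerd and the Shim when a container is started with ctr

As shown in the figure above, the call flow when creating a container with ctr is as follows: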

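1) After the ctr run command is issued, ctr first calls containerd's Create Container interface, which saves the container data in metadb.
2) Once the container has been created, the corresponding Container ID is returned.
3) After the container is created, ctr calls containerd's Task Create interface.
4) containerd prepares the OCI Bundle for running the container. The rootfs in the Bundle is prepared by calling the snapshotter.
5) Once the OCI Bundle is ready, containerd resolves the shim binary from the specified or default runtime name, for example io.containerd.runc.v2 -> containerd-shim-runc-v2. containerd launches the shim binary with the start command plus some extra arguments that define the namespace, the OCI bundle path, debug mode, the unix socket address containerd listens on, and so on. In this call, the shim's working directory is set to the OCI Bundle path.
6) After shim start is called, the shim starts a ttrpc server that listens on a specific unix socket address. That address is also stored as the content of the <oci bundle path>/address file, for example unix:///run/containerd/s/xxxxx.
7) Once the ttrpc server is up, the shim start command returns normally and passes the unix socket address of the shim's ttrpc server back to containerd through stdout.
8) containerd prepares a ttrpc client for each shim to communicate with that shim's ttrpc server.
9) containerd calls the shim's TaskService.Create interface. The shim mounts the filesystems described by the Mount information in the CreateTaskRequest onto the rootfs/ directory of the OCI Bundle.
10) When the ttrpc call to the shim succeeds, the Task ID is returned.
11) containerd returns the Task ID to ctr.
12) ctr calls containerd's Start Task to start the container process.
13) containerd calls the shim's TaskService.Start method over ttrpc. This is the step that actually starts the process inside the container.
14) After Start succeeds, the shim returns to containerd.
15) Next, ctr calls containerd's task.Wait API.
16) This triggers containerd to call the shim's TaskService.Wait API. The request blocks and returns only after the container has exited.
17) After the container process exits, the shim returns its exit code to containerd.
18) containerd returns the process exit status to the ctr client.

Next comes the flow for stopping a container.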

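Figure: interaction between containerd and the Shim when a container is stopped with ctr

The figure shows the call flow when a container is deleted with ctr task kill, that is:

    ctr task kill nginx

The call flow during the kill is as follows: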

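1) After ctr kill is executed, ctr calls containerd's Task Kill API.
2) This triggers containerd to call the shim's TaskService.Kill API over ttrpc. The Shim asks the container process to exit by sending it SIGTERM (equivalent to shell kill), and sends SIGKILL (equivalent to shell kill -9) if the process has not ended within the timeout.
3) When the Kill call succeeds, the shim returns to containerd.
4) containerd returns success to the ctr client.
5) ctr then calls containerd's Task Delete API. In this call containerd deletes the task record and also calls the relevant shim interfaces to clean up the shim's resources.
6) containerd first calls the Shim's TaskService.Delete API, and the shim deletes the resources that belong to the container.
7) The Shim returns a Delete success signal to containerd.
8) containerd then calls the Shim's TaskService.Shutdown API. In this call the Shim stops its ttrpc server and the Shim process exits.
9) The Shim exits successfully.
10) containerd closes the ttrpc client that belongs to the shim.
11) containerd runs delete through binary invocation, that is, it executes containerd-shim-runc-v2 delete xxx.
12) The binary delete call removes the corresponding OCI Bundle.
13) containerd returns a container-deleted success signal to the ctr client.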

The content above is excerpted from the new book 《containerd 原理剖析与实战》.


Original link: [http://xie.infoq.cn/article/601addcd8908a55d10bb09a99]. Please contact the author before reposting.


赵吉壮 (Zhao Jizhuang), author of 《containerd 原理剖析与实战》, previously worked at Huawei Cloud BU and the ByteDance Data team, and focuses on k8s, Serverless, Go, and cloud native.
