
ceph-csi source code analysis (6): rbd driver - nodeserver analysis (part 2)

良凯尔
Published on: May 9, 2021

kubernetes ceph-csi analysis - table of contents:

https://xie.infoq.cn/article/4b1d3e32f124307a49cd9c1e3


When the ceph-csi component is started with driver type rbd, the rbd driver services are started. Then, depending on the controllerserver and nodeserver parameters, either ControllerServer and IdentityServer, or NodeServer and IdentityServer, are started.


Based on tag v3.0.0


https://github.com/ceph/ceph-csi/releases/tag/v3.0.0


The rbd driver analysis is split into four parts: the service entrypoint, the controllerserver, the nodeserver, and the IdentityServer.



The nodeserver implements the operations NodeGetCapabilities (report driver capabilities), NodeGetVolumeStats (probe the volume and collect metrics), NodeStageVolume (map the rbd image and mount the staging path), NodePublishVolume (mount the target path), NodeUnpublishVolume (unmount the target path), NodeUnstageVolume (unmount the staging path and unmap the rbd image), and NodeExpandVolume (node-side volume expansion); each will be analyzed in turn. This part covers NodeStageVolume, NodePublishVolume, NodeUnpublishVolume, and NodeUnstageVolume.

nodeserver analysis (part 2)

How ceph rbd mounting works

There are two main ways to map an rbd image into a block device: (1) via the RBD kernel module, or (2) via rbd-nbd. References: https://www.jianshu.com/p/bb9d14bd897c and http://xiaqunfeng.cc/2017/06/07/Map-RBD-Devices-on-NBD/


Mounting a ceph rbd image into a pod takes two steps:


1. kubelet calls the NodeStageVolume method of the rbd-type nodeserver of ceph-csi, which maps the rbd image to an rbd/nbd device on the node, then formats the device and mounts it to the staging path;


2. kubelet calls the NodePublishVolume method of the rbd-type nodeserver of ceph-csi, which mounts the staging path from the previous step to the target path.

How ceph rbd unmounting works

Unmounting a ceph rbd image from a pod takes two steps:


1. kubelet calls the NodeUnpublishVolume method of the rbd-type nodeserver of ceph-csi, which removes the mount between the staging path and the target path.


2. kubelet calls the NodeUnstageVolume method of the rbd-type nodeserver of ceph-csi, which first unmounts the staging path from the rbd/nbd device, and then unmaps the rbd/nbd device (i.e. detaches the node's rbd/nbd device from the ceph rbd image).


After an rbd image is mounted into a pod, the node shows two mount entries, for example:


```shell
# mount | grep nbd
/dev/nbd0 on /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-e2104b0f-774e-420e-a388-1705344084a4/globalmount/0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-40b130e1-a630-11eb-8bea-246e968ec20c type xfs (rw,relatime,nouuid,attr2,inode64,noquota,_netdev)
/dev/nbd0 on /home/cld/kubernetes/lib/kubelet/pods/80114f88-2b09-440c-aec2-54c16efe6923/volumes/kubernetes.io~csi/pvc-e2104b0f-774e-420e-a388-1705344084a4/mount type xfs (rw,relatime,nouuid,attr2,inode64,noquota,_netdev)
```


Here /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-e2104b0f-774e-420e-a388-1705344084a4/globalmount/0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-40b130e1-a630-11eb-8bea-246e968ec20c is the staging path, /home/cld/kubernetes/lib/kubelet/pods/80114f88-2b09-440c-aec2-54c16efe6923/volumes/kubernetes.io~csi/pvc-e2104b0f-774e-420e-a388-1705344084a4/mount is the target path, and /dev/nbd0 is the nbd device.
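The path layout above can be sketched as a pair of helpers. The helper names and the kubelet root used here (`/var/lib/kubelet`; the nodes in this article use a non-default root) are illustrative assumptions, not ceph-csi functions:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// stagingPath mirrors the layout of the per-image staging path:
// <kubelet root>/plugins/kubernetes.io/csi/pv/<pv name>/globalmount/<volume ID>
func stagingPath(kubeletRoot, pvName, volID string) string {
	return filepath.Join(kubeletRoot, "plugins/kubernetes.io/csi/pv", pvName, "globalmount", volID)
}

// targetPath mirrors the layout of the per-pod target path:
// <kubelet root>/pods/<pod UID>/volumes/kubernetes.io~csi/<pv name>/mount
func targetPath(kubeletRoot, podUID, pvName string) string {
	return filepath.Join(kubeletRoot, "pods", podUID, "volumes/kubernetes.io~csi", pvName, "mount")
}

func main() {
	// One image staged once, published into two pods:
	// a single staging path but two target paths.
	fmt.Println(stagingPath("/var/lib/kubelet", "pvc-1234", "0001-0024-abcd"))
	fmt.Println(targetPath("/var/lib/kubelet", "pod-uid-1", "pvc-1234"))
	fmt.Println(targetPath("/var/lib/kubelet", "pod-uid-2", "pvc-1234"))
}
```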


Note


When one rbd image is mounted into multiple pods on the same node, NodeStageVolume is called only once while NodePublishVolume is called multiple times, so there is a single staging path but multiple target paths. Intuitively, the staging path corresponds to the rbd image, while each target path corresponds to a pod.


Unmounting is symmetric: NodeUnstageVolume is called only after every pod that mounts the rbd image has been deleted.

(4)NodeStageVolume

Overview

Maps the rbd image to an rbd/nbd device on the node, formats it, and mounts it to the staging path.


NodeStageVolume mounts the volume to a staging path on the node.


  • Stash image metadata under staging path

  • Map the image (creates a device)

  • Create the staging file/directory under staging path

  • Stage the device (mount the device mapped for image)


Main steps:


(1) Map the rbd image to an rbd/nbd device on the node;


(2) Format the rbd device (skipped when volumeMode is Block) and mount it to the staging path.

NodeStageVolume

The main flow of NodeStageVolume:


(1) Validate the request parameters and the AccessMode;

(2) Get the volID from the request;

(3) Build the ceph credentials from the secret (the secret is passed in by kubelet);

(4) Check whether the stagingPath exists and whether it is already mounted;

(5) Look up the image name in the volume journal by volID;

(6) Create image-meta.json under stagingParentPath to stash the image metadata;

(7) Call ns.stageTransaction to do the map and mount.


```go
//ceph-csi/internal/rbd/nodeserver.go
func (ns *NodeServer) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
	// (1) validate the request parameters
	if err := util.ValidateNodeStageVolumeRequest(req); err != nil {
		return nil, err
	}

	// validate the AccessMode
	isBlock := req.GetVolumeCapability().GetBlock() != nil
	disableInUseChecks := false
	// MULTI_NODE_MULTI_WRITER is supported by default for Block access type volumes
	if req.VolumeCapability.AccessMode.Mode == csi.VolumeCapability_AccessMode_MULTI_NODE_MULTI_WRITER {
		if !isBlock {
			klog.Warningf(util.Log(ctx, "MULTI_NODE_MULTI_WRITER currently only supported with volumes of access type `block`, invalid AccessMode for volume: %v"), req.GetVolumeId())
			return nil, status.Error(codes.InvalidArgument, "rbd: RWX access mode request is only valid for volumes with access type `block`")
		}
		disableInUseChecks = true
	}

	// (2) get the volID from the request
	volID := req.GetVolumeId()

	// (3) build the ceph credentials from the secret
	cr, err := util.NewUserCredentials(req.GetSecrets())
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	defer cr.DeleteCredentials()

	if acquired := ns.VolumeLocks.TryAcquire(volID); !acquired {
		klog.Errorf(util.Log(ctx, util.VolumeOperationAlreadyExistsFmt), volID)
		return nil, status.Errorf(codes.Aborted, util.VolumeOperationAlreadyExistsFmt, volID)
	}
	defer ns.VolumeLocks.Release(volID)

	stagingParentPath := req.GetStagingTargetPath()
	stagingTargetPath := stagingParentPath + "/" + volID

	// check is it a static volume
	staticVol := false
	val, ok := req.GetVolumeContext()["staticVolume"]
	if ok {
		if staticVol, err = strconv.ParseBool(val); err != nil {
			return nil, status.Error(codes.InvalidArgument, err.Error())
		}
	}

	// (4) check whether stagingPath exists and is already mounted
	var isNotMnt bool
	// check if stagingPath is already mounted
	isNotMnt, err = mount.IsNotMountPoint(ns.mounter, stagingTargetPath)
	if err != nil && !os.IsNotExist(err) {
		return nil, status.Error(codes.Internal, err.Error())
	}

	if !isNotMnt {
		util.DebugLog(ctx, "rbd: volume %s is already mounted to %s, skipping", volID, stagingTargetPath)
		return &csi.NodeStageVolumeResponse{}, nil
	}

	volOptions, err := genVolFromVolumeOptions(ctx, req.GetVolumeContext(), req.GetSecrets(), disableInUseChecks)
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}

	// (5) get rbd image name from the volume journal
	// for static volumes, the image name is actually the volume ID itself
	switch {
	case staticVol:
		volOptions.RbdImageName = volID
	default:
		var vi util.CSIIdentifier
		var imageAttributes *journal.ImageAttributes
		err = vi.DecomposeCSIID(volID)
		if err != nil {
			err = fmt.Errorf("error decoding volume ID (%s) (%s)", err, volID)
			return nil, status.Error(codes.Internal, err.Error())
		}

		j, err2 := volJournal.Connect(volOptions.Monitors, cr)
		if err2 != nil {
			klog.Errorf(util.Log(ctx, "failed to establish cluster connection: %v"), err2)
			return nil, status.Error(codes.Internal, err.Error())
		}
		defer j.Destroy()

		imageAttributes, err = j.GetImageAttributes(ctx, volOptions.Pool, vi.ObjectUUID, false)
		if err != nil {
			err = fmt.Errorf("error fetching image attributes for volume ID (%s) (%s)", err, volID)
			return nil, status.Error(codes.Internal, err.Error())
		}
		volOptions.RbdImageName = imageAttributes.ImageName
	}

	volOptions.VolID = volID
	transaction := stageTransaction{}

	// (6) create image-meta.json under stagingParentPath
	// Stash image details prior to mapping the image (useful during Unstage as it has no
	// voloptions passed to the RPC as per the CSI spec)
	err = stashRBDImageMetadata(volOptions, stagingParentPath)
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}
	defer func() {
		if err != nil {
			ns.undoStagingTransaction(ctx, req, transaction)
		}
	}()

	// (7) perform the actual staging and if this fails, have undoStagingTransaction
	// clean up for us
	transaction, err = ns.stageTransaction(ctx, req, volOptions, staticVol)
	if err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}

	util.DebugLog(ctx, "rbd: successfully mounted volume %s to stagingTargetPath %s", req.GetVolumeId(), stagingTargetPath)

	return &csi.NodeStageVolumeResponse{}, nil
}
```
1.ValidateNodeStageVolumeRequest

ValidateNodeStageVolumeRequest validates the following:


(1) the volume capability must not be empty;

(2) the volume ID must not be empty;

(3) the staging target path must not be empty;

(4) the stage secrets must not be empty;

(5) the staging path must exist on the node.


```go
//ceph-csi/internal/util/validate.go
func ValidateNodeStageVolumeRequest(req *csi.NodeStageVolumeRequest) error {
	if req.GetVolumeCapability() == nil {
		return status.Error(codes.InvalidArgument, "volume capability missing in request")
	}

	if req.GetVolumeId() == "" {
		return status.Error(codes.InvalidArgument, "volume ID missing in request")
	}

	if req.GetStagingTargetPath() == "" {
		return status.Error(codes.InvalidArgument, "staging target path missing in request")
	}

	if req.GetSecrets() == nil || len(req.GetSecrets()) == 0 {
		return status.Error(codes.InvalidArgument, "stage secrets cannot be nil or empty")
	}

	// validate stagingpath exists
	ok := checkDirExists(req.GetStagingTargetPath())
	if !ok {
		return status.Error(codes.InvalidArgument, "staging path does not exists on node")
	}
	return nil
}
```
2.stashRBDImageMetadata

stashRBDImageMetadata creates image-meta.json under stagingParentPath to stash the image metadata.


```go
//ceph-csi/internal/rbd/rbd_util.go
const stashFileName = "image-meta.json"

func stashRBDImageMetadata(volOptions *rbdVolume, path string) error {
	var imgMeta = rbdImageMetadataStash{
		// there are no checks for this at present
		Version:   2, // nolint:gomnd // number specifies version.
		Pool:      volOptions.Pool,
		ImageName: volOptions.RbdImageName,
		Encrypted: volOptions.Encrypted,
	}

	imgMeta.NbdAccess = false
	if volOptions.Mounter == rbdTonbd && hasNBD {
		imgMeta.NbdAccess = true
	}

	encodedBytes, err := json.Marshal(imgMeta)
	if err != nil {
		return fmt.Errorf("failed to marshall JSON image metadata for image (%s): (%v)", volOptions, err)
	}

	fPath := filepath.Join(path, stashFileName)
	err = ioutil.WriteFile(fPath, encodedBytes, 0600)
	if err != nil {
		return fmt.Errorf("failed to stash JSON image metadata for image (%s) at path (%s): (%v)", volOptions, fPath, err)
	}

	return nil
}
```


```shell
root@cld-dnode3-1091:/home/zhongjialiang# ls /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount/
image-meta.json  0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74/
```
3.ns.stageTransaction

Main flow:


(1) Call attachRBDImage to map the rbd image to a device on the node;

(2) Call ns.mountVolumeToStagePath to format the node's rbd device and mount it to the stagePath.


```go
//ceph-csi/internal/rbd/nodeserver.go
func (ns *NodeServer) stageTransaction(ctx context.Context, req *csi.NodeStageVolumeRequest, volOptions *rbdVolume, staticVol bool) (stageTransaction, error) {
	transaction := stageTransaction{}

	var err error
	var readOnly bool
	var feature bool

	var cr *util.Credentials
	cr, err = util.NewUserCredentials(req.GetSecrets())
	if err != nil {
		return transaction, err
	}
	defer cr.DeleteCredentials()

	err = volOptions.Connect(cr)
	if err != nil {
		klog.Errorf(util.Log(ctx, "failed to connect to volume %v: %v"), volOptions.RbdImageName, err)
		return transaction, err
	}
	defer volOptions.Destroy()

	// Allow image to be mounted on multiple nodes if it is ROX
	if req.VolumeCapability.AccessMode.Mode == csi.VolumeCapability_AccessMode_MULTI_NODE_READER_ONLY {
		util.ExtendedLog(ctx, "setting disableInUseChecks on rbd volume to: %v", req.GetVolumeId)
		volOptions.DisableInUseChecks = true
		volOptions.readOnly = true
	}

	if kernelRelease == "" {
		// fetch the current running kernel info
		kernelRelease, err = util.GetKernelVersion()
		if err != nil {
			return transaction, err
		}
	}
	if !util.CheckKernelSupport(kernelRelease, deepFlattenSupport) {
		if !skipForceFlatten {
			feature, err = volOptions.checkImageChainHasFeature(ctx, librbd.FeatureDeepFlatten)
			if err != nil {
				return transaction, err
			}
			if feature {
				err = volOptions.flattenRbdImage(ctx, cr, true, rbdHardMaxCloneDepth, rbdSoftMaxCloneDepth)
				if err != nil {
					return transaction, err
				}
			}
		}
	}

	// Mapping RBD image
	var devicePath string
	devicePath, err = attachRBDImage(ctx, volOptions, cr)
	if err != nil {
		return transaction, err
	}
	transaction.devicePath = devicePath
	util.DebugLog(ctx, "rbd image: %s/%s was successfully mapped at %s\n", req.GetVolumeId(), volOptions.Pool, devicePath)

	if volOptions.Encrypted {
		devicePath, err = ns.processEncryptedDevice(ctx, volOptions, devicePath)
		if err != nil {
			return transaction, err
		}
		transaction.isEncrypted = true
	}

	stagingTargetPath := getStagingTargetPath(req)

	isBlock := req.GetVolumeCapability().GetBlock() != nil
	err = ns.createStageMountPoint(ctx, stagingTargetPath, isBlock)
	if err != nil {
		return transaction, err
	}

	transaction.isStagePathCreated = true

	// nodeStage Path
	readOnly, err = ns.mountVolumeToStagePath(ctx, req, staticVol, stagingTargetPath, devicePath)
	if err != nil {
		return transaction, err
	}
	transaction.isMounted = true

	if !readOnly {
		// #nosec - allow anyone to write inside the target path
		err = os.Chmod(stagingTargetPath, 0777)
	}
	return transaction, err
}
```


3.1 attachRBDImage


The main flow of attachRBDImage:


(1) Call waitForPath to check whether the image is already mapped on this node;

(2) If it is not mapped, call waitForrbdImage to check that the image exists and whether it is in use;

(3) Call createPath to map the image on the node.


```go
//ceph-csi/internal/rbd/rbd_attach.go
func attachRBDImage(ctx context.Context, volOptions *rbdVolume, cr *util.Credentials) (string, error) {
	var err error

	image := volOptions.RbdImageName
	useNBD := false
	if volOptions.Mounter == rbdTonbd && hasNBD {
		useNBD = true
	}

	devicePath, found := waitForPath(ctx, volOptions.Pool, image, 1, useNBD)
	if !found {
		backoff := wait.Backoff{
			Duration: rbdImageWatcherInitDelay,
			Factor:   rbdImageWatcherFactor,
			Steps:    rbdImageWatcherSteps,
		}

		err = waitForrbdImage(ctx, backoff, volOptions)
		if err != nil {
			return "", err
		}
		devicePath, err = createPath(ctx, volOptions, cr)
	}

	return devicePath, err
}
```


createPath assembles the rbd command line and executes map, mapping the rbd image into an rbd device on the node.


The rbd-nbd mode is selected with --device-type=nbd.


```go
func createPath(ctx context.Context, volOpt *rbdVolume, cr *util.Credentials) (string, error) {
	isNbd := false
	imagePath := volOpt.String()

	util.TraceLog(ctx, "rbd: map mon %s", volOpt.Monitors)

	// Map options
	mapOptions := []string{
		"--id", cr.ID,
		"-m", volOpt.Monitors,
		"--keyfile=" + cr.KeyFile,
		"map", imagePath,
	}

	// Choose access protocol
	accessType := accessTypeKRbd
	if volOpt.Mounter == rbdTonbd && hasNBD {
		isNbd = true
		accessType = accessTypeNbd
	}

	// Update options with device type selection
	mapOptions = append(mapOptions, "--device-type", accessType)

	if volOpt.readOnly {
		mapOptions = append(mapOptions, "--read-only")
	}

	// Execute map
	stdout, stderr, err := util.ExecCommand(ctx, rbd, mapOptions...)
	if err != nil {
		klog.Warningf(util.Log(ctx, "rbd: map error %v, rbd output: %s"), err, stderr)
		// unmap rbd image if connection timeout
		if strings.Contains(err.Error(), rbdMapConnectionTimeout) {
			detErr := detachRBDImageOrDeviceSpec(ctx, imagePath, true, isNbd, volOpt.Encrypted, volOpt.VolID)
			if detErr != nil {
				klog.Warningf(util.Log(ctx, "rbd: %s unmap error %v"), imagePath, detErr)
			}
		}
		return "", fmt.Errorf("rbd: map failed with error %v, rbd error output: %s", err, stderr)
	}
	devicePath := strings.TrimSuffix(stdout, "\n")

	return devicePath, nil
}
```
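The argument list createPath passes to the rbd binary can be reproduced in isolation. buildMapOptions below is a hypothetical helper that mirrors that logic (device-type selection and the optional --read-only flag), not a ceph-csi function:

```go
package main

import "fmt"

// buildMapOptions mirrors how createPath assembles the `rbd map` argument
// list: credentials, monitors, the image spec (pool/image), the device type
// (krbd or nbd), and an optional --read-only flag.
func buildMapOptions(id, monitors, keyFile, imagePath string, useNbd, readOnly bool) []string {
	opts := []string{"--id", id, "-m", monitors, "--keyfile=" + keyFile, "map", imagePath}
	accessType := "krbd"
	if useNbd {
		accessType = "nbd"
	}
	opts = append(opts, "--device-type", accessType)
	if readOnly {
		opts = append(opts, "--read-only")
	}
	return opts
}

func main() {
	// Matches the shape of the command seen in the logs below:
	// rbd [--id kubernetes -m <mons> --keyfile=... map <pool>/<image> --device-type krbd]
	fmt.Println(buildMapOptions("kubernetes", "10.0.0.1:6789", "/tmp/key", "kubernetes/csi-vol-1", false, false))
}
```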


3.2 mountVolumeToStagePath


Main flow:


(1) When volumeMode is Filesystem, run mkfs to format the rbd device;

(2) Mount the rbd device to the stagingPath.


```go
//ceph-csi/internal/rbd/nodeserver.go
func (ns *NodeServer) mountVolumeToStagePath(ctx context.Context, req *csi.NodeStageVolumeRequest, staticVol bool, stagingPath, devicePath string) (bool, error) {
	readOnly := false
	fsType := req.GetVolumeCapability().GetMount().GetFsType()
	diskMounter := &mount.SafeFormatAndMount{Interface: ns.mounter, Exec: utilexec.New()}
	// rbd images are thin-provisioned and return zeros for unwritten areas. A freshly created
	// image will not benefit from discard and we also want to avoid as much unnecessary zeroing
	// as possible. Open-code mkfs here because FormatAndMount() doesn't accept custom mkfs
	// options.
	//
	// Note that "freshly" is very important here. While discard is more of a nice to have,
	// lazy_journal_init=1 is plain unsafe if the image has been written to before and hasn't
	// been zeroed afterwards (unlike the name suggests, it leaves the journal completely
	// uninitialized and carries a risk until the journal is overwritten and wraps around for
	// the first time).
	existingFormat, err := diskMounter.GetDiskFormat(devicePath)
	if err != nil {
		klog.Errorf(util.Log(ctx, "failed to get disk format for path %s, error: %v"), devicePath, err)
		return readOnly, err
	}

	opt := []string{"_netdev"}
	opt = csicommon.ConstructMountOptions(opt, req.GetVolumeCapability())
	isBlock := req.GetVolumeCapability().GetBlock() != nil
	rOnly := "ro"

	if req.VolumeCapability.AccessMode.Mode == csi.VolumeCapability_AccessMode_MULTI_NODE_READER_ONLY ||
		req.VolumeCapability.AccessMode.Mode == csi.VolumeCapability_AccessMode_SINGLE_NODE_READER_ONLY {
		if !csicommon.MountOptionContains(opt, rOnly) {
			opt = append(opt, rOnly)
		}
	}
	if csicommon.MountOptionContains(opt, rOnly) {
		readOnly = true
	}

	if fsType == "xfs" {
		opt = append(opt, "nouuid")
	}

	if existingFormat == "" && !staticVol && !readOnly {
		args := []string{}
		if fsType == "ext4" {
			args = []string{"-m0", "-Enodiscard,lazy_itable_init=1,lazy_journal_init=1", devicePath}
		} else if fsType == "xfs" {
			args = []string{"-K", devicePath}
			// always disable reflink
			// TODO: make enabling an option, see ceph/ceph-csi#1256
			if ns.xfsSupportsReflink() {
				args = append(args, "-m", "reflink=0")
			}
		}
		if len(args) > 0 {
			cmdOut, cmdErr := diskMounter.Exec.Command("mkfs."+fsType, args...).CombinedOutput()
			if cmdErr != nil {
				klog.Errorf(util.Log(ctx, "failed to run mkfs error: %v, output: %v"), cmdErr, string(cmdOut))
				return readOnly, cmdErr
			}
		}
	}

	if isBlock {
		opt = append(opt, "bind")
		err = diskMounter.Mount(devicePath, stagingPath, fsType, opt)
	} else {
		err = diskMounter.FormatAndMount(devicePath, stagingPath, fsType, opt)
	}
	if err != nil {
		klog.Errorf(util.Log(ctx, "failed to mount device path (%s) to staging path (%s) for volume "+
			"(%s) error: %s Check dmesg logs if required."), devicePath, stagingPath, req.GetVolumeId(), err)
	}
	return readOnly, err
}
```
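The mkfs argument selection above can be isolated into a small sketch. mkfsArgs is a hypothetical helper mirroring the ext4/xfs branches (the xfs reflink probing is omitted): ext4 disables discard and defers inode/journal initialization, xfs disables discard via -K:

```go
package main

import "fmt"

// mkfsArgs mirrors the per-filesystem mkfs argument selection in
// mountVolumeToStagePath. Other filesystems return nil and fall
// through to FormatAndMount's default formatting.
func mkfsArgs(fsType, devicePath string) []string {
	switch fsType {
	case "ext4":
		return []string{"-m0", "-Enodiscard,lazy_itable_init=1,lazy_journal_init=1", devicePath}
	case "xfs":
		return []string{"-K", devicePath}
	default:
		return nil
	}
}

func main() {
	fmt.Println(mkfsArgs("ext4", "/dev/rbd0"))
	fmt.Println(mkfsArgs("xfs", "/dev/rbd0"))
}
```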
Example ceph-csi logs

Operation: NodeStageVolume


Source: daemonset csi-rbdplugin, container csi-rbdplugin


```
I0828 06:25:07.604431 3316053 utils.go:159] ID: 12008 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC call: /csi.v1.Node/NodeStageVolume
I0828 06:25:07.607979 3316053 utils.go:160] ID: 12008 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC request: {"secrets":"***stripped***","staging_target_path":"/home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["discard"]}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"0bba3be9-0a1c-41db-a619-26ffea20161e","imageFeatures":"layering","imageName":"csi-vol-1699e662-e83f-11ea-8e79-246e96907f74","journalPool":"kubernetes","pool":"kubernetes","storage.kubernetes.io/csiProvisionerIdentity":"1598236777786-8081-rbd.csi.ceph.com"},"volume_id":"0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74"}
I0828 06:25:07.608239 3316053 rbd_util.go:722] ID: 12008 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 setting disableInUseChecks on rbd volume to: false
I0828 06:25:07.610528 3316053 omap.go:74] ID: 12008 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 got omap values: (pool="kubernetes", namespace="", name="csi.volume.1699e662-e83f-11ea-8e79-246e96907f74"): map[csi.imageid:e583b827ec63 csi.imagename:csi-vol-1699e662-e83f-11ea-8e79-246e96907f74 csi.volname:pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893]
E0828 06:25:07.610765 3316053 util.go:236] kernel 4.19.0-8-amd64 does not support required features
I0828 06:25:07.786825 3316053 cephcmds.go:60] ID: 12008 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 command succeeded: rbd [device list --format=json --device-type krbd]
I0828 06:25:07.832097 3316053 rbd_attach.go:208] ID: 12008 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 rbd: map mon 10.248.32.13:6789,10.248.32.14:6789,10.248.32.15:6789
I0828 06:25:07.926180 3316053 cephcmds.go:60] ID: 12008 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 command succeeded: rbd [--id kubernetes -m 10.248.32.13:6789,10.248.32.14:6789,10.248.32.15:6789 --keyfile=***stripped*** map kubernetes/csi-vol-1699e662-e83f-11ea-8e79-246e96907f74 --device-type krbd]
I0828 06:25:07.926221 3316053 nodeserver.go:291] ID: 12008 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 rbd image: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74/kubernetes was successfully mapped at /dev/rbd0
I0828 06:25:08.157777 3316053 nodeserver.go:230] ID: 12008 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 rbd: successfully mounted volume 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 to stagingTargetPath /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount/0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74
I0828 06:25:08.158588 3316053 utils.go:165] ID: 12008 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC response: {}
```

(5)NodePublishVolume

Overview

Mounts the staging path from NodeStageVolume to the target path.


NodeStageVolume maps the rbd image to a device on the node, then mounts that device to a staging path.


NodePublishVolume then mounts the stagingPath to the target path.


```
stagingPath example: /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount/0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74
targetPath example: /home/cld/kubernetes/lib/kubelet/pods/c14de522-0679-44b6-af8b-e1ba08b5b004/volumes/kubernetes.io~csi/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/mount
```
NodePublishVolume

Main flow:


(1) Validate the request parameters;

(2) Check whether the target path exists, and create it if it does not;

(3) Mount the staging path to the target path.


```go
//ceph-csi/internal/rbd/nodeserver.go
func (ns *NodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
	err := util.ValidateNodePublishVolumeRequest(req)
	if err != nil {
		return nil, err
	}
	targetPath := req.GetTargetPath()
	isBlock := req.GetVolumeCapability().GetBlock() != nil
	stagingPath := req.GetStagingTargetPath()
	volID := req.GetVolumeId()
	stagingPath += "/" + volID

	if acquired := ns.VolumeLocks.TryAcquire(volID); !acquired {
		klog.Errorf(util.Log(ctx, util.VolumeOperationAlreadyExistsFmt), volID)
		return nil, status.Errorf(codes.Aborted, util.VolumeOperationAlreadyExistsFmt, volID)
	}
	defer ns.VolumeLocks.Release(volID)

	// Check if that target path exists properly
	notMnt, err := ns.createTargetMountPath(ctx, targetPath, isBlock)
	if err != nil {
		return nil, err
	}

	if !notMnt {
		return &csi.NodePublishVolumeResponse{}, nil
	}

	// Publish Path
	err = ns.mountVolume(ctx, stagingPath, req)
	if err != nil {
		return nil, err
	}

	util.DebugLog(ctx, "rbd: successfully mounted stagingPath %s to targetPath %s", stagingPath, targetPath)
	return &csi.NodePublishVolumeResponse{}, nil
}
```


1.ValidateNodePublishVolumeRequest


ValidateNodePublishVolumeRequest validates part of the request: the volume capability, volume ID, target path, and staging target path must not be empty.


```go
//ceph-csi/internal/util/validate.go
func ValidateNodePublishVolumeRequest(req *csi.NodePublishVolumeRequest) error {
	if req.GetVolumeCapability() == nil {
		return status.Error(codes.InvalidArgument, "volume capability missing in request")
	}

	if req.GetVolumeId() == "" {
		return status.Error(codes.InvalidArgument, "volume ID missing in request")
	}

	if req.GetTargetPath() == "" {
		return status.Error(codes.InvalidArgument, "target path missing in request")
	}

	if req.GetStagingTargetPath() == "" {
		return status.Error(codes.InvalidArgument, "staging target path missing in request")
	}

	return nil
}
```


2.createTargetMountPath


createTargetMountPath checks whether the mount path exists and creates it if it does not.


```go
//ceph-csi/internal/rbd/nodeserver.go
func (ns *NodeServer) createTargetMountPath(ctx context.Context, mountPath string, isBlock bool) (bool, error) {
	// Check if that mount path exists properly
	notMnt, err := mount.IsNotMountPoint(ns.mounter, mountPath)
	if err != nil {
		if os.IsNotExist(err) {
			if isBlock {
				// #nosec
				pathFile, e := os.OpenFile(mountPath, os.O_CREATE|os.O_RDWR, 0750)
				if e != nil {
					util.DebugLog(ctx, "Failed to create mountPath:%s with error: %v", mountPath, err)
					return notMnt, status.Error(codes.Internal, e.Error())
				}
				if err = pathFile.Close(); err != nil {
					util.DebugLog(ctx, "Failed to close mountPath:%s with error: %v", mountPath, err)
					return notMnt, status.Error(codes.Internal, err.Error())
				}
			} else {
				// Create a directory
				if err = util.CreateMountPoint(mountPath); err != nil {
					return notMnt, status.Error(codes.Internal, err.Error())
				}
			}
			notMnt = true
		} else {
			return false, status.Error(codes.Internal, err.Error())
		}
	}
	return notMnt, err
}
```


3.mountVolume


mountVolume assembles the mount command and mounts the staging path to the target path.


```go
//ceph-csi/internal/rbd/nodeserver.go
func (ns *NodeServer) mountVolume(ctx context.Context, stagingPath string, req *csi.NodePublishVolumeRequest) error {
	// Publish Path
	fsType := req.GetVolumeCapability().GetMount().GetFsType()
	readOnly := req.GetReadonly()
	mountOptions := []string{"bind", "_netdev"}
	isBlock := req.GetVolumeCapability().GetBlock() != nil
	targetPath := req.GetTargetPath()

	mountOptions = csicommon.ConstructMountOptions(mountOptions, req.GetVolumeCapability())

	util.DebugLog(ctx, "target %v\nisBlock %v\nfstype %v\nstagingPath %v\nreadonly %v\nmountflags %v\n",
		targetPath, isBlock, fsType, stagingPath, readOnly, mountOptions)

	if readOnly {
		mountOptions = append(mountOptions, "ro")
	}
	if err := util.Mount(stagingPath, targetPath, fsType, mountOptions); err != nil {
		return status.Error(codes.Internal, err.Error())
	}

	return nil
}
```
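The option handling in mountVolume can be sketched on its own. publishMountOptions below is a hypothetical helper mirroring it: always a bind mount with _netdev, plus the capability's mount flags, plus ro when publishing read-only:

```go
package main

import "fmt"

// publishMountOptions mirrors mountVolume's option assembly: the target path
// is always a bind mount of the staging path, marked _netdev, with the
// capability's mount flags appended and "ro" added for read-only publishes.
func publishMountOptions(capabilityFlags []string, readOnly bool) []string {
	opts := append([]string{"bind", "_netdev"}, capabilityFlags...)
	if readOnly {
		opts = append(opts, "ro")
	}
	return opts
}

func main() {
	// Matches the mountflags seen in the logs below.
	fmt.Println(publishMountOptions([]string{"discard"}, false)) // [bind _netdev discard]
}
```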
Example ceph-csi logs

Operation: NodePublishVolume


Source: daemonset csi-rbdplugin, container csi-rbdplugin


```
I0828 06:25:08.172901 3316053 utils.go:159] ID: 12010 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC call: /csi.v1.Node/NodePublishVolume
I0828 06:25:08.176683 3316053 utils.go:160] ID: 12010 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC request: {"staging_target_path":"/home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount","target_path":"/home/cld/kubernetes/lib/kubelet/pods/c14de522-0679-44b6-af8b-e1ba08b5b004/volumes/kubernetes.io~csi/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["discard"]}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"0bba3be9-0a1c-41db-a619-26ffea20161e","imageFeatures":"layering","imageName":"csi-vol-1699e662-e83f-11ea-8e79-246e96907f74","journalPool":"kubernetes","pool":"kubernetes","storage.kubernetes.io/csiProvisionerIdentity":"1598236777786-8081-rbd.csi.ceph.com"},"volume_id":"0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74"}
I0828 06:25:08.177363 3316053 nodeserver.go:518] ID: 12010 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 target /home/cld/kubernetes/lib/kubelet/pods/c14de522-0679-44b6-af8b-e1ba08b5b004/volumes/kubernetes.io~csi/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/mount
isBlock false
fstype ext4
stagingPath /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount/0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74
readonly false
mountflags [bind _netdev discard]
I0828 06:25:08.191877 3316053 nodeserver.go:426] ID: 12010 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 rbd: successfully mounted stagingPath /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount/0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 to targetPath /home/cld/kubernetes/lib/kubelet/pods/c14de522-0679-44b6-af8b-e1ba08b5b004/volumes/kubernetes.io~csi/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/mount
I0828 06:25:08.192653 3316053 utils.go:165] ID: 12010 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC response: {}
```


The log shows some of the mount parameters:


```
target /home/cld/kubernetes/lib/kubelet/pods/c14de522-0679-44b6-af8b-e1ba08b5b004/volumes/kubernetes.io~csi/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/mount
isBlock false
fstype ext4
stagingPath /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount/0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74
readonly false
mountflags [bind _netdev discard]
```

(6)NodeUnpublishVolume

Overview

Removes the mount of the staging path onto the target path (i.e. unmounts the target path).


NodeUnpublishVolume unmounts the volume from the target path.

NodeUnpublishVolume

Main flow:


(1) Validate the request parameters;

(2) Check whether the given path is a mount point;

(3) Unmount the target path, removing the staging path to target path mount;

(4) Delete the targetPath directory and any children it contains.


```go
//ceph-csi/internal/rbd/nodeserver.go
func (ns *NodeServer) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublishVolumeRequest) (*csi.NodeUnpublishVolumeResponse, error) {
	// (1) validate the request parameters
	err := util.ValidateNodeUnpublishVolumeRequest(req)
	if err != nil {
		return nil, err
	}

	targetPath := req.GetTargetPath()
	volID := req.GetVolumeId()

	if acquired := ns.VolumeLocks.TryAcquire(volID); !acquired {
		klog.Errorf(util.Log(ctx, util.VolumeOperationAlreadyExistsFmt), volID)
		return nil, status.Errorf(codes.Aborted, util.VolumeOperationAlreadyExistsFmt, volID)
	}
	defer ns.VolumeLocks.Release(volID)

	// (2) check whether the given path is a mount point
	notMnt, err := mount.IsNotMountPoint(ns.mounter, targetPath)
	if err != nil {
		if os.IsNotExist(err) {
			// targetPath has already been deleted
			util.DebugLog(ctx, "targetPath: %s has already been deleted", targetPath)
			return &csi.NodeUnpublishVolumeResponse{}, nil
		}
		return nil, status.Error(codes.NotFound, err.Error())
	}
	if notMnt {
		if err = os.RemoveAll(targetPath); err != nil {
			return nil, status.Error(codes.Internal, err.Error())
		}
		return &csi.NodeUnpublishVolumeResponse{}, nil
	}

	// (3) unmount targetPath
	if err = ns.mounter.Unmount(targetPath); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}

	// (4) delete the targetPath directory and any children it contains
	if err = os.RemoveAll(targetPath); err != nil {
		return nil, status.Error(codes.Internal, err.Error())
	}

	util.DebugLog(ctx, "rbd: successfully unbound volume %s from %s", req.GetVolumeId(), targetPath)

	return &csi.NodeUnpublishVolumeResponse{}, nil
}
```


RemoveAll


Deletes the targetPath directory and any children it contains.


//GO/src/os/path.go
// RemoveAll removes path and any children it contains.// It removes everything it can but returns the first error// it encounters. If the path does not exist, RemoveAll// returns nil (no error).// If there is an error, it will be of type *PathError.func RemoveAll(path string) error { return removeAll(path)}
ceph-csi component log example

Operation: NodeUnpublishVolume. Source: daemonset: csi-rbdplugin, container: csi-rbdplugin
I0828 07:14:25.117004 3316053 utils.go:159] ID: 12123 GRPC call: /csi.v1.Node/NodeGetVolumeStats
I0828 07:14:25.117825 3316053 utils.go:160] ID: 12123 GRPC request: {"volume_id":"0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74","volume_path":"/home/cld/kubernetes/lib/kubelet/pods/c14de522-0679-44b6-af8b-e1ba08b5b004/volumes/kubernetes.io~csi/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/mount"}
I0828 07:14:25.128161 3316053 utils.go:165] ID: 12123 GRPC response: {"usage":[{"available":1003900928,"total":1023303680,"unit":1,"used":2625536},{"available":65525,"total":65536,"unit":2,"used":11}]}
I0828 07:14:40.863935 3316053 utils.go:159] ID: 12124 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0828 07:14:40.864889 3316053 utils.go:160] ID: 12124 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC request: {"target_path":"/home/cld/kubernetes/lib/kubelet/pods/c14de522-0679-44b6-af8b-e1ba08b5b004/volumes/kubernetes.io~csi/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/mount","volume_id":"0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74"}
I0828 07:14:40.908930 3316053 nodeserver.go:601] ID: 12124 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 rbd: successfully unbound volume 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 from /home/cld/kubernetes/lib/kubelet/pods/c14de522-0679-44b6-af8b-e1ba08b5b004/volumes/kubernetes.io~csi/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/mount
I0828 07:14:40.909906 3316053 utils.go:165] ID: 12124 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC response: {}

(7)NodeUnstageVolume

Overview

First unmount the rbd/nbd device from stagingPath, then unmap the rbd/nbd device (i.e. detach the node-side rbd/nbd device from the ceph rbd image). (Note: the unmount here targets the staging path; the target path was already unmounted in NodeUnpublishVolume.)

NodeUnstageVolume unstages the volume from the staging path.

NodeUnstageVolume

Main flow:


(1) Validate the request parameters;

(2) Check whether stagingTargetPath exists and is a mount point;

(3) Unmount the rbd device from stagingTargetPath;

(4) Remove stagingTargetPath;

(5) Read the image metadata from the image-meta.json file under stagingParentPath;

(6) Unmap the rbd device;

(7) Delete the metadata for this image, i.e. the image-meta.json file.


//ceph-csi/internal/rbd/nodeserver.go
func (ns *NodeServer) NodeUnstageVolume(ctx context.Context, req *csi.NodeUnstageVolumeRequest) (*csi.NodeUnstageVolumeResponse, error) {
	// (1) Validate the request parameters
	var err error
	if err = util.ValidateNodeUnstageVolumeRequest(req); err != nil {
		return nil, err
	}

	volID := req.GetVolumeId()

	if acquired := ns.VolumeLocks.TryAcquire(volID); !acquired {
		klog.Errorf(util.Log(ctx, util.VolumeOperationAlreadyExistsFmt), volID)
		return nil, status.Errorf(codes.Aborted, util.VolumeOperationAlreadyExistsFmt, volID)
	}
	defer ns.VolumeLocks.Release(volID)

	stagingParentPath := req.GetStagingTargetPath()
	stagingTargetPath := getStagingTargetPath(req)

	// (2) Check whether stagingTargetPath exists and is a mount point
	notMnt, err := mount.IsNotMountPoint(ns.mounter, stagingTargetPath)
	if err != nil {
		if !os.IsNotExist(err) {
			return nil, status.Error(codes.NotFound, err.Error())
		}
		// Continue on ENOENT errors as we may still have the image mapped
		notMnt = true
	}
	if !notMnt {
		// (3) Unmount the rbd device from stagingTargetPath
		// Unmounting the image
		err = ns.mounter.Unmount(stagingTargetPath)
		if err != nil {
			util.ExtendedLog(ctx, "failed to unmount targetPath: %s with error: %v", stagingTargetPath, err)
			return nil, status.Error(codes.Internal, err.Error())
		}
	}

	// (4) Remove stagingTargetPath
	// e.g. /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount/0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74
	if err = os.Remove(stagingTargetPath); err != nil {
		// Any error is critical as Staging path is expected to be empty by Kubernetes, it otherwise
		// keeps invoking Unstage. Hence any errors removing files within this path is a critical
		// error
		if !os.IsNotExist(err) {
			klog.Errorf(util.Log(ctx, "failed to remove staging target path (%s): (%v)"), stagingTargetPath, err)
			return nil, status.Error(codes.Internal, err.Error())
		}
	}

	// (5) Read the image metadata from the image-meta.json file under stagingParentPath
	imgInfo, err := lookupRBDImageMetadataStash(stagingParentPath)
	if err != nil {
		util.UsefulLog(ctx, "failed to find image metadata: %v", err)
		// It is an error if it was mounted, as we should have found the image metadata file with
		// no errors
		if !notMnt {
			return nil, status.Error(codes.Internal, err.Error())
		}

		// If not mounted, and error is anything other than metadata file missing, it is an error
		if !errors.Is(err, ErrMissingStash) {
			return nil, status.Error(codes.Internal, err.Error())
		}

		// It was not mounted and image metadata is also missing, we are done as the last step in
		// in the staging transaction is complete
		return &csi.NodeUnstageVolumeResponse{}, nil
	}

	// (6) Unmap the rbd device
	// Unmapping rbd device
	imageSpec := imgInfo.String()
	if err = detachRBDImageOrDeviceSpec(ctx, imageSpec, true, imgInfo.NbdAccess, imgInfo.Encrypted, req.GetVolumeId()); err != nil {
		klog.Errorf(util.Log(ctx, "error unmapping volume (%s) from staging path (%s): (%v)"), req.GetVolumeId(), stagingTargetPath, err)
		return nil, status.Error(codes.Internal, err.Error())
	}

	util.DebugLog(ctx, "successfully unmounted volume (%s) from staging path (%s)", req.GetVolumeId(), stagingTargetPath)

	// (7) Delete the metadata for this image, i.e. the image-meta.json file
	if err = cleanupRBDImageMetadataStash(stagingParentPath); err != nil {
		klog.Errorf(util.Log(ctx, "failed to cleanup image metadata stash (%v)"), err)
		return nil, status.Error(codes.Internal, err.Error())
	}

	return &csi.NodeUnstageVolumeResponse{}, nil
}


root@cld-dnode3-1091:/home/zhongjialiang# ls /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount/
image-meta.json 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74/


1.lookupRBDImageMetadataStash

Reads the image metadata from the image-meta.json file under stagingParentPath.


//ceph-csi/internal/rbd/rbd_util.go

// file name in which image metadata is stashed.
const stashFileName = "image-meta.json"

func lookupRBDImageMetadataStash(path string) (rbdImageMetadataStash, error) {
	var imgMeta rbdImageMetadataStash

	fPath := filepath.Join(path, stashFileName)
	encodedBytes, err := ioutil.ReadFile(fPath) // #nosec - intended reading from fPath
	if err != nil {
		if !os.IsNotExist(err) {
			return imgMeta, fmt.Errorf("failed to read stashed JSON image metadata from path (%s): (%v)", fPath, err)
		}

		return imgMeta, util.JoinErrors(ErrMissingStash, err)
	}

	err = json.Unmarshal(encodedBytes, &imgMeta)
	if err != nil {
		return imgMeta, fmt.Errorf("failed to unmarshall stashed JSON image metadata from path (%s): (%v)", fPath, err)
	}

	return imgMeta, nil
}


root@cld-dnode3-1091:/home/zhongjialiang# cat /home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount/image-meta.json
{"Version":2,"pool":"kubernetes","image":"csi-vol-1699e662-e83f-11ea-8e79-246e96907f74","accessType":false,"encrypted":false}


2.detachRBDImageOrDeviceSpec

Assembles the unmap command and unmaps the rbd/nbd device.


//ceph-csi/internal/rbd/rbd_attach.go
func detachRBDImageOrDeviceSpec(ctx context.Context, imageOrDeviceSpec string, isImageSpec, ndbType, encrypted bool, volumeID string) error {
	if encrypted {
		mapperFile, mapperPath := util.VolumeMapper(volumeID)
		mappedDevice, mapper, err := util.DeviceEncryptionStatus(ctx, mapperPath)
		if err != nil {
			klog.Errorf(util.Log(ctx, "error determining LUKS device on %s, %s: %s"), mapperPath, imageOrDeviceSpec, err)
			return err
		}
		if len(mapper) > 0 {
			// mapper found, so it is open Luks device
			err = util.CloseEncryptedVolume(ctx, mapperFile)
			if err != nil {
				klog.Errorf(util.Log(ctx, "error closing LUKS device on %s, %s: %s"), mapperPath, imageOrDeviceSpec, err)
				return err
			}
			imageOrDeviceSpec = mappedDevice
		}
	}

	accessType := accessTypeKRbd
	if ndbType {
		accessType = accessTypeNbd
	}
	options := []string{"unmap", "--device-type", accessType, imageOrDeviceSpec}

	_, stderr, err := util.ExecCommand(ctx, rbd, options...)
	if err != nil {
		// Messages for krbd and nbd differ, hence checking either of them for missing mapping
		// This is not applicable when a device path is passed in
		if isImageSpec &&
			(strings.Contains(stderr, fmt.Sprintf(rbdUnmapCmdkRbdMissingMap, imageOrDeviceSpec)) ||
				strings.Contains(stderr, fmt.Sprintf(rbdUnmapCmdNbdMissingMap, imageOrDeviceSpec))) {
			// Devices found not to be mapped are treated as a successful detach
			util.TraceLog(ctx, "image or device spec (%s) not mapped", imageOrDeviceSpec)
			return nil
		}
		return fmt.Errorf("rbd: unmap for spec (%s) failed (%v): (%s)", imageOrDeviceSpec, err, stderr)
	}

	return nil
}


3.cleanupRBDImageMetadataStash

Deletes the metadata for this image, i.e. the image-meta.json file.


//ceph-csi/internal/rbd/rbd_util.go
func cleanupRBDImageMetadataStash(path string) error {
	fPath := filepath.Join(path, stashFileName)
	if err := os.Remove(fPath); err != nil {
		return fmt.Errorf("failed to cleanup stashed JSON data (%s): (%v)", fPath, err)
	}

	return nil
}
ceph-csi component log example

Operation: NodeUnstageVolume

Source: daemonset: csi-rbdplugin, container: csi-rbdplugin


I0828 07:14:40.972279 3316053 utils.go:159] ID: 12126 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC call: /csi.v1.Node/NodeUnstageVolume
I0828 07:14:40.973139 3316053 utils.go:160] ID: 12126 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC request: {"staging_target_path":"/home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount","volume_id":"0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74"}
I0828 07:14:41.186119 3316053 cephcmds.go:60] ID: 12126 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 command succeeded: rbd [unmap --device-type krbd kubernetes/csi-vol-1699e662-e83f-11ea-8e79-246e96907f74]
I0828 07:14:41.186171 3316053 nodeserver.go:690] ID: 12126 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 successfully unmounted volume (0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74) from staging path (/home/cld/kubernetes/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-14ee5002-9d60-4ba3-a1d2-cc3800ee0893/globalmount/0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74)
I0828 07:14:41.187119 3316053 utils.go:165] ID: 12126 Req-ID: 0001-0024-0bba3be9-0a1c-41db-a619-26ffea20161e-0000000000000004-1699e662-e83f-11ea-8e79-246e96907f74 GRPC response: {}


This completes the analysis of the rbd driver-nodeserver; a summary follows.

Summary of the rbd driver-nodeserver analysis

(1) The nodeserver implements the NodeGetCapabilities, NodeGetVolumeStats, NodeStageVolume, NodePublishVolume, NodeUnpublishVolume, NodeUnstageVolume and NodeExpandVolume methods, which do the following:

NodeGetCapabilities: reports the capabilities of the ceph-csi driver.

NodeGetVolumeStats: probes the state of the mounted storage and returns its metrics to kubelet.

NodeExpandVolume: performs the corresponding node-side operations to propagate a volume expansion to the node.

NodeStageVolume: maps the rbd image to an rbd/nbd device on the node, formats the device and mounts it at the staging path.

NodePublishVolume: mounts the staging path from NodeStageVolume onto the target path.

NodeUnpublishVolume: removes the stagingPath-to-targetPath mount.

NodeUnstageVolume: first unmounts the rbd/nbd device from stagingPath, then unmaps the rbd/nbd device (i.e. detaches the node-side rbd/nbd device from the ceph rbd image).

(2) Before kubelet calls methods such as NodeExpandVolume, NodeStageVolume or NodeUnstageVolume, it first calls NodeGetCapabilities to learn the ceph-csi driver's capabilities and check whether those calls are supported.

(3) kubelet calls NodeGetVolumeStats periodically to collect volume metrics.

(4) Volume expansion is done in two major steps. The first is the CSI ControllerExpandVolume call, which expands the underlying storage; the second is the CSI NodeExpandVolume call: when the volumeMode is Filesystem, it propagates the expansion of the underlying rbd image to the rbd/nbd device and grows the xfs/ext filesystem; when the volumeMode is Block, no node-side expansion is needed.

(5) When one rbd image is mounted into several pods on the same node, NodeStageVolume is called only once while NodePublishVolume is called once per pod; in other words there is a single staging path but multiple target paths. Think of it this way: the staging path corresponds to the rbd image, while each target path corresponds to a pod. Unmounting is symmetric: NodeUnstageVolume is only called once every pod that mounts the rbd image has been deleted.
