
All the Ceph Tools in One Article

Author: hs_wu · 2023-07-07

https://www.zhihu.com/column/c_1267088333848641536

Preface

Ceph ships with a large number of tools, covering cluster management and operations as well as performance analysis.

This article therefore tries to gather all of them in one place, partly as my own review and summary, so that when a need arises I know which tools are available.

Given the limits of my knowledge, some useful tools have surely slipped through the net; I would appreciate hearing about them.

ceph / rados tools

General


ceph auth [ add | caps | del | export | get | get-key | get-or-create | get-or-create-key | import | list | print-key | print_key ] …
ceph compact
ceph config [ dump | ls | help | get | show | show-with-defaults | set | rm | log | reset | assimilate-conf | generate-minimal-conf ] …
ceph config-key [ rm | exists | get | ls | dump | set ] …
ceph daemon <name> | <path> <command> …
ceph daemonperf <name> | <path> [ interval [ count ] ]
ceph df {detail}
ceph fs [ ls | new | reset | rm | authorize ] …
ceph fsid
ceph health {detail}
ceph injectargs <injectedargs> [ <injectedargs>… ]
ceph log <logtext> [ <logtext>… ]
ceph mds [ compat | fail | rm | rmfailed | set_state | stat | repaired ] …
ceph mon [ add | dump | getmap | remove | stat ] …
ceph osd [ blocklist | blocked-by | create | new | deep-scrub | df | down | dump | erasure-code-profile | find | getcrushmap | getmap | getmaxosd | in | ls | lspools | map | metadata | ok-to-stop | out | pause | perf | pg-temp | force-create-pg | primary-affinity | primary-temp | repair | reweight | reweight-by-pg | rm | destroy | purge | safe-to-destroy | scrub | set | setcrushmap | setmaxosd | stat | tree | unpause | unset ] …
ceph osd crush [ add | add-bucket | create-or-move | dump | get-tunable | link | move | remove | rename-bucket | reweight | reweight-all | reweight-subtree | rm | rule | set | set-tunable | show-tunables | tunables | unlink ] …
ceph osd pool [ create | delete | get | get-quota | ls | mksnap | rename | rmsnap | set | set-quota | stats ] …
ceph osd pool application [ disable | enable | get | rm | set ] …
ceph osd tier [ add | add-cache | cache-mode | remove | remove-overlay | set-overlay ] …
ceph pg [ debug | deep-scrub | dump | dump_json | dump_pools_json | dump_stuck | getmap | ls | ls-by-osd | ls-by-pool | ls-by-primary | map | repair | scrub | stat ] …
ceph quorum_status
ceph report { <tags> [ <tags>… ] }
ceph status
ceph sync force {--yes-i-really-mean-it} {--i-know-what-i-am-doing}
ceph tell <name (type.id)> <command> [options…]
ceph version


ceph is a control utility for manual deployment and maintenance of a Ceph cluster. It provides a diverse set of commands that allow deployment of monitors, OSDs, placement groups, and MDS daemons, as well as overall maintenance and administration of the cluster.


One command worth calling out: ceph osd dump <epoch>, which prints the osdmap as of the specified epoch.
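For example (the epoch number below is hypothetical; use an epoch your cluster has actually seen):

ceph osd dump          # osdmap at the current epoch
ceph osd dump 100      # osdmap as of epoch 100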



rados is a utility for interacting with a Ceph object storage cluster (RADOS), part of the Ceph distributed storage system.


OSD daemon commands (including OSD performance/resources)

ceph osd perf

ceph osd df tree

ceph daemon

usrname@hostname:~$ sudo ceph daemon osd.148 help
{
    "bluestore allocator dump block": "dump allocator free regions",
    "bluestore allocator dump bluefs-db": "dump allocator free regions",
    "bluestore allocator score block": "give score on allocator fragmentation (0-no fragmentation, 1-absolute fragmentation)",
    "bluestore allocator score bluefs-db": "give score on allocator fragmentation (0-no fragmentation, 1-absolute fragmentation)",
    "bluestore bluefs available": "Report available space for bluefs. If alloc_size set, make simulation.",
    "calc_objectstore_db_histogram": "Generate key value histogram of kvdb(rocksdb) which used by bluestore",
    "compact": "Commpact object store's omap. WARNING: Compaction probably slows your requests",
    "config diff": "dump diff of current config and default config",
    "config diff get": "dump diff get <field>: dump diff of current and default config setting <field>",
    "config get": "config get <field>: get the config value",
    "config help": "get config setting schema and descriptions",
    "config set": "config set <field> <val> [<val> ...]: set a config variable",
    "config show": "dump current config settings",
    "dump_blacklist": "dump blacklisted clients and times",
    "dump_blocked_ops": "show the blocked ops currently in flight",
    "dump_historic_ops": "show recent ops",
    "dump_historic_ops_by_duration": "show slowest recent ops, sorted by duration",
    "dump_historic_slow_ops": "show slowest recent ops",
    "dump_mempools": "get mempool stats",
    "dump_objectstore_kv_stats": "print statistics of kvdb which used by bluestore",
    "dump_op_pq_state": "dump op priority queue state",
    "dump_ops_in_flight": "show the ops currently in flight",
    "dump_osd_network": "Dump osd heartbeat network ping times",
    "dump_pgstate_history": "show recent state history",
    "dump_reservations": "show recovery reservations",
    "dump_scrubs": "print scheduled scrubs",
    "dump_watchers": "show clients which have active watches, and on which objects",
    "flush_journal": "flush the journal to permanent store",
    "flush_store_cache": "Flush bluestore internal cache",
    "get_command_descriptions": "list available commands",
    "get_heap_property": "get malloc extension heap property",
    "get_latest_osdmap": "force osd to update the latest map from the mon",
    "getomap": "output entire object map",
    "git_version": "get git sha1",
    "heap": "show heap usage info (available only if compiled with tcmalloc)",
    "help": "list available commands",
    "injectdataerr": "inject data error to an object",
    "injectfull": "Inject a full disk (optional count times)",
    "injectmdataerr": "inject metadata error to an object",
    "log dump": "dump recent log entries to log file",
    "log flush": "flush log entries to log file",
    "log reopen": "reopen log file",
    "objecter_requests": "show in-progress osd requests",
    "ops": "show the ops currently in flight",
    "perf dump": "dump perfcounters value",
    "perf histogram dump": "dump perf histogram values",
    "perf histogram schema": "dump perf histogram schema",
    "perf reset": "perf reset <name>: perf reset all or one perfcounter name",
    "perf schema": "dump perfcounters schema",
    "rmomapkey": "remove omap key",
    "set_heap_property": "update malloc extension heap property",
    "set_recovery_delay": "Delay osd recovery by specified seconds",
    "setomapheader": "set omap header",
    "setomapval": "set omap key",
    "status": "high-level status of OSD",
    "trigger_deep_scrub": "Trigger a scheduled deep scrub",
    "trigger_scrub": "Trigger a scheduled scrub",
    "truncobj": "truncate object to length",
    "version": "get ceph version"
}

ceph balancer

ceph balancer


The ceph balancer optimizes PG placement across OSDs to achieve an even distribution; it can run automatically or be driven manually, as sketched below.
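A typical manual workflow might look like this (the plan name myplan is arbitrary):

ceph balancer status              # current mode and whether it is active
ceph balancer mode upmap          # optimize via pg-upmap-items
ceph balancer eval                # score the current distribution
ceph balancer optimize myplan     # compute a plan
ceph balancer show myplan         # inspect it before applying
ceph balancer execute myplan      # apply it once
ceph balancer on                  # or let it run automatically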

pg upmap

ceph osd pg-upmap <pgid> <osdname (id|osd.id)> [<osdname (id|osd.id)>...]
ceph osd pg-upmap-items <pgid> <osdname (id|osd.id)> [<osdname (id|osd.id)>...]


These commands let you explicitly map specific PGs to specific OSDs, for example:
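A minimal sketch (the pg id and osd ids below are made up for illustration):

ceph osd pg-upmap-items 1.7 121 148     # remap pg 1.7's copy on osd.121 to osd.148
ceph osd rm-pg-upmap-items 1.7          # remove the explicit mapping again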



ceph-volume [-h] [--cluster CLUSTER] [--log-level LOG_LEVEL] [--log-path LOG_PATH]
ceph-volume inventory
ceph-volume lvm [ trigger | create | activate | prepare | zap | list | batch | new-wal | new-db | migrate ]
ceph-volume simple [ trigger | scan | activate ]


ceph-volume is a single-purpose command-line tool to deploy logical volumes as OSDs, aiming to keep an API similar to ceph-disk for preparing, activating, and creating OSDs.
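For instance, a common one-step OSD deployment (the device name is an example):

ceph-volume inventory                                # survey available devices
ceph-volume lvm create --bluestore --data /dev/sdb   # prepare + activate in one step
ceph-volume lvm list                                 # show LV-to-OSD associations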



ceph-mon -i monid [ --mon-data mondatapath ]


Starts a mon process.



ceph-osd -i osdnum [ --osd-data datapath ] [ --osd-journal journal ] [ --mkfs ] [ --mkjournal ] [ --flush-journal ] [ --check-allows-journal ] [ --check-wants-journal ] [ --check-needs-journal ] [ --mkkey ] [ --osdspec-affinity ]


ceph-syn – ceph synthetic workload generator


ceph-syn is a simple synthetic workload generator for the Ceph distributed file system. It uses the userspace client library to generate simple workloads against a currently running file system; the file system does not need to be mounted via ceph-fuse(8) or the kernel client.



crushdiff [ --osdmap osdmap ] [ --pg-dump pg-dump ] [ --compiled ] [ --verbose ] command crushmap


crushdiff is a utility that lets you test the effect of a crushmap change: how many PGs, objects, and bytes would move. It is a wrapper around osdmaptool, relying on its --test-map-pgs-dump option to obtain the list of changed PGs, and it uses PG statistics to calculate the number of objects and bytes moved.
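A sketch of the workflow, assuming the export/compare/import subcommands described in the crushdiff man page (the file name is arbitrary):

crushdiff export crushmap.txt     # dump the current crushmap in text form
vi crushmap.txt                   # edit weights/rules
crushdiff compare crushmap.txt    # report pgs/objects/bytes that would move
crushdiff import crushmap.txt     # apply the change if the impact is acceptable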

A few more tools


ceph-kvstore-tool <leveldb|rocksdb|bluestore-kv> <store path> command [args…]


ceph-kvstore-tool is a kvstore manipulation tool. It lets users manipulate leveldb/rocksdb data (such as an OSD's omap) offline.
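For example, to inspect a stopped OSD's kv data in place (the OSD must not be running, since the store is opened exclusively; the path is an example):

ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 list                  # list all keys by prefix
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 get <prefix> <key>    # dump one value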



monmaptool is a utility to create, view, and modify a monitor cluster map for the Ceph distributed storage system. The monitor map specifies the only fixed addresses in the Ceph distributed system; all other daemons bind to arbitrary addresses and register themselves with the monitors.


This article used monmaptool together with ceph-mon to work through a tricky production issue: removing a mon from an unhealthy cluster.
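A sketch of that procedure (mon names and paths are examples; the mons involved must be stopped):

ceph-mon -i node1 --extract-monmap /tmp/monmap   # pull the monmap from a surviving mon
monmaptool --print /tmp/monmap                   # inspect it
monmaptool --rm node2 /tmp/monmap                # remove the dead mon
ceph-mon -i node1 --inject-monmap /tmp/monmap    # push the edited map back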



ceph-authtool keyringfile [ -l | --list ] [ -p | --print-key ] [ -C | --create-keyring ] [ -g | --gen-key ] [ --gen-print-key ] [ --import-keyring otherkeyringfile ] [ -n | --name entityname ] [ -a | --add-key base64_key ] [ --cap subsystem capability ] [ --caps capfile ] [ --mode mode ]


ceph-authtool is a utility to create, view, and modify a Ceph keyring file. A keyring file stores one or more Ceph authentication keys and possibly an associated capability specification. Each key is associated with an entity name of the form {client,mon,mds,osd}.name.
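For example, to create a keyring with a new key and caps (the entity name and caps are illustrative):

ceph-authtool --create-keyring /tmp/ceph.keyring --gen-key -n client.foo --cap mon 'allow r' --cap osd 'allow rw pool=rbd'
ceph-authtool -l /tmp/ceph.keyring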



crushtool ( -d map | -c map.txt | --build --num_osds numosds layer1 … | --test ) [ -o outfile ]


crushtool is a utility that lets you create, compile, decompile, and test CRUSH map files.
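A typical edit cycle looks like this (file names are arbitrary):

ceph osd getcrushmap -o crushmap.bin        # fetch the compiled crushmap
crushtool -d crushmap.bin -o crushmap.txt   # decompile
crushtool -c crushmap.txt -o crushmap.new   # recompile after editing
crushtool --test -i crushmap.new --num-rep 3 --show-mappings | head   # dry-run the mappings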



osdmaptool mapfilename [--print] [--createsimple numosd [--pgbits bitsperosd ] ] [--clobber]
osdmaptool mapfilename [--import-crush crushmap]
osdmaptool mapfilename [--export-crush crushmap]
osdmaptool mapfilename [--upmap file] [--upmap-max max-optimizations] [--upmap-deviation max-deviation] [--upmap-pool poolname] [--save] [--upmap-active]
osdmaptool mapfilename [--upmap-cleanup] [--upmap file]


osdmaptool is a utility that lets you create, view, and manipulate OSD cluster maps from the Ceph distributed storage system. Notably, it lets you extract the embedded CRUSH map or import a new one. It can also simulate the upmap balancer mode, giving you a sense of what is needed to balance your PGs.
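For example, to have osdmaptool propose upmap optimizations offline (paths are examples):

ceph osd getmap -o /tmp/osdmap
osdmaptool /tmp/osdmap --print | head
osdmaptool /tmp/osdmap --upmap /tmp/upmap.sh --upmap-deviation 1
# /tmp/upmap.sh now contains "ceph osd pg-upmap-items ..." commands to review and run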


  • ceph-objectstore-tool


ceph-objectstore-tool is a tool for modifying the state of an OSD. It supports manipulating an object's contents, removing objects, listing the omap, manipulating the omap header, manipulating omap keys, listing object attributes, and manipulating object attribute keys.


ceph-objectstore-tool --data-path <path to osd> [--op list]

Possible object operations:
(get|set)-bytes [file]
set-(attr|omap) [file]
(get|rm)-(attr|omap) <key>
get-omaphdr
set-omaphdr [file]
list-attrs
list-omap
remove|removeall
dump
set-size
clear-data-digest
remove-clone-metadata


A practical example: BlueStore no longer exposes the FileStore-style directory structure, so the filesystem xattrs are no longer visible. The only way to see an object's attrs is to export them from the DB with ceph-objectstore-tool and then decode them with ceph-dencoder:


1. Find the OSD that holds the object (ceph osd map <poolname> <objectname>) and stop that OSD (use ceph osd set noout to avoid data migration).

2. List the object's attributes with ceph-objectstore-tool:

[root@node01 cephtools]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --type bluestore rbd_data.81.a8c6bd6b8b4567.0000000000003249 list-attrs
_                 // "_" is the object's attr
hinfo_key
snapset

3. Export an attribute to a file:

[root@node01 cephtools]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --type bluestore rbd_data.81.a8c6bd6b8b4567.0000000000003249 get-attr hinfo_key > /home/yg/cephtools/attrhinfo_key.dat
[root@node01 cephtools]# ll
total 8
-rw-r--r-- 1 root root 298 Oct 10 14:39 attr_.dat
-rw-r--r-- 1 root root 18 Oct 10 14:40 attrhinfo_key.dat

4. Decode the binary attribute with ceph-dencoder (valid types can be listed with ceph-dencoder list_types):

[root@node01 cephtools]# ceph-dencoder import attr_.dat type object_info_t decode dump_json
{
    "oid": {
        "oid": "rbd_data.81.a8c6bd6b8b4567.0000000000003249",
        "key": "",
        "snapid": -2,
        "hash": 1909457131,
        "max": 0,
        "pool": 83,
        "namespace": ""
    },
    "version": "22364'14561",
    "prior_version": "22364'13751",
    ......


ceph luminous BlueStore: inspecting object information


ceph-objectstore-tool – modify or examine the state of an OSD


  • ceph-bluestore-tool


ceph-bluestore-tool command [ --dev device … ] [ --path osd path ] [ --out-dir dir ] [ --log-file | -l filename ] [ --deep ]

// run a consistency check and repair errors where possible
ceph-bluestore-tool fsck|repair --path osd path [ --deep ]
// show the device label
ceph-bluestore-tool show-label --dev device
ceph-bluestore-tool prime-osd-dir --dev device --path osd path
// export the contents of BlueFS (i.e. the rocksdb files) to an output directory
ceph-bluestore-tool bluefs-export --path osd path --out-dir dir
// add a WAL device to BlueFS; fails if a WAL device already exists
ceph-bluestore-tool bluefs-bdev-new-wal --path osd path --dev-target new-device
// add a DB device to BlueFS; fails if a DB device already exists
ceph-bluestore-tool bluefs-bdev-new-db --path osd path --dev-target new-device
// move BlueFS data from source device(s) to the target device; on success the source
// devices (other than the main device) are removed. The target may be an already
// attached device or a new one; in the latter case it is added to the OSD, replacing
// one of the source devices according to replacement rules applied in priority order,
// stopping at the first match
ceph-bluestore-tool bluefs-bdev-migrate --path osd path --dev-target new-device --devs-source device1 [--devs-source device2]
ceph-bluestore-tool free-dump|free-score --path osd path [ --allocator block/bluefs-wal/bluefs-db/bluefs-slow ]


An example of using this tool: analyzing the metadata in BlueStore's kvstore.


// kv store prefixes (BlueStore metadata prefixes)
const string PREFIX_SUPER = "S";       // field -> value
const string PREFIX_STAT = "T";        // field -> value(int64 array)
const string PREFIX_COLL = "C";        // collection name -> cnode_t
const string PREFIX_OBJ = "O";         // object name -> onode_t
const string PREFIX_OMAP = "M";        // u64 + keyname -> value
const string PREFIX_DEFERRED = "L";    // id -> deferred_transaction_t
const string PREFIX_ALLOC = "B";       // u64 offset -> u64 length (freelist)
const string PREFIX_SHARED_BLOB = "X"; // u64 offset -> shared_blob_t

// export BlueFS contents with ceph-bluestore-tool,
// then analyze them with ceph-kvstore-tool and ceph-dencoder
sudo ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-41 --out-dir /ceph-1

hzwuhongsong@pubt2-ceph1:~$ sudo ceph-kvstore-tool rocksdb /ceph-1/db/ list | head -100
B blocks
B blocks_per_key
B bytes_per_block
B size
C 10.5_head
C 14.2_head
C 15.5_head
C 2.6e8_head
O %7f%80%00%00%00%00%00%00%02%19%d0%8fi%21rbd_data.174b9a6b8b4567.00000000000016d9%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo
O %7f%80%00%00%00%00%00%00%02%19%d0%8fi%21rbd_data.174b9a6b8b4567.00000000000016d9%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo%00%00%00%00x
O %7f%80%00%00%00%00%00%00%02%19%d0%8fi%21rbd_data.174b9a6b8b4567.00000000000016d9%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo%00%01%00%00x
O %7f%80%00%00%00%00%00%00%02%19%d0%8fi%21rbd_data.174b9a6b8b4567.00000000000016d9%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo%00%02%00%00x
O %7f%80%00%00%00%00%00%00%02%19%d0%8fi%21rbd_data.174b9a6b8b4567.00000000000016d9%21%3d%ff%ff%ff%ff%ff%ff%ff%fe%ff%ff%ff%ff%ff%ff%ff%ffo%00%03%00%00x



hzwuhongsong@pubt2-ceph1:~$ sudo ceph-kvstore-tool rocksdb /ceph-1/db/ get C 2.6e8_head out 1.txt
hzwuhongsong@pubt2-ceph1:~$ ceph-dencoder import 1.txt type bluestore_cnode_t decode dump_json
{
    "bits": 11
}

/// collection metadata
struct bluestore_cnode_t {
  uint32_t bits;   ///< how many bits of coll pgid are significant

  explicit bluestore_cnode_t(int b=0) : bits(b) {}

  DENC(bluestore_cnode_t, v, p) {
    DENC_START(1, 1, p);
    denc(v.bits, p);
    DENC_FINISH(p);
  }
};


sudo ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-30/ list B > list-B
hzwuhongsong@pubt1-ceph72:~/txt2$ cat list-B
B  blocks
B  blocks_per_key
B  bytes_per_block
B  size

sudo ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-30/ list S > list-S
hzwuhongsong@pubt1-ceph72:~/txt2$ cat list-S
S blobid_max
S bluefs_extents
S freelist_type
S min_alloc_size
S min_compat_ondisk_format
S nid_max
S ondisk_format


ceph-bluestore-tool is a utility to perform low-level administrative operations on a BlueStore instance.

rbd tools

General tools


rbd [ -c ceph.conf ] [ -m monaddr ] [ --cluster cluster-name ] [ -p | --pool pool ] [ command … ]



rbd-fuse [ -p pool ] [-c conffile] mountpoint [ fuse options ]


rbd-fuse is a FUSE ("Filesystem in Userspace") client for RADOS block device (rbd) images. Given a pool containing rbd images, it mounts a userspace file system that exposes those images as regular files under the mount point.


rbd-fuse is not recommended for production use.



rbd-nbd [-c conf] [--read-only] [--device nbd device] [--nbds_max limit] [--max_part limit] [--exclusive] [--notrim] [--encryption-format format] [--encryption-passphrase-file passphrase-file] [--io-timeout seconds] [--reattach-timeout seconds] map image-spec | snap-spec
rbd-nbd unmap nbd device | image-spec | snap-spec
rbd-nbd list-mapped
rbd-nbd attach --device nbd device image-spec | snap-spec
rbd-nbd detach nbd device | image-spec | snap-spec


rbd-nbd is a client for RADOS block device (rbd) images, similar to the rbd kernel module. It maps an rbd image to an nbd (Network Block Device) device, allowing it to be accessed as a regular local block device.
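For example (pool/image names are illustrative):

sudo rbd-nbd map rbd/myimage        # prints the nbd device, e.g. /dev/nbd0
rbd-nbd list-mapped
sudo mount /dev/nbd0 /mnt           # use it like any local block device
sudo rbd-nbd unmap /dev/nbd0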



rbd-replay is a utility for replaying rbd workloads.

Performance analysis tools

  • rbd perf image iostat & rbd perf image iotop


Invoking rbd perf image iostat or rbd perf image iotop without any options or positional arguments blends all RBD pools into a single view. In v15.2.14 such an invocation was accidentally restricted to the default pool (rbd_default_pool).


https://ceph.com/en/news/blog/2022/v15-2-17-octopus-released/


  • Nautilus


The new rbd perf image iotop and rbd perf image iostat commands provide an iotop- and iostat-like IO monitor for all RBD images.


https://docs.ceph.com/en/quincy/releases/nautilus/#v14-2-0-nautilus


rbd perf image iostat
NAME                WR    RD    WR_BYTES    RD_BYTES   WR_LAT      RD_LAT
ceph/vm-152-disk-0  1/s   0/s   71 KiB/s    0 B/s      13.04 ms    0.00 ns
ceph/vm-136-disk-0  0/s   0/s   819 B/s     0 B/s      919.79 us   0.00 ns

cephfs tools

MDS load

ceph daemonperf mds.a
--------------mds---------------- --mds_cache--- ------mds_log------ -mds_mem- ----mds_server----- mds_ -----objecter------ purg
req  rlat fwd  inos caps exi  imi |stry recy recd|subm evts segs repl|ino  dn  |hcr  hcs  hsr  cre |sess|actv rd   wr   rdwr|purg|
  1    0    0  3.3M 3.5M   0    0 |928    0    0 |  1  112k 133    0 |3.3M 3.3M|  1    0    0    0 |516 |  6    0    1    0 |  1
588    1    0  3.3M 3.5M   0    0 |929    0    0 |353  112k 132    0 |3.3M 3.3M|589   32    0    0 |516 |  1   17   25    0 |  0
1.0k   1    0  3.3M 3.5M   0    0 |929    0    0 |365  113k 132    0 |3.3M 3.3M|1.0k  24    0    0 |516 |  0   19   15    0 |  0

daemon commands (including performance/resources)

  • mds


hzwuhongsong@dl-ceph2:~$ sudo ceph daemon mds.dl-ceph2 help
{
    "cache drop": "drop cache",
    // show Ceph MDS cache usage
    "cache status": "show cache status",
    "config diff": "dump diff of current config and default config",
    "config diff get": "dump diff get <field>: dump diff of current and default config setting <field>",
    "config get": "config get <field>: get the config value",
    "config help": "get config setting schema and descriptions",
    "config set": "config set <field> <val> [<val> ...]: set a config variable",
    "config show": "dump current config settings",
    // check whether a directory in the filesystem has dirty data
    "dirfrag ls": "List fragments in directory",
    "dirfrag merge": "De-fragment directory by path",
    "dirfrag split": "Fragment directory by path",
    "dump cache": "dump metadata cache (optionally to a file)",
    "dump loads": "dump metadata loads",
    "dump tree": "dump metadata cache for subtree",
    // various ops
    "ops": "show the ops currently in flight",
    "dump_blocked_ops": "show the blocked ops currently in flight",
    "dump_historic_ops": "show slowest recent ops",
    "dump_historic_ops_by_duration": "show slowest recent ops, sorted by op duration",
    "dump_ops_in_flight": "show the ops currently in flight",
    "dump_mempools": "get mempool stats",
    "export dir": "migrate a subtree to named MDS",
    "flush journal": "Flush the journal to the backing store",
    "flush_path": "flush an inode (and its dirfrags)",
    "force_readonly": "Force MDS to read-only mode",
    "get subtrees": "Return the subtree map",
    "get_command_descriptions": "list available commands",
    "git_version": "get git sha1",
    "help": "list available commands",
    "log dump": "dump recent log entries to log file",
    "log flush": "flush log entries to log file",
    "log reopen": "reopen log file",
    "objecter_requests": "show in-progress osd requests",
    "osdmap barrier": "Wait until the MDS has this OSD map epoch",
    // MDS performance counters
    "perf dump": "dump perfcounters value",
    "perf histogram dump": "dump perf histogram values",
    "perf histogram schema": "dump perf histogram schema",
    "perf reset": "perf reset <name>: perf reset all or one perfcounter name",
    "perf schema": "dump perfcounters schema",
    "scrub_path": "scrub an inode and output results",
    // inspect CephFS client sessions
    "session evict": "Evict a CephFS client",
    "session ls": "Enumerate connected CephFS clients",
    "status": "high-level status of MDS",
    "tag path": "Apply scrub tag recursively",
    "version": "get ceph version"
}


  • Client


root@pubt2-k8s-for-iaas1:/var/run/ceph# sudo ceph --admin-daemon=/var/run/ceph/ceph-client.10137.asok help
{
    "config diff": "dump diff of current config and default config",
    "config diff get": "dump diff get <field>: dump diff of current and default config setting <field>",
    "config get": "config get <field>: get the config value",
    "config help": "get config setting schema and descriptions",
    "config set": "config set <field> <val> [<val> ...]: set a config variable",
    "config show": "dump current config settings",
    "dump_cache": "show in-memory metadata cache contents",
    "dump_mempools": "get mempool stats",
    "get_command_descriptions": "list available commands",
    "git_version": "get git sha1",
    "help": "list available commands",
    "kick_stale_sessions": "kick sessions that were remote reset",
    "log dump": "dump recent log entries to log file",
    "log flush": "flush log entries to log file",
    "log reopen": "reopen log file",
    "mds_requests": "show in-progress mds requests",
    "mds_sessions": "show mds session state",
    "objecter_requests": "show in-progress osd requests",
    "perf dump": "dump perfcounters value",
    "perf histogram dump": "dump perf histogram values",
    "perf histogram schema": "dump perf histogram schema",
    "perf reset": "perf reset <name>: perf reset all or one perfcounter name",
    "perf schema": "dump perfcounters schema",
    "status": "show overall client status",
    "version": "get ceph version"
}


  • Client-to-MDS sessions


root@pubt2-ceph13:/home/hzwuhongsong# sudo ceph --admin-daemon=/var/run/ceph/ceph-client.admin.1684307.94420974780416.asok mds_sessions
{
    "id": 284139,
    "inst": {
        "name": {
            "type": "client",
            "num": 284139
        },
        "addr": {
            "nonce": 2584420922,
            "addr": "10.182.30.13:0"
        }
    },
    "inst_str": "client.284139 10.182.30.13:0/2584420922",
    "addr_str": "10.182.30.13:0/2584420922",
    "sessions": [
        {
            "mds": 0,
            "addr": "10.182.30.13:6810/1219491430",
            "seq": 0,
            "cap_gen": 0,
            "cap_ttl": "2019-11-29 17:17:44.358388",
            "last_cap_renew_request": "2019-11-29 17:16:44.358388",
            "cap_renew_seq": 8861,
            "num_caps": 14,
            "state": "open"
        }
    ],
    "mdsmap_epoch": 49
}


  • View in-flight requests a client has sent to the MDS


ceph daemon /var/run/ceph/ceph-client.${id}.asok mds_requests 


  • View in-flight requests a client has sent to OSDs:


ceph daemon /var/run/ceph/ceph-client.${id}.asok objecter_requests  

ceph fs/mds commands


/* fetch the mdsmap, from which session_timeout, session_autoclose, etc. can be obtained */
COMMAND_WITH_FLAG("mds dump " "name=epoch,type=CephInt,req=false,range=0", "dump legacy MDS cluster info, optionally from epoch", "mds", "r", "cli,rest", FLAG(DEPRECATED))

COMMAND("fs dump " "name=epoch,type=CephInt,req=false,range=0", "dump all CephFS status, optionally from epoch", "mds", "r", "cli,rest")

/* session_timeout and session_autoclose are set via commands rather than config options; the defaults are 60s and 300s respectively */
/* their current values can be seen with `ceph mds dump` */
COMMAND("fs set " "name=fs_name,type=CephString " "name=var,type=CephChoices,strings=max_mds|max_file_size" "|allow_new_snaps|inline_data|cluster_down|allow_multimds|allow_dirfrags|balancer" "|standby_count_wanted|session_timeout|sesion_autoclose " "name=val,type=CephString " "name=confirm,type=CephString,req=false", "set fs parameter <var> to <val>", "mds", "rw", "cli,rest")

COMMAND_WITH_FLAG("mds set_max_mds " "name=maxmds,type=CephInt,range=0", "set max MDS index", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND_WITH_FLAG("mds set " "name=var,type=CephChoices,strings=max_mds|max_file_size" "|allow_new_snaps|inline_data|allow_multimds|allow_dirfrags " "name=val,type=CephString " "name=confirm,type=CephString,req=false", "set mds parameter <var> to <val>", "mds", "rw", "cli,rest", FLAG(DEPRECATED))

COMMAND("mds stat", "show MDS status", "mds", "r", "cli,rest")
COMMAND_WITH_FLAG("mds getmap " "name=epoch,type=CephInt,req=false,range=0", "get MDS map, optionally from epoch", "mds", "r", "cli,rest", FLAG(DEPRECATED))
COMMAND("mds metadata name=who,type=CephString,req=false", "fetch metadata for mds <who>", "mds", "r", "cli,rest")
COMMAND("mds count-metadata name=property,type=CephString", "count MDSs by metadata field property", "mds", "r", "cli,rest")
COMMAND("mds versions", "check running versions of MDSs", "mds", "r", "cli,rest")
COMMAND_WITH_FLAG("mds tell " "name=who,type=CephString " "name=args,type=CephString,n=N", "send command to particular mds", "mds", "rw", "cli,rest", FLAG(OBSOLETE))
COMMAND("mds compat show", "show mds compatibility settings", "mds", "r", "cli,rest")
COMMAND_WITH_FLAG("mds stop name=who,type=CephString", "stop mds", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND("mds deactivate name=who,type=CephString", "clean up specified MDS rank (use with `set max_mds` to shrink cluster)", "mds", "rw", "cli,rest")
COMMAND("mds set_state " "name=gid,type=CephInt,range=0 " "name=state,type=CephInt,range=0|20", "set mds state of <gid> to <numeric-state>", "mds", "rw", "cli,rest")
COMMAND("mds fail name=who,type=CephString", "Mark MDS failed: trigger a failover if a standby is available", "mds", "rw", "cli,rest")
COMMAND("mds repaired name=rank,type=CephString", "mark a damaged MDS rank as no longer damaged", "mds", "rw", "cli,rest")
COMMAND("mds rm " "name=gid,type=CephInt,range=0", "remove nonactive mds", "mds", "rw", "cli,rest")
COMMAND("mds rmfailed name=who,type=CephString name=confirm,type=CephString,req=false", "remove failed mds", "mds", "rw", "cli,rest")
COMMAND_WITH_FLAG("mds cluster_down", "take MDS cluster down", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND_WITH_FLAG("mds cluster_up", "bring MDS cluster up", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND("mds compat rm_compat " "name=feature,type=CephInt,range=0", "remove compatible feature", "mds", "rw", "cli,rest")
COMMAND("mds compat rm_incompat " "name=feature,type=CephInt,range=0", "remove incompatible feature", "mds", "rw", "cli,rest")
COMMAND_WITH_FLAG("mds add_data_pool " "name=pool,type=CephString", "add data pool <pool>", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND_WITH_FLAG("mds remove_data_pool " "name=pool,type=CephString", "remove data pool <pool>", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND_WITH_FLAG("mds rm_data_pool " "name=pool,type=CephString", "remove data pool <pool>", "mds", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND_WITH_FLAG("mds newfs " "name=metadata,type=CephInt,range=0 " "name=data,type=CephInt,range=0 " "name=sure,type=CephChoices,strings=--yes-i-really-mean-it,req=false", "make new filesystem using pools <metadata> and <data>", "mds", "rw", "cli,rest", FLAG(OBSOLETE))
COMMAND("fs new " "name=fs_name,type=CephString " "name=metadata,type=CephString " "name=data,type=CephString " "name=force,type=CephChoices,strings=--force,req=false " "name=sure,type=CephChoices,strings=--allow-dangerous-metadata-overlay,req=false", "make new filesystem using named pools <metadata> and <data>", "fs", "rw", "cli,rest")
COMMAND("fs rm " "name=fs_name,type=CephString " "name=sure,type=CephChoices,strings=--yes-i-really-mean-it,req=false", "disable the named filesystem", "fs", "rw", "cli,rest")
COMMAND("fs reset " "name=fs_name,type=CephString " "name=sure,type=CephChoices,strings=--yes-i-really-mean-it,req=false", "disaster recovery only: reset to a single-MDS map", "fs", "rw", "cli,rest")
COMMAND("fs ls ", "list filesystems", "fs", "r", "cli,rest")
COMMAND("fs get name=fs_name,type=CephString", "get info about one filesystem", "fs", "r", "cli,rest")
COMMAND("fs flag set name=flag_name,type=CephChoices,strings=enable_multiple " "name=val,type=CephString " "name=confirm,type=CephChoices,strings=--yes-i-really-mean-it,req=false", "Set a global CephFS flag", "fs", "rw", "cli,rest")
COMMAND("fs add_data_pool name=fs_name,type=CephString " "name=pool,type=CephString", "add data pool <pool>", "mds", "rw", "cli,rest")
COMMAND("fs rm_data_pool name=fs_name,type=CephString " "name=pool,type=CephString", "remove data pool <pool>", "mds", "rw", "cli,rest")
COMMAND_WITH_FLAG("fs set_default name=fs_name,type=CephString", "set the default to the named filesystem", "fs", "rw", "cli,rest", FLAG(DEPRECATED))
COMMAND("fs set-default name=fs_name,type=CephString", "set the default to the named filesystem", "fs", "rw", "cli,rest")


hzwuhongsong@music-data-k8s-0:~$ sudo ceph fs status
ceph_fs - 161 clients
=======
+------+--------+------------------+---------------+-------+-------+
| Rank | State  |       MDS        |    Activity   |  dns  |  inos |
+------+--------+------------------+---------------+-------+-------+
|  0   | active | music-data-k8s-0 | Reqs:    0 /s |  241k |  241k |
+------+--------+------------------+---------------+-------+-------+
+-------------+----------+-------+-------+
|     Pool    |   type   |  used | avail |
+-------------+----------+-------+-------+
| cephfs_meta | metadata |  210M |  706G |
| cephfs_data |   data   | 2561G |  118T |
+-------------+----------+-------+-------+
+------------------+
|   Standby MDS    |
+------------------+
| music-data-k8s-2 |
| music-data-k8s-1 |
+------------------+
MDS version: ceph version 12.2.12+netease+1.0 (4a72ccba99ce63500c90e875d211ad04e8ec15a9) luminous (stable)

Other

List sessions

ceph tell mds.0 session ls

List or evict a client

ceph tell mds.0 client ls
ceph tell mds.0 client evict id=25085

// to restore access, remove the client from the OSD blacklist
ceph osd blacklist ls
ceph osd blacklist rm 192.168.0.26:0/265326503


http://www.yangguanjun.com/2018/09/28/cephfs-client-evict-intro/

CephFS stripe configuration and file location

getfattr -n ceph.file.layout 4Mfile
setfattr -n ceph.dir.layout -v "stripe_unit=524288 stripe_count=8 object_size=4194304 pool=cephfs_data2" /mnt/tstfs2/mike512K/
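On file location: a file's RADOS objects are named after its inode number in hex, so you can locate them without special tooling. A sketch (the file and pool names are examples):

inode=$(stat -c %i /mnt/tstfs2/somefile)
obj=$(printf '%x.00000000' "$inode")    # name of the file's first object
ceph osd map cephfs_data "$obj"         # which pg/osds hold it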

cephfs-shell

The new cephfs-shell tool can operate on a CephFS file system (create/read/update/delete files and directories) without mounting it, as sketched below.


cephfs shell
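A quick interactive sketch (assuming a standard ceph.conf; the prompt format may vary by version):

cephfs-shell -c /etc/ceph/ceph.conf
CephFS:~/>>> mkdir /dir1
CephFS:~/>>> put localfile /dir1/file1
CephFS:~/>>> ls /dir1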

Performance analysis tools

cephfs-top & ceph fs perf stats

  • cephfs-top (Pacific):


cephfs-top provides a top(1)-like utility to display various Ceph file system metrics in real time. It is development preview quality and will have bugs.


https://docs.ceph.com/en/quincy/releases/pacific/


https://docs.ceph.com/en/quincy/cephfs/cephfs-top/#cephfs-top



1. Number of clients, broken down into FUSE, kernel, and libcephfs consumers.
2. Client id.
3. The CephFS directory mounted by the client.
4. The client's local mount directory and IP address.
5. chit: cap hit rate.
6. rlat: total read latency (in seconds).
7. wlat: total write latency (in seconds).
8. mlat: total metadata-operation latency (in seconds).
9. dlease: dentry lease availability rate of the client.
10. oicaps: number of caps held by the client.
11. oinodes: number of inodes held open by the client.


  • ceph fs perf stats


ceph fs perf stats [<mds_rank>] [<client_id>] [<client_ip>]


Without mds_rank it shows file-system-wide metrics. View the output as JSON:


ceph fs perf stats | python3 -m json.tool
{
    "version": 1,           // version of the stats format
    "global_counters": [    // global performance counters (8 of them)
        "cap_hit",
        "read_latency",
        "write_latency",
        "metadata_latency",
        "dentry_lease",
        "opened_files",
        "pinned_icaps",
        "opened_inodes"
    ],
    "counters": [],         // per-MDS performance counters
    "client_metadata": {    // client metadata (not file metadata)
        "client.6459": {
            "IP": "10.101.17.11",
            "hostname": "jtfast01",
            "root": "/",
            "mount_point": "/mnt/cephfs",
            "valid_metrics": [
                "cap_hit",
                "read_latency",
                "write_latency",
                "metadata_latency",
                "dentry_lease",
                "opened_files",
                "pinned_icaps",
                "opened_inodes"
            ]
        }
    },
    "global_metrics": {     // global performance metrics
        "client.6459": [
            [
                0,
                0
            ],
            [
                0,
                31697
            ],
            .....
            [
                0,
                1003
            ]
        ]
    },
    "metrics": {            // per-MDS metrics
        "delayed_ranks": [],
        "mds.0": {
            "client.6459": []
        },
        "mds.1": {
            "client.6459": []
        }
    }
}


Several of the parameters above belong to the client.



Note that the latency figures are cumulative totals; divide by the number of operations to get the average latency.
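The same avgcount/sum pattern appears in the daemon perf counters; for example (the jq path is illustrative and the exact schema varies by version):

ceph daemon osd.0 perf dump | jq '.osd.op_latency'
# => {"avgcount": N, "sum": S, ...}; average latency per op = S / N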


rgw tools

I have not worked much with RGW; this section is yet to be filled in.

Performance / IO / resource analysis tools

ceph osd perf
ceph daemon osd.* perf dump


hzwuhongsong@dl-ceph-easyai1:~$ sudo ceph daemon osd.148 dump
// op-related dumps
dump_historic_ops_by_duration {<filterstr> [<filterstr>...]}
dump_historic_ops {<filterstr> [<filterstr>...]}
dump_blocked_ops {<filterstr> [<filterstr>...]}
dump_historic_slow_ops {<filterstr> [<filterstr>...]}

dump_blacklist
dump_scrubs
dump_watchers

// OSD memory usage
dump_mempools

dump_objectstore_kv_stats
dump_op_pq_state

References

A first look at cephfs-top


Ceph official documentation (1)


Ceph official documentation (2)


Advanced Ceph tools: using ceph-objectstore-tool
