Promtool has six TSDB-related subcommands, used for write performance benchmarking, analysis, listing blocks, dumping samples, importing blocks from OpenMetrics input, and creating blocks for new recording rules. Let's go through them one by one.
This is probably the first article written about the TSDB side of Promtool; I could not find any related material online.
Write Performance Benchmark
Promtool can run a write performance benchmark against Prometheus. The command's flags are as follows:
[root@Erdong-Test ~]# ./promtool tsdb bench write --help
usage: promtool tsdb bench write [<flags>] [<file>]
Run a write performance benchmark.
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
--enable-feature= ... Comma separated feature names to enable (only PromQL related). See
https://prometheus.io/docs/prometheus/latest/feature_flags/ for the options and more
details.
--out="benchout" Set the output path.
--metrics=10000 Number of metrics to read.
--scrapes=3000 Number of scrapes to simulate.
Args:
[<file>] Input file with samples data, default is (../../tsdb/testdata/20kseries.json).
First, download the 20kseries.json file needed for the test. It lives under /tsdb/testdata/ in the source repository and can be downloaded with this command:
wget https://raw.githubusercontent.com/prometheus/prometheus/main/tsdb/testdata/20kseries.json
Once it is downloaded, run the benchmark with the metrics and scrapes parameters:
[root@Erdong-Test ~]# ./promtool tsdb bench write --metrics=10000 --scrapes=3000 ./20kseries.json
level=info ts=2022-07-27T13:21:25.546626055Z caller=head.go:493 msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2022-07-27T13:21:25.546734166Z caller=head.go:536 msg="On-disk memory mappable chunks replay completed" duration=11.815µs
level=info ts=2022-07-27T13:21:25.546766084Z caller=head.go:542 msg="Replaying WAL, this may take a while"
level=info ts=2022-07-27T13:21:25.547101874Z caller=head.go:613 msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2022-07-27T13:21:25.547131383Z caller=head.go:619 msg="WAL replay completed" checkpoint_replay_duration=38.26µs wal_replay_duration=315.491µs total_replay_duration=409.177µs
level=info ts=2022-07-27T13:21:25.549132675Z caller=db.go:1467 msg="Compactions disabled"
>> start stage=readData
>> completed stage=readData duration=145.973395ms
>> start stage=ingestScrapes
ingestion completed
>> completed stage=ingestScrapes duration=3.628682202s
> total samples: 30000000
> samples/sec: 8.267435217071414e+06
>> start stage=stopStorage
>> completed stage=stopStorage duration=1.522008202s
The benchmark creates a benchout directory in the current working directory containing the files it generated. The output path and name can be changed with the --out flag.
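For example, a sketch of writing the benchmark output to a different directory (the path below is just a placeholder):
./promtool tsdb bench write --out=/tmp/prom-bench --metrics=10000 --scrapes=3000 ./20kseries.json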
At a glance, reading the test data took about 146 ms, ingesting the scrapes took about 3.6 s (roughly 8.27 million samples per second), and stopping the storage took about 1.5 s. A detailed interpretation of these results will be covered in a dedicated follow-up.
TSDB Analysis
Promtool can analyze Prometheus data blocks, reporting on churn, label pair cardinality, and compaction efficiency. The command's flags are as follows:
[root@Erdong-Test ~]# ./promtool tsdb analyze --help
usage: promtool tsdb analyze [<flags>] [<db path>] [<block id>]
Analyze churn, label pair cardinality and compaction efficiency.
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
--enable-feature= ... Comma separated feature names to enable (only PromQL related). See
https://prometheus.io/docs/prometheus/latest/feature_flags/ for the options and more
details.
--limit=20 How many items to show in each list.
--extended Run extended analysis.
Args:
[<db path>] Database path (default is data/).
[<block id>] Block to analyze (default is the last block).
When running an analysis you can limit how many items are shown in each list. Point the command at the Prometheus data directory (it defaults to data/); the block ID is optional, and if it is not given, the most recent block is analyzed. Running the command produces output like the following.
[root@Erdong-Test ~]# ./promtool tsdb analyze ./prometheus/data/
Block ID: 01G8ZXKAHRGF7BV0FTQYZ18FH5
Duration: 59m55.713s
Series: 606677
Label names: 26
Postings (unique label pairs): 13190
Postings entries (total label pairs): 7819776
Label pairs most involved in churning:
329487 workload_user_cattle_io_workloadselector=deployment-appops-pushgateway-flink
329487 pod_template_hash=6d695f75cf
329487 kubernetes_namespace=appops
62797 kubernetes_pod_name=pushgateway-flink-6d695f75cf-2xhfh
58524 kubernetes_pod_name=pushgateway-flink-6d695f75cf-8fq58
52495 kubernetes_pod_name=pushgateway-flink-6d695f75cf-d76n7
51544 kubernetes_pod_name=pushgateway-flink-6d695f75cf-z4bjb
22074 kubernetes_pod_name=pushgateway-flink-6d695f75cf-9jgdk
21516 kubernetes_pod_name=pushgateway-flink-6d695f75cf-96hpq
20426 kubernetes_pod_name=pushgateway-flink-6d695f75cf-9twr2
20047 kubernetes_pod_name=pushgateway-flink-6d695f75cf-ngzwz
11819 kubernetes_pod_name=pushgateway-flink-6d695f75cf-8hv9z
8242 kubernetes_pod_name=pushgateway-flink-6d695f75cf-pddrc
3783 __name__=flink_taskmanager_Status_JVM_Memory_Heap_Used
3783 __name__=flink_taskmanager_job_task_checkpointAlignmentTime
3783 __name__=flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded
3783 __name__=flink_taskmanager_Status_JVM_Threads_Count
3783 __name__=flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments
3783 __name__=flink_taskmanager_Status_Flink_Memory_Managed_Used
3783 __name__=flink_taskmanager_job_task_Shuffle_Netty_Output_Buffers_outputQueueLength
Label names most involved in churning:
329487 kubernetes_namespace
329487 kubernetes_pod_name
329487 workload_user_cattle_io_workloadselector
329487 __name__
329487 pod_template_hash
329487 job
314352 instance
314050 host
314050 tm_id
185403 task_attempt_num
185403 task_name
185403 subtask_index
185403 job_name
185403 task_id
185403 task_attempt_id
185403 job_id
15134 operator_name
15134 operator_id
66 method
55 quantile
Most common label pairs:
606565 kubernetes_namespace=appops
606565 workload_user_cattle_io_workloadselector=deployment-pushgateway-flink
606565 pod_template_hash=6d695f75cf
87063 kubernetes_pod_name=pushgateway-flink-6d695f75cf-ngzwz
87063 kubernetes_pod_name=pushgateway-flink-6d695f75cf-9jgdk
87062 kubernetes_pod_name=pushgateway-flink-6d695f75cf-96hpq
87062 kubernetes_pod_name=pushgateway-flink-6d695f75cf-9twr2
67920 kubernetes_pod_name=pushgateway-flink-6d695f75cf-2xhfh
62700 kubernetes_pod_name=pushgateway-flink-6d695f75cf-8fq58
54174 kubernetes_pod_name=pushgateway-flink-6d695f75cf-d76n7
53304 kubernetes_pod_name=pushgateway-flink-6d695f75cf-z4bjb
11892 kubernetes_pod_name=pushgateway-flink-6d695f75cf-8hv9z
8325 kubernetes_pod_name=pushgateway-flink-6d695f75cf-pddrc
6965 __name__=flink_taskmanager_job_task_Shuffle_Netty_Input_Buffers_inputFloatingBuffersUsage
6965 __name__=flink_taskmanager_job_task_numBuffersOut
6965 __name__=flink_taskmanager_job_task_isBackPressured
6965 __name__=flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded
6965 __name__=flink_taskmanager_job_task_numBuffersOutPerSecond
6965 __name__=push_time_seconds
6965 __name__=flink_taskmanager_job_task_buffers_inPoolUsage
Label names with highest cumulative label value length:
50890 task_name
42890 tm_id
34890 operator_id
34890 task_id
34890 task_attempt_id
34890 job_id
21890 job_name
18928 job
18890 operator_name
13890 host
5972 __name__
3890 task_attempt_num
3890 subtask_index
3100 instance
340 kubernetes_pod_name
40 revision
38 handler
35 workload_user_cattle_io_workloadselector
19 quantile
16 method
Highest cardinality labels:
1012 instance
1002 job
1000 operator_id
1000 task_id
1000 task_attempt_num
1000 host
1000 task_name
1000 tm_id
1000 job_id
1000 job_name
1000 task_attempt_id
1000 subtask_index
1000 operator_name
137 __name__
10 kubernetes_pod_name
7 handler
7 quantile
4 method
3 code
2 version
Highest cardinality metric names:
6965 my_batch_job_duration_seconds
6965 flink_taskmanager_Status_Flink_Memory_Managed_Used
6965 flink_taskmanager_Status_JVM_CPU_Load
6965 flink_taskmanager_Status_JVM_CPU_Time
6965 flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded
6965 flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded
6965 flink_taskmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Count
6965 flink_taskmanager_Status_JVM_GarbageCollector_PS_MarkSweep_Time
6965 flink_taskmanager_Status_JVM_GarbageCollector_PS_Scavenge_Count
6965 flink_taskmanager_Status_JVM_GarbageCollector_PS_Scavenge_Time
6965 flink_taskmanager_Status_JVM_Memory_Direct_Count
6965 flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed
6965 flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity
6965 flink_taskmanager_Status_JVM_Memory_Heap_Committed
6965 flink_taskmanager_Status_JVM_Memory_Heap_Max
6965 flink_taskmanager_Status_JVM_Memory_Heap_Used
6965 flink_taskmanager_Status_JVM_Memory_Mapped_Count
6965 flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed
6965 flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity
6965 flink_taskmanager_Status_JVM_Memory_Metaspace_Committed
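To analyze a specific block rather than the most recent one, or to show more items per list along with the extended statistics, the flags and the block ID can be combined, for example (using the block ID from the output above):
./promtool tsdb analyze --limit=30 --extended ./prometheus/data/ 01G8ZXKAHRGF7BV0FTQYZ18FH5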
Listing TSDB Blocks
Promtool can also list all of the blocks in a Prometheus TSDB. The command's flags are as follows:
[root@Erdong-Test ~]# ./promtool tsdb list --help
usage: promtool tsdb list [<flags>] [<db path>]
List tsdb blocks.
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
--enable-feature= ... Comma separated feature names to enable (only PromQL related). See
https://prometheus.io/docs/prometheus/latest/feature_flags/ for the options and more
details.
-r, --human-readable Print human readable values.
Args:
[<db path>] Database path (default is data/).
Let's run the command and look at the result. Adding the -r flag is recommended so that values are printed in human-readable form.
[root@Erdong-Test ~]# ./promtool tsdb list -r ./prometheus-pushgateway/data/
BLOCK ULID MIN TIME MAX TIME DURATION NUM SAMPLES NUM CHUNKS NUM SERIES SIZE
01G8XB6J42S6FTDMAPYV30CTR3 2022-07-26 12:00:03 +0000 UTC 2022-07-26 13:00:00 +0000 UTC 59m56.018s 78421200 435860 435412 92MiB217KiB450B
01G8XEMDR374M816NZ2NMNS5AQ 2022-07-26 13:00:03 +0000 UTC 2022-07-26 14:00:00 +0000 UTC 59m56.018s 78421200 435860 435412 92MiB217KiB825B
01G8XJ29C2JVV8PEQFR748C1QR 2022-07-26 14:00:03 +0000 UTC 2022-07-26 15:00:00 +0000 UTC 59m56.018s 78421200 435860 435412 94MiB125KiB158B
01G8XNG502K46CF1960R1FYEAA 2022-07-26 15:00:03 +0000 UTC 2022-07-26 16:00:00 +0000 UTC 59m56.018s 78421200 435860 435412 92MiB388KiB448B
01G8XRY0M2CV8J323VWM393JTZ 2022-07-26 16:00:03 +0000 UTC 2022-07-26 17:00:00 +0000 UTC 59m56.018s 78421200 435860 435412 93MiB978KiB543B
01G8XWBW82NGHBPP6QPZMJ6XAM 2022-07-26 17:00:03 +0000 UTC 2022-07-26 18:00:00 +0000 UTC 59m56.018s 78421200 435860 435412 93MiB979KiB590B
01G8XZSQW2WY7SG70Z40CW2CZV 2022-07-26 18:00:03 +0000 UTC 2022-07-26 19:00:00 +0000 UTC 59m56.018s 78421200 435860 435412 94MiB719KiB630B
From this output we can see each block's ULID, the start and end time of the data it stores, how long a time range it covers, the number of samples, the number of chunks, the number of series, and the block's size.
Dump
Promtool can also perform a dump, specifically dumping sample data out of the TSDB.
[root@Erdong-Test ~]# ./promtool tsdb dump --help
usage: promtool tsdb dump [<flags>] [<db path>]
Dump samples from a TSDB.
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
--enable-feature= ... Comma separated feature names to enable (only PromQL related). See
https://prometheus.io/docs/prometheus/latest/feature_flags/ for the options and more
details.
--min-time=-9223372036854775808
Minimum timestamp to dump.
--max-time=9223372036854775807
Maximum timestamp to dump.
Args:
[<db path>] Database path (default is data/).
For this command, make sure to specify the path to the data directory, and also set the minimum and maximum timestamps (Unix timestamps in milliseconds, as in the example below); otherwise every sample in the database will be dumped.
Let's run the command:
[root@Erdong-Test ~]# ./promtool tsdb dump --min-time=1658927217000 --max-time=1658930818000 ../prometheus-pushgateway/data/
The data produced by this command appears to be written only to standard output, not to a file, so remember to redirect it to a file yourself when dumping.
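For example, a simple way to capture the dump to a file (the output file name is just a placeholder):
./promtool tsdb dump --min-time=1658927217000 --max-time=1658930818000 ../prometheus-pushgateway/data/ > samples.dump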
Importing Blocks from OpenMetrics
Promtool can also import sample data from OpenMetrics input and produce TSDB blocks. The typical use case is migrating data between monitoring systems or time series databases: first convert the data into OpenMetrics format, then import the OpenMetrics data into Prometheus's TSDB.
The command's flags are as follows:
[root@Erdong-Test ~]# ./promtool tsdb create-blocks-from openmetrics --help
usage: promtool tsdb create-blocks-from openmetrics <input file> [<output directory>]
Import samples from OpenMetrics input and produce TSDB blocks. Please refer to the storage docs for more details.
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
--enable-feature= ... Comma separated feature names to enable (only PromQL related). See
https://prometheus.io/docs/prometheus/latest/feature_flags/ for the options and more
details.
-r, --human-readable Print human readable values.
-q, --quiet Do not print created blocks.
Args:
<input file> OpenMetrics file to read samples from.
[<output directory>] Output directory for generated blocks.
I don't have a suitable environment to try this out. The help text refers to the storage docs for more details, which is this page: https://prometheus.io/docs/prometheus/latest/storage/ . Have a look there; I will write a follow-up once I have an environment.
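To give a rough idea, here is a minimal sketch of what the input and the invocation might look like; the metric, values, and file names are hypothetical, and I have not verified this in a real environment:
# Write a small OpenMetrics file; timestamps are in seconds and the content must end with "# EOF"
cat > sample.om <<'OM'
# TYPE some_metric gauge
some_metric{label="a"} 1 1658927217
some_metric{label="a"} 2 1658927277
# EOF
OM
# Generate TSDB blocks from it into ./imported-data/
./promtool tsdb create-blocks-from openmetrics ./sample.om ./imported-data/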
Creating Blocks for New Recording Rules
When a new recording rule is created, it has no historical data; the data it produces is only stored from the moment the rule is created onwards. Promtool can backfill historical data for recording rules.
[root@Erdong-Test ~]# ./promtool tsdb create-blocks-from rules --help
usage: promtool tsdb create-blocks-from rules --start=START [<flags>] <rule-files>...
Create blocks of data for new recording rules.
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
--enable-feature= ... Comma separated feature names to enable (only PromQL related). See
https://prometheus.io/docs/prometheus/latest/feature_flags/ for the options and
more details.
-r, --human-readable Print human readable values.
-q, --quiet Do not print created blocks.
--url=http://localhost:9090
The URL for the Prometheus API with the data where the rule will be backfilled
from.
--start=START The time to start backfilling the new rule from. Must be a RFC3339 formatted date
or Unix timestamp. Required.
--end=END If an end time is provided, all recording rules in the rule files provided will
be backfilled to the end time. Default will backfill up to 3 hours ago. Must be a
RFC3339 formatted date or Unix timestamp.
--output-dir="data/" Output directory for generated blocks.
--eval-interval=60s How frequently to evaluate rules when backfilling if a value is not set in the
recording rule files.
Args:
<rule-files> A list of one or more files containing recording rules to be backfilled. All recording rules
listed in the files will be backfilled. Alerting rules are not evaluated.
Let's look at the command. The rule files provided should be ordinary Prometheus recording rule files, and you specify the time range of the historical data to generate:
[root@Erdong-Test ~]# ./promtool tsdb create-blocks-from rules \
--start 1617079873 \
--end 1617097873 \
--url http://mypromserver.com:9090 \
rules.yaml rules2.yaml
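For reference, a minimal sketch of what a rules.yaml passed to this command might contain; the group name, record name, and expression below are hypothetical:
# Write an ordinary Prometheus recording rule file
cat > rules.yaml <<'EOF'
groups:
  - name: example
    interval: 60s
    rules:
      - record: job:http_requests_total:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
EOF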
The output of the promtool tsdb create-blocks-from rules command is a directory containing blocks with the historical data for all rules in the recording rule files. By default, the output directory is data/. To make use of these new blocks, they must be moved to the data directory of a running Prometheus instance, and the instance must be started with the --storage.tsdb.allow-overlapping-blocks flag enabled. Prometheus will then merge the new blocks with the existing ones during the next compaction run, after which the historical data generated from the recording rule files can be queried in Prometheus.
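A rough sketch of those two steps, assuming the default output directory and a placeholder Prometheus data path:
# Move the generated ULID-named block directories into the running instance's data directory (paths are placeholders)
mv data/01* /var/lib/prometheus/data/
# Start Prometheus with overlapping blocks allowed so the new blocks are merged at the next compaction
./prometheus --config.file=prometheus.yml --storage.tsdb.allow-overlapping-blocks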
There are some limitations when backfilling recording rule data:
If the backfill is run multiple times with overlapping start and end times, blocks containing the same data will be created on each run.
All rules in the recording rule files will be evaluated.
If an interval is set in the recording rule file, it takes precedence over the --eval-interval flag of the tsdb create-blocks-from rules command.
Alerting rules contained in the recording rule files are ignored.
Rules in the same group cannot see the results of previous rules, which means rules that refer to other rules being backfilled are not supported. A workaround is to backfill in multiple passes and create the dependent data first (and move that data to the Prometheus server's data directory so it is accessible from the Prometheus API).
Summary
This small series of 4 posts is now complete. Comments and discussion are welcome.
This article belongs to the following series, 4 posts in total:
Prometheus Ops Tool Promtool (1): Check
Prometheus Ops Tool Promtool (2): Query
Prometheus Ops Tool Promtool (3): Debug
Prometheus Ops Tool Promtool (4): TSDB