写点什么

drainer 异常宕机原因分析

  • 2024-01-12
    北京
  • 本文字数:4492 字

    阅读完需:约 15 分钟

作者: Raymond 原文来源:https://tidb.net/blog/a8f3e65c

一、测试背景

最近在笔者个人的测试环境中突然发现一个 drainer 节点宕机了,且无法拉起,通过查看 drainer 的日志的关键字


table xxx not found,于是笔者猜测这应该和 tidb 节点的 ddl 变更有关系,于是看了下 ddl 的变更历史记录发现是对某个表做了交换分区的处理,难道是 drainer 不兼容交换分区功能导致?通过查阅官方文档 https://docs.pingcap.com/zh/tidb/v6.5/basic-features# 分区 发现交换分区是在 6.5 的版本中正式引入的,恰巧笔者的环境也是 6.5.3,所以笔者猜测还真有可能是交换分区导致的,于是带着这个好奇心重新测试了下,在这里简单说明下交换分区的作用,交换分区的作用是交换分区和非分区表,类似于 rename table, 具体的说明可以查阅官方文档 https://docs.pingcap.com/zh/tidb/v6.5/partitioned-table# 分区管理

二、测试过程

测试版本:v6.5.3


测试过程:

1. 新建 1 个分区表 e

表结构来源于https://dev.mysql.com/doc/refman/8.0/en/partitioning-management-exchange.htmlCREATE TABLE e (    id INT NOT NULL,    fname VARCHAR(30),    lname VARCHAR(30))    PARTITION BY RANGE (id) (        PARTITION p0 VALUES LESS THAN (50),        PARTITION p1 VALUES LESS THAN (100),        PARTITION p2 VALUES LESS THAN (150),        PARTITION p3 VALUES LESS THAN (MAXVALUE));查询该表的table id 为93,然后查询每个分区的分区idmysql> select * from information_schema.tables where table_name = 'e'\G*************************** 1. row ***************************             TABLE_CATALOG: def              TABLE_SCHEMA: test                TABLE_NAME: e                TABLE_TYPE: BASE TABLE                    ENGINE: InnoDB                   VERSION: 10                ROW_FORMAT: Compact                TABLE_ROWS: 4            AVG_ROW_LENGTH: 19               DATA_LENGTH: 77           MAX_DATA_LENGTH: 0              INDEX_LENGTH: 0                 DATA_FREE: 0            AUTO_INCREMENT: NULL               CREATE_TIME: 2023-09-03 10:27:19               UPDATE_TIME: NULL                CHECK_TIME: NULL           TABLE_COLLATION: utf8mb4_bin                  CHECKSUM: NULL            CREATE_OPTIONS: partitioned             TABLE_COMMENT:             TIDB_TABLE_ID: 93 TIDB_ROW_ID_SHARDING_INFO: NOT_SHARDED              TIDB_PK_TYPE: NONCLUSTEREDTIDB_PLACEMENT_POLICY_NAME: NULL1 row in set (0.00 sec)
mysql> select * from information_schema.PARTITIONS where TABLE_NAME ='e'\G*************************** 1. row *************************** TABLE_CATALOG: def TABLE_SCHEMA: test TABLE_NAME: e PARTITION_NAME: p0 SUBPARTITION_NAME: NULL PARTITION_ORDINAL_POSITION: 1SUBPARTITION_ORDINAL_POSITION: NULL PARTITION_METHOD: RANGE SUBPARTITION_METHOD: NULL PARTITION_EXPRESSION: `id` SUBPARTITION_EXPRESSION: NULL PARTITION_DESCRIPTION: 50 TABLE_ROWS: 1 AVG_ROW_LENGTH: 20 DATA_LENGTH: 20 MAX_DATA_LENGTH: 0 INDEX_LENGTH: 0 DATA_FREE: 0 CREATE_TIME: 2023-09-03 10:27:19 UPDATE_TIME: NULL CHECK_TIME: NULL CHECKSUM: NULL PARTITION_COMMENT: NODEGROUP: NULL TABLESPACE_NAME: NULL TIDB_PARTITION_ID: 94 TIDB_PLACEMENT_POLICY_NAME: NULL 由此可知e表p0分区的id为94
复制代码

2. 新建 1 个非分区表 e2

CREATE TABLE e2 (    id INT NOT NULL,    fname VARCHAR(30),    lname VARCHAR(30)
);mysql> select * from information_schema.tables where table_name = 'e2'\G*************************** 1. row *************************** TABLE_CATALOG: def TABLE_SCHEMA: test TABLE_NAME: e2 TABLE_TYPE: BASE TABLE ENGINE: InnoDB VERSION: 10 ROW_FORMAT: Compact TABLE_ROWS: 0 AVG_ROW_LENGTH: 0 DATA_LENGTH: 0 MAX_DATA_LENGTH: 0 INDEX_LENGTH: 0 DATA_FREE: 0 AUTO_INCREMENT: NULL CREATE_TIME: 2023-09-03 10:32:47 UPDATE_TIME: NULL CHECK_TIME: NULL TABLE_COLLATION: utf8mb4_bin CHECKSUM: NULL CREATE_OPTIONS: TABLE_COMMENT: TIDB_TABLE_ID: 105 TIDB_ROW_ID_SHARDING_INFO: NOT_SHARDED TIDB_PK_TYPE: NONCLUSTEREDTIDB_PLACEMENT_POLICY_NAME: NULL1 row in set (0.01 sec)由此可知e2的table id为105
复制代码

3. 分区表与非分区表进行交换

INSERT INTO e VALUES    (1669, "Jim", "Smith"),    (337, "Mary", "Jones"),    (16, "Frank", "White"),    (2005, "Linda", "Black");     
mysql> select * from e;+------+-------+-------+| id | fname | lname |+------+-------+-------+| 1669 | Jim | Smith || 337 | Mary | Jones || 2005 | Linda | Black || 1669 | Jim | Smith || 337 | Mary | Jones || 2005 | Linda | Black || 16 | Frank | White || 16 | Frank | White |+------+-------+-------+8 rows in set, 1 warning (0.05 sec) ALTER TABLE e EXCHANGE PARTITION p0 WITH TABLE e2;将e表的p0分区的数据交换到e2表mysql> select * from e2;+----+-------+-------+| id | fname | lname |+----+-------+-------+| 16 | Frank | White || 16 | Frank | White |+----+-------+-------+2 rows in set (0.00 sec)
mysql> select * from e;+------+-------+-------+| id | fname | lname |+------+-------+-------+| 1669 | Jim | Smith || 337 | Mary | Jones || 2005 | Linda | Black || 1669 | Jim | Smith || 337 | Mary | Jones || 2005 | Linda | Black |+------+-------+-------+6 rows in set, 1 warning (0.01 sec)
由此可知,交换分区成功
复制代码

4. 交换后 table id 发生的变化

mysql> select * from information_schema.tables where table_name = 'e2'\G*************************** 1. row ***************************             TABLE_CATALOG: def              TABLE_SCHEMA: test                TABLE_NAME: e2                TABLE_TYPE: BASE TABLE                    ENGINE: InnoDB                   VERSION: 10                ROW_FORMAT: Compact                TABLE_ROWS: 2            AVG_ROW_LENGTH: 20               DATA_LENGTH: 40           MAX_DATA_LENGTH: 0              INDEX_LENGTH: 0                 DATA_FREE: 0            AUTO_INCREMENT: NULL               CREATE_TIME: 2023-09-03 10:38:24               UPDATE_TIME: NULL                CHECK_TIME: NULL           TABLE_COLLATION: utf8mb4_bin                  CHECKSUM: NULL            CREATE_OPTIONS:             TABLE_COMMENT:             TIDB_TABLE_ID: 94 TIDB_ROW_ID_SHARDING_INFO: NOT_SHARDED              TIDB_PK_TYPE: NONCLUSTEREDTIDB_PLACEMENT_POLICY_NAME: NULL1 row in set (0.00 sec)mysql> select * from information_schema.PARTITIONS where TABLE_NAME ='e'\G*************************** 1. row ***************************                TABLE_CATALOG: def                 TABLE_SCHEMA: test                   TABLE_NAME: e               PARTITION_NAME: p0            SUBPARTITION_NAME: NULL   PARTITION_ORDINAL_POSITION: 1SUBPARTITION_ORDINAL_POSITION: NULL             PARTITION_METHOD: RANGE          SUBPARTITION_METHOD: NULL         PARTITION_EXPRESSION: `id`      SUBPARTITION_EXPRESSION: NULL        PARTITION_DESCRIPTION: 50                   TABLE_ROWS: 0               AVG_ROW_LENGTH: 0                  DATA_LENGTH: 0              MAX_DATA_LENGTH: 0                 INDEX_LENGTH: 0                    DATA_FREE: 0                  CREATE_TIME: 2023-09-03 10:27:19                  UPDATE_TIME: NULL                   CHECK_TIME: NULL                     CHECKSUM: NULL            PARTITION_COMMENT:                    NODEGROUP: NULL              TABLESPACE_NAME: NULL            TIDB_PARTITION_ID: 105   TIDB_PLACEMENT_POLICY_NAME: NULL由此可知,交换分区后,e2的table id变成了原来e表p0分区的table id 94,而e表p0分区的tidb_partition_id   变成了原来e2表的105,也就是分区交换后,table id也进行了互换                                         
复制代码

三、测试结果

那么进行分区交换后导致的 tabld id 也发生了变化会带来什么后果呢?当对 e2 进行一个 ddl 时,比如说删除 e2 表


drop table e2;
复制代码


那么 drainer 是否可以正常处理这个 ddl,通过查看的 drainer.log, 可以看到 drainer 已经开始报错了,而报错的原因是因为 table 94 not found, 这也是就是 drainer 暂时无法支持这种如果表是交换分区后产生的特殊处理,而且会导致 drainer 进程无法正常运行


[2023/09/03 10:41:04.223 +08:00] [ERROR] [main.go:69] ["start drainer server failed"] [error="handle ddl job ID:108, Type:drop table, State:synced, SchemaState:none, SchemaID:85, TableID:94, RowCount:0, ArgLen:0, start time: 2023-09-03 10:41:02.807 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0 failed, the schema info: xxxx table 94 not found"]
复制代码

四、结论

通过以上的测试我们可以得出目前 drainer 组件其实是不兼容交换分区的处理,所以在日常的使用过程中我们要注意这点,在官方没有完善 drainer 的这个兼容性后,如果应用层面无法避免使用交换分区的逻辑,那么在 TiDB 层面应该使用 br log 来代替 drainer 的使用 https://docs.pingcap.com/zh/tidb/v6.5/br-pitr-guide


发布于: 21 分钟前阅读数: 4
用户头像

TiDB 社区官网:https://tidb.net/ 2021-12-15 加入

TiDB 社区干货传送门是由 TiDB 社区中布道师组委会自发组织的 TiDB 社区优质内容对外宣布的栏目,旨在加深 TiDBer 之间的交流和学习。一起构建有爱、互助、共创共建的 TiDB 社区 https://tidb.net/

评论

发布
暂无评论
drainer 异常宕机原因分析_集群管理_TiDB 社区干货传送门_InfoQ写作社区