写点什么

终止分区表变更操作时误删数据字典缓存导致 MySQL 崩溃分析

作者:GreatSQL
  • 2025-06-25
    福建
  • 本文字数:4832 字

    阅读完需:约 16 分钟

终止分区表变更操作时误删数据字典缓存导致 MySQL 崩溃分析

1. 问题简述

在 MySQL 中,当终止一个处于 committing alter table to storage engine 阶段的分区表操作时,InnoDB 会尝试进行回滚并清理数据字典缓存。不幸的是,过程中发生了误删表缓存对象的情况 —— InnoDB 错误地移除了另一张非目标表的缓存条目,导致引用计数不为 0,触发断言失败并导致 MySQL 崩溃。

2. 复现步骤

环境说明:


系统:CentOS 7


数据库:MySQL 8.0.32

2.1 建表准备

CREATE DATABASE TEST;CREATE TABLE TEST.A ( X INT)PARTITION BY RANGE (X) (    PARTITION P0 VALUES LESS THAN (10000),    PARTITION PMAX VALUES LESS THAN MAXVALUE);CREATE TABLE TEST.A_1 LIKE TEST.A;SELECT COUNT(*) FROM TEST.A_1;
复制代码

2.2 反复尝试终止 ALTER TABLE 操作

在一个 Shell 中持续执行终止 "committing alter table to storage engine" 状态(KILL 相应 SQL 请求)的线程:


while true; do  {   mysql --login-path=mylogin -BNe  'SELECT CONCAT("KILL ",ID ,";") FROM INFORMATION_SCHEMA.PROCESSLIST WHERE STATE = "COMMITTING ALTER TABLE TO STORAGE ENGINE";' | mysql --login-path=mylogin  -vvv ; } ; done
复制代码

2.3 循环执行 ALTER 操作

在另一个 Shell 中循环执行添加与删除分区的操作:


while true; do  {   mysql --login-path=mylogin -BNe  "ALTER TABLE test.a ADD PARTITION (PARTITION pmax VALUES LESS THAN MAXVALUE);" ;   mysql --login-path=mylogin -BNe " ALTER TABLE test.a DROP PARTITION pmax;" ; }; done
复制代码

3. 崩溃日志与原因分析

DROP PARTITION 操作在关键阶段被 KILL 时,崩溃发生:


2025-06-05T17:03:19.270698+08:00 2975 [ERROR] [MY-013183] [InnoDB] Assertion failure: dict0dict.cc:1885:table->get_ref_count() == 0 thread 140327459395328InnoDB: We intentionally generate a memory trap.InnoDB: Submit a detailed bug report to http://bugs.mysql.com.InnoDB: If you get repeated assertion failures or crashes, evenInnoDB: immediately after the mysqld startup, there may beInnoDB: corruption in the InnoDB tablespace. Please refer toInnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.htmlInnoDB: about forcing recovery.2025-06-05T09:03:19Z UTC - mysqld got signal 6 ;Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.BuildID[sha1]=7afc1fad28c808c287fa7599451d3355e1b3be73Thread pointer: 0x7fa054000f40Attempting backtrace. You can use the following information to find outwhere mysqld died. If you see no messages after this, something wentterribly wrong...stack_bottom = 7fa0885e8a30 thread_stack 0x100000/usr/local/mysql-debug/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x43) [0x48da47f]/usr/local/mysql-debug/bin/mysqld(print_fatal_signal(int)+0x3a2) [0x34cf2af]/usr/local/mysql-debug/bin/mysqld(my_server_abort()+0x6b) [0x34cf561]/usr/local/mysql-debug/bin/mysqld(my_abort()+0xd) [0x48d0eaf]/usr/local/mysql-debug/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x1d1) [0x4d9b051]/usr/local/mysql-debug/bin/mysqld() [0x4f15f0b]/usr/local/mysql-debug/bin/mysqld(dict_table_remove_from_cache(dict_table_t*)+0x1d) [0x4f166cd]/usr/local/mysql-debug/bin/mysqld(dict_partitioned_table_remove_from_cache(char const*)+0x159) [0x4f16829]/usr/local/mysql-debug/bin/mysqld() [0x4a07dc2]/usr/local/mysql-debug/bin/mysqld() [0x337ee49]/usr/local/mysql-debug/bin/mysqld(mysql_alter_table(THD*, char const*, char const*, HA_CREATE_INFO*, Table_ref*, Alter_info*)+0x3bf6) [0x338a12a]/usr/local/mysql-debug/bin/mysqld(Sql_cmd_alter_table::execute(THD*)+0x5d4) [0x39af486]/usr/local/mysql-debug/bin/mysqld(mysql_execute_command(THD*, bool)+0x54c8) [0x32a5677]/usr/local/mysql-debug/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x756) [0x32a785f]/usr/local/mysql-debug/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x15a8) [0x329d7c3]/usr/local/mysql-debug/bin/mysqld(do_command(THD*)+0x5bd) [0x329b853]/usr/local/mysql-debug/bin/mysqld() [0x34ba2df]/usr/local/mysql-debug/bin/mysqld() [0x51ed236]/lib64/libpthread.so.0(+0x7ea5) [0x7fa13cf41ea5]/lib64/libc.so.6(clone+0x6d) [0x7fa13b55bb0d]
Trying to get some variables.Some pointers may be invalid and cause the dump to abort.Query (7fa054010390): ALTER TABLE test.a DROP PARTITION pmaxConnection ID (thread ID): 2975Status: KILL_CONNECTION
复制代码

堆栈信息

(gdb) bt#0  0x00007fa13cf46aa1 in pthread_kill () from /lib64/libpthread.so.0#1  0x00000000048da549 in my_write_core (sig=6) at /software/db/mysql-8.0.32/mysys/stacktrace.cc:295#2  0x00000000034cf4ec in handle_fatal_signal (sig=6) at /software/db/mysql-8.0.32/sql/signal_handler.cc:230#3  <signal handler called>#4  0x00007fa13b493387 in raise () from /lib64/libc.so.6#5  0x00007fa13b494a78 in abort () from /lib64/libc.so.6#6  0x00000000034cf671 in my_server_abort () at /software/db/mysql-8.0.32/sql/signal_handler.cc:286#7  0x00000000048d0eaf in my_abort () at /software/db/mysql-8.0.32/mysys/my_init.cc:258#8  0x0000000004d9b051 in ut_dbg_assertion_failed (expr=0x6d821d3 "table->get_ref_count() == 0",     file=0x6d81ab8 "/software/db/mysql-8.0.32/storage/innobase/dict/dict0dict.cc", line=1885) at /software/db/mysql-8.0.32/storage/innobase/ut/ut0dbg.cc:99#9  0x0000000004f15f0b in dict_table_remove_from_cache_low (table=0x7fa0445c6e08, lru_evict=false)    at /software/db/mysql-8.0.32/storage/innobase/dict/dict0dict.cc:1885#10 0x0000000004f166cd in dict_table_remove_from_cache (table=0x7fa0445c6e08) at /software/db/mysql-8.0.32/storage/innobase/dict/dict0dict.cc:1969#11 0x0000000004f16829 in dict_partitioned_table_remove_from_cache (name=0x7fa0885e2bd0 "test/a")    at /software/db/mysql-8.0.32/storage/innobase/dict/dict0dict.cc:1999#12 0x0000000004a07dc2 in innobase_dict_cache_reset (schema_name=0x7fa054010868 "test", table_name=0x7fa054010880 "a")    at /software/db/mysql-8.0.32/storage/innobase/handler/ha_innodb.cc:3991#13 0x000000000337ee49 in mysql_inplace_alter_table (thd=0x7fa054000f40, schema=..., new_schema=..., table_def=0x7fa0445af410,     altered_table_def=0x7fa054012d60, table_list=0x7fa054011020, table=0x0, altered_table=0x7fa05403e230, ha_alter_info=0x7fa0885e3710,     inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, alter_ctx=0x7fa0885e4630, columns=std::set with 0 elements, fk_key_info=0x7fa05403b720,     fk_key_count=0, fk_invalidator=0x7fa0885e4560) at /software/db/mysql-8.0.32/sql/sql_table.cc:13744#14 0x000000000338a12a in mysql_alter_table (thd=0x7fa054000f40, new_db=0x7fa054010868 "test", new_name=0x0, create_info=0x7fa0885e6170,     table_list=0x7fa054011020, alter_info=0x7fa0885e6000) at /software/db/mysql-8.0.32/sql/sql_table.cc:17405#15 0x00000000039af486 in Sql_cmd_alter_table::execute (this=0x7fa054011670, thd=0x7fa054000f40) at /software/db/mysql-8.0.32/sql/sql_alter.cc:349#16 0x00000000032a5677 in mysql_execute_command (thd=0x7fa054000f40, first_level=true) at /software/db/mysql-8.0.32/sql/sql_parse.cc:4688#17 0x00000000032a785f in dispatch_sql_command (thd=0x7fa054000f40, parser_state=0x7fa0885e7910) at /software/db/mysql-8.0.32/sql/sql_parse.cc:5322#18 0x000000000329d7c3 in dispatch_command (thd=0x7fa054000f40, com_data=0x7fa0885e8a00, command=COM_QUERY)    at /software/db/mysql-8.0.32/sql/sql_parse.cc:2036#19 0x000000000329b853 in do_command (thd=0x7fa054000f40) at /software/db/mysql-8.0.32/sql/sql_parse.cc:1439#20 0x00000000034ba2df in handle_connection (arg=0xbe191d0) at /software/db/mysql-8.0.32/sql/conn_handler/connection_handler_per_thread.cc:302#21 0x00000000051ed236 in pfs_spawn_thread (arg=0xbe23890) at /software/db/mysql-8.0.32/storage/perfschema/pfs.cc:2986#22 0x00007fa13cf41ea5 in start_thread () from /lib64/libpthread.so.0#23 0x00007fa13b55bb0d in clone () from /lib64/libc.so.6
复制代码

崩溃核心栈:

dict_partitioned_table_remove_from_cache -> strncmp("test/a", "test/a_1#p#p0", 6) == 0  // 误判为匹配 -> dict_table_remove_from_cache() 导致 ref_count != 0 触发断言
复制代码

GDB 调试关键值:

(gdb) p name$1 = "test/a"(gdb) p prev_table->name.m_name$2 = "test/a_1#p#p0"(gdb) p strncmp("test/a", "test/a_1#p#p0", 6)$3 = 0   // 返回0,误认为匹配
复制代码


由于 "test/a""test/a_1#p#p0" 的前缀,使用 strncmp(name, m_name, name_len) 判断时发生误判,错误地将 test.a_1 的分区子表从缓存中移除。

4. 关键源码解读

崩溃函数在 InnoDB 源码中的定义如下(文件 dict0dict.cc):


void dict_partitioned_table_remove_from_cache(const char *name) {  ut_ad(dict_sys_mutex_own());
size_t name_len = strlen(name);
for (uint32_t i = 0; i < hash_get_n_cells(dict_sys->table_id_hash); ++i) { dict_table_t *table = static_cast<dict_table_t *>(hash_get_first(dict_sys->table_hash, i));
while (table != nullptr) { dict_table_t *prev_table = table; table = static_cast<dict_table_t *>(HASH_GET_NEXT(name_hash, prev_table));
if (prev_table->is_dd_table) { continue; }
if ((strncmp(name, prev_table->name.m_name, name_len) == 0) && dict_table_is_partition(prev_table)) { btr_drop_ahi_for_table(prev_table); dict_table_remove_from_cache(prev_table); // 错误删除 } } }}
复制代码

核心问题:

  • 使用 strncmp(name, m_name, name_len) 导致前缀匹配误判为全匹配

  • 应该采用 精确表名匹配,如:


if (strcmp(name, prev_table->name.m_name) == 0)
复制代码


  • 或使用更严格的逻辑,确保仅匹配当前表或其合法分区。

5. 结论与建议

问题根因:

  • 缓存移除逻辑基于前缀匹配,未验证完整表名,误删其他表结构对象;

  • 被删表正在被引用,引用计数 ref_count ≠ 0,触发 InnoDB 保护性断言;

  • 最终导致 MySQL 进程崩溃。

潜在影响:

  • 表结构类似(同名前缀)的表,在 DDL 并发或异常终止场景下可能被误删;

  • 可能存在更广泛的 InnoDB 缓存一致性漏洞,特别在并发和 kill 介入场景中。

建议修复方向:

  • strncmp(name, m_name, name_len) 替换为更安全的 strcmp 或引入完整名判断;

  • 限制 dict_partitioned_table_remove_from_cache() 中删除的表名范围,仅处理严格匹配的表;

  • 增强 DDL 异常中断处理的健壮性,确保引用关系、缓存清理逻辑的一致性。

发布于: 刚刚阅读数: 2
用户头像

GreatSQL

关注

GreatSQL社区 2023-01-31 加入

GreatSQL是由万里数据库维护的MySQL分支,专注于提升MGR可靠性及性能,支持InnoDB并行查询特性,是适用于金融级应用的MySQL分支版本。 社区:https://greatsql.cn/ Gitee: https://gitee.com/GreatSQL/GreatSQL

评论

发布
暂无评论
终止分区表变更操作时误删数据字典缓存导致MySQL崩溃分析_GreatSQL_InfoQ写作社区