写点什么

MySQL 死锁套路:一次诡异的批量插入死锁问题分析

用户头像
AI乔治
关注
发布于: 2021 年 04 月 26 日
MySQL 死锁套路:一次诡异的批量插入死锁问题分析

线上最近出现了批量 insert 的死锁,百思不得解。死锁记录如下:

2018-10-26T11:04:41.759589Z 8530809 [Note] InnoDB: *** (1) TRANSACTION:
TRANSACTION 1202026765, ACTIVE 0 sec insertingmysql tables in use 1, locked 1LOCK WAIT 3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1MySQL thread id 8532863, OS thread handle 139858337453824, query id 16231472122 10.111.10.143 seewo updateINSERT IGNORE INTO xx_performance_type_label_relation(label_id, performance_type_id, type, create_time) VALUES ('bb0394e670644168a998a93a3ed521bc', '06b96ee0bab84d71bb17bf9645d3aa54', 1, now()) , ('bb0394e670644168a998a93a3ed521bc', '27d82e2331b241e1a9c9c0a74ec21099', -1, now()) , ('bb0394e670644168a998a93a3ed521bc', '3100b5978fb24f56b327d25732a7d7a7', 1, now()) , ('bb0394e670644168a998a93a3ed521bc', '435a1e19ce6e4e5bbb84240b3b34cf03', 1, now()) , ('bb0394e670644168a998a93a3ed521bc', '447fe27199ca40e289ef2834469d9a78', 1, now()) , ('bb0394e670644168a998a93a3ed521bc', '87a52c4d00844b5bb9eb75e8fe34202a', 1, now()) , ('bb0394e670644168a998a93a3ed521bc', 'c6a0e26983bd4fae837d5ee2f4efeef8', 1, now())2018-10-26T11:04:41.759635Z 8530809 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 505 page no 9912 n bits 288 index uk_performance_type_id_label_id of table `masaike`.`xx_performance_type_label_relation` trx id 1202026765 lock_mode X locks gap before rec insert intention waiting2018-10-26T11:04:41.759674Z 8530809 [Note] InnoDB: *** (2) TRANSACTION:
TRANSACTION 1202026764, ACTIVE 0 sec insertingmysql tables in use 1, locked 13 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1MySQL thread id 8530809, OS thread handle 139858469242624, query id 16231472119 10.111.10.153 seewo updateINSERT IGNORE INTO xx_performance_type_label_relation(label_id, performance_type_id, type, create_time) VALUES ('bb0394e670644168a998a93a3ed521bc', '06b96ee0bab84d71bb17bf9645d3aa54', 1, now()) , ('bb0394e670644168a998a93a3ed521bc', '27d82e2331b241e1a9c9c0a74ec21099', -1, now()) , ('bb0394e670644168a998a93a3ed521bc', '3100b5978fb24f56b327d25732a7d7a7', 1, now()) , ('bb0394e670644168a998a93a3ed521bc', '435a1e19ce6e4e5bbb84240b3b34cf03', 1, now()) , ('bb0394e670644168a998a93a3ed521bc', '447fe27199ca40e289ef2834469d9a78', 1, now()) , ('bb0394e670644168a998a93a3ed521bc', '87a52c4d00844b5bb9eb75e8fe34202a', 1, now()) , ('bb0394e670644168a998a93a3ed521bc', 'c6a0e26983bd4fae837d5ee2f4efeef8', 1, now())2018-10-26T11:04:41.759713Z 8530809 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 505 page no 9912 n bits 288 index uk_performance_type_id_label_id of table `masaike`.`xx_performance_type_label_relation` trx id 1202026764 lock mode S2018-10-26T11:04:41.759753Z 8530809 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 505 page no 9912 n bits 288 index uk_performance_type_id_label_id of table `masaike`.`xx_performance_type_label_relation` trx id 1202026764 lock_mode X locks gap before rec insert intention waiting2018-10-26T11:04:41.759784Z 8530809 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)
复制代码

第一反应是批量 insert,insert 的顺序不一样导致的死锁。但是这个在这里是不成立的。原因有两点

出现问题的批量插入 SQL 中顺序是一模一样的,在顺序一样的情况下,只会进行插入等待(implicit lock 转 explicit X 锁)下面有实验如果是因为批量插入顺序不一致带来的死锁日志,打印的结果不是等待插入意向锁(insert intention waiting),下面有实验

现在采用一个简化的表,做实验

CREATE TABLE `t1` (  `id` int(11) NOT NULL AUTO_INCREMENT,  `a` varchar(5)  NOT NULL DEFAULT '',  `b` varchar(5)  NOT NULL DEFAULT '',  PRIMARY KEY (`id`),  UNIQUE KEY `uk_name` (`a`,`b`)) ENGINE=InnoDB;
复制代码

实验 01

在记录不存在的情况下,两个同样顺序的批量 insert 同时执行,第二个会进行锁等待状态首先 truncate t1;

可以看到目前锁的状态

mysql> select * from information_schema.innodb_locks;+-------------+-------------+-----------+-----------+------------+------------+------------+-----------+----------+-----------+| lock_id     | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |+-------------+-------------+-----------+-----------+------------+------------+------------+-----------+----------+-----------+| 31AE:54:4:2 | 31AE        | S         | RECORD    | `d1`.`t1`  | `uk_name`  |         54 |         4 |        2 | '1', '1'  || 31AD:54:4:2 | 31AD        | X         | RECORD    | `d1`.`t1`  | `uk_name`  |         54 |         4 |        2 | '1', '1'  |+-------------+-------------+-----------+-----------+------------+------------+------------+-----------+----------+-----------+
复制代码

在我们执行事务 t1 的 insert 时,没有在任何锁的断点处出现,这跟 MySQL 插入的原理有关系 insert 加的是隐式锁。什么是隐式锁?隐式锁的意思就是没有锁在 t1 插入记录时,是不加锁的。这个时候事务 t1 还未提交的情况下,事务 t2 尝试插入的时候,发现有这条记录,t2 尝试获取 S 锁,会判定记录上的事务 id 是否活跃,如果活跃的话,说明事务未结束,会帮 t1 把它的隐式锁提升为显式锁(X 锁)

源码如下:

t2 获取 S 锁的结果:DB_LOCK_WAIT


实验 02

批量插入顺序不一致的导致的死锁日志不是等待插入意向锁


------------------------LATEST DETECTED DEADLOCK------------------------181101  9:48:36*** (1) TRANSACTION:TRANSACTION 3309, ACTIVE 215 sec insertingmysql tables in use 1, locked 1LOCK WAIT 3 lock struct(s), heap size 376, 2 row lock(s), undo log entries 2MySQL thread id 2, OS thread handle 0x70000a845000, query id 58 localhost root updateinsert into t1(a, b)values("2", "2")*** (1) WAITING FOR THIS LOCK TO BE GRANTED:RECORD LOCKS space id 55 page no 4 n bits 72 index `uk_name` of table `d1`.`t1` trx id 3309 lock mode S waitingRecord lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 1; hex 32; asc 2;; 1: len 1; hex 32; asc 2;; 2: len 4; hex 80000002; asc     ;;
*** (2) TRANSACTION:TRANSACTION 330A, ACTIVE 163 sec insertingmysql tables in use 1, locked 13 lock struct(s), heap size 376, 2 row lock(s), undo log entries 2MySQL thread id 3, OS thread handle 0x70000a888000, query id 59 localhost root updateinsert into t1(a, b)values("1", "1")*** (2) HOLDS THE LOCK(S):RECORD LOCKS space id 55 page no 4 n bits 72 index `uk_name` of table `d1`.`t1` trx id 330A lock_mode X locks rec but not gapRecord lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 1; hex 32; asc 2;; 1: len 1; hex 32; asc 2;; 2: len 4; hex 80000002; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:RECORD LOCKS space id 55 page no 4 n bits 72 index `uk_name` of table `d1`.`t1` trx id 330A lock mode S waitingRecord lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 1; hex 31; asc 1;; 2: len 4; hex 80000001; asc ;;
*** WE ROLL BACK TRANSACTION (2)
复制代码

到目前为止,已经陷入了僵局,完全没法复现死锁的情况。看了代码,发现在 insert 之前有一个 delete,但是 delete 与 insert 不在一个事务里面,也就是 delete 提交以后,才进行批量 insert,真正出问题的地方在批量 insert 的地方。一开始就排除了 delete 对后面的影响,难道不在一个事务,也会有影响?写了一个代码去模拟,有很大概率会复现

fun test() {    dao.delete() // 对应delete from    // sleep for 10ms    dao.insert() // 对应insert ignore}
复制代码

对应的 SQL 如下,注意是两个事务

begin;delete from t1 where a = '25'commit;
begin;INSERT ignore INTO `t1` (`a`, `b`) VALUES('25','1')commit;
复制代码

这个代码在两个线程同时调用的时候,非常容易死锁。后来翻遍了网上相关的死锁案例,有一个关于 purge 删除的过程可能跟这个有关系。

如果标记为删除,说明事务已经提交,还没来得及 purge,这时后面的事务加 S 锁等待;

在源码中打印一些日志。1.在 storage/innobase/row/row0ins.c 的 row_ins_set_shared_rec_lock 增加日志,可以看到对唯一索引增加 S 锁的过程

if (dict_index_is_clust(index)) {    err = lock_clust_rec_read_check_and_lock(        0, block, rec, index, offsets, LOCK_S, type, thr);} else {    err = lock_sec_rec_read_check_and_lock(        0, block, rec, index, offsets, LOCK_S, type, thr);    // 增加如下日志    fprintf(stderr, "row_ins_set_shared_rec_lock %s %lu %d\n" , index->name, type, err);}
复制代码

2.在 lock_rec_enqueue_waiting 增加日志,可以看到锁等待的情况

staticenum db_errlock_rec_enqueue_waiting({	fprintf(stderr, "lock_rec_enqueue_waiting::::: %s %lu\n" , index->name, type_mode);}
复制代码

日志大概如下

row_ins_set_shared_rec_lock uk_name 0 9 (t1获取S锁成功)row_ins_set_shared_rec_lock uk_name 0 9 (t2获取S锁成功)
lock_rec_enqueue_waiting::::: uk_name 2563(t1 X锁进如锁等待)lock_rec_enqueue_waiting::::: uk_name 2563(t2 X锁进如锁等待)
复制代码

其中 2563=2048+512+3=LOCK_INSERT_INTENTION+LOCK_GAP+LOCK_X

这个过程跟非常经典的三个事务同时 insert,一个回滚,剩下的两个事务一个成功,一个死锁,其实是一模一样的原理。

实验 03

三个 insert ignore,一个回滚造成的死锁

insert 语句都是 insert ignore into t1(a, b)values(“1”, “1”);以下省略

死锁日志,跟我们案例中的一模一样

------------------------LATEST DETECTED DEADLOCK------------------------181101 23:22:59*** (1) TRANSACTION:TRANSACTION 5032, ACTIVE 11 sec insertingmysql tables in use 1, locked 1LOCK WAIT 4 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1MySQL thread id 5, OS thread handle 0x70000d736000, query id 125 localhost root updateinsert ignore into t1(a, b)values("1", "1")*** (1) WAITING FOR THIS LOCK TO BE GRANTED:RECORD LOCKS space id 56 page no 4 n bits 584 index `uk_name` of table `d1`.`t1` trx id 5032 lock_mode X locks gap before rec insert intention waitingRecord lock, heap no 139 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 3; hex 313031; asc 101;; 1: len 3; hex 313031; asc 101;; 2: len 4; hex 800007b1; asc     ;;
*** (2) TRANSACTION:TRANSACTION 5033, ACTIVE 6 sec insertingmysql tables in use 1, locked 14 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1MySQL thread id 6, OS thread handle 0x70000d779000, query id 126 localhost root updateinsert ignore into t1(a, b)values("1", "1")*** (2) HOLDS THE LOCK(S):RECORD LOCKS space id 56 page no 4 n bits 584 index `uk_name` of table `d1`.`t1` trx id 5033 lock mode S locks gap before recRecord lock, heap no 139 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 3; hex 313031; asc 101;; 1: len 3; hex 313031; asc 101;; 2: len 4; hex 800007b1; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:RECORD LOCKS space id 56 page no 4 n bits 584 index `uk_name` of table `d1`.`t1` trx id 5033 lock_mode X locks gap before rec insert intention waitingRecord lock, heap no 139 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 3; hex 313031; asc 101;; 1: len 3; hex 313031; asc 101;; 2: len 4; hex 800007b1; asc ;;
*** WE ROLL BACK TRANSACTION (2)
复制代码

目前来看,得到的结论是:一个已提交但是未 purge 掉的记录会造成后续插入获取 S 共享锁,两个事务同时获取 S 锁,然后尝试获取插入意向锁,造成死锁网上大神梳理的 insert 流程

首先对插入的间隙加插入意向锁(Insert Intension Locks)

如果该间隙已被加上了 GAP 锁或 Next-Key 锁,则加锁失败进入等待;如果没有,则加锁成功,表示可以插入;

然后判断插入记录是否有唯一键,如果有,则进行唯一性约束检查

如果不存在相同键值,则完成插入如果存在相同键值,则判断该键值是否加锁

如果没有锁, 判断该记录是否被标记为删除如果标记为删除,说明事务已经提交,还没来得及 purge,这时加 S 锁等待;如果没有标记删除,则报 1062 duplicate key 错误;

如果有锁,说明该记录正在处理(新增、删除或更新),且事务还未提交,加 S 锁等待;

插入记录并对记录加 X 记录锁;

上面是我 debug 源码得到的一些结论,如果我的理解有误的话,可以留言讨论哦~

看完三件事❤️

  • 如果你觉得这篇内容对你还蛮有帮助,我想邀请你帮我三个小忙:

  • 点赞,转发,有你们的 『点赞和评论』,才是我创造的动力。

  • 关注公众号 『 java 烂猪皮 』,不定期分享原创知识。

  • 同时可以期待后续文章 ing🚀

  • 关注后回复【666】扫码即可获取学习资料包



原文作者:挖坑的张师傅

文章出处:https://club.perfma.com/article/2342623

用户头像

AI乔治

关注

分享后端技术干货。公众号【 Java烂猪皮】 2019.06.30 加入

一名默默无闻的扫地僧!

评论

发布
暂无评论
MySQL 死锁套路:一次诡异的批量插入死锁问题分析