Mastering Deadlock Detection: Strategies and Practice

2023-10-16

This article is shared from the Huawei Cloud community post "Mastering Deadlock Detection: Strategies and Best Practices", by Lion Long.

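1. Background: Why Deadlocks Occur

A deadlock is a stalemate that arises when multiple threads or processes compete for resources while running. Once they are stuck in this state, none of them can make progress without outside intervention. Consider the following example: thread A wants the lock held by thread B, thread B wants the lock held by thread C, thread C wants the lock held by thread D, and thread D wants the lock held by thread A. Together they form a resource acquisition cycle.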

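A rough symptom to watch for: with blocking locks such as pthread mutexes, deadlocked threads go to sleep and the program simply stops making progress without using CPU. With busy-waiting locks (for example spin locks), two or more CPUs pinned at 100% usage very likely mean the program has entered a deadlock.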

A deadlock exists because a resource acquisition cycle exists, so detecting such a cycle is equivalent to detecting the deadlock.

1.1 Building a deadlock

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex3 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex4 = PTHREAD_MUTEX_INITIALIZER;

// Each thread takes one lock, sleeps so the other threads can take theirs,
// then requests the next lock in the ring: 1 -> 2 -> 3 -> 4 -> 1.
void *thread_funcA(void *arg)
{
	pthread_mutex_lock(&mutex1);
	sleep(1);
	pthread_mutex_lock(&mutex2);

	printf("funcA --> \n");

	pthread_mutex_unlock(&mutex2);
	pthread_mutex_unlock(&mutex1);
	return NULL;
}

void *thread_funcB(void *arg)
{
	pthread_mutex_lock(&mutex2);
	sleep(1);
	pthread_mutex_lock(&mutex3);

	printf("funcB --> \n");

	pthread_mutex_unlock(&mutex3);
	pthread_mutex_unlock(&mutex2);
	return NULL;
}

void *thread_funcC(void *arg)
{
	pthread_mutex_lock(&mutex3);
	sleep(1);
	pthread_mutex_lock(&mutex4);

	printf("funcC --> \n");

	pthread_mutex_unlock(&mutex4);
	pthread_mutex_unlock(&mutex3);
	return NULL;
}

void *thread_funcD(void *arg)
{
	pthread_mutex_lock(&mutex4);
	sleep(1);
	pthread_mutex_lock(&mutex1);

	printf("funcD --> \n");

	pthread_mutex_unlock(&mutex1);
	pthread_mutex_unlock(&mutex4);
	return NULL;
}

int main()
{
	pthread_t tid[4] = { 0 };

	pthread_create(&tid[0], NULL, thread_funcA, NULL);
	pthread_create(&tid[1], NULL, thread_funcB, NULL);
	pthread_create(&tid[2], NULL, thread_funcC, NULL);
	pthread_create(&tid[3], NULL, thread_funcD, NULL);

	// With the deadlock in place, these joins never return.
	pthread_join(tid[0], NULL);
	pthread_join(tid[1], NULL);
	pthread_join(tid[2], NULL);
	pthread_join(tid[3], NULL);

	return 0;
}


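2. Detecting Deadlocks with Hooks

Typical use cases for hooks:

(1) Implementing your own protocol stack by hooking the POSIX API.

2.1 The dlsym() function

Obtains the address of a symbol in a shared object or executable.

Function prototype: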

#include <dlfcn.h>

void *dlsym(void *handle, const char *symbol);

#define _GNU_SOURCE
#include <dlfcn.h>

void *dlvsym(void *handle, char *symbol, char *version);

// Link with -ldl.


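Description:

dlsym() takes the "handle" of a dynamically loaded shared object returned by dlopen(), together with a null-terminated symbol name, and returns the address at which that symbol is loaded into memory. If the symbol is not found in the specified object or in any of the shared objects that dlopen() loaded automatically along with it, dlsym() returns NULL. (The search performed by dlsym() is breadth-first through the dependency tree of these shared objects.)

Because the value of a symbol may itself be NULL (so a NULL return from dlsym() does not necessarily indicate an error), the correct way to test for an error is to call dlerror() to clear any old error condition, then call dlsym(), and then call dlerror() again and check whether the value it returns is non-NULL.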

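Two special pseudo-handles may be specified in handle: RTLD_DEFAULT, which finds the first occurrence of the symbol using the default shared object search order, and RTLD_NEXT, which finds the next occurrence of the symbol in the search order after the current object. RTLD_NEXT is what allows a wrapper to reach the original function it replaces.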

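dlvsym() does the same as dlsym(), but takes a version string as an additional argument.

Return value:

On success, these functions return the address associated with the symbol.

On failure, they return NULL; the cause of the error can be diagnosed with dlerror().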
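The error-checking sequence described above can be sketched as follows. This is a minimal illustration, assuming Linux with glibc; `lookup_symbol` is a hypothetical helper name introduced here, not part of libdl.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not a libdl function): look up `name` through
 * `handle` and decide success from dlerror(), not from the NULL check. */
static void *lookup_symbol(void *handle, const char *name, int *ok)
{
    dlerror();                          /* clear any stale error state */
    void *addr = dlsym(handle, name);
    const char *err = dlerror();        /* non-NULL only if dlsym() failed */

    if (err != NULL) {
        fprintf(stderr, "dlsym(%s) failed: %s\n", name, err);
        *ok = 0;
        return NULL;
    }
    *ok = 1;
    return addr;                        /* may legitimately be NULL */
}
```

A caller would pass a handle from dlopen() or one of the pseudo-handles, for example `lookup_symbol(RTLD_DEFAULT, "printf", &ok)`, and test `ok` before using the returned address.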

2.2 The pthread_self() function

Obtains the ID of the calling thread.

Function prototype:

#include <pthread.h>
pthread_t pthread_self(void);
// Compile and link with -pthread.


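Description:

pthread_self() returns the ID of the calling thread. This is the same value that is returned in *thread by the pthread_create() call that created the thread.

Return value:

This function always succeeds and returns the ID of the calling thread.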
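As a small illustration of the statement above, the sketch below checks that the ID stored by pthread_create() equals what the new thread sees from pthread_self(). The names `struct probe`, `check_self` and `ids_match` are hypothetical, and printing a pthread_t through an unsigned long cast is an assumption that holds on Linux/glibc but is not portable.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper type: carries the created ID into the new thread. */
struct probe {
    pthread_mutex_t lock;
    pthread_t created;   /* written by pthread_create() */
    int match;           /* set to 1 if the IDs are equal */
};

static void *check_self(void *arg)
{
    struct probe *p = arg;
    pthread_mutex_lock(&p->lock);    /* wait until pthread_create() returned */
    /* pthread_t is opaque: compare with pthread_equal(), not with == */
    p->match = pthread_equal(p->created, pthread_self()) != 0;
    /* Assumption: pthread_t converts to unsigned long (true on Linux) */
    printf("thread id: %lu\n", (unsigned long)pthread_self());
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Returns 1 if the IDs match, 0 if not, -1 if the thread could not start. */
static int ids_match(void)
{
    struct probe p;
    pthread_mutex_init(&p.lock, NULL);
    p.match = 0;

    pthread_mutex_lock(&p.lock);     /* hold the thread back until the ID is stored */
    int rc = pthread_create(&p.created, NULL, check_self, &p);
    pthread_mutex_unlock(&p.lock);
    if (rc != 0) {
        pthread_mutex_destroy(&p.lock);
        return -1;
    }
    pthread_join(p.created, NULL);
    pthread_mutex_destroy(&p.lock);
    return p.match;
}
```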

2.3 Implementation steps

(1) Declare the function pointers.

(2) Define function pointer types that match the signatures of the target functions.

typedef int (*pthread_mutex_lock_t)(pthread_mutex_t *mutex);
typedef int (*pthread_mutex_unlock_t)(pthread_mutex_t *mutex);

pthread_mutex_lock_t	pthread_mutex_lock_f;
pthread_mutex_unlock_t	pthread_mutex_unlock_f;

(3) Implement the replacement functions, using exactly the same names as the target functions.

int pthread_mutex_lock(pthread_mutex_t *mutex)
{
	pthread_t selfid = pthread_self();

	// pthread_t is printed through an unsigned long cast (valid on Linux)
	printf("pthread_mutex_lock: %lu, %p\n", (unsigned long)selfid, (void *)mutex);
	// ...
	return 0;
}

int pthread_mutex_unlock(pthread_mutex_t *mutex)
{
	pthread_t selfid = pthread_self();

	printf("pthread_mutex_unlock: %lu, %p\n", (unsigned long)selfid, (void *)mutex);
	// ...
	return 0;
}

(4) Call dlsym() to look up the original functions; this is what installs the hook.

int init_hook()
{
	pthread_mutex_lock_f = dlsym(RTLD_NEXT, "pthread_mutex_lock");
	pthread_mutex_unlock_f = dlsym(RTLD_NEXT, "pthread_mutex_unlock");
	// ...
	return 0;
}
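One possible way to fill in the `// ...` placeholders of step (3) is to forward each call to the real function and return its result. The sketch below is an illustration under assumptions, not the author's implementation: `lock_calls` is a hypothetical counter standing in for real bookkeeping, the lazy call to init_hook() is an added safety net that is not thread-safe on first use, and it targets Linux with glibc.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*pthread_mutex_lock_t)(pthread_mutex_t *mutex);
typedef int (*pthread_mutex_unlock_t)(pthread_mutex_t *mutex);

static pthread_mutex_lock_t   pthread_mutex_lock_f;
static pthread_mutex_unlock_t pthread_mutex_unlock_f;
static atomic_int lock_calls;   /* hypothetical counter, illustration only */

/* Resolve the real functions; RTLD_NEXT skips the definitions in this file. */
static int init_hook(void)
{
    pthread_mutex_lock_f =
        (pthread_mutex_lock_t)dlsym(RTLD_NEXT, "pthread_mutex_lock");
    pthread_mutex_unlock_f =
        (pthread_mutex_unlock_t)dlsym(RTLD_NEXT, "pthread_mutex_unlock");
    return (pthread_mutex_lock_f != NULL && pthread_mutex_unlock_f != NULL) ? 0 : -1;
}

int pthread_mutex_lock(pthread_mutex_t *mutex)
{
    /* Lazy fallback in case init_hook() has not run yet (not thread-safe). */
    if (pthread_mutex_lock_f == NULL && init_hook() != 0)
        return EINVAL;
    atomic_fetch_add(&lock_calls, 1);     /* bookkeeping would go here */
    return pthread_mutex_lock_f(mutex);   /* forward and keep the real result */
}

int pthread_mutex_unlock(pthread_mutex_t *mutex)
{
    if (pthread_mutex_unlock_f == NULL && init_hook() != 0)
        return EINVAL;
    return pthread_mutex_unlock_f(mutex);
}
```

Returning the real function's result matters because callers may check for errors such as EDEADLK or EINVAL; a wrapper that always returns 0 would hide them.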


2.4 Example code

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

typedef int (*pthread_mutex_lock_t)(pthread_mutex_t *mutex);
typedef int (*pthread_mutex_unlock_t)(pthread_mutex_t *mutex);

pthread_mutex_lock_t	pthread_mutex_lock_f;
pthread_mutex_unlock_t	pthread_mutex_unlock_f;

// Replacement for pthread_mutex_lock: forward to the real function, then log.
int pthread_mutex_lock(pthread_mutex_t *mutex)
{
	pthread_t selfid = pthread_self();

	int ret = pthread_mutex_lock_f(mutex);
	// pthread_t is printed through an unsigned long cast (valid on Linux)
	printf("pthread_mutex_lock: %lu, %p\n", (unsigned long)selfid, (void *)mutex);

	return ret;
}

// Replacement for pthread_mutex_unlock: forward to the real function, then log.
int pthread_mutex_unlock(pthread_mutex_t *mutex)
{
	pthread_t selfid = pthread_self();

	int ret = pthread_mutex_unlock_f(mutex);
	printf("pthread_mutex_unlock: %lu, %p\n", (unsigned long)selfid, (void *)mutex);

	return ret;
}

// Must run before any thread calls the hooked functions.
int init_hook()
{
	pthread_mutex_lock_f = dlsym(RTLD_NEXT, "pthread_mutex_lock");
	pthread_mutex_unlock_f = dlsym(RTLD_NEXT, "pthread_mutex_unlock");
	return 0;
}

#if 1 // debug

pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex3 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex4 = PTHREAD_MUTEX_INITIALIZER;

void *thread_funcA(void *arg)
{
	pthread_mutex_lock(&mutex1);
	sleep(1);
	pthread_mutex_lock(&mutex2);

	printf("funcA --> \n");

	pthread_mutex_unlock(&mutex2);
	pthread_mutex_unlock(&mutex1);
	return NULL;
}

void *thread_funcB(void *arg)
{
	pthread_mutex_lock(&mutex2);
	sleep(1);
	pthread_mutex_lock(&mutex3);

	printf("funcB --> \n");

	pthread_mutex_unlock(&mutex3);
	pthread_mutex_unlock(&mutex2);
	return NULL;
}

void *thread_funcC(void *arg)
{
	pthread_mutex_lock(&mutex3);
	sleep(1);
	pthread_mutex_lock(&mutex4);

	printf("funcC --> \n");

	pthread_mutex_unlock(&mutex4);
	pthread_mutex_unlock(&mutex3);
	return NULL;
}

void *thread_funcD(void *arg)
{
	pthread_mutex_lock(&mutex4);
	sleep(1);
	pthread_mutex_lock(&mutex1);

	printf("funcD --> \n");

	pthread_mutex_unlock(&mutex1);
	pthread_mutex_unlock(&mutex4);
	return NULL;
}

int main()
{
	init_hook();

	pthread_t tid[4] = { 0 };

	pthread_create(&tid[0], NULL, thread_funcA, NULL);
	pthread_create(&tid[1], NULL, thread_funcB, NULL);
	pthread_create(&tid[2], NULL, thread_funcC, NULL);
	pthread_create(&tid[3], NULL, thread_funcD, NULL);

	pthread_join(tid[0], NULL);
	pthread_join(tid[1], NULL);
	pthread_join(tid[2], NULL);
	pthread_join(tid[3], NULL);

	return 0;
}

#endif


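Drawback: this approach is workable when only a few locks are involved, but with a large number of locks the logged output becomes extremely hard to analyze.

3. Detecting Deadlocks with a Graph Algorithm

Deadlock detection can use a graph algorithm: check whether a directed graph contains a cycle.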

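3.1 Building the graph

(1) Adjacency matrix

(2) Adjacency list

[Figure: schematic of the data structures]

[Figure: how the nodes of the graph are connected]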
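To make the adjacency-list option and the cycle check concrete, here is a minimal sketch. It is not the author's code: the fixed capacity `MAX_VERTICES`, the type names, and the use of integer vertex indices (one per thread) are assumptions made for illustration, and the cycle check is a standard depth-first search with three visit states.

```c
#include <stdlib.h>

#define MAX_VERTICES 64   /* hypothetical fixed capacity for this sketch */

struct edge {
    int to;               /* index of the destination vertex */
    struct edge *next;    /* next edge leaving the same source vertex */
};

struct graph {
    struct edge *adj[MAX_VERTICES];  /* adj[v] = list of edges leaving v */
    int count;                       /* number of vertices in use */
};

/* Add a directed edge from -> to; returns 0 on success, -1 on failure. */
static int graph_add_edge(struct graph *g, int from, int to)
{
    if (from < 0 || to < 0 || from >= g->count || to >= g->count)
        return -1;
    struct edge *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->to = to;
    e->next = g->adj[from];          /* push onto the front of the list */
    g->adj[from] = e;
    return 0;
}

/* Visit states: 0 = unvisited, 1 = on the current DFS path, 2 = finished. */
static int dfs(const struct graph *g, int v, int *state)
{
    state[v] = 1;
    for (const struct edge *e = g->adj[v]; e != NULL; e = e->next) {
        if (state[e->to] == 1)
            return 1;                /* edge back into the path: cycle */
        if (state[e->to] == 0 && dfs(g, e->to, state))
            return 1;
    }
    state[v] = 2;
    return 0;
}

/* Returns 1 if the graph contains a directed cycle, 0 otherwise. */
static int graph_has_cycle(const struct graph *g)
{
    int state[MAX_VERTICES] = { 0 };
    for (int v = 0; v < g->count; v++)
        if (state[v] == 0 && dfs(g, v, state))
            return 1;
    return 0;
}
```

Mapping threads A to D onto vertices 0 to 3, the edges 0→1, 1→2 and 2→3 alone contain no cycle; adding 3→0 closes the ring from section 1.1 and `graph_has_cycle` reports it. The sketch never frees its edges, which a long-running detector would need to do.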

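3.2 Using the graph

Nodes are added first, then edges.

(1) Each thread gets one node. Note that the node is not added when the thread is created (some threads never use a lock), but when the thread first calls a lock function (taking a mutex as the example, pthread_mutex_lock()).

(2) When a thread tries to lock (again pthread_mutex_lock() as the example) and finds the lock already held, add an edge.

(3) Removing an edge: before calling the lock function (pthread_mutex_lock() as the example), if the lock is not currently held and the edge exists, remove the edge.

(4) Nodes are removed after unlocking.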

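Three primitive operations:

(1) the operation performed before locking, lock_before();

(2) the operation performed after locking, lock_after();

(3) the operation performed after unlocking, unlock_after().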
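The three primitives could look roughly like the sketch below. This is a deliberately simplified stand-in for the omitted code, not a reconstruction of it: instead of a general graph it keeps two small tables (which thread owns each lock, and which lock each thread is waiting for) and walks the resulting wait-for chain. The signatures, the `unsigned long` thread key, `MAX_ENTRIES` and the table layout are all assumptions, and the tables are not protected against concurrent access, which a real hook would need.

```c
#include <stddef.h>

#define MAX_ENTRIES 64   /* hypothetical table capacity */

struct owner_rec { const void *lock; unsigned long owner; int used; };
struct wait_rec  { unsigned long tid; const void *lock; int used; };

static struct owner_rec owners[MAX_ENTRIES];  /* lock -> thread holding it */
static struct wait_rec  waits[MAX_ENTRIES];   /* thread -> lock it waits for */

static struct owner_rec *find_owner(const void *lock)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (owners[i].used && owners[i].lock == lock)
            return &owners[i];
    return NULL;
}

static struct wait_rec *find_wait(unsigned long tid)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (waits[i].used && waits[i].tid == tid)
            return &waits[i];
    return NULL;
}

/* Before locking: if the lock is held, record the edge "tid waits for lock"
 * and follow the chain of owners. Returns 1 if the chain leads back to tid. */
static int lock_before(unsigned long tid, const void *lock)
{
    struct owner_rec *o = find_owner(lock);
    if (o == NULL)
        return 0;                         /* lock is free: no edge needed */
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!waits[i].used) {
            waits[i] = (struct wait_rec){ tid, lock, 1 };
            break;
        }
    }
    unsigned long cur = o->owner;
    for (int hops = 0; hops < MAX_ENTRIES; hops++) {
        struct wait_rec *w = find_wait(cur);
        if (w == NULL)
            return 0;                     /* owner is not waiting: no cycle */
        struct owner_rec *next = find_owner(w->lock);
        if (next == NULL)
            return 0;
        if (next->owner == tid)
            return 1;                     /* chain returns to us: deadlock */
        cur = next->owner;
    }
    return 0;
}

/* After locking: the wait is over, so drop the edge and record ownership. */
static void lock_after(unsigned long tid, const void *lock)
{
    struct wait_rec *w = find_wait(tid);
    if (w != NULL)
        w->used = 0;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!owners[i].used) {
            owners[i] = (struct owner_rec){ lock, tid, 1 };
            break;
        }
    }
}

/* After unlocking: the lock no longer has an owner. */
static void unlock_after(unsigned long tid, const void *lock)
{
    struct owner_rec *o = find_owner(lock);
    if (o != NULL && o->owner == tid)
        o->used = 0;
}
```

In a hook, lock_before() would run at the top of the replacement pthread_mutex_lock(), lock_after() once the real lock call returns, and unlock_after() once the real unlock call returns. A chain walk is sufficient here because a blocked thread waits for exactly one lock at a time, so every node in the wait-for graph has at most one outgoing edge.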

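3.3 Example code

The code is fairly long, so to keep the article readable it is not included here. If you need it, contact the author or follow the WeChat official account "Lion 莱恩呀".

Summary

A deadlock arises when multiple threads request locks in a crossed order, producing a stalemate over contended resources. To use a hook:

(1) define types that match the target functions;

(2) implement the replacement functions, using the same names as the target functions;

(3) call dlsym() to initialize the hook.

Deadlock detection can use a graph algorithm: a deadlock exists if the directed graph contains a cycle.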


Original article: http://xie.infoq.cn/article/0279487d31039896d99c74b02. Please contact the author before republishing.
