写点什么

DolphinScheduler 调度变慢?试试这些数据库性能优化策略

作者:白鲸开源
  • 2025-06-24
    天津
  • 本文字数:2831 字

    阅读完需:约 9 分钟

DolphinScheduler 调度变慢?试试这些数据库性能优化策略

问题背景

DolphinScheduler 1.3.9 版本


查询任务实例列表等接口时,有时会出现响应慢的情况,超过 20 秒才返回结果。

问题诊断

(1)mysql 开启慢日志

/etc/mysql.cnf 添加如下配置参数


slow_query_log = ON slow_query_log_file = /data/log/mysql/mysql-slow.log long_query_time = 2
复制代码

(2)慢日志分析

日志中以下几类 sql 超时频率较高:


select * from t_ds_process_instance where 1=1 and state in ( 0 , 1 , 2 , 4 ) and process_definition_id=71 and (schedule_time >= '2022-07-20 00:00:00' and schedule_time <= '2022-07-20 23:59:59.999' or start_time >= '2022-07-20 00:00:00' and start_time <= '2022-07-20 23:59:59.999') order by start_time desc limit 1;
复制代码


select instance.*,process.name as process_instance_name from t_ds_task_instance instance left join t_ds_process_definition define on instance.process_definition_id = define.id left join t_ds_process_instance process on process.id=instance.process_instance_id where define.project_id = 6 order by instance.start_time desc LIMIT 0,10;
复制代码


主要涉及t_ds_process_instancet_ds_task_instance两张表。


通过explain analyze命令分析,可以发现第一类 sql 使用start_time_index这个索引查询的过程中耗时较多,start_time_index区分度较差,需要有更好区分度的索引来提升查询速度;


-> Limit: 1 row(s) (cost=0.20 rows=0) (actual time=103.391..103.391 rows=1 loops=1)     -> Filter: ((t_ds_process_instance.process_definition_id = 18) and (((t_ds_process_instance.schedule_time >= TIMESTAMP'2022-07-18 00:00:00') and (t_ds_process_instance.schedule_time <= TIMESTAMP'2022-07-18 23:59:59.999')) or ((t_ds_process_instance.start_time >= TIMESTAMP'2022-07-18 00:00:00') and (t_ds_process_instance.start_time <= TIMESTAMP'2022-07-18 23:59:59.999')))) (cost=0.20 rows=0) (actual time=103.391..103.391 rows=1 loops=1)         -> Index scan on t_ds_process_instance using start_time_index (reverse) (cost=0.20 rows=2) (actual time=0.030..103.165 rows=1239 loops=1)
复制代码


第二类 sql 主要耗时在 left join 阶段,需要更新索引字段


-> Limit: 10 row(s) (actual time=3601.141..3601.147 rows=10 loops=1)     -> Sort row IDs: `instance`.start_time DESC, limit input to 10 row(s) per chunk (actual time=3601.140..3601.145 rows=10 loops=1)         -> Table scan on <temporary> (cost=0.01..3774.21 rows=301738) (actual time=0.002..190.179 rows=722743 loops=1)             -> Temporary table (cost=469419.96..473194.16 rows=301738) (actual time=3332.896..3551.716 rows=722743 loops=1)                 -> Nested loop left join (cost=439246.15 rows=301738) (actual time=0.051..1431.254 rows=722743 loops=1)                     -> Nested loop inner join (cost=107334.40 rows=301738) (actual time=0.045..1239.699 rows=722743 loops=1)                         -> Filter: (define.id is not null) (cost=12.40 rows=121) (actual time=0.025..0.162 rows=121 loops=1)                             -> Index lookup on define using process_definition_index (project_id=6) (cost=12.40 rows=121) (actual time=0.025..0.110 rows=121 loops=1)                         -> Index lookup on instance using task_instance_index (process_definition_id=define.id) (cost=639.65 rows=2494) (actual time=0.006..9.972 rows=5973 loops=121)                     -> Single-row index lookup on process using PRIMARY (id=`instance`.process_instance_id) (cost=1.00 rows=1) (actual time=0.000..0.000 rows=1 loops=722743)
复制代码

(3)问题定位

  • mysql 配置未优化当前使用的 mysql 配置均为默认配置,对查询影响较大的配置项如:


innodb_buffer_pool_size(InnoDB 使用一个缓冲池来保存索引和原始数据)。


innodb_thread_concurrency(设置为 0,表示不限制并发数,这里推荐设置为 0,更好去发挥 CPU 多核处理能力,提高并发量)


innodb_write_io_threads & innodb_read_io_threads (innodb 使用后台线程处理数据页上的读写 I/O 请求)需要调整以上参数,提高 mysql 性能。


  • DolphinScheduler 索引未优化当前慢查询报出的 SQL 的索引需要优化。

  • DolphinScheduler 单表数据较大当前慢查询涉及的 SQL 表(t_ds_task_instance)较大,数据大小超过 4G,数据条数超过 200w,需要切分归档。

优化内容

(1)大表归档

  • 创建备份表


根据t_ds_process_instance DDL建表语句复制表t_ds_process_instance_bak


根据t_ds_task_instance DDL建表语句复制表 t_ds_task_instance_bak


注:外键行删除


CONSTRAINT foreign_key_instance_id FOREIGN KEY (process_instance_id) REFERENCES t_ds_process_instance (id) ON DELETE CASCADE ON UPDATE RESTRICT


  • 将归档日期以前的数据从两张表中挪至备份表,如 2022-01-01 00:00:00


INSERT into t_ds_process_instance_copy select * from t_ds_process_instance WHERE start_time < '2022-01-01 00:00:00';
复制代码


INSERT into t_ds_task_instance_copy select * from t_ds_task_instance WHERE start_time < '2022-01-01 00:00:00';
复制代码


  • 比对是否备份成功


select count(*) from t_ds_process_instance_copy; select count(*) from t_ds_process_instance where start_time < '2022-01-01 00:00:00'; select count(*) from t_ds_task_instance_copy; select count(*) from t_ds_task_instance where start_time < '2022-01-01 00:00:00';
复制代码


  • 删除历史数据


DELETE FROM t_ds_process_instance WHERE start_time < '2022-01-01 00:00:00'; DELETE FROM t_ds_task_instance WHERE start_time < '2022-01-01 00:00:00';
复制代码

(2)参数优化

/etc/my.cnf mysql配置文件中调整如下参数并重启 mysql


innodb_log_buffer_size= 64M innodb_buffer_pool_size= 20G //(根据服务器内存适当调整)如果数据库独占一台机器则设置物理内存的70% innodb_log_file_size= 1G innodb_thread_concurrency = 0 join_buffer_size = 64M sort_buffer_size = 64M innodb_read_io_threads = 16 //(根据服务器CPU适当调整) innodb_write_io_threads = 16 //(根据服务器CPU适当调整)
复制代码

(3)索引优化

create index state_index on t_ds_process_instance(state, process_definition_id); create index start_time_process_definition_index on t_ds_task_instance(start_time, process_definition_id); alter table t_ds_task_instance drop index task_instance_index;
复制代码


转载自 daozi126


原文链接:https://blog.csdn.net/u012938208/article/details/147899920

用户头像

白鲸开源

关注

一家开源原生的DataOps商业公司。 2022-03-18 加入

致力于打造下一代开源原生的DataOps 平台,助力企业在大数据和云时代,智能化地完成多数据源、多云及信创环境的数据集成、调度开发和治理,以提高企业解决数据问题的效率,提升企业分析洞察能力和决策能力。

评论

发布
暂无评论
DolphinScheduler 调度变慢?试试这些数据库性能优化策略_开源_白鲸开源_InfoQ写作社区