数仓调优实践丨多次关联发散导致数据爆炸案例分析改写
- 2023-12-12 广东
本文字数:7255 字
阅读完需:约 24 分钟
本文分享自华为云社区《GaussDB(DWS)性能调优:求字段全体值中大于本行值的最小值——多次关联发散导致数据爆炸案例分析改写》,作者: Zawami 。
1、【问题描述】
语句中存在同一个表多次自关联,且均为发散关联,数据爆炸导致性能瓶颈。
2、【原始 SQL】
explain verbose
WITH TMP AS
(
SELECT WH_ID
, (IFNULL(SUBSTR(THE_DATE,1,10),'1900-01-01') || ' ' || STOP_TIME)::TIMESTAMP AS STOP_TIME
, (IFNULL(SUBSTR(THE_DATE,1,10),'1900-01-01') || ' ' || '23:59:59')::TIMESTAMP AS MAX_ASD
FROM DMISC.DM_DIM_CBG_WH_HOLIDAY_D
WHERE IS_OPEN = 'Y'
AND STOP_TIME IS NOT NULL
)
SELECT T1.WH_ID
, T1.THE_DATE
, T1.IS_OPEN
, MIN(T2.STOP_TIME) AS STOP_TIME
, MIN(T2.MAX_ASD) AS TODAY_MAX_ASD
, MIN(T3.MAX_ASD) AS NEXT_MAX_ASD
FROM (SELECT WH_ID
, THE_DATE
, IS_OPEN
, (IFNULL(SUBSTR(THE_DATE,1,10),'1900-01-01') || ' ' || STOP_TIME)::TIMESTAMP AS STOP_TIME
FROM DMISC.DM_DIM_CBG_WH_HOLIDAY_D
) T1
LEFT JOIN TMP T2
ON T1.WH_ID = T2.WH_ID
AND T1.THE_DATE < T2.STOP_TIME
LEFT JOIN TMP T3
ON T1.WH_ID = T3.WH_ID
AND ADDDATE(T1.THE_DATE,1) < T3.STOP_TIME
GROUP BY T1.WH_ID, T1.THE_DATE, T1.IS_OPEN;
从 SQL 中不难看出,物理表 HOLIDAY_D 使用 WH_ID 为关联键,并使用其它字段做不等值关联。
3、【性能分析】
QUERY PLAN |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
id | operation | E-rows | E-distinct | E-memory | E-width | E-costs |
----+----------------------------------------------------------------------------------+---------------+------------+---------------+---------+----------------- |
1 | -> Row Adapter | 51584 | | | 67 | 377559930171.36 |
2 | -> Vector Streaming (type: GATHER) | 51584 | | | 67 | 377559930171.36 |
3 | -> Vector Hash Aggregate | 51584 | | 16MB | 67 | 377559929546.36 |
4 | -> Vector CTE Append(5, 7) | 5699739636332 | | 1MB | 43 | 292063834485.54 |
5 | -> Vector Streaming(type: BROADCAST) | 757752 | | 2MB | 22 | 1474.87 |
6 | -> CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d [5, CTE tmp(1)] | 757752 | | 1MB | 22 | 1474.87 |
7 | -> Vector Hash Left Join (8, 11) | 5699739636332 | | 107MB(6863MB) | 43 | 292063833010.67 |
8 | -> Vector Hash Right Join (9, 10) | 542231841 | 50 | 16MB | 27 | 22365789.31 |
9 | -> Vector CTE Scan on tmp(1) t3 | 31573 | 50 | 1MB | 48 | 15155.04 |
10 | -> CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d | 51584 | 50 | 1MB | 19 | 556.58 |
11 | -> Vector CTE Scan on tmp(1) t2 | 31573 | 50 | 1MB | 48 | 15155.04 |
由于 SQL 非常慢,难以打出 performance 计划,我们先看 verbose 计划。从计划中我们看到,经过两次的关联发散,估计数据量达到了 5 万亿行;因为 hash join 根据 WH_ID 列进行关联,实际不会有这么多。所以调优的思路就是取消一些发散,让中间结果集行数变少。
4、【改写 SQL】
分析 SQL,可知发散是为了寻找所有 STOP_TIME 中大于本行 THE_DATE 的最小值。像这种每行都需要用到本行数据和所有数据的逻辑,或许可以使用窗口函数进行编写;但囿于笔者能力,先提供单次自关联的方法。
SQL 改写如下:
explain performance
WITH TMP AS
(
SELECT WH_ID
, (IFNULL(SUBSTR(THE_DATE,1,10),'1900-01-01') || ' ' || STOP_TIME)::TIMESTAMP AS STOP_TIME
, (IFNULL(SUBSTR(THE_DATE,1,10),'1900-01-01') || ' ' || '23:59:59')::TIMESTAMP AS MAX_ASD
FROM DMISC.DM_DIM_CBG_WH_HOLIDAY_D
WHERE IS_OPEN = 'Y'
AND STOP_TIME IS NOT NULL
)
SELECT T1.WH_ID
, T1.THE_DATE
, T1.IS_OPEN
, MIN(CASE WHEN T1.THE_DATE < T2.STOP_TIME THEN STOP_TIME ELSE NULL END) AS STOP_TIME
, MIN(CASE WHEN T1.THE_DATE < T2.STOP_TIME THEN T2.MAX_ASD ELSE NULL END) AS TODAY_MAX_ASD
, MIN(CASE WHEN ADDDATE(T1.THE_DATE, 1) < T2.STOP_TIME THEN T2.MAX_ASD ELSE NULL END) AS NEXT_MAX_ASD
FROM (SELECT DISTINCT WH_ID
, THE_DATE
, IS_OPEN
FROM DMISC.DM_DIM_CBG_WH_HOLIDAY_D
) T1
LEFT JOIN TMP T2
ON T1.WH_ID = T2.WH_ID
GROUP BY
T1.WH_ID
, T1.THE_DATE
, T1.IS_OPEN
;
经过改写,取消了一次自关联,SQL 的中间结果集变小。在关联后,通过条件聚合来得到需要的值。
id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | E-memory | A-width | E-width | E-costs
----+-----------------------------------------------------------------+----------------------+----------+--------+------------+----------------+----------+-----------+---------+----------
1 | -> Row Adapter | 7490.354 | 34035 | 200 | | 70KB | | | 58 | 15149.80
2 | -> Vector Streaming (type: GATHER) | 7488.129 | 34035 | 200 | | 216KB | | | 58 | 15149.80
3 | -> Vector Hash Aggregate | [7481.430, 7481.430] | 34035 | 200 | | [9MB, 9MB] | 16MB | [112,112] | 58 | 15137.30
4 | -> Vector Hash Left Join (5, 7) | [909.377, 909.377] | 31204164 | 109803 | | [2MB, 2MB] | 16MB | | 34 | 3880.50
5 | -> Vector Sonic Hash Aggregate | [5.876, 5.876] | 34035 | 34036 | 6807 | [3MB, 3MB] | 16MB | [51,51] | 18 | 1127.67
6 | -> CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d | [0.199, 0.199] | 34036 | 34036 | | [792KB, 792KB] | 1MB | | 18 | 532.04
7 | -> CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d | [40.794, 40.794] | 25122 | 21960 | 19 | [1MB, 1MB] | 1MB | [59,59] | 24 | 617.13
从执行计划中可以看到,中间结果集大小已经在可接受的范围内。但是又看到聚合 3 千万数据使用了 6s+的时间,这是过慢的,需要看执行计划中的 DN 信息寻找原因 。
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 --Row Adapter
(actual time=7486.498..7490.354 rows=34035 loops=1)
(CPU: ex c/r=107, ex row=34035, ex cyc=3668104, inc cyc=22468059912)
2 --Vector Streaming (type: GATHER)
(actual time=7486.466..7488.129 rows=34035 loops=1)
(Buffers: shared hit=1)
(CPU: ex c/r=660037, ex row=34035, ex cyc=22464391808, inc cyc=22464391808)
3 --Vector Hash Aggregate
dn_6083_6084 (actual time=7479.644..7481.430 rows=34035 loops=1) (projection time=4488.807)
dn_6083_6084 (Buffers: shared hit=40)
dn_6083_6084 (CPU: ex c/r=631, ex row=31204164, ex cyc=19718763112, inc cyc=22443886288)
4 --Vector Hash Left Join (5, 7)
dn_6083_6084 (actual time=48.009..909.377 rows=31204164 loops=1)
dn_6083_6084 (Buffers: shared hit=36)
dn_6083_6084 (CPU: ex c/r=43699, ex row=59157, ex cyc=2585141400, inc cyc=2725123176)
5 --Vector Sonic Hash Aggregate
dn_6083_6084 (actual time=5.177..5.876 rows=34035 loops=1)
dn_6083_6084 (Buffers: shared hit=11)
dn_6083_6084 (CPU: ex c/r=500, ex row=34036, ex cyc=17027544, inc cyc=17619064)
6 --CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d
dn_6083_6084 (actual time=0.043..0.199 rows=34036 loops=1) (CU ScanInfo: smallCu: 0, totalCu: 1, avrCuRow: 34036, totalDeadRows: 0)
dn_6083_6084 (Buffers: shared hit=11)
dn_6083_6084 (CPU: ex c/r=17, ex row=34036, ex cyc=591520, inc cyc=591520)
7 --CStore Scan on dmisc.dm_dim_cbg_wh_holiday_d
dn_6083_6084 (actual time=6.464..40.794 rows=25122 loops=1) (filter time=0.872 projection time=33.671) (RoughCheck CU: CUNone: 0, CUTagNone: 0, CUSome: 1) (CU ScanInfo: smallCu: 0, totalCu: 1, avrCuRow: 34036, totalDeadRows: 0)
dn_6083_6084 (Buffers: shared hit=25)
dn_6083_6084 (CPU: ex c/r=3595, ex row=34036, ex cyc=122362712, inc cyc=122362712)
从中可以看出,所有算子都只在一个 DN 上运行了。这可以视为严重的计算倾斜,若对单点性能有更高要求需要继续优化。查看 DMISC.DM_DIM_CBG_WH_HOLIDAY_D 表的定义,发现它是一个复制表(distribute by replication),在进行各层运算的时候只用其中一个 DN 来算。而在本 SQL 中,使用到这张表的时候,关联键都是 WH_ID。
再查看调整分布列为 WH_ID 的倾斜情况:
select * from pg_catalog.table_skewness('DMISC.DM_DIM_CBG_WH_HOLIDAY_D', 'wh_id');
结果有 23 行,小于集群 DN 个数,且存在倾斜。但是本 SQL 需要使用该表的全量数据,故可以把这张表改为使用 WH_ID 作为分步键进行重分布。
由表分布方式为复制表导致的计算倾斜无法使用 skew hint 解决,可以改变物理表分布方式或者创建临时表来解决(复制表通常较小)。由于表在 SQL 中的使用情况和表的倾斜情况,不适合更改物理表分步键为 WH_ID,故本例中试使用创建临时表指定重分布方式的办法解决。
DROP TABLE IF EXISTS holiday_d_tmp;
CREATE TEMP TABLE holiday_d_tmp WITH ( orientation = COLUMN, compression = low ) distribute BY hash ( wh_id ) AS ( SELECT * FROM DMISC.DM_DIM_CBG_WH_HOLIDAY_D );
EXPLAIN performance WITH TMP AS (
SELECT
WH_ID,
( IFNULL ( SUBSTR( THE_DATE, 1, 10 ), '1900-01-01' ) || ' ' || STOP_TIME ) :: TIMESTAMP AS STOP_TIME,
( IFNULL ( SUBSTR( THE_DATE, 1, 10 ), '1900-01-01' ) || ' ' || '23:59:59' ) :: TIMESTAMP AS MAX_ASD
FROM
holiday_d_tmp
WHERE
IS_OPEN = 'Y'
AND STOP_TIME IS NOT NULL
) SELECT
T1.WH_ID,
T1.THE_DATE,
T1.IS_OPEN,
MIN ( CASE WHEN T1.THE_DATE < T2.STOP_TIME THEN STOP_TIME ELSE NULL END ) AS STOP_TIME,
MIN ( CASE WHEN T1.THE_DATE < T2.STOP_TIME THEN T2.MAX_ASD ELSE NULL END ) AS TODAY_MAX_ASD,
MIN ( CASE WHEN ADDDATE ( T1.THE_DATE, 1 ) < T2.STOP_TIME THEN T2.MAX_ASD ELSE NULL END ) AS NEXT_MAX_ASD
FROM
( SELECT WH_ID, THE_DATE, IS_OPEN FROM holiday_d_tmp ) T1
LEFT JOIN TMP T2 ON T1.WH_ID = T2.WH_ID
GROUP BY
T1.WH_ID,
T1.THE_DATE,
T1.IS_OPEN;
下面是对应的执行计划:
id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | E-memory | A-width | E-width | E-costs
----+--------------------------------------------------------------------------------------+------------------+----------+----------+------------+----------------+----------+---------+---------+----------
1 | -> Row Adapter | 673.495 | 34035 | 34032 | | 70KB | | | 58 | 68112.60
2 | -> Vector Streaming (type: GATHER) | 671.103 | 34035 | 34032 | | 216KB | | | 58 | 68112.60
3 | -> Vector Hash Aggregate | [0.079, 672.724] | 34035 | 34032 | | [1MB, 1MB] | 16MB | [0,114] | 58 | 67794.10
4 | -> Vector Hash Left Join (5, 6) | [0.047, 76.395] | 31205167 | 27587201 | | [324KB, 485KB] | 16MB | | 34 | 8876.88
5 | -> CStore Scan on pg_temp_cn_5003_6_22022_139764371019520.holiday_d_tmp | [0.004, 0.098] | 34036 | 34036 | 1 | [760KB, 792KB] | 1MB | | 18 | 1553.65
6 | -> CStore Scan on pg_temp_cn_5003_6_22022_139764371019520.holiday_d_tmp | [0.008, 3.253] | 25122 | 22018 | 1 | [880KB, 1MB] | 1MB | [0,61] | 24 | 1557.76
从计划中我们可以看到,耗时比单个 DN 运算快了不少,当然这里没有算上创建临时表的时间约 0.2s。
5、【调优总结】
在本案例中,因为实际执行 SQL 时间太长先看了 verbose 计划而非 performance 计划,发现中间结果集发散问题后,进行等价逻辑改写,把两个(等值-不等值)关联改为一个等值关联和条件聚合。之后,我们发现 SQL 因复制表存在计算倾斜问题,考虑 SQL 消费表数据的方式和表的统计数据,采用了使用临时表重新指定分布方式的方法,解决了计算倾斜问题,SQL 从单点 25min+优化到单点 800ms。
版权声明: 本文为 InfoQ 作者【华为云开发者联盟】的原创文章。
原文链接:【http://xie.infoq.cn/article/3ae4c1873c645f41da0260b53】。文章转载请联系作者。
华为云开发者联盟
提供全面深入的云计算技术干货 2020-07-14 加入
生于云,长于云,让开发者成为决定性力量
评论