GaussDB(DWS) 运维:导致 SQL 执行不下推的改写方案
作者:华为云开发者联盟
- 2023-03-21 广东
本文字数:3538 字
阅读完需:约 12 分钟
本文分享自华为云社区《GaussDB(DWS)运维 -- values子句做MERGE数据源导致SQL执行不下推的改写方案》,作者: 譡里个檔。
现网做实时接入的时候,有的时候会使用 MERGE INTO 语句实现类似 UPSERT 的功能。这种场景下 MERGE INTO 语句的 USING 部分的数据位 VALUES 子句,为了后续的 SQL 语句中描述方便,需要对 VALUES 子句的输出命名别名。USING 子句的书写方式可能导致 MERGE INTO 语句的执行不下推,本文就针对因此导致的不下推的场景,对 USING 子句的 SQL 语句进行改写,一遍整个 SQL 语句可以下推
预置条件
CREATE TABLE t1(name text, id INT) DISTRIBUTE BY HASH(id);
复制代码
原始语句
MERGE INTO t1 USING (
SELECT *
FROM (VALUES ('json', 1), ('sam', 2)) AS val(name, id)
) tmp ON (t1.id = tmp.id)
WHEN MATCHED THEN
UPDATE SET t1.name = tmp.name
WHEN NOT MATCHED THEN
INSERT (name, id) VALUES(tmp.name, tmp.id);
复制代码
SQL 语句不下推,导致执行低效
postgres=# EXPLAIN VERBOSE MERGE INTO t1 USING (
postgres(# SELECT *
postgres(# FROM (VALUES ('json', 1), ('sam', 2)) AS val(name, id)
postgres(# ) tmp ON (t1.id = tmp.id)
postgres-# WHEN MATCHED THEN
postgres-# UPDATE SET t1.name = tmp.name
postgres-# WHEN NOT MATCHED THEN
postgres-# INSERT (name, id) VALUES(tmp.name, tmp.id);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | operation | E-rows | E-distinct | E-width | E-costs
----+-------------------------------------------------------+--------+------------+---------+---------
1 | -> Merge on public.t1 | 2 | | 54 | 0.08
2 | -> Nested Loop Left Join (3, 4) | 2 | | 54 | 0.08
3 | -> Values Scan on "*VALUES*" | 2 | | 36 | 0.03
4 | -> Data Node Scan on t1 "_REMOTE_TABLE_QUERY_" | 2 | | 18 | 0.00
SQL Diagnostic Information
------------------------------------------------------------
SQL is not plan-shipping
reason: Type of Record in non-real table can not be shipped
Predicate Information (identified by plan id)
-------------------------------------------------
1 --Merge on public.t1
Node expr: : $10
2 --Nested Loop Left Join (3, 4)
Join Filter: (t1.id = "*VALUES*".column2)
Targetlist Information (identified by plan id)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
1 --Merge on public.t1
Node/s: All datanodes
Remote query: UPDATE ONLY public.t1 SET name = $7, id = $8 WHERE t1.ctid = $5 AND t1.xc_node_id = $6
Node/s: All datanodes
Remote query: INSERT INTO public.t1 (name, id) VALUES ($9, $10)
2 --Nested Loop Left Join (3, 4)
Output: "*VALUES*".column1, "*VALUES*".column2, t1.name, t1.id, t1.ctid, t1.xc_node_id, "*VALUES*".column1, t1.id, "*VALUES*".column1, "*VALUES*".column2
3 --Values Scan on "*VALUES*"
Output: "*VALUES*".column1, "*VALUES*".column2
4 --Data Node Scan on t1 "_REMOTE_TABLE_QUERY_"
Output: t1.name, t1.id, t1.ctid, t1.xc_node_id
Node/s: All datanodes
Remote query: SELECT name, id, ctid, xc_node_id FROM ONLY public.t1 WHERE true
====== Query Summary =====
--------------------------
Parser runtime: 0.079 ms
Planner runtime: 1.392 ms
Unique SQL Id: 1657855173
(40 rows)
复制代码
改写方案
MERGE INTO t1 USING (
WITH val(name, id) AS(
VALUES ('json', 1), ('sam', 2)
)
SELECT * FROM val
) tmp ON (t1.id = tmp.id)
WHEN MATCHED THEN
UPDATE SET t1.name = tmp.name
WHEN NOT MATCHED THEN
INSERT (name, id) VALUES(tmp.name, tmp.id);
复制代码
改写后下推
postgres=# EXPLAIN VERBOSE MERGE INTO t1 USING (
postgres(# WITH val(name, id) AS(
postgres(# VALUES ('json', 1), ('sam', 2)
postgres(# )
postgres(# SELECT * FROM val
postgres(# ) tmp ON (t1.id = tmp.id)
postgres-# WHEN MATCHED THEN
postgres-# UPDATE SET t1.name = tmp.name
postgres-# WHEN NOT MATCHED THEN
postgres-# INSERT (name, id) VALUES(tmp.name, tmp.id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
id | operation | E-rows | E-distinct | E-memory | E-width | E-costs
----+----------------------------------------------+--------+------------+----------+---------+---------
1 | -> Streaming (type: GATHER) | 1 | | | 54 | 1.56
2 | -> Merge on public.t1 | 2 | | | 54 | 1.15
3 | -> Streaming(type: REDISTRIBUTE) | 2 | | 2MB | 54 | 1.15
4 | -> Nested Loop Left Join (5, 7) | 2 | | 1MB | 54 | 1.11
5 | -> Subquery Scan on tmp | 2 | | 1MB | 36 | 0.08
6 | -> Values Scan on "*VALUES*" | 24 | | 1MB | 36 | 0.03
7 | -> Seq Scan on public.t1 | 2 | | 1MB | 18 | 1.01
Predicate Information (identified by plan id)
---------------------------------------------
4 --Nested Loop Left Join (5, 7)
Join Filter: (t1.id = tmp.id)
5 --Subquery Scan on tmp
Filter: (Hash By tmp.id)
Targetlist Information (identified by plan id)
----------------------------------------------------------------------------------------------------------------------------------------------------
1 --Streaming (type: GATHER)
Node/s: All datanodes
3 --Streaming(type: REDISTRIBUTE)
Output: tmp.name, tmp.id, t1.name, t1.id, t1.ctid, t1.xc_node_id, tmp.name, tmp.id, (CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END)
Distribute Key: (CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END)
Spawn on: All datanodes
Consumer Nodes: All datanodes
4 --Nested Loop Left Join (5, 7)
Output: tmp.name, tmp.id, t1.name, t1.id, t1.ctid, t1.xc_node_id, tmp.name, tmp.id, CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END
5 --Subquery Scan on tmp
Output: tmp.name, tmp.id
6 --Values Scan on "*VALUES*"
Output: "*VALUES*".column1, "*VALUES*".column2
7 --Seq Scan on public.t1
Output: t1.name, t1.id, t1.ctid, t1.xc_node_id
Distribute Key: t1.id
====== Query Summary =====
-------------------------------
System available mem: 3112960KB
Query Max mem: 3112960KB
Query estimated mem: 6336KB
Parser runtime: 0.107 ms
Planner runtime: 1.185 ms
Unique SQL Id: 780461632
(44 rows)
复制代码
划线
评论
复制
发布于: 刚刚阅读数: 3
版权声明: 本文为 InfoQ 作者【华为云开发者联盟】的原创文章。
原文链接:【http://xie.infoq.cn/article/e822a1c536fe58c979f6b8eb5】。文章转载请联系作者。
华为云开发者联盟
关注
提供全面深入的云计算技术干货 2020-07-14 加入
生于云,长于云,让开发者成为决定性力量
评论