写点什么

GaussDB(DWS) 运维:导致 SQL 执行不下推的改写方案

  • 2023-03-21
    广东
  • 本文字数:3538 字

    阅读完需:约 12 分钟

GaussDB(DWS)运维:导致SQL执行不下推的改写方案

本文分享自华为云社区《GaussDB(DWS)运维 -- values子句做MERGE数据源导致SQL执行不下推的改写方案》,作者: 譡里个檔。


现网做实时接入的时候,有的时候会使用 MERGE INTO 语句实现类似 UPSERT 的功能。这种场景下 MERGE INTO 语句的 USING 部分的数据位 VALUES 子句,为了后续的 SQL 语句中描述方便,需要对 VALUES 子句的输出命名别名。USING 子句的书写方式可能导致 MERGE INTO 语句的执行不下推,本文就针对因此导致的不下推的场景,对 USING 子句的 SQL 语句进行改写,一遍整个 SQL 语句可以下推


预置条件


CREATE TABLE t1(name text, id INT) DISTRIBUTE BY HASH(id);
复制代码


原始语句


MERGE INTO t1 USING (    SELECT *    FROM (VALUES ('json', 1), ('sam', 2)) AS val(name, id)) tmp ON (t1.id = tmp.id)WHEN MATCHED THEN    UPDATE SET t1.name = tmp.nameWHEN NOT MATCHED THEN    INSERT (name, id) VALUES(tmp.name, tmp.id);
复制代码


SQL 语句不下推,导致执行低效


postgres=# EXPLAIN VERBOSE MERGE INTO t1 USING (postgres(#     SELECT *postgres(#     FROM (VALUES ('json', 1), ('sam', 2)) AS val(name, id)postgres(# ) tmp ON (t1.id = tmp.id)postgres-# WHEN MATCHED THENpostgres-#     UPDATE SET t1.name = tmp.namepostgres-# WHEN NOT MATCHED THENpostgres-#     INSERT (name, id) VALUES(tmp.name, tmp.id);                                                                            QUERY PLAN-------------------------------------------------------------------------------------------------------------------------------------------------------------------  id |                       operation                       | E-rows | E-distinct | E-width | E-costs ----+-------------------------------------------------------+--------+------------+---------+---------   1 | ->  Merge on public.t1                                |      2 |            |      54 | 0.08   2 |    ->  Nested Loop Left Join (3, 4)                   |      2 |            |      54 | 0.08   3 |       ->  Values Scan on "*VALUES*"                   |      2 |            |      36 | 0.03   4 |       ->  Data Node Scan on t1 "_REMOTE_TABLE_QUERY_" |      2 |            |      18 | 0.00
SQL Diagnostic Information ------------------------------------------------------------ SQL is not plan-shipping reason: Type of Record in non-real table can not be shipped
Predicate Information (identified by plan id) ------------------------------------------------- 1 --Merge on public.t1 Node expr: : $10 2 --Nested Loop Left Join (3, 4) Join Filter: (t1.id = "*VALUES*".column2)
Targetlist Information (identified by plan id) ----------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 --Merge on public.t1 Node/s: All datanodes Remote query: UPDATE ONLY public.t1 SET name = $7, id = $8 WHERE t1.ctid = $5 AND t1.xc_node_id = $6 Node/s: All datanodes Remote query: INSERT INTO public.t1 (name, id) VALUES ($9, $10) 2 --Nested Loop Left Join (3, 4) Output: "*VALUES*".column1, "*VALUES*".column2, t1.name, t1.id, t1.ctid, t1.xc_node_id, "*VALUES*".column1, t1.id, "*VALUES*".column1, "*VALUES*".column2 3 --Values Scan on "*VALUES*" Output: "*VALUES*".column1, "*VALUES*".column2 4 --Data Node Scan on t1 "_REMOTE_TABLE_QUERY_" Output: t1.name, t1.id, t1.ctid, t1.xc_node_id Node/s: All datanodes Remote query: SELECT name, id, ctid, xc_node_id FROM ONLY public.t1 WHERE true
====== Query Summary ===== -------------------------- Parser runtime: 0.079 ms Planner runtime: 1.392 ms Unique SQL Id: 1657855173(40 rows)
复制代码


改写方案


MERGE INTO t1 USING (    WITH val(name, id) AS(        VALUES ('json', 1), ('sam', 2)    )    SELECT * FROM val) tmp ON (t1.id = tmp.id)WHEN MATCHED THEN    UPDATE SET t1.name = tmp.nameWHEN NOT MATCHED THEN    INSERT (name, id) VALUES(tmp.name, tmp.id);
复制代码


改写后下推


postgres=# EXPLAIN VERBOSE MERGE INTO t1 USING (postgres(#     WITH val(name, id) AS(postgres(#         VALUES ('json', 1), ('sam', 2)postgres(#     )postgres(#     SELECT * FROM valpostgres(# ) tmp ON (t1.id = tmp.id)postgres-# WHEN MATCHED THENpostgres-#     UPDATE SET t1.name = tmp.namepostgres-# WHEN NOT MATCHED THENpostgres-#     INSERT (name, id) VALUES(tmp.name, tmp.id);                                                                      QUERY PLAN------------------------------------------------------------------------------------------------------------------------------------------------------  id |                  operation                   | E-rows | E-distinct | E-memory | E-width | E-costs ----+----------------------------------------------+--------+------------+----------+---------+---------   1 | ->  Streaming (type: GATHER)                 |      1 |            |          |      54 | 1.56   2 |    ->  Merge on public.t1                    |      2 |            |          |      54 | 1.15   3 |       ->  Streaming(type: REDISTRIBUTE)      |      2 |            | 2MB      |      54 | 1.15   4 |          ->  Nested Loop Left Join (5, 7)    |      2 |            | 1MB      |      54 | 1.11   5 |             ->  Subquery Scan on tmp         |      2 |            | 1MB      |      36 | 0.08   6 |                ->  Values Scan on "*VALUES*" |     24 |            | 1MB      |      36 | 0.03   7 |             ->  Seq Scan on public.t1        |      2 |            | 1MB      |      18 | 1.01
Predicate Information (identified by plan id) --------------------------------------------- 4 --Nested Loop Left Join (5, 7) Join Filter: (t1.id = tmp.id) 5 --Subquery Scan on tmp Filter: (Hash By tmp.id)
Targetlist Information (identified by plan id) ---------------------------------------------------------------------------------------------------------------------------------------------------- 1 --Streaming (type: GATHER) Node/s: All datanodes 3 --Streaming(type: REDISTRIBUTE) Output: tmp.name, tmp.id, t1.name, t1.id, t1.ctid, t1.xc_node_id, tmp.name, tmp.id, (CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END) Distribute Key: (CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END) Spawn on: All datanodes Consumer Nodes: All datanodes 4 --Nested Loop Left Join (5, 7) Output: tmp.name, tmp.id, t1.name, t1.id, t1.ctid, t1.xc_node_id, tmp.name, tmp.id, CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END 5 --Subquery Scan on tmp Output: tmp.name, tmp.id 6 --Values Scan on "*VALUES*" Output: "*VALUES*".column1, "*VALUES*".column2 7 --Seq Scan on public.t1 Output: t1.name, t1.id, t1.ctid, t1.xc_node_id Distribute Key: t1.id
====== Query Summary ===== ------------------------------- System available mem: 3112960KB Query Max mem: 3112960KB Query estimated mem: 6336KB Parser runtime: 0.107 ms Planner runtime: 1.185 ms Unique SQL Id: 780461632(44 rows)
复制代码

点击关注,第一时间了解华为云新鲜技术~

发布于: 刚刚阅读数: 3
用户头像

提供全面深入的云计算技术干货 2020-07-14 加入

生于云,长于云,让开发者成为决定性力量

评论

发布
暂无评论
GaussDB(DWS)运维:导致SQL执行不下推的改写方案_数据库_华为云开发者联盟_InfoQ写作社区