GaussDB(DWS) 运维：导致 SQL 执行不下推的改写方案

2023-03-21
广东
本文字数：3538 字
阅读完需：约 12 分钟

本文分享自华为云社区《GaussDB(DWS)运维 -- values子句做MERGE数据源导致SQL执行不下推的改写方案》，作者：譡里个檔。

现网做实时接入的时候，有的时候会使用 MERGE INTO 语句实现类似 UPSERT 的功能。这种场景下 MERGE INTO 语句的 USING 部分的数据位 VALUES 子句，为了后续的 SQL 语句中描述方便，需要对 VALUES 子句的输出命名别名。USING 子句的书写方式可能导致 MERGE INTO 语句的执行不下推，本文就针对因此导致的不下推的场景，对 USING 子句的 SQL 语句进行改写，一遍整个 SQL 语句可以下推

预置条件

CREATE TABLE t1(name text, id INT) DISTRIBUTE BY HASH(id);

复制代码

原始语句

MERGE INTO t1 USING (    SELECT *    FROM (VALUES ('json', 1), ('sam', 2)) AS val(name, id)) tmp ON (t1.id = tmp.id)WHEN MATCHED THEN    UPDATE SET t1.name = tmp.nameWHEN NOT MATCHED THEN    INSERT (name, id) VALUES(tmp.name, tmp.id);

复制代码

SQL 语句不下推，导致执行低效

postgres=# EXPLAIN VERBOSE MERGE INTO t1 USING (postgres(#     SELECT *postgres(#     FROM (VALUES ('json', 1), ('sam', 2)) AS val(name, id)postgres(# ) tmp ON (t1.id = tmp.id)postgres-# WHEN MATCHED THENpostgres-#     UPDATE SET t1.name = tmp.namepostgres-# WHEN NOT MATCHED THENpostgres-#     INSERT (name, id) VALUES(tmp.name, tmp.id);                                                                            QUERY PLAN-------------------------------------------------------------------------------------------------------------------------------------------------------------------  id |                       operation                       | E-rows | E-distinct | E-width | E-costs ----+-------------------------------------------------------+--------+------------+---------+---------   1 | ->  Merge on public.t1                                |      2 |            |      54 | 0.08   2 |    ->  Nested Loop Left Join (3, 4)                   |      2 |            |      54 | 0.08   3 |       ->  Values Scan on "*VALUES*"                   |      2 |            |      36 | 0.03   4 |       ->  Data Node Scan on t1 "_REMOTE_TABLE_QUERY_" |      2 |            |      18 | 0.00
                  SQL Diagnostic Information ------------------------------------------------------------ SQL is not plan-shipping         reason: Type of Record in non-real table can not be shipped
   Predicate Information (identified by plan id) -------------------------------------------------   1 --Merge on public.t1         Node expr: : $10   2 --Nested Loop Left Join (3, 4)         Join Filter: (t1.id = "*VALUES*".column2)
                                                          Targetlist Information (identified by plan id) -----------------------------------------------------------------------------------------------------------------------------------------------------------------   1 --Merge on public.t1         Node/s: All datanodes         Remote query: UPDATE ONLY public.t1 SET name = $7, id = $8 WHERE t1.ctid = $5 AND t1.xc_node_id = $6         Node/s: All datanodes         Remote query: INSERT INTO public.t1 (name, id) VALUES ($9, $10)   2 --Nested Loop Left Join (3, 4)         Output: "*VALUES*".column1, "*VALUES*".column2, t1.name, t1.id, t1.ctid, t1.xc_node_id, "*VALUES*".column1, t1.id, "*VALUES*".column1, "*VALUES*".column2   3 --Values Scan on "*VALUES*"         Output: "*VALUES*".column1, "*VALUES*".column2   4 --Data Node Scan on t1 "_REMOTE_TABLE_QUERY_"         Output: t1.name, t1.id, t1.ctid, t1.xc_node_id         Node/s: All datanodes         Remote query: SELECT name, id, ctid, xc_node_id FROM ONLY public.t1 WHERE true
 ====== Query Summary ===== -------------------------- Parser runtime: 0.079 ms Planner runtime: 1.392 ms Unique SQL Id: 1657855173(40 rows)

复制代码

改写方案

MERGE INTO t1 USING (    WITH val(name, id) AS(        VALUES ('json', 1), ('sam', 2)    )    SELECT * FROM val) tmp ON (t1.id = tmp.id)WHEN MATCHED THEN    UPDATE SET t1.name = tmp.nameWHEN NOT MATCHED THEN    INSERT (name, id) VALUES(tmp.name, tmp.id);

复制代码

改写后下推

postgres=# EXPLAIN VERBOSE MERGE INTO t1 USING (postgres(#     WITH val(name, id) AS(postgres(#         VALUES ('json', 1), ('sam', 2)postgres(#     )postgres(#     SELECT * FROM valpostgres(# ) tmp ON (t1.id = tmp.id)postgres-# WHEN MATCHED THENpostgres-#     UPDATE SET t1.name = tmp.namepostgres-# WHEN NOT MATCHED THENpostgres-#     INSERT (name, id) VALUES(tmp.name, tmp.id);                                                                      QUERY PLAN------------------------------------------------------------------------------------------------------------------------------------------------------  id |                  operation                   | E-rows | E-distinct | E-memory | E-width | E-costs ----+----------------------------------------------+--------+------------+----------+---------+---------   1 | ->  Streaming (type: GATHER)                 |      1 |            |          |      54 | 1.56   2 |    ->  Merge on public.t1                    |      2 |            |          |      54 | 1.15   3 |       ->  Streaming(type: REDISTRIBUTE)      |      2 |            | 2MB      |      54 | 1.15   4 |          ->  Nested Loop Left Join (5, 7)    |      2 |            | 1MB      |      54 | 1.11   5 |             ->  Subquery Scan on tmp         |      2 |            | 1MB      |      36 | 0.08   6 |                ->  Values Scan on "*VALUES*" |     24 |            | 1MB      |      36 | 0.03   7 |             ->  Seq Scan on public.t1        |      2 |            | 1MB      |      18 | 1.01
 Predicate Information (identified by plan id) ---------------------------------------------   4 --Nested Loop Left Join (5, 7)         Join Filter: (t1.id = tmp.id)   5 --Subquery Scan on tmp         Filter: (Hash By tmp.id)
                                                    Targetlist Information (identified by plan id) ----------------------------------------------------------------------------------------------------------------------------------------------------   1 --Streaming (type: GATHER)         Node/s: All datanodes   3 --Streaming(type: REDISTRIBUTE)         Output: tmp.name, tmp.id, t1.name, t1.id, t1.ctid, t1.xc_node_id, tmp.name, tmp.id, (CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END)         Distribute Key: (CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END)         Spawn on: All datanodes         Consumer Nodes: All datanodes   4 --Nested Loop Left Join (5, 7)         Output: tmp.name, tmp.id, t1.name, t1.id, t1.ctid, t1.xc_node_id, tmp.name, tmp.id, CASE WHEN (t1.ctid IS NULL) THEN tmp.id ELSE t1.id END   5 --Subquery Scan on tmp         Output: tmp.name, tmp.id   6 --Values Scan on "*VALUES*"         Output: "*VALUES*".column1, "*VALUES*".column2   7 --Seq Scan on public.t1         Output: t1.name, t1.id, t1.ctid, t1.xc_node_id         Distribute Key: t1.id
   ====== Query Summary ===== ------------------------------- System available mem: 3112960KB Query Max mem: 3112960KB Query estimated mem: 6336KB Parser runtime: 0.107 ms Planner runtime: 1.185 ms Unique SQL Id: 780461632(44 rows)