写点什么

2 个场景实例讲解 GaussDB(DWS) 基表统计信息估算不准的处理方案

  • 2023-06-02
    广东
  • 本文字数:4516 字

    阅读完需:约 15 分钟

2个场景实例讲解GaussDB(DWS)基表统计信息估算不准的处理方案

本文分享自华为云社区《GaussDB(DWS)运维 -- 基表统计信息估算不准的常见场景及处理方案》,作者:譡里个檔。

场景 1:基表过滤字段存在的隐式类型时,基表行数估算偏小


这种场景绝大部分场景 DWS 能够处理,但是如果隐式类型转后的结果与统计信息中的字段枚举值的表达式不一样,就会导致估算的严重偏差


原始 SQL 如下


SELECT * FROM dmgrpdi.dwl_inv_res_rpt_ci_grp_f WHERE period_id=202212 AND source_flag=1;
复制代码


对应的执行计划


                                                    QUERY PLAN-------------------------------------------------------------------------------------------------------------------  id |                            operation                             | E-rows | E-memory | E-width |  E-costs ----+------------------------------------------------------------------+--------+----------+---------+-----------   1 | ->  Row Adapter                                                  |  14160 |          |     717 | 680025.43   2 |    ->  Vector Streaming (type: GATHER)                           |  14160 |          |     717 | 680025.43   3 |       ->  Vector Partition Iterator                              |  14160 | 1MB      |     717 | 678241.33   4 |          ->  Partitioned CStore Scan on dwl_inv_res_rpt_ci_grp_f |  14160 | 1MB      |     717 | 678241.33
Predicate Information (identified by plan id) ------------------------------------------------------------------------------- 3 --Vector Partition Iterator Iterations: 1 4 --Partitioned CStore Scan on dwl_inv_res_rpt_ci_grp_f Filter: ((period_id = 202212::numeric) AND ((source_flag)::bigint = 1)) Pushdown Predicate Filter: (period_id = 202212::numeric) Partitions Selected by Static Prune: 36
复制代码


发现 source_flag 字段上存在隐式类型转换,查询字段 source_flag 的统计信息


postgres=# SELECT most_common_vals,most_common_freqs, histogram_bounds  FROM pg_stats WHERE tablename = 'dwl_inv_res_rpt_ci_grp_f' AND attname = 'source_flag'; most_common_vals |         most_common_freqs         | histogram_bounds------------------+-----------------------------------+------------------ {01,02,04,03}    | {.440034,.241349,.217413,.101089} | {05,06}(1 row)
复制代码


发现隐式类型转后的结果(1)与统计信息中的字段枚举值('01')的表达式不一样

处理方案:修改过滤条件,禁止类型转换,并且使用正确的常量值书写过滤条件


如上 SQL 语句中的 source_flag=1 修改为 source_flag='01',修改后 SQL 语句如下


SELECT * FROM dmgrpdi.dwl_inv_res_rpt_ci_grp_f WHERE period_id=202212 AND source_flag='01';
复制代码


查询新语句的执行计划


                                                      QUERY PLAN----------------------------------------------------------------------------------------------------------------------  id |                            operation                             |  E-rows   | E-memory | E-width |  E-costs ----+------------------------------------------------------------------+-----------+----------+---------+-----------   1 | ->  Row Adapter                                                  | 108359075 |          |     717 | 480542.98   2 |    ->  Vector Streaming (type: GATHER)                           | 108359075 |          |     717 | 480542.98   3 |       ->  Vector Partition Iterator                              | 108359075 | 1MB      |     717 | 478758.88   4 |          ->  Partitioned CStore Scan on dwl_inv_res_rpt_ci_grp_f | 108359075 | 1MB      |     717 | 478758.88
Predicate Information (identified by plan id) ------------------------------------------------------------------------------------------------- 3 --Vector Partition Iterator Iterations: 1 4 --Partitioned CStore Scan on dwl_inv_res_rpt_ci_grp_f Filter: ((period_id = 202212::numeric) AND (source_flag = '01'::text)) Pushdown Predicate Filter: ((period_id = 202212::numeric) AND (source_flag = '01'::text)) Partitions Selected by Static Prune: 36
复制代码

场景 2:基表在多列组合主键上过滤时,基表行数估算偏大


这种场景是因为 DWS 对基表上多个过滤条件之间采取弱相关性处理,当多个过滤条件是主键时,可能导致结果集估算偏大。


原始 SQL 如下


SELECT * FROM mca.mca_period_rate_t mca_rate2WHERE period_number = '202208' AND from_currency_code = 'RMB' AND to_currency_code = 'USD'
复制代码


执行信息如下


 id |                      operation                       |       A-time       | A-rows | E-rows | Peak Memory | E-memory | A-width | E-width | E-costs  ----+------------------------------------------------------+--------------------+--------+--------+-------------+----------+---------+---------+----------  1 | ->  Row Adapter                                      | 444.735            |      1 |   2033 | 227KB       |          |         |     321 | 22601.41   2 |    ->  Vector Streaming (type: GATHER)               | 444.720            |      1 |   2033 | 873KB       |          |         |     321 | 22601.41   3 |       ->  CStore Scan on mca_period_rate_t mca_rate2 | [435.167, 435.167] |      1 |   2033 | [5MB, 5MB]  | 1MB      |         |     321 | 22427.41 
Predicate Information (identified by plan id) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 3 --CStore Scan on mca_period_rate_t mca_rate2 Filter: (((period_number)::text = '202208'::text) AND ((from_currency_code)::text = 'RMB'::text) AND ((to_currency_code)::text = 'USD'::text)) Rows Removed by Filter: 425812 Pushdown Predicate Filter: (((period_number)::text = '202208'::text) AND ((from_currency_code)::text = 'RMB'::text) AND ((to_currency_code)::text = 'USD'::text))
复制代码


可以发现基表 mca.mca_period_rate_t 的行数估算严重偏大。


使用如下 SQL 语句查看表 mca.mca_period_rate_t 的定义


SELECT pg_get_tabledef('mca.mca_period_rate_t'::regclass);
复制代码


查询表 mca.mca_period_rate_t 定义


SELECT pg_get_tabledef('mca.mca_period_rate_t');SET search_path = mca;CREATE  TABLE mca_period_rate_t (	seq numeric NOT NULL,	period_number character varying(10) NOT NULL,	from_currency_code character varying(20) NOT NULL,	to_currency_code character varying(20) NOT NULL,	begin_rate numeric(35,18),	end_rate numeric(35,18),	avg_rate numeric(35,18),	creation_date timestamp(0) without time zone NOT NULL,	created_by numeric NOT NULL,	last_update_date timestamp(0) without time zone,	last_updated_by numeric,	rmb_begin_rate numeric(35,18),	usd_begin_rate numeric(35,18),	rmb_end_rate numeric(35,18),	usd_end_rate numeric(35,18),	rmb_avg_rate numeric(35,18),	usd_avg_rate numeric(35,18),	crt_cycle_id numeric,	crt_job_instance_id numeric,	last_upd_cycle_id numeric,	upd_job_instance_id numeric,	cdc_key_id character varying(128) DEFAULT sys_guid(),	end_rate2 numeric(35,18),	avg_rate2 numeric(35,18),	last_period_end_rate numeric(35,18))WITH (orientation=column, compression=low, colversion=2.0, enable_delta=false)DISTRIBUTE BY REPLICATIONTO GROUP group_version1;CREATE UNIQUE INDEX mca_period_rate_u1 ON mca.mca_period_rate_t USING cbtree (period_number, from_currency_code, to_currency_code) TABLESPACE pg_default;
复制代码


发现 (period_number, from_currency_code, to_currency_code) 为组合的唯一索引。

处理方案:对组合索引列收多列统计信息


注意此种方案只适用在基表比较小的情况下。因为多列统计信息需要使用百分比采样的方式计算统计信息,当表比较大时,统计信息计算耗时回很长。


针对如上查询语句执行如下语句收集(period_number, from_currency_code, to_currency_code) 多列统计信息


ANALYZE mca.mca_period_rate_t((period_number, from_currency_code, to_currency_code));
复制代码


收集多列统计信息之后,基表的行数估算恢复正产


 id |                                      operation                                      |       A-time       | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs ----+-------------------------------------------------------------------------------------+--------------------+--------+--------+-------------+---------+---------+---------  1 | ->  Row Adapter                                                                     | 195.504            |      1 |      1 | 227KB       |         |     321 | 675.14    2 |    ->  Vector Streaming (type: GATHER)                                              | 195.491            |      1 |      1 | 873KB       |         |     321 | 675.14    3 |       ->  CStore Index Scan using mca_period_rate_u1 on mca_period_rate_t mca_rate2 | [164.344, 164.344] |      1 |      1 | [5MB, 5MB]  |         |     321 | 501.14  
Predicate Information (identified by plan id) ---------------------------------------------------------------------------------------------------------------------------------------------------------- 3 --CStore Index Scan using mca_period_rate_u1 on mca_period_rate_t mca_rate2 Index Cond: (((period_number)::text = '202208'::text) AND ((from_currency_code)::text = 'RMB'::te
复制代码

点击关注,第一时间了解华为云新鲜技术~

发布于: 刚刚阅读数: 3
用户头像

提供全面深入的云计算技术干货 2020-07-14 加入

生于云,长于云,让开发者成为决定性力量

评论

发布
暂无评论
2个场景实例讲解GaussDB(DWS)基表统计信息估算不准的处理方案_数据库_华为云开发者联盟_InfoQ写作社区