写点什么

【数仓运维实践】关于 GaussDB(DWS) 单 SQL 磁盘空间管控

  • 2023-03-23
    广东
  • 本文字数:5050 字

    阅读完需:约 17 分钟

【数仓运维实践】关于GaussDB(DWS)单SQL磁盘空间管控

本文分享自华为云社区《GaussDB(DWS)运维 -- 单SQL磁盘空间管控》,作者: 譡里个檔。


【问题描述】


执行部分 SQL 语句时出现如下报错信息(具体数值可能因为配置有差异),本文针对根因和场景触发场景,确定触发此类问题的根因


: The space used on DN (209715224 kB) has exceeded the sql use space limit (209715200 kB)


【问题根因】


该报错表示用户执行的 sql 在单 DN 上所用空间超过了参数 sql_use_spacelimit 的限制。sql_use_spacelimit 限制单个 SQL 在单个 DN 上,触发落盘操作时,落盘文件的空间大小,管控的空间包括普通表、临时表以及中间结果集落盘占用的空间


可以使用如下 SQL 查看所有实例上的配置参数 sql_use_spacelimit 的值


SELECT * FROM pgxc_settings WHERE name = 'sql_use_spacelimit';
复制代码


【解决方案】


当前现网最常见的此类错误一般都是 INERT 语句触发的,我们以常见如下语句为例,说明这类问题的解决方案


INSERT INTO dwljaa.bif_col_edw_dut_257_t(attribute1, attribute2, attribute3, attribute4, attribute5, column_name1, column_name2, column_name3, column_name4, column_name5, tag_code, tag_id, table_name, period, tbl_code, tag_grp_code, target_key_val, cycle_id, creation_date, target_key_num, priority)SELECT     'SCN_SVC_3003', NULL, NULL, NULL, NULL, 'BIZ_SCR_CODE', NULL, NULL, NULL, NULL,    'SCN_SVC_3003-02', 3026937, 'dwr_fin_hwip_man_je_f_tmp0', '202208', 'PL_E17360237',    'SUB_PL_PUB_SCN', A.record_seq_num, 20230321000000, SYSDATE, A.record_seq_num, 53333FROM (SELECT  /*+PARALLEL(8) NO_EXPAND*/ T.record_seq_num    FROM dwljaa.dwr_fin_hwip_man_je_f_tmp0 T    INNER JOIN dwrdim.dwr_dim_department_d PL_E100134 ON T.COA_DEPT_KEY = PL_E100134.DEPT_KEY    INNER JOIN dwrdim.dwr_dim_grp_acct_code_d PL_E100119 ON T.GROUP_ACCOUNT_CODE = PL_E100119.GROUP_ACCOUNT_CODE    INNER JOIN dwrdim.dwr_dim_journal_category_d PL_E100147 ON T.JE_CATEGORY_ID = PL_E100147.JE_CATEGORY_ID    INNER JOIN dwrdim.dwr_dim_product_d PL_E100121 ON T.MAJOR_PROD_KEY = PL_E100121.PROD_KEY    INNER JOIN dwrdim.dwr_dim_product_d PL_E100122 ON T.MINOR_PROD_KEY = PL_E100122.PROD_KEY    INNER JOIN dwrdim.dwr_dim_project_d PL_E100155 ON T.PROJ_KEY = PL_E100155.PROJ_KEY    WHERE 1 = 1 AND ((((((((((PL_E100119.LVL1_ACCOUNT_CODE IN('501', '503', '506', '512')) OR(PL_E100119.GROUP_ACCOUNT_CODE = '5980406')) AND((PL_E100121.PROD_CLASS_FLAG = '0') OR(PL_E100122.PROD_CLASS_FLAG = '0'))) OR((PL_E100119.LVL1_ACCOUNT_CODE IN('504', 'SVC')) OR(PL_E100119.LVL2_ACCOUNT_CODE IN('55107', '57702', '57703', '58303')) OR(PL_E100119.GROUP_ACCOUNT_CODE IN('5980407', '5825107', '5827702', '5827703', 'C501'))) OR((PL_E100119.ACCOUNT_EXPENSE_CATG_CODE IN('E01', 'E03', 'E05', 'E06', 'E08', 'E09', 'E10', 'E11', 'E02', 'E04', 'E07', 'E12')) AND(PL_E100134.DEPT_EXPENSE_TYPE_CODE = 'Z4') AND((PL_E100155.PROJ_TYPE_CODE <> '02') OR(PL_E100155.PROJ_NUM IN('9000000')))) OR((PL_E100119.ACCOUNT_EXPENSE_CATG_CODE IN('E02', 'E04', 'E07', 'E12', 'E05', 'E09', 'E10')) AND(PL_E100134.DEPT_EXPENSE_TYPE_CODE IN('Z1', 'Z2', 'Z3')) AND(PL_E100155.PROJ_TYPE_CODE = '01'))) AND((PL_E100119.GROUP_ACCOUNT_CODE <> 'EXP5555')))) AND(PL_E100119.LVL2_ACCOUNT_CODE <> '58303') AND(PL_E100155.PROJ_TYPE_CODE = '01'))) AND((PL_E100119.LVL1_ACCOUNT_CODE = '504') AND(PL_E100147.CN_NAME IN('JV-PFC cooper cost', 'JV-ADJ PA Cooper/Constr', 'JV-Agent REV&COST Adj', 'JV-Agent totalvalue Adj')))) OR((PL_E100119.ACCOUNT_EXPENSE_CATG_CODE = 'E05') AND(PL_E100134.DEPT_EXPENSE_TYPE_CODE IN('Z1', 'Z2', 'Z3')) AND(PL_E100155.PROJ_TYPE_CODE = '01') AND(PL_E100147.CN_NAME <> 'JV-ADJ 557 WITH B CODE') AND((((((((PL_E100119.LVL1_ACCOUNT_CODE IN('501', '503', '506', '512')) OR(PL_E100119.GROUP_ACCOUNT_CODE = '5980406')) AND((PL_E100121.PROD_CLASS_FLAG = '0') OR(PL_E100122.PROD_CLASS_FLAG = '0'))) OR((PL_E100119.LVL1_ACCOUNT_CODE IN('504', 'SVC')) OR(PL_E100119.LVL2_ACCOUNT_CODE IN('55107', '57702', '57703', '58303')) OR(PL_E100119.GROUP_ACCOUNT_CODE IN('5980407', '5825107', '5827702', '5827703', 'C501'))) OR((PL_E100119.ACCOUNT_EXPENSE_CATG_CODE IN('E01', 'E03', 'E05', 'E06', 'E08', 'E09', 'E10', 'E11', 'E02', 'E04', 'E07', 'E12')) AND(PL_E100134.DEPT_EXPENSE_TYPE_CODE = 'Z4') AND((PL_E100155.PROJ_TYPE_CODE <> '02') OR(PL_E100155.PROJ_NUM IN('9000000')))) OR((PL_E100119.ACCOUNT_EXPENSE_CATG_CODE IN('E02', 'E04', 'E07', 'E12', 'E05', 'E09', 'E10')) AND(PL_E100134.DEPT_EXPENSE_TYPE_CODE IN('Z1', 'Z2', 'Z3')) AND(PL_E100155.PROJ_TYPE_CODE = '01'))) AND((PL_E100119.GROUP_ACCOUNT_CODE <> 'EXP5555')))) AND(PL_E100119.LVL2_ACCOUNT_CODE <> '58303') AND(PL_E100155.PROJ_TYPE_CODE = '01')))))    AND T.PERIOD_ID >= 202208    AND T.PERIOD_ID <= 202208 ) A
复制代码


场景 1


1) 往目标表 INSERT 的数据量过大,导致 INSERT 的数据量在单 DN 上使用的存储空间超过 sql_use_spacelimit 阈值


这种场景一般是配置参数 sql_use_spacelimit 设置不合理,建议直接调大配置参数 sql_use_spacelimit 的值


2) INSERT 的数据存在倾斜,导致某个 DN 上数据写入量特别大,在这个 DN 上触发了 sql_use_spacelimit 阈值


这类问题的解决一般分为以下几个步骤


a) 使用如下 SQL 查找目标表分布列


SELECT pg_get_tabledef('dwljaa.bif_col_edw_dut_257_t'::regclass);
复制代码


获取的表定义如下


SET search_path = dwljaa;CREATE  TABLE bif_col_edw_dut_257_t (	tag_grp_code character varying(100),	tag_code character varying(100),	tag_id numeric,	period character varying(20),	tbl_code character varying(100) NOT NULL,	table_name character varying(50),	target_key_val character varying(100),	target_key_num numeric,	cycle_id numeric,	creation_date timestamp(0) without time zone,	attribute1 character varying(100),	attribute2 character varying(100),	attribute3 character varying(100),	attribute4 character varying(100),	attribute5 character varying(100),	priority numeric,	column_name1 character varying(100),	column_name2 character varying(100),	column_name3 character varying(100),	column_name4 character varying(100),	column_name5 character varying(100),	carrying_dimension1 character varying(100),	carrying_dimension2 character varying(100),	carrying_dimension3 character varying(100))WITH (orientation=column, compression=low, colversion=2.0, enable_delta=false)DISTRIBUTE BY HASH(target_key_num)TO GROUP group_version1;
复制代码


b) 根据表定义和 INSERT 语句,确认分布列在查询语句中的输出列位置


根据表定义(分布列为 target_key_num)以及原始的 INSERT 语句描述,查询语句输出的导数第二列( A.record_seq_num)对应目标表的分布列 target_key_num


c)构建如下查询语句,判断查询语句输出数据在字段 A.record_seq_num 是否存在严重


如果输出的第一条记录的 cnt 值非常大(比如上百万甚至更多),导致此值对应记录的存储空间可能触发 sql_use_spacelimit 阈值,那么就可以明确是数据倾斜导致的。 这种场景一般需要先排查数据倾斜产生的原因是否合理,如果数据缺失存在倾斜,那么建议修改表的分布列,具体修改方案参见GaussDB(DWS)性能调优系列实战篇三:十八般武艺之好味道表定义》


WITH t AS(-- 把原始语句中的查询部分封装为CTE,查询语句实处列只包含分布列    SELECT         A.record_seq_num    FROM (SELECT  /*+PARALLEL(8) NO_EXPAND*/ T.record_seq_num        FROM dwljaa.dwr_fin_hwip_man_je_f_tmp0 T        INNER JOIN dwrdim.dwr_dim_department_d PL_E100134 ON T.COA_DEPT_KEY = PL_E100134.DEPT_KEY        INNER JOIN dwrdim.dwr_dim_grp_acct_code_d PL_E100119 ON T.GROUP_ACCOUNT_CODE = PL_E100119.GROUP_ACCOUNT_CODE        INNER JOIN dwrdim.dwr_dim_journal_category_d PL_E100147 ON T.JE_CATEGORY_ID = PL_E100147.JE_CATEGORY_ID        INNER JOIN dwrdim.dwr_dim_product_d PL_E100121 ON T.MAJOR_PROD_KEY = PL_E100121.PROD_KEY        INNER JOIN dwrdim.dwr_dim_product_d PL_E100122 ON T.MINOR_PROD_KEY = PL_E100122.PROD_KEY        INNER JOIN dwrdim.dwr_dim_project_d PL_E100155 ON T.PROJ_KEY = PL_E100155.PROJ_KEY        WHERE 1 = 1 AND ((((((((((PL_E100119.LVL1_ACCOUNT_CODE IN('501', '503', '506', '512')) OR(PL_E100119.GROUP_ACCOUNT_CODE = '5980406')) AND((PL_E100121.PROD_CLASS_FLAG = '0') OR(PL_E100122.PROD_CLASS_FLAG = '0'))) OR((PL_E100119.LVL1_ACCOUNT_CODE IN('504', 'SVC')) OR(PL_E100119.LVL2_ACCOUNT_CODE IN('55107', '57702', '57703', '58303')) OR(PL_E100119.GROUP_ACCOUNT_CODE IN('5980407', '5825107', '5827702', '5827703', 'C501'))) OR((PL_E100119.ACCOUNT_EXPENSE_CATG_CODE IN('E01', 'E03', 'E05', 'E06', 'E08', 'E09', 'E10', 'E11', 'E02', 'E04', 'E07', 'E12')) AND(PL_E100134.DEPT_EXPENSE_TYPE_CODE = 'Z4') AND((PL_E100155.PROJ_TYPE_CODE <> '02') OR(PL_E100155.PROJ_NUM IN('9000000')))) OR((PL_E100119.ACCOUNT_EXPENSE_CATG_CODE IN('E02', 'E04', 'E07', 'E12', 'E05', 'E09', 'E10')) AND(PL_E100134.DEPT_EXPENSE_TYPE_CODE IN('Z1', 'Z2', 'Z3')) AND(PL_E100155.PROJ_TYPE_CODE = '01'))) AND((PL_E100119.GROUP_ACCOUNT_CODE <> 'EXP5555')))) AND(PL_E100119.LVL2_ACCOUNT_CODE <> '58303') AND(PL_E100155.PROJ_TYPE_CODE = '01'))) AND((PL_E100119.LVL1_ACCOUNT_CODE = '504') AND(PL_E100147.CN_NAME IN('JV-PFC cooper cost', 'JV-ADJ PA Cooper/Constr', 'JV-Agent REV&COST Adj', 'JV-Agent totalvalue Adj')))) OR((PL_E100119.ACCOUNT_EXPENSE_CATG_CODE = 'E05') AND(PL_E100134.DEPT_EXPENSE_TYPE_CODE IN('Z1', 'Z2', 'Z3')) AND(PL_E100155.PROJ_TYPE_CODE = '01') AND(PL_E100147.CN_NAME <> 'JV-ADJ 557 WITH B CODE') AND((((((((PL_E100119.LVL1_ACCOUNT_CODE IN('501', '503', '506', '512')) OR(PL_E100119.GROUP_ACCOUNT_CODE = '5980406')) AND((PL_E100121.PROD_CLASS_FLAG = '0') OR(PL_E100122.PROD_CLASS_FLAG = '0'))) OR((PL_E100119.LVL1_ACCOUNT_CODE IN('504', 'SVC')) OR(PL_E100119.LVL2_ACCOUNT_CODE IN('55107', '57702', '57703', '58303')) OR(PL_E100119.GROUP_ACCOUNT_CODE IN('5980407', '5825107', '5827702', '5827703', 'C501'))) OR((PL_E100119.ACCOUNT_EXPENSE_CATG_CODE IN('E01', 'E03', 'E05', 'E06', 'E08', 'E09', 'E10', 'E11', 'E02', 'E04', 'E07', 'E12')) AND(PL_E100134.DEPT_EXPENSE_TYPE_CODE = 'Z4') AND((PL_E100155.PROJ_TYPE_CODE <> '02') OR(PL_E100155.PROJ_NUM IN('9000000')))) OR((PL_E100119.ACCOUNT_EXPENSE_CATG_CODE IN('E02', 'E04', 'E07', 'E12', 'E05', 'E09', 'E10')) AND(PL_E100134.DEPT_EXPENSE_TYPE_CODE IN('Z1', 'Z2', 'Z3')) AND(PL_E100155.PROJ_TYPE_CODE = '01'))) AND((PL_E100119.GROUP_ACCOUNT_CODE <> 'EXP5555')))) AND(PL_E100119.LVL2_ACCOUNT_CODE <> '58303') AND(PL_E100155.PROJ_TYPE_CODE = '01')))))        AND T.PERIOD_ID >= 202208        AND T.PERIOD_ID <= 202208     ) A)-- 对分布列做汇总求和,查找分布列值重复次数最多的值SELECT     record_seq_num, cntFROM (    SELECT record_seq_num, count(1) AS cnt     FROM t     GROUP BY record_seq_num HAVING count(1) > 10000)ORDER BY cntLIMIT 10
复制代码

点击关注,第一时间了解华为云新鲜技术~

发布于: 刚刚阅读数: 3
用户头像

提供全面深入的云计算技术干货 2020-07-14 加入

生于云,长于云,让开发者成为决定性力量

评论

发布
暂无评论
【数仓运维实践】关于GaussDB(DWS)单SQL磁盘空间管控_数据库_华为云开发者联盟_InfoQ写作社区