写点什么

GaussDB(DWS) 性能调优,解决 DM 区大内存占用问题

  • 2024-07-04
    广东
  • 本文字数:4981 字

    阅读完需:约 16 分钟

GaussDB(DWS)性能调优,解决DM区大内存占用问题

本文分享自华为云社区《GaussDB(DWS)性能调优:DM区优化案例——维度表关联条件存在会计期》,作者: O 泡果奶~。


当前 DM(P1、P3、CBGDM)存在维度表与主表关联时使用会计期作为关联条件,会导致出现大内存占用未识别数据倾斜的问题

【场景一】f.period_id = 维度表.period_id

1.1、【问题描述】


主表和维度表关联过程中将会计期作为关联条件,导致维度表未进行分区剪枝,可能会产生大内存占用的情况

1.2、【原始 SQL】


仅呈现 SQL 中的问题,详细 SQL 见附件


FROM		DMACC.dm_adp_ar_trx_dtl_tmp F		INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY		INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY 		AND T9.PROD_POV_ID = 1		INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY		INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY		LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY 		AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY		LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID 		AND F.PERIOD_ID = PP.PERIOD_ID		LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID		INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID		INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID		INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID		LEFT JOIN (		SELECT C			.CONTRACT_KEY,			D.COMPANY_KEY,			R.FIRST_SHIP_DATE 		FROM			DMDIM.dm_dim_contract_d C,			DMDIM.DM_DIM_COMPANY_D D,			DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R 		WHERE			C.CONTRACT_ID = R.CONTRACT_ID 			AND D.COMPANY_ID = R.COMPANY_ID 		) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY 		AND F.COA_COMPANY_KEY = FR.COMPANY_KEY		INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY		JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID		JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID 
复制代码

1.3、【性能分析】





从上图的执行计划可以看出,由于用会计期作为关联条件,导致维度表未进行分区剪枝,数据量大,不但产生了数据倾斜,同时还由于数据量大出现了关联下盘,大大降低了 sql 执行性能。主表只有一个会计期,可以识别出对应的会计期,然后对 SQL 进行如下改写:


FROM		DMACC.dm_adp_ar_trx_dtl_tmp F		INNER JOIN DMDIM.DM_DIM_REGION_RC_D REG ON F.COA_GEO_PC_KEY = REG.GEO_PC_KEY		INNER JOIN DMDIM.DM_DIM_PRODUCT_T_D T9 ON F.PROD_KEY = T9.PROD_KEY 		AND T9.PROD_POV_ID = 1		INNER JOIN DMDIM.DM_DIM_PROJECT_D J ON F.PROJ_KEY = J.PROJ_KEY		INNER JOIN DMDIM.DM_DIM_CONTRACT_D HT ON HT.CONTRACT_KEY = F.CONTRACT_KEY		LEFT JOIN DMCOMMON.DWR_CONFIG_DOMESTIC_FINANCE_V FIN ON F.COA_COMPANY_KEY = FIN.COMPANY_KEY 		AND F.COA_GEO_PC_KEY = FIN.GEO_PC_KEY		LEFT JOIN DMAR.DWB_FMD_DIM_INVOICE_PAY_PLAN_D PP ON F.AR_INVOICE_PAY_PLAN_ID = PP.AR_INVOICE_PAY_PLAN_ID 		AND PP.PERIOD_ID = '202406'		LEFT JOIN DMARDI.DWR_DIM_AR_INVOICE_V INV ON F.AR_INVOICE_ID = INV.AR_INVOICE_ID		INNER JOIN DMARDI.DWR_DIM_AR_APPLICATION_V APP ON F.AR_APPLICATION_RECORD_ID = APP.AR_APPLICATION_RECORD_ID		INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_V RCP ON F.AR_RECEIPT_RECORD_ID = RCP.AR_RECEIPT_RECORD_ID		INNER JOIN DMARDI.DWR_DIM_AR_RECEIPT_TYPE_V RT ON RCP.RECEIPT_RECORD_TYPE_ID = RT.AR_RECEIPT_TYPE_ID		LEFT JOIN (		SELECT C			.CONTRACT_KEY,			D.COMPANY_KEY,			R.FIRST_SHIP_DATE 		FROM			DMDIM.dm_dim_contract_d C,			DMDIM.DM_DIM_COMPANY_D D,			DMARDI.DWR_CTRCT_FIRST_SHIP_DATE_R R 		WHERE			C.CONTRACT_ID = R.CONTRACT_ID 			AND D.COMPANY_ID = R.COMPANY_ID 		) FR ON F.CONTRACT_KEY = FR.CONTRACT_KEY 		AND F.COA_COMPANY_KEY = FR.COMPANY_KEY		INNER JOIN DMDIM.DM_DIM_SALES_MODE_D MO ON F.SALES_MODE_KEY = MO.SALES_MODE_KEY		JOIN DMDIM.DM_DIM_JOURNAL_SOURCE_D T29 ON F.JE_SOURCE_ID = T29.JE_SOURCE_ID		JOIN DMDIM.DM_DIM_JOURNAL_CATEGORY_D T30 ON F.JE_CATEGORY_ID = T30.JE_CATEGORY_ID 
复制代码


经优化后,执行计划如下图所示,维度表进行了分区剪枝,数据量减少,缓解了数据倾斜,也避免了关联下盘的问题。




【场景二】f left join 维度表 on f.period_id = 维度表.period_id and 维度表.period_id = ‘会计期’

2.1、【问题描述】


主表和维度表关联过程中将会计期作为关联条件,同时还为维度表会计期进行赋值,可能会产生数据倾斜未识别的情况

2.2、【原始 SQL】


FROM		dmdp.dm_dpc_inv_m_dtl_f_TEM_A LT1		LEFT JOIN dmcommon.dm_dim_prod_key_r LT2 ON LT1.prod_key = LT2.old_key 		AND LT1.period_id = LT2.period_id 		AND LT2.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_reg_key_r LT3 ON LT1.period_id = LT3.period_id 		AND LT1.geo_pc_key = LT3.old_key 		AND LT3.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_cus_key_r LT4 ON LT1.period_id = LT4.period_id 		AND LT1.account_dept_cust_key = LT4.old_key 		AND LT4.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_proj_key_r LT5 ON LT1.period_id = LT5.period_id 		AND LT1.proj_key = LT5.old_key 		AND LT5.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_cus_key_r LT6 ON LT1.period_id = LT6.period_id 		AND LT1.enterprise_cust_key = LT6.old_key 		AND LT6.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_rep_key_r LT7 ON LT1.period_id = LT7.period_id 		AND LT1.report_item_id = LT7.old_key 		AND LT7.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT8 ON LT1.period_id = LT8.period_id 		AND LT1.supply_center_key = LT8.old_key 		AND LT8.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_inv_key_r LT9 ON LT1.period_id = LT9.period_id 		AND LT1.inventory_class_key = LT9.old_key 		AND LT9.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_bus_key_r LT10 ON LT1.period_id = LT10.period_id 		AND LT1.business_status_key = LT10.old_key 		AND LT10.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_hisi_key_r LT11 ON LT1.period_id = LT11.period_id 		AND LT1.hisi_prod_key = LT11.old_key 		AND LT11.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_inv_org_key_r LT12 ON LT1.period_id = LT12.period_id 		AND LT1.inventory_org_key = LT12.old_key 		AND LT12.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_cus_key_r LT13 ON LT1.period_id = LT13.period_id 		AND LT1.end_cust_key = LT13.old_key 		AND LT13.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_cus_key_r LT14 ON LT1.period_id = LT14.period_id 		AND LT1.sign_cust_key = LT14.old_key 		AND LT14.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_cus_key_r LT15 ON LT1.period_id = LT15.period_id 		AND LT1.agent_distribution_cust_key = LT15.old_key 		AND LT15.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_com_key_r LT16 ON LT1.period_id = LT16.period_id 		AND LT1.company_key = LT16.old_key 		AND LT16.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_con_key_r LT17 ON LT1.period_id = LT17.period_id 		AND LT1.contract_key = LT17.old_key 		AND LT17.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_con_key_r LT18 ON LT1.period_id = LT18.period_id 		AND LT1.loan_contract_key = LT18.old_key 		AND LT18.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT19 ON LT1.period_id = LT19.period_id 		AND LT1.target_supply_center_key = LT19.old_key 		AND LT19.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_subinventory_key_r LT20 ON LT1.period_id = LT20.period_id 		AND LT1.subinventory_key = LT20.old_key 		AND LT20.PERIOD_ID = 202406 	WHERE		1 = 1 	AND partition_value IN ( 0, 1 )
复制代码

2.3、【性能分析】




上图的执行计划可以看出,在主表一开始关联过程中就存在数据倾斜,导致 SQL 执行性能差。




详细执行计划中,虽然维度表进行了分区剪枝,但由于使用了 left join,导致关联条件中维度表的常量 period_id 不能直接赋值给主表 period_id,主表关联后的结果重分布时将 period_id 作为了分布键之一,这会影响优化器的倾斜优化。可以将 f.period_id = 维度表.period_id 这一关联条件删掉,对 sql 进行如下改写


FROM		dmdp.dm_dpc_inv_m_dtl_f_TEM_A LT1		LEFT JOIN dmcommon.dm_dim_prod_key_r LT2 ON LT1.prod_key = LT2.old_key 		AND LT2.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_reg_key_r LT3 ON LT1.geo_pc_key = LT3.old_key 		AND LT3.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_cus_key_r LT4 ON LT1.account_dept_cust_key = LT4.old_key 		AND LT4.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_proj_key_r LT5 ON LT1.proj_key = LT5.old_key 		AND LT5.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_cus_key_r LT6 ON LT1.enterprise_cust_key = LT6.old_key 		AND LT6.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_rep_key_r LT7 ON LT1.report_item_id = LT7.old_key 		AND LT7.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT8 ON LT1.supply_center_key = LT8.old_key 		AND LT8.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_inv_key_r LT9 ON LT1.inventory_class_key = LT9.old_key 		AND LT9.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_bus_key_r LT10 ON LT1.business_status_key = LT10.old_key 		AND LT10.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_hisi_key_r LT11 ON LT1.hisi_prod_key = LT11.old_key 		AND LT11.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_inv_org_key_r LT12 ON LT1.inventory_org_key = LT12.old_key 		AND LT12.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_cus_key_r LT13 ON LT1.end_cust_key = LT13.old_key 		AND LT13.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_cus_key_r LT14 ON LT1.sign_cust_key = LT14.old_key 		AND LT14.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_cus_key_r LT15 ON LT1.agent_distribution_cust_key = LT15.old_key 		AND LT15.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_com_key_r LT16 ON LT1.company_key = LT16.old_key 		AND LT16.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_con_key_r LT17 ON LT1.contract_key = LT17.old_key 		AND LT17.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_con_key_r LT18 ON LT1.loan_contract_key = LT18.old_key 		AND LT18.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_supply_center_key_r LT19 ON LT1.target_supply_center_key = LT19.old_key 		AND LT19.PERIOD_ID = 202406		LEFT JOIN dmcommon.dm_dim_subinventory_key_r LT20 ON LT1.subinventory_key = LT20.old_key 		AND LT20.PERIOD_ID = 202406 	WHERE		1 = 1 	AND partition_value IN ( 0, 1 )
复制代码


改写后,执行计划如下所示



可以看出,执行计划不但进行了分区剪枝,同时优化器还进行了倾斜优化,提高了 SQL 执行性能


点击关注,第一时间了解华为云新鲜技术~

发布于: 刚刚阅读数: 5
用户头像

提供全面深入的云计算技术干货 2020-07-14 加入

生于云,长于云,让开发者成为决定性力量

评论

发布
暂无评论
GaussDB(DWS)性能调优,解决DM区大内存占用问题_大数据_华为云开发者联盟_InfoQ写作社区