写点什么

基于 tidbV6.0 探索 tiflash 在多标签组合场景下的使用

  • 2022 年 7 月 11 日
  • 本文字数:10820 字

    阅读完需:约 35 分钟

作者: 边城元元原文来源:https://tidb.net/blog/3ca385cb

一、背景

1、有一个需求多字段表和几个基础信息表,现在的需求是需要对多字段表任意字段任意组合的查询。


2、考虑到对多个字段的不定组合进行筛选的话肯定要进行全表扫了。目前 TiDBV6.0 已经发布有一段时间了,TiFlash 的性能也更加抢强大和稳定了,决定实验使用 TiFlash 来承接这部分的业务需求。

二、准备

2.1 建立 TiDB cluster111

  • 拓扑如下(cluster111-full.yaml)

  • 部署集群


具体的部署可以参考文章 https://tidb.net/blog/af8080f7#TiDB-最小实践 Cluster111


# tiup cluster list # tiup cluster stop cluster111# tiup cluster destroy cluster111# 部署cluster111集群tiup cluster deploy cluster111 ./cluster111-full.yaml --user root -ptiup cluster start cluster111
复制代码


2.2 创建库表

CREATE TABLE `m_cust_org` (  `cust_id` char(30) not null,              `org_id` varchar(10) default null,          `org_name` varchar(100) default null,       `org_ii_id` varchar(10) default null,       `org_ii_name` varchar(100) default null,    `org_i_id` varchar(10) default null ,     `org_i_name` varchar(100) default null,     `org_level` varchar(2) default null ,                                                                             `pici` bigint(20) not null default '0',                                                                              PRIMARY KEY (`cust_id`) /*T![clustered_index] CLUSTERED */,  KEY `ix_m_cust_org_orgidmgrig` (`org_id`,`mgr_id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='条件筛选表';CREATE TABLE `m_cust_main` (  `cust_id` char(30) not null ,               `cust_name` varchar(200) default null,      `cert_type` varchar(13) default null,       `cert_num` varchar(40) default null ,       `cust_type` varchar(2) default null ,       `sex` varchar(13) default null,             `age` int(11) default null ,                `birth_dt` varchar(13) default null ,     `marriage` varchar(13) default null ,     `city_code` varchar(100) default null ,   `nation_code` varchar(100) default null ,                                                           `edu` varchar(13) default null ,          `ocup` varchar(100) default null ,        `post` varchar(20) default null ,         `copy_name` varchar(200) default null ,   `contact_addr` varchar(200) default null ,                                                           `card_level` varchar(2) default null ,    `service_level` varchar(2) default null ,                                                           `estimate_level` varchar(2) default null ,  `mark_id` varchar(50) default null ,      `mark_name` varchar(255) default null ,   primary key (`cust_id`) /*t![clustered_index] clustered */,  key `idx_m_cust_main_desc1` (`cert_type`,`cert_num`),  key `idx_m_cust_main_desc_3` (`cust_name`)) engine=innodb default charset=utf8mb4 collate=utf8mb4_bin comment='客户主表';CREATE TABLE `m_cust_data` (  `cust_id` char(30) not null     ,            `asset` decimal(18,2) default null ,      `asset_mon_avg` decimal(18,2) default null ,  `asset_sea_avg` decimal(18,2) default null ,  `asset_yea_avg` decimal(18,2) default null ,  `asset_roll_avg` decimal(18,2) default null ,  `debt` decimal(18,2) default null ,       `dep_bal` decimal(18,2) default null ,    `dep_mon_avg` decimal(18,2) default null ,  `dep_sea_avg` decimal(18,2) default null ,  `dep_yea_avg` decimal(18,2) default null ,  `nd_bal` decimal(18,2) default null ,     `mf_bal` decimal(18,2) default null ,     `fund_bal` decimal(18,2) default null ,   `ccard_out_amt` decimal(18,2) default null ,  `ccard_bal` decimal(18,2) default null ,  `ins_bal` decimal(18,2) default null ,    `loan_bal` decimal(18,2) default null ,   `loan_amt` decimal(18,2) default null ,   `etl_date` char(8) default null ,         `qszg_bal` decimal(24,2) default null ,   `dx_fnc_bal` decimal(24,2) default null ,  `cur_dep_bal` decimal(18,2) default null ,  `rep_bal` decimal(18,2) default null ,    `rep_avg` decimal(18,2) default null ,    `is_rep_beyond` char(2) default null ,  Primary Key (`cust_id`) /*t![clustered_index] clustered */,  Key `idx_m_cust_query_desc_4` (`asset_sea_avg`,`cust_id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='客户频繁更新数据表';CREATE TABLE `m_cust_label` (  `cust_id` char(30) NOT NULL,                     `cat1` int not null default  0 ,              `cat2` int not null default  0 ,                `cat3` int not null default  0 ,                `cat4` int not null default  0 ,                `cat5` int not null default  0 ,                `cat6` int not null default  0 ,                `cat7` int not null default  0 ,                `cat8` int not null default  0 ,                `cat9` int not null default  0 ,                `cat10` int not null default  0 ,               `cat11` int not null default  0 ,               `cat12` int not null default  0 ,               `cat13` int not null default  0 ,               `cat14` int not null default  0 ,               `cat15` int not null default  0 ,               `cat16` int not null default  0 ,               `cat17` int not null default  0 ,               `cat18` int not null default  0 ,               `cat19` int not null default  0 ,               `cat20` int not null default  0 ,               `cat21` int not null default  0 ,               `cat22` int not null default  0 ,               `cat23` int not null default  0 ,               `cat24` int not null default  0 ,               `cat25` int not null default  0 ,               `cat26` int not null default  0 ,               `cat27` int not null default  0 ,               `cat28` int not null default  0 ,               `cat29` int not null default  0 ,               `cat30` int not null default  0 ,               `cat31` int not null default  0 ,               `cat32` int not null default  0 ,               `cat33` int not null default  0 ,               `cat34` int not null default  0 ,               `cat35` int not null default  0 ,               `cat36` int not null default  0 ,               `cat37` int not null default  0 ,               `cat38` int not null default  0 ,               `cat39` int not null default  0 ,               `cat40` int not null default  0 ,               `cat41` int not null default  0 ,               `cat42` int not null default  0 ,               `cat43` int not null default  0 ,               `cat44` int not null default  0 ,               `cat45` int not null default  0 ,               `cat46` int not null default  0 ,               `cat47` int not null default  0 ,               `cat48` int not null default  0 ,               `cat49` int not null default  0 ,               `cat50` int not null default  0 ,               `cat51` int not null default  0 ,               `cat52` int not null default  0 ,               `cat53` int not null default  0 ,               `cat54` int not null default  0 ,               `cat55` int not null default  0 ,               `cat56` int not null default  0 ,               `cat57` int not null default  0 ,               `cat58` int not null default  0 ,               `cat59` int not null default  0 ,               `cat60` int not null default  0 ,               `cat61` int not null default  0 ,               `cat62` int not null default  0 ,               `cat63` int not null default  0 ,               `cat64` int not null default  0 ,               `cat65` int not null default  0 ,               `cat66` int not null default  0 ,               `cat67` int not null default  0 ,               `cat68` int not null default  0 ,               `cat69` int not null default  0 ,                       PRIMARY KEY (`CUST_ID`) /*T![clustered_index] CLUSTERED */) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='客户检索标签';
复制代码


2.3 初始化数据种子表

-- 建立seed-- m_seeddrop table if exists `m_seed`;CREATE TABLE `m_seed` (  `id` int(11) NOT NULL AUTO_INCREMENT,  PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;truncate table m_seed;SELECT * from m_seed;insert into m_seed values();-- 不连续的id 每次 【执行完后,auto_inccreaid 造成不连续】insert into m_seed select null from m_seed;insert into m_seed select null from m_seed;insert into m_seed select null from m_seed;insert into m_seed select null from m_seed;insert into m_seed select null from m_seed;insert into m_seed select null from m_seed;-- 本地虚拟机环境配置比较低这里使用每次528条记录-- 528 SELECT count(*) from m_seed;
复制代码

2.4 初始化数据脚本 bat 版

::echo off@echo off    rem 每次的数量为 m_seed表中的记录数rem 总次数set/a sumnum=200set num=0rem 这里必须有 start标签,否则循环里去时间 (都一样 的时间,需要从新调度)rem 在bat里有中文的,需要设置为gb2312的编码,,到sql才是中文:start     set pici=%Date:~5,2%%Date:~8,2%%Time:~0,2%%Time:~3,2%%Time:~6,2%    set pici=%pici: =%    echo %pici%        echo '----------------1-----------'    mysql -h 127.0.0.1 -P 4000 -u "root"  -D b_crm -e "INSERT INTO b_crm.m_cust_org(CUST_ID, ORG_ID, ORG_NAME, ORG_II_ID, ORG_II_NAME, ORG_I_ID, ORG_I_NAME, ORG_LEVEL, pici) select concat('%pici%','A',id), FLOOR( 100 + RAND() * (10000 - 100)),concat('归属三级机构名称', RAND() ), FLOOR( 100 + RAND() * (10000 - 100)),concat('归属二级机构名称', RAND() ), FLOOR( 100 + RAND() * (10000 - 100)),concat('归属一级机构名称', RAND() ), FLOOR( 1 + RAND() * (10 - 1)),'%pici%' from m_seed " --default-character-set=utf8    echo '----------------2-----------'    rem 2    mysql -h 127.0.0.1 -P 4000 -u "root"  -D b_crm -e "INSERT INTO m_cust_data(CUST_ID, ASSET, ASSET_MON_AVG, ASSET_SEA_AVG, ASSET_YEA_AVG, ASSET_ROLL_AVG, DEBT, DEP_BAL, DEP_MON_AVG, DEP_SEA_AVG, DEP_YEA_AVG, ND_BAL, MF_BAL, FUND_BAL, CCARD_OUT_AMT, CCARD_BAL, INS_BAL, LOAN_BAL, LOAN_AMT, ETL_DATE, QSZG_BAL, DX_FNC_BAL, CUR_DEP_BAL, REP_BAL, REP_AVG, IS_REP_BEYOND) select concat('%pici%','A',id) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) , FLOOR(RAND() *100000000) ,DATE_FORMAT(FROM_UNIXTIME(1524895617+FLOOR(RAND() * 126230510)),'%%Y%%m%%d') , FLOOR(RAND() *10000000) , FLOOR(RAND() *10000000) , FLOOR(RAND() *10000000) , FLOOR(RAND() *10000000) , FLOOR(RAND() *10000000) , if(RAND() >0.9,'Y','N') from m_seed " --default-character-set=utf8        echo '----------------3-----------'     rem 3    mysql -h 127.0.0.1 -P 4000 -u "root"  -D b_crm -e "INSERT INTO m_cust_main(CUST_ID, CUST_NAME, CERT_TYPE, CERT_NUM, CUST_TYPE, SEX, AGE, BIRTH_DT, MARRIAGE, CITY_CODE, NATION_CODE, EDU, OCUP, POST, COPY_NAME, CONTACT_ADDR, CARD_LEVEL, SERVICE_LEVEL, ESTIMATE_LEVEL, MARK_ID, MARK_NAME) select  concat('%pici%','A',id)  ,concat(substring('赵钱孙李周吴郑王冯陈诸卫蒋沈韩杨朱秦尤许何吕施张孔曹严华金魏陶姜戚谢邹喻柏水窦章云苏潘葛奚范彭郎鲁韦昌马苗凤花方俞任袁柳酆鲍史唐费廉岑薛雷贺倪汤滕殷罗毕郝邬安常乐于时傅皮齐康伍余元卜顾孟平黄和穆萧尹姚邵堪汪祁毛禹狄米贝明臧计伏成戴谈宋茅',floor(1+190*rand()),1),substring('明国华建文平志伟东海强晓生光林小民永杰军金健一忠洪江福祥中正振勇耀春大宁亮宇兴宝少剑云学仁涛瑞飞鹏安亚泽世汉达卫利胜敏群波成荣新峰刚家龙德庆斌辉良玉俊立浩天宏子松克清长嘉红山贤阳乐锋智青跃元武广思雄锦威启昌铭维义宗英凯鸿森超坚旭政传康继翔栋仲权奇礼楠炜友年震鑫雷兵万星骏伦绍麟雨行才希彦兆贵源有景升惠臣慧开章润高佳虎根远力进泉茂毅富博霖顺信凡豪树和恩向道川彬柏磊敬书鸣芳培全炳基冠晖京欣廷哲保秋君劲轩帆若连勋祖锡吉崇钧田石奕发洲彪钢运伯满庭申湘皓承梓雪孟其潮冰怀鲁裕翰征谦航士尧标洁城寿枫革纯风化逸腾岳银鹤琳显焕来心凤睿勤延凌昊西羽百捷定琦圣佩麒虹如靖日咏会久昕黎桂玮燕可越彤雁孝宪萌颖艺夏桐月瑜沛诚夫声冬奎扬双坤镇楚水铁喜之迪泰方同滨邦先聪朝善非恒晋汝丹为晨乃秀岩辰洋然厚灿卓杨钰兰怡灵淇美琪亦晶舒菁真涵爽雅爱依静棋宜男蔚芝菲露娜珊雯淑曼萍珠诗璇琴素梅玲蕾艳紫珍丽仪梦倩伊茜妍碧芬儿岚婷菊妮媛莲娟一',floor(1+400*rand()),1),substring('明国华建文平志伟东海强晓生光林小民永杰军金健一忠洪江福祥中正振勇耀春大宁亮宇兴宝少剑云学仁涛瑞飞鹏安亚泽世汉达卫利胜敏群波成荣新峰刚家龙德庆斌辉良玉俊立浩天宏子松克清长嘉红山贤',floor(1+400*rand()),if(rand()>0.6,0,1))) , FLOOR( 1 + RAND() * (13 - 1)) ,md5(rand()) , FLOOR( 1 + RAND() * (20 - 1)) , FLOOR( 1 + RAND() * (3 - 1))   , FLOOR( 1 + RAND() * (100 - 1)) , FLOOR( 1960 + RAND() * (2022 - 1960)) , if(RAND() >0.6,'Y','N') , FLOOR( 100 + RAND() * (500 - 100)) , FLOOR( 1 + RAND() * (57 - 1)) , FLOOR( 1 + RAND() * (6 - 1)) ,substring(md5(rand()), 1, 20) , FLOOR( 1000 + RAND() * (9000 - 1000)) ,concat('单位', RAND() ) ,concat('地址', RAND() ) , FLOOR( 1 + RAND() * (10 - 1)) , FLOOR( 1 + RAND() * (10 - 1)) , FLOOR( 1 + RAND() * (10 - 1)) , FLOOR( 10000 + RAND() * (90000 - 10000)) ,concat(substring('赵钱孙李周吴郑王冯陈诸卫蒋沈韩杨朱秦尤许何吕施张孔曹严华金魏陶姜戚谢邹喻柏水窦章云苏潘葛奚范彭郎鲁韦昌马苗凤花方俞任袁柳酆鲍史唐费廉岑薛雷贺倪汤滕殷罗毕郝邬安常乐于时傅皮齐康伍余元卜顾孟平黄和穆萧尹姚邵堪汪祁毛禹狄米贝明臧计伏成戴谈宋茅庞熊纪舒屈项祝董粱杜阮蓝闵席季麻强贾路娄危江童颜郭梅盛林刁钟徐邱骆高夏蔡田樊胡凌霍虞万支柯咎管卢莫经房裘干解应宗丁宣贲邓郁单杭洪包诸左石崔吉钮龚',floor(1+190*rand()),1),substring('明国华建文平志伟东海强晓生光林小民永杰军金健一忠洪江福祥中正振勇耀春大宁亮宇兴宝少剑云学仁涛瑞飞鹏安亚泽世汉达卫利胜敏群波成荣新峰刚家龙德庆斌辉良玉俊立浩天宏子松克清长嘉红山贤阳乐锋智青跃元武广思雄锦威启昌铭维义宗英凯鸿森超坚旭政传康继翔栋仲权奇礼楠炜友年震鑫雷兵万星骏伦绍麟雨行才希彦兆贵源有景升惠臣慧开章润高佳虎根远力进泉茂毅富博霖顺信凡豪树和恩向道川彬柏磊敬书鸣芳培全炳基冠晖京欣廷哲保秋君劲轩帆若连勋祖锡吉崇钧田石奕发洲彪钢运伯满庭申湘皓承梓雪孟其潮冰怀鲁裕翰征谦航士尧标洁城寿枫革纯风化逸腾岳银鹤琳显焕来心凤睿勤延凌昊西羽百捷定琦圣佩麒虹如靖日咏会久昕黎桂玮燕可越彤雁孝宪萌颖艺夏桐月瑜沛诚夫声冬奎扬双坤镇楚水铁喜之迪泰方同滨邦先聪朝善非恒晋汝丹为晨乃秀岩辰洋然厚灿卓杨钰兰怡灵淇美琪亦晶舒菁真涵爽雅爱依静棋宜男蔚芝菲露娜珊雯淑曼萍珠诗璇琴素梅玲蕾艳紫珍丽仪梦倩伊茜妍碧芬儿岚婷菊妮媛莲娟一',floor(1+400*rand()),1),substring('明国华建文平志伟东海强晓生光林小民永杰军金健一忠洪江福祥中正振勇耀春大宁亮宇兴宝少剑云学仁涛瑞飞鹏安亚泽世汉达卫利胜敏群波成荣新峰刚家龙德庆斌辉良玉俊立浩天宏子松',floor(1+400*rand()),if(rand()>0.6,0,1)))  from  m_seed " --default-character-set=utf8            echo '----------------4-----------'     mysql -h 127.0.0.1 -P 4000 -u "root"  -D b_crm -e "INSERT INTO m_cust_label(CUST_ID,cat1,cat2,cat3,cat4,cat5,cat6,cat7,cat8,cat9,cat10) select  concat('%pici%','A',id) , FLOOR(RAND() *1000),FLOOR(RAND() *1000),FLOOR(RAND() *100),FLOOR(RAND() *100),FLOOR(RAND() *10),FLOOR(RAND() *10),FLOOR(RAND() *800),FLOOR(RAND() *700),FLOOR(RAND() *600),FLOOR(RAND() *100000) from m_seed "            rem 数值的相加    set/a  num= %num%+1    echo %num%    echo %sumnum%    rem 比较大小 geq 表示大于等于    rem 注意小括号    if %num% geq %sumnum% (        echo '--------------------------'        echo %sumnum%        echo '---------end--------------'        goto stop    )        timeout /T 5 /NOBREAK            goto start    :stop    echo '---stop ----'pause
复制代码

三、测试多字段组合

3.1 准备数据

3.1.1 准备 c1,c2,… c10 中找出数量排行前 5 的作为测试数据使用

-- cat1   516select a1.* from (select 'c1',cat1,count(*) as ct  from m_cust_label GROUP BY cat1 ORDER BY ct desc limit 5 ) a1 union allselect a2.* from (select 'c2',cat2,count(*) as ct  from m_cust_label GROUP BY cat2 ORDER BY ct desc limit 5) a2 union allselect a3.* from (select 'c3',cat3,count(*) as ct  from m_cust_label GROUP BY cat3 ORDER BY ct desc limit 5 ) a3 union allselect a4.* from (select 'c4',cat4,count(*) as ct  from m_cust_label GROUP BY cat4 ORDER BY ct desc limit 5 ) a4 union allselect a5.* from (select 'c5',cat5,count(*) as ct  from m_cust_label GROUP BY cat5 ORDER BY ct desc limit 5 ) a5 union allselect a6.* from (select 'c6',cat6,count(*) as ct  from m_cust_label GROUP BY cat6 ORDER BY ct desc limit 5 ) a6 union allselect a7.* from (select 'c7',cat7,count(*) as ct  from m_cust_label GROUP BY cat7 ORDER BY ct desc limit 5 ) a7 union allselect a8.* from (select 'c8',cat8,count(*) as ct  from m_cust_label GROUP BY cat8 ORDER BY ct desc limit 5 ) a8 union allselect a9.* from (select 'c9',cat9,count(*) as ct  from m_cust_label GROUP BY cat9 ORDER BY ct desc limit 5 ) a9 union allselect a10.* from (select 'c10',cat10,count(*) as ct  from m_cust_label GROUP BY cat10 ORDER BY ct desc limit 5) a10;
复制代码


cat字段,id值,记录条数c3  19  155c3  37  153c3  45  153c3  66  151c3  50  151c2  154 25c2  937 25c2  203 24c2  82  23c2  504 22c5  7   1353c5  1   1305c5  4   1298c5  3   1268c5  8   1267c4  99  159c4  65  150c4  47  150c4  8   149c4  46  148c1  516 26c1  710 26c1  121 25c1  230 25c1  889 25c6  0   1343c6  9   1303c6  7   1291c6  6   1288c6  8   1287c7  196 30c7  751 29c7  756 27c7  584 27c7  265 26c8  27  36c8  239 33c8  604 31c8  214 30c8  126 30c9  416 39c9  38  37c9  69  36c9  91  35c9  72  35c10 94038   3c10 46313   3c10 72580   3c10 92499   3c10 94648   3
复制代码

3.1.2 准备 orgid

select ORG_ID,count(*) as ct from m_cust_org GROUP BY ORG_ID ORDER BY ct desc limit 10;
复制代码


3.1.3 开启 TiFlash

-- 针对指定表开启 Tiflash(列存)ALTER TABLE m_cust_label SET TIFLASH REPLICA 1;
-- 查看TiFlash同步状态select * from information_schema.tiflash_replica;select * from information_schema.TIFLASH_SEGMENTS;select * from information_schema.TIFLASH_TABLES;
复制代码



progress=1 表示同步完成!

3.2 具体场景分析

3.2.1 仅有宽表 m_cust_label 字段的组合

思路:先找出主键,在组装其他需要字段


EXPLAIN ANALYZEselect a.*,b.CUST_NAME, b.CERT_TYPE, b.CERT_NUM, b.CUST_TYPE, b.SEX, b.AGE, b.BIRTH_DT,c.ORG_ID, c.ORG_NAME,d.ASSET,d.ASSET_MON_AVG from( -- 思路找到cust_id select /*+ read_from_storage(tiflash[m]) */ m.cust_id from m_cust_label m where m.cat1 in(516,710,230) and m.cat2 in(154,504) ORDER BY m.cust_id desc limit 100
) a left join m_cust_main b on a.cust_id=b.cust_id left join m_cust_org c on a.cust_id=c.cust_idleft join m_cust_data d on a.cust_id=d.cust_id;
复制代码



指定使用了 Tiflash 的列扫,


如果不指定 Tiflash 而是指定 TikV 又是什么情况呢?


EXPLAIN ANALYZEselect a.*,b.CUST_NAME, b.CERT_TYPE, b.CERT_NUM, b.CUST_TYPE, b.SEX, b.AGE, b.BIRTH_DT,c.ORG_ID, c.ORG_NAME,d.ASSET,d.ASSET_MON_AVG from( -- 思路找到cust_id select /*+ read_from_storage(tikv[m]) */ m.cust_id from m_cust_label m where m.cat1 in(516,710,230) and m.cat2 in(154,504) ORDER BY m.cust_id desc limit 100
) a left join m_cust_main b on a.cust_id=b.cust_id left join m_cust_org c on a.cust_id=c.cust_idleft join m_cust_data d on a.cust_id=d.cust_id;
复制代码



在不确定组合的情况下无法使 tikv 加索引如果数据量较大的话 tiflash 的列扫更优有事。

3.2.2 仅有同表非宽表字段

EXPLAIN ANALYZEselect a.*,b.CUST_NAME, b.CERT_TYPE, b.CERT_NUM, b.CUST_TYPE, b.SEX, b.AGE, b.BIRTH_DT,c.ORG_ID, c.ORG_NAME,d.ASSET,d.ASSET_MON_AVG from( -- 思路找到cust_id select n.cust_id from m_cust_org n where n.ORG_ID in('8716','7162') ORDER BY n.cust_id desc limit 100
) a left join m_cust_main b on a.cust_id=b.cust_id left join m_cust_org c on a.cust_id=c.cust_idleft join m_cust_data d on a.cust_id=d.cust_id;
复制代码


3.2.3 异表包含宽表字段

EXPLAIN ANALYZEselect a.*,b.CUST_NAME, b.CERT_TYPE, b.CERT_NUM, b.CUST_TYPE, b.SEX, b.AGE, b.BIRTH_DT,c.ORG_ID, c.ORG_NAME,d.ASSET,d.ASSET_MON_AVG from( -- 思路找到cust_id select /*+ read_from_storage(tiflash[m]) */ m.cust_id from m_cust_label m right join m_cust_org n on m.CUST_ID=n.CUST_ID where m.cat1 in(516,710,230) and n.ORG_ID in('8716','7162') ORDER BY m.cust_id desc limit 100
) a left join m_cust_main b on a.cust_id=b.cust_id left join m_cust_org c on a.cust_id=c.cust_idleft join m_cust_data d on a.cust_id=d.cust_id
复制代码



利用了 Tiflash 的列扫和 Tikv 的索引优势。

3.2.4 异表不包含宽表字段

-- 1、先查cust_id的方式EXPLAIN ANALYZEselect a.*,b.CUST_NAME, b.CERT_TYPE, b.CERT_NUM, b.CUST_TYPE, b.SEX, b.AGE, b.BIRTH_DT,c.ORG_ID, c.ORG_NAME,d.ASSET,d.ASSET_MON_AVG from( -- 思路找到cust_id select m.cust_id from m_cust_data m right join m_cust_org n on m.CUST_ID=n.CUST_ID where m.ASSET BETWEEN 10 and 100  and n.ORG_ID in('8716','7162') ORDER BY m.cust_id desc limit 100
) a left join m_cust_main b on a.cust_id=b.cust_id left join m_cust_org c on a.cust_id=c.cust_idleft join m_cust_data d on a.cust_id=d.cust_id
复制代码



-- 2、直接join的方式EXPLAIN ANALYZEselect b.cust_id,b.CUST_NAME, b.CERT_TYPE, b.CERT_NUM, b.CUST_TYPE, b.SEX, b.AGE, b.BIRTH_DT,c.ORG_ID, c.ORG_NAME,d.ASSET,d.ASSET_MON_AVG from m_cust_main bleft join m_cust_org c on b.cust_id=c.cust_idleft join m_cust_data d on b.cust_id=d.cust_id where d.asset BETWEEN 10 and 100 and c.ORG_ID in('8716','7162') ORDER BY b.cust_id desc limit 100
复制代码



在测试环境中这 2 个的速度差不多,从执行计划来看的话 先查 cust_id 的效率高一些!

3.3 Tiflash 的优化

既然要使用 TiFlash 来承接宽表的检索 那么如何优化 TiFlash 呢。


  • 1、TiFlash 的副本最好大于 1 小于 TiKV 的数量

  • 2、TiFlash 的并发数

  • 3、TiFlash 节点要独立部署

四、总结

基本思路:1) 建立了聚簇表 ;2)使用 where 过滤数据保留尽可能晓的数据记录;3)先找主键再 join


对于不确定字段组合的情况可以参考下面


  • 1、不确定 where 条件列和 order 排序列 必定会走全表扫 tableFullScan 使用 tiflash 来加速不确定条件的筛选

  • 2、where,order,limit 只是筛选出主键, 根据主键再 join 出需要的字段。

  • 3、不走 tiflash 的字段, 可以考虑走 tikv 的索引


非常感谢 TiDB 社区的提供学习交流的机会!


期待 TiDB 6.0 长期支持版本 发布!


发布于: 刚刚阅读数: 2
用户头像

TiDB 社区官网:https://tidb.net/ 2021.12.15 加入

TiDB 社区干货传送门是由 TiDB 社区中布道师组委会自发组织的 TiDB 社区优质内容对外宣布的栏目,旨在加深 TiDBer 之间的交流和学习。一起构建有爱、互助、共创共建的 TiDB 社区 https://tidb.net/

评论

发布
暂无评论
基于tidbV6.0探索tiflash在多标签组合场景下的使用_实践案例_TiDB 社区干货传送门_InfoQ写作社区