【GreatSQL 优化器 -09】make_join_query_block

2025-01-08
福建
本文字数：6804 字
阅读完需：约 22 分钟

【GreatSQL 优化器-09】make_join_query_block

一、make_join_query_block 介绍

GreatSQL 优化器对于多张表 join 的连接顺序在前面的章节介绍过的 best_access_path 函数已经执行了，接着就是把 where 条件进行切割然后推给合适的表。这个过程就是由函数 make_join_query_block 来执行的。

下面用几个简单的例子来说明 join 连接中条件推送是什么。

CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT,date1 DATETIME);INSERT INTO t1 VALUES (1,10,'2021-03-25 16:44:00.123456'),(2,1,'2022-03-26 16:44:00.123456'),(3,4,'2023-03-27 16:44:00.123456'),(5,5,'2024-03-25 16:44:00.123456'),(7,null,'2020-03-25 16:44:00.123456'),(8,10,'2020-10-25 16:44:00.123456'),(11,16,'2023-03-25 16:44:00.123456');CREATE TABLE t2 (cc1 INT PRIMARY KEY, cc2 INT);INSERT INTO t2 VALUES (1,3),(2,1),(3,2),(4,3),(5,15);CREATE TABLE t3 (ccc1 INT, ccc2 varchar(100));INSERT INTO t3 VALUES (1,'aa1'),(2,'bb1'),(3,'cc1'),(4,'dd1'),(null,'ee');CREATE INDEX idx1 ON t1(c2);CREATE INDEX idx2 ON t1(c2,date1);CREATE INDEX idx2_1 ON t2(cc2);CREATE INDEX idx3_1 ON t3(ccc1);

复制代码

下面这个例子((t1.c1 = t3.ccc1) or (t3.ccc1 < 3))条件推送给 t1

greatsql> EXPLAIN FORMAT=TREE SELECT * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| EXPLAIN                                                                                                                                                                                                                                                         |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| -> Filter: ((t1.c1 = t3.ccc1) or (t3.ccc1 < 3))  (cost=5.26 rows=35)    -> Inner hash join (no condition)  (cost=5.26 rows=35)        -> Index scan on t1 using idx2  (cost=0.34 rows=7)        -> Hash            -> Table scan on t3  (cost=0.75 rows=5) |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+1 row in set (0.00 sec)

复制代码

下面例子(t1.c1 < 3)条件推给 t1，(ccc1=t1.c1)条件推给 t3

greatsql> EXPLAIN FORMAT=TREE SELECT * FROM t1 join t3 ON t1.c1=t3.ccc1 and t3.ccc1<3;+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| EXPLAIN                                                                                                                                                                                                                          |+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| -> Nested loop inner join  (cost=2.40 rows=2)    -> Filter: (t1.c1 < 3)  (cost=1.70 rows=2)        -> Index scan on t1 using idx2  (cost=1.70 rows=7)    -> Index lookup on t3 using idx3_1 (ccc1=t1.c1)  (cost=0.30 rows=1) |+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

复制代码

下面例子((t3.ccc1 = t1.c2) or (t3.ccc1 is null) or (t3.ccc2 like 'a%'))条件推给 t3，(((t3.ccc1 = t1.c2) and (t2.cc1 = t1.c1)) or (t3.ccc1 is null) or (t3.ccc2 like 'a%'))条件推给 t2

greatsql> EXPLAIN SELECT * FROM t2,t1,t3 WHERE t1.c1=t2.cc1 AND t1.c2=t3.ccc1 OR t3.ccc1 IS NULL OR t3.ccc2 LIKE 'a%';| -> Filter: (((t3.ccc1 = t1.c2) and (t2.cc1 = t1.c1)) or (t3.ccc1 is null) or (t3.ccc2 like 'a%'))  (cost=14.27 rows=85)    -> Inner hash join (no condition)  (cost=14.27 rows=85)        -> Index scan on t2 using idx2_1  (cost=0.09 rows=5)        -> Hash            -> Filter: ((t3.ccc1 = t1.c2) or (t3.ccc1 is null) or (t3.ccc2 like 'a%'))  (cost=4.70 rows=17)                -> Inner hash join (no condition)  (cost=4.70 rows=17)                    -> Table scan on t3  (cost=0.07 rows=5)                    -> Hash                        -> Index scan on t1 using idx2  (cost=0.95 rows=7)

复制代码

二、make_join_query_block 代码解释

make_join_query_block函数通过 join 表顺序和每张表的 table_map 属性以及 cond 条件的属性来决定 cond 条件添加到哪张表,并且可能会重新对表的索引进行 check 找出 cost 更低的索引，下面是代码解析。

bool JOIN::optimize() {  make_join_query_block();}
static bool make_join_query_block(JOIN *join, Item *cond) {  for (uint i = join->const_tables; i < join->tables; i++) {      // 这四个变量说明见表一      JOIN_TAB *const tab = join->best_ref[i];      const plan_idx first_inner = tab->first_inner();      const table_map used_tables = tab->prefix_tables();      const table_map current_map = tab->added_tables();      if (cond)        // 这里通过table_map属性决定了是否给这个表添加条件，见下面表二、表四和表五说明        tmp = make_cond_for_table(thd, cond, used_tables, current_map, false);     // 如果recheck_reason=true，这里需要重新做一次确认，找出cost最低的索引。见表六      if (recheck_reason)        test_if_order_by_key();        test_if_cheaper_ordering();        test_quick_select();      }        /* Add conditions added by add_not_null_conds(). */      if (and_conditions(&tmp, tab->condition())) return true;      if (join->attach_join_conditions(i)) return true;  }}// 条件添加基本原则是条件带有表列的添加到该表，但是如果属性不一致的话也不会添加，只会添加到最后一张表。具体解释见下面实际例子。

复制代码

表一：上面四个变量解释

表二：make_cond_for_table()动作

表三：is_expensive_processor()函数

表四：Item 的 table_map 属性

表五：表连接添加的属性

表六：表的索引是否要重新 check

三、实际例子说明

接下来看几个例子来说明上面的代码。

首先看一下最后确定的连接顺序，为 t1,t3,t2，因为条件不带有 RAND_TABLE_BIT 的 Item，因此最后是按照 cond 含有的列推送给对应表来实现的。

例子一：

greatsql> EXPLAIN SELECT * FROM t2,t1,t3 WHERE t1.c1=t2.cc1 AND t1.c2=t3.ccc1 OR t3.ccc1 IS NULL OR t3.ccc2 LIKE 'a%';+----+-------------+-------+------------+-------+-------------------+--------+---------+------+------+----------+---------------------------------------------------------+| id | select_type | table | partitions | type  | possible_keys     | key    | key_len | ref  | rows | filtered | Extra                                                   |+----+-------------+-------+------------+-------+-------------------+--------+---------+------+------+----------+---------------------------------------------------------+|  1 | SIMPLE      | t1    | NULL       | index | PRIMARY,idx1,idx2 | idx2   | 11      | NULL |    7 |   100.00 | Using index                                             ||  1 | SIMPLE      | t3    | NULL       | ALL   | idx3_1            | NULL   | NULL    | NULL |    5 |    48.80 | Using where; Using join buffer (hash join)              ||  1 | SIMPLE      | t2    | NULL       | index | PRIMARY           | idx2_1 | 5       | NULL |    5 |   100.00 | Using where; Using index; Using join buffer (hash join) |+----+-------------+-------+------------+-------+-------------------+--------+---------+------+------+----------+---------------------------------------------------------+

复制代码

表一：是否把 cond 条件推送给表

注：这里的中括号代表当前检测表的左连接表，中括号右边就是当前正在检测的表

表二：表的 table_map 值

注：这里的 INNER_TABLE_BIT 和 OUTER_REF_TABLE_BIT 在函数 JOIN::set_prefix_tables()默认加上了

看一下结果是否符合预期，确实如上表所述。这里看到又执行了一次test_quick_select()来确定走哪个索引。

"attaching_conditions_to_tables": {              "original_condition": "(((`t3`.`ccc1` = `t1`.`c2`) and (`t2`.`cc1` = `t1`.`c1`)) or (`t3`.`ccc1` is null) or (`t3`.`ccc2` like 'a%'))",              "attached_conditions_computation": [                {                  "table": "`t2`",                  "rechecking_index_usage": { 这里对索引重新做了一次check                    "recheck_reason": "not_first_table",                    "range_analysis": {                      "table_scan": {                        "rows": 5,                        "cost": 3.6                      },                      "potential_range_indexes": [                        {                          "index": "PRIMARY",                          "usable": true,                          "key_parts": [                            "cc1"                          ]                        },                        {                          "index": "idx2_1",                          "usable": false,                          "cause": "not_applicable"                        }                      ],                      "best_covering_index_scan": {                        "index": "idx2_1",                        "cost": 0.751098,                        "chosen": true                      },                      "setup_range_conditions": [                      ],                      "group_index_range": {                        "chosen": false,                        "cause": "not_single_table"                      },                      "skip_scan_range": {                        "chosen": false,                        "cause": "not_single_table"                      }                    }                  }                }              ],              "attached_conditions_summary": [                {                  "table": "`t1`",                  "attached": null                },                {                  "table": "`t3`",                  "attached": "((`t3`.`ccc1` = `t1`.`c2`) or (`t3`.`ccc1` is null) or (`t3`.`ccc2` like 'a%'))"                },                {                  "table": "`t2`",                  "attached": "(((`t3`.`ccc1` = `t1`.`c2`) and (`t2`.`cc1` = `t1`.`c1`)) or (`t3`.`ccc1` is null) or (`t3`.`ccc2` like 'a%'))"                }              ]            }          },          {            "finalizing_table_conditions": [              {                "table": "`t3`",                "original_table_condition": "((`t3`.`ccc1` = `t1`.`c2`) or (`t3`.`ccc1` is null) or (`t3`.`ccc2` like 'a%'))",                "final_table_condition   ": "((`t3`.`ccc1` = `t1`.`c2`) or (`t3`.`ccc1` is null) or (`t3`.`ccc2` like 'a%'))"              },              {                "table": "`t2`",                "original_table_condition": "(((`t3`.`ccc1` = `t1`.`c2`) and (`t2`.`cc1` = `t1`.`c1`)) or (`t3`.`ccc1` is null) or (`t3`.`ccc2` like 'a%'))",                "final_table_condition   ": "(((`t3`.`ccc1` = `t1`.`c2`) and (`t2`.`cc1` = `t1`.`c1`)) or (`t3`.`ccc1` is null) or (`t3`.`ccc2` like 'a%'))"              }            ]          },          {            "refine_plan": [              {                "table": "`t1`"              },              {                "table": "`t3`"              },              {                "table": "`t2`"              }            ]          }        ]      }    }

复制代码

如果条件带有 RAND_TABLE_BIT 的 Item，那么即使 cond 带有表的列，也不会推送给对应的表，而是推送到最后一张表。看下面的 t1.c1 < rand()这个条件。

例子二：

greatsql> SELECT * FROM t2,t1,t3 WHERE t1.c1=t2.cc1 AND t1.c2=t3.ccc1 OR t3.ccc1 IS NULL AND t1.c1 < rand();              "attached_conditions_summary": [                {                  "table": "`t1`",                  "attached": null                },                {                  "table": "`t3`",                  "attached": "((`t3`.`ccc1` = `t1`.`c2`) or (`t3`.`ccc1` is null))"                },                {                  "table": "`t2`",                  "attached": "(((`t3`.`ccc1` = `t1`.`c2`) and (`t2`.`cc1` = `t1`.`c1`)) or ((`t3`.`ccc1` is null) and (`t1`.`c1` < rand())))"  看到条件t1.c1 < rand()没有推送给t1而是推送到最后一张表t2去了                }              ]            }          },          {            "finalizing_table_conditions": [              {                "table": "`t3`",                "original_table_condition": "((`t3`.`ccc1` = `t1`.`c2`) or (`t3`.`ccc1` is null))",                "final_table_condition   ": "((`t3`.`ccc1` = `t1`.`c2`) or (`t3`.`ccc1` is null))"              },              {                "table": "`t2`",                "original_table_condition": "(((`t3`.`ccc1` = `t1`.`c2`) and (`t2`.`cc1` = `t1`.`c1`)) or ((`t3`.`ccc1` is null) and (`t1`.`c1` < rand())))",                "final_table_condition   ": "(((`t3`.`ccc1` = `t1`.`c2`) and (`t2`.`cc1` = `t1`.`c1`)) or ((`t3`.`ccc1` is null) and (`t1`.`c1` < rand())))"              }            ]          },          {            "refine_plan": [              {                "table": "`t1`"              },              {                "table": "`t3`"              },              {                "table": "`t2`"              }

复制代码

看一下每张表的属性：

四、总结

从上面优化器最早的步骤我们认识了make_join_query_block函数的作用，知道了通过 join 表顺序和每张表的 table_map 属性以及 cond 条件的属性来决定 cond 条件添加到哪张表,并且可能会重新对表的索引进行 check 找出 cost 更低的索引，需要注意的是有的带有表列的条件不会被添加到对应表，因为 Item 的属性跟表的属性不一致所以最后只会被添加到最后一张 join 表。

发布于: 刚刚阅读数: 4

原文链接:【http://xie.infoq.cn/article/0a1dc8648bde26fd68e1c258b】。文章转载请联系作者。

GreatSQL

关注

GreatSQL社区 2023-01-31 加入

GreatSQL是由万里数据库维护的MySQL分支，专注于提升MGR可靠性及性能，支持InnoDB并行查询特性，是适用于金融级应用的MySQL分支版本。社区：https://greatsql.cn/ Gitee: https://gitee.com/GreatSQL/GreatSQL

发布

暂无评论

创作场景

【GreatSQL 优化器 -09】make_join_query_block

【GreatSQL 优化器-09】make_join_query_block

一、make_join_query_block 介绍

二、make_join_query_block 代码解释

三、实际例子说明

四、总结

GreatSQL

评论