【GreatSQL 优化器 -10】find_best_ref

2025-01-10
福建
本文字数：12674 字
阅读完需：约 42 分钟

【GreatSQL 优化器-10】find_best_ref

一、find_best_ref 介绍

GreatSQL 的优化器对于 join 的表需要根据行数和 cost 来确定最后哪张表先执行哪张表后执行，这里面就涉及到预估满足条件的表数据，在 keyuse_array 数组有值的情况下，会用 find_best_ref 函数来通过索引进行 cost 和 rows 的估计，并且会找出最优的索引。这样就可能不会继续用后面的 calculate_scan_cost()进行全表扫描计算 cost，可以节省查询时间。

这个功能是对之前【优化器 05-条件过滤】的补充功能，二者有可能一起用，也有可能只选择一种计算，要看具体条件。

下面用一个简单的例子来说明 find_best_ref 函数获得的结果。

CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT,date1 DATETIME);INSERT INTO t1 VALUES (1,10,'2021-03-25 16:44:00.123456'),(2,1,'2022-03-26 16:44:00.123456'),(3,4,'2023-03-27 16:44:00.123456'),(5,5,'2024-03-25 16:44:00.123456'),(7,null,'2020-03-25 16:44:00.123456'),(8,10,'2020-10-25 16:44:00.123456'),(11,16,'2023-03-25 16:44:00.123456');CREATE TABLE t2 (cc1 INT PRIMARY KEY, cc2 INT);INSERT INTO t2 VALUES (1,3),(2,1),(3,2),(4,3),(5,15);CREATE TABLE t3 (ccc1 INT, ccc2 varchar(100));INSERT INTO t3 VALUES (1,'aa1'),(2,'bb1'),(3,'cc1'),(4,'dd1'),(null,'ee');CREATE INDEX idx1 ON t1(c2);CREATE INDEX idx2 ON t1(c2,date1);CREATE INDEX idx2_1 ON t2(cc2);CREATE INDEX idx3_1 ON t3(ccc1);
greatsql> SELECT * FROM t1 join t2 ON t1.c1=t2.cc1 and t1.c2<5;          {            "ref_optimizer_key_uses": [ 首先这里要有keyuse_array              {                "table": "`t1`",                "field": "c1",                "equals": "`t2`.`cc1`",                "null_rejecting": true              },              {                "table": "`t2`",                "field": "cc1",                "equals": "`t1`.`c1`",                "null_rejecting": true              }            ]          },            "considered_execution_plans": [              {                "plan_prefix": [                ],                "table": "`t1`",                "best_access_path": {                  "considered_access_paths": [                    {                      "access_type": "ref",                      "index": "PRIMARY",                      "usable": false,                      "chosen": false                    },                    {                      "rows_to_scan": 2,                      "filtering_effect": [                      ],                      "final_filtering_effect": 1,                      "access_type": "range",                      "range_details": {                        "used_index": "idx2"                      },                      "resulting_rows": 2,                      "cost": 0.660457,                      "chosen": true                    }                  ]                },                "condition_filtering_pct": 100,                "rows_for_plan": 2,                "cost_for_plan": 0.660457,                "rest_of_plan": [                  {                    "plan_prefix": [                      "`t1`"                    ],                    "table": "`t2`",                    "best_access_path": {                      "considered_access_paths": [                        {                          "access_type": "eq_ref",                          "index": "PRIMARY",                          "rows": 1, 这里就是通过find_best_ref()获得的结果                          "cost": 0.7, 这里就是通过find_best_ref()获得的结果                          "chosen": true,                          "cause": "clustered_pk_chosen_by_heuristics"                        },                        {                          "access_type": "scan",                          "chosen": false,                          "cause": "covering_index_better_than_full_scan"                        }                      ]                    },                    "condition_filtering_pct": 100,                    "rows_for_plan": 2,                    "cost_for_plan": 1.36046,                    "chosen": true                  }                ]              },              {                "plan_prefix": [                ],                "table": "`t2`",                "best_access_path": {                  "considered_access_paths": [                    {                      "access_type": "ref",                      "index": "PRIMARY",                      "usable": false,                      "chosen": false                    },                    {                      "rows_to_scan": 5,                      "filtering_effect": [                      ],                      "final_filtering_effect": 1,                      "access_type": "scan",                      "resulting_rows": 5,                      "cost": 0.75,                      "chosen": true                    }                  ]                },                "condition_filtering_pct": 100,                "rows_for_plan": 5,                "cost_for_plan": 0.75,                "rest_of_plan": [                  {                    "plan_prefix": [                      "`t2`"                    ],                    "table": "`t1`",                    "best_access_path": {                      "considered_access_paths": [                        {                          "access_type": "eq_ref",                          "index": "PRIMARY",                          "rows": 1, 这里就是通过find_best_ref()获得的结果                          "cost": 5.5, 这里就是通过find_best_ref()获得的结果                          "chosen": true,                          "cause": "clustered_pk_chosen_by_heuristics"                        },                        {                          "rows_to_scan": 2,                          "filtering_effect": [                          ],                          "final_filtering_effect": 1,                          "access_type": "range",                          "range_details": {                            "used_index": "idx2"                          },                          "resulting_rows": 2,                          "cost": 3.30229,                           "chosen": true                        }                      ]                    },                    "condition_filtering_pct": 100,                    "rows_for_plan": 10,                    "cost_for_plan": 4.05229,                    "pruned_by_cost": true                  }                ]              }            ]

复制代码

二、find_best_ref 代码解释

find_best_ref 的计算在 best_access_path 函数实现，用来在 keyuse_array 数组有值的时候，预估不同 join 连接时候的 join 表的读取行数和可能的 cost，以及找出 cost 最低的索引。

void Optimize_table_order::best_access_path(JOIN_TAB *tab,                                            const table_map remaining_tables,                                            const uint idx, bool disable_jbuf,                                            const double prefix_rowcount,                                            POSITION *pos) {  if (tab->keyuse() != nullptr &&      (table->file->ha_table_flags() & HA_NO_INDEX_ACCESS) == 0 &&      (!table->set_counter()))    best_ref =        find_best_ref(tab, remaining_tables, idx, prefix_rowcount,                      &found_condition, &ref_depend_map, &used_key_parts);    接着会进行一系列判断来决定是否要继续用calculate_scan_cost()方式来进行估计cost，然后二者进行比较选取更小的cost。  具体见下面<<采用的rows和cost结果>>表  两个方式都会使用calculate_condition_filter()计算条件过滤百分比。}
Key_use *Optimize_table_order::find_best_ref() {  ha_rows distinct_keys_est = tab->records() / MATCHING_ROWS_IN_OTHER_TABLE;  // 遍历keyuse_array数组，按照索引进行计算行数和cost，选取最优的索引  for (Key_use *keyuse = tab->keyuse(); keyuse->table_ref == tab->table_ref;) {   如果是第一张表就跳过，因为第一张表不需要用索引扫描。   主要计算得出下面<<find_best_ref函数获得的结果>>表里的三个变量结果，用于找出最优索引。  }}

复制代码

表：采用的 rows 和 cost 结果

表：find_best_ref 函数获得的结果

表一：cur_fanout 计算方式

表二：cur_read_cost 计算方式

表三：distinct_keys_est 计算方式

注：10 是 MATCHING_ROWS_IN_OTHER_TABLE

表四：keyuse->ref_table_rows 计算方式

注：通过 JOIN::optimize_keyuse()函数获得

表五：索引类型选择优先级

三、实际例子说明

接下来看一开始的例子来说明上面的代码：

greatsql> SELECT * FROM t1 JOIN t2 ON t1.c1=t2.cc1 AND t1.c2<5;          {            "ref_optimizer_key_uses": [              {                "table": "`t1`",                "field": "c1",                "equals": "`t2`.`cc1`",                "null_rejecting": true              },              {                "table": "`t2`",                "field": "cc1",                "equals": "`t1`.`c1`",                "null_rejecting": true              }            ]          },            "considered_execution_plans": [              {                "plan_prefix": [                ],                "table": "`t1`",                "best_access_path": {                  "considered_access_paths": [                    {                      "access_type": "ref", 第一张表不考虑用ref扫描                      "index": "PRIMARY",                      "usable": false,                      "chosen": false                    },                    {                      "rows_to_scan": 2, 用非best_ref方式扫描                      "filtering_effect": [                      ],                      "final_filtering_effect": 1,                      "access_type": "range",                      "range_details": {                        "used_index": "idx2"                      },                      "resulting_rows": 2,                      "cost": 0.660457,                      "chosen": true                    }                  ]                },                "condition_filtering_pct": 100,                "rows_for_plan": 2,                "cost_for_plan": 0.660457,                "rest_of_plan": [                  {                    "plan_prefix": [                      "`t1`"                    ],                    "table": "`t2`",                    "best_access_path": {                      "considered_access_paths": [                        {                          "access_type": "eq_ref",                          "index": "PRIMARY",                          "rows": 1,  eq_ref扫描固定只有一行                          "cost": 0.7,  计算公式=0.5(cur_read_cost) + 2(上一张表的行数) * 1(这张表行数) * 0.1                          "chosen": true,                          "cause": "clustered_pk_chosen_by_heuristics"                        },                        {                          "access_type": "scan", 因为t2没有mm tree因此只能选择全表扫描，又因为之前的索引扫描更优，所以这里不选择全表扫描                          "chosen": false,                          "cause": "covering_index_better_than_full_scan"                        }                      ]                    },                    "condition_filtering_pct": 100,                    "rows_for_plan": 2,                    "cost_for_plan": 1.36046,                    "chosen": true                  }                ]              },              {                "plan_prefix": [                ],                "table": "`t2`",                "best_access_path": {                  "considered_access_paths": [                    {                      "access_type": "ref", 第一张表不考虑用ref扫描                      "index": "PRIMARY",                      "usable": false,                      "chosen": false                    },                    {                      "rows_to_scan": 5,                      "filtering_effect": [                      ],                      "final_filtering_effect": 1,                      "access_type": "scan",                      "resulting_rows": 5,                      "cost": 0.75,                      "chosen": true                    }                  ]                },                "condition_filtering_pct": 100,                "rows_for_plan": 5,                "cost_for_plan": 0.75,                "rest_of_plan": [                  {                    "plan_prefix": [                      "`t2`"                    ],                    "table": "`t1`",                    "best_access_path": {                      "considered_access_paths": [                        {                          "access_type": "eq_ref",                          "index": "PRIMARY",                          "rows": 1, eq_ref扫描固定只有一行                          "cost": 5.5, 计算公式=5(cur_read_cost) + 5(上一张表的行数) * 1(这张表行数) * 0.1                          "chosen": true,                          "cause": "clustered_pk_chosen_by_heuristics"                        },                        {                          "rows_to_scan": 2,                          "filtering_effect": [                          ],                          "final_filtering_effect": 1,                          "access_type": "range", 这里因为t1表有mm tree但是扫描方式不是ROWID_INTERSECTION所以还要继续用calculate_scan_cost来估计cost，然后跟上面用eq_ref方式扫描方式进行对比，看哪个更优                          "range_details": {                            "used_index": "idx2"                          },                          "resulting_rows": 2,                          "cost": 3.30229, 因为这里的cost更低，因此舍弃上面的eq_ref扫描方式改为range扫描方式                          "chosen": true                        }                      ]                    },                    "condition_filtering_pct": 100,                    "rows_for_plan": 10,                    "cost_for_plan": 4.05229,                    "pruned_by_cost": true                  }                ]              }            ]

复制代码

再看一个三张表连接的例子，可以看到 eq_ref 扫描方式也要估算条件过滤百分比，并且第一张表不用索引扫描，也就不需要执行 find_best_ref 函数。

greatsql> SELECT * FROM t1 JOIN t2 JOIN t3 ON t1.c1=t2.cc1 AND t3.ccc1=t2.cc1 AND t1.c2<15 AND t1.c2=10;         {            "considered_execution_plans": [              {                "plan_prefix": [                ],                "table": "`t1`",                "best_access_path": {                  "considered_access_paths": [                    {                      "access_type": "ref",                      "index": "PRIMARY",                      "usable": false,                      "chosen": false                    },                    {                      "access_type": "ref",                      "index": "idx1",                      "rows": 2,                      "cost": 0.7,                      "chosen": true                    },                    {                      "access_type": "ref",                      "index": "idx2",                      "rows": 2,                      "cost": 0.450457,                      "chosen": true                    },                    {                      "access_type": "range",                      "range_details": {                        "used_index": "idx2"                      },                      "chosen": false,                      "cause": "heuristic_index_cheaper"                    }                  ]                },                "condition_filtering_pct": 100,                "rows_for_plan": 2,                "cost_for_plan": 0.450457,                "rest_of_plan": [                  {                    "plan_prefix": [                      "`t1`"                    ],                    "table": "`t2`",                    "best_access_path": {                      "considered_access_paths": [                        {                          "access_type": "eq_ref",                          "index": "PRIMARY",                          "rows": 1,                          "cost": 0.7,                          "chosen": true,                          "cause": "clustered_pk_chosen_by_heuristics"                        },                        {                          "access_type": "scan",                          "chosen": false,                          "cause": "covering_index_better_than_full_scan"                        }                      ]                    },                    "condition_filtering_pct": 100,                    "rows_for_plan": 2,                    "cost_for_plan": 1.15046,                    "rest_of_plan": [                      {                        "plan_prefix": [                          "`t1`",                          "`t2`"                        ],                        "table": "`t3`",                        "best_access_path": {                          "considered_access_paths": [                            {                              "access_type": "ref",                              "index": "idx3_1",                              "rows": 1,                              "cost": 0.7,                              "chosen": true                            },                            {                              "rows_to_scan": 5,                              "filtering_effect": [                              ],                              "final_filtering_effect": 1,                              "access_type": "scan",                              "using_join_cache": true,                              "buffers_needed": 1,                              "resulting_rows": 5,                              "cost": 1.25004,                              "chosen": false                            }                          ]                        },                        "added_to_eq_ref_extension": true,                        "condition_filtering_pct": 100,                        "rows_for_plan": 2,                        "cost_for_plan": 1.85046,                        "chosen": true                      }                    ]                  }                ]              },              {                "plan_prefix": [                ],                "table": "`t2`",                "best_access_path": {                  "considered_access_paths": [                    {                      "access_type": "ref",                      "index": "PRIMARY",                      "usable": false,                      "chosen": false                    },                    {                      "rows_to_scan": 5,                      "filtering_effect": [                      ],                      "final_filtering_effect": 1,                      "access_type": "scan",                      "resulting_rows": 5,                      "cost": 0.75,                      "chosen": true                    }                  ]                },                "condition_filtering_pct": 100,                "rows_for_plan": 5,                "cost_for_plan": 0.75,                "rest_of_plan": [                  {                    "plan_prefix": [                      "`t2`"                    ],                    "table": "`t1`",                    "best_access_path": {                      "considered_access_paths": [                        {                          "access_type": "eq_ref",                          "index": "PRIMARY",                          "rows": 1,                          "cost": 1.75,                          "chosen": true,                          "cause": "clustered_pk_chosen_by_heuristics"                        },                        {                          "rows_to_scan": 2,                          "filtering_effect": [                          ],                          "final_filtering_effect": 1,                          "access_type": "range",                          "range_details": {                            "used_index": "idx2"                          },                          "resulting_rows": 2,                          "cost": 3.30229,                          "chosen": false                        }                      ]                    },                    "condition_filtering_pct": 28.5714, 这里看到eq_ref扫描方式也要估算条件过滤百分比                    "rows_for_plan": 1.42857,                    "cost_for_plan": 2.5,                    "pruned_by_cost": true                  },                  {                    "plan_prefix": [                      "`t2`"                    ],                    "table": "`t3`",                    "best_access_path": {                      "considered_access_paths": [                        {                          "access_type": "ref",                          "index": "idx3_1",                          "rows": 1,                          "cost": 1.75,                          "chosen": true                        },                        {                          "rows_to_scan": 5,                          "filtering_effect": [                          ],                          "final_filtering_effect": 1,                          "access_type": "scan",                          "using_join_cache": true,                          "buffers_needed": 1,                          "resulting_rows": 5,                          "cost": 2.75004,                          "chosen": false                        }                      ]                    },                    "condition_filtering_pct": 100,                    "rows_for_plan": 5,                    "cost_for_plan": 2.5,                    "pruned_by_cost": true                  }                ]              },              {                "plan_prefix": [                ],                "table": "`t3`",                "best_access_path": {                  "considered_access_paths": [                    {                      "access_type": "ref",                      "index": "idx3_1",                      "usable": false,                      "chosen": false                    },                    {                      "rows_to_scan": 5,                      "filtering_effect": [                      ],                      "final_filtering_effect": 1,                      "access_type": "scan",                      "resulting_rows": 5,                      "cost": 0.75,                      "chosen": true                    }                  ]                },                "condition_filtering_pct": 100,                "rows_for_plan": 5,                "cost_for_plan": 0.75,                "rest_of_plan": [                  {                    "plan_prefix": [                      "`t3`"                    ],                    "table": "`t1`",                    "best_access_path": {                      "considered_access_paths": [                        {                          "access_type": "eq_ref",                          "index": "PRIMARY",                          "rows": 1,                          "cost": 1.75,                          "chosen": true,                          "cause": "clustered_pk_chosen_by_heuristics"                        },                        {                          "rows_to_scan": 2,                          "filtering_effect": [                          ],                          "final_filtering_effect": 1,                          "access_type": "range",                          "range_details": {                            "used_index": "idx2"                          },                          "resulting_rows": 2,                          "cost": 3.30229,                          "chosen": false                        }                      ]                    },                    "condition_filtering_pct": 28.5714,                    "rows_for_plan": 1.42857,                    "cost_for_plan": 2.5,                    "pruned_by_cost": true                  },                  {                    "plan_prefix": [                      "`t3`"                    ],                    "table": "`t2`",                    "best_access_path": {                      "considered_access_paths": [                        {                          "access_type": "eq_ref",                          "index": "PRIMARY",                          "rows": 1,                          "cost": 1.75,                          "chosen": true,                          "cause": "clustered_pk_chosen_by_heuristics"                        },                        {                          "access_type": "scan",                          "chosen": false,                          "cause": "covering_index_better_than_full_scan"                        }                      ]                    },                    "condition_filtering_pct": 100,                    "rows_for_plan": 5,                    "cost_for_plan": 2.5,                    "pruned_by_cost": true                  }                ]              }            ]          },

复制代码

四、总结

从上面优化器的步骤我们认识了find_best_ref()进行 rows 和 cost 估计的过程，以及知道了使用这个结果的条件。需要注意的是，只有"ref_optimizer_key_uses"项目有值的情况才有可能使用这个结果。

发布于: 刚刚阅读数: 3

原文链接:【http://xie.infoq.cn/article/1d63cc7af54714a0013701ae7】。文章转载请联系作者。

GreatSQL

关注

GreatSQL社区 2023-01-31 加入

GreatSQL是由万里数据库维护的MySQL分支，专注于提升MGR可靠性及性能，支持InnoDB并行查询特性，是适用于金融级应用的MySQL分支版本。社区：https://greatsql.cn/ Gitee: https://gitee.com/GreatSQL/GreatSQL

发布

暂无评论

创作场景

【GreatSQL 优化器 -10】find_best_ref

【GreatSQL 优化器-10】find_best_ref

一、find_best_ref 介绍

二、find_best_ref 代码解释

三、实际例子说明

四、总结

GreatSQL

评论