MySQL8.0 优化器介绍（三）

作者：GreatSQL

2023-04-19
福建
本文字数：8178 字
阅读完需：约 27 分钟

GreatSQL 社区原创内容未经授权不得随意使用，转载请联系小编并注明来源。
GreatSQL 是 MySQL 的国产分支版本，使用上与 MySQL 一致。
作者：奥特曼爱小怪兽
文章来源：GreatSQL 社区原创

往期回顾

MySQL8.0 优化器介绍（一）

MySQL8.0 优化器介绍（二）

本篇将进一步深入介绍优化器相关的 join 优化

为更好的理解本篇内容需要提前看一下以下内容：

单表访问的方法，参考《MySQL 是怎样运行的：从根儿上理解 MySQL》第 10 章"单表访问方法"
更多 select 语句级别的优化细节见（https://dev.mysql.com/doc/refman/8.0/en/select-optimization.html）

为了让读者对 join 优化有更深的了解，章节里的 sql 例子，留了一些思考和动手的问题。可能大家得到的答案会不同，但探索未知的过程，方式应该是一样的。

join 优化（Join Optimizations）

MySQL 可以使用 Join Optimizations 来改进上次分享过的 join algorithms，或者决定如何执行部分查询。本次主要介绍三种经常用到的 join Optimizations，更多的 join type 见下面的链接：（https://dev.mysql.com/doc/refman/8.0/en/explain-output.html#explain-join-types）

index merge

通常 MySQL 只会对每个表使用一个索引。但是，如果对同一表中的多个列在 where 后有条件限制，并且没有覆盖所有列的单个索引，无论选哪个索引都不是最佳的。对于这些情况，MySQL 支持索引合并 (index merge)。select a,b,c from t where a=x1 and b=x2 and c=x3,这种情况下，建立一个多列的复合索引 index_abc 比使用 index merge +index_a+index_b+index_c 性能更好。

Index merge 支持三种算法见下表

查询计划使用 index merge 时，会在 explain sql 的 access type 列有"index_merge"，key 列会包含所有参与 merge 的列， key_length 包含一个所用索引的最长关键部分的列表。举个 Intersection 例子：

Intersection

以下代码块注释中提到的知识点略多

##无论optimizer 是否选择  index merge 取决于index  statistics. ## index  statistics 是从哪个试图获得呢？mysql.innodb_index_stats  还是 information_schema.statistics ## 还是 information_schema.INNODB_SYS_TABLESTATS？ ## 可以参考 https://www.cnblogs.com/ClassicMan/articles/15871403.html## index_dive  eq_range_index_dive_limit 这两个参数有什么作用？##意味着即使返回相同STATEMENT_DIGEST_TEXT的sql查询语句, WHERE语句后面跟不同的值，得到的查询计划可能是不一样的  ##比如select * from people where name='唯一值'；##select * from people where name='超级多的重复值'
## 同理index statistics 的改变会让同一个查询走不同的执行计划，## 体现在 select a,b from t where a=1 and b=1  有时走了 index merges，有时没走。
CREATE TABLE `payment` ( `payment_id` smallint unsigned NOT NULL, `customer_id` smallint unsigned NOT NULL, `staff_id` tinyint unsigned NOT NULL, `rental_id` int(DEFAULT NULL, `amount` decimal(5,2) NOT NULL, `payment_date` datetime NOT NULL, `last_update` timestamp NULL, PRIMARY KEY (`payment_id`), KEY `idx_fk_staff_id` (`staff_id`), KEY `idx_fk_customer_id` (`customer_id`), KEY `fk_payment_rental` (`rental_id`)) ENGINE=InnoDB;
## case1  等值查询SELECT *  FROM sakila.payment WHERE staff_id = 1   AND customer_id = 75;  mysql> EXPLAIN SELECT *                  FROM sakila.payment                 WHERE staff_id = 1                   AND customer_id = 75\G**************************** 1. row ***************************** id: 1 select_type: SIMPLE table: payment partitions: NULL type: index_merge possible_keys: idx_fk_staff_id,idx_fk_customer_id key: idx_fk_customer_id,idx_fk_staff_id key_len: 2,1 ref: NULL rows: 20 filtered: 100 Extra: Using intersect(idx_fk_customer_id,idx_fk_staff_id); Using where 1 row in set, 1 warning (0.0007 sec) mysql> EXPLAIN FORMAT=TREE SELECT * FROM sakila.payment WHERE staff_id = 1 AND customer_id = 75\G**************************** 1. row ****************************EXPLAIN: -> Filter: ((sakila.payment.customer_id = 75) and (sakila.payment.staff_id = 1)) (cost=14.48 rows=20)             -> Index range scan on payment using intersect(idx_fk_customer_id,idx_fk_staff_id) (cost=14.48 rows=20)1 row in set (0.0004 sec)
##注意"Index range scan on payment",两个等值查询条件，为啥触发了rang scan？

## case2  下面的sql范围查询也能用到index  merge 吗？执行计划 自己下去测试验证SELECT * FROM sakila.payment WHERE payment_id > 10 AND customer_id = 318;

复制代码

Union Algorithm

##case1 等值查询SELECT * FROM sakila.payment WHERE staff_id = 1 OR customer_id = 318;  mysql> EXPLAIN SELECT * FROM sakila.payment WHERE staff_id = 1 OR customer_id = 318\G**************************** 1. row ***************************** id: 1 select_type: SIMPLE table: payment partitions: NULL type: index_merge possible_keys: idx_fk_staff_id,idx_fk_customer_id key: idx_fk_staff_id,idx_fk_customer_id key_len: 1,2 ref: NULL rows: 8069 filtered: 100 Extra: Using union(idx_fk_staff_id,idx_fk_customer_id); Using where1 row in set, 1 warning (0.0008 sec)  mysql> EXPLAIN FORMAT=TREE SELECT * FROM sakila.payment WHERE staff_id = 1 OR customer_id = 318\G**************************** 1. row ****************************EXPLAIN: -> Filter: ((sakila.payment.staff_id = 1) or (sakila.payment.customer_id = 318)) (cost=2236.18 rows=8069)             -> Index range scan on payment using union(idx_fk_staff_id,idx_fk_customer_id) (cost=2236.18 rows=8069)1 row in set (0.0010 sec)
 ## case2 范围查询也能用到index  merge 吗？执行计划 自己下去测试验证，  ## 有主键参与后，和Intersection 章节的case2 执行计划中用到的索引个数有啥不同？SELECT * FROM sakila.payment WHERE payment_id > 15000 OR customer_id = 318;

复制代码

Sort-Union Algorithm

SELECT * FROM sakila.payment WHERE customer_id < 30 OR rental_id < 10;  mysql> EXPLAIN SELECT * FROM sakila.payment WHERE customer_id < 30 OR rental_id < 10\G**************************** 1. row ***************************** id: 1 select_type: SIMPLE table: payment partitions: NULL type: index_mergepossible_keys: idx_fk_customer_id,fk_payment_rental key: idx_fk_customer_id,fk_payment_rental key_len: 2,5 ref: NULL rows: 826 filtered: 100 Extra: Using sort_union(idx_fk_customer_id,fk_payment_rental); Using where 1 row in set, 1 warning (0.0009 sec)
mysql> EXPLAIN FORMAT=TREE SELECT * FROM sakila.payment WHERE customer_id < 30 OR rental_id < 10\G**************************** 1. row *****************************EXPLAIN: -> Filter: ((sakila.payment.customer_id < 30) or (sakila.payment.rental_id < 10)) (cost=1040.52 rows=826)             -> Index range scan on payment using sort_union(idx_fk_customer_id,fk_payment_rental) (cost=1040.52 rows=826)1 row in set (0.0005 sec)

复制代码

Multi-Range Read (MRR)

多范围读取（MRR）优化旨在减少对辅助索引进行范围扫描所导致的随机 I/O 量。优化读取索引

首先，根据行 id（InnoDB 的聚集索引）对键进行排序，然后按行的存储顺序检索行。多量程读取优化

可以用于范围扫描和使用索引的等值连接。不支持虚拟生成列上的辅助索引。

使用 InnoDB 进行多范围读取优化的主要用例是用于没有覆盖索引的磁盘绑定查询（ disk-bound queries 另外一个层面对 disk-bound 的优化，详细可见：https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-diskio.html）。优化的效果取决于需要多少行以及存储器的查找时间。MySQL 将会估算（estimate）是否有用。然而，成本估算在过于悲观而不是过于乐观的一面，因此可能有必要提供帮助优化器做出正确决策的信息。

有两个 optimizer switches 控制 MRR 优化

mrr: Whether the optimizer is allowed to use the Multi-Range Read optimization. The default is ON.
mrr_cost_based: Whether the decision to use the Multi-Range Read optimization is cost based. You can disable this option to always use the optimization when it is supported. The default is ON

可以用 MRR() 和 NO_MRR() 两个 optimizer switches 来控制表级别 or 索引级别的 Multi-Range Read ，举个例子：

mysql> EXPLAIN SELECT /*+ MRR(city) */ * FROM world.city WHERE CountryCode BETWEEN 'AUS' AND 'CHN'\G**************************** 1. row ***************************** id: 1 select_type: SIMPLE table: city partitions: NULL type: rangepossible_keys: CountryCode key: CountryCode key_len: 3 ref: NULL rows: 812 filtered: 100 Extra: Using index condition; Using MRR1 row in set, 1 warning (0.0006 sec)

有必要使用MRR（）优化器提示或禁用基于MRR_cost_based的优化器开关。示例中查询的估计行数太小，没有MRR的hint时，基于成本的优化无法使用MRR。只能显示用hint来干预查询计划使用MRR。当MRR的优化被使用时, MySQL需要用到random read buffer来存储indexes. 有一个参数可以影响MRR的性能 read_rnd_buffer_size.

复制代码

Batched Key Access (BKA)

可以简单认为 BKA=BNL+MRR .这使得可以以与非索引连接类似的方式将连接缓冲区用于索引连接，并使用多范围读取优化来减少随机 I/O 的数量。BKA 用于大量 disk-bound 查询的场景。但是，没有明确的说明来确定优化何时有帮助，何时会导致性能下降。

可以借鉴一下国外知名 dba 在 MySQL 优化方面的 blog（http://oysteing.blogspot.com/2012/04/improved-dbt-3-results-with-mysql-565.html）

MRR 在最优时，查询耗时减少 20%,最糟糕时查询耗时增加 2/3。

BKA 主要的获益在一个相对较窄的查询范围，而其他查询的性能可能会降低，因此默认情况下禁用该优化。

（可以简单理解为 MySQL5.6 时，bka 优化带来的收益小于 bka 带来的成本开销）除非确定开启 bka 能来提升时，再用 hint BKA() 来启用。session 级别开启：

SET SESSION optimizer_switch = 'mrr=on,mrr_cost_based=off,batched_key_access=on';

复制代码

一个使用 BKA 的例子

mysql> EXPLAIN SELECT /*+ BKA(ci) */ co.Code, co.Name AS Country, ci.Name AS City FROM world.country co INNER JOIN world.city ci ON ci.CountryCode = co.Code\G**************************** 1. row ***************************** id: 1 select_type: SIMPLE table: co partitions: NULL type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 239 filtered: 100 Extra: NULL**************************** 2. row ***************************** id: 1 select_type: SIMPLE table: ci partitions: NULL type: ref possible_keys: CountryCode key: CountryCode key_len: 3 ref: world.co.Code rows: 18 filtered: 100 Extra: Using join buffer (Batched Key Access)2 rows in set, 1 warning (0.0007 sec)

复制代码

注意看执行计划中 Extra 的关键字 "Using join buffer",说明 join_buffer_size 会影响 BKA 特性的性能。从全局怎么调整 join_buffer_size，并又能充分利用上 BKA，是一个极大的挑战。调优最常见的问题，搞定了 A sql,又引出了其他问题，比如内存使用率过高。

其他 join 优化

MySQL 还自动支持其他 join 优化，一旦对查询有性能帮助，优化器会自动选择他们，一般不需要手动。

了解一下其他 join 的优化方式，有助于我们在遇到 sql 性能问题时，可以适当给与优化器，一些有用的 hint。

具体有哪些 join 优化方式，可以查看 explain 输出中的 Extra 的内容说明。本文可能列举的不全，精力有限只做了一些简单的介绍，具体细节需要查看官网，以及大量的实践。

Condition Filtering 条件过滤当一个表有两个或多个与之相关联的条件，并且一个索引可以用于部分条件时，使用条件过滤优化。启用条件过滤后，在估计表的总体过滤时，将考虑其余条件的过滤效果。
Optimizer Switch: condition_fanout_filter – enabled by default
Optimizer Hints: None
EXPLAIN Output: None
Derived Merge 优化器可以将派生表(derived table)、视图引用和公共表表达式合并到它们所属的查询块中。优化的替代方法是物化表（materialize the table）、视图引用或公共表表达式。
Optimizer Switch: derived_merge – enabled by default.
Optimizer Hints: MERGE(), NO_MERGE().
EXPLAIN Output: The query plan reflects that the derived table has been merged
Engine Condition Pushdown 此优化将条件向下推到存储引擎。目前仅 NDBCluster 存储引擎支持它。
Index Condition Pushdown
官方文档中给的例子和解释如下：people 表中（zipcode，lastname，firstname）构成一个索引 SELECT * FROM people WHERE zipcode='95054' AND lastname LIKE '%etrunia%' AND address LIKE '%Main Street%';
如果没有使用索引下推技术，则 MySQL 会通过 zipcode='95054'从存储引擎中查询对应的数据，返回到 MySQL 服务端，然后 MySQL 服务端基于 lastname LIKE '%etrunia%'和 address LIKE '%Main Street%'来判断数据是否符合条件。
如果使用了索引下推技术，则 MySQL 首先会返回符合 zipcode='95054'的索引，然后根据 lastname LIKE '%etrunia%'和 address LIKE '%Main Street%'来判断索引是否符合条件。如果符合条件，则根据该索引来定位对应的数据，如果不符合，则直接 reject 掉。
有了索引下推优化，可以在有 like 条件查询的情况下，减少回表次数。
该优化也用于二级索引的范围条件。
Optimizer Switch: index_condition_pushdown – enabled by default.
Optimizer Hints: NO_ICP().
EXPLAIN Output: The traditional format has Using index condition in the Extra column, and the JSON format sets the index_condition field with the index condition that is pushed
Index Extensions InnoDB 中的所有二级非唯一索引都将主键列附加到索引中。当启用索引扩展优化时，MySQL 会将主键列视为索引的一部分。
Optimizer Switch: use_index_extensions – enabled by default
Optimizer Hints: None
EXPLAIN Output: None
Index Visibility 当表具有不可见的索引（ invisible index）时，默认情况下，优化器在创建查询计划时不会考虑它。如果启用了索引可见性优化器开关，则将考虑不可见的索引。例如，这可以用于测试已添加但尚未可见的索引的效果。
Optimizer Switch: use_invisible_indexes – disabled by default
Optimizer Hints: None
EXPLAIN Output: None
Loose Index Scan 在某些情况下，MySQL 可以使用部分索引来提高聚合数据或包含 DISTINCT 子句的查询的性能。这要求列用于通过形成多列索引的左前缀以及不用于分组的其他列来分组数据。当存在 GROUP BY 子句时，只有 MIN（）和 MAX（）聚合函数才能使用这个特性。
distinct 效率更高还是 group by 效率更高？
Optimizer Switch: None.
Optimizer Hints: NO_RANGE_OPTIMIZATION() disables the loose index scan optimization as well as index merges and range scans.
EXPLAIN Output: The traditional format has Using index for group-by in the Extra column. The JSON format sets the using_index_for_group_by field to true.
Range Access Method 范围优化与其他优化略有不同，因为它被认为是一种访问方法。MySQL 将只扫描表或索引的一个或多个部分，而不是执行完整的表或索引扫描。范围访问方法通常用于涉及运算符>、>=、<、=<、BETWEEN、IN（），为 NULL、LIKE 等 range-Optimization 与 index merge Optimization 具有同样的重要性。（https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html）
Optimizer Switch: None.
Optimizer Hints: NO_RANGE_OPTIMIZATION() – this also disables the loose index scan and index merge optimizations. It does however not disable the skip scan optimization even though that also uses range access.
EXPLAIN Output: The access method is set to range. range_optimizer_max_mem_size 可以限制 range access 使用的内存。默认是 8M
Semijoin 半联接优化用于 IN 和 EXIST 条件。支持四个策略:
当 subquery materialization 开启时， Semijoin 会尽可能的使用 materialization 策略。EXISTS 在 MySQL8.0.16 以后支持半联接。NOT EXISTS 在 MySQL8.0.17 以后支持半联接。每种策略，都可以以参数的形式，用于 SEMIJOIN() and NO_SEMIJOIN() hint
SEMIJOIN(DUPSWEEDOUT):The duplicate weedout strategy executes the semijoin as if it is a normal join and removes the duplicates using a temporary table. EXPLAIN Output: The traditional format has Start temporary and End temporary in the Extra column for the tables involved. The JSON-formatted output uses a block named duplicates_removal
SEMIJOIN(FIRSTMATCH)：The first match strategy returns the first match for each value rather than all values. EXPLAIN Output: The traditional format has FirstMatch(...) in the Extra column where the value between parentheses is the name of the reference table. The JSON format sets the value of the first_match field to the name of the reference table
SEMIJOIN(LOOSESCAN)：The loose scan strategy uses an index to choose a single value from each of the subquery’s value groups. EXPLAIN Output: The traditional format has LooseScan(m..n) in the Extra column where m and n indicate which parts of the index are used for the loose scan. The JSON format sets the loosescan field equal to true
半连接特别怕 null 值，Oracle 经常在以下方面出问题：
where null in (a,b,c,null), null exists (null) .
sum（null) 返回 null,count(null) 返回 0
materialization
duplicate weedout
first match
loose scan (不要和 loose index scan optimization 混淆)。
Skip Scan
Skip Scan MySQL 8.0.13 引入，工作方式类似 loose index scan.当多列索引的第二列上存在范围条件，但第一列上没有条件时使用。Skip Scan 将整个索引扫描转换为一系列范围扫描（对索引中第一列的每个值进行一次范围扫描）。
scan in the Extra column, and the JSON format sets the using_index_for_skip_scan field to true
Optimizer Switch: skip_scan – enabled by default.
Optimizer Hints: SKIP_SCAN(), NO_SKIP_SCAN().
EXPLAIN Output: The traditional format has Using index for skip
Subquery Materialization
子查询物化策略将子查询的结果存储在内部临时表中。如果可能的话，优化器将在临时表上添加一个自动生成的哈希索引，将使其快速连接到查询的其余部分。
当启用了 subquery_materialization_cost_based 优化器开关（默认开）时，优化器将使用估计的成本来决定是使用 Subquery Materialization 还是使用 IN-to-EXIST 子查询转换（将 IN 条件重写为 EXISTS）。
当开关关闭时，优化器总是选择 Subquery Materialization。
Optimizer Switch: materialization – enabled by default.
Optimizer Hints: SUBQUERY(MATERIALIZATION).
EXPLAIN Output: The traditional format has MATERIALIZED as the select type. The JSON format creates a block named materialized_from_subquery.

还有我们可以用哪些方法影响优化器，下篇文章再续。

发布于: 刚刚阅读数: 3

GreatSQL

关注

GreatSQL社区 2023-01-31 加入

GreatSQL是由万里数据库维护的MySQL分支，专注于提升MGR可靠性及性能，支持InnoDB并行查询特性，是适用于金融级应用的MySQL分支版本。社区：https://greatsql.cn/ Gitee: https://gitee.com/GreatSQL/GreatSQL

发布

暂无评论

创作场景

MySQL8.0 优化器介绍（三）

join 优化（Join Optimizations）

index merge

Intersection

Union Algorithm

Sort-Union Algorithm

Multi-Range Read (MRR)

Batched Key Access (BKA)

其他 join 优化

GreatSQL

评论