cost 量化分析
- 2023-03-09 福建
本文字数:23662 字
阅读完需:约 78 分钟
GreatSQL 社区原创内容未经授权不得随意使用,转载请联系小编并注明来源。
GreatSQL 是 MySQL 的国产分支版本,使用上与 MySQL 一致。
作者: xryz
文章来源:GreatSQL 社区原创
前言:
我们在日常维护数据库的时候,经常会遇到查询慢的语句,这时候一般会通过执行 EXPLAIN 去查看它的执行计划,但是执行计划往往只给我们带来了最基础的分析信息,比如是否有使用索引,还有一些其他供我们分析的信息,比如使用了临时表、排序等等,却无法展示为什么一些其他的执行计划未被选择,比如说明明有索引,或者好几个索引,但是为什么查询时未使用到期望的索引等
explain select * from basic_person_info t1 join basic_person_info2 t2 on t1.id_num=t2.id_num where t1.age >10 and t2.age<20;
+----+-------------+-------+------------+--------+--------------------------------------+---------------+---------+----------------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------------------------+---------------+---------+----------------+------+----------+-----------------------+
| 1 | SIMPLE | t2 | NULL | range | id_num_unique,idx_age,idx_age_id_num | idx_age | 1 | NULL | 9594 | 100.00 | Using index condition |
| 1 | SIMPLE | t1 | NULL | eq_ref | id_num_unique,idx_age | id_num_unique | 60 | test.t2.id_num | 1 | 50.00 | Using where |
+----+-------------+-------+------------+--------+--------------------------------------+---------------+---------+----------------+------+----------+-----------------------+
2 rows in set, 1 warning (0.01 sec)
如上面这个例子,为什么 t2 表上列出了多个可能使用的索引,却选择了 idx_age,优化器为什么选择了指定的索引,这时候并不能直观的看出问题,这时候我们就可以开启 optimizer_trace 跟踪分析 MySQL 具体是怎么选择出最优的执行计划的。
OPTIMIZER_TRACE:
optimizer_trace 是什么:
optimizer_trace 是一个具有跟踪功能的工具,可以跟踪执行的语句的解析优化执行过程,并将跟踪到的信息记录到 INFORMATION_SCHEMA.OPTIMIZER_TRACE 表中,但是每个会话都只能跟踪它自己执行的语句,并且表中默认只记录最后一个查询的跟踪结果
使用方法:
# 打开optimizer trace功能 (默认情况下它是关闭的):
set optimizer_trace="enabled=on";
select ...; # 这里输入你自己的查询语句
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
# 当你停止查看语句的优化过程时,把optimizer trace功能关闭
set optimizer_trace="enabled=off";
相关参数:
mysql> show variables like '%optimizer_trace%';
+------------------------------+----------------------------------------------------------------------------+
| Variable_name | Value |
+------------------------------+----------------------------------------------------------------------------+
| optimizer_trace | enabled=on,one_line=off |
| optimizer_trace_features | greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on |
| optimizer_trace_limit | 1 |
| optimizer_trace_max_mem_size | 1048576 |
| optimizer_trace_offset | -1 |
+------------------------------+----------------------------------------------------------------------------+
optimizer_trace: enabled 开启/关闭 optimizer_trace,one_line 是否单行显示,关闭为 json 模式,一般不开启
optimizer_trace_features:跟踪信息中可打印的项,一般不调整默认打印所有项
optimizer_trace_limit:存储的跟踪 sql 条数
optimizer_trace_offset:开始记录的 sql 语句的偏移量,负数表示从最近执行倒数第几条开始记录
optimizer_trace_max_mem_size:optimizer_trace 内存的大小,如果跟踪信息超过这个大小,信息将会被截断
optimizer_trace 表信息:
该表总共有 4 个字段
QUERY 表示我们的查询语句。
TRACE 表示优化过程的 JSON 格式文本。(重点关注)
MISSING_BYTES_BEYOND_MAX_MEM_SIZE 由于优化过程可能会输出很多,如果超过某个限制时,多余的文本将不会被显示,这个字段展示了被忽略的文本字节数。
INSUFFICIENT_PRIVILEGES 表示是否没有权限查看优化过程,默认值是 0,只有某些特殊情况下才会是 1,我们暂时不关心这个字段的值。
信息解读:
通过 optimizer_trace 表的 query 字段可以看到,一条语句的执行过程主要分为三个步骤:
"join_preparation": {},(准备阶段)
"join_optimization": {},(优化阶段)
"join_execution": {},(执行阶段)
各个步骤的详细内容解读:
preparation:
expanded_query :将语句进行格式化,补充隐藏的列名和表名等
transformations_to_nested_joins :查询重写,比如join的on改为where语句
optimization:
condition_processing{ :条件句处理。
transformation{:转换类型句。这三次的转换分别是
equality_propagation(等值条件句转换),如:a = b and b = c and c = 5
constant_propagation(常量条件句转换),如:a = 1 AND b > a
trivial_condition_removal(无效条件移除的转换),如:1 = 1
}
}
substitute_generated_columns :替换虚拟生成列,测试了很多sql,这一列都没有看到有用的信息
table_dependencies :梳理表之间的依赖关系。
ref_optimizer_key_uses :如果优化器认为查询可以使用ref的话,在这里列出可以使用的索引。
rows_estimation{ :估算表行数和扫描的代价。如果查询中存在range扫描的话,对range扫描进行计划分析及代价估算。
table_scan:全表扫描的行数(rows)以及所需要的代价(cost)。
potential_range_indexes:该阶段会列出表中所有的索引并分析其是否可用,并且还会列出索引中可用的列字段。
analyzing_range_alternatives :分析可选方案的代价。
}
considered_execution_plans{ :对比各可行计划的代价,选择相对最优的执行计划。
plan_prefix:前置的执行计划。
best_access_path:当前最优的执行顺序信息结果集。
access_type表示使用索引的方式,可参照为explain中的type字段。
condition_filtering_pct:类似于explain中的filtered列,这是一个估算值。
rows_for_plan:该执行计划最终的扫描行数,这里的行数其实也是估算值,是由considered_access_paths的resulting_rows相乘之后再乘以condition_filtering_pct获得。
cost_for_plan:该执行计划的执行代价,由considered_access_paths的cost相加而得。
chosen:是否选择了该执行计划。
}
attaching_conditions_to_tables :添加附加条件,使得条件尽可能筛选单表数据。
refine_plan :优化后的执行计划。
trace 信息中的 json 信息很长,因为我们关心的是不同执行计划的 cost 区别,所以只需要重点关注两个部分 rows_estimation 和 considered_execution_plans
代价模型计算:
统计信息和 cost 计算参数:
计算 cost 会涉及到表的主键索引数据页(聚簇索引)数量和表中的记录数,两个信息都可以通过 innodb 的表统计信息 mysql.innodb_table_stats 查到,n_rows 是记录数,clustered_index_size 是聚簇索引页数。
mysql> select * from mysql.innodb_table_stats where table_name='basic_person_info';
+---------------+-------------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+-------------------+---------------------+--------+----------------------+--------------------------+
| test | basic_person_info | 2022-12-23 18:27:24 | 86632 | 737 | 1401 |
+---------------+-------------------+---------------------+--------+----------------------+--------------------------+
1 row in set (0.01 sec)
代价模型将操作分为 Server 层和 Engine(存储引擎)层两类,Server 层主要是 CPU 代价,Engine 层主要是 IO 代价,比如 MySQL 从磁盘读取一个数据页的代价 io_block_read_cost 为 1,从 buffer pool 读取的代价 memory_block_read_cost 为 0.25,计算符合条件的行代价为 row_evaluate_cost 为 0.1,除此之外还有:
memory_temptable_create_cost (default 1.0) 内存临时表的创建代价。
memory_temptable_row_cost (default 0.1) 内存临时表的行代价。
key_compare_cost (default 0.1) 键比较的代价,例如排序。
disk_temptable_create_cost (default 20.0) 内部 myisam 或 innodb 临时表的创建代价。
disk_temptable_row_cost (default 0.5) 内部 myisam 或 innodb 临时表的行代价。
这些都可以通过 mysql.server_cost、mysql.engine_cost 查看 defalt 值和设置值
mysql> select * from mysql.server_cost;
+------------------------------+------------+---------------------+---------+---------------+
| cost_name | cost_value | last_update | comment | default_value |
+------------------------------+------------+---------------------+---------+---------------+
| disk_temptable_create_cost | NULL | 2022-05-11 16:09:37 | NULL | 20 |
| disk_temptable_row_cost | NULL | 2022-05-11 16:09:37 | NULL | 0.5 |
| key_compare_cost | NULL | 2022-05-11 16:09:37 | NULL | 0.05 |
| memory_temptable_create_cost | NULL | 2022-05-11 16:09:37 | NULL | 1 |
| memory_temptable_row_cost | NULL | 2022-05-11 16:09:37 | NULL | 0.1 |
| row_evaluate_cost | NULL | 2022-05-11 16:09:37 | NULL | 0.1 |
+------------------------------+------------+---------------------+---------+---------------+
mysql> select * from mysql.engine_cost;
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+
| engine_name | device_type | cost_name | cost_value | last_update | comment | default_value |
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+
| default | 0 | io_block_read_cost | NULL | 2022-05-11 16:09:37 | NULL | 1 |
| default | 0 | memory_block_read_cost | NULL | 2023-01-09 11:17:39 | NULL | 0.25 |
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+
计算公式:
如上面介绍的一样,代价模型将操作分为两类 io_cost 和 cpu_cost,io_cost+cpu_cost 就是总的 cost,下面是具体的计算方法:
全表扫描:
全表扫描成本 = io_cost + 1.1 + cpu_cost + 1 (io_cost +1.1 和 cpu_cost +1 在代码里是直接硬加上的,不知道为什么,计算的时候直接加上)
io_cost = clustered_index_size (统计信息中的主键页数) * avg_single_page_cost(读取一个页的平均成本)
avg_single_page_cost = pages_in_memory_percent * 0.25(memory_block_read_cost) + pages_on_disk_percent * 1.0(io_block_read_cost)
pages_in_memory_percent 表示已经加载到 Buffer Pool 中的叶结点占所有叶结点的比例 pages_on_disk_percent 表示没有加载到 Buffer Pool 中的叶结点占所有叶结点的比例
所以当数据已经全部读取到 buffer pool 中的时候:
io_cost=clustered_index_size * 0.25
都没有读取到 buffer pool 中的时候:
io_cost=clustered_index_size * 1.0
当部分数据在 buffer pool 中,部分数据需要从磁盘读取时,这时的系数介于 0.25 到 1 之间
cpu_cost = n_rows(统计信息中记录数) * 0.1(row_evaluate_cost)
走索引的成本:
和全表扫描的计算方法类似,其中 io_cost 与搜索的区间数有关,比如扫描三个区间 where a between 1 and 10 or a between 20 and 30 or a between 40 and 50,此时:
io_cost=3 * avg_single_page_cost
cpu_cost=记录数 * 0.1(row_evaluate_cost)+0.01(代码中的微调参数)
针对二级索引还会有回表的操作:
MySQL 认为每次回表都相当于是访问一个页面,所以每次回表都会进行一次 IO,这部分成本:
io_cost=rows(记录数)*avg_single_page_cost
对回表查询的数据还需要进行一次计算:
cpu_cost=rows(记录数) * 0.1(row_evaluate_cost)(需要注意的是当索引需要回表扫描时,在 rows_estimation 阶段并不会计算这个值,在 considered_execution_plans 阶段会重新加上这部分成本)
所以针对需要回表的查询:
io_cost=查询区间 * avg_single_page_cost + rows(记录数) * avg_single_page_cost
cpu_cost=记录数 * 0.1(row_evaluate_cost) + 0.01(代码中的微调参数) + rows(记录数) * 0.1(row_evaluate_cost)
例子:
mysql> set optimizer_trace='enabled=on';
Query OK, 0 rows affected (0.00 sec)
mysql>explain select * from basic_person_info t1 join basic_person_info2 t2 on t1.id_num=t2.id_num where t1.age >10 and t2.age<20;
+----+-------------+-------+------------+--------+--------------------------------------+---------------+---------+----------------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------------------------+---------------+---------+----------------+------+----------+-----------------------+
| 1 | SIMPLE | t2 | NULL | range | id_num_unique,idx_age,idx_age_id_num | idx_age | 1 | NULL | 9594 | 100.00 | Using index condition |
| 1 | SIMPLE | t1 | NULL | eq_ref | id_num_unique,idx_age | id_num_unique | 60 | test.t2.id_num | 1 | 50.00 | Using where |
+----+-------------+-------+------------+--------+--------------------------------------+---------------+---------+----------------+------+----------+-----------------------+
2 rows in set, 1 warning (0.04 sec)
查看 optimizer_trace 的内容
select * from basic_person_info t1 join basic_person_info2 t2 on t1.id_num=t2.id_num where t1.age >10 and t2.age<20 | {
"steps": [
{
"join_preparation": {
"select#": 1,
"steps": [
{
"expanded_query": "/* select#1 */ select `t1`.`id` AS `id`,`t1`.`id_num` AS `id_num`,`t1`.`lastname` AS `lastname`,`t1`.`firstname` AS `firstname`,`t1`.`mobile` AS `mobile`,`t1`.`sex` AS `sex`,`t1`.`birthday` AS `birthday`,`t1`.`age` AS `age`,`t1`.`top_education` AS `top_education`,`t1`.`address` AS `address`,`t1`.`income_by_year` AS `income_by_year`,`t1`.`create_time` AS `create_time`,`t1`.`update_time` AS `update_time`,`t2`.`id` AS `id`,`t2`.`id_num` AS `id_num`,`t2`.`lastname` AS `lastname`,`t2`.`firstname` AS `firstname`,`t2`.`mobile` AS `mobile`,`t2`.`sex` AS `sex`,`t2`.`birthday` AS `birthday`,`t2`.`age` AS `age`,`t2`.`top_education` AS `top_education`,`t2`.`address` AS `address`,`t2`.`income_by_year` AS `income_by_year`,`t2`.`create_time` AS `create_time`,`t2`.`update_time` AS `update_time` from (`basic_person_info` `t1` join `basic_person_info2` `t2` on((`t1`.`id_num` = `t2`.`id_num`))) where ((`t1`.`age` > 10) and (`t2`.`age` < 20))"
},
{
"transformations_to_nested_joins": {
"transformations": [
"JOIN_condition_to_WHERE",
"parenthesis_removal"
],
"expanded_query": "/* select#1 */ select `t1`.`id` AS `id`,`t1`.`id_num` AS `id_num`,`t1`.`lastname` AS `lastname`,`t1`.`firstname` AS `firstname`,`t1`.`mobile` AS `mobile`,`t1`.`sex` AS `sex`,`t1`.`birthday` AS `birthday`,`t1`.`age` AS `age`,`t1`.`top_education` AS `top_education`,`t1`.`address` AS `address`,`t1`.`income_by_year` AS `income_by_year`,`t1`.`create_time` AS `create_time`,`t1`.`update_time` AS `update_time`,`t2`.`id` AS `id`,`t2`.`id_num` AS `id_num`,`t2`.`lastname` AS `lastname`,`t2`.`firstname` AS `firstname`,`t2`.`mobile` AS `mobile`,`t2`.`sex` AS `sex`,`t2`.`birthday` AS `birthday`,`t2`.`age` AS `age`,`t2`.`top_education` AS `top_education`,`t2`.`address` AS `address`,`t2`.`income_by_year` AS `income_by_year`,`t2`.`create_time` AS `create_time`,`t2`.`update_time` AS `update_time` from `basic_person_info` `t1` join `basic_person_info2` `t2` where ((`t1`.`age` > 10) and (`t2`.`age` < 20) and (`t1`.`id_num` = `t2`.`id_num`))"
}
}
]
}
},
{
"join_optimization": {
"select#": 1,
"steps": [
{
"condition_processing": {
"condition": "WHERE",
"original_condition": "((`t1`.`age` > 10) and (`t2`.`age` < 20) and (`t1`.`id_num` = `t2`.`id_num`))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "((`t1`.`age` > 10) and (`t2`.`age` < 20) and multiple equal(`t1`.`id_num`, `t2`.`id_num`))"
},
{
"transformation": "constant_propagation",
"resulting_condition": "((`t1`.`age` > 10) and (`t2`.`age` < 20) and multiple equal(`t1`.`id_num`, `t2`.`id_num`))"
},
{
"transformation": "trivial_condition_removal",
"resulting_condition": "((`t1`.`age` > 10) and (`t2`.`age` < 20) and multiple equal(`t1`.`id_num`, `t2`.`id_num`))"
}
]
}
},
{
"substitute_generated_columns": {
}
},
{
"table_dependencies": [
{
"table": "`basic_person_info` `t1`",
"row_may_be_null": false,
"map_bit": 0,
"depends_on_map_bits": [
]
},
{
"table": "`basic_person_info2` `t2`",
"row_may_be_null": false,
"map_bit": 1,
"depends_on_map_bits": [
]
}
]
},
{
"ref_optimizer_key_uses": [
{
"table": "`basic_person_info` `t1`",
"field": "id_num",
"equals": "`t2`.`id_num`",
"null_rejecting": true
},
{
"table": "`basic_person_info2` `t2`",
"field": "id_num",
"equals": "`t1`.`id_num`",
"null_rejecting": true
}
]
},
{
"rows_estimation": [
{
"table": "`basic_person_info` `t1`",
"range_analysis": {
"table_scan": {
"rows": 86734,
"cost": 8859.75
t1表的scan成本=聚簇索引页数*0.25 + 行数 * 0.1 +1.1+1
737*0.25+1.1+86734*0.1+1=8859.75
},
"potential_range_indexes": [
{
"index": "PRIMARY",
"usable": false,
"cause": "not_applicable"
},
{
"index": "id_num_unique",
"usable": false,
"cause": "not_applicable"
},
{
"index": "mobile_unique",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_basic_person_info_name",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_basic_person_info_top_education",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_basic_person_info_create_time",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_basic_person_info_mobile",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_age",
"usable": true,
"key_parts": [
"age",
"id"
]
}
],
"setup_range_conditions": [
],
"group_index_range": {
"chosen": false,
"cause": "not_single_table"
},
"skip_scan_range": {
"chosen": false,
"cause": "not_single_table"
},
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "idx_age",
"ranges": [
"10 < age"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 43367,
"cost": 15178.7,
通过索引idx_age读取数据:
io_cost=区间数* 0.25 +记录数* 0.25
io_cost=1*0.25+43367*0.25=10,842
cpu_cost=记录数* 0.1 (没有回表的cost)
cpu_cost=43367*0.1=4,336.7
cost=10842+4,336.7=15178.7
"chosen": false,
"cause": "cost"
}
],
"analyzing_roworder_intersect": {
"usable": false,
"cause": "too_few_roworder_scans"
}
}
}
},
{
"table": "`basic_person_info2` `t2`",
"range_analysis": {
"table_scan": {
"rows": 73845,
"cost": 7538.85
t2表的scan成本=聚簇索引页数*0.25 + 行数 * 0.1 +1.1+1
609*0.25+1+73845*0.1+1.1=7538.85
},
"potential_range_indexes": [
{
"index": "PRIMARY",
"usable": false,
"cause": "not_applicable"
},
{
"index": "id_num_unique",
"usable": false,
"cause": "not_applicable"
},
{
"index": "mobile_unique",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_basic_person_info_name",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_basic_person_info_top_education",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_basic_person_info_create_time",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_basic_person_info_mobile",
"usable": false,
"cause": "not_applicable"
},
{
"index": "idx_age",
"usable": true,
"key_parts": [
"age",
"id"
]
},
{
"index": "idx_age_id_num",
"usable": true,
"key_parts": [
"age",
"id_num",
"id"
]
}
],
"setup_range_conditions": [
],
"group_index_range": {
"chosen": false,
"cause": "not_single_table"
},
"skip_scan_range": {
"chosen": false,
"cause": "not_single_table"
},
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "idx_age",
"ranges": [
"age < 20"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 9594,
"cost": 3358.16,
通过索引idx_age读取数据:
io_cost=区间数* 0.25 +记录数* 0.25
io_cost=1*0.25+9594*0.25=2,398.75
cpu_cost=记录数* 0.1 (没有回表的cost)
cpu_cost=9594*0.1959.4
cost=2,398.75+959.4=3,358.15
"chosen": true
},
{
"index": "idx_age_id_num",
"ranges": [
"age < 20"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 19086,
"cost": 6680.36,
通过索引idx_age_id_num读取数据:
io_cost=区间数* 0.25 +记录数* 0.25
io_cost=1*0.25+19086*0.25=4,771.75
cpu_cost=记录数* 0.1 (没有回表的cost)
cpu_cost=19086*0.1=1908.6
cost=4,771.75+1908.6=6,680.35
"chosen": false,
"cause": "cost"
}
],
"analyzing_roworder_intersect": {
"usable": false,
"cause": "too_few_roworder_scans"
}
},
"chosen_range_access_summary": {
"range_access_plan": {
"type": "range_scan",
"index": "idx_age",
"rows": 9594,
"ranges": [
"age < 20"
]
},
"rows_for_plan": 9594,
"cost_for_plan": 3358.16,
"chosen": true
}
}
}
]
},
{
"considered_execution_plans": [
{
"plan_prefix": [
],
"table": "`basic_person_info2` `t2`",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "ref",
"index": "id_num_unique",
"usable": false,
"chosen": false
},
{
"rows_to_scan": 9594,
"filtering_effect": [
],
"final_filtering_effect": 1,
"access_type": "range",
"range_details": {
"used_index": "idx_age"
},
"resulting_rows": 9594,
"cost": 4317.56,
通过索引idx_age读取数据:
io_cost=区间数* 0.25 +记录数* 0.25
io_cost=1*0.25+9594*0.25=2,398.75
cpu_cost=记录数* 0.1 + 记录数* 0.1
cpu_cost=9594*0.1*2=1,918.8
cost=2,398.75+1,918.8=4317.56
"chosen": true
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 9594,
"cost_for_plan": 4317.56,
"rest_of_plan": [
{
"plan_prefix": [
"`basic_person_info2` `t2`"
],
"table": "`basic_person_info` `t1`",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "eq_ref",
"index": "id_num_unique",
"rows": 1,
"cost": 3357.9,
io_cost=t2表记录数*0.25=9594*0.25=2398.5
cpu_cost=记录数*0.1=9594*0.1=959.4
cost=2398.5+959.4=3357.9
"chosen": true
},
{
"rows_to_scan": 86734,
"filtering_effect": [
],
"final_filtering_effect": 0.5,
"access_type": "scan",
"using_join_cache": true,
"buffers_needed": 14,
"resulting_rows": 43367,
"cost": 4.16701e+07,
"chosen": false
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 9594,
"cost_for_plan": 7675.46,
总cost=4,317.56+3,357.9=7,675.46
"chosen": true
}
]
},
{
"plan_prefix": [
],
"table": "`basic_person_info` `t1`",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "ref",
"index": "id_num_unique",
"usable": false,
"chosen": false
},
{
"rows_to_scan": 86734,
"filtering_effect": [
],
"final_filtering_effect": 0.5,
"access_type": "scan",
"resulting_rows": 43367,
"cost": 8857.65,
t1的scan成本
"chosen": true
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 43367,
"cost_for_plan": 8857.65,
"pruned_by_cost": true
放弃后续的计算
}
]
},
{
"attaching_conditions_to_tables": {
"original_condition": "((`t1`.`id_num` = `t2`.`id_num`) and (`t1`.`age` > 10) and (`t2`.`age` < 20))",
"attached_conditions_computation": [
],
"attached_conditions_summary": [
{
"table": "`basic_person_info2` `t2`",
"attached": "(`t2`.`age` < 20)"
},
{
"table": "`basic_person_info` `t1`",
"attached": "((`t1`.`id_num` = `t2`.`id_num`) and (`t1`.`age` > 10))"
}
]
}
},
{
"finalizing_table_conditions": [
{
"table": "`basic_person_info2` `t2`",
"original_table_condition": "(`t2`.`age` < 20)",
"final_table_condition ": "(`t2`.`age` < 20)"
},
{
"table": "`basic_person_info` `t1`",
"original_table_condition": "((`t1`.`id_num` = `t2`.`id_num`) and (`t1`.`age` > 10))",
"final_table_condition ": "(`t1`.`age` > 10)"
}
]
},
{
"refine_plan": [
{
"table": "`basic_person_info2` `t2`",
"pushed_index_condition": "(`t2`.`age` < 20)",
"table_condition_attached": null
},
{
"table": "`basic_person_info` `t1`"
}
]
}
]
}
},
{
"join_execution": {
"select#": 1,
"steps": [
]
}
}
]
}
成本常数修改:
前面已经介绍了成本常量值实际上存放在 MySQL 自带的系统库 MySQL 中的 server_cost 和 engine_cost 表中,其中 server_cost 表存放 server 层的成本常量,engine_cost 表存放 engine 层成本常量
mysql> select * from mysql.server_cost;
+------------------------------+------------+---------------------+---------+---------------+
| cost_name | cost_value | last_update | comment | default_value |
+------------------------------+------------+---------------------+---------+---------------+
| disk_temptable_create_cost | NULL | 2022-05-11 16:09:37 | NULL | 20 |
| disk_temptable_row_cost | NULL | 2022-05-11 16:09:37 | NULL | 0.5 |
| key_compare_cost | NULL | 2022-05-11 16:09:37 | NULL | 0.05 |
| memory_temptable_create_cost | NULL | 2022-05-11 16:09:37 | NULL | 1 |
| memory_temptable_row_cost | NULL | 2022-05-11 16:09:37 | NULL | 0.1 |
| row_evaluate_cost | NULL | 2022-05-11 16:09:37 | NULL | 0.1 |
+------------------------------+------------+---------------------+---------+---------------+
mysql> select * from mysql.engine_cost;
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+
| engine_name | device_type | cost_name | cost_value | last_update | comment | default_value |
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+
| default | 0 | io_block_read_cost | NULL | 2022-05-11 16:09:37 | NULL | 1 |
| default | 0 | memory_block_read_cost | NULL | 2023-01-09 11:17:39 | NULL | 0.25 |
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+
其中 default_value 的值是系统默认的,不能修改,cost_value 列的值我们可以修改,如果 cost_value 列的值不为空系统将用该值覆盖默认值,我们可以通过 update 语句来修改
mysql> update mysql.engine_cost set cost_value=10 where cost_name='memory_block_read_cost';
Query OK, 0 rows affected (0.00 sec)
mysql> update mysql.engine_cost set cost_value=10 where cost_name='io_block_read_cost';
Query OK, 0 rows affected (0.00 sec)
很多资料都说执行 flush optimizer_costs 就可以生效,不过我在修改完后并执行 flush optimizer_costs 并不能马上生效,最后是通过重启数据库实例才生效,这个可能是数据库版本的差异,大家可以自行验证。
mysql> explain select * from basic_person_info t1 join basic_person_info2 t2 on t1.id_num=t2.id_num where t1.age >10 and t2.age<20;
+----+-------------+-------+------------+--------+--------------------------------------+---------------+---------+----------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------------------------+---------------+---------+----------------+-------+----------+-------------+
| 1 | SIMPLE | t2 | NULL | ALL | id_num_unique,idx_age,idx_age_id_num | NULL | NULL | NULL | 73990 | 12.97 | Using where |
| 1 | SIMPLE | t1 | NULL | eq_ref | id_num_unique,idx_age | id_num_unique | 60 | test.t2.id_num | 1 | 50.00 | Using where |
+----+-------------+-------+------------+--------+--------------------------------------+---------------+---------+----------------+-------+----------+-------------+
"table": "`basic_person_info2` `t2`",
"range_analysis": {
"table_scan": {
"rows": 73990,
"cost": 13491.1
全表扫描cost=609*10+73990*0.1+1.1+1= 13491.1
},
"index": "idx_age",
"ranges": [
"age < 20"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 9594,
"cost": 96909.4,
idx_age索引扫描cost=1*10+9594*10+9594*0.1=96,909.4
"chosen": false,
"cause": "cost"
},
修改后的执行计划,发现 t2 表走了全表扫描了而没有走 idx_age 索引,分别查看一下 t2 表走全表扫描和 idx_age 索引的 cost 发现全表扫描的 cost 为 13491.1,而走索引的 cost 为 96,909.4,因为全表扫描的 cost 比走索引低,所以优化器没有选择 idx_age 索引。
从这个例子可以看出,更改成本常量值会直接影响优化器的方案选择,所以一定要慎重,没有特殊原因建议不要修改。
explain format=json
虽然通过 optimizer_trace 可以看到很多详细的优化器选择过程,但是使用起来起来还是比较麻烦,需要过滤的信息很多,这时 explain format=json 输出 json 格式的分析数据也是一个不错的选择,它也包含语句将要执行的成本信息,如下:
query_cost 总查询成本
read_cost IO成本+除 eval_cost以外cpu成本
eval_cost 检测rows * filter条记录的成本
prefix_cost 单次查询的成本,等于read_cost+eval_cost
mysql> explain format=json select * from basic_person_info t1 join basic_person_info2 t2 on t1.id_num=t2.id_num where t1.age >10 and t2.age<20;
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "7675.46"
},
"nested_loop": [
{
"table": {
"table_name": "t2",
"access_type": "range",
"possible_keys": [
"id_num_unique",
"idx_age",
"idx_age_id_num"
],
"key": "idx_age",
"used_key_parts": [
"age"
],
"key_length": "1",
"rows_examined_per_scan": 9594,
"rows_produced_per_join": 9594,
"filtered": "100.00",
"index_condition": "(`test`.`t2`.`age` < 20)",
"cost_info": {
"read_cost": "3358.16",
包含所有io成本+(cpu成本-eval_cost)
"eval_cost": "959.40",
计算扇出的cpu成本,优化器利用启发式规则估算出满足所有条件的的比例(filtered)
=rows_examined_per_scan*filtered*0.1
"prefix_cost": "4317.56",
单表查询的总成本
"data_read_per_join": "3M"
},
"used_columns": [
"id",
"id_num",
"lastname",
"firstname",
"mobile",
"sex",
"birthday",
"age",
"top_education",
"address",
"income_by_year",
"create_time",
"update_time"
]
}
},
{
"table": {
"table_name": "t1",
"access_type": "eq_ref",
"possible_keys": [
"id_num_unique",
"idx_age"
],
"key": "id_num_unique",
"used_key_parts": [
"id_num"
],
"key_length": "60",
"ref": [
"test.t2.id_num"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 4797,
"filtered": "50.00",
"cost_info": {
"read_cost": "2398.50",
包含所有io成本+(cpu成本-eval_cost)
"eval_cost": "479.70",
计算扇出的cpu成本,优化器利用启发式规则估算出满足所有条件的的比例(filtered)
=rows_examined_per_scan*filtered*0.1
"prefix_cost": "7675.46",
两表查询的总cost
"data_read_per_join": "1M"
},
"used_columns": [
"id",
"id_num",
"lastname",
"firstname",
"mobile",
"sex",
"birthday",
"age",
"top_education",
"address",
"income_by_year",
"create_time",
"update_time"
],
"attached_condition": "(`test`.`t1`.`age` > 10)"
}
}
]
}
}
另外,explain 结合 show warnings 语句一起使用还可以得知优化器改写后的语句
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `test`.`t1`.`id` AS `id`,`test`.`t1`.`id_num` AS `id_num`,`test`.`t1`.`lastname` AS `lastname`,`test`.`t1`.`firstname` AS `firstname`,`test`.`t1`.`mobile` AS `mobile`,`test`.`t1`.`sex` AS `sex`,`test`.`t1`.`birthday` AS `birthday`,`test`.`t1`.`age` AS `age`,`test`.`t1`.`top_education` AS `top_education`,`test`.`t1`.`address` AS `address`,`test`.`t1`.`income_by_year` AS `income_by_year`,`test`.`t1`.`create_time` AS `create_time`,`test`.`t1`.`update_time` AS `update_time`,`test`.`t2`.`id` AS `id`,`test`.`t2`.`id_num` AS `id_num`,`test`.`t2`.`lastname` AS `lastname`,`test`.`t2`.`firstname` AS `firstname`,`test`.`t2`.`mobile` AS `mobile`,`test`.`t2`.`sex` AS `sex`,`test`.`t2`.`birthday` AS `birthday`,`test`.`t2`.`age` AS `age`,`test`.`t2`.`top_education` AS `top_education`,`test`.`t2`.`address` AS `address`,`test`.`t2`.`income_by_year` AS `income_by_year`,`test`.`t2`.`create_time` AS `create_time`,`test`.`t2`.`update_time` AS `update_time` from `test`.`basic_person_info` `t1` join `test`.`basic_person_info2` `t2` where ((`test`.`t1`.`id_num` = `test`.`t2`.`id_num`) and (`test`.`t1`.`age` > 10) and (`test`.`t2`.`age` < 20)) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
总结:
MySQL 的优化器是基于成本来选择最优执行方案的,哪个成本最少就选哪个,所以重点在于计算出各个执行计划的 cost
成本由 CPU 成本和 IO 成本组成,每个成本常数值可以自己调整,非必要的情况下不要调整,以免影响整个数据库的执行计划选择
通过开启 optimizer_trace 可以跟踪优化器的各个环节的分析步骤,可以判断有时候为什么没有走索引而走了全表扫描
explain 加上 format=json 选项后可以查看成本信息分为 read_cost 和 eval_cost,但只能看到当前已经选择的执行计划,另外通过 show warnings 可以看到优化器改写后的语句
Enjoy GreatSQL :)
关于 GreatSQL
GreatSQL 是由万里数据库维护的 MySQL 分支,专注于提升 MGR 可靠性及性能,支持 InnoDB 并行查询特性,是适用于金融级应用的 MySQL 分支版本。
相关链接: GreatSQL社区 Gitee GitHub Bilibili
GreatSQL 社区:
社区博客有奖征稿详情:https://greatsql.cn/thread-100-1-1.html
技术交流群:
微信:扫码添加
GreatSQL社区助手
微信好友,发送验证信息加群
。
版权声明: 本文为 InfoQ 作者【GreatSQL】的原创文章。
原文链接:【http://xie.infoq.cn/article/5df3a126ae5277612f90a92a0】。文章转载请联系作者。
GreatSQL
GreatSQL社区 2023-01-31 加入
GreatSQL是由万里数据库维护的MySQL分支,专注于提升MGR可靠性及性能,支持InnoDB并行查询特性,是适用于金融级应用的MySQL分支版本。 社区:https://greatsql.cn/ Gitee: https://gitee.com/GreatSQL/GreatSQL
评论