写点什么

【GreatSQL 优化器 -16】INDEX_SKIP_SCAN

作者:GreatSQL
  • 2025-03-12
    福建
  • 本文字数:9988 字

    阅读完需:约 33 分钟

【GreatSQL 优化器-16】INDEX_SKIP_SCAN

一、INDEX_SKIP_SCAN 介绍

GreatSQL 优化器的索引跳跃扫描(Index Skip Scan) 是一种优化查询的技术,尤其在联合索引中用于减少扫描的无效行数。它通过"跳跃"式的扫描方式,避免了对索引中无用部分的扫描,从而提升查询效率。这种技术适合特定场景,并有一定的优缺点。


索引跳跃扫描利用的是联合索引中非首列(非最左前缀)的索引列,来提高查询效率。例如,如果你有一个复合索引 (A, B),在传统的 B-Tree 索引中,只有当查询条件包含 A 列时,索引才会生效。但在跳跃扫描中,即使 A 没有出现在查询条件中,仍然可以通过扫描 B 列来有效查询。跳跃扫描会逐步扫描 A 列的每一个可能值,然后在每个 A 值下查找 B 列中符合条件的记录。这样避免了扫描大量无关记录,提升了查询性能。


下面用一个简单的例子来说明INDEX_SKIP_SCAN怎么应用在优化器。


CREATE TABLE t1(c1 INT, c2 INT, c3 INT, c4 INT);CREATE UNIQUE INDEX i1_t1 ON t1(c1,c2,c3);INSERT INTO t1 VALUES (1,1,1,1), (1,1,2,2), (1,3,3,3), (1,4,4,4), (1,5,5,5),                      (2,1,1,1), (2,2,2,2), (2,3,3,3), (2,4,4,4), (2,5,5,5);INSERT INTO t1 SELECT c1, c2, c3+5, c4+10  FROM t1;INSERT INTO t1 SELECT c1, c2, c3+10, c4+20 FROM t1;INSERT INTO t1 SELECT c1, c2, c3+20, c4+40 FROM t1;INSERT INTO t1 SELECT c1, c2, c3+40, c4+80 FROM t1;ANALYZE TABLE t1;
-- 下面的结果用到了skip scangreatsql> EXPLAIN SELECT c1, c2 FROM t1 WHERE c2 > 40;+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 10 | NULL | 53 | 100.00 | Using where; Using index for skip scan |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
/*运行过程:1、先找到第一列的不同值,得到结果为1和2select distinct c1 from t1; 2、根据第一列的分组结果在分组内执行范围扫描select c1, c2 from t1 where c1 = 1 and c2 > 40;select c1, c2 from t1 where c1 = 2 and c2 > 40;*/
复制代码

二、get_best_skip_scan 代码解释

skip scan 是根据条件生成 mm tree 以后,根据 mm tree 结果和索引进行判定能不能用 skip scan,其中联合索引中范围条件列之前的列为 prefix 列,即用来作为分组的依据的列,如果有多个除了第一列之外的范围列,那么选取遇到的第一个列之前的列为前缀进行分组。


int test_quick_select() {  // 先判断skip_scan是否满足条件,不满足的话执行索引合并。  AccessPath *skip_scan_path = get_best_skip_scan();  // 不满足skip_scan执行索引合并,相关知识看上一节介绍}
AccessPath *get_best_skip_scan() { // 先判断sql是否支持skip_scan,见表一 // 接着开始执行skip_scan,遍历所有索引 for (uint cur_param_idx = 0; cur_param_idx < param->keys; ++cur_param_idx) { // covering_keys代表可以使用的联合索引,sql涉及的所有列必须都在联合索引上才会有值 if (!table->covering_keys.is_set(cur_index)) { cause = "query_references_nonkey_column"; goto next_index; } // 从mm tree抽出当前联合索引相关的SEL_ROOT,如果没有那么就不能执行skip scan cur_index_range_tree = get_index_range_tree(cur_index, tree, param); // 遍历当前索引涉及的所有列 for (cur_part = cur_index_info->key_part; cur_part != end_part; cur_part++, part++) { // 从mm tree获取当前列相关的SEL_ROOT,就是查询条件涉及的列 get_sel_root_for_keypart(); 1、如果找到联合索引前缀列相关条件并且不是等号条件,那么就判断为"prefix_not_const_equality"并且跳到下一个索引 2、如果找到联合索引前缀列以外的列相关条件,以遇到的第一个条件作为skip scan的条件。 } // 针对联合索引的前缀列估计范围内包含的行数,用在条件包含前缀列有等号条件的情况 quick_prefix_records = check_quick_select(); // 计算skip_scan的行数和cost cost_skip_scan(); // 最后生成AccessPath,包含IndexSkipScanParameters }}// 获取skip scan的行数和cost结果void cost_skip_scan() { table_records = table->file->stats.records; // 如果可以根据前缀列估算出行数,那么就用估算的值,如果不能就用全表的行数。 if (quick_prefix_records == HA_POS_ERROR) skip_scan_records = table_records; else skip_scan_records = quick_prefix_records; // 首先根据联合索引前缀列数值获取每个唯一数组包含的相同数的数量 /* Compute the number of keys in a group. */ if (index_info->has_records_per_key(distinct_key_parts - 1)) { keys_per_group = index_info->records_per_key(distinct_key_parts - 1); } else keys_per_group = guess_rec_per_key(table, index_info, distinct_key_parts);
num_groups = (uint)(skip_scan_records / keys_per_group) + 1; num_groups = std::max(num_groups, 1U);
// 接着根据可以用于skip scan的列,获取这个列的不同唯一数值包含的相同数的数量 if (index_info->has_records_per_key(distinct_key_parts)) keys_per_range = index_info->records_per_key(distinct_key_parts); else keys_per_range = guess_rec_per_key(table, index_info, distinct_key_parts + 1); // double max_distinct_values = max( 1.0, static_cast<double>(uint(keys_per_group) / uint(keys_per_range))); // 根据条件估算过滤系数 float filtering_effect = where_cond->get_filtering_effect(); // 计算得出的总行数 *records = max(ha_rows(1), ha_rows(skip_scan_records * filtering_effect)); // 最后估算IO cost和CPU cost Cost_estimate cost_skip_scan = table->file->index_scan_cost(key, num_groups, *records);}
最后具体实现过程见代码IndexSkipScanIterator::Read(),这里不再展开赘述。
复制代码


表一:不能使用 skip_scan 的场合



表二:索引判断状态迁移



表三:skip index 使用场景



表四:skip index 优点



表五:skip index 缺点



表六:相关关键词


三、实际例子说明

接下来看几个例子来说明上面的代码:


greatsql> EXPLAIN SELECT c1, c2 FROM t1 WHERE c2 > 40;+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra                                  |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+|  1 | SIMPLE      | t1    | NULL       | range | i1_t1         | i1_t1 | 10      | NULL |   53 |   100.00 | Using where; Using index for skip scan |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+                  "skip_scan_range": {                    "potential_skip_scan_indexes": [                      {                        "index": "i1_t1",                        "tree_travel_cost": 0.4,                        "num_groups": 3, -- 计算方式见下面                        "rows": 53, -- 计算方式见下面                        "cost": 11.483                      }                    ]                  },                  "best_skip_scan_summary": {                    "type": "skip_scan",                    "index": "i1_t1",                    "key_parts_used_for_access": [                      "c1",                      "c2"                    ],                    "range": [                      "40 < c2"                    ],                    "chosen": true                  },
/* num_groups和rows计算过程:skip_scan_records = table->file->stats.records = 160keys_per_group = 条件涉及的第一个列的每个key包含的值 = 80keys_per_range = 条件涉及列的每个key包含的值 = 17.7777786num_groups = (skip_scan_records / keys_per_group) + 1 = 160 / 80 + 1 = 3 --->num_groups结果max_distinct_values = keys_per_group / keys_per_range = 80 / 17.7777786 = 4filtering_effect = max(1/4, COND_FILTER_INEQUALITY) = 0.333records = skip_scan_records * filtering_effect = 160 * 0.333 = 53 --->rows结果 */
复制代码


下面例子有的没用 skip scan 有的用了,区别仅在于包含的条件,根据条件最后生成的 mm tree 不同导致结果不同。


-- 查询语句的Item包含不在联合索引的列c4,因此被判定为"query_references_nonkey_column",不使用skip scan。greatsql> EXPLAIN SELECT c1, c2,c3,c4 FROM t1 WHERE c3 >= 40 and c1<5;+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+|  1 | SIMPLE      | t1    | NULL       | ALL  | i1_t1         | NULL | NULL    | NULL |  160 |    33.33 | Using where |+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
-- 这个例子mm tree的c1<5因为不是等号条件因此被判断为"prefix_not_const_equality",因此不执行skip scan。greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and c1<5;+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+| 1 | SIMPLE | t1 | NULL | index | i1_t1 | i1_t1 | 15 | NULL | 160 | 33.33 | Using where; Using index |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
-- 这个例子mm tree只有一个40 <= c3,没有c1相关的tree,因此不需要判断索引前缀。greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and 0<c1<3;+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 15 | NULL | 53 | 100.00 | Using where; Using index for skip scan |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+/* ※0<c1<3不是标准条件,不支持生成mm tree,详情见https://bugs.mysql.com/bug.php?id=116515&thanks=4。这里可以改成c1 between 0 and 3 */
/*这个例子mm tree有两个40 <= c3和0<c1<3,0<c1<3被判断为"prefix_not_const_equality",因此不能用skip scan方式。*/
greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and 0<c1 and c1<3;+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+| 1 | SIMPLE | t1 | NULL | index | i1_t1 | i1_t1 | 15 | NULL | 160 | 33.33 | Using where; Using index |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+

-- 这个例子mm tree有两个:c1 = 5和40 <= c3,c1条件满足等号条件的要求,可以考虑用skip scan。greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and c1=5;+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+| 1 | SIMPLE | t1 | NULL | ref | i1_t1 | i1_t1 | 5 | const | 1 | 33.33 | Using where; Using index |+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+ "skip_scan_range": { "potential_skip_scan_indexes": [ { "index": "i1_t1", "tree_travel_cost": 0.4, "num_groups": 1, -- 这里分组为1,因为c1=5前缀列算出来的skip_scan_records=1,因为不足1所以最后取1 "rows": 1, "cost": 1.2 } ] }, "best_skip_scan_summary": { "type": "skip_scan", "index": "i1_t1", "key_parts_used_for_access": [ "c1", "c2", "c3" ], "prefix ranges": [ -- 这里的prefix指的是条件出现的联合索引的第一列 "c1 = 5" ], "range": [ -- 这里的范围指的是可以用skip scan的列 "40 <= c3" ], "chosen": true
/* 这个例子mm tree有两个:c2=2和40 <= c3,skip scan只选除了联合索引第一个列以外遇到的第一个条件,因此最后用的是c2=2作为skip scan的条件。*/greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and c2=2;+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 10 | NULL | 40 | 33.33 | Using where; Using index for skip scan |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+ "skip_scan_range": { "potential_skip_scan_indexes": [ { "index": "i1_t1", "tree_travel_cost": 0.4, "num_groups": 3, -- 这里分组为3,因为用c1来分组的,c1只有1和2两个不同值,因此结果为2+1=3 "rows": 40, "cost": 8.67494 } ] }, "best_skip_scan_summary": { "type": "skip_scan", "index": "i1_t1", "key_parts_used_for_access": [ "c1", "c2" ], "range": [ -- 这里的范围指的是可以用skip scan的列。由于条件没有涉及第一列c1,因此没有出现"prefix ranges"选项。 "c2 = 2" ], "chosen": true -- 这个例子mm tree有1个:40 <= c3,最后用的是c1和c2作为前缀。greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40;+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 15 | NULL | 53 | 100.00 | Using where; Using index for skip scan |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+ "skip_scan_range": { "potential_skip_scan_indexes": [ { "index": "i1_t1", "tree_travel_cost": 0.4, "num_groups": 10, -- 这里公式=160 / 17.777 + 1 = 9+1 = 10,因为没有前缀条件可以过滤条件 "rows": 53, "cost": 16.2332 } ] }, "best_skip_scan_summary": { "type": "skip_scan", "index": "i1_t1", "key_parts_used_for_access": [ "c1", "c2", "c3" ], "range": [ "40 <= c3" ], "chosen": true }, -- 下面的例子因为出现了联合索引的所有列c1和c2和c3,因此不符合skip scan要求,不能用skip scan。greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and c1=1 and c2=2;+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 15 | NULL | 1 | 100.00 | Using where; Using index |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
复制代码


下面看一下用到 SKIP_SCAN 提示关键词的情况。下面这个例子没加关键字的时候因为 cost 没选择index skip scan,加上关键字之后就会强制走index skip scan


greatsql> EXPLAIN SELECT  /*+ SKIP_SCAN(t1) */ c1, c2,c3 FROM t1 WHERE c3 >= 40 and c1=5;+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra                                  |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+|  1 | SIMPLE      | t1    | NULL       | range | i1_t1         | i1_t1 | 15      | NULL |    1 |   100.00 | Using where; Using index for skip scan |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
复制代码


下面看一下用到 NO_SKIP_SCAN 提示关键词的情况。


greatsql> EXPLAIN SELECT /*+ NO_SKIP_SCAN(t1 i1_t1) */ c1, c2,c3 FROM t1 WHERE c3 >= 40;+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra                    |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+|  1 | SIMPLE      | t1    | NULL       | index | NULL          | i1_t1 | 15      | NULL |  160 |    33.33 | Using where; Using index |+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
复制代码

四、总结

从上面 skip index 的代码我们了解了跳跃扫描的使用条件和使用场景,以及使用规则,这个功能让一些以前用不到联合索引的查询也可以用到索引,提高了效率,但也有一定的局限和缺点,要看具体表的情况来决定。


发布于: 刚刚阅读数: 3
用户头像

GreatSQL

关注

GreatSQL社区 2023-01-31 加入

GreatSQL是由万里数据库维护的MySQL分支,专注于提升MGR可靠性及性能,支持InnoDB并行查询特性,是适用于金融级应用的MySQL分支版本。 社区:https://greatsql.cn/ Gitee: https://gitee.com/GreatSQL/GreatSQL

评论

发布
暂无评论
【GreatSQL优化器-16】INDEX_SKIP_SCAN_GreatSQL_InfoQ写作社区