When a SQL statement we wrote has performance problems, we are often at a loss and turn to a DBA for help. Today, let's analyze an interesting SQL performance case from three different perspectives, the application developer, the DBA, and the kernel developer, and work our way progressively deeper into the PostgreSQL optimizer.

Problem description

Colleague A came to me, the resident "fake DBA", with a question about the performance of one SQL statement:

How the two statements behave in production
limit 10
```sql
select xxx from user_gift where user_id=11695667 and user_type = 'default' order by id desc limit 10;
```
Execution Time: 1.307 ms
limit 1
```sql
select xxx from user_gift where user_id=11695667 and user_type = 'default' order by id desc limit 1;
```
Execution Time: 144.098 ms
Analysis

Execution plans

Since caching has been ruled out, let's first see how the two execution plans differ.
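For completeness, a quick way to verify the caching claim yourself is to repeat the query with buffer statistics enabled:

```sql
-- Run several times: if "shared hit" dominates "read" and the timing barely
-- changes between runs, a cold cache is not what makes limit 1 slow.
explain (analyze, buffers)
select xxx from user_gift
where user_id = 11695667 and user_type = 'default'
order by id desc limit 1;
```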
limit 1
```
# explain analyze verbose select xxx from user_gift where user_id=11695667 and user_type = 'default' order by id desc limit 1;
                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..416.25 rows=1 width=73) (actual time=135.213..135.214 rows=1 loops=1)
   Output: xxx
   ->  Index Scan Backward using user_gift_pkey on yay.user_gift  (cost=0.43..368000.44 rows=885 width=73) (actual time=135.212..135.212 rows=1 loops=1)
         Output: xxx
         Filter: ((user_gift.user_id = 11695667) AND (user_gift.user_type = 'default'::user_type))
         Rows Removed by Filter: 330192
 Planning Time: 0.102 ms
 Execution Time: 135.235 ms
(8 rows)

Time: 135.691 ms
```
limit 10
```
# explain analyze verbose select xxx from user_gift where user_id=11695667 and user_type = 'default' order by id desc limit 10;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=868.20..868.22 rows=10 width=73) (actual time=1.543..1.545 rows=10 loops=1)
   Output: xxx
   ->  Sort  (cost=868.20..870.41 rows=885 width=73) (actual time=1.543..1.543 rows=10 loops=1)
         Output: xxx
         Sort Key: user_gift.id DESC
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Index Scan using idx_user_type on yay.user_gift  (cost=0.56..849.07 rows=885 width=73) (actual time=0.020..1.366 rows=775 loops=1)
               Output: xxx
               Index Cond: (user_gift.user_id = 11695667)
               Filter: (user_gift.user_type = 'default'::user_type)
 Planning Time: 0.079 ms
 Execution Time: 1.564 ms
(12 rows)

Time: 1.871 ms
```
As we can see, the two statements use different plans:
- The limit 1 statement scans the primary key backward: Index Scan Backward using user_gift_pkey on yay.user_gift
- The limit 10 statement looks up the user's rows directly through the composite (user_id, user_type) index: Index Scan using idx_user_type on yay.user_gift
Why do the plans differ?
total cost
Actually, PostgreSQL's plan choice is not "wrong". The total cost of limit 1, Limit (cost=0.43..416.25 rows=1 width=73), is 416.25, with a run cost of 416.25 − 0.43 = 415.82. The total cost of limit 10, Limit (cost=868.20..868.22 rows=10 width=73), is 868.22.

If we extrapolate the Index Scan Backward using user_gift_pkey plan, limit 1 costs about 416, limit 2 about 416 × 2 ≈ 832, and limit 3 about 1248, which already exceeds 868. So from limit 3 onward the planner switches to the Index Scan using idx_user_type plan.
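Concretely, the cost of a Limit node is interpolated from its child plan in proportion to the fraction of rows it expects to pull. A hedged reconstruction of the arithmetic (the limit 1 numbers come from the plan above; the limit 2 numbers from the verification plan below):

$$
\begin{aligned}
\text{cost}_{\text{limit } k} &\approx \text{startup} + \frac{k}{\text{rows}} \times (\text{total} - \text{startup}) \\
\text{cost}_{\text{limit } 1} &\approx 0.43 + \frac{1}{885} \times (368000.44 - 0.43) \approx 416.25 \\
\text{cost}_{\text{limit } 2} &\approx 0.43 + \frac{2}{884} \times (367528.67 - 0.43) \approx 831.95
\end{aligned}
$$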
Verification
```
# explain select xxx from user_gift where user_id=11695667 and user_type = 'default' order by id desc limit 2;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..831.95 rows=2 width=73)
   ->  Index Scan Backward using user_gift_pkey on user_gift  (cost=0.43..367528.67 rows=884 width=73)
         Filter: ((user_id = 11695667) AND (user_type = 'default'::user_type))
(3 rows)

Time: 0.341 ms

# explain select xxx from user_gift where user_id=11695667 and user_type = 'default' order by id desc limit 3;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Limit  (cost=866.19..866.20 rows=3 width=73)
   ->  Sort  (cost=866.19..868.40 rows=884 width=73)
         Sort Key: id DESC
         ->  Index Scan using idx_user_type on user_gift  (cost=0.56..854.76 rows=884 width=73)
               Index Cond: (user_id = 11695667)
               Filter: (user_type = 'default'::user_type)
(6 rows)

Time: 0.352 ms
```
The results confirm it: limit 2 still uses the backward primary-key scan, while limit 3 switches to idx_user_type.
Actual execution time

The estimated cost of limit 1 is 416.25, which should be cheaper than limit 10's 868.22.

In reality, however, limit 1 takes 135.691 ms while limit 10 takes 1.871 ms: limit 1 is about 70 times slower!

Let's rerun explain with the buffers option:
```
# explain (analyze, buffers, verbose) select xxx from user_gift where user_id=11695667 and user_type = 'default' order by id desc limit 1;
                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..416.29 rows=1 width=73) (actual time=451.542..451.544 rows=1 loops=1)
   Output: xxx
   Buffers: shared hit=214402 read=5280 dirtied=2302
   I/O Timings: read=205.027
   ->  Index Scan Backward using user_gift_pkey on yay.user_gift  (cost=0.43..368032.94 rows=885 width=73) (actual time=451.540..451.540 rows=1 loops=1)
         Output: xxx
         Filter: ((user_gift.user_id = 11695667) AND (user_gift.user_type = 'default'::user_type))
         Rows Removed by Filter: 333462
         Buffers: shared hit=214402 read=5280 dirtied=2302
         I/O Timings: read=205.027
 Planning Time: 1.106 ms
 Execution Time: 451.594 ms
(12 rows)
```
```
# explain (analyze, buffers, verbose) select xxx from user_gift where user_id=11695667 and user_type = 'default' order by id desc limit 3;
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=860.51..860.52 rows=3 width=73) (actual time=14.633..14.634 rows=3 loops=1)
   Output: xxx
   Buffers: shared hit=467 read=321
   I/O Timings: read=10.112
   ->  Sort  (cost=860.51..862.72 rows=885 width=73) (actual time=14.632..14.632 rows=3 loops=1)
         Output: xxx
         Sort Key: user_gift.id DESC
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=467 read=321
         I/O Timings: read=10.112
         ->  Index Scan using idx_user_type on yay.user_gift  (cost=0.56..849.07 rows=885 width=73) (actual time=0.192..14.424 rows=775 loops=1)
               Output: xxx
               Index Cond: (user_gift.user_id = 11695667)
               Filter: (user_gift.user_type = 'default'::user_type)
               Buffers: shared hit=464 read=321
               I/O Timings: read=10.112
 Planning Time: 0.111 ms
 Execution Time: 14.658 ms
(18 rows)
```
From the line Buffers: shared hit=214402 read=5280 dirtied=2302 we can see that the limit 1 plan read 5280 blocks (pages) from disk before finding the first record that satisfies the where clause.
Why does it need to read so much data? Let's look at the statistics:
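The output below is pg_stats in psql's expanded display; a hedged reconstruction of the query behind it:

```sql
-- In psql with expanded display (\x on):
SELECT *
FROM pg_stats
WHERE schemaname = 'yay'
  AND tablename = 'user_gift'
  AND attname IN ('id', 'user_id', 'user_type');
```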
```
schemaname             | yay
tablename              | user_gift
attname                | id
inherited              | f
null_frac              | 0
avg_width              | 8
n_distinct             | -1
most_common_vals       |
most_common_freqs      |
histogram_bounds       | {93,9817,19893,28177,.......}
correlation            | 0.788011
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |

schemaname             | yay
tablename              | user_gift
attname                | user_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -0.175761
most_common_vals       | {11576819,10299480,14020501,.......,11695667,......}
most_common_freqs      | {0.000353333,0.000326667,0.000246667,......,9.33333e-05,......}
histogram_bounds       | {3,10002181,10005599,10009672,......,11693300,11698290,......}
correlation            | 0.53375
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |

schemaname             | yay
tablename              | user_gift
attname                | user_type
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 3
most_common_vals       | {default, invalid, deleted}
most_common_freqs      | {0.997923,0.00194,0.000136667}
histogram_bounds       |
correlation            | 0.99763
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |
```
From the statistics we can see:
- The user_id column's most_common_vals contains the value 11695667, so its selectivity can be read directly from the corresponding most_common_freqs entry: 9.33333e-05.
- The selectivity of user_type = 'default' is 0.997923.
- So the selectivity of where user_id=11695667 and user_type='default' is 9.33333e-05 × 0.997923 ≈ 0.0000931394.
From that, the estimated number of matching rows is 0.0000931394 × 9499740 (total rows) ≈ 884.8, which agrees with the 884 rows in the plan (cost=0.43..367528.67 rows=884 width=73).
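The same arithmetic as a minimal SQL sketch (the two selectivities are copied from the pg_stats output above):

```sql
-- estimated rows = selectivity(user_id) * selectivity(user_type) * reltuples
SELECT 9.33333e-05 * 0.997923 * reltuples AS estimated_rows
FROM pg_class
WHERE relname = 'user_gift';  -- reltuples = 9.49974e+06, giving ~884.8
```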
But the optimizer's estimate is built on the assumption that the data is uniformly distributed across the table. Is it really? Let's look at the actual distribution:
```
# select max(ctid) from user_gift;
     max
-------------
 (128709,29)
(1 row)
```
```
# select max(ctid), min(ctid) from user_gift where user_id=11695667;
     max     |    min
-------------+-----------
 (124329,22) | (3951,64)
(1 row)
```
```
# SELECT relpages, reltuples FROM pg_class WHERE relname = 'user_gift';
 relpages |  reltuples
----------+-------------
   128875 | 9.49974e+06
(1 row)
```
Tuples stored per page: 9.49974e+06 tuples / 128875 pages ≈ 73.7 tuples/page.
Now the arithmetic: the table's highest heap page is 128709, while the highest page holding rows of user 11695667 is 124329. A backward scan must therefore pass 128709 − 124329 = 4380 pages before it even reaches a page containing a matching record, filtering roughly 4380 pages × 73.7 tuples/page ≈ 322,862 rows along the way.

The actual limit 1 run read 5280 pages from disk (including primary-key index pages) and filtered out 333,462 rows, close to the estimate:
```
Rows Removed by Filter: 333462
Buffers: shared hit=214402 read=5280 dirtied=2302
I/O Timings: read=205.027
```
So this user's data is indeed skewed: it sits toward the front of the table.
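If you want to see the skew directly, here is a minimal sketch (not from the original analysis) that buckets this user's rows by heap block number, extracting the block number from ctid via a text-to-point cast:

```sql
-- Count the user's rows per band of 10,000 heap blocks; for a skewed user,
-- all buckets fall well below the table's maximum block (128709 here).
SELECT ((ctid::text)::point)[0]::int / 10000 AS block_bucket,
       count(*) AS rows_in_bucket
FROM user_gift
WHERE user_id = 11695667
GROUP BY block_bucket
ORDER BY block_bucket;
```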
So how long does scanning 5280 pages take?

Data volume to read: 5280 pages × 8 KB/page ≈ 41.2 MB.
```
[root]$ fio -name iops -rw=randread -bs=8k -runtime=10 -iodepth=1 -filename /dev/sdb -ioengine mmap -buffered=1
...
Run status group 0 (all jobs):
   READ: bw=965KiB/s (988kB/s), 965KiB/s-965KiB/s (988kB/s-988kB/s), io=9656KiB (9888kB), run=10005-10005msec

[root]$ fio -name iops -rw=read -bs=8k -runtime=10 -iodepth=1 -filename /dev/sdb -ioengine mmap -direct=1
...
Run status group 0 (all jobs):
   READ: bw=513MiB/s (538MB/s), 513MiB/s-513MiB/s (538MB/s-538MB/s), io=5132MiB (5381MB), run=10001-10001msec
```
The fio results show that this machine's disk sustains about 513 MiB/s for sequential reads, so if the data were laid out purely sequentially, scanning 41.2 MB would take about 80 ms. If every read were random (about 965 KiB/s), it would take over 40 seconds. Real access is a mix of the two; note also that the benchmark ran on a non-production machine with no competing I/O.
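The back-of-the-envelope numbers, as a hedged check against the fio figures above:

$$
t_{\text{seq}} \approx \frac{41.2\ \text{MB}}{513\ \text{MiB/s}} \approx 80\ \text{ms},
\qquad
t_{\text{rand}} \approx \frac{41.2\ \text{MB}}{965\ \text{KiB/s}} \approx 43\ \text{s}
$$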
At this point the problem is essentially nailed down:

The PostgreSQL optimizer assumes the data is uniformly distributed, under which a backward scan would find a matching record almost immediately. In reality this user's rows all sit near the front of the table, which is why the actual start-up time is so long.
Analysis from the kernel's perspective

Let's continue the analysis from inside the PostgreSQL kernel, looking at two questions:

- How the optimizer estimates cost
- How actual time is collected

Table information
```
# \d user_gift;
                                         Table "yay.user_gift"
    Column    |           Type           | Collation | Nullable |                    Default
--------------+--------------------------+-----------+----------+------------------------------------------------
 id           | bigint                   |           | not null | nextval('user_gift_id_seq'::regclass)
 user_id      | integer                  |           | not null |
 ug_name      | character varying(100)   |           | not null |
 expired_time | timestamp with time zone |           |          | now()
 created_time | timestamp with time zone |           | not null | now()
 updated_time | timestamp with time zone |           | not null | now()
 user_type    | user_type                |           | not null | 'default'::user_type
Indexes:
    "user_gift_pkey" PRIMARY KEY, btree (id)
    "idx_user_type" btree (user_id, ug_name)
    "user_gift_ug_name_idx" btree (ug_name)
```
```
# SELECT relpages, reltuples FROM pg_class WHERE relname = 'user_gift_pkey';
 relpages |  reltuples
----------+-------------
    40035 | 9.49974e+06
(1 row)
```
```
# SELECT relpages, reltuples FROM pg_class WHERE relname = 'idx_user_type';
 relpages |  reltuples
----------+-------------
   113572 | 9.49974e+06
(1 row)
```
```
# SELECT relpages, reltuples FROM pg_class WHERE relname = 'user_gift';
 relpages |  reltuples
----------+-------------
   128875 | 9.49974e+06
(1 row)
```
```
=# select count(1) from user_gift where user_id=11695667;
 count
-------
   775
(1 row)

=# select count(1) from user_gift where user_id=11695667 and user_type = 'default';
 count
-------
   775
(1 row)
```
```
# primary key height
# select * from bt_metap('user_gift_pkey');
 magic  | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples
--------+---------+------+-------+----------+-----------+-------------+-------------------------
 340322 |       3 |  412 |     2 |      412 |         2 |           0 |             9.31928e+06
(1 row)

# idx_user_type height
# select * from bt_metap('idx_user_type');
 magic  | version | root  | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples
--------+---------+-------+-------+----------+-----------+-------------+-------------------------
 340322 |       3 | 15094 |     3 |    15094 |         3 |           0 |             9.49974e+06
(1 row)
```
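A side note for anyone reproducing this: bt_metap() is not built into core PostgreSQL; it comes from the pageinspect contrib extension (typically superuser-only):

```sql
-- pageinspect ships with PostgreSQL as a contrib extension
CREATE EXTENSION IF NOT EXISTS pageinspect;
-- level is the height of the B+tree root (the leaf level is 0)
SELECT level FROM bt_metap('user_gift_pkey');
```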
Estimating cost
start-up cost
PostgreSQL estimates cost differently for each index type. Let's look at how the start-up cost of a B+tree index is estimated:
```c
// selfuncs.c
void
btcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
               Cost *indexStartupCost, Cost *indexTotalCost,
               Selectivity *indexSelectivity, double *indexCorrelation,
               double *indexPages)
{
    ......
    // descending the B+tree compares roughly log2(N) tuples
    descentCost = ceil(log(index->tuples) / log(2.0)) * cpu_operator_cost;
    costs.indexStartupCost += descentCost;

    ......
    // This cost is somewhat arbitrarily set at 50x cpu_operator_cost per page touched
    descentCost = (index->tree_height + 1) * 50.0 * cpu_operator_cost;
    costs.indexStartupCost += descentCost;

    ......
}
```
The start-up cost estimate is actually quite simple: it only covers descending from the B+tree's root page to a leaf page, plus fetching the first tuple from that page.

The start-up cost formula is:
$$\left\{ \lceil \log_2(N_{index,tuple}) \rceil + (Height_{index} + 1) \times 50 \right\} \times cpu\_operator\_cost$$
- N(index,tuple): the number of tuples (records) in the index
- Height(index): the height of the index's B+tree
- cpu_operator_cost: defaults to 0.0025
Start-up cost of the user_gift_pkey plan

From the table information above, N(index,tuple) = 9.49974e+06 and the tree height (bt_metap's level) is 2, so:

$$\left\{ \lceil \log_2(9499740) \rceil + (2 + 1) \times 50 \right\} \times 0.0025 = 174 \times 0.0025 = 0.435$$

which matches PostgreSQL's estimated start-up cost of 0.43.
Start-up cost of the idx_user_type plan

- N(index,tuple): 9.49974e+06
- Height(index): 3

$$\left\{ \lceil \log_2(9499740) \rceil + (3 + 1) \times 50 \right\} \times 0.0025 = 224 \times 0.0025 = 0.56$$

which matches PostgreSQL's estimated start-up cost of 0.56.
run cost
Estimating the run cost is considerably more complicated: there are many branching conditions and no single closed-form formula, so we only sketch it here. If you are interested, read the cost_index function in the PostgreSQL source, src/backend/optimizer/path/costsize.c; for a case like this one, the cost can usually be simulated with the linked script.
run cost:

$$\text{run cost} = \text{index cost} + \text{table cost}$$

Index cost:

$$\text{index cost} = \text{cost of randomly reading the index pages} + \text{cost of processing the matching tuples}$$

Table (main table) cost:

$$\text{table cost} = \text{max\_io\_cost} + \text{indexCorrelation}^2 \times (\text{min\_io\_cost} - \text{max\_io\_cost})$$

max_io_cost (worst-case IO cost): every page is read randomly.

$$\text{max\_io\_cost} = \text{pages\_fetched} \times \text{random\_page\_cost}$$

min_io_cost (best-case IO cost): the first page is read randomly, the rest sequentially.

$$\text{min\_io\_cost} = 1 \times \text{random\_page\_cost} + (\text{pages\_fetched} - 1) \times \text{seq\_page\_cost}$$
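As a hedged aside, the pg_stats output above already gives the number this interpolation needs: the id column's correlation is 0.788011, so for the primary-key plan

$$\text{indexCorrelation}^2 = 0.788011^2 \approx 0.62$$

meaning the estimated table I/O cost sits about 62% of the way from the worst case (max_io_cost) toward the best case (min_io_cost), with the default settings random_page_cost = 4 and seq_page_cost = 1 assumed.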
actual start-up time vs estimated start-up cost
One question was glossed over in the analysis so far: the estimated start-up cost covers the work from starting the plan to reading the first tuple from the table (cost is an arbitrary unit), while the actual start-up time runs from starting the plan to reading the first tuple that satisfies the where clause. Why the difference?

SQL processing flow: PostgreSQL parses the SQL into an AST, optimizes it, and then translates the AST into executors that carry out the concrete operations. Without optimization, the executor tree looks like this:
```
┌──────────────┐
│  projection  │
└──────┬───────┘
       │
       │
┌──────▼──────┐
│    limit    │
└──────┬──────┘
       │
       │
┌──────▼──────┐
│  selection  │
└──────┬──────┘
       │
       │
┌──────▼──────┐
│ index scan  │
└─────────────┘
```
The simplified execution flow:

- index scan executor: returns each scanned tuple to the selection executor
- selection executor: filters the tuple; if it matches, passes it up to the limit executor, otherwise calls the index scan executor again
- limit executor: once the limit count is reached, returns the rows to the projection executor
- projection executor: drops the columns that are not in the select list

With optimization, the selection executor and projection executor are usually merged into the index scan executor to cut down on tuple passing between executors:
```
┌─────────────┐
│    limit    │
└──────┬──────┘
       │
       │
┌──────▼──────┐
│ index scan  │
│             │
│ + selection │
│ + projection│
└─────────────┘
```
After optimization, filtering happens inside the scan node itself. And as the code below shows, PostgreSQL collects its timing statistics per executor node:
```c
// src/backend/executor/execProcnode.c
static TupleTableSlot *
ExecProcNodeInstr(PlanState *node)
{
    TupleTableSlot *result;

    InstrStartNode(node->instrument);

    result = node->ExecProcNodeReal(node);

    // collect execution metrics for this node
    InstrStopNode(node->instrument, TupIsNull(result) ? 0.0 : 1.0);

    return result;
}
```
So the start-up portion of actual time is measured from starting the executor until it returns the first row matching the where clause.

Looking at a real call stack, the user_id=xxx filter has indeed been pushed down into the index scan executor:
```
--->    int4eq(FunctionCallInfo fcinfo) (/home/ken/cpp/postgres/src/backend/utils/adt/int.c:379)
        ExecInterpExpr(ExprState * state, ExprContext * econtext, _Bool * isnull) (/home/ken/cpp/postgres/src/backend/executor/execExprInterp.c:704)
        ExecInterpExprStillValid(ExprState * state, ExprContext * econtext, _Bool * isNull) (/home/ken/cpp/postgres/src/backend/executor/execExprInterp.c:1807)
        ExecEvalExprSwitchContext(ExprState * state, ExprContext * econtext, _Bool * isNull) (/home/ken/cpp/postgres/src/include/executor/executor.h:322)
--->    ExecQual(ExprState * state, ExprContext * econtext) (/home/ken/cpp/postgres/src/include/executor/executor.h:391)
        ExecScan(ScanState * node, ExecScanAccessMtd accessMtd, ExecScanRecheckMtd recheckMtd) (/home/ken/cpp/postgres/src/backend/executor/execScan.c:227)
--->    ExecIndexScan(PlanState * pstate) (/home/ken/cpp/postgres/src/backend/executor/nodeIndexscan.c:537)
        ExecProcNodeInstr(PlanState * node) (/home/ken/cpp/postgres/src/backend/executor/execProcnode.c:466)
        ExecProcNodeFirst(PlanState * node) (/home/ken/cpp/postgres/src/backend/executor/execProcnode.c:450)
        ExecProcNode(PlanState * node) (/home/ken/cpp/postgres/src/include/executor/executor.h:248)
--->    ExecLimit(PlanState * pstate) (/home/ken/cpp/postgres/src/backend/executor/nodeLimit.c:96)
        ExecProcNodeInstr(PlanState * node) (/home/ken/cpp/postgres/src/backend/executor/execProcnode.c:466)
        ExecProcNodeFirst(PlanState * node) (/home/ken/cpp/postgres/src/backend/executor/execProcnode.c:450)
        ExecProcNode(PlanState * node) (/home/ken/cpp/postgres/src/include/executor/executor.h:248)
        ExecutePlan(EState * estate, PlanState * planstate, _Bool use_parallel_mode, CmdType operation, _Bool sendTuples, uint64 numberTuples, ScanDirection direction, DestReceiver * dest, _Bool execute_once) (/home/ken/cpp/postgres/src/backend/executor/execMain.c:1632)
        standard_ExecutorRun(QueryDesc * queryDesc, ScanDirection direction, uint64 count, _Bool execute_once) (/home/ken/cpp/postgres/src/backend/executor/execMain.c:350)
        ExecutorRun(QueryDesc * queryDesc, ScanDirection direction, uint64 count, _Bool execute_once) (/home/ken/cpp/postgres/src/backend/executor/execMain.c:294)
        ExplainOnePlan(PlannedStmt * plannedstmt, IntoClause * into, ExplainState * es, const char * queryString, ParamListInfo params, QueryEnvironment * queryEnv, const instr_time * planduration, const BufferUsage * bufusage) (/home/ken/cpp/postgres/src/backend/commands/explain.c:571)
        ExplainOneQuery(Query * query, int cursorOptions, IntoClause * into, ExplainState * es, const char * queryString, ParamListInfo params, QueryEnvironment * queryEnv) (/home/ken/cpp/postgres/src/backend/commands/explain.c:404)
        ExplainQuery(ParseState * pstate, ExplainStmt * stmt, ParamListInfo params, DestReceiver * dest) (/home/ken/cpp/postgres/src/backend/commands/explain.c:275)
```
The code below is the scan implementation. ExecQual(qual, econtext) performs the filtering on each tuple, since selection has been merged into the scan:
```c
TupleTableSlot *
ExecScan(ScanState *node, ExecScanAccessMtd accessMtd, ExecScanRecheckMtd recheckMtd)
{
    ......
    for (;;)
    {
        TupleTableSlot *slot;

        slot = ExecScanFetch(node, accessMtd, recheckMtd);
        ......
        econtext->ecxt_scantuple = slot;

        // Note: this is the selection check
        if (qual == NULL || ExecQual(qual, econtext))
        {
            if (projInfo)
            {
                return ExecProject(projInfo);
            }
            else
            {
                return slot;
            }
        }
        else
            InstrCountFiltered1(node, 1);
    }
}
```
Solutions
Prevent the primary-key scan

Since the plan uses a backward scan of user_gift_pkey, we can manually stop the optimizer from choosing that index.
```sql
# explain analyze verbose select xxx from user_gift where user_id=11695667 and user_type = 'default' order by id+0 desc limit 1;
```
Changing order by id to order by id+0 does the trick: id+0 is an expression, so the optimizer can no longer use the user_gift_pkey index to produce the required ordering.

This workaround does not fit every scenario, though. If the data were evenly distributed, the user_gift_pkey scan could well be the better plan in some cases.
Add a (user_id, id) index
```sql
create index idx_user_id on user_gift(user_id, id);
```
Adding a composite index on the where column plus the sort key avoids the primary-key scan altogether: the planner can descend the (user_id, id) index directly to user_id = 11695667 and read that user's rows already sorted by id.
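As a hedged sanity check, the plan shape below is what one would expect after adding the index, not captured output:

```sql
-- Both the filter and ORDER BY ... DESC can now be satisfied by a single
-- backward scan of idx_user_id, so start-up time no longer depends on where
-- this user's rows happen to sit in the heap. Expected shape:
--   Index Scan Backward using idx_user_id on user_gift
--     Index Cond: (user_id = 11695667)
explain select xxx from user_gift
where user_id = 11695667 and user_type = 'default'
order by id desc limit 1;
```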
Closing thoughts

We ruled out caching, examined the execution plans, pinpointed the data-distribution skew, and finally stepped through the kernel source to confirm the root cause and fix the performance problem. Hopefully this little SQL optimization journey gives you something useful to take away.