写点什么

Postgresql 分析慢 sql

  • 2023-01-10
    广东
  • 本文字数:7179 字

    阅读完需:约 24 分钟

Postgresql分析慢sql

现象

突然发现测试环境一条慢 sql,就想着分析一下,写写总结。说到优化其实我个人认为是不到不得已还是没有必要的,毕竟除非特别重大的问题,影响了基本操作和体验,平时还是基本的配置也够了,就像《重构:改善代码的既有设计》当你闻到了代码的坏味道才需要重构,毕竟如果一个项目用户量小,并发不高,其实优化跟不优化差距差不了多少,而且有可能改着改着,新的 bug 就有产生了。但大的项目就不一样了,一点点小小的优化就是指数级别的收益,就像你跑步一样,跑个 5 公里,10 公里,什么都可以不在乎,但是你跑个 50 公里,衣服鞋子肺活量身体素质,这些都是要考虑的范围。突然让我想起最近边跑马拉松边抽烟那哥们,简直违反生物学,牛顿看了都会直呼惊呆了🤯。


select fo_id, fo.fund_account, fo.order_type, fo.trade_id, fo.order_id, fo.symbol, fo.trade_type, fo.side,fo.quantity, fo.price, fo.commission, fo.mac_address, fo.trade_time, bo.price book_price, sp.symbol_name, sp.exchangefrom ccc  fo   JOIN aaa sp ON fo.symbol = sp.symbolJOIN bbb bo ON bo.order_status = '1' and bo.order_id = fo.order_id and bo.cl_order_id = fo.cl_order_idwhere 1 = 1 and fo.trade_time >= 1609430400000  and fo.trade_time <= 1702655999999and fo.broker_account =  any('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::varchar[])ORDER BY fo.trade_time DESC, fo.fund_account, fo.symbol   LIMIT 30 OFFSET 0  
复制代码


言归正传,当我看到这条 sql 的时候,我查了一下发现 sql 执行用了 12s,顿时有点惊呆了,一般的 sql 大概超过 2s 就应该优化了,好了我们来分析一下吧。

分析

拿到 sql 我就想看看表数据量多少



select count(*) from aaa;6945
select count(*) from bbb;558729
select count(*) from ccc;798024
复制代码


从表数据上面来看,50 多万的数据还好,不至于这么差,接下来好好分析一下。第一个点,但从 sql 上面我就发现一个点不合理,我之前也喜欢用 where 1=1觉得后面就是一个条件 true,直到后来经过跟别人讨论,有一种可能 SQL 解析会认为 1 是一个属性名,完了去表里面找这样就跟写 SQL 背到而驰了,我们理解可能是认为他就是 TRUE,但是回到 SQL 解析上面又差别不大,去掉 1=1 之后发现运行速度快了 3 秒,从某种程度来说还是会影响 SQL 的执行效率,而且从多表拼接的 SQL 上面确实发现啊了 200 毫秒,单表可能看不出什么差距。


explain analyze select * from ccc where 1= 1 limit 10;Limit  (cost=0.00..0.35 rows=10 width=289) (actual time=0.017..0.020 rows=10 loops=1)  ->  Seq Scan on ccc  (cost=0.00..27920.00 rows=798300 width=289) (actual time=0.016..0.018 rows=10 loops=1)Planning time: 0.150 msExecution time: 0.057 ms
explain analyze select * from ccc limit 10;Limit (cost=0.00..0.35 rows=10 width=289) (actual time=0.020..0.023 rows=10 loops=1) -> Seq Scan on ccc (cost=0.00..27920.00 rows=798300 width=289) (actual time=0.018..0.019 rows=10 loops=1)Planning time: 0.173 msExecution time: 0.062 ms
复制代码


这是当天的 SQL 的执行计划和索引,我发现几个神奇的现象:


  1. PG 数据的 ANY 居然比常用的 in 还要好,原因竟然是走 ANY 不需要使用哈希表,我是看到大佬的文章得出的结论。

  2. 如果没有匹配到索引 ORDER BY 的运行效率会变得非常慢,如果匹配到了索引那么速度就会非常快。


create unique index idx_aaa_symbol    on aaa (symbol);-----create index idx_bbb_place_time    on bbb (place_time);
create index idx_bbb_broker_account on bbb (broker_account);-----create index idx_ccc_trade_time on ccc (trade_time);
create index idx_ccc_broker_account on ccc (broker_account);------
explain (ANALYZE,BUFFERS) select fo.fo_id, fo.fund_account, fo.order_type, fo.trade_id, fo.order_id, fo.symbol, fo.trade_type, fo.side, fo.quantity, fo.price, fo.commission, fo.mac_address, fo.trade_time ,bo.price book_price-- ,sp.symbol_name, sp.exchangefrom ccc fo-- JOIN aaa sp ON fo.symbol = sp.symbol JOIN bbb bo ON bo.order_status = '1'-- and bo.order_id = fo.order_id and bo.cl_order_id = fo.cl_order_idwhere fo.trade_time < 1702655999999 and fo.trade_time > 1609430400000-- fo.trade_time = 1669704603470-- and fo.fo_id = 1007811-- and fo.broker_account in ('0014510913T0','0014510911As','0014510911Th','0014510911Ts','0014510913Alpha','100000001Ts','100000001As','100000001Th','0014510910As','0014510910Ts','0014510910Th','0014510921Th','0014510921Ts','0014510219Th','0014510219As',-- '0014510921Ah','0014510219Ts','0014510921As','16654T0','16654Alpha','0321080000016656Th','0321080000016656Ts','0321080000016656As','0014510920Th','0014510920Ts','0014510920As','0014510910Ah','100000001Ah','0014510911Ah','0311020001234597Ah','0014510219Ah','0321080000016656Ah','0014510920Ah','0014510909T0','0014510909Alpha') and fo.broker_account = any('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::varchar[] ) ORDER BY fo.trade_time DESC , fo.fund_account ASC , fo.symbolLIMIT 30 OFFSET 0;
复制代码


进行了字段排序 sort,但是 ANY 却不匹配索引了,所以进行了全表检索,执行效率非常的慢。


Limit  (cost=1483726.37..1483726.45 rows=30 width=89) (actual time=3398.207..3398.213 rows=30 loops=1)"  Buffers: shared hit=2123 read=42246 dirtied=5, temp read=11657 written=11687"  ->  Sort  (cost=1483726.37..1554119.82 rows=28157379 width=89) (actual time=3398.205..3398.208 rows=30 loops=1)"        Sort Key: fo.trade_time DESC, fo.fund_account, fo.symbol"        Sort Method: top-N heapsort  Memory: 37kB"        Buffers: shared hit=2123 read=42246 dirtied=5, temp read=11657 written=11687"        ->  Merge Join  (cost=226264.61..652113.59 rows=28157379 width=89) (actual time=2051.812..2788.036 rows=1345806 loops=1)              Merge Cond: (fo.cl_order_id = bo.cl_order_id)"              Buffers: shared hit=2123 read=42246 dirtied=5, temp read=11657 written=11687"              ->  Sort  (cost=134482.56..135779.65 rows=518838 width=89) (actual time=1436.513..1584.291 rows=512740 loops=1)                    Sort Key: fo.cl_order_id                    Sort Method: external merge  Disk: 56464kB"                    Buffers: shared hit=1113 read=16319 dirtied=2, temp read=10028 written=10050"                    ->  Seq Scan on ccc fo  (cost=0.00..58628.56 rows=518838 width=89) (actual time=0.069..882.202 rows=512740 loops=1)"                          Filter: ((trade_time < '1702655999999'::numeric) AND (trade_time > '1609430400000'::numeric) AND ((broker_account)::text = ANY ('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::text[])))"                          Rows Removed by Filter: 188246                          Buffers: shared hit=1113 read=16319 dirtied=2              ->  Materialize  (cost=91406.68..93963.41 rows=511346 width=16) (actual time=613.979..813.987 rows=1521084 loops=1)"                    Buffers: shared hit=1010 read=25927 dirtied=3, temp read=1629 written=1637"                    ->  Sort  (cost=91406.68..92685.04 rows=511346 width=16) (actual time=613.975..695.387 rows=510968 loops=1)                          Sort Key: bo.cl_order_id                          Sort Method: external merge  Disk: 13032kB"                          Buffers: shared hit=1010 read=25927 dirtied=3, temp read=1629 written=1637"                          ->  Seq Scan on bbb bo  (cost=0.00..34181.50 rows=511346 width=16) (actual time=0.013..361.825 rows=510968 loops=1)                                Filter: (order_status = '1'::bpchar)                                Rows Removed by Filter: 67455                                Buffers: shared hit=1010 read=25927 dirtied=3Planning time: 0.879 msExecution time: 2828.213 ms                              
复制代码


没有进行字段排序 no sort,但是却走了索引,这可能是 ANY 的一大特点,,所以从执行结果来看效果非常明显。所以我一直陷入一个怪圈,想着把 ANY 的多字符匹配给去掉,换成一种直接匹配的方式会好很多,可是多字符匹配哪里那么容易改变,然后就查资料查文档。突然我想到阿里的葵花宝典还是在哪里记得,我看过这么一句话,外键要加索引,但是看了文档却没有发现就不了了之了,之后又被其他的事情耽误了。


Limit  (cost=0.42..2866.61 rows=30 width=89) (actual time=27.068..400.315 rows=30 loops=1)  Buffers: shared hit=36576  ->  Nested Loop  (cost=0.42..2690142553.60 rows=28157379 width=89) (actual time=27.066..400.294 rows=30 loops=1)        Buffers: shared hit=36576        ->  Seq Scan on bbb bo  (cost=0.00..34181.50 rows=511346 width=16) (actual time=0.012..0.063 rows=13 loops=1)              Filter: (order_status = '1'::bpchar)              Buffers: shared hit=1        ->  Index Scan using ccc_order_id_cl_order_id_index on ccc fo  (cost=0.42..5260.72 rows=12 width=89) (actual time=25.727..30.779 rows=2 loops=13)              Index Cond: (cl_order_id = bo.cl_order_id)"              Filter: ((trade_time < '1702655999999'::numeric) AND (trade_time > '1609430400000'::numeric) AND ((broker_account)::text = ANY ('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::text[])))"              Buffers: shared hit=36575Planning time: 0.761 msExecution time: 400.389 ms
复制代码


过几天我再来看的时候发现执行效率恢复了正常,细看发现,只是改动了几个索引,果然验证了那个猜想,外键加索引瞬间快到飞起。这是修改索引之后的执行结果



--- ADDcreate index bbb_cl_order_id_order_id_order_status_index on bbb (cl_order_id, order_id, order_status);--- UDATEcreate index idx_ccc_trade_time_fund_account_symbol on ccc (trade_time desc, fund_account asc, symbol asc);--- ADDcreate index idx_ccc_cl_order_id_order_id on ccc (cl_order_id, order_id);----

explain ANALYZE select fo.fo_id, fo.fund_account, fo.order_type, fo.trade_id, fo.order_id, fo.symbol, fo.trade_type, fo.side, fo.quantity, fo.price, fo.commission, fo.mac_address, fo.trade_time, bo.price book_price, sp.symbol_name, sp.exchangefrom ccc fo JOIN aaa sp ON fo.symbol = sp.symbol JOIN bbb bo ON bo.order_id = fo.order_id and bo.cl_order_id = fo.cl_order_id
where 1=1 AND bo.order_status = '1' and fo.trade_time >= 1609430400000 and fo.trade_time <= 1702655999999 and fo.broker_account = any('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::varchar[] )ORDER BY fo.trade_time DESC , fo.fund_account, fo.symbolLIMIT 30 OFFSET 0;

Limit (cost=1001.16..5150.43 rows=30 width=106) (actual time=95.097..119.294 rows=30 loops=1) -> Gather Merge (cost=1001.16..258255.92 rows=1860 width=106) (actual time=95.094..119.286 rows=30 loops=1) Workers Planned: 2 Workers Launched: 2 -> Nested Loop (cost=1.13..257041.20 rows=775 width=106) (actual time=4.812..21.327 rows=15 loops=3) -> Nested Loop (cost=0.85..256808.08 rows=775 width=90) (actual time=4.756..21.130 rows=15 loops=3) -> Parallel Index Scan using idx_ccc_trade_time_fund_account_symbol on ccc fo (cost=0.42..95331.87 rows=230396 width=90) (actual time=4.706..20.921 rows=9 loops=3) Index Cond: ((trade_time >= '1609430400000'::numeric) AND (trade_time <= '1702655999999'::numeric))" Filter: ((broker_account)::text = ANY ('{0014510913T0,0014510911As,0014510911Th,0014510911Ts,0014510913Alpha,100000001Ts,100000001As,100000001Th,0014510910As,0014510910Ts,0014510910Th,0014510921Th,0014510921Ts,0014510219Th,0014510219As,0014510921Ah,0014510219Ts,0014510921As,16654T0,16654Alpha,0321080000016656Th,0321080000016656Ts,0321080000016656As,0014510920Th,0014510920Ts,0014510920As,0014510910Ah,100000001Ah,0014510911Ah,0311020001234597Ah,0014510219Ah,0321080000016656Ah,0014510920Ah,0014510909T0,0014510909Alpha}'::text[]))" Rows Removed by Filter: 11298 -> Index Scan using bbb_cl_order_id_order_id_order_status_index on bbb bo (cost=0.42..0.69 rows=1 width=21) (actual time=0.018..0.020 rows=2 loops=26) Index Cond: ((cl_order_id = fo.cl_order_id) AND ((order_id)::text = (fo.order_id)::text) AND (order_status = '1'::bpchar)) -> Index Scan using idx_aaa_symbol on aaa sp (cost=0.28..0.30 rows=1 width=23) (actual time=0.011..0.011 rows=1 loops=46) Index Cond: ((symbol)::text = (fo.symbol)::text)Planning time: 1.990 msExecution time: 119.455 ms
复制代码

总结

越来越觉得 PG 数据库,有点像半自动,很多东西需要自己去配置,不像 mysql 那么智能,放到上面基本没有什么 SQL 上面的问题。总的来说还是蛮有收获的,外键加索引,查看执行计划看有没有走索引,索引其实也是一把双刃剑,加快了查询的速度,却增加了新增修改的速度,就像达摩克之剑需要小心谨慎使用。

参考

PostgreSQL in 与 = any 的SQL语法异同与性能优化


Postgresql sql-explain


慢sql查询

发布于: 2023-01-10阅读数: 16
用户头像

生命中任何一件事都值得全力以赴 2019-08-27 加入

码出人生

评论

发布
暂无评论
Postgresql分析慢sql_数据库_i查拉图斯特拉如是说_InfoQ写作社区