ARTS week 2

锈蠢刀
Published: May 28, 2020

Review

Evaluating good search part I - measure it

https://medium.com/@dtunkelang/evaluating-good-search-part-i-measure-it-5507b2dbf4f6

I chose this article because I'm actively working on improving the search experience on our platform.

The article gives general guidelines on what to measure, and how to measure it, when evaluating search quality.



What to measure?

  1. Precision: of the results returned, the fraction that are relevant, i.e. relevant returned results / returned results

  2. Recall: of all relevant results, the fraction that were returned, i.e. relevant returned results / all relevant results

These two are indeed widely accepted search metrics in the industry today.
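
As a minimal sketch of the two definitions, the function below computes precision and recall for a single query. The function name and the document ids are hypothetical and only serve the illustration; in practice the hard part is obtaining the set of relevant results in the first place.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    returned: ids of the results the search engine returned
    relevant: ids of all results that are actually relevant
    """
    returned_set = set(returned)
    relevant_set = set(relevant)
    hits = returned_set & relevant_set  # relevant results that were returned
    precision = len(hits) / len(returned_set) if returned_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: 4 results returned, 3 of them relevant,
# out of 6 relevant documents in total.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d3", "d7", "d8", "d9"],
)
print(p, r)  # 0.75 0.5
```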



How to measure?

When it comes to the question of how, the short answer is almost always (at least 90% of the time) "it depends". We all want a perfect solution, but we mostly end up with a feasible one that is highly adapted to our own situation. That doesn't mean there is nothing we could try first.

One good thing to try, IMO:

  1. Estimate precision with CTR (click-through rate), that is, # of clicks / # of views.

  2. Do the above only for the top-ranked items, because only a small share of users go through the entire list of search results.
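
The two steps above can be sketched as a CTR computed over the top k positions only. This is a rough sketch under assumptions: the log format (a list of `(position, clicked)` pairs, with a 1-based rank) and the name `ctr_at_k` are made up for the example, not taken from the article.

```python
def ctr_at_k(impressions, k):
    """Click-through rate restricted to the top-k ranked positions.

    impressions: iterable of (position, clicked) pairs, one per result
                 shown to a user; position is the 1-based rank.
    """
    views = 0
    clicks = 0
    for position, clicked in impressions:
        if position <= k:  # ignore everything below the top k
            views += 1
            if clicked:
                clicks += 1
    return clicks / views if views else 0.0


# Hypothetical log: two searches showing ranks 1-3, plus one click at rank 8.
log = [
    (1, True), (2, False), (3, False),
    (1, False), (2, True), (3, False),
    (8, True),
]
print(ctr_at_k(log, k=3))  # 2 clicks / 6 views
```

The click at rank 8 is ignored on purpose, so rarely seen positions do not skew the estimate.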



Tip

This week I ran into a MySQL slow query problem. After some digging, it turned out to be related to index merge.

The problematic query:

select *
from A
left join B on A.name = B.name
left join C on B.owner = C.owner
left join D on C.type = D.type
where B.id between 1 and 10000
and B.status = 'ON'

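The query actually ran for more than 100s, so it was bound to time out.
All columns used in the joins are indexed, and the query plan showed that the following was applied for table B: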
Using intersect(index_B_status,PRIMARY)



What is index merge

https://dev.mysql.com/doc/refman/5.6/en/index-merge-optimization.html

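MySQL scans multiple indexes and merges the results. The Using intersect shown above means the merge used the intersect method, i.e. it took the intersection of the scan results of the two indexes index_B_status and PRIMARY.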



What made index merge slow here

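B.id between 1 and 10000: this yields at most 10000 rows, so scanning the PRIMARY index is acceptable.
B.status = 'ON': a huge number of rows satisfy this condition, so scanning index_B_status took a long time, and intersecting its result with the one above took a long time as well.

After removing the B.status = 'ON' condition, the query finished in 4s.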
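
To make the reasoning concrete, here is a toy cost model that counts index entries read under each plan. It is only a sketch under stated assumptions, not how MySQL implements or costs index merge: the table size (200,000 rows), the share of 'ON' rows (90%), and both function names are hypothetical.

```python
def plan_intersect(pk_range_ids, status_on_ids):
    """Toy intersect plan: read both index results in full, then intersect."""
    entries_read = len(pk_range_ids) + len(status_on_ids)
    rows = set(pk_range_ids) & set(status_on_ids)
    return rows, entries_read


def plan_primary_only(pk_range_ids, status_by_id):
    """Toy PRIMARY-only plan: read rows in the id range, check status per row."""
    rows = {i for i in pk_range_ids if status_by_id[i] == "ON"}
    return rows, len(pk_range_ids)


# Hypothetical table B: ids 1..200000, every tenth row is 'OFF'.
status_by_id = {i: ("ON" if i % 10 else "OFF") for i in range(1, 200_001)}
status_on_ids = [i for i, s in status_by_id.items() if s == "ON"]
pk_range_ids = range(1, 10_001)  # B.id between 1 and 10000

rows_a, cost_a = plan_intersect(pk_range_ids, status_on_ids)
rows_b, cost_b = plan_primary_only(pk_range_ids, status_by_id)
print(len(rows_a), cost_a)  # same rows, cost dominated by the status scan
print(len(rows_b), cost_b)  # same rows, cost bounded by the id range
```

Both plans return the same rows, but in this model the intersect plan's cost grows with the number of 'ON' rows in the whole table, while the PRIMARY-only plan is bounded by the size of the id range.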



Short term fix

Use a hint to force table B to use the PRIMARY index:

select *
from A
left join B FORCE INDEX FOR JOIN (`PRIMARY`)
on A.name = B.name
left join C on B.owner = C.owner
left join D on C.type = D.type
where B.id between 1 and 10000
and B.status = 'ON'

The query plan shows that index_B_status is no longer used.



Long term fix

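Generally speaking, seeing an intersect index merge suggests that the indexes have room for optimization, for example by creating a composite index on PRIMARY + status.

Reference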

https://www.cnblogs.com/digdeep/p/4975977.html



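The common leftmost-prefix issue with composite indexes

Suppose we have
c_index(id,name,status)

Query:
select * from items where id = 1 and name = 'foo' and status = 'bar'
This query can make full use of c_index.

If the query becomes:
select * from items where id = 1 and status = 'bar'
this query can only use the id part of c_index. Because the name condition is missing, the status column cannot be used to narrow the index lookup.

The order in which id, name, and status appear in the WHERE clause does not matter. As long as all three are present, c_index can be applied in full.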
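
The leftmost-prefix rule can be sketched as a small helper that walks the index columns from left to right and stops at the first one without a condition. This is a simplification under assumptions: it considers equality conditions only, ignores range conditions and optimizer features such as index condition pushdown, and `usable_prefix` is a hypothetical name, not a MySQL API.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that can be used for the lookup.

    Walks the index columns left to right and stops at the first one
    that has no equality condition in the query.
    """
    conditions = set(equality_columns)  # WHERE-clause order is irrelevant
    prefix = []
    for col in index_columns:
        if col not in conditions:
            break
        prefix.append(col)
    return prefix


c_index = ["id", "name", "status"]
print(usable_prefix(c_index, ["status", "id", "name"]))  # ['id', 'name', 'status']
print(usable_prefix(c_index, ["id", "status"]))          # ['id']
print(usable_prefix(c_index, ["name", "status"]))        # []
```

The last call shows the case where the index cannot be used for the lookup at all: without a condition on id, the leftmost column, there is no usable prefix.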



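There is also an article on MySQL "back to table" lookups (回表), where a secondary index lookup has to go back to the clustered index to fetch the full row: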

cnblogs.com/myseries/p/11265849.html

I'd like to study it properly when I get the chance.



Share

https://xie.infoq.cn/article/0ab0e56794e4f761f38c3a46a

How to be a competent team leader.
