ARTS week 2

Published: May 28, 2020
Review
Evaluating good search part I - measure it
https://medium.com/@dtunkelang/evaluating-good-search-part-i-measure-it-5507b2dbf4f6
I chose this article because I'm actively working on improving the search experience on our platform.
It offers general guidelines on what to measure, and how, when evaluating search quality.
﻿
What to measure?
Precision: of the results returned, the fraction that are relevant (relevant returned results / all returned results)
Recall: of all relevant results, the fraction that were returned (relevant returned results / all relevant results)
These two are indeed well-established search metrics in the industry today.
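As a small illustration of the two definitions above, here is a minimal Python sketch. The function name and the sample document IDs are made up for the example and do not come from the article.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    returned: IDs the search engine returned (hypothetical sample data below)
    relevant: IDs judged relevant for the query
    """
    returned_set = set(returned)
    relevant_set = set(relevant)
    hits = returned_set & relevant_set  # relevant results that were returned

    # Guard against division by zero when either set is empty.
    precision = len(hits) / len(returned_set) if returned_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# 4 results returned, 3 of them relevant; 6 relevant documents exist in total.
precision, recall = precision_recall(
    returned=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d3", "d7", "d8", "d9"],
)
print(precision, recall)  # 0.75 0.5
```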
﻿
How to measure?
When it comes to how, the short answer is almost always (at least 90% of the time) "it depends". We all want a perfect solution, but we mostly end up with a feasible one that is highly adapted to our situation. That doesn't mean there is nothing we could try first.
One good thing to try, IMO:
Approximate precision by CTR (click-through rate), that is, # of clicks / # of views
Do this only for the top-ranked items, because only a small fraction of users go through the entire list of search results.
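A rough sketch of this top-ranked CTR in Python might look like the following. The log format (one `(position, clicked)` pair per result shown) and the function name `ctr_at_k` are assumptions made for the example, not something the article prescribes.

```python
def ctr_at_k(impressions, k):
    """Click-through rate over result positions 1..k.

    impressions: iterable of (position, clicked) pairs, one per result
    shown to a user; position is 1-based. This log format is an assumption.
    """
    views = 0
    clicks = 0
    for position, clicked in impressions:
        if position <= k:
            views += 1
            if clicked:
                clicks += 1
    return clicks / views if views else 0.0


# Hypothetical log: two searches, each showing five results.
log = [
    (1, True), (2, False), (3, False), (4, False), (5, False),
    (1, False), (2, True), (3, False), (4, False), (5, True),
]
print(ctr_at_k(log, 3))  # 2 clicks / 6 views
```

Choosing k is itself a judgment call; a small value such as the number of results visible without scrolling is one plausible starting point.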
﻿
Tip
This week I ran into a MySQL slow query problem. After some digging, it turned out to be related to index merge.
The problematic query:
select * 
from A 
left join B on A.name = B.name 
left join C on B.owner = C.owner
left join D on C.type = D.type
where B.id between 1 and 10000
and B.status = 'ON'
﻿
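The query actually took more than 100s to run, so it was bound to time out.
All the columns used in the joins are indexed. For table B, the query plan shows the following indexes being applied: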
Using intersect(index_B_status,PRIMARY)
﻿
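What is index merge?
https://dev.mysql.com/doc/refman/5.6/en/index-merge-optimization.html
MySQL scans multiple indexes and merges the results. The Using intersect above means the merge used the intersect method, i.e. it took the intersection of the scan results of the two indexes index_B_status and PRIMARY.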
﻿
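What made index merge slow here?
B.id between 1 and 10000: this yields at most 10000 rows, so scanning the PRIMARY index is acceptable
B.status = 'ON': a huge number of rows satisfy this condition, so scanning index_B_status took a lot of time, and intersecting it with the result above took a lot of time as well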
﻿
After removing the B.status = 'ON' condition, the query completed in 4s
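To make the cost difference concrete, here is a toy Python model. It is not MySQL's actual implementation. It assumes, as the analysis above suggests, that the intersect plan has to read every index_B_status entry with status = 'ON' and keep only the ids inside the requested range, while a plan that uses only the PRIMARY index reads just the rows in the id range and checks status on each row. The table size and the share of 'ON' rows are invented.

```python
def intersect_plan(status_on_ids, id_low, id_high):
    """Read every status = 'ON' index entry, keep ids inside the range."""
    entries_read = 0
    matched = []
    for row_id in status_on_ids:
        entries_read += 1
        if id_low <= row_id <= id_high:
            matched.append(row_id)
    return matched, entries_read


def primary_only_plan(status_by_id, id_low, id_high):
    """Read only rows in the id range, then filter on status."""
    rows_read = 0
    matched = []
    for row_id in range(id_low, id_high + 1):
        status = status_by_id.get(row_id)
        if status is None:
            continue  # no row with this id
        rows_read += 1
        if status == "ON":
            matched.append(row_id)
    return matched, rows_read


# Invented table: 200,000 rows, 90% of them with status 'ON'.
status_by_id = {i: ("OFF" if i % 10 == 0 else "ON") for i in range(1, 200_001)}
status_on_ids = [i for i, s in status_by_id.items() if s == "ON"]

rows_a, cost_a = intersect_plan(status_on_ids, 1, 10_000)
rows_b, cost_b = primary_only_plan(status_by_id, 1, 10_000)
print(rows_a == rows_b, cost_a, cost_b)  # True 180000 10000
```

Both plans return the same rows, but under these assumptions the intersect plan touches 18 times as many entries, and the gap grows with the number of 'ON' rows in the table.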
﻿
Short term fix
Use an index hint to force table B to use the PRIMARY index:
select * 
from A 
left join B FORCE INDEX FOR JOIN (`PRIMARY`) on A.name = B.name
left join C on B.owner = C.owner
left join D on C.type = D.type
where B.id between 1 and 10000
and B.status = 'ON'
﻿
The query plan shows that index_B_status is no longer used
﻿
Long term fix
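Generally speaking, an intersect index merge in the plan suggests that the indexes have room for optimization, for example by creating a composite index on PRIMARY + status.
Reference: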
https://www.cnblogs.com/digdeep/p/4975977.html
﻿
The common leftmost-prefix issue with composite indexes
Suppose we have
c_index(id,name,status)
﻿
Query:
select * from items where id = 1 and name = 'foo' and status = 'bar'
This query can make full use of c_index
﻿
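But if the query becomes:
select * from items where id = 1 and status = 'bar'
this query cannot make full use of c_index: with no condition on name, only the leading id column can be used to narrow the index lookup, and the status condition cannot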
﻿
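The order in which id, name and status appear in the where clause does not matter. As long as all three conditions are present, c_index can be fully applied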
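The leftmost-prefix rule can be sketched in Python as follows. This is a simplified model, not MySQL's optimizer: it only considers equality conditions and reports which leading columns of a composite index can be used to narrow the lookup. The function name `usable_prefix` is made up for the example.

```python
def usable_prefix(index_columns, equality_columns):
    """Leading index columns that have an equality condition in the query."""
    usable = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the prefix stops at the first column without a condition
        usable.append(column)
    return usable


c_index = ["id", "name", "status"]

# The order of conditions in the where clause does not matter, hence sets.
print(usable_prefix(c_index, {"status", "name", "id"}))  # ['id', 'name', 'status']
print(usable_prefix(c_index, {"id", "status"}))          # ['id']
print(usable_prefix(c_index, {"name", "status"}))        # []
```

In this model a query with conditions on id and status still benefits from the id part of c_index, while a query that skips id cannot use the index for the lookup at all.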
﻿
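There is also an article about MySQL 回表 (going back to the table to fetch the full row after a secondary index lookup):
cnblogs.com/myseries/p/11265849.html
I'd like to study it properly when I get the chance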
﻿
Share
https://xie.infoq.cn/article/0ab0e56794e4f761f38c3a46a
How to be a competent team leader.
锈蠢刀
