ARTS week 3

锈蠢刀
Published: June 8, 2020

Review

https://towardsdatascience.com/semantic-code-search-3cd6d244a39c

This article walks through building a simple semantic search engine from scratch. It is helpful and intuitive for readers who don't yet have context on what semantic search means.

My impression is that semantic search goes one step further than naive text matching. By naive text matching I mean explicitly defining the rules for a match. For the keyword "apple juice", for example, a simple rule could say "give me all items whose name is apple juice", and the search returns whatever satisfies that rule.

Real systems without semantic understanding usually combine several such rules, e.g. "give me all items whose name is apple juice + those whose name contains apple juice + those whose name contains apple and juice + those whose name contains apple + those whose name contains juice...". If Elasticsearch is the underlying store, each rule in this combination translates into an ES subquery.
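To make the rule combination concrete, here is a rough sketch of how it might be expressed as one Elasticsearch `bool` query with one `should` clause per rule. It only builds the request body as a plain Python dict and does not talk to a cluster. The field name `name`, the `name.keyword` subfield, and the boost values are assumptions for illustration, not something taken from a real mapping.

```python
import json


def build_rule_query(keyword, field="name"):
    """Build a bool query with one should clause per hand-written rule."""
    should = [
        # Rule 1: name is exactly the keyword (assumes a `<field>.keyword` subfield exists).
        {"term": {f"{field}.keyword": {"value": keyword, "boost": 5.0}}},
        # Rule 2: name contains the keyword as a phrase.
        {"match_phrase": {field: {"query": keyword, "boost": 3.0}}},
        # Rule 3: name contains every word of the keyword, in any order.
        {"match": {field: {"query": keyword, "operator": "and", "boost": 2.0}}},
    ]
    # Remaining rules: name contains any single word of the keyword.
    for word in keyword.split():
        should.append({"match": {field: {"query": word}}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


query = build_rule_query("apple juice")
print(json.dumps(query, indent=2))
```

Even for a two-word keyword this already produces five subqueries, and the list keeps growing with every extra word or field.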

The limitations of this approach are quite obvious:

  1. In real life, string matching isn't good enough. When I type apple juice I might actually mean a bar that sells apple juice, and the rule combination above won't find it because apple juice is hidden in that bar's menu, not in its name.

  2. Defining an exhaustive list of rules doesn't scale and wastes resources. With the rule combination above, it is quite likely that several of those queries return the same, or highly similar, results.

Lacking an understanding of the keyword's semantics leaves you no choice but to rely on a large set of dumb rules.

So what question is semantic search trying to answer? Sticking with "apple juice", a search engine that understands semantics would reason along the lines of "looks like you want apple juice, so let me find the stores most relevant to apple juice". Such relevance could mean that the store sells apple juice, that the store is named apple juice, or that the store is located near a place called apple juice street. You might be wondering: doesn't this still produce a large number of queries, just like the rule combination? Not necessarily. With semantic search, relevance is expressed as a numeric score, and a document is considered a match when its features score close enough to the keyword. What's more, the underlying model powering semantic search can learn and evolve on its own, saving you a lot of effort tuning rule combinations.
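One common way to turn "relevance as a numeric value" into code is cosine similarity between a query vector and document vectors. The sketch below is only an illustration of that idea: the three-dimensional vectors are made up by hand, the store names and the 0.5 threshold are hypothetical, and a real system would get its vectors from a trained model rather than hard-coding them.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hand-made toy vectors, standing in for learned embeddings.
QUERY_VECTOR = [0.9, 0.8, 0.1]  # "apple juice"
STORE_VECTORS = {
    "store named Apple Juice": [0.9, 0.9, 0.0],
    "bar with apple juice on the menu": [0.7, 0.6, 0.3],
    "hardware store": [0.0, 0.1, 0.9],
}


def rank(query_vector, store_vectors, threshold=0.5):
    """Score every store once, keep those above the threshold, best first."""
    scored = [(name, cosine(query_vector, vec)) for name, vec in store_vectors.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


results = rank(QUERY_VECTOR, STORE_VECTORS)
for name, score in results:
    print(f"{score:.3f}  {name}")
```

The point is that a single scoring pass replaces the whole list of rules: the bar is found even though "apple juice" is not in its name, as long as its vector lands close to the query.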

I will try to cover more on semantic search in later posts.



Tip

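Continuing last week's topic on MySQL table lookups (回表, going back to the table after an index match), this week I ran into a slow query: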

select count(A.id)
from
A left join B on A.mid = B.mid
where B.id between 0 and 10000
and A.id between 0 and 10000
and B.status = 'ACTIVE'
and A.status = 'ACTIVE'
and B.country = 13

After adding a composite index on A (mid, status), the query ran about twice as fast. My guess is that the index saves the table lookup step:

cnblogs.com/myseries/p/11265849.html

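Here all we need is count(A.id), and we don't fetch any other columns from A, so once the index is used there is no need to go back to the table. (Assuming InnoDB with id as the primary key, every secondary index entry also carries the primary key, so the (mid, status) index already contains id.)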
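The effect can be reproduced on a small scale. The sketch below uses Python's built-in sqlite3 as a stand-in, not MySQL, so the planner and its output differ from what MySQL's EXPLAIN shows; it only illustrates the general idea of a covering index. The table is a cut-down, hypothetical version of A, and the `payload` column is invented to have something that is not in the index.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table A; `payload` is a column outside the index.
conn.execute(
    "CREATE TABLE A (id INTEGER PRIMARY KEY, mid INTEGER, status TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_a_mid_status ON A (mid, status)")

# Only id, mid and status are touched, and all of them live in the index.
count_plan = query_plan(
    conn, "SELECT count(id) FROM A WHERE mid = 42 AND status = 'ACTIVE'"
)
# payload is not in the index, so each match needs a lookup in the table.
payload_plan = query_plan(
    conn, "SELECT payload FROM A WHERE mid = 42 AND status = 'ACTIVE'"
)

print(count_plan)
print(payload_plan)
```

In SQLite the first plan is reported as using a covering index while the second is not. On MySQL the corresponding hint would be "Using index" in the Extra column of EXPLAIN.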

Share

https://www.infoq.cn/article/ug9Uc8XapBfTQOEqXEVv

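Highlights: gamified management, agile business, community-style organization

As teams get younger, a sense of mutual recognition among members becomes more important, and the idea of gamified management fits the mindset of today's young people well.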
