ARTS week 3
This article is about building a simple semantic search engine from scratch, and it is helpful and intuitive for readers who don't yet know what semantic search means.
My impression of semantic search is that it goes one step further than naive text matching. By naive text matching I mean explicitly defining rules for what counts as a match. For the keyword "apple juice", for example, a simple rule could say "give me all items whose name is apple juice", and the search returns whatever satisfies that rule.
In real systems without semantic understanding, we usually see a combination of such rules, e.g. "give me all items whose name is apple juice + those whose name contains apple juice + those whose name contains apple and juice + those whose name contains apple + those whose name contains juice...". If we use Elasticsearch as the underlying storage, each rule in that combination translates into an ES subquery.
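To make the rule combination concrete, here is a minimal sketch of how it might look as a single Elasticsearch bool query, built as a plain Python dict. The field name "name", the "name.keyword" subfield, the boost values and the helper build_rule_query are all assumptions made for illustration; a real index would need a mapping that matches them.

```python
import json


def build_rule_query(keyword: str, field: str = "name") -> dict:
    """Build one bool query whose `should` clauses mirror the rule combo.

    Assumes a text field `name` with a `name.keyword` subfield (hypothetical
    mapping); boosts are arbitrary and only express "stricter rule ranks higher".
    """
    tokens = keyword.split()
    should = [
        # Rule 1: name is exactly the keyword.
        {"term": {f"{field}.keyword": {"value": keyword, "boost": 5.0}}},
        # Rule 2: name contains the keyword as a phrase.
        {"match_phrase": {field: {"query": keyword, "boost": 3.0}}},
        # Rule 3: name contains every token, in any order.
        {"match": {field: {"query": keyword, "operator": "and", "boost": 2.0}}},
    ]
    # Rules 4..n: name contains any single token.
    should += [{"match": {field: {"query": token}}} for token in tokens]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    print(json.dumps(build_rule_query("apple juice"), indent=2))
```

Even for a two-word keyword this already produces five subqueries, and the last few largely overlap with the first few.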
The limitations of this approach are quite obvious:
In real life, string matching isn't good enough. When I type apple juice I might actually mean a bar that sells apple juice, so the rule combination above won't work, because apple juice is hidden in that bar's menu, not its name.
Defining an exhaustive list of rules isn't scalable and wastes resources. Taking the rule combination above as an example, it's quite likely that many of those queries return the same, or highly similar, results.
Without an understanding of what the keyword means, you have no choice but to rely on a large set of dumb rules.
So what question is semantic search trying to answer? Still using "apple juice" as an example, a search engine that understands semantics would think something like "looks like you want apple juice, so let me find the stores that are most relevant to apple juice". Such relevance could mean the store sells apple juice, or the store is named apple juice, or the store is located near a place called apple juice street. You might be wondering: doesn't this still produce a large number of queries, just like the rule combination? Not necessarily. With semantic search, relevance is expressed as a numeric score: the closer a document's features are to the keyword, the higher its score, and the highest-scoring documents are the ones returned. What's more, the underlying model powering semantic search can learn and evolve on its own, saving you much of the effort of tuning combined rules.
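As a rough illustration of "relevance as a numeric value", the sketch below ranks a few stores by cosine similarity between a query vector and document vectors. The vectors are made up by hand purely for the example; in a real system they would come from a trained embedding model, and the store names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hand-written toy vectors (assumption): NOT the output of any real model.
QUERY_VECTOR = [0.9, 0.8, 0.1]  # stands in for "apple juice"
DOCUMENTS = {
    "Apple Juice Bar (name match)": [0.9, 0.7, 0.2],
    "Sunset Bar (apple juice on the menu)": [0.7, 0.6, 0.5],
    "Hardware Store": [0.0, 0.1, 0.9],
}


def rank(query_vec, docs, top_k=2):
    """Score every document against the query and keep the top_k best."""
    scored = [(cosine(query_vec, vec), name) for name, vec in docs.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for score, name in rank(QUERY_VECTOR, DOCUMENTS):
        print(f"{score:.3f}  {name}")
```

There is a single scoring pass instead of one query per rule, and the bar that only mentions apple juice on its menu can still rank well as long as its vector is close to the query's.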
I will try to cover more on semantic search in later posts.
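I tried adding a composite index on A (mid, status) and found that the query became about twice as fast. My guess is that the index lets the database skip the extra lookup back into the table rows.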
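The effect can be sketched with SQLite from Python's standard library. This is only a stand-in: the original database engine is not stated here, the columns other than mid and status (id, payload) and the index name idx_a_mid_status are invented for the example, and other engines report covering indexes differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only `mid` and `status` come from the note above.
conn.execute(
    "CREATE TABLE A (id INTEGER PRIMARY KEY, mid INTEGER, status INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO A (mid, status, payload) VALUES (?, ?, ?)",
    [(i % 100, i % 3, f"row-{i}") for i in range(1000)],
)
conn.execute("CREATE INDEX idx_a_mid_status ON A (mid, status)")


def plan(sql: str) -> str:
    """Return SQLite's query plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 is the detail text


# Every selected column is in the index, so the table rows are never read.
covered = plan("SELECT mid, status FROM A WHERE mid = 7 AND status = 1")
# `payload` is not in the index, so each match needs a lookup into the table.
not_covered = plan("SELECT payload FROM A WHERE mid = 7 AND status = 1")

if __name__ == "__main__":
    print(covered)
    print(not_covered)
```

The first plan mentions a covering index while the second only mentions the index, which is the difference between answering from the index alone and going back to the table for each matching row.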
Highlights: gamified management, agile business, community-style organization