ARTS week 4

锈蠢刀
Published on: August 19, 2020

Reviews

Link to original article

design decisions that made sense under light loads are now suddenly technical debt

I think this quote will strike anyone who has been through an org-wide or company-wide cultural transition from do-it-fast to do-it-right as a great summary. Going through such a transition should be considered lucky: it can mean that your org or company is becoming more mature, which is a positive signal of moving towards the next phase, whatever that is.

Of the six pieces of practical advice given in the article, my favorite is `Slow services are more evil than failed services`. Instead of waiting on a slow dependency, you would rather have it simply tell you "I failed, try again later or just give up". Why?

  1. We live in a fail-fast world today, and industry is no exception. Users are growing more tolerant of individual failures, and trust me, they know how to retry from their end.

  2. Slow requests are the most likely to fail. Most real-time systems today are built with tight SLAs, so it is highly likely that a slow request will violate someone's SLA somewhere in the upstream chain. Hence, most of the time it is better to fail immediately and allow for graceful degradation.

  3. Your resources suffer while requests hang there waiting: CPU, RAM, threads, goroutines.
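
The fail-fast idea in the list above can be sketched as a deadline wrapped around a dependency call. This is a minimal sketch and not code from the article: `call_with_deadline`, `DependencyTimeout`, `slow_dependency`, `get_value` and the time budgets are all hypothetical names and values chosen for illustration.

```python
import concurrent.futures
import time


class DependencyTimeout(Exception):
    """Raised when a dependency does not answer within its time budget."""


def call_with_deadline(fn, timeout_s, *args):
    """Run fn(*args) in a worker thread and give up after timeout_s seconds.

    Hypothetical helper: the caller gets a quick failure it can degrade on,
    instead of blocking for as long as the dependency stays slow.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as exc:
        raise DependencyTimeout(f"no answer within {timeout_s}s") from exc
    finally:
        # Do not wait for the worker here; the caller has already moved on.
        pool.shutdown(wait=False)


def slow_dependency():
    """Stand-in for a slow downstream service (assumption for the demo)."""
    time.sleep(0.3)
    return "fresh"


def get_value():
    """Fail fast on the slow call and degrade to a fallback value."""
    try:
        return call_with_deadline(slow_dependency, 0.05)
    except DependencyTimeout:
        return "cached"
```

Note that the worker thread in this sketch keeps running until the slow call returns, so it only protects the caller's latency. To actually free the resources from point 3, the timeout has to be enforced where the waiting happens, for example as a socket or client-level timeout.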



Tip

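Here I want to jot down some (scattered) ElasticSearch basics. I recently noticed that engineers who really know ES are in short supply everywhere, not just in China.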



  1. ElasticSearch query DSL

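A very good introduction to the ElasticSearch DSL. The DSL has a very steep learning curve for most people; this article moves from the basics to the more advanced material and suits newcomers.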

ES queries fall into three broad categories:

1. Filtering by exact values

2. Searching on analyzed text (expensive)

3. A combination of the two



  1. Common clauses

match: if the field is analyzed text, the query text is analyzed and documents are scored by text relevance; otherwise it is equivalent to filtering by exact values
{ "match": { "description": "Fourier analysis signals processing" }}
{ "match": { "date": "2014-09-01" }}
{ "match": { "visible": true }}
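
To make the analyzed-versus-exact distinction concrete, here is a toy sketch in Python. It is not how Elasticsearch works internally: `analyze`, `match_text` and `match_exact` are hypothetical helpers, the analyzer only lowercases and splits on non-alphanumeric characters, and the score is a plain count of shared terms rather than a real relevance score.

```python
import re


def analyze(text):
    """Toy analyzer (assumption): lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def match_text(field_value, query):
    """Analyzed match: count the distinct query terms that occur in the field.

    A count of 0 means no match. Any shared term is enough to match, which
    mirrors match's default OR behaviour between query terms.
    """
    field_terms = set(analyze(field_value))
    return sum(1 for term in set(analyze(query)) if term in field_terms)


def match_exact(field_value, query):
    """Exact-value match for dates, booleans and the like: equal or not."""
    return field_value == query


# Hypothetical document, invented for the demo.
doc = {
    "description": "An introduction to Fourier analysis",
    "date": "2014-09-01",
    "visible": True,
}

text_score = match_text(doc["description"], "Fourier analysis signals processing")
date_hit = match_exact(doc["date"], "2014-09-01")
```

The text field matches with a graded score even though only some of the query terms are present, while the date field either equals the query value or does not.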

match_all: the equivalent of select *, not used very often
{ "match_all": {} }

term / terms: mostly used to filter by exact values. The values inside terms are combined with OR, not AND!
{ "term": { "tag": "math" }}
{ "terms": { "tag": ["math", "statistics"] }}
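
The OR behaviour of terms is easy to get wrong, so here is a small in-memory sketch of it. The `posts` data and the helpers `matches_terms` and `matches_all_terms` are hypothetical and only imitate the matching rule. `and_query` shows one way to express AND in the DSL, as one term clause per value inside a bool must.

```python
def matches_terms(doc, field, values):
    """True if the field holds at least one of the values (OR semantics)."""
    stored = doc.get(field, [])
    stored = stored if isinstance(stored, list) else [stored]
    return any(value in stored for value in values)


def matches_all_terms(doc, field, values):
    """AND semantics: one single-value check per value, all must pass."""
    return all(matches_terms(doc, field, [value]) for value in values)


# The AND version expressed as a query body: one term clause per value.
and_query = {
    "bool": {
        "must": [
            {"term": {"tag": "math"}},
            {"term": {"tag": "statistics"}},
        ]
    }
}

# Hypothetical documents, invented for the demo.
posts = [
    {"id": 1, "tag": ["math"]},
    {"id": 2, "tag": ["statistics", "math"]},
    {"id": 3, "tag": ["physics"]},
]

wanted = ["math", "statistics"]
or_hits = [p["id"] for p in posts if matches_terms(p, "tag", wanted)]
and_hits = [p["id"] for p in posts if matches_all_terms(p, "tag", wanted)]
```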

multi_match: the multi-field version of match
{
  "multi_match": {
    "query": "probability theory",
    "fields": ["title", "body"]
  }
}

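exists / missing (the missing query was removed in Elasticsearch 5.0; on newer versions, put exists inside a bool must_not instead)
{
  "exists": {
    "field": "title"
  }
}
{
  "missing": {
    "field": "title"
  }
}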

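range: range queries, commonly used on numeric and date fields (geo fields have their own dedicated queries). It behaves like a filter: a document is either inside the range or not, so it does not contribute relevance to the score
{ "range": { "age": { "gt": 30 } } }
{
  "range": {
    "born": {
      "gte": "01/01/2012",
      "lte": "2013",
      "format": "dd/MM/yyyy||yyyy"
    }
  }
}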
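
As a rough illustration of the four bound keywords, the sketch below checks a numeric value against a range body. `in_range` is a hypothetical helper that handles only gt, gte, lt and lte on plain numbers; other keys such as format are ignored, and date parsing is out of scope.

```python
import operator

# Map each range keyword to the comparison it stands for.
_BOUNDS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True if value satisfies every gt/gte/lt/lte bound given in bounds."""
    return all(
        _BOUNDS[name](value, limit)
        for name, limit in bounds.items()
        if name in _BOUNDS  # skip keys this sketch does not model
    )


# Hypothetical ages, invented for the demo.
ages = [25, 30, 31, 40]
older_than_30 = [age for age in ages if in_range(age, {"gt": 30})]
between_30_and_40 = [age for age in ages if in_range(age, {"gte": 30, "lt": 40})]
```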




  1. Compound queries: used almost everywhere in industry, since business logic can usually only be expressed as a compound query

  • bool

{
  "bool": {
    "must": { "term": { "tag": "math" }},
    "must_not": { "term": { "tag": "probability" }},
    "should": [
      { "term": { "favorite": true }},
      { "term": { "unread": true }}
    ]
  }
}
Find all posts tagged math
that are not tagged probability;
posts that are favorited or unread score higher, but because a must clause is present, neither should clause is required to match.

Every sub-clause inside a bool query must have an occurrence type (must, must_not, should, plus filter). Leaving the others aside for now, the one worth discussing is should:

In a query context, if must and filter queries are present, the should query occurrence then helps to influence the score. However, if bool query is in a filter context or has neither must nor filter queries, then at least one of the should queries must match a document

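Put simply, should has lower priority than must / must_not: when a must or filter clause is present, should only influences the score instead of deciding whether a document matches.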
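
The should rule quoted above can be sketched as a tiny in-memory matcher. This is an illustration under assumptions, not Elasticsearch's implementation: `bool_matches`, `term_matches` and `as_list` are hypothetical helpers, only term clauses are supported, only the query-context rule is modelled, and scoring is left out entirely.

```python
def as_list(clauses):
    """The DSL accepts one clause or a list of clauses; normalise to a list."""
    return clauses if isinstance(clauses, list) else [clauses]


def term_matches(doc, clause):
    """Evaluate one {"term": {field: value}} clause against a document."""
    field, value = next(iter(clause["term"].items()))
    stored = doc.get(field)
    return value in (stored if isinstance(stored, list) else [stored])


def bool_matches(doc, body):
    """Decide whether doc matches a bool body made of term clauses."""
    must = as_list(body.get("must", [])) + as_list(body.get("filter", []))
    must_not = as_list(body.get("must_not", []))
    should = as_list(body.get("should", []))

    if not all(term_matches(doc, c) for c in must):
        return False
    if any(term_matches(doc, c) for c in must_not):
        return False
    if should and not must:
        # No must/filter present: at least one should clause has to match.
        return any(term_matches(doc, c) for c in should)
    # must/filter present: should clauses are optional for matching.
    return True


query = {
    "must": {"term": {"tag": "math"}},
    "must_not": {"term": {"tag": "probability"}},
    "should": [
        {"term": {"favorite": True}},
        {"term": {"unread": True}},
    ],
}
should_only = {"should": query["should"]}

# Hypothetical documents, invented for the demo.
plain_math = {"tag": ["math"], "favorite": False, "unread": False}
unread_physics = {"tag": ["physics"], "favorite": False, "unread": True}
```

With the full query, `plain_math` matches even though neither should clause does. With `should_only`, the same document no longer matches, while `unread_physics` does.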



  • dis_max

https://www.elastic.co/guide/cn/elasticsearch/guide/current/_best_fields.html

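In one sentence: dis_max is a way of combining queries that returns any document matching any of the sub-queries, but uses only the score of the best-matching sub-query as the document's score.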

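A better way to understand it is by contrast. Take a set of sub-queries, where a single document may match several of them. If they are combined with bool should, the guide linked above describes the final score as the sum of the matching sub-query scores, scaled by the fraction of sub-queries that matched, which behaves much like an average. The weakness is that a document matching one sub-query extremely well has its score diluted by the sub-queries it does not match. dis_max does no such averaging; it takes the score of the best-matching sub-query as the document's final score.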
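
The difference can be worked through with a few invented numbers. In the sketch below, `bool_should_score` follows the combination rule as the linked guide describes it (newer Elasticsearch versions may combine should scores differently), and `dis_max_score` takes the best score plus `tie_breaker` times the other matching scores, `tie_breaker` being a real dis_max parameter that defaults to 0. The function names and per-field scores are hypothetical.

```python
def bool_should_score(scores):
    """Sum the matching scores, then scale by matched / total sub-queries.

    A score of None means the sub-query did not match the document.
    """
    matched = [s for s in scores if s is not None]
    if not matched:
        return 0.0
    return sum(matched) * len(matched) / len(scores)


def dis_max_score(scores, tie_breaker=0.0):
    """Best matching score, plus tie_breaker times the other matching scores."""
    matched = [s for s in scores if s is not None]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


# Invented per-sub-query scores in the order [title, body].
weak_on_both = [0.4, 0.4]     # matches both fields, each only weakly
strong_on_one = [None, 1.0]   # matches the body very well, the title not at all

bool_ranking = sorted(
    ["weak_on_both", "strong_on_one"],
    key=lambda name: bool_should_score(globals()[name]),
    reverse=True,
)
dis_max_ranking = sorted(
    ["weak_on_both", "strong_on_one"],
    key=lambda name: dis_max_score(globals()[name]),
    reverse=True,
)
```

Under the averaging-style rule the weak double match wins (0.8 against 0.5), while dis_max ranks the strong single-field match first (1.0 against 0.4). A non-zero `tie_breaker` lets the other matching sub-queries contribute a little without taking over.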


