
Elasticsearch in Practice Trilogy, Part 3: Search Operations

  • April 19, 2022

"description": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets. "


}


}


]


}


}


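  1. If what we actually want is only the results that match "Core Java", the result above clearly does not meet the requirement. In that case, add the "operator":"and" property to the query so that only documents matching all of the keywords are returned. Note that the JSON structure changes slightly: the value of title used to be the search text itself, but it is now a JSON object whose query property holds the original search text: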


GET englishbooks/_search


{


"query":{


"match":{


"title":{


"query":"Core Java",


"operator":"and"


}


}


}


}


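This time the result contains only the record that matches both terms, "core" and "java". (Why are core and java lowercase? Because "Core Java" is analyzed into lowercase terms before the search is executed.)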


{


"took": 11,


"timed_out": false,


"_shards": {


"total": 5,


"successful": 5,


"skipped": 0,


"failed": 0


},


"hits": {


"total": 1,


"max_score": 0.5753642,


"hits": [


{


"_index": "englishbooks",


"_type": "IT",


"_id": "3",


"_score": 0.5753642,


"_source": {


"id": "3",


"title": "Core Java",


"language": "java",


"author": "Horstmann",


"price": 85.9,


"publish_time": "2016-06-01",


"description": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets. "


}


}


]


}


}
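The difference between the default operator and "operator":"and" can be sketched in a few lines of Python. This is only a toy model, not how Elasticsearch is implemented: `analyze` is a rough stand-in for the standard analyzer, and the title "Thinking in Java" is a hypothetical document added for illustration.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def match(field_value, query, operator="or"):
    # "or": at least one query term must appear; "and": every query term must appear.
    doc_terms = set(analyze(field_value))
    query_terms = analyze(query)
    combine = all if operator == "and" else any
    return combine(term in doc_terms for term in query_terms)

# "Thinking in Java" is a hypothetical title, not taken from the article's index.
titles = ["Deep Learning", "Compilers", "Core Java", "Thinking in Java"]

or_hits = [t for t in titles if match(t, "Core Java")]
and_hits = [t for t in titles if match(t, "Core Java", operator="and")]
print(or_hits)
print(and_hits)
```

With the default operator, any title containing "core" or "java" qualifies; with "and", only a title containing both terms does.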

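match_phrase search

match_phrase search is similar to the match search above, with two additional characteristics:


  1. Every term produced by analysis must match, which is the same effect as the "operator":"and" property above;

  2. The analyzed terms must appear in the field in the same order as in the search text;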


GET englishbooks/_search


{


"query":{


"match_phrase":{"title":"Core Java"}


}


}


The query above returns a result, but changing "Core Java" to "Java Core" returns nothing, whereas a match query with "Java Core" still finds the document;
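The ordering requirement can be illustrated with a small Python sketch. It is a simplified model under the assumption that the field is analyzed into lowercase terms and that the phrase must appear as consecutive terms; `analyze` and `match_phrase` are illustrative helpers, not Elasticsearch APIs.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def match_phrase(field_value, query):
    # True when the query terms occur consecutively and in order in the field.
    doc_terms = analyze(field_value)
    query_terms = analyze(query)
    n = len(query_terms)
    return n > 0 and any(
        doc_terms[i:i + n] == query_terms
        for i in range(len(doc_terms) - n + 1)
    )

print(match_phrase("Core Java", "Core Java"))
print(match_phrase("Core Java", "Java Core"))
```

The first call succeeds because the terms line up in order; the second fails even though both terms are present in the title.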

match_phrase_prefix search

match_phrase_prefix works like match_phrase, except that the last term is matched as a prefix. In the example below, the search text "Core J" finds nothing with match_phrase, but match_phrase_prefix does find a result because "J" matches "Java" as a prefix:


GET englishbooks/_search


{


"query":{


"match_phrase":{"title":"Core J"}


}


}
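To make the "last term as a prefix" rule concrete, here is a minimal Python sketch. It is a toy model: `analyze` and `match_phrase_prefix` are illustrative helpers written for this example, and the real query has further options that are not modeled here.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def match_phrase_prefix(field_value, query):
    # All terms but the last must match exactly and in order;
    # the last query term only needs to be a prefix of the next field term.
    doc_terms = analyze(field_value)
    query_terms = analyze(query)
    if not query_terms:
        return False
    *head, last = query_terms
    n = len(query_terms)
    for i in range(len(doc_terms) - n + 1):
        window = doc_terms[i:i + n]
        if window[:-1] == head and window[-1].startswith(last):
            return True
    return False

print(match_phrase_prefix("Core Java", "Core J"))
print(match_phrase_prefix("Core Java", "Java C"))
```

"Core J" matches because "j" is a prefix of "java" right after "core"; "Java C" does not, because no term follows "java" in the title.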

multi_match search

multi_match builds on match by searching multiple fields. The following query searches both the title and description fields for the two terms "1986" and "deep":


GET englishbooks/_search


{


"query":{


"multi_match":{


"query":"1986 deep",


"fields":["title", "description"]


}


}


}


The response is shown below: every document whose title or description contains the term "1986" or "deep" is returned:


{


"took": 4,


"timed_out": false,


"_shards": {


"total": 5,


"successful": 5,


"skipped": 0,


"failed": 0


},


"hits": {


"total": 2,


"max_score": 0.79237825,


"hits": [


{


"_index": "englishbooks",


"_type": "IT",


"_id": "2",


"_score": 0.79237825,


"_source": {


"id": "2",


"title": "Compilers",


"language": "c",


"author": "Alfred V.Aho",


"price": 62.5,


"publish_time": "2011-01-01",


"description": "In the time since the 1986 edition of this book, the world of compiler designhas changed significantly."


}


},


{


"_index": "englishbooks",


"_type": "IT",


"_id": "1",


"_score": 0.2876821,


"_source": {


"id": "1",


"title": "Deep Learning",


"language": "python",


"author": "Yoshua Bengio",


"price": 549,


"publish_time": "2016-11-18",


"description": "written by three experts in the field, deep learning is the only comprehensive book on the subject."


}


}


]


}


}
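The multi-field behavior can be approximated as "run the match against each listed field and accept the document if any field matches". The Python sketch below models only that membership test, not scoring; the descriptions are abridged from the responses in this article, and `analyze` and `multi_match` are illustrative helpers.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def multi_match(doc, query, fields):
    # A document matches when any query term occurs in any of the listed fields.
    query_terms = analyze(query)
    return any(
        term in analyze(doc.get(field, ""))
        for field in fields
        for term in query_terms
    )

# Descriptions are abridged versions of the ones shown in the responses above.
docs = [
    {"id": "1", "title": "Deep Learning",
     "description": "deep learning is the only comprehensive book on the subject."},
    {"id": "2", "title": "Compilers",
     "description": "In the time since the 1986 edition of this book"},
    {"id": "3", "title": "Core Java",
     "description": "The book is aimed at experienced programmers"},
]

hits = [d["id"] for d in docs if multi_match(d, "1986 deep", ["title", "description"])]
print(hits)
```

The sketch returns hits in document order; the real response is sorted by score, which is why document 2 comes first there.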

terms query

terms is an extension of the term query that looks up several terms at once:


GET englishbooks/_search


{


"query":{


"terms":{


"title":["deep", "core"]


}


}


}


The response is shown below: documents whose title contains deep or core are found:


{


"took": 5,


"timed_out": false,


"_shards": {


"total": 5,


"successful": 5,


"skipped": 0,


"failed": 0


},


"hits": {


"total": 2,


"max_score": 1,


"hits": [


{


"_index": "englishbooks",


"_type": "IT",


"_id": "1",


"_score": 1,


"_source": {


"id": "1",


"title": "Deep Learning",


"language": "python",


"author": "Yoshua Bengio",


"price": 549,


"publish_time": "2016-11-18",


"description": "written by three experts in the field, deep learning is the only comprehensive book on the subject."


}


},


{


"_index": "englishbooks",


"_type": "IT",


"_id": "3",


"_score": 1,


"_source": {


"id": "3",


"title": "Core Java",


"language": "java",


"author": "Horstmann",


"price": 85.9,


"publish_time": "2016-06-01",


"description": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets. "


}


}


]


}


}
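One detail worth keeping in mind is that term-level queries such as terms compare the given values against the indexed terms without analyzing them first. The Python sketch below illustrates this, assuming title is a text field whose values were lowercased at index time; `analyze` and `terms_query` are illustrative helpers, not Elasticsearch APIs.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def terms_query(field_value, wanted):
    # The field was analyzed at index time; the query values are compared as given.
    indexed = set(analyze(field_value))
    return any(value in indexed for value in wanted)

print(terms_query("Deep Learning", ["deep", "core"]))
print(terms_query("Core Java", ["deep", "core"]))
print(terms_query("Deep Learning", ["Deep"]))
```

The last call finds nothing because "Deep" with a capital D is not one of the indexed terms, which is why the request above uses lowercase values.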

Range query

A range query searches by range, for example documents whose publish_time falls between "2016-01-01" and "2016-12-31":


GET englishbooks/_search


{


"query":{


"range":{


"publish_time":{


"gte":"2016-01-01",


"lte":"2016-12 《一线大厂 Java 面试题解析+后端开发学习笔记+最新架构讲解视频+实战项目源码讲义》开源 -31",


"format":"yyyy-MM-dd"


}


}


}


}


For brevity, the response is omitted here;
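Since the response is omitted, a small Python sketch can show what the inclusive bounds mean. It only covers the three documents whose publish_time appears elsewhere in this article (the index may contain others), and `in_range` is an illustrative helper, not an Elasticsearch API.

```python
from datetime import date

def in_range(value, gte, lte):
    # Dates in yyyy-MM-dd form parse directly; both bounds are inclusive, like gte/lte.
    return date.fromisoformat(gte) <= date.fromisoformat(value) <= date.fromisoformat(lte)

# publish_time values of the three documents shown in this article, keyed by _id.
publish_times = {"1": "2016-11-18", "2": "2011-01-01", "3": "2016-06-01"}

hits = [
    doc_id
    for doc_id, published in publish_times.items()
    if in_range(published, "2016-01-01", "2016-12-31")
]
print(hits)
```

Among these three documents, only the 2011 title falls outside the range.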

exists query

An exists query returns documents in which the field has at least one non-null value:


GET englishbooks/_search


{


"query":{


"exists":{


"field":"author"


}


}


}

Prefix query

A prefix query finds documents in which a term of the given field starts with the specified prefix:


GET englishbooks/_search


{


"query":{


"prefix":{


"title":"cor"


}


}


}


The request above finds the document whose title field is "Core Java":


{


"took": 6,


"timed_out": false,


"_shards": {


"total": 5,


"successful": 5,


"skipped": 0,


"failed": 0


},


"hits": {


"total": 1,


"max_score": 1,


"hits": [


{


"_index": "englishbooks",


"_type": "IT",


"_id": "3",


"_score": 1,


"_source": {


"id": "3",


"title": "Core Java",


"language": "java",


"author": "Horstmann",


"price": 85.9,


"publish_time": "2016-06-01",


"description": "The book is aimed at experienced programmers who want to learn how to write useful Java applications and applets. "


}


}


]


}


}
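The prefix is compared against the indexed terms, not against the original field text. The Python sketch below models that, assuming title was analyzed into lowercase terms at index time; `analyze` and `prefix_query` are illustrative helpers.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def prefix_query(field_value, prefix):
    # The prefix is tested against each indexed term and is not analyzed itself.
    return any(term.startswith(prefix) for term in analyze(field_value))

print(prefix_query("Core Java", "cor"))
print(prefix_query("Core Java", "ja"))
print(prefix_query("Core Java", "Cor"))
```

"cor" and "ja" both match because each is the start of some term in the title, while "Cor" does not match the lowercase term "core".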

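Wildcard query

The following query finds documents whose title field contains the term "core". Note that "?" matches exactly one character and "*" matches zero or more characters: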


GET englishbooks/_search


{


"query":{


"wildcard":{


"title":"cor?"


}


}


}
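The two wildcard characters behave much like shell globbing, so Python's `fnmatch` module is a convenient way to try patterns out. This is only an approximation: `fnmatch` also understands `[...]` character sets, which are not part of the wildcard query described here, and `analyze` and `wildcard_query` are illustrative helpers.

```python
import re
from fnmatch import fnmatchcase

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def wildcard_query(field_value, pattern):
    # The pattern must cover a whole indexed term: "?" is one character, "*" is any run.
    return any(fnmatchcase(term, pattern) for term in analyze(field_value))

print(wildcard_query("Core Java", "cor?"))
print(wildcard_query("Core Java", "cor"))
print(wildcard_query("Core Java", "c*"))
```

"cor?" matches the four-letter term "core", whereas the bare "cor" matches nothing because the pattern has to cover the entire term.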

Regular expressions

The regexp property runs a regular-expression query, for example finding documents whose description field contains a term made up of four digits:


GET englishbooks/_search


{


"query":{


"regexp":{


"description":"[0-9]{4}"


}


}


}


The result is shown below; the description field contains the number 1986:


{


"took": 4,


"timed_out": false,


"_shards": {


"total": 5,


"successful": 5,


"skipped": 0,


"failed": 0


},


"hits": {


"total": 1,


"max_score": 1,


"hits": [


{


"_index": "englishbooks",


"_type": "IT",


"_id": "2",


"_score": 1,


"_source": {


"id": "2",


"title": "Compilers",


"language": "c",


"author": "Alfred V.Aho",


"price": 62.5,


"publish_time": "2011-01-01",


"description": "In the time since the 1986 edition of this book, the world of compiler designhas changed significantly."


}


}


]


}


}
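A regexp pattern has to match an entire indexed term rather than a substring of the field, which `re.fullmatch` models well in Python. Keep in mind that this is only a sketch: Python's regular-expression syntax differs from the Lucene syntax Elasticsearch uses, the description is abridged, and `analyze` and `regexp_query` are illustrative helpers.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def regexp_query(field_value, pattern):
    # fullmatch mimics the rule that the pattern must cover the whole term.
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(term) for term in analyze(field_value))

description = "In the time since the 1986 edition of this book"  # abridged

print(regexp_query(description, "[0-9]{4}"))
print(regexp_query(description, "[0-9]{3}"))
```

The three-digit pattern fails even though "198" is a substring of "1986", because the pattern does not cover the whole term.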

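Fuzzy query

A fuzzy query finds results by computing the edit distance between the query term and the terms indexed for each document. For example, if you meant to look for the term "1986" in the description field but typed "1987" by mistake, a fuzzy query still finds the document, only with a lower score. The request looks like this: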


GET englishbooks/_search


{


"query":{


"fuzzy":{


"description":"1987"


}


}


}


The matching document is shown below; its score is only 0.5942837, lower than the 0.79237825 obtained when searching with "1986":


{


"took": 5,


"timed_out": false,


"_shards": {


"total": 5,


"successful": 5,


"skipped": 0,


"failed": 0


},


"hits": {


"total": 1,


"max_score": 0.5942837,


"hits": [


{


"_index": "englishbooks",


"_type": "IT",


"_id": "2",


"_score": 0.5942837,


"_source": {


"id": "2",


"title": "Compilers",


"language": "c",


"author": "Alfred V.Aho",


"price": 62.5,


"publish_time": "2011-01-01",


"description": "In the time since the 1986 edition of this book, the world of compiler designhas changed significantly."


}


}


]


}


}


Note that fuzzy queries consume considerable resources;
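The edit distance mentioned above can be computed with the classic dynamic-programming algorithm. The Python sketch below uses plain Levenshtein distance and assumes one allowed edit for a four-character term; it is a simplification of what Elasticsearch does (which can also count a swap of two adjacent characters as a single edit), and `edit_distance`, `fuzzy_query`, and `analyze` are illustrative helpers.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def edit_distance(a, b):
    # Levenshtein distance: minimum number of insertions, deletions and substitutions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def fuzzy_query(field_value, term, max_edits=1):
    # max_edits=1 is an assumption chosen for this four-character example.
    return any(edit_distance(term, t) <= max_edits for t in analyze(field_value))

description = "In the time since the 1986 edition of this book"  # abridged

print(edit_distance("1987", "1986"))
print(fuzzy_query(description, "1987"))
print(fuzzy_query(description, "1997"))
```

Comparing the query term with many candidate terms in this way is part of why fuzzy queries are relatively expensive.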

Compound queries

The most commonly used compound query is the bool query, which combines conditions using the clauses in the table below:


| Property | Effect |
| --- | --- |
| must | Must match; equivalent to AND in SQL |
| should | May match; equivalent to OR in SQL |
| must_not | Must not match |
| filter | Same as must, but does not contribute to the score |


The following conditions search for documents whose title contains java but does not contain core:


GET englishbooks/_search


{


"query":{


"bool":{


"must":{


"term":{"title":"java"}


},


"must_not":[


{"term":{"title":"core"}}


]


}


}


}


In the returned documents, those containing the term core have been filtered out:


{


"took": 3,


"timed_out": false,


"_shards": {


"total": 5,

Elasticsearch in Practice Trilogy, Part 3: Search Operations_Java_爱好编程进阶_InfoQ Writing Platform