
Hands-on with the join type in Elasticsearch 6

Author: 程序员欣宸
  • September 7, 2022, Guangdong
  • Length: 5,678 characters, about 19 minutes to read


Welcome to my GitHub

All of Xinchen's original articles (with accompanying source code) are categorized and collected here: https://github.com/zq2599/blog_demos

Overview

  • Elasticsearch in Action (Chinese edition: 《Elasticsearch 实战》) is a classic ES tutorial. Its demo source code is at https://github.com/dakrone/elasticsearch-in-action , whose latest branch is 6.x. While using that source, I noticed that the static mapping for the _doc type adds a field of type join, as shown below:


"mappings" : {
  "_doc" : {
    "_source" : {
      "enabled" : true
    },
    "properties" : {
      "relationship_type": {
        "type": "join",
        "relations" : {
          "group": "event"
        }
      },
      ...


  • join is a field type introduced in ES6; let's learn how it works through hands-on practice.

Environment

  1. Operating system: Ubuntu 18.04.2 LTS

  2. Elasticsearch: 6.7.1

  3. Kibana: 6.7.1

Downloading the Elasticsearch in Action demo source

  • This article uses two files from that source: mapping.json, which creates the static mapping, and populate.sh, which creates the documents. Their addresses are:


  1. https://github.com/dakrone/elasticsearch-in-action/blob/6.x/mapping.json

  2. https://github.com/dakrone/elasticsearch-in-action/blob/6.x/populate.sh


  • Usage: download both files into the same directory and run **./populate.sh 192.168.1.101:9200** , where "192.168.1.101:9200" is the HTTP address and port of your ES6 instance.

Official description

  • The official documentation describes the join type as follows:


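  • My understanding:


  1. The join type establishes parent/child relationships between documents within the same index.

  2. A relationship is expressed by the names of the parent and the child.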


  • Next, let's see how the Elasticsearch in Action demo uses this field.
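Before looking at the demo, the two points above can be made concrete with a small in-memory model. This is a hypothetical Python sketch, not Elasticsearch code: `RELATIONS` mirrors the `"relations": {"group": "event"}` mapping shown earlier, and `classify` is an illustrative helper that checks the two shapes a join value can take (a plain string for a parent, an object with `name` and `parent` for a child). Real Elasticsearch also allows several child names per parent and multi-level relations, which this sketch leaves out.

```python
# Hypothetical model of the join field mapped as "relations": {"group": "event"}.
# It only illustrates the rules; it is not how Elasticsearch implements them.
RELATIONS = {"group": "event"}  # parent name -> child name


def classify(join_value):
    """Return (relation_name, parent_id) for a document's join-field value."""
    if isinstance(join_value, str):
        # Parent documents store just the relation name, e.g. "group".
        if join_value not in RELATIONS:
            raise ValueError(f"{join_value!r} is not a parent relation")
        return join_value, None
    # Child documents store an object: {"name": <child name>, "parent": <parent _id>}.
    name = join_value["name"]
    parent = join_value.get("parent")
    if name not in RELATIONS.values():
        raise ValueError(f"{name!r} is not a child relation")
    if parent is None:
        raise ValueError("a child document must reference its parent _id")
    return name, parent


print(classify("group"))                           # ('group', None)
print(classify({"name": "event", "parent": "1"}))  # ('event', '1')
```

The two `print` calls correspond to the two documents created by the demo script in the next section.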

The Elasticsearch in Action demo

  • Part of the demo's document-creation script is shown below:


curl -s -XPOST "$ADDRESS/get-together/_doc/1" -H'Content-Type: application/json' -d'{
  "relationship_type": "group",
  "name": "Denver Clojure",
  "organizer": ["Daniel", "Lee"],
  "description": "Group of Clojure enthusiasts from Denver who want to hack on code together and learn more about Clojure",
  "created_on": "2012-06-15",
  "tags": ["clojure", "denver", "functional programming", "jvm", "java"],
  "members": ["Lee", "Daniel", "Mike"],
  "location_group": "Denver, Colorado, USA"
}'
curl -s -XPOST "$ADDRESS/get-together/_doc/100?routing=1" -H'Content-Type: application/json' -d'{
  "relationship_type": {
    "name": "event",
    "parent": "1"
  },
  "host": ["Lee", "Troy"],
  "title": "Liberator and Immutant",
  "description": "We will discuss two different frameworks in Clojure for doing different things. Liberator is a ring-compatible web framework based on Erlang Webmachine. Immutant is an all-in-one enterprise application based on JBoss.",
  "attendees": ["Lee", "Troy", "Daniel", "Tom"],
  "date": "2013-09-05T18:00",
  "location_event": {
    "name": "Stoneys Full Steam Tavern",
    "geolocation": "39.752337,-105.00083"
  },
  "reviews": 4
}'


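  • As shown above, the document with id 1 has the string "group" as its relationship_type value. The document with id 100 has an object rather than a string: parent "1" means its parent document's id is 1, and name "event" marks it as the child side of the group:event relation defined in the mapping.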

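  • Note: the URL for the second document carries the routing parameter (routing=1, the parent's id) so that parent and child are stored on the same shard. The routing value is mandatory for child documents, and this is the point that needs the most care when using the join type.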

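  • Next, make sure populate.sh has been run so that the get-together index and its documents are ready in ES; then the hands-on part can begin. All of the requests below are run in Kibana's Dev Tools.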
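The routing note above can be sketched as follows. Elasticsearch picks a shard roughly as `hash(_routing) % number_of_primary_shards`, where `_routing` is the explicit routing value if one is given and the document `_id` otherwise. This sketch assumes two primary shards (the responses below report `"total" : 2` shards) and uses `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses, so the concrete shard numbers will not match a real cluster; only the principle that equal routing values land on the same shard carries over. `routing_of` and `shard_for` are hypothetical helpers.

```python
import zlib

NUM_PRIMARY_SHARDS = 2  # assumption: taken from "_shards.total" in the responses below


def routing_of(doc_id, routing=None):
    """_routing is the explicit ?routing=... value if given, else the document _id."""
    return doc_id if routing is None else routing


def shard_for(routing_value):
    """Simplified shard selection; crc32 stands in for Elasticsearch's Murmur3 hash."""
    return zlib.crc32(routing_value.encode("utf-8")) % NUM_PRIMARY_SHARDS


# Parent: group document _id 1, indexed without ?routing, so it is routed by "1".
parent_shard = shard_for(routing_of("1"))
# Child: event document _id 100, indexed with ?routing=1 (its parent's _id).
child_shard = shard_for(routing_of("100", routing="1"))

print(parent_shard == child_shard)  # True: same routing value, same shard
```

Because the child borrows its parent's _id as its routing value, both hash identically whatever the hash function is; this is also why the event hits below report `"_routing"` equal to their parent's id.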

Find all documents whose parent type is "group" (the results are child documents):

  • Run the following request:


GET get-together/_search
{
  "query": {
    "has_parent": {
      "parent_type": "group",
      "query": {
        "match_all": {}
      }
    }
  }
}


  • It returns all child documents whose parent type is "group":


{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 15,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "106",
        "_score" : 1.0,
        "_routing" : "3",
        "_source" : {
          "relationship_type" : {
            "name" : "event",
            "parent" : "3"
          },
          "host" : "Mik",
          "title" : "Social management and monitoring tools",
          "description" : "Shay Banon will be there to answer questions and we can talk about management tools.",
          "attendees" : [
            "Shay",
            "Mik",
            "John",
            "Chris"
          ],
          "date" : "2013-03-06T18:00",
          "location_event" : {
            "name" : "Quid Inc",
            "geolocation" : "37.798442,-122.399801"
          },
          "reviews" : 5
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "107",
        "_score" : 1.0,
        "_routing" : "3",
        "_source" : {
          "relationship_type" : {
            "name" : "event",
            "parent" : "3"
          },
          "host" : "Mik",
          "title" : "Logging and Elasticsearch",
          "description" : "Get a deep dive for what Elasticsearch is and how it can be used for logging with Logstash as well as Kibana!",
          "attendees" : [
            "Shay",
            "Rashid",
            "Erik",
            "Grant",
            "Mik"
          ],
          "date" : "2013-04-08T18:00",
          "location_event" : {
            "name" : "Salesforce headquarters",
            "geolocation" : "37.793592,-122.397033"
          },
          "reviews" : 3
        }
      },
      ...
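Conceptually, has_parent first finds the parent documents of the given type that match the inner query, then returns the children that point at those parents. The Python sketch below models that over a hypothetical handful of documents shaped like the ones populate.sh creates; `has_parent` and `match_all` here are illustrative functions, not the Elasticsearch implementation, which performs this join inside each shard (hence the routing requirement).

```python
# Hypothetical sample: a few documents shaped like those created by populate.sh.
DOCS = {
    "1": {"relationship_type": "group", "name": "Denver Clojure"},
    "3": {"relationship_type": "group", "name": "Elasticsearch San Francisco"},
    "100": {"relationship_type": {"name": "event", "parent": "1"},
            "title": "Liberator and Immutant"},
    "106": {"relationship_type": {"name": "event", "parent": "3"},
            "title": "Social management and monitoring tools"},
}


def match_all(doc):
    """Stand-in for the match_all inner query."""
    return True


def has_parent(docs, parent_type, parent_query=match_all):
    """Return the _ids of children whose parent has parent_type and matches parent_query."""
    # Step 1: parents of the requested type that satisfy the inner query.
    parents = {
        doc_id for doc_id, doc in docs.items()
        if doc["relationship_type"] == parent_type and parent_query(doc)
    }
    # Step 2: children whose join value points at one of those parents.
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if isinstance(doc["relationship_type"], dict)
        and doc["relationship_type"]["parent"] in parents
    )


print(has_parent(DOCS, "group"))                                   # ['100', '106']
print(has_parent(DOCS, "group", lambda d: "Denver" in d["name"]))  # ['100']
```

Swapping `match_all` for a predicate plays the role of a non-trivial inner query, such as a match on the group name.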

Find all documents that have children of type "event" (the results are parent documents)

  • Run the following request:


GET get-together/_search
{
  "query": {
    "has_child": {
      "type": "event",
      "query": {
        "match_all": {}
      }
    }
  }
}


  • It returns all documents that have "event" children:


{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "relationship_type" : "group",
          "name" : "Elasticsearch San Francisco",
          "organizer" : "Mik",
          "description" : "Elasticsearch group for ES users of all knowledge levels",
          "created_on" : "2012-08-07",
          "tags" : [
            "elasticsearch",
            "big data",
            "lucene",
            "open source"
          ],
          "members" : [
            "Lee",
            "Igor"
          ],
          "location_group" : "San Francisco, California, USA"
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "relationship_type" : "group",
          "name" : "Denver Clojure",
          "organizer" : [
            "Daniel",
            "Lee"
          ],
          "description" : "Group of Clojure enthusiasts from Denver who want to hack on code together and learn more about Clojure",
          "created_on" : "2012-06-15",
          "tags" : [
            "clojure",
            "denver",
            "functional programming",
            "jvm",
            "java"
          ],
          "members" : [
            "Lee",
            "Daniel",
            "Mike"
          ],
          "location_group" : "Denver, Colorado, USA"
        }
      },
      ...
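has_child works in the opposite direction: collect the parent ids referenced by matching children, then return those parents. The Python sketch below is a hypothetical model of that idea; the group with _id 99 is invented purely to show that a parent with no children is not returned, and `has_child` is an illustrative name.

```python
# Hypothetical sample; group "99" is invented and has no events.
DOCS = {
    "1": {"relationship_type": "group", "name": "Denver Clojure"},
    "3": {"relationship_type": "group", "name": "Elasticsearch San Francisco"},
    "99": {"relationship_type": "group", "name": "Hypothetical group with no events"},
    "100": {"relationship_type": {"name": "event", "parent": "1"},
            "title": "Liberator and Immutant"},
    "106": {"relationship_type": {"name": "event", "parent": "3"},
            "title": "Social management and monitoring tools"},
}


def match_all(doc):
    """Stand-in for the match_all inner query."""
    return True


def has_child(docs, child_type, child_query=match_all):
    """Return the _ids of parents that have at least one matching child of child_type."""
    # Parent ids referenced by children of the requested type that match the query.
    parent_ids = {
        doc["relationship_type"]["parent"]
        for doc in docs.values()
        if isinstance(doc["relationship_type"], dict)
        and doc["relationship_type"]["name"] == child_type
        and child_query(doc)
    }
    return sorted(doc_id for doc_id in parent_ids if doc_id in docs)


print(has_child(DOCS, "event"))                                      # ['1', '3']
print(has_child(DOCS, "event", lambda d: "Liberator" in d["title"]))  # ['1']
```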

Find the child documents whose parent id is 1

  • Run the following request:


GET get-together/_search
{
  "query": {
    "parent_id": {
      "type": "event",
      "id": "1"
    }
  }
}


  • It returns all child documents whose parent id is 1:


{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.3291359,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "100",
        "_score" : 1.3291359,
        "_routing" : "1",
        "_source" : {
          "relationship_type" : {
            "name" : "event",
            "parent" : "1"
          },
          "host" : [
            "Lee",
            "Troy"
          ],
          "title" : "Liberator and Immutant",
          "description" : "We will discuss two different frameworks in Clojure for doing different things. Liberator is a ring-compatible web framework based on Erlang Webmachine. Immutant is an all-in-one enterprise application based on JBoss.",
          "attendees" : [
            "Lee",
            "Troy",
            "Daniel",
            "Tom"
          ],
          "date" : "2013-09-05T18:00",
          "location_event" : {
            "name" : "Stoneys Full Steam Tavern",
            "geolocation" : "39.752337,-105.00083"
          },
          "reviews" : 4
        }
      },
      ...
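Unlike has_parent, parent_id never needs to look at the parent documents: every child already stores its parent's _id in the join field, so comparing that stored value is enough. The sketch below models this with the three events of group 1 and the event of group 3 that appear in this article's responses; `parent_id_query` is a hypothetical name.

```python
# Child documents taken from the responses in this article (fields trimmed).
DOCS = {
    "100": {"relationship_type": {"name": "event", "parent": "1"},
            "title": "Liberator and Immutant"},
    "101": {"relationship_type": {"name": "event", "parent": "1"},
            "title": "Sunday, Surly Sunday"},
    "102": {"relationship_type": {"name": "event", "parent": "1"},
            "title": "10 Clojure coding techniques you should know, and project openbike"},
    "106": {"relationship_type": {"name": "event", "parent": "3"},
            "title": "Social management and monitoring tools"},
}


def parent_id_query(docs, child_type, parent_id):
    """Return the _ids of children of child_type whose stored parent _id equals parent_id."""
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if isinstance(doc["relationship_type"], dict)
        and doc["relationship_type"]["name"] == child_type
        and doc["relationship_type"]["parent"] == parent_id
    )


print(parent_id_query(DOCS, "event", "1"))  # ['100', '101', '102']
```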

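Simplifying the response with script_fields

  • The queries above return the entire _source. If you don't need all of it, script_fields can trim the response.

  • Find all child documents whose parent id is 1, returning only three fields: the parent id, the child id, and the child's title: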


GET get-together/_search
{
  "query": {
    "parent_id": {
      "type": "event",
      "id": "1"
    }
  },
  "script_fields": {
    "group_id": {
      "script": {
        "source": "doc['relationship_type#group']"
      }
    },
    "event_id": {
      "script": {
        "source": "doc['_id']"
      }
    },
    "title": {
      "script": "params['_source']['title']"
    }
  }
}


  • The result is as follows:


{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.3291359,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "100",
        "_score" : 1.3291359,
        "_routing" : "1",
        "fields" : {
          "event_id" : [
            "100"
          ],
          "title" : [
            "Liberator and Immutant"
          ],
          "group_id" : [
            "1"
          ]
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "101",
        "_score" : 1.3291359,
        "_routing" : "1",
        "fields" : {
          "event_id" : [
            "101"
          ],
          "title" : [
            "Sunday, Surly Sunday"
          ],
          "group_id" : [
            "1"
          ]
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "102",
        "_score" : 1.3291359,
        "_routing" : "1",
        "fields" : {
          "event_id" : [
            "102"
          ],
          "title" : [
            "10 Clojure coding techniques you should know, and project openbike"
          ],
          "group_id" : [
            "1"
          ]
        }
      }
    ]
  }
}
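The reshaping that the three scripts perform on each hit can be mirrored in plain Python. This is a hypothetical illustration of the output shape, not of Painless: `project` takes a hit's _id and _source and keeps only three values, each wrapped in a list as in the `fields` section above. One assumption to flag: the sketch reads the parent id from `_source`, whereas the request reads it from the internal `relationship_type#group` field, which for a child document holds its parent's id (as the `group_id` values in the response show).

```python
def project(hit_id, source):
    """Build the same three fields as the script_fields request above.

    Values are wrapped in lists because script_fields come back as arrays.
    group_id is read from _source here; the real script reads the internal
    relationship_type#group doc values instead.
    """
    return {
        "group_id": [source["relationship_type"]["parent"]],
        "event_id": [hit_id],
        "title": [source["title"]],
    }


# _source of event 100, trimmed to a few fields.
source_100 = {
    "relationship_type": {"name": "event", "parent": "1"},
    "title": "Liberator and Immutant",
    "reviews": 4,
}
print(project("100", source_100))
# {'group_id': ['1'], 'event_id': ['100'], 'title': ['Liberator and Immutant']}
```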

Aggregation

  • The following query runs a terms bucket aggregation over all child documents whose parent type is group:


GET get-together/_search
{
  "query": {
    "has_parent": {
      "parent_type": "group",
      "query": {
        "match_all": {}
      }
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "relationship_type#group"
      }
    }
  }
}


  • The result is shown below, with one bucket per parent document id:


  "aggregations" : {
    "parents" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 3
        },
        {
          "key" : "2",
          "doc_count" : 3
        },
        {
          "key" : "3",
          "doc_count" : 3
        },
        {
          "key" : "4",
          "doc_count" : 3
        },
        {
          "key" : "5",
          "doc_count" : 3
        }
      ]
    }
  }
}
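A terms aggregation on `relationship_type#group` amounts to counting child documents per parent id. The Python sketch below reproduces the shape of the buckets above. The 15 event _ids are a hypothetical assignment (three per group, numbered 100 to 114), and ordering by count descending with ties broken by key ascending is this sketch's own assumption about bucket order; `terms_on_parent` is an illustrative name.

```python
from collections import Counter

# Hypothetical sample: groups "1".."5", each with 3 events (_ids "100".."114").
CHILDREN = {
    str(100 + (group - 1) * 3 + i): {"relationship_type": {"name": "event", "parent": str(group)}}
    for group in range(1, 6)
    for i in range(3)
}


def terms_on_parent(children, size=10):
    """Count children per parent _id, like a terms agg on relationship_type#group."""
    counts = Counter(doc["relationship_type"]["parent"] for doc in children.values())
    # Order by doc_count descending, breaking ties by key ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


print(terms_on_parent(CHILDREN)[0])  # {'key': '1', 'doc_count': 3}
```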


  • That covers the main hands-on content for the join type; I hope it helps you understand this new type.

Follow me on InfoQ: 程序员欣宸

You are not alone on the road of learning; Xinchen's original articles will keep you company...



