Elasticsearch Aggregations, Part 1: Basic Operations

Author: 程序员欣宸

September 13, 2022
Guangdong
Word count: 3,780 characters
Reading time: about 12 minutes

Welcome to my GitHub

All of my original articles (with companion source code) are categorized and collected here: https://github.com/zq2599/blog_demos

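Overview

Aggregation is a feature we use all the time when working with Elasticsearch. Starting with this article, we will learn aggregations together through hands-on practice.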

Articles in this series

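About aggregations

A search finds the set of documents that match the query conditions.
An aggregation gives us a summary view of the data. Taking car sales records as an example, all of the following are aggregated data:

How many colors there are;
What the average price of a car is;
How many cars of each color were sold, when the sales are split by color;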

The first step in learning Elasticsearch aggregations is to understand two concepts: buckets and metrics.

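Buckets

A bucket is a collection of documents that meet a certain criterion. For example, when cars are grouped by color, as in the figure below, each color has its own bucket, which holds all the documents of that color: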

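Metrics

A metric is a statistic computed over the documents in a bucket. Examples include the number of red cars, their lowest price, highest price, average sale price and total sales. All of these are computed from the values of the documents in the bucket.
With these basic concepts in mind, let's learn aggregations through hands-on practice.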
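
The following minimal pure-Python sketch makes the two concepts concrete before we touch Elasticsearch. It is an illustration only and does not reflect how Elasticsearch computes aggregations internally. It takes the eight sample sales records that are imported later in this article and puts them into one bucket per color (the bucket step). It then computes count, min, max, avg and sum over each bucket (the metric step). The helper names bucket_by and price_metrics are hypothetical.

```python
from collections import defaultdict

# The eight sample sales records imported later in this article
SALES = [
    {"price": 10000, "color": "red", "make": "honda", "sold": "2014-10-28"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 30000, "color": "green", "make": "ford", "sold": "2014-05-18"},
    {"price": 15000, "color": "blue", "make": "toyota", "sold": "2014-07-02"},
    {"price": 12000, "color": "green", "make": "toyota", "sold": "2014-08-19"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 80000, "color": "red", "make": "bmw", "sold": "2014-01-01"},
    {"price": 25000, "color": "blue", "make": "ford", "sold": "2014-02-12"},
]


def bucket_by(docs, field):
    """Bucket step (hypothetical helper): group documents by the value of `field`."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def price_metrics(bucket):
    """Metric step (hypothetical helper): statistics over the prices in one bucket."""
    prices = [doc["price"] for doc in bucket]
    return {
        "count": len(prices),
        "min": min(prices),
        "max": max(prices),
        "avg": sum(prices) / len(prices),
        "sum": sum(prices),
    }


metrics_by_color = {
    color: price_metrics(docs) for color, docs in bucket_by(SALES, "color").items()
}
for color, metrics in metrics_by_color.items():
    print(color, metrics)
```

The red bucket gets a count of 4 and a sum of 130000. These are the same figures that the terms and sum aggregations return later in this article.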

Environment

The environment used in this article is listed below. Make sure your Elasticsearch is running properly:

Operating system: Ubuntu 18.04.2 LTS
JDK: 1.8.0_191
Elasticsearch: 6.7.1
Kibana: 6.7.1

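Importing the sample data

The data used here comes from the examples in Elasticsearch: The Definitive Guide.
We will use an index named cars, in which each document is one car sales record. Its fields are:

color: the color of the car (keyword)
make: the brand of the car (keyword)
price: the sale price (long)
sold: the date of sale (date)

Create the index with an explicit (static) mapping by running the following command on the Kibana Dev Tools page. It creates the cars index and the transactions type, and defines every field: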

PUT /cars
{
  "mappings" : {
    "transactions" : {
      "properties" : {
        "color" : {
          "type" : "keyword"
        },
        "make" : {
          "type" : "keyword"
        },
        "price" : {
          "type" : "long"
        },
        "sold" : {
          "type" : "date"
        }
      }
    }
  }
}
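
If you are not using Kibana, you can send the same PUT request over plain HTTP. Here is a minimal sketch using Python's standard library. It assumes Elasticsearch is listening on http://localhost:9200; that address is an assumption, so adjust it to your environment. Elasticsearch 6.x rejects request bodies without a Content-Type header, so the header is set explicitly. The request is only built here; uncomment the last lines to actually send it.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local single-node Elasticsearch 6.7.1

# The same static mapping as the Kibana command above: index "cars", type "transactions"
CARS_MAPPING = {
    "mappings": {
        "transactions": {
            "properties": {
                "color": {"type": "keyword"},
                "make": {"type": "keyword"},
                "price": {"type": "long"},
                "sold": {"type": "date"},
            }
        }
    }
}


def build_create_index_request(index, body):
    """Build (but do not send) the PUT /<index> request that creates the index."""
    return urllib.request.Request(
        f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # required by Elasticsearch 6.x
        method="PUT",
    )


req = build_create_index_request("cars", CARS_MAPPING)
print(req.get_method(), req.full_url)
# To actually create the index:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```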

Import the data:

POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
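
The _bulk body is newline-delimited JSON (NDJSON). Each action line ({ "index": {}}) and each document must sit on its own line, and the body must end with a newline. That is why the command above cannot be written as one flattened line. The following minimal Python sketch builds such a body from a list of records; to_bulk_body is a hypothetical helper name. When you send the body outside Kibana, set the Content-Type header to application/x-ndjson.

```python
import json

# The same eight records as in the _bulk command above
RECORDS = [
    {"price": 10000, "color": "red", "make": "honda", "sold": "2014-10-28"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 30000, "color": "green", "make": "ford", "sold": "2014-05-18"},
    {"price": 15000, "color": "blue", "make": "toyota", "sold": "2014-07-02"},
    {"price": 12000, "color": "green", "make": "toyota", "sold": "2014-08-19"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 80000, "color": "red", "make": "bmw", "sold": "2014-01-01"},
    {"price": 25000, "color": "blue", "make": "ford", "sold": "2014-02-12"},
]


def to_bulk_body(docs):
    """Build an NDJSON _bulk body: one action line followed by one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # empty action: Elasticsearch generates the _id
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


body = to_bulk_body(RECORDS)
print(body)
```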

The head plugin shows all the data in the new cars index, as in the figure below. For example, the first record there is a green ford that sold for 30000 on May 18, 2014:

The simplest aggregation: the terms bucket

Our first aggregation uses a terms bucket, which is the equivalent of GROUP BY in SQL. To group all the records by color, run the following query:

GET /cars/transactions/_search
{
  "size":0,
  "aggs":{
   "popular_colors":{
     "terms": {
       "field": "color"
     }
   }
  }
}

The response is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "popular_colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4
        },
        {
          "key" : "blue",
          "doc_count" : 2
        },
        {
          "key" : "green",
          "doc_count" : 2
        }
      ]
    }
  }
}
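
The GROUP BY analogy can be checked directly. Here is the same grouping written as SQL and run against an in-memory SQLite table that holds the eight sample records. SQLite is only a stand-in for illustration and plays no part in the Elasticsearch setup. The ORDER BY clause reproduces the bucket order in the response above: highest count first, with ties ordered by key.

```python
import sqlite3

# The eight sample records: (price, color, make, sold)
ROWS = [
    (10000, "red", "honda", "2014-10-28"),
    (20000, "red", "honda", "2014-11-05"),
    (30000, "green", "ford", "2014-05-18"),
    (15000, "blue", "toyota", "2014-07-02"),
    (12000, "green", "toyota", "2014-08-19"),
    (20000, "red", "honda", "2014-11-05"),
    (80000, "red", "bmw", "2014-01-01"),
    (25000, "blue", "ford", "2014-02-12"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (price INTEGER, color TEXT, make TEXT, sold TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", ROWS)

# SQL counterpart of the terms bucket on "color": one group per color, with its row count
rows = conn.execute(
    "SELECT color, COUNT(*) AS doc_count FROM transactions "
    "GROUP BY color ORDER BY doc_count DESC, color ASC"
).fetchall()
conn.close()

for color, doc_count in rows:
    print(color, doc_count)
```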

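The parameters in the query mean the following:
size: set to 0 so that the returned hits array is empty (the hits are not what this query is about), which makes the query faster;
aggs: all aggregations are placed under aggs. Note that aggs is a top-level parameter; aggregations can also be used in place of aggs;
popular_colors: the name we give this aggregation. Since we are grouping by color, we call it popular_colors. In the response, the result of this aggregation appears under the same name: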

"aggregations" : {
    "popular_colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4
        },
        {
          "key" : "blue",
          "doc_count" : 2
        },
        ...

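terms: aggregations support many bucket types, and terms is one of the most common. It groups documents by the values of a specified field. This example specifies the color field, so all documents whose color is red go into one bucket, all green ones into another, and so on. The common bucket types will be used in later articles. For details, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-aggregations-bucket.html
field: the field the terms bucket groups by, here color;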
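
The request body is ordinary JSON, so you can also assemble it in code. The following minimal sketch builds the same body as the query above; terms_agg_body is a hypothetical helper. Its long_key flag only switches the top-level key between aggs and its alias aggregations, which, as noted above, are interchangeable.

```python
import json


def terms_agg_body(agg_name, field, long_key=False):
    """Build a _search body with size 0 and a single terms bucket aggregation."""
    key = "aggregations" if long_key else "aggs"  # both spellings are accepted
    return {
        "size": 0,  # no hits needed, only aggregation results
        key: {
            agg_name: {  # user-chosen name, echoed back in the response
                "terms": {"field": field}  # one bucket per distinct value of the field
            }
        },
    }


body = terms_agg_body("popular_colors", "color")
print(json.dumps(body, indent=2))
```
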
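Now look at the response. aggregations holds the aggregation results, and popular_colors is the name we specified. buckets is a JSON array in which each object is one bucket: key is the field value and doc_count is the number of documents in the bucket. For example, the first bucket in the result represents the red cars, which have 4 sales records.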
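
Reading the buckets in code follows the path just described: aggregations, then the aggregation name, then buckets. The following minimal Python sketch parses the aggregations part of the response shown earlier into a color-to-count mapping; bucket_counts is a hypothetical helper.

```python
import json

# The "aggregations" part of the response shown earlier
RESPONSE_TEXT = """
{
  "aggregations": {
    "popular_colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "red", "doc_count": 4},
        {"key": "blue", "doc_count": 2},
        {"key": "green", "doc_count": 2}
      ]
    }
  }
}
"""


def bucket_counts(response, agg_name):
    """Map each bucket's key to its doc_count for the named terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = bucket_counts(json.loads(RESPONSE_TEXT), "popular_colors")
print(counts)
```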

Adding a metric

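The example above returns only the number of documents in each bucket. ES also supports a rich set of metrics, such as average (Avg), maximum (Max), minimum (Min) and sum (Sum). Let's try sum next.
The following request computes the total sales for each car color: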

GET /cars/transactions/_search
{
  "size":0,
  "aggs":{
   "colors":{
     "terms": {
       "field": "color"
     },
     "aggs":{
       "sales":{
         "sum":{
           "field":"price"
         }
       }
     }
   }
  }
}

The response is:

{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4,
          "sales" : {
            "value" : 130000.0
          }
        },
        {
          "key" : "blue",
          "doc_count" : 2,
          "sales" : {
            "value" : 40000.0
          }
        },
        {
          "key" : "green",
          "doc_count" : 2,
          "sales" : {
            "value" : 42000.0
          }
        }
      ]
    }
  }
}

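Compared with the first request, the parameters for grouping by color are unchanged, but there is now an inner aggs object. An annotated version:

GET /cars/transactions/_search
{
  "size":0,
  "aggs":{                  ------ same as before: the aggregations go here
   "colors":{               ------ name of the aggregation
     "terms": {             ------ bucket type: group by a field
       "field": "color"     ------ group by the color field
     },
     "aggs":{               ------ new aggs object: processes the documents inside each bucket
       "sales":{            ------ name of the metric
         "sum":{            ------ metric type: sum of a field
           "field":"price"  ------ the field to sum is price
         }
       }
     }
   }
  }
}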
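
The nesting shown above is a metric aggs object inside a bucket aggregation. You can express it as a small function that attaches a sub-aggregation to an existing bucket aggregation. The following is a minimal sketch; add_sub_agg is a hypothetical helper, and the body it produces matches the sales-per-color request above.

```python
import copy
import json


def add_sub_agg(body, parent_name, sub_name, sub_agg):
    """Return a copy of body with sub_agg nested under the bucket aggregation parent_name."""
    new_body = copy.deepcopy(body)  # leave the caller's body untouched
    parent = new_body["aggs"][parent_name]
    # Elasticsearch evaluates this sub-aggregation once per bucket of the parent
    parent.setdefault("aggs", {})[sub_name] = sub_agg
    return new_body


colors_body = {"size": 0, "aggs": {"colors": {"terms": {"field": "color"}}}}
sales_body = add_sub_agg(colors_body, "colors", "sales", {"sum": {"field": "price"}})
print(json.dumps(sales_body, indent=2))
```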

Notes on the response:

"aggregations" : {                  ------ aggregation results
    "colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [                 ------ each object in this JSON array is one bucket
        {
          "key" : "red",            ------ this bucket collects all documents whose color is red
          "doc_count" : 4,          ------ 4 documents have color red
          "sales" : {               ------ the result of the sum calculation
            "value" : 130000.0      ------ total sales of all red cars
          }
        },
        {
          "key" : "blue",
          "doc_count" : 2,
          "sales" : {
            "value" : 40000.0       ------ total sales of all blue cars
          }
        },

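The per-bucket metric sits one level below the bucket, under the name we gave it (sales), so reading it in code only adds one more key lookup. The following minimal Python sketch parses the aggregations part of the response above; sales_by_color is a hypothetical helper.

```python
import json

# The "aggregations" part of the sales-per-color response shown earlier
RESPONSE_TEXT = """
{
  "aggregations": {
    "colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "red", "doc_count": 4, "sales": {"value": 130000.0}},
        {"key": "blue", "doc_count": 2, "sales": {"value": 40000.0}},
        {"key": "green", "doc_count": 2, "sales": {"value": 42000.0}}
      ]
    }
  }
}
"""


def sales_by_color(response):
    """Map each color bucket to (doc_count, summed price) read from the nested sum metric."""
    buckets = response["aggregations"]["colors"]["buckets"]
    return {b["key"]: (b["doc_count"], b["sales"]["value"]) for b in buckets}


result = sales_by_color(json.loads(RESPONSE_TEXT))
for color, (doc_count, total) in result.items():
    print(f"{color}: {doc_count} cars, total sales {total:.0f}")
```
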
Other metric types work much like sum. See the official documentation for more: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-aggregations.html
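
Because avg, min and max have the same shape as sum ({ "<type>": { "field": ... } }), several of them can sit side by side inside one bucket aggregation. The following minimal sketch builds such a body over price. metrics_body and the names it generates (by_color, avg_price and so on) are hypothetical choices. Treat the result as a starting point to try in Dev Tools.

```python
import json

# Metric aggregations that, like sum, take a single "field" parameter
METRIC_TYPES = ("avg", "min", "max", "sum")


def metrics_body(bucket_field, metric_field, metric_types=METRIC_TYPES):
    """Build a _search body: a terms bucket on bucket_field with sibling metrics on metric_field."""
    sub_aggs = {
        f"{metric}_{metric_field}": {metric: {"field": metric_field}}  # e.g. "avg_price"
        for metric in metric_types
    }
    return {
        "size": 0,
        "aggs": {
            f"by_{bucket_field}": {  # hypothetical naming scheme, e.g. "by_color"
                "terms": {"field": bucket_field},
                "aggs": sub_aggs,  # every metric is computed for every bucket
            }
        },
    }


body = metrics_body("color", "price")
print(json.dumps(body, indent=2))
```

With the sample data, avg_price for the red bucket should come out as 130000 / 4 = 32500.
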
That completes the basic aggregation operations in Elasticsearch 6. Later articles will cover more complex aggregations.

Follow 程序员欣宸 on InfoQ

You are not alone on the road of learning; 欣宸's original articles will keep you company along the way...

Original article: http://xie.infoq.cn/article/227b10d195c36a62da6644f2d. Please contact the author before republishing.

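Former Tencent and Alibaba employee, working on Java backend development and passionate about Docker and Kubernetes. All articles are the author's originals. Personal GitHub: https://github.com/zq2599/blog_demos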
