Elasticsearch Aggregations, Part 1: Basic Operations

Author: 爱好编程进阶

May 13, 2022

"price" : {
  "type" : "long"
},
"sold" : {
  "type" : "date"
}

Import the data:

POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
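As a side note on the request format: the bulk body is newline-delimited JSON, with one action line followed by one source line per document. Below is a minimal Python sketch of how such a body could be assembled. The helper name build_bulk_body and the cars list are illustrative assumptions, not part of Elasticsearch or of the original article, and nothing is sent to a cluster here.

```python
import json

# The eight sample documents from the bulk request above.
cars = [
    {"price": 10000, "color": "red", "make": "honda", "sold": "2014-10-28"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 30000, "color": "green", "make": "ford", "sold": "2014-05-18"},
    {"price": 15000, "color": "blue", "make": "toyota", "sold": "2014-07-02"},
    {"price": 12000, "color": "green", "make": "toyota", "sold": "2014-08-19"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 80000, "color": "red", "make": "bmw", "sold": "2014-01-01"},
    {"price": 25000, "color": "blue", "make": "ford", "sold": "2014-02-12"},
]


def build_bulk_body(docs):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # Action line; the target index and type come from the request URL.
        lines.append(json.dumps({"index": {}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline character.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(cars)
print(bulk_body)
```

The resulting string has 16 lines (8 action lines and 8 source lines) and could be used as the body of the POST request shown above.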

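Viewed through the head plugin, the newly created cars index contains all of the data imported above. The first record listed there, for example, has a price of 30000, the color green, the make ford, and a sale date of May 18, 2014.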

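The simplest aggregation: the terms bucket

The first aggregation is the terms bucket, which is comparable to GROUP BY in SQL. To group all records by color, run the following query: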

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      }
    }
  }
}

The response is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "popular_colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4
        },
        {
          "key" : "blue",
          "doc_count" : 2
        },
        {
          "key" : "green",
          "doc_count" : 2
        }
      ]
    }
  }
}

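The parameters in the query are explained below:

size: set to 0 so that the hits array in the response is empty (the hits are not what this query is after), which also speeds up the query;
aggs: every aggregation is placed under aggs. Note that aggs is a top-level parameter, and aggregations can be used in its place;
popular_colors: the name assigned to this aggregation. Because this one groups by color, it is called popular_colors, and its result appears under that name in the response: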

"aggregations" : {
  "popular_colors" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "red",
        "doc_count" : 4
      },
      {
        "key" : "blue",
        "doc_count" : 2
      },
      ...

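terms: there are many bucket types, and terms is one of the most commonly used. It groups documents by the value of the specified field. Here the field is color, so all documents whose color is red go into one bucket, those whose color is green into another, and so on. The common bucket types are used in the later hands-on parts; for more details, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-aggregations-bucket.html
field: the field that the terms bucket groups by, here color;
Now for the response: aggregations holds the aggregation results, popular_colors is the name we specified, and buckets is a JSON array in which each JSON object is one bucket whose doc_count is the number of documents in it. The first bucket, for example, covers the red cars and contains 4 sales records.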
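To make the GROUP BY analogy concrete, here is a minimal Python sketch that reproduces the bucket counts locally from the color values of the eight imported documents. It only illustrates the idea of a terms bucket and is not how Elasticsearch computes it internally. The function name terms_buckets is hypothetical, and ordering by document count (descending) with ties broken by key is an assumption chosen to match the response shown above.

```python
from collections import Counter

# Color of each of the eight imported documents, in import order.
colors = ["red", "red", "green", "blue", "green", "red", "red", "blue"]


def terms_buckets(values):
    """Group equal values together and count the documents in each group."""
    counts = Counter(values)
    # Assumption: largest buckets first, ties ordered by key, as in the response above.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


buckets = terms_buckets(colors)
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```

Each dictionary corresponds to one entry of the buckets array in the response: red with 4 documents, then blue and green with 2 each.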

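Adding a metric

The example above returns the number of documents in each bucket. Elasticsearch also supports a rich set of metrics, such as average (avg), maximum (max), minimum (min), and sum. Next, let's try sum.
The request below computes the total sales amount for each car color: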

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

The response is:

{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4,
          "sales" : {
            "value" : 130000.0
          }
        },
        {
          "key" : "blue",
          "doc_count" : 2,
          "sales" : {
            "value" : 40000.0
          }
        },
        {
          "key" : "green",
          "doc_count" : 2,
          "sales" : {
            "value" : 42000.0
          }
        }
      ]
    }
  }
}
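The totals in this response can be cross-checked by hand. The following Python sketch groups the eight imported documents by color and sums their prices, mirroring a sum metric nested inside a terms bucket. The function name sum_by_color and the tuple layout are illustrative assumptions, and the code runs locally without contacting Elasticsearch.

```python
from collections import defaultdict

# (color, price) of each of the eight imported documents, in import order.
sales = [
    ("red", 10000), ("red", 20000), ("green", 30000), ("blue", 15000),
    ("green", 12000), ("red", 20000), ("red", 80000), ("blue", 25000),
]


def sum_by_color(rows):
    """Bucket rows by color, then track a document count and a price sum per bucket."""
    buckets = defaultdict(lambda: {"doc_count": 0, "sales": 0.0})
    for color, price in rows:
        buckets[color]["doc_count"] += 1   # what the terms bucket reports
        buckets[color]["sales"] += price   # what the nested sum metric reports
    return dict(buckets)


totals = sum_by_color(sales)
for color, bucket in totals.items():
    print(color, bucket["doc_count"], bucket["sales"])
```

The sums agree with the response: 130000.0 for red, 40000.0 for blue, and 42000.0 for green.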
