本文是 MongoDB 新手入门 系列的第二篇,在本文中,我们将会讲解 MongoDB 的聚合框架,在看完本文后,读者可以掌握使用 MongoDB 进行常用的数据统计分析方法。
本文将会持续修正和更新,最新内容请参考我的 GITHUB 上的 程序猿成长计划 项目,欢迎 Star,更多精彩内容请 follow me。
简介
聚合管道(Aggregation Pipelines)中包含一个或多个用于处理文档的步骤(stages):
MongoDB 4.2 开始,可以使用聚合管道来更新文档了。
聚合管道的语法为
db.collection.aggregate( [ { <stage> }, ... ] )
复制代码
为了演示聚合管道的功能,我们现在 MongoDB 中创建一个 orders
集合,插入以下数据
db.orders.insertMany( [
{ _id: 0, name: "Pepperoni", size: "small", price: 19,
quantity: 10, date: ISODate( "2021-03-13T08:14:30Z" ) },
{ _id: 1, name: "Pepperoni", size: "medium", price: 20,
quantity: 20, date : ISODate( "2021-03-13T09:13:24Z" ) },
{ _id: 2, name: "Pepperoni", size: "large", price: 21,
quantity: 30, date : ISODate( "2021-03-17T09:22:12Z" ) },
{ _id: 3, name: "Cheese", size: "small", price: 12,
quantity: 15, date : ISODate( "2021-03-13T11:21:39.736Z" ) },
{ _id: 4, name: "Cheese", size: "medium", price: 13,
quantity:50, date : ISODate( "2022-01-12T21:23:13.331Z" ) },
{ _id: 5, name: "Cheese", size: "large", price: 14,
quantity: 10, date : ISODate( "2022-01-12T05:08:13Z" ) },
{ _id: 6, name: "Vegan", size: "small", price: 17,
quantity: 10, date : ISODate( "2021-01-13T05:08:13Z" ) },
{ _id: 7, name: "Vegan", size: "medium", price: 18,
quantity: 10, date : ISODate( "2021-01-13T05:10:13Z" ) }
] )
复制代码
下面的示例会计算两个日期之间每天的的披萨订单总价值和平均数量
// SQL:SELECT
// DATE_FORMAT(date, '%Y-%m-%d') AS _id,
// SUM(price * quantity) AS totalOrderValue,
// AVG(quantity) AS averageOrderQuantity
// FROM orders
// WHERE date >= '2020-01-30' AND date < '2022-01-30'
// GROUP BY DATE_FORMAT(date, '%Y-%m-%d')
// ORDER BY SUM(price * quantity) DESC
db.orders.aggregate( [
// Stage 1: 通过时间范围过滤披萨订单
{
$match:
{
"date": { $gte: new ISODate( "2020-01-30" ), $lt: new ISODate( "2022-01-30" ) }
}
},
// Stage 2: 对匹配的订单进行分组,并且计算总价值和平均数量
{
$group:
{
_id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
totalOrderValue: { $sum: { $multiply: [ "$price", "$quantity" ] } },
averageOrderQuantity: { $avg: "$quantity" }
}
},
// Stage 3: 按照 totalOrderValue 对文档进行反向排序
{
$sort: { totalOrderValue: -1 }
}
] )
复制代码
命令输出如下所示
[
{ _id: '2022-01-12', totalOrderValue: 790, averageOrderQuantity: 30 },
{ _id: '2021-03-13', totalOrderValue: 770, averageOrderQuantity: 15 },
{ _id: '2021-03-17', totalOrderValue: 630, averageOrderQuantity: 30 },
{ _id: '2021-01-13', totalOrderValue: 350, averageOrderQuantity: 10 }
]
复制代码
系统变量
在聚合管道的步骤中可以使用系统变量或者用户自定义的变量,变量可以是任意的 BSON 类型数据,要访问变量的值,使用前缀 $$
, 如 $$<variable>
。如果变量引用的是一个对象,可以这样访问指定的字段 $$<variable>.<field>
。
MongoDB 中定义了以下系统变量
这里以 $$REMOVE
为例,说明系统变量的使用
db.books.aggregate( [
{
$project: {
title: 1,
"author.first": 1,
"author.last" : 1,
"author.middle": {
// 这里判断 $author.middle 是否为空,为空则将该字段移除,否则返回该字段
$cond: {
if: { $eq: [ "", "$author.middle" ] },
then: "$$REMOVE",
else: "$author.middle"
}
}
}
}
] )
复制代码
这里的 $cond
操作符用于计算一个 Bool 表达式,类似于编程语言中的三元运算符。
聚合管道中常用的步骤
在 db.collection.aggreagte()
方法中,除了 $out
,$merge
,$geoNear
之外,其它的 stage 都可以出现多次。
本文只对常用的几个 stage 进行重点介绍,它们分别是 $match
,$count
,$limit
,$project
,$lookup
,$group
,$facet
,$unwind
,$bucket
,$bucketAuto
。
文档过滤 $match
$match
用于过滤筛选文档,语法如下
在 MongoDB 中创建名为 articles
的集合
{ "_id" : ObjectId("512bc95fe835e68f199c8686"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85, "views" : 521 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b257"), "author" : "ahn", "score" : 60, "views" : 1000 }
{ "_id" : ObjectId("55f5a192d4bede9ac365b258"), "author" : "li", "score" : 55, "views" : 5000 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b259"), "author" : "annT", "score" : 60, "views" : 50 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25a"), "author" : "li", "score" : 94, "views" : 999 }
{ "_id" : ObjectId("55f5a1d3d4bede9ac365b25b"), "author" : "ty", "score" : 95, "views" : 1000 }
复制代码
执行查询
// SQL:SELECT * FROM articles WHERE author = "dave"
db.articles.aggregate(
[ { $match : { author : "dave" } } ]
);
复制代码
查询结果
{ "_id" : ObjectId("512bc95fe835e68f199c8686"), "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : ObjectId("512bc962e835e68f199c8687"), "author" : "dave", "score" : 85, "views" : 521 }
复制代码
文档计数 $count
$count
用于统计输入中的文档数量,语法如下
这里的 <string>
是输出字段的名称。
db.getCollection("orders").aggregate([
{ $match: {price: {$gt: 15}} },
{ $count: "price_gt_15_count" }
])
复制代码
输出
{"price_gt_15_count" : NumberInt(5) }
复制代码
文档数量限制 $limit
$limit
用于控制传递给下一个 stage 的文档数量,语法为
{ $limit: <positive 64-bit integer> }
复制代码
比如只返回 2 条数据
db.getCollection("orders").aggregate([{$limit: 2}])
复制代码
文档字段映射 $project
$project
用于控制文档中包含的字段,类似于 SQL 中的 AS
,它会把文档中指定的字段传递个下一个 stage。
语法为
{ $project: { <specification(s)> } }
复制代码
这里的 <specification(s)>
支持以下形式
查询订单,只返回名称和尺寸
// SQL:SELECT name, size FROM orders WHERE quantity > 20
db.orders.aggregate([
{ $match: { quantity: { $gt: 20 } } },
{ $project: { name: true, size: 1, _id: false } }
])
复制代码
返回值如下
{ "name" : "Pepperoni", "size" : "large" }
{ "name" : "Cheese", "size" : "medium" }
复制代码
左外连接 $lookup
$lookup
用于对同一个数据库中的集合进行 left outer join
操作。
单个 Join 条件的等值匹配
语法如下
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}
复制代码
参数
from
: 指定要进行关联的集合,from
集合不能是分片集合
localField
:输入文档中用于关联的字段,localField
的值与 from
集合中的 foreignField
相等,如果输入文档中不包含 localField
,则该值为 null
foreignField
: 指定 from
集合中的关联字段,如果集合中没有该字段,则认为其为 null
as
: 指定要添加到输入文档中的数组字段名称。这个数组字段包含了 from
集合中匹配的文档。如果指定的字段名在输入文档中已经存在,则覆盖该字段
这个操作等价于以下的伪 SQL:
SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
SELECT *
FROM <collection to join>
WHERE <foreignField> = <collection.localField>
);
复制代码
我们先在 orders
集合中插入几个文档
db.orders.insertMany( [
{ "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
{ "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
{ "_id" : 3 }
] )
复制代码
然后创建另外一个 inventory
集合
db.inventory.insertMany( [
{ "_id" : 1, "sku" : "almonds", "description": "product 1", "instock" : 120 },
{ "_id" : 2, "sku" : "bread", "description": "product 2", "instock" : 80 },
{ "_id" : 3, "sku" : "cashews", "description": "product 3", "instock" : 60 },
{ "_id" : 4, "sku" : "pecans", "description": "product 4", "instock" : 70 },
{ "_id" : 5, "sku": null, "description": "Incomplete" },
{ "_id" : 6 }
] )
复制代码
下面的查询使用 orders
集合来关联 inventory
集合,使用 item 和 sku 来进行关联
// SQL:SELECT *, inventory_docs
// FROM orders
// WHERE inventory_docs IN (
// SELECT *
// FROM inventory
// WHERE sku = orders.item
// );
db.orders.aggregate( [
{
$lookup:
{
from: "inventory",
localField: "item",
foreignField: "sku",
as: "inventory_docs"
}
}
] )
复制代码
该操作返回值如下
{
"_id" : 1,
"item" : "almonds",
"price" : 12,
"quantity" : 2,
"inventory_docs" : [
{ "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 }
]
}
{
"_id" : 2,
"item" : "pecans",
"price" : 20,
"quantity" : 1,
"inventory_docs" : [
{ "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 }
]
}
{
"_id" : 3,
"inventory_docs" : [
{ "_id" : 5, "sku" : null, "description" : "Incomplete" },
{ "_id" : 6 }
]
}
复制代码
联表后的集合上的 Join 条件和子查询
语法如下
{
$lookup:
{
from: <joined collection>,
let: { <var_1>: <expression>, …, <var_n>: <expression> },
pipeline: [ <pipeline to run on joined collection> ],
as: <output array field>
}
}
复制代码
参数:
该操作等价于下面的伪 SQL:
SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
SELECT <documents as determined from the pipeline>
FROM <collection to join>
WHERE <pipeline>
);
复制代码
我们首先创建下面两个集合
db.orders.insertMany( [
{ "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 },
{ "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 },
{ "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 }
] )
db.warehouses.insertMany( [
{ "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 },
{ "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 },
{ "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 },
{ "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 },
{ "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 }
] )
复制代码
执行查询
// SQL: SELECT *, stockdata
// FROM orders
// WHERE stockdata IN (
// SELECT warehouse, instock
// FROM warehouses
// WHERE stock_item = orders.item
// AND instock >= orders.ordered
// );
db.orders.aggregate( [
{
$lookup:
{
from: "warehouses",
let: { order_item: "$item", order_qty: "$ordered" },
pipeline: [
{ $match:
{ $expr:
{ $and:
[
{ $eq: [ "$stock_item", "$$order_item" ] },
{ $gte: [ "$instock", "$$order_qty" ] }
]
}
}
},
{ $project: { stock_item: 0, _id: 0 } }
],
as: "stockdata"
}
}
] )
复制代码
该操作返回以下结果
{
_id: 1,
item: 'almonds',
price: 12,
ordered: 2,
stockdata: [
{ warehouse: 'A', instock: 120 },
{ warehouse: 'B', instock: 60 }
]
},
{
_id: 2,
item: 'pecans',
price: 20,
ordered: 1,
stockdata: [ { warehouse: 'A', instock: 80 } ]
},
{
_id: 3,
item: 'cookies',
price: 10,
ordered: 60,
stockdata: [ { warehouse: 'A', instock: 80 } ]
}
复制代码
使用简洁语法的相关子查询
该特性为 MongoDB 5.0 的新功能。从 MongoDB 5.0 开始,可以使用简洁的语法进行相关子查询,相关子查询的子查询文档字段来自于连接的 foreign 和 local 集合。
下面是新的简洁的语法,它移除了 $expr
表达式中 foreign 和 local 字段的等值匹配:
{
$lookup:
{
from: <foreign collection>,
localField: <field from local collection's documents>,
foreignField: <field from foreign collection's documents>,
let: { <var_1>: <expression>, …, <var_n>: <expression> },
pipeline: [ <pipeline to run> ],
as: <output array field>
}
}
复制代码
该操作的伪 SQL 如下
SELECT *, <output array field>
FROM localCollection
WHERE <output array field> IN (
SELECT <documents as determined from the pipeline>
FROM <foreignCollection>
WHERE <foreignCollection.foreignField> = <localCollection.localField>
AND <pipeline match condition>
);
复制代码
分组 $group
$group
对输入的文档按照指定的 _id
表达式进行分组,语法如下
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
复制代码
选项 _id
指定了用于分组的 key 表达式,类似于 SQL 中的 group by
,如果设置为 null
或者任何常数值,则对所有的文档作为一个整体进行计算。
accumulator
支持以下操作
默认情况下, $group
步骤有 100M 的内存限制,如果超过这个限制将会报错。可以使用 allowDiskUse 选项来启用磁盘临时文件来解决这个问题。
统计不同大小的披萨订单销售总量
db.getCollection("orders").aggregate(
[
{
$group : {
_id : "$size",
count : { $sum : 1 }
}
}
],
{
"allowDiskUse" : true
}
);
复制代码
输出如下
{ "_id" : "medium", "count" : 3.0 }
{ "_id" : "small", "count" : 3.0 }
{ "_id" : "large", "count" : 2.0 }
复制代码
查询订单中有几种尺寸的披萨
db.getCollection("orders").aggregate([
{
$group: {_id: "$size"}
}
]);
复制代码
输出如下
{ "_id" : "medium" }
{ "_id" : "large" }
{ "_id" : "small" }
复制代码
查询销量大于等于 3 个的披萨尺寸
类似于 SQL 中的 GROUP BY ... HAVING COUNT(*) >= 3
// SQL: SELECT size as _id, count(*) as count FROM orders GROUP BY size HAVING COUNT(*) >= 3
db.getCollection("orders").aggregate(
[
{
$group : {
_id : "$size",
count : { $sum : 1 }
}
},
{
$match: { count: { $gte: 3} }
}
]
);
复制代码
输出如下
{ "_id" : "medium", "count" : 3.0 }
{ "_id" : "small", "count" : 3.0 }
复制代码
对披萨订单按照尺寸分组,返回每个组中披萨的名称集合
db.getCollection("orders").aggregate([
{
$group: {
_id: "$size",
names: { $push: "$name" }
}
}
])
复制代码
输出如下
{ "_id" : "large", "names" : [ "Pepperoni", "Cheese" ] }
{ "_id" : "small", "names" : [ "Pepperoni", "Cheese", "Vegan" ] }
{ "_id" : "medium", "names" : [ "Pepperoni", "Cheese", "Vegan" ] }
复制代码
按照披萨订单尺寸分组,返回包含的订单以及披萨数量
db.getCollection("orders").aggregate([
{ $group: { _id: "$size", orders: { $push: "$$ROOT" } } },
{
$addFields: {
totalQuantity: { $sum: "$orders.quantity" }
}
}
])
复制代码
输出如下
这里的 $$ROOT
是 MongoDB 中内置的系统变量,引用了根文档(顶级文档),这里通过该变量和 $push
操作,将文档放到了分组后新文档的 orders
字段,更多系统变量见下一章节。
多切面文档聚合 $facet
$facet
用于在一个 stage 中对同一批文档执行多个聚合管道处理。每一个聚合管道的输出文档都有自己的字段,最终输出是这些管道的结果数组。
输入文档只传递给 $facet
阶段一次,它可以在同一批输入文档集合上执行不同的聚合操作。
语法如下
{ $facet:
{
<outputField1>: [ <stage1>, <stage2>, ... ],
<outputField2>: [ <stage1>, <stage2>, ... ],
...
}
}
复制代码
创建一个名为 artwork
的集合
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
"price" : NumberDecimal("199.99"),
"tags" : [ "painting", "satire", "Expressionism", "caricature" ] }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
"price" : NumberDecimal("280.00"),
"tags" : [ "woodcut", "Expressionism" ] }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
"price" : NumberDecimal("76.04"),
"tags" : [ "oil", "Surrealism", "painting" ] }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
"price" : NumberDecimal("167.30"),
"tags" : [ "woodblock", "ukiyo-e" ] }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
"price" : NumberDecimal("483.00"),
"tags" : [ "Surrealism", "painting", "oil" ] }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
"price" : NumberDecimal("385.00"),
"tags" : [ "oil", "painting", "abstract" ] }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893,
"tags" : [ "Expressionism", "painting", "oil" ] }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
"price" : NumberDecimal("118.42"),
"tags" : [ "abstract", "painting" ] }
复制代码
使用 $facet
对数据按照三个维度进行统计
db.artwork.aggregate( [
{
$facet: {
"categorizedByTags": [
{ $unwind: "$tags" },
{ $sortByCount: "$tags" }
],
"categorizedByPrice": [
// Filter out documents without a price e.g., _id: 7
{ $match: { price: { $exists: 1 } } },
{
$bucket: {
groupBy: "$price",
boundaries: [ 0, 150, 200, 300, 400 ],
default: "Other",
output: {
"count": { $sum: 1 },
"titles": { $push: "$title" }
}
}
}
],
"categorizedByYears(Auto)": [
{
$bucketAuto: {
groupBy: "$year",
buckets: 4
}
}
]
}
}
])
复制代码
输出文档
{
"categorizedByYears(Auto)" : [
// First bucket includes the document without a year, e.g., _id: 4
{ "_id" : { "min" : null, "max" : 1902 }, "count" : 2 },
{ "_id" : { "min" : 1902, "max" : 1918 }, "count" : 2 },
{ "_id" : { "min" : 1918, "max" : 1926 }, "count" : 2 },
{ "_id" : { "min" : 1926, "max" : 1931 }, "count" : 2 }
],
"categorizedByPrice" : [
{
"_id" : 0,
"count" : 2,
"titles" : [
"Dancer",
"Blue Flower"
]
},
{
"_id" : 150,
"count" : 2,
"titles" : [
"The Pillars of Society",
"The Great Wave off Kanagawa"
]
},
{
"_id" : 200,
"count" : 1,
"titles" : [
"Melancholy III"
]
},
{
"_id" : 300,
"count" : 1,
"titles" : [
"Composition VII"
]
},
{
// Includes document price outside of bucket boundaries, e.g., _id: 5
"_id" : "Other",
"count" : 1,
"titles" : [
"The Persistence of Memory"
]
}
],
"categorizedByTags" : [
{ "_id" : "painting", "count" : 6 },
{ "_id" : "oil", "count" : 4 },
{ "_id" : "Expressionism", "count" : 3 },
{ "_id" : "Surrealism", "count" : 2 },
{ "_id" : "abstract", "count" : 2 },
{ "_id" : "woodblock", "count" : 1 },
{ "_id" : "woodcut", "count" : 1 },
{ "_id" : "ukiyo-e", "count" : 1 },
{ "_id" : "satire", "count" : 1 },
{ "_id" : "caricature", "count" : 1 }
]
}
复制代码
数组元素拆分为文档 $unwind
$unwind
用于将输入文档中的数组字段解构,为数组中的每一个元素生成一个独立的文档,简单说就是将一条数据拆分为多条。
语法如下
{
$unwind:
{
path: <field path>,
includeArrayIndex: <string>,
preserveNullAndEmptyArrays: <boolean>
}
}
复制代码
参数说明
path
:数组字段的路径,字段路径需要使用前缀 $
includeArrayIndex
:可选,数组元素的作为新的字段,这里指定了字段名
preserveNullAndEmptyArrays
:可选,如果设置为 true
,则如果 path
参数为 null,没有该字段或者是一个空数组时,$unwind
会输出文档,否则不输出,默认值为 false
创建集合 inventory
,插入一条数据
db.inventory.insertOne({ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] })
复制代码
执行以下命令
db.inventory.aggregate([ { $unwind: "$sizes" } ])
复制代码
该命令会将一条数据拆分为 3 条
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }
复制代码
文档分桶 $bucket
按照指定的表达式和边界对输入的文档进行分组,这里的分组称之为 存储桶,每个桶作为一个文档输出。每一个输出的文档都包含了一个 _id
字段,该字段表示了桶的下边界。
语法如下
{
$bucket: {
groupBy: <expression>,
boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
default: <literal>,
output: {
<output1>: { <$accumulator expression> },
...
<outputN>: { <$accumulator expression> }
}
}
}
复制代码
参数说明
groupBy
文档分组表达式
boundaries
:基于 groupBy
表达式分组的值数组,数组中的值指定了每一个桶的边界。相邻的两个值分别为桶的上边界和下边界,指定的值类型必须相同并且正序排列。例如 [0, 5, 10]
创建了两个桶,[0, 5)
和 [5, 10)
default
:可选,当 groupBy
结果不在 boundaries
的范围内时,将结果放在 default
指定的桶中(该参数指定了桶的 _id
)
output
:可选,指定文档中包含到输出文档中的字段,默认只有 _id
字段
$bucket
的使用必须满足以下条件之一
在 MongoDB 中插入以下文档
db.artists.insertMany([
{ "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
{ "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
{ "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
{ "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
{ "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
{ "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
{ "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
{ "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])
复制代码
下面的操作将会把文档基于 year_born
字段分组,然后基于桶中的文档数量进行过滤
db.artists.aggregate( [
// First Stage
{
$bucket: {
groupBy: "$year_born", // Field to group by
boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
default: "Other", // Bucket id for documents which do not fall into a bucket
output: { // Output for each bucket
"count": { $sum: 1 },
"artists" :
{
$push: {
"name": { $concat: [ "$first_name", " ", "$last_name"] },
"year_born": "$year_born"
}
}
}
}
},
// Second Stage
{
$match: { count: {$gt: 3} }
}
] )
复制代码
输出如下
{ "_id" : 1860, "count" : 4, "artists" :
[
{ "name" : "Emil Bernard", "year_born" : 1868 },
{ "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
{ "name" : "Alfred Maurer", "year_born" : 1868 },
{ "name" : "Edvard Munch", "year_born" : 1863 }
]
}
复制代码
文档自动分桶 $bucketAuto
与 $bucket
功能一样,不过 $bucketAuto
会自动的确定桶的边界,并将文档均匀的分布到桶中。
每一个桶中包含以下内容
语法如下
{
$bucketAuto: {
groupBy: <expression>,
buckets: <number>,
output: {
<output1>: { <$accumulator expression> },
...
}
granularity: <string>
}
}
复制代码
参数说明
查询不同年份范围死亡人口统计
db.artists.aggregate([
{ $bucketAuto: { groupBy: "$year_died", buckets: 4} }
])
复制代码
输出如下
{ "_id" : { "min" : 1890.0, "max" : 1927.0 }, "count" : 2 }
{ "_id" : { "min" : 1927.0, "max" : 1932.0 }, "count" : 2 }
{ "_id" : { "min" : 1932.0, "max" : 1944.0 }, "count" : 2 }
{ "_id" : { "min" : 1944.0, "max" : 1955.0 }, "count" : 2 }
复制代码
参考文档
评论