MongoDB 用实例学习聚合操作

时间：2016-01-16 23:57:55 阅读：487 评论：0 收藏：0 [点我收藏+]

Mongodb官方网站提供了一个美国人口统计数据，下载地址如下

http://media.mongodb.org/zips.json

数据示例：

[root@localhost cluster]# head zips.json 
{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" }
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }
{ "_id" : "01005", "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA" }
{ "_id" : "01007", "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA" }
{ "_id" : "01008", "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA" }
{ "_id" : "01010", "city" : "BRIMFIELD", "loc" : [ -72.188455, 42.116543 ], "pop" : 3706, "state" : "MA" }
{ "_id" : "01011", "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA" }
{ "_id" : "01012", "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA" }
{ "_id" : "01013", "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA" }
{ "_id" : "01020", "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA" }

使用mongoimport将数据导入mongodb数据库

[root@localhost cluster]# mongoimport -d test -c "zipcodes" --file zips.json -h 192.168.199.219:27020
2016-01-16T18:31:29.424+0800	connected to: 192.168.199.219:27020
2016-01-16T18:31:32.420+0800	[################........] test.zipcodes	2.1 MB/3.0 MB (68.5%)
2016-01-16T18:31:34.471+0800	[########################] test.zipcodes	3.0 MB/3.0 MB (100.0%)
2016-01-16T18:31:34.471+0800	imported 29353 documents

一、单一目的的聚合操作

求count，distinct等简单操作

实例1.1：求zipcodes集合的文档数

db.zipcodes.count()

实例1.2 求MA州的文档总数

db.zipcodes.count({state:"MA"})

实例1.3 求zipcodes中有哪些州

db.zipcodes.distinct("state")

二、使用aggregate函数，进行更复杂的聚合操作

实例2.1：统计每个州的人口总数

db.zipcodes.aggregate(
   [
     { $group: { _id: "$state", total: { $sum: "$pop" } } }
   ]
)

使用集合的aggregate方法，进行聚合查询。

$group关键字后面指定分组的字段(引用字段时，一定要用$前缀)，以及聚合函数。

_id:是关键字，代表返回结果集的主键。

该查询等价的SQL为

select state as _id,sum(pop) as total
  from zipcodes
 group by state

实例2.2：统计每个州每个城市的人口总数

db.zipcodes.aggregate(
   [
     { $group: { _id: {state:"$state",city:"$city"}, pop: { $sum: "$pop" } } },
   ]
)

分组的字段如果多于一个，那么每个字段都要给定一个别名,如 state:"$state"

实例2.3：统计每个州人口多于10000的城市的人口总和

db.zipcodes.aggregate(
   [
     { $match: {"pop":{$gt: 10000} }},
     { $group: { _id: {state:"$state"}, pop: { $sum: "$pop" } } },
   ]
)

$match 关键字后面跟上集合的过滤条件。该语句等价于如下SQL

select state,sum(pop) as pop
  from zipcodes
 where pop>10000
 group by state

实例2.4：查询人口总数超过1千万的州

db.zipcodes.aggregate(
   [
     { $group: { _id: {state:"$state"}, pop: { $sum: "$pop" } } },
     { $match: {"pop":{$gt: 1000*10000} }}
   ]
)

将$match放在$group后面，相当于是先执行group操作，再对结果集进行过滤。等价的sql如下

select state,sum(pop) as pop
  from zipcodes
 group by state
 having sum(pop)>1000*10000

实例5：求每个州城市的平均人口

db.zipcodes.aggregate(
   [
     { $group: { _id: {state:"$state",city:"$city"}, pop: { $sum: "$pop" } } },
     { $group: {_id:"$_id.state",avgPop:{$avg: "$pop"}}}
   ]
)

我们的aggregate函数支持多次迭代，该语句的等价sql为

select state,avg(pop) as avgPop
  from
  (select state,city,sum(pop) pop
     from zipcodes
 group by state,city)
 group by state

实例2.5 ：求每个州人口最多及最少的城市名及对应的人口数量

db.zipcodes.aggregate(
   [
     { $group: { _id: {state:"$state",city:"$city"}, cityPop: { $sum: "$pop" } } },
     { $sort: { cityPop: 1 } },
     { $group: {
         _id:"$_id.state",
         biggestCity:{$last:"$_id.city"},
         biggestPop:{$last:"$cityPop"},
         smallestCity:{$first:"$_id.city"},
         smallestPop:{$first:"$cityPop"}    
     }}
   ]
)

第一个$group求出按state，city分组的人口数。

$sort操作按照人口数排序

第二个$group 按照state分组，此时每个state分组的数据已经安装cityPop排序。每个组的第一行数据（$first 取得）是人口最少的city，最后一行（$last 取得）是人口最多的city。

实例2.6 利用$project重新格式化结果

db.zipcodes.aggregate(
   [
     { $group: { _id: {state:"$state",city:"$city"}, cityPop: { $sum: "$pop" } } },
     { $sort: { cityPop: 1 } },
     {
         $group: {
         _id:"$_id.state",
         biggestCity:{$last:"$_id.city"},
         biggestPop:{$last:"$cityPop"},
         smallestCity:{$first:"$_id.city"},
         smallestPop:{$first:"$cityPop"}    
        }
     }, 
     {
         $project: {
             _id:0,
             state: "$_id", 
             biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
             smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
         }
     }
   ]
)

实例2.7 对数组中的内容做聚合统计

我们假设有一个学生选课的集合，数据示例如下

db.course.insert({name:"张三",age:10,grade:"四年级",course:["数学","英语","政治"]})

db.course.insert({name:"李四",age:9,grade:"三年级",course:["数学","语文","自然"]})

db.course.insert({name:"王五",age:11,grade:"四年级",course:["数学","英语","语文"]})

db.course.insert({name:"赵六",age:9,grade:"四年级",course:["数学","历史","政治"]})

求每门课程有多少人选修

db.course.aggregate(
   [
     { $unwind: "$course" },
     { $group: { _id: "$course", sum: { $sum: 1 } } },
     { $sort: { sum: -1 } }
   ]
)

$unwind，用来将数组中的内容拆包，然后再按照拆包后的数据进行分组，另外aggregate中没有$count关键字，使用$sum:1 来计算count 。

本文出自 “叮咚” 博客，请务必保留此出处http://lqding.blog.51cto.com/9123978/1735674

MongoDB 用实例学习聚合操作

原文：http://lqding.blog.51cto.com/9123978/1735674

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年09月23日 (328)
2021年09月24日 (313)
2021年09月17日 (191)
2021年09月15日 (369)
2021年09月16日 (411)
2021年09月13日 (439)
2021年09月11日 (398)
2021年09月12日 (393)
2021年09月10日 (160)
2021年09月08日 (222)