Aggregation Framework
Senior Solutions Architect, MongoDB
Norberto Leite
#mongodbdays @nleite #aggfwk
Agenda
• What is the Aggregation Framework?
• The Aggregation Pipeline
• Usage and Limitations
• Aggregation and Sharding
• Summary
What is the Aggregation
Framework?
Aggregation Framework
Aggregation in a Nutshell
• We're storing our data in MongoDB
• Our applications need to run ad-hoc queries for grouping, summarizations, reporting, etc.
• We must have a way to reshape data easily to support these access patterns
• You can use the Aggregation Framework for this!
MapReduce is great, but…
• Extremely versatile, powerful
• Overkill for simple aggregation tasks
– Averages
– Summation
– Grouping
– Reshaping
• High level of complexity
• Difficult to program and debug
Aggregation Framework
• Plays nice with sharding
• Executes in native code
– Written in C++
– JSON parameters
• Flexible, functional, and simple
– Operation pipeline
– Computational expressions
Aggregation Pipeline
Pipeline
What is an Aggregation Pipeline?
• A Series of Document Transformations
– Executed in stages
– Original input is a collection
– Output as a cursor or a collection
• Rich Library of Functions
– Filter, compute, group, and summarize data
– Output of one stage sent to input of next
– Operations executed in sequential order
$match → $project → $group → $sort
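To make "output of one stage sent to input of next" concrete, here is a minimal plain-JavaScript sketch. It does not talk to MongoDB: `runPipeline` and the three stage functions are hypothetical names that only mimic the flow on an in-memory array, and the sample documents are borrowed from the later slides.

```javascript
// Hypothetical helper: feed the output of each stage into the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Stand-ins for $match, $project and $sort, written as plain array functions.
const matchLong = (docs) => docs.filter((d) => d.words > 95);
const projectSubject = (docs) =>
  docs.map((d) => ({ subject: d.subject, words: d.words }));
const sortByWords = (docs) => [...docs].sort((a, b) => a.words - b.words);

const mails = [
  { subject: "Hello There", words: 218, from: "norberto@mongodb.com" },
  { subject: "I love Hofbrauhaus", words: 90, from: "norberto@mongodb.com" },
  { subject: "MongoDB Rules!", words: 100, from: "hipster@somemail.com" },
];

// Stages run strictly in the order given.
const pipelineResult = runPipeline(mails, [matchLong, projectSubject, sortByWords]);
console.log(pipelineResult);
```

Reordering the stage array changes the result, which is the main reason stage order matters when designing a real pipeline.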
Pipeline Operators
• $sort
• Order documents
• $limit / $skip
• Paginate documents
• $redact
• Restrict documents
• $geoNear
• Proximity sort documents
• $let, $map
• Define variables
• $match
• Filter documents
• $project
• Reshape documents
• $group
• Summarize documents
• $unwind
• Expand documents
{
"_id" : ObjectId("54523d2d25784427c6fabce1"),
"From" : "norberto@mongodb.com",
"To" : "mongodb-user@googlegroups.com",
"Date" : ISODate("2012-08-15T22:32:34Z"),
"body" : {
"text/plain" : "Hello Munich, nice to see y'all!"
},
"Subject" : "Live From MongoDB World"
}
Our Example Data
$match
• Filter documents
– Uses existing query syntax
– No $where (server-side JavaScript)
Matching Field Values
{
subject: "Hello There",
words: 218,
from: "norberto@mongodb.com"
}
{ $match: {
from: "hipster@somemail.com"
}}
{
subject: "I love Hofbrauhaus",
words: 90,
from: "norberto@mongodb.com"
}
{
subject: "MongoDB Rules!",
words: 100,
from: "hipster@somemail.com"
}
{
subject: "MongoDB Rules!",
words: 100,
from: "hipster@somemail.com"
}
Matching with Query Operators
{
subject: "Hello There",
words: 218,
from: "norberto@mongodb.com"
}
{ $match: {
words: {$gt: 100}
}}
{
subject: "I love Hofbrauhaus",
words: 90,
from: "norberto@mongodb.com"
}
{
subject: "MongoDB Rules!",
words: 100,
from: "hipster@somemail.com"
}
{
subject: "Hello There",
words: 218,
from: "norberto@mongodb.com"
}
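The two $match examples above can be reproduced on an in-memory array. This is only a sketch of the filtering behaviour: `matches` is a hypothetical helper that understands plain equality and `$gt`, not MongoDB's query engine.

```javascript
// Hypothetical, simplified matcher: supports equality and the $gt operator only.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return doc[field] > cond.$gt;
    }
    return doc[field] === cond;
  });
}

const inbox = [
  { subject: "Hello There", words: 218, from: "norberto@mongodb.com" },
  { subject: "I love Hofbrauhaus", words: 90, from: "norberto@mongodb.com" },
  { subject: "MongoDB Rules!", words: 100, from: "hipster@somemail.com" },
];

// { $match: { from: "hipster@somemail.com" } }
const byFrom = inbox.filter((d) => matches(d, { from: "hipster@somemail.com" }));
// { $match: { words: { $gt: 100 } } }
const byWords = inbox.filter((d) => matches(d, { words: { $gt: 100 } }));
console.log(byFrom, byWords);
```

Note that `$gt: 100` is strict, so the message with exactly 100 words is not part of the second result.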
$project
• Reshape Documents
– Include, exclude or rename fields
– Inject computed fields
– Create sub-document fields
Including and Excluding Fields
{
_id: 12345,
subject: "Hello There",
words: 218,
from: "norberto@mongodb.com",
to: [ "marc@mongodb.com",
"sam@mongodb.com" ],
account: "mongodb mail",
date: ISODate("2012-08-05"),
replies: 3,
folder: "Inbox",
...
}
{ $project: {
_id: 0,
subject: 1,
from: 1
}}
{
subject: "Hello There",
from:"norberto@mongodb.com"
}
Renaming and Computing Fields
{ $project: {
spamIndex: {
$divide: ["$words",
"$replies"]
},
user: "$from"
}}
{
_id: 12345,
spamIndex: 72.6666 ,
user: "norberto@mongodb.com"
}
{
_id: 12345,
subject: "Hello There",
words: 218,
from: "norberto@mongodb.com",
to: [ "marc@mongodb.com",
"sam@mongodb.com" ],
account: "mongodb mail",
date: ISODate("2012-08-05"),
replies: 3,
folder: "Inbox",
...
}
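A plain-JavaScript rendering of what this $project stage does to the sample document may help. It rests on one assumption: the `spamIndex` of 72.6666 shown in the output is `words` divided by `replies` (218 / 3), which is what `$divide` would compute. The variable names are illustrative only.

```javascript
// Plain-JavaScript stand-in for the $project stage (not MongoDB code).
const mail = {
  _id: 12345,
  subject: "Hello There",
  words: 218,
  from: "norberto@mongodb.com",
  replies: 3,
};

// _id is carried over unless explicitly excluded; other fields appear only if listed.
const projected = {
  _id: mail._id,
  spamIndex: mail.words / mail.replies, // assumption: division, since 218 / 3 ≈ 72.6666
  user: mail.from, // "from" renamed to "user"
};
console.log(projected);
```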
Creating Sub-Document Fields
{ $project: {
subject: 1,
stats: {
replies: "$replies",
from: "$from",
date: "$date"
}}}
{
_id: 12345,
subject: "Hello There",
stats: {
replies: 3,
from: "norberto@mongodb.com",
date: ISODate("2012-08-05")
}}
{
_id: 12345,
subject: "Hello There",
words: 218,
from: "norberto@mongodb.com",
to: [ "marc@mongodb.com",
"sam@mongodb.com" ],
account: "mongodb mail",
date: ISODate("2012-08-05"),
replies: 3,
folder: "Inbox",
...
}
$group
• Group documents by value
– Field reference, object, constant
– Other output fields are computed
• $max, $min, $avg, $sum
• $addToSet, $push
• $first, $last
– Processes all data in memory by default
Calculating An Average
{ $group: {
_id: "$from",
avgWords: { $avg:
"$words" }
}}
{
_id: "norberto@mongodb.com",
avgWords: 154
}
{
_id: "hipster@somemail.com",
avgWords: 100
}
{
subject: "Hello There",
words: 218,
from: "norberto@mongodb.com"
}
{
subject: "I love Hofbrauhaus",
words: 90,
from: "norberto@mongodb.com"
}
{
subject: "MongoDB Rules!",
words: 100,
from: "hipster@somemail.com"
}
Summing Fields and Counting
{ $group: {
_id: "$from",
words: { $sum: "$words" },
mails: { $sum: 1 }
}}
{
_id: "norberto@mongodb.com",
words: 308,
mails: 2
}
{
_id: "hipster@somemail.com",
words: 100,
mails: 1
}
{
subject: "Hello There",
words: 218,
from: "norberto@mongodb.com"
}
{
subject: "I love Hofbrauhaus",
words: 90,
from: "norberto@mongodb.com"
}
{
subject: "MongoDB Rules!",
words: 100,
from: "hipster@somemail.com"
}
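The grouping examples can be checked by hand with a small in-memory sketch. `groupByFrom` is a hypothetical function that mirrors `_id: "$from"` with `$sum` and `$avg` accumulators; it is not how the server implements $group.

```javascript
// Hypothetical in-memory version of grouping by "from" with $sum and $avg.
function groupByFrom(docs) {
  const groups = new Map();
  for (const d of docs) {
    const g = groups.get(d.from) || { _id: d.from, words: 0, mails: 0 };
    g.words += d.words; // { $sum: "$words" }
    g.mails += 1; // { $sum: 1 } counts documents
    groups.set(d.from, g);
  }
  // avgWords mirrors { $avg: "$words" }: total divided by count.
  return [...groups.values()].map((g) => ({ ...g, avgWords: g.words / g.mails }));
}

const grouped = groupByFrom([
  { subject: "Hello There", words: 218, from: "norberto@mongodb.com" },
  { subject: "I love Hofbrauhaus", words: 90, from: "norberto@mongodb.com" },
  { subject: "MongoDB Rules!", words: 100, from: "hipster@somemail.com" },
]);
console.log(grouped);
```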
$unwind
• Operate on an array field
– Create documents from array elements
• Array replaced by element value
• Missing/empty fields → no output
• Non-array fields → error
– Pipe to $group to aggregate
Collecting Distinct Values
{ subject: "2.8 will be great!",
to: "marc@mongodb.com",
account : "mongodb mail" }
{ $unwind: "$to" }
{
_id: 2222,
subject: "2.8 will be great!",
to: [ "marc@mongodb.com",
"eliot@mongodb.com",
"asya@mongodb.com",
],
account: "mongodb mail"
}
{ subject: "2.8 will be great!",
to: "eliot@mongodb.com",
account : "mongodb mail" }
{ subject: "2.8 will be great!",
to: "asya@mongodb.com",
account : "mongodb mail" }
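The $unwind rules listed above (one document per array element, no output for missing or empty arrays, an error for non-array values) can be sketched in plain JavaScript. `unwind` is a hypothetical helper that follows the behaviour as described on the slide, not the server's code.

```javascript
// Hypothetical stand-in for { $unwind: "$to" } on an in-memory array.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Missing field or empty array: the document produces no output.
    if (value === undefined || (Array.isArray(value) && value.length === 0)) {
      return [];
    }
    // The slide describes a non-array field as an error.
    if (!Array.isArray(value)) {
      throw new Error(`${field} is not an array`);
    }
    // One output document per element, with the array replaced by that element.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

const unwound = unwind(
  [
    {
      _id: 2222,
      subject: "2.8 will be great!",
      to: ["marc@mongodb.com", "eliot@mongodb.com", "asya@mongodb.com"],
      account: "mongodb mail",
    },
  ],
  "to"
);

// Collecting distinct values afterwards, as a $group with $addToSet might do.
const distinctRecipients = [...new Set(unwound.map((d) => d.to))];
console.log(unwound, distinctRecipients);
```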
$sort, $limit, $skip
• Sort documents by one or more fields
– Same order syntax as cursors
– Waits for earlier pipeline operator to return
– In-memory unless early and indexed
• Limit and skip follow cursor behavior
$redact
• Restrict access to Documents
– Use document fields to define privileges
– Apply conditional queries to validate users
• Field Level Access Control
– $$DESCEND, $$PRUNE, $$KEEP
– Applies to root and subdocument fields
{
_id: 375,
item: "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
Manufacturer: "Sony",
security: 0,
quantity: 12,
list: 4999,
pricing: {
security: 1,
sale: 2698,
wholesale: {
security: 2,
amount: 2300 }
}
}
$redact Example Data
Query by Security Level
security = 0
db.catalog.aggregate([
{
$match: {item: /^.*XBR55X900A*/}
},
{
$redact: {
$cond: {
if: { $lte: [ "$security", ?? ] },
then: "$$DESCEND",
else: "$$PRUNE"
}
}
}])
{
"_id" : 375,
"item" : "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
"Manufacturer" : "Sony",
"security" : 0,
"quantity" : 12,
"list" : 4999
}
{
"_id" : 375,
"item" : "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
"Manufacturer" : "Sony",
"security" : 0,
"quantity" : 12,
"list" : 4999,
"pricing" : {
"security" : 1,
"sale" : 2698,
"wholesale" : {
"security" : 2,
"amount" : 2300
}
}
}
security = 2
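The difference between the two outputs follows from how $$DESCEND and $$PRUNE are applied level by level. The sketch below emulates that in plain JavaScript for the `$cond` used in the query, with the `??` placeholder passed in as `level`. `redact` is a hypothetical function, and it deliberately ignores arrays of subdocuments.

```javascript
// Hypothetical emulation of $redact with the $cond from the query:
// keep a level when its "security" value is <= the caller's level ($$DESCEND),
// otherwise drop that level entirely ($$PRUNE).
function redact(doc, level) {
  if (doc.security > level) return undefined; // $$PRUNE
  const kept = {};
  for (const [key, value] of Object.entries(doc)) {
    const isSubdocument =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (!isSubdocument) {
      kept[key] = value; // scalar fields at this level are kept
      continue;
    }
    const child = redact(value, level); // $$DESCEND: re-check embedded documents
    if (child !== undefined) kept[key] = child;
  }
  return kept;
}

const tv = {
  _id: 375,
  item: "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
  Manufacturer: "Sony",
  security: 0,
  quantity: 12,
  list: 4999,
  pricing: { security: 1, sale: 2698, wholesale: { security: 2, amount: 2300 } },
};

console.log(redact(tv, 0)); // no pricing subdocument
console.log(redact(tv, 2)); // full document
```

With `level` set to 1, the same function keeps `pricing.sale` but drops `pricing.wholesale`, which shows the check being repeated at every nesting level.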
$geoNear
• Order/Filter Documents by Location
– Requires a geospatial index
– Output includes physical distance
– Must be first aggregation stage
{
"_id" : "35089",
"city" : "KELLYTON",
"loc" : [
-86.048397,
32.979068
],
"pop" : 1584,
"state" : "AL"
}
$geoNear Example Data
Query by Proximity
db.catalog.aggregate([
{
$geoNear : {
near: [ -86.000, 33.000 ],
distanceField: "dist",
maxDistance: .050,
spherical: true,
num: 3
}
}])
{
"_id" : "35089",
"city" : "KELLYTON",
"loc" : [ -86.048397, 32.979068 ],
"pop" : 1584,
"state" : "AL",
"dist" : 0.0007971432165364155
},
{
"_id" : "35010",
"city" : "NEW SITE",
"loc" : [ -85.951086, 32.941445 ],
"pop" : 19942,
"state" : "AL",
"dist" : 0.0012479615347306806
},
{
"_id" : "35072",
"city" : "GOODWATER",
"loc" : [ -86.078149, 33.074642 ],
"pop" : 3813,
"state" : "AL",
"dist" : 0.0017333719627032555
}
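The `dist` values in this output are consistent with distances in radians, which is what one would expect for legacy coordinate pairs with `spherical: true` (treat this as an assumption to check against the MongoDB documentation). The sketch below recomputes the first distance with the haversine formula and converts it to kilometres. `distanceRadians` and the Earth radius constant are illustrative choices, not part of the deck.

```javascript
// Great-circle distance in radians between two [longitude, latitude] pairs
// (haversine formula).
function distanceRadians([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * Math.asin(Math.sqrt(a));
}

const EARTH_RADIUS_KM = 6371; // assumed mean Earth radius

// Query point from the $geoNear stage and the location of KELLYTON.
const dist = distanceRadians([-86.0, 33.0], [-86.048397, 32.979068]);
const distKm = dist * EARTH_RADIUS_KM;
console.log(dist, distKm);
```

Read this way, the `maxDistance: .050` in the query would also be a value in radians rather than kilometres.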
Usage and Limitations
Usage
• collection.aggregate([…], {<options>})
– Returns a cursor
– Takes an optional document to specify aggregation options
• allowDiskUse, explain
– Use $out to send results to a Collection
• db.runCommand({aggregate:<collection>, pipeline:[…]})
– Returns a document, limited to 16 MB
Collection
db.books.aggregate([
{ $project: { language: 1 }},
{ $group: { _id: "$language", numTitles: { $sum: 1 }}}
])
{ _id: "Russian", numTitles: 1 },
{ _id: "English", numTitles: 2 }
Database Command
db.runCommand({
aggregate: "books",
pipeline: [
{ $project: { language: 1 }},
{ $group: { _id: "$language", numTitles: { $sum: 1 }}}
]
})
{
result : [
{ _id: "Russian", numTitles: 1 },
{ _id: "English", numTitles: 2 }
],
"ok" : 1
}
Limitations
• Pipeline operator memory limits
– Stages limited to 100 MB
– Use “allowDiskUse” option to use disk for larger data sets
• Some BSON types unsupported
– Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope
Aggregation and Sharding
Sharding
[Diagram: mongos returns the result. Shard 1 (primary), Shard 2 and Shard 4 each run $match, $project, $group; Shard 3 is excluded.]
• Workload split between shards
– Shards execute pipeline up to a point
– Primary shard merges cursors and continues processing*
– Use explain to analyze pipeline split
– Early $match may exclude shards
– Potential CPU and memory implications for primary shard host
* Prior to v2.6, second-stage pipeline processing was done by mongos
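Why a merge step is needed at all can be illustrated with an average computed across two shards. The sketch below is an assumption-laden illustration, not the server's algorithm: `partialGroup` stands for the part each shard runs, `mergePartials` for the part run where the cursors are merged, and the way the sample documents are spread over the shards is made up. How MongoDB actually splits a given pipeline is best checked with explain.

```javascript
// Hypothetical shard-side step: partial sums and counts per sender.
function partialGroup(docs) {
  const partial = {};
  for (const d of docs) {
    if (!partial[d.from]) partial[d.from] = { words: 0, mails: 0 };
    partial[d.from].words += d.words;
    partial[d.from].mails += 1;
  }
  return partial;
}

// Hypothetical merge step: combine the partial results, then finish the average.
function mergePartials(partials) {
  const totals = {};
  for (const partial of partials) {
    for (const [from, p] of Object.entries(partial)) {
      if (!totals[from]) totals[from] = { words: 0, mails: 0 };
      totals[from].words += p.words;
      totals[from].mails += p.mails;
    }
  }
  return Object.entries(totals).map(([from, t]) => ({
    _id: from,
    avgWords: t.words / t.mails,
  }));
}

// Made-up distribution of the sample mails over two shards.
const shard1 = [{ from: "norberto@mongodb.com", words: 218 }];
const shard2 = [
  { from: "norberto@mongodb.com", words: 90 },
  { from: "hipster@somemail.com", words: 100 },
];

const merged = mergePartials([partialGroup(shard1), partialGroup(shard2)]);
console.log(merged);
```

Carrying sums and counts rather than per-shard averages keeps the merged result exact, since averaging averages would weight shards incorrectly whenever their document counts differ.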
Summary
Framework Use Cases
• Basic aggregation queries
• Ad-hoc reporting
• Real-time analytics
• Visualizing and reshaping data
Extending the Framework
• Adding new pipeline operators, expressions
• $out and $tee for output control
– https://jira.mongodb.org/browse/SERVER-3253
Future Enhancements
• Automatically move $match earlier if possible
• Pipeline explain facility
• Memory usage improvements
– Grouping input sorted by _id
– Sorting with limited output
Enabling Developers
• Doing more within MongoDB, faster
• Refactoring MapReduce and groupings
– Replace pages of JavaScript
– Longer aggregation pipelines
• Quick aggregations from the shell
Thank you!
SA | Eng – norberto@mongodb.com
Norberto Leite
#mongodbdays #aggfwk #devs @mongodb

The Aggregation Framework

  • 1. Aggregation Framework Senior Solutions Architect, MongoDB Norberto Leite #mongodbdays @nleite #aggfwk
  • 2. Agenda • What is theAggregation Framework? • The Aggregation Pipeline • Usage and Limitations • Aggregation and Sharding • Summary
  • 3. What is the Aggregation Framework?
  • 5. Aggregation in Nutshell • We're storing our data in MongoDB • Our applications need to run ad-hoc queries for grouping, summarizations, reporting, etc. • We must have a way to reshape data easily to support these access patterns • You can useAggregation Framework for this!
  • 6. • Extremely versatile, powerful • Overkill for simple aggregation tasks • Averages • Summation • Grouping • Reshaping MapReduce is great, but… • High level of complexity • Difficult to program and debug
  • 7. Aggregation Framework • Plays nice with sharding • Executes in native code – Written in C++ – JSON parameters • Flexible, functional, and simple – Operation pipeline – Computational expressions
  • 10. What is an Aggregation Pipeline? • ASeries of Document Transformations – Executed in stages – Original input is a collection – Output as a cursor or a collection • Rich Library of Functions – Filter, compute, group, and summarize data – Output of one stage sent to input of next – Operations executed in sequential order $match $project $group $sort
  • 11. Pipeline Operators • $sort • Order documents • $limit / $skip • Paginate documents • $redact • Restrict documents • $geoNear • Proximity sort documents • $let, $map • Define variables • $match • Filter documents • $project • Reshape documents • $group • Summarize documents • $unwind • Expand documents
  • 12. Our Example Data { "_id" : ObjectId("54523d2d25784427c6fabce1"), "From" : "norberto@mongodb.com", "To" : "mongodb-user@googlegroups.com", "Date" : ISODate("2012-08-15T22:32:34Z"), "body" : { "text/plain" : "Hello Munich, nice to see yalll!" }, "Subject" : "Live From MongoDB World" }
  • 13. $match • Filter documents – Uses existing query syntax – No $where (server-side JavaScript)
  • 14. Matching Field Values • Input: { subject: "Hello There", words: 218, from: "[email protected]" } { subject: "I love Hofbrauhaus", words: 90, from: "[email protected]" } { subject: "MongoDB Rules!", words: 100, from: "[email protected]" } • Stage: { $match: { from: "[email protected]" }} • Output: { subject: "MongoDB Rules!", words: 100, from: "[email protected]" }
  • 15. Matching with Query Operators • Input: { subject: "Hello There", words: 218, from: "[email protected]" } { subject: "I love Hofbrauhaus", words: 90, from: "[email protected]" } { subject: "MongoDB Rules!", words: 100, from: "[email protected]" } • Stage: { $match: { words: {$gt: 100} }} • Output: { subject: "Hello There", words: 218, from: "[email protected]" }
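To make the two $match slides concrete, here is a small in-memory imitation of the filter in plain JavaScript. It is a sketch under stated assumptions: it handles only equality and the four comparison operators listed in the code, `matchesQuery` is a hypothetical helper, and the `example.com` addresses are stand-ins for the slide's senders.

```javascript
// Illustration only: a tiny subset of query matching (equality, $gt, $gte, $lt, $lte).
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// Hypothetical helper: true when every field condition in `query` holds for `doc`.
function matchesQuery(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { words: { $gt: 100 } }
      return Object.entries(condition).every(([op, operand]) => {
        if (!(op in comparators)) throw new Error("unsupported operator: " + op);
        return comparators[op](doc[field], operand);
      });
    }
    // Plain value form, e.g. { from: "bob@example.com" }
    return doc[field] === condition;
  });
}

const inbox = [
  { subject: "Hello There", words: 218, from: "alice@example.com" },
  { subject: "I love Hofbrauhaus", words: 90, from: "alice@example.com" },
  { subject: "MongoDB Rules!", words: 100, from: "bob@example.com" },
];

const fromBob = inbox.filter((doc) => matchesQuery(doc, { from: "bob@example.com" }));
const longMails = inbox.filter((doc) => matchesQuery(doc, { words: { $gt: 100 } }));
console.log(fromBob, longMails);
```

Note that the 100-word mail is not part of `longMails`, because `$gt` is a strict comparison.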
  • 16. $project • Reshape Documents – Include, exclude or rename fields – Inject computed fields – Create sub-document fields
  • 17. Including and Excluding Fields • Input: { _id: 12345, subject: "Hello There", words: 218, from: "[email protected]", to: [ "[email protected]", "[email protected]" ], account: "mongodb mail", date: ISODate("2012-08-05"), replies: 3, folder: "Inbox", ... } • Stage: { $project: { _id: 0, subject: 1, from: 1 }} • Output: { subject: "Hello There", from: "[email protected]" }
  • 19. Renaming and Computing Fields • Input: { _id: 12345, subject: "Hello There", words: 218, from: "[email protected]", to: [ "[email protected]", "[email protected]" ], account: "mongodb mail", date: ISODate("2012-08-05"), replies: 3, folder: "Inbox", ... } • Stage: { $project: { spamIndex: { $divide: ["$words", "$replies"] }, user: "$from" }} • Output: { _id: 12345, spamIndex: 72.6666, user: "[email protected]" }
  • 20. Creating Sub-Document Fields • Input: { _id: 12345, subject: "Hello There", words: 218, from: "[email protected]", to: [ "[email protected]", "[email protected]" ], account: "mongodb mail", date: ISODate("2012-08-05"), replies: 3, folder: "Inbox", ... } • Stage: { $project: { subject: 1, stats: { replies: "$replies", from: "$from", date: "$date" }}} • Output: { _id: 12345, subject: "Hello There", stats: { replies: 3, from: "[email protected]", date: ISODate("2012-08-05") }}
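The reshaping done by $project can be mimicked with ordinary object construction. The sketch below is plain JavaScript rather than a server-side stage; it assumes the spam index is words divided by replies (218 / 3, which is what the 72.6666 in the slide's output corresponds to), and `projectComputed`, `projectNested`, and the `example.com` address are hypothetical.

```javascript
// Sample document modeled on the slides; the address is a stand-in.
const mail = {
  _id: 12345,
  subject: "Hello There",
  words: 218,
  replies: 3,
  from: "alice@example.com",
  date: new Date("2012-08-05"),
  folder: "Inbox",
};

// Rename a field and inject a computed one.
// _id is carried over because a projection keeps it unless it is excluded.
function projectComputed(doc) {
  return {
    _id: doc._id,
    spamIndex: doc.words / doc.replies, // computed field
    user: doc.from,                     // "from" renamed to "user"
  };
}

// Gather several top-level fields into a new sub-document.
function projectNested(doc) {
  return {
    _id: doc._id,
    subject: doc.subject,
    stats: { replies: doc.replies, from: doc.from, date: doc.date },
  };
}

const computed = projectComputed(mail);
const nested = projectNested(mail);
console.log(computed, nested);
```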
  • 21. $group • Group documents by value – Field reference, object, constant – Other output fields are computed • $max, $min, $avg, $sum • $addToSet, $push • $first, $last – Processes all data in memory by default
  • 22. Calculating an Average • Input: { subject: "Hello There", words: 218, from: "[email protected]" } { subject: "I love Hofbrauhaus", words: 90, from: "[email protected]" } { subject: "MongoDB Rules!", words: 100, from: "[email protected]" } • Stage: { $group: { _id: "$from", avgWords: { $avg: "$words" } }} • Output: { _id: "[email protected]", avgWords: 154 } { _id: "[email protected]", avgWords: 100 }
  • 23. Summing Fields and Counting • Input: { subject: "Hello There", words: 218, from: "[email protected]" } { subject: "I love Hofbrauhaus", words: 90, from: "[email protected]" } { subject: "MongoDB Rules!", words: 100, from: "[email protected]" } • Stage: { $group: { _id: "$from", words: { $sum: "$words" }, mails: { $sum: 1 } }} • Output: { _id: "[email protected]", words: 308, mails: 2 } { _id: "[email protected]", words: 100, mails: 1 }
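The two $group slides compute a sum, a count, and an average per sender. A minimal plain JavaScript sketch of that grouping follows; `groupBySender` and the `example.com` addresses are hypothetical, and the code only imitates the result shape rather than the server's accumulator machinery.

```javascript
// Illustration only: group documents by "from" and accumulate per group.
function groupBySender(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Start a new group the first time a sender is seen.
    const group = groups.get(doc.from) || { _id: doc.from, words: 0, mails: 0 };
    group.words += doc.words; // like { $sum: "$words" }
    group.mails += 1;         // like { $sum: 1 }
    groups.set(doc.from, group);
  }
  // The average is derived from the two running totals, like { $avg: "$words" }.
  return [...groups.values()].map((g) => ({ ...g, avgWords: g.words / g.mails }));
}

const groupedMails = groupBySender([
  { subject: "Hello There", words: 218, from: "alice@example.com" },
  { subject: "I love Hofbrauhaus", words: 90, from: "alice@example.com" },
  { subject: "MongoDB Rules!", words: 100, from: "bob@example.com" },
]);
console.log(groupedMails);
```

With this sample data the first sender ends up with 308 words over 2 mails (average 154) and the second with 100 words over 1 mail, matching the totals on the slides.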
  • 24. $unwind • Operate on an array field – Create documents from array elements • Array replaced by element value • Missing/empty fields → no output • Non-array fields → error – Pipe to $group to aggregate
  • 25. Collecting Distinct Values • Input: { _id: 2222, subject: "2.8 will be great!", to: [ "[email protected]", "[email protected]", "[email protected]" ], account: "mongodb mail" } • Stage: { $unwind: "$to" } • Output: { subject: "2.8 will be great!", to: "[email protected]", account: "mongodb mail" } { subject: "2.8 will be great!", to: "[email protected]", account: "mongodb mail" } { subject: "2.8 will be great!", to: "[email protected]", account: "mongodb mail" }
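The $unwind rules listed on the slides (one output document per array element, no output for missing or empty arrays, an error for non-array values) can be sketched in plain JavaScript. `unwindField` is a hypothetical helper, the recipient addresses are stand-ins, and the final step uses a `Set` to imitate piping the unwound documents into a grouping that collects distinct values.

```javascript
// Illustration only: expand one array field into one document per element,
// following the rules as stated on the slide.
function unwindField(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return []; // missing field: no output
    if (!Array.isArray(value)) {
      throw new TypeError("field '" + field + "' is not an array");
    }
    // An empty array maps to zero documents; otherwise the array is
    // replaced by each element value in turn.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

const outbox = [
  {
    _id: 2222,
    subject: "2.8 will be great!",
    to: ["alice@example.com", "bob@example.com", "carol@example.com"],
    account: "mongodb mail",
  },
  { _id: 2223, subject: "Re: 2.8 will be great!", to: ["bob@example.com"], account: "mongodb mail" },
  { _id: 2224, subject: "Draft", to: [], account: "mongodb mail" },
];

const unwound = unwindField(outbox, "to");
// Collect the distinct recipients, similar to grouping with a set accumulator.
const distinctRecipients = [...new Set(unwound.map((doc) => doc.to))];
console.log(unwound.length, distinctRecipients);
```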
  • 26. $sort, $limit, $skip • Sort documents by one or more fields – Same order syntax as cursors – Waits for earlier pipeline operator to return – In-memory unless early and indexed • Limit and skip follow cursor behavior
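Sorting followed by skip and limit is the usual pagination pattern. The plain JavaScript sketch below imitates that sequence on an in-memory array; `pageOf` and the sample documents are hypothetical, and it says nothing about how the server uses indexes or memory for the sort.

```javascript
// Illustration only: sort, then skip, then limit, like the three stages in sequence.
function pageOf(docs, pageNumber, pageSize) {
  // Like { $sort: { words: -1 } }; copy first so the input is not reordered.
  const sorted = [...docs].sort((a, b) => b.words - a.words);
  // Like { $skip: pageNumber * pageSize }
  const skipped = sorted.slice(pageNumber * pageSize);
  // Like { $limit: pageSize }
  return skipped.slice(0, pageSize);
}

const mailsByLength = [
  { subject: "I love Hofbrauhaus", words: 90 },
  { subject: "Hello There", words: 218 },
  { subject: "Short note", words: 40 },
  { subject: "MongoDB Rules!", words: 100 },
];

const firstPage = pageOf(mailsByLength, 0, 2);
const secondPage = pageOf(mailsByLength, 1, 2);
console.log(firstPage, secondPage);
```

The sort has to see every incoming document before it can emit the first one, which is why the slide notes that it waits for the earlier pipeline operator to return.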
  • 27. $redact • Restrict access to Documents – Use document fields to define privileges – Apply conditional queries to validate users • Field-Level Access Control – $$DESCEND, $$PRUNE, $$KEEP – Applies to root and subdocument fields
  • 28. { _id: 375, item: "Sony XBR55X900A 55Inch 4K Ultra High Definition TV", Manufacturer: "Sony", security: 0, quantity: 12, list: 4999, pricing: { security: 1, sale: 2698, wholesale: { security: 2, amount: 2300 } } } $redact Example Data
  • 29. Query by Security Level db.catalog.aggregate([ { $match: {item: /^.*XBR55X900A*/} }, { $redact: { $cond: { if: { $lte: [ "$security", ?? ] }, then: "$$DESCEND", else: "$$PRUNE" } } }]) • Result with security = 0: { "_id" : 375, "item" : "Sony XBR55X900A 55Inch 4K Ultra High Definition TV", "Manufacturer" : "Sony", "security" : 0, "quantity" : 12, "list" : 4999 } • Result with security = 2: { "_id" : 375, "item" : "Sony XBR55X900A 55Inch 4K Ultra High Definition TV", "Manufacturer" : "Sony", "security" : 0, "quantity" : 12, "list" : 4999, "pricing" : { "security" : 1, "sale" : 2698, "wholesale" : { "security" : 2, "amount" : 2300 } } }
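The descend-or-prune decision in the $redact example can be traced with a short recursive function in plain JavaScript. This is a sketch under stated assumptions: `redactBySecurity` is a hypothetical helper, it treats a level with no `security` field as visible, and it copies arrays as-is instead of descending into them.

```javascript
// Illustration only: keep a (sub)document when its security value is at or
// below the caller's level, otherwise drop it together with everything inside.
function redactBySecurity(doc, level) {
  if (typeof doc.security === "number" && doc.security > level) {
    return undefined; // like $$PRUNE
  }
  // Like $$DESCEND: keep scalar fields, then apply the same test to sub-documents.
  const kept = {};
  for (const [key, value] of Object.entries(doc)) {
    const isSubDocument =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isSubDocument) {
      const child = redactBySecurity(value, level);
      if (child !== undefined) kept[key] = child;
    } else {
      kept[key] = value;
    }
  }
  return kept;
}

const catalogItem = {
  _id: 375,
  item: "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
  Manufacturer: "Sony",
  security: 0,
  quantity: 12,
  list: 4999,
  pricing: { security: 1, sale: 2698, wholesale: { security: 2, amount: 2300 } },
};

const publicView = redactBySecurity(catalogItem, 0);
const salesView = redactBySecurity(catalogItem, 1);
const fullView = redactBySecurity(catalogItem, 2);
console.log(publicView, salesView, fullView);
```

At level 0 the whole `pricing` sub-document disappears, at level 1 only `wholesale` is removed, and at level 2 the document is returned in full, which lines up with the two results shown on the slide.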
  • 30. $geoNear • Order/Filter Documents by Location – Requires a geospatial index – Output includes physical distance – Must be first aggregation stage
  • 31. $geoNear Example Data { "_id" : 35089, "city" : "KELLYTON", "loc" : [ -86.048397, 32.979068 ], "pop" : 1584, "state" : "AL" }
  • 32. Query by Proximity db.catalog.aggregate([ { $geoNear : { near: [ -86.000, 33.000 ], distanceField: "dist", maxDistance: .050, spherical: true, num: 3 } }]) { "_id" : "35089", "city" : "KELLYTON", "loc" : [ -86.048397, 32.979068 ], "pop" : 1584, "state" : "AL", "dist" : 0.0007971432165364155 }, { "_id" : "35010", "city" : "NEW SITE", "loc" : [ -85.951086, 32.941445 ], "pop" : 19942, "state" : "AL", "dist" : 0.0012479615347306806 }, { "_id" : "35072", "city" : "GOODWATER", "loc" : [ -86.078149, 33.074642 ], "pop" : 3813, "state" : "AL", "dist" : 0.0017333719627032555 }
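The `dist` values in the proximity result look like great-circle distances expressed in radians. The plain JavaScript sketch below computes that quantity with the haversine formula and then orders, filters, and limits the documents the way the stage's options suggest. It is an illustration rather than the server's algorithm: `sphericalDistanceRadians` and `nearestTo` are hypothetical helpers, and for the three sample cities the computed distances come out close to the values shown on the slide.

```javascript
// Points are [longitude, latitude] in degrees, as in the slide's "loc" field.
// Returns the central angle between the two points in radians (haversine formula).
function sphericalDistanceRadians([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * Math.asin(Math.sqrt(a));
}

// Illustration only: attach a distance field, drop far-away documents,
// order nearest first, and keep at most `num` results.
function nearestTo(docs, point, maxDistance, num) {
  return docs
    .map((doc) => ({ ...doc, dist: sphericalDistanceRadians(point, doc.loc) }))
    .filter((doc) => doc.dist <= maxDistance)
    .sort((a, b) => a.dist - b.dist)
    .slice(0, num);
}

const cities = [
  { _id: "35072", city: "GOODWATER", loc: [-86.078149, 33.074642] },
  { _id: "35089", city: "KELLYTON", loc: [-86.048397, 32.979068] },
  { _id: "35010", city: "NEW SITE", loc: [-85.951086, 32.941445] },
];

const nearby = nearestTo(cities, [-86.0, 33.0], 0.05, 3);
console.log(nearby.map((doc) => [doc.city, doc.dist]));
```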
  • 34. Usage • collection.aggregate([…], {<options>}) – Returns a cursor – Takes an optional document to specify aggregation options • allowDiskUse, explain – Use $out to send results to a Collection • db.runCommand({aggregate:<collection>, pipeline:[…]}) – Returns a document, limited to 16 MB
  • 35. Collection db.books.aggregate([ { $project: { language: 1 }}, { $group: { _id: "$language", numTitles: { $sum: 1 }}} ]) { _id: "Russian", numTitles: 1 }, { _id: "English", numTitles: 2 }
  • 36. Database Command db.runCommand({ aggregate: "books", pipeline: [ { $project: { language: 1 }}, { $group: { _id: "$language", numTitles: { $sum: 1 }}} ] }) { result : [ { _id: "Russian", numTitles: 1 }, { _id: "English", numTitles: 2 } ], "ok" : 1 }
  • 37. Limitations • Pipeline operator memory limits – Stages limited to 100 MB – Use “allowDiskUse” option to use disk for larger data sets • Some BSON types unsupported – Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope
  • 39. Sharding • Diagram: mongos returns the result; Shard 1 (Primary), Shard 2 and Shard 4 each run $match, $project, $group; Shard 3 is excluded • Workload split between shards – Shards execute pipeline up to a point – Primary shard merges cursors and continues processing* – Use explain to analyze pipeline split – Early $match may exclude shards – Potential CPU and memory implications for primary shard host * Prior to v2.6, second-stage pipeline processing was done by mongos
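One way to picture the split between shard-side work and the merge step is a grouped average: each shard can only produce partial totals, and the merging side combines them. The plain JavaScript sketch below illustrates that idea and is not a description of MongoDB internals; `shardPartialGroup`, `mergePartialGroups`, and the sample data are hypothetical.

```javascript
// Illustration only: the part of the work a single shard could do locally.
// It keeps a running sum and count per sender instead of a finished average.
function shardPartialGroup(docs) {
  const partial = new Map();
  for (const doc of docs) {
    const entry = partial.get(doc.from) || { _id: doc.from, sum: 0, count: 0 };
    entry.sum += doc.words;
    entry.count += 1;
    partial.set(doc.from, entry);
  }
  return [...partial.values()];
}

// Illustration only: the merge step combines partial totals from all shards
// and only then computes the average.
function mergePartialGroups(partialsPerShard) {
  const merged = new Map();
  for (const partials of partialsPerShard) {
    for (const { _id, sum, count } of partials) {
      const entry = merged.get(_id) || { sum: 0, count: 0 };
      entry.sum += sum;
      entry.count += count;
      merged.set(_id, entry);
    }
  }
  return [...merged.entries()].map(([id, t]) => ({ _id: id, avgWords: t.sum / t.count }));
}

const shard1Docs = [
  { words: 218, from: "alice@example.com" },
  { words: 100, from: "bob@example.com" },
];
const shard2Docs = [{ words: 90, from: "alice@example.com" }];

const mergedAverages = mergePartialGroups([
  shardPartialGroup(shard1Docs),
  shardPartialGroup(shard2Docs),
]);
console.log(mergedAverages);
```

Averaging the per-shard averages would give the wrong answer when shards hold different numbers of documents per key, which is why the sketch ships sums and counts to the merge step. The merging host also holds every group in memory, in line with the slide's note about CPU and memory implications.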
  • 41. Framework Use Cases • Basic aggregation queries • Ad-hoc reporting • Real-time analytics • Visualizing and reshaping data
  • 42. Extending the Framework • Adding new pipeline operators, expressions • $out and $tee for output control – https://jira.mongodb.org/browse/SERVER-3253
  • 43. Future Enhancements • Automatically move $match earlier if possible • Pipeline explain facility • Memory usage improvements – Grouping input sorted by _id – Sorting with limited output
  • 44. Enabling Developers • Doing more within MongoDB, faster • Refactoring MapReduce and groupings – Replace pages of JavaScript – Longer aggregation pipelines • Quick aggregations from the shell
  • 45. Obrigado! (Thank you!) SA | Eng [email protected] Norberto Leite #mongodbdays #aggfwk #devs @mongodb