Aggregation Framework
Senior Solutions Architect, MongoDB
Rick Houlihan
MongoDB World
Agenda
• What is the Aggregation Framework?
• The Aggregation Pipeline
• Usage and Limitations
• Aggregation and Sharding
• Summary
What is the Aggregation Framework?
Aggregation Framework
Aggregation in a Nutshell
• We're storing our data in MongoDB
• Our applications need ad-hoc queries
• We must have a way to reshape data easily
• You can use the Aggregation Framework for this!
MapReduce is great, but…
• Extremely versatile, powerful
• Overkill for simple aggregation tasks
– Averages
– Summation
– Grouping
– Reshaping
• High level of complexity
• Difficult to program and debug
Aggregation Framework
• Plays nice with sharding
• Executes in native code
– Written in C++
– JSON parameters
• Flexible, functional, and simple
– Operation pipeline
– Computational expressions
Aggregation Pipeline
What is an Aggregation Pipeline?
• A Series of Document Transformations
– Executed in stages
– Original input is a collection
– Output as a document, cursor or a collection
• Rich Library of Functions
– Filter, compute, group, and summarize data
– Output of one stage sent to input of next
– Operations executed in sequential order
$match → $project → $group → $sort
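The idea that each stage's output becomes the next stage's input can be sketched in plain JavaScript. This is only an illustration under assumptions, not MongoDB's implementation: `runPipeline` and the three stage functions are hypothetical stand-ins, and the documents are the book examples used elsewhere in this deck.

```javascript
// Hypothetical helper (not a MongoDB API): feed the documents through each
// stage function in order, passing one stage's output to the next.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const pipelineBooks = [
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" }
];

// Illustrative stand-ins for $match, $project and $sort.
const matchEnglish = docs => docs.filter(d => d.language === "English");
const projectTitlePages = docs => docs.map(d => ({ title: d.title, pages: d.pages }));
const sortByPages = docs => [...docs].sort((a, b) => a.pages - b.pages);

const pipelineResult = runPipeline(pipelineBooks, [
  matchEnglish,
  projectTitlePages,
  sortByPages
]);
console.log(pipelineResult);
```

Filtering first keeps the later stages working on fewer documents, which is the same reason an early $match is usually preferred in a real pipeline.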
Pipeline Operators
• $match
– Filter documents
• $project
– Reshape documents
• $group
– Summarize documents
• $unwind
– Expand documents
• $sort
– Order documents
• $limit / $skip
– Paginate documents
• $redact
– Restrict documents
• $geoNear
– Proximity sort documents
• $let, $map
– Subexpression variables
Our Example Data
{
_id: 375,
title: "The Great Gatsby",
ISBN: "9781857150193",
available: true,
pages: 218,
chapters: 9,
subjects: [
"Long Island",
"New York",
"1920s"
],
language: "English"
}
$match
• Filter documents
– Uses existing query syntax
– Can facilitate shard exclusion
– No $where (server side Javascript)
Matching Field Values
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{
title: "The Great Gatsby",
pages: 218,
language: "English"
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{ $match: {
language: "Russian"
}}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
Matching with Query Operators
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{
title: "The Great Gatsby",
pages: 218,
language: "English"
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{ $match: {
pages: {$gt:1000}
}}
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
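To make the filtering semantics concrete, here is a minimal plain-JavaScript sketch of what a $match stage does to the three book documents. The `matches` function is a hypothetical helper that handles only equality and `$gt`; the real query language supports far more operators.

```javascript
// Hypothetical matcher (not MongoDB code): supports equality and $gt only.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object") {
      // Operator form, e.g. { pages: { $gt: 1000 } }
      return Object.entries(cond).every(([op, value]) => {
        if (op === "$gt") return doc[field] > value;
        throw new Error("unsupported operator in this sketch: " + op);
      });
    }
    // Plain value form, e.g. { language: "Russian" }
    return doc[field] === cond;
  });
}

const matchBooks = [
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" }
];

const russianBooks = matchBooks.filter(d => matches(d, { language: "Russian" }));
const longBooks = matchBooks.filter(d => matches(d, { pages: { $gt: 1000 } }));
console.log(russianBooks.map(d => d.title), longBooks.map(d => d.title));
```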
$project
• Reshape Documents
– Include, exclude or rename fields
– Inject computed fields
– Create sub-document fields
Including and Excluding Fields
{
_id: 375,
title: "Great Gatsby",
ISBN: "9781857150193",
available: true,
pages: 218,
subjects: [
"Long Island",
"New York",
"1920s"
],
language: "English"
}
{ $project: {
_id: 0,
title: 1,
language: 1
}}
{
title: "Great Gatsby",
language: "English"
}
Renaming and Computing Fields
{
_id: 375,
title: "Great Gatsby",
ISBN: "9781857150193",
available: true,
pages: 218,
chapters: 9,
subjects: [
"Long Island",
"New York",
"1920s"
],
language: "English"
}
{ $project: {
avgChapterLength: {
$divide: ["$pages",
"$chapters"]
},
lang: "$language"
}}
{
_id: 375,
avgChapterLength: 24.2222,
lang: "English"
}
Creating Sub-Document Fields
{
_id: 375,
title: "Great Gatsby",
ISBN: "9781857150193",
available: true,
pages: 218,
chapters: 9,
subjects: [
"Long Island",
"New York",
"1920s"
],
language: "English"
}
{ $project: {
title: 1,
stats: {
pages: "$pages",
language: "$language",
}
}}
{
_id: 375,
title: "Great Gatsby",
stats: {
pages: 218,
language: "English"
}
}
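The three $project examples (include, rename and compute, nest) can be mirrored by one plain-JavaScript mapping function. This is a sketch under assumptions: `projectBook` is a hypothetical name, and it combines the separate slides into a single output document purely for illustration.

```javascript
const gatsby = {
  _id: 375,
  title: "Great Gatsby",
  ISBN: "9781857150193",
  available: true,
  pages: 218,
  chapters: 9,
  subjects: ["Long Island", "New York", "1920s"],
  language: "English"
};

// Hypothetical stand-in for a $project stage; fields not listed are dropped.
function projectBook(doc) {
  return {
    _id: doc._id,                                // _id is kept unless excluded with _id: 0
    title: doc.title,                            // title: 1
    lang: doc.language,                          // lang: "$language" (rename)
    avgChapterLength: doc.pages / doc.chapters,  // $divide: ["$pages", "$chapters"]
    stats: {                                     // new sub-document field
      pages: doc.pages,
      language: doc.language
    }
  };
}

const projected = projectBook(gatsby);
console.log(projected);
```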
$group
• Group documents by value
– Field reference, object, constant
– Other output fields are computed
• $max, $min, $avg, $sum
• $addToSet, $push
• $first, $last
– Processes all data in memory by default
Calculating An Average
{
title: "The Great Gatsby",
pages: 218,
language: "English"
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{ $group: {
_id: "$language",
avgPages: { $avg: "$pages" }
}}
{
_id: "Russian",
avgPages: 1440
}
{
_id: "English",
avgPages: 653
}
Summing Fields and Counting
{
title: "The Great Gatsby",
pages: 218,
language: "English"
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{ $group: {
_id: "$language",
pages: { $sum: "$pages" },
books: { $sum: 1 }
}}
{
_id: "Russian",
pages: 1440,
books: 1
}
{
_id: "English",
pages: 1306,
books: 2
}
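The grouping arithmetic can be checked with a small in-memory sketch. `groupByLanguage` is a hypothetical helper, not MongoDB code; it accumulates a page total and a book count per language and derives the average from them, which corresponds to the `$sum` and `$avg` accumulators in the examples.

```javascript
const groupBooks = [
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
  { title: "Atlas Shrugged", pages: 1088, language: "English" }
];

// Hypothetical stand-in for { $group: { _id: "$language", ... } }.
function groupByLanguage(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const group = groups.get(doc.language) || { _id: doc.language, pages: 0, books: 0 };
    group.pages += doc.pages; // pages: { $sum: "$pages" }
    group.books += 1;         // books: { $sum: 1 }
    groups.set(doc.language, group);
  }
  // avgPages: { $avg: "$pages" } is the total divided by the count.
  return [...groups.values()].map(g => ({ ...g, avgPages: g.pages / g.books }));
}

const grouped = groupByLanguage(groupBooks);
console.log(grouped);
```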
Collecting Distinct Values
{
title: "The Great Gatsby",
pages: 218,
language: "English"
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{ $group: {
_id: "$language",
titles: { $addToSet: "$title" }
}}
{
_id: "Russian",
titles: ["War and Peace"]
}
{
_id: "English",
titles: [
"Atlas Shrugged",
"The Great Gatsby" ]
}
$unwind
• Operate on an array field
– Create documents from array elements
• Array replaced by element value
• Missing/empty fields → no output
• Non-array fields → error (as of v2.6; MongoDB 3.2 and later treat the value as a single-element array)
– Pipe to $group to aggregate
Expanding an Array Field
{
title: "The Great Gatsby",
ISBN: "9781857150193",
subjects: [
"Long Island",
"New York",
"1920s"
]
}
{ $unwind: "$subjects" }
{ title: "The Great Gatsby",
ISBN: "9781857150193",
subjects: "Long Island" }
{ title: "The Great Gatsby",
ISBN: "9781857150193",
subjects: "New York" }
{ title: "The Great Gatsby",
ISBN: "9781857150193",
subjects: "1920s" }
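A rough plain-JavaScript equivalent of $unwind, followed by the kind of counting a later $group stage might do, is sketched below. The `unwind` function is hypothetical and follows the behavior listed on the slide: missing or empty arrays produce no output and other non-array values raise an error.

```javascript
// Hypothetical stand-in for { $unwind: "$<field>" }.
function unwind(docs, field) {
  return docs.flatMap(doc => {
    const value = doc[field];
    if (value === undefined || value === null) return []; // missing field: no output
    if (!Array.isArray(value)) {
      throw new Error("field " + field + " is not an array");
    }
    // One output document per element; the array is replaced by the element.
    return value.map(element => ({ ...doc, [field]: element }));
  });
}

const unwindInput = [
  {
    title: "The Great Gatsby",
    ISBN: "9781857150193",
    subjects: ["Long Island", "New York", "1920s"]
  },
  { title: "No Subjects Listed", subjects: [] } // made-up document for the empty case
];

const unwound = unwind(unwindInput, "subjects");

// "Pipe to $group": count how many books carry each subject.
const booksPerSubject = {};
for (const doc of unwound) {
  booksPerSubject[doc.subjects] = (booksPerSubject[doc.subjects] || 0) + 1;
}
console.log(unwound, booksPerSubject);
```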
$sort, $limit, $skip
• Sort documents by one or more fields
– Same order syntax as cursors
– Waits for earlier pipeline operator to return
– In-memory unless early and indexed
• Limit and skip follow cursor behavior
Sort All the Documents in the Pipeline
{ title: "Great Gatsby, The" }
{ title: "Brave New World" }
{ title: "Grapes of Wrath" }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
{ $sort: {title: 1} }
{ title: "Animal Farm" }
{ title: "Brave New World" }
{ title: "Grapes of Wrath" }
{ title: "Great Gatsby, The" }
{ title: "Lord of the Flies" }
Limit Documents Through the Pipeline
{ title: "Great Gatsby, The" }
{ title: "Brave New World" }
{ title: "Grapes of Wrath" }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
{ $limit: 5 }
{ title: "Great Gatsby, The" }
{ title: "Brave New World" }
{ title: "Grapes of Wrath" }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
Skip Documents in the Pipeline
{ title: "Great Gatsby, The" }
{ title: "Brave New World" }
{ title: "Grapes of Wrath" }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
{ $skip: 3 }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
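Combining the three stages gives simple pagination: sort, skip the earlier pages, then limit to one page. The sketch below does the same thing in memory; `pageOfTitles` is a hypothetical helper and the page size of 3 is an arbitrary choice for the example.

```javascript
const titles = [
  { title: "Great Gatsby, The" },
  { title: "Brave New World" },
  { title: "Grapes of Wrath" },
  { title: "Animal Farm" },
  { title: "Lord of the Flies" },
  { title: "Fathers and Sons" },
  { title: "Invisible Man" }
];

// Hypothetical stand-in for the stages
//   { $sort: { title: 1 } }, { $skip: (page - 1) * size }, { $limit: size }
function pageOfTitles(docs, pageNumber, pageSize) {
  const sorted = [...docs].sort((a, b) =>
    a.title < b.title ? -1 : a.title > b.title ? 1 : 0
  );
  const skipped = sorted.slice((pageNumber - 1) * pageSize); // $skip
  return skipped.slice(0, pageSize);                         // $limit
}

const secondPage = pageOfTitles(titles, 2, 3);
console.log(secondPage.map(d => d.title));
```

The sort has to see every document before it can emit the first one, which is why the slide notes that $sort waits for the earlier stages to finish.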
$redact
• Restrict access to Documents
– Use document fields to define privileges
– Apply conditional queries to validate users
• Field Level Access Control
– $$DESCEND, $$PRUNE, $$KEEP
– Applies to root and subdocument fields
$redact Example Data
{
_id: 375,
item: "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
Manufacturer: "Sony",
security: 0,
quantity: 12,
list: 4999,
pricing: {
security: 1,
sale: 2698,
wholesale: {
security: 2,
amount: 2300 }
}
}
Query by Security Level
db.catalog.aggregate([
{
$match: {item: /^.*XBR55X900A*/}
},
{
$redact: {
$cond: {
if: { $lte: [ "$security", ?? ] },
then: "$$DESCEND",
else: "$$PRUNE"
}
}
}])
security = 0
{
"_id" : 375,
"item" : "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
"Manufacturer" : "Sony",
"security" : 0,
"quantity" : 12,
"list" : 4999
}
security = 2
{
"_id" : 375,
"item" : "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
"Manufacturer" : "Sony",
"security" : 0,
"quantity" : 12,
"list" : 4999,
"pricing" : {
"security" : 1,
"sale" : 2698,
"wholesale" : {
"security" : 2,
"amount" : 2300
}
}
}
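The descend-or-prune decision can be traced with a recursive plain-JavaScript sketch. `redact` here is a hypothetical function that mimics only the `$cond` shown above (descend when `security` is at most the caller's level, otherwise prune). It handles nested sub-documents but, as a simplifying assumption, copies arrays unchanged.

```javascript
const tv = {
  _id: 375,
  item: "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
  Manufacturer: "Sony",
  security: 0,
  quantity: 12,
  list: 4999,
  pricing: {
    security: 1,
    sale: 2698,
    wholesale: { security: 2, amount: 2300 }
  }
};

// Hypothetical stand-in for the $redact stage with the $cond shown above.
function redact(doc, level) {
  if (doc.security > level) return undefined; // $$PRUNE: drop this level entirely
  const kept = {};
  for (const [key, value] of Object.entries(doc)) { // $$DESCEND: keep fields, check children
    const isSubDocument =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isSubDocument) {
      const child = redact(value, level);
      if (child !== undefined) kept[key] = child;
    } else {
      kept[key] = value;
    }
  }
  return kept;
}

const publicView = redact(tv, 0);   // no pricing at all
const salesView = redact(tv, 1);    // sale price, but no wholesale
const internalView = redact(tv, 2); // everything
console.log(publicView, salesView, internalView);
```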
$geoNear
• Order/Filter Documents by Location
– Requires a geospatial index
– Output includes physical distance
– Must be first aggregation stage
$geoNear Example Data
{
"_id" : 10021,
"city" : "NEW YORK",
"loc" : [
-73.958805,
40.768476
],
"pop" : 106564,
"state" : "NY"
}
Query by Proximity
db.catalog.aggregate([
{
$geoNear : {
near: [ -86.000, 33.000 ],
distanceField: "dist",
maxDistance: .050,
spherical: true,
num: 3
}
}])
{
"_id" : "35089",
"city" : "KELLYTON",
"loc" : [ -86.048397, 32.979068 ],
"pop" : 1584,
"state" : "AL",
"dist" : 0.0007971432165364155
},
{
"_id" : "35010",
"city" : "NEW SITE",
"loc" : [ -85.951086, 32.941445 ],
"pop" : 19942,
"state" : "AL",
"dist" : 0.0012479615347306806
},
{
"_id" : "35072",
"city" : "GOODWATER",
"loc" : [ -86.078149, 33.074642 ],
"pop" : 3813,
"state" : "AL",
"dist" : 0.0017333719627032555
}
$let / $map
• Bind variables to subexpressions
– Apply conditional logic
– Define complex calculations
– Operate on array field values
$let Example Data
{
"_id" : 1,
"price" : 10,
"tax" : 0.50,
"discount" : true
}
Subexpression Calculations
db.sales.aggregate( [
{
$project: {
finalPrice: {
$let: {
vars: {
total: { $cond: {
if: '$discount',
then: { $multiply: [0.9, '$price'] },
else: '$price'
}
}
},
in: { $add: [ "$$total", '$tax'] }
}}}}])
{ "_id" : 1, "finalPrice" : 9.5 }
{ "_id" : 2, "finalPrice" : 10.25 }
$map Example Data
{
"_id" : 1,
"price" : 10,
"tax" : 0.50,
"discount" : true,
"units" : [ 1, 0, 3, 4, 0, 0, 10, 12, 6, 5 ]
}
Subexpressions on Arrays
db.sales.aggregate( [ {
$project: {
finalPrice: {
$map: {
input: "$units",
as: "unit",
in: {
$multiply: [ "$$unit", {
$cond: {
if: '$discount', then: {
$add : [
{ $multiply: [ 0.9, '$price'] }, '$tax' ] },
else: { $add: [ '$price', '$tax' ] }
} } ] } } } } } ] )
{
"_id" : 1,
"finalPrice" :
[ 9.5, 0, 28.5, 38, 0, 0, 95, 114, 57, 47.5 ]
}
{
"_id" : 2,
"finalPrice" :
[ 51.25, 30.75, 20.5, 51.25, 0, 0, 0, 30.75, 41, 71.75 ]
}
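The arithmetic behind the $let and $map results for document 1 can be reproduced in plain JavaScript. This is a sketch, not shell code: `unitPrice` is a hypothetical function that plays the role of the `$let` body, and the discount flag is read from the `discount` field of the example data.

```javascript
const sale = {
  _id: 1,
  price: 10,
  tax: 0.5,
  discount: true,
  units: [1, 0, 3, 4, 0, 0, 10, 12, 6, 5]
};

// Role of $let: bind "total" once, then use it in the final expression.
function unitPrice(doc) {
  const total = doc.discount ? 0.9 * doc.price : doc.price; // vars: { total: { $cond: ... } }
  return total + doc.tax;                                   // in: { $add: ["$$total", "$tax"] }
}

// Role of $map: apply an expression to every element of the units array.
const finalPrice = unitPrice(sale);
const finalPrices = sale.units.map(unit => unit * finalPrice);
console.log(finalPrice, finalPrices);
```

With the 10% discount applied the per-unit price is 9 + 0.50, so each array element is the unit count multiplied by that figure.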
Aggregation and Sharding
Sharding
[Diagram: mongos returns the result; Shard 1 (Primary), Shard 2 and Shard 4 each run $match, $project, $group; Shard 3 is excluded]
• Workload split between shards
– Shards execute pipeline up to a point
– Primary shard merges cursors and continues processing*
– Use explain to analyze pipeline split
– Early $match may exclude shards
– Potential CPU and memory implications for primary shard host
* Prior to v2.6, second-stage pipeline processing was done by mongos
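One way to picture the split is that each shard produces partial group results and a merging step combines them. The sketch below is a conceptual illustration only: the placement of the three books on shards is an assumption, and `partialGroup` and `mergePartials` are hypothetical names rather than MongoDB internals. It also shows why an average has to travel as a sum and a count, since averaging per-shard averages would give the wrong answer.

```javascript
// Assumed data placement; Shard 3 is omitted because the early $match excluded it.
const shards = {
  shard1: [{ title: "The Great Gatsby", pages: 218, language: "English" }],
  shard2: [{ title: "War and Peace", pages: 1440, language: "Russian" }],
  shard4: [{ title: "Atlas Shrugged", pages: 1088, language: "English" }]
};

// Shard side: partial sum and count per language.
function partialGroup(docs) {
  const partial = {};
  for (const { language, pages } of docs) {
    partial[language] = partial[language] || { sum: 0, count: 0 };
    partial[language].sum += pages;
    partial[language].count += 1;
  }
  return partial;
}

// Merge side: combine the partials, then finish the average.
function mergePartials(partials) {
  const merged = {};
  for (const partial of partials) {
    for (const [language, { sum, count }] of Object.entries(partial)) {
      merged[language] = merged[language] || { sum: 0, count: 0 };
      merged[language].sum += sum;
      merged[language].count += count;
    }
  }
  return Object.entries(merged).map(([language, { sum, count }]) => ({
    _id: language,
    avgPages: sum / count
  }));
}

const shardedResult = mergePartials(Object.values(shards).map(partialGroup));
console.log(shardedResult);
```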
Usage and Limitations
Usage
• collection.aggregate([…], {<options>})
– Returns a cursor
– Takes an optional document to specify aggregation options
• allowDiskUse, explain
– Use $out to send results to a Collection
• db.runCommand({aggregate:<collection>, pipeline:[…]})
– Returns a document, limited to 16 MB
Collection
db.books.aggregate([
{ $project: { language: 1 }},
{ $group: { _id: "$language", numTitles: { $sum: 1 }}}
])
{ _id: "Russian", numTitles: 1 },
{ _id: "English", numTitles: 2 }
Database Command
db.runCommand({
aggregate: "books",
pipeline: [
{ $project: { language: 1 }},
{ $group: { _id: "$language", numTitles: { $sum: 1 }}}
]
})
{
result : [
{ _id: "Russian", numTitles: 1 },
{ _id: "English", numTitles: 2 }
],
"ok" : 1
}
Limitations
• Pipeline operator memory limits
– Stages limited to 100 MB
– “allowDiskUse” for larger data sets
• Some BSON types unsupported
– Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope
Summary
Aggregation Use Cases
Ad-hoc reporting
Real-time Analytics
Transforming Data
Enabling Developers and DBAs
• Do more with MongoDB and do it faster
• Eliminate MapReduce
– Replace pages of JavaScript
– More efficient data processing
• Not just a nice feature
– Enabler for real time big data analytics
Thank You
