Technical Support Engineer, 10gen
Gianfranco Palumbo
#bigdatajaspersoft
How to leverage MongoDB for Big Data Analysis and Operations
@MongoDBDublin
Join us this evening at Dublin MUG: meetup.com/DublinMUG/
Big Data
http://www.worldwidewebsize.com/
Exponential Data Growth
MongoDB solves our needs
• Ideal operational database
• Provides high performance for storage and retrieval at large scale
• Has a robust query interface permitting intelligent operations
• Is not a data processing engine, but provides processing functionality
Data Processing in MongoDB
• Process in MongoDB using Map/Reduce
• Process in MongoDB using the Aggregation Framework
• Process outside MongoDB using Hadoop and other external tools
The goal
Diagram: multiple data sources feeding a real-time analytics engine
Sample Customers
Solution goals
• High write volume: lots of data sources, lots of data from each source
• Dynamic queries: users can drill down into data
• Fast queries: lots of clients, high request rate
• Minimize delay between collection & query: how long before an event appears in a report?
System architecture
Diagram: multiple data sources writing into a sharded MongoDB cluster
• Asynchronous writes
• Upserts avoid unnecessary reads
• Writes buffered in RAM and flushed to disk in bulk
• Spread writes over multiple shards
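The "upserts avoid unnecessary reads" point can be sketched as follows: instead of reading a counter, incrementing it in the application and writing it back, each event becomes a single `$inc` update with `upsert: true`, and the server either creates or increments the matching document. This is a minimal sketch under stated assumptions: the pre-aggregated collection (called `stats_hourly` here), its `path`/`hour`/`hits` fields and the UTC-hour bucketing are hypothetical, and `applyIncUpsert` is only an in-memory stand-in for the server-side effect, so the example runs without a `mongod`.

```javascript
// Build one upsert per log event for a hypothetical "stats_hourly" collection.
// With a current shell or driver the parts would be passed as
// db.stats_hourly.updateOne(filter, update, options).
function hourlyUpsert(logDoc) {
  const t = logDoc.time;
  // Truncate the event time to the start of its UTC hour (assumed bucketing).
  const hour = new Date(Date.UTC(
    t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate(), t.getUTCHours()));
  return {
    filter: { path: logDoc.path, hour: hour },
    update: { $inc: { hits: 1 } },
    options: { upsert: true },
  };
}

// In-memory stand-in for the server-side effect of an $inc upsert:
// the matching document is incremented, or created from the filter fields if missing.
function applyIncUpsert(store, op) {
  const key = JSON.stringify(op.filter);
  const doc = store.get(key) || Object.assign({}, op.filter);
  for (const [field, delta] of Object.entries(op.update.$inc)) {
    doc[field] = (doc[field] || 0) + delta;
  }
  store.set(key, doc);
  return doc;
}

const store = new Map();
applyIncUpsert(store, hourlyUpsert({ path: "/index.html", time: new Date("2013-06-10T20:55:36Z") }));
const counter = applyIncUpsert(store, hourlyUpsert({ path: "/index.html", time: new Date("2013-06-10T20:05:00Z") }));
console.log(counter.hits); // 2: both events fall in the 20:00 UTC bucket
```

The client sends one write per event and never reads the counter first; the cost is that the bucket granularity (here one hour) has to be chosen up front.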
Simple log storage
Design Pattern
Sample data
Original event data
127.0.0.1 - frank [10/Jun/2013:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7_4; en-US)"
As JSON
doc = {
  _id: ObjectId('4f442120eb03305789000000'),
  host: "127.0.0.1",
  time: ISODate("2013-06-10T20:55:36Z"),
  path: "/apache_pb.gif",
  referer: "http://www.example.com/start.html",
  user_agent: "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7_4; en-US)"
}
Insert to MongoDB
db.logs.insert( doc )
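The step from the raw Apache line to the JSON document above can be sketched in plain JavaScript. This is a minimal sketch, assuming lines in Apache's "combined" log format with both referer and user agent quoted; `parseLogLine` and `parseApacheTime` are hypothetical helper names, and the status code and byte count are dropped to match the sample document. The resulting object could then be passed to `db.logs.insert()`, which assigns the `_id`.

```javascript
// Month abbreviations as written in Apache's timestamp, mapped to 0-based months.
const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
const COMBINED_LINE =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" \d{3} (?:\d+|-) "([^"]*)" "([^"]*)"$/;
// e.g. 10/Jun/2013:13:55:36 -0700
const APACHE_TIME =
  /^(\d{2})\/([A-Za-z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/;

function parseApacheTime(s) {
  const m = APACHE_TIME.exec(s);
  if (!m || !(m[2] in MONTHS)) throw new Error("Unrecognised timestamp: " + s);
  const [, dd, mon, yyyy, hh, mi, ss, sign, oh, om] = m;
  // Interpret the wall-clock fields as if they were UTC...
  const wallClock = Date.UTC(Number(yyyy), MONTHS[mon], Number(dd),
                             Number(hh), Number(mi), Number(ss));
  // ...then subtract the zone offset, since local time = UTC + offset.
  const offsetMinutes = (sign === "-" ? -1 : 1) * (Number(oh) * 60 + Number(om));
  return new Date(wallClock - offsetMinutes * 60 * 1000);
}

// Returns a document shaped like the sample above, or null if the line does not match.
function parseLogLine(line) {
  const m = COMBINED_LINE.exec(line);
  if (!m) return null;
  return {
    host: m[1],
    time: parseApacheTime(m[2]),
    path: m[3],
    referer: m[4],
    user_agent: m[5],
  };
}

const line = '127.0.0.1 - frank [10/Jun/2013:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" ' +
  '200 2326 "http://www.example.com/start.html" ' +
  '"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7_4; en-US)"';
console.log(parseLogLine(line).time.toISOString()); // 2013-06-10T20:55:36.000Z
```

Converting the `-0700` offset to UTC before storing is what makes the `time` field comparable across hosts in different time zones.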
Dynamic Queries
Find all logs for a URL
db.logs.find( { 'path': '/index.html' } )
Find all logs for a time range
db.logs.find( {
  'time': {
    '$gte': new Date(2013, 0),
    '$lt': new Date(2013, 1) }
} )
Find all logs for a host over a range of dates
db.logs.find( {
  'host': '127.0.0.1',
  'time': {
    '$gte': new Date(2013, 0),
    '$lt': new Date(2013, 1) }
} )
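Two details of these date queries are easy to trip over: JavaScript's `Date` constructor counts months from 0, so `new Date(2013, 0)` is January and `new Date(2013, 1)` is February, and it interprets the fields in the shell's local time zone, whereas BSON dates are stored as UTC instants. A minimal sketch of a filter builder with explicit UTC month boundaries, assuming 1-based month numbers; `hostMonthFilter` is a hypothetical helper whose result would be passed to `db.logs.find()`.

```javascript
// Filter for one host over one calendar month, with UTC boundaries.
// `month` is 1-based (1 = January); Date.UTC itself expects 0-based months.
function hostMonthFilter(host, year, month) {
  return {
    host: host,
    time: {
      $gte: new Date(Date.UTC(year, month - 1, 1)),
      // Date.UTC rolls month index 12 over to January of the next year.
      $lt: new Date(Date.UTC(year, month, 1)),
    },
  };
}

const jan = hostMonthFilter("127.0.0.1", 2013, 1);
console.log(jan.time.$gte.toISOString(), jan.time.$lt.toISOString());
// 2013-01-01T00:00:00.000Z 2013-02-01T00:00:00.000Z
```

The half-open range (`$gte` start, `$lt` end) avoids double-counting events that land exactly on a month boundary.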
Aggregation Framework
MongoDB Aggregation Framework
Aggregation Framework
Requests per day by URL
db.logs.aggregate( [
  { '$match': {
      'time': {
        '$gte': new Date(2013, 0),
        '$lt': new Date(2013, 1) } } },
  { '$project': {
      'path': 1,
      'date': {
        'y': { '$year': '$time' },
        'm': { '$month': '$time' },
        'd': { '$dayOfMonth': '$time' } } } },
  { '$group': {
      '_id': {
        'p': '$path',
        'y': '$date.y',
        'm': '$date.m',
        'd': '$date.d' },
      'hits': { '$sum': 1 } } }
] )
Aggregation Framework
{
  'ok': 1,
  'result': [
    { '_id': { 'p': '/index.html', 'y': 2013, 'm': 1, 'd': 1 }, 'hits': 124 },
    { '_id': { 'p': '/index.html', 'y': 2013, 'm': 1, 'd': 2 }, 'hits': 245 },
    { '_id': { 'p': '/index.html', 'y': 2013, 'm': 1, 'd': 3 }, 'hits': 322 },
    { '_id': { 'p': '/index.html', 'y': 2013, 'm': 1, 'd': 4 }, 'hits': 175 },
    { '_id': { 'p': '/index.html', 'y': 2013, 'm': 1, 'd': 5 }, 'hits': 94 }
  ]
}
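To make the three stages concrete, the following sketch reproduces the same grouping over an in-memory array. It is an illustration only, assuming documents shaped like the sample log document; MongoDB executes the real pipeline server-side. `requestsPerDayByUrl` is a hypothetical name, the sketch uses UTC because `$year`, `$month` and `$dayOfMonth` evaluate dates in UTC by default, and it returns groups in first-seen order, whereas `$group` guarantees no order without a following `$sort`.

```javascript
// In-memory illustration of the $match -> $project -> $group pipeline above.
function requestsPerDayByUrl(docs, from, to) {
  const groups = new Map(); // serialized _id -> { _id, hits }
  for (const doc of docs) {
    // $match: from <= time < to
    if (!(doc.time >= from && doc.time < to)) continue;
    // $project + $group _id: path plus UTC calendar day ($month is 1-based)
    const id = {
      p: doc.path,
      y: doc.time.getUTCFullYear(),
      m: doc.time.getUTCMonth() + 1,
      d: doc.time.getUTCDate(),
    };
    const key = JSON.stringify(id);
    if (!groups.has(key)) groups.set(key, { _id: id, hits: 0 });
    groups.get(key).hits += 1; // hits: { $sum: 1 }
  }
  return Array.from(groups.values());
}

const sample = [
  { path: "/index.html", time: new Date("2013-01-01T10:00:00Z") },
  { path: "/index.html", time: new Date("2013-01-01T23:59:59Z") },
  { path: "/index.html", time: new Date("2013-01-02T00:00:00Z") },
  { path: "/about.html", time: new Date("2013-01-01T12:00:00Z") },
  { path: "/index.html", time: new Date("2013-02-01T00:00:00Z") }, // outside the range
];
const perDay = requestsPerDayByUrl(sample,
  new Date(Date.UTC(2013, 0, 1)), new Date(Date.UTC(2013, 1, 1)));
for (const g of perDay) console.log(JSON.stringify(g));
// {"_id":{"p":"/index.html","y":2013,"m":1,"d":1},"hits":2}
// {"_id":{"p":"/index.html","y":2013,"m":1,"d":2},"hits":1}
// {"_id":{"p":"/about.html","y":2013,"m":1,"d":1},"hits":1}
```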
Aggregation Framework
Benefits
• Real-time
• Simple yet powerful interface
• Declared in JSON, executes in C++
• Runs inside MongoDB on local data
Drawbacks
• Adds load to your DB
• Limited in how much data it can return
Roll-ups with map-reduce
Design Pattern
MongoDB Map/Reduce
Map Reduce – Map Phase
Generate hourly rollups from log data
var map = function() {
  // Bucket each log document by path and by the hour of its "time" field
  var key = {
    p: this.path,
    d: new Date(
      this.time.getFullYear(),
      this.time.getMonth(),
      this.time.getDate(),
      this.time.getHours(),
      0, 0, 0) };
  emit( key, { hits: 1 } );
}
Map Reduce – Reduce Phase
Generate hourly rollups from log data
var reduce = function(key, values) {
  // Sum the hits of all values emitted for one key; returns the same shape as map emits
  var r = { hits: 0 };
  values.forEach(function(v) {
    r.hits += v.hits;
  });
  return r;
}
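The contract between the two phases can be exercised without a server. The sketch below is a simplified, in-memory stand-in for MongoDB's map/reduce, not its implementation: it groups emitted values by key, skips reduce for keys with a single value (as MongoDB does), and returns `{ _id, value }` pairs like a map/reduce output collection. Assumptions: `emit` is passed to map explicitly (the server provides it as a global), the timestamp field is `time` as in the sample document, and hours are bucketed in UTC rather than in the server's local time. The last lines show why reduce must return the same shape as the emitted values: its output can be fed back into reduce.

```javascript
// Simplified in-memory stand-in for a map/reduce run (illustration only).
function runMapReduce(docs, map, reduce) {
  const buckets = new Map(); // serialized key -> { key, values }
  const emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!buckets.has(k)) buckets.set(k, { key: key, values: [] });
    buckets.get(k).values.push(value);
  };
  // Map phase: `this` is bound to each document, as in MongoDB.
  for (const doc of docs) map.call(doc, emit);
  // Reduce phase: keys with a single value are passed through unreduced.
  const out = [];
  for (const { key, values } of buckets.values()) {
    out.push({ _id: key, value: values.length === 1 ? values[0] : reduce(key, values) });
  }
  return out;
}

// Hourly roll-up keyed by path and UTC hour (the UTC choice is an assumption).
const hourlyMap = function (emit) {
  const t = this.time;
  const key = {
    p: this.path,
    d: new Date(Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate(), t.getUTCHours())),
  };
  emit(key, { hits: 1 });
};

// Same logic as the reduce function above: sum the hits for one key.
const hourlyReduce = function (key, values) {
  const r = { hits: 0 };
  values.forEach(function (v) { r.hits += v.hits; });
  return r;
};

const events = [
  { path: "/index.html", time: new Date("2013-06-10T20:05:00Z") },
  { path: "/index.html", time: new Date("2013-06-10T20:55:36Z") },
  { path: "/index.html", time: new Date("2013-06-10T21:00:00Z") },
];
const rollup = runMapReduce(events, hourlyMap, hourlyReduce);
console.log(rollup.map(r => r._id.d.toISOString() + " " + r.value.hits).join(", "));
// 2013-06-10T20:00:00.000Z 2, 2013-06-10T21:00:00.000Z 1

// Re-reduce: reducing partial results gives the same answer as reducing everything at once.
const partial = hourlyReduce(null, [{ hits: 1 }, { hits: 1 }]);
console.log(hourlyReduce(null, [partial, { hits: 1 }]).hits); // 3
```

On a real deployment the run would typically be started with `db.logs.mapReduce(map, reduce, { out: ... })`, where the `out` option names the collection that receives the hourly roll-ups.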
MongoDB Map/Reduce
• Real-time
• Output directly to document or collection
• Runs inside MongoDB on local data
• V8 engine
• Adds load to your DB
• In JavaScript
Integrations
Diagram: MongoDB connected to reporting and charting tools and to Apache Hadoop
• Reporting: charting
• Apache Hadoop: log aggregation with MongoDB as sink; more complex aggregations or integration with tools like Mahout
MongoDB with Hadoop
MongoDB with Hadoop
MongoDB with Hadoop
MongoDB and Hadoop
• Away from data store
• Can leverage existing data processing infrastructure
• Can horizontally scale your data processing
• Offline batch processing
• Requires synchronization between store & processor
• Infrastructure is much more complex
The Future of Big Data and MongoDB
What is Big?
Big today is normal tomorrow
http://www.worldwidewebsize.com/
Big is only getting bigger
IBM - http://www-01.ibm.com/software/data/bigdata/
90% of the data in the world today has been created in the last two years
MongoDB enables you to scale to the redefinition of BIG
MongoDB is evolving to enable you to process the new BIG
Gianfranco Palumbo – slides tweeted from @MongoDBDublin
MongoDB is committed to working with the best data processing tools
• Map Reduce
• Aggregation Framework
• Hadoop adapter
– docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/
• Storm
– github.com/christkv/mongo-storm
• Disco
– github.com/mongodb/mongo-disco
• Spark (coming soon)
Thank you
@MongoDBDublin


How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's Aggregation Framework and Map Reduce

Editor's Notes

  • #2: I'm going to talk about how to leverage MongoDB. I hope you end up learning about MongoDB.
  • #4: Start by saying: "I want to start by asking a question: what is it?"
  • #5: Google, Amazon, and Facebook built custom tools to handle massive amounts of data. MongoDB led an open-source movement to provide a viable alternative to proprietary solutions for handling big data. It's just data! Don't panic.
  • #7: We will be demonstrating how to do each of these today and discuss why and when you would use each.
  • #8: Charts, trends, insights
  • #9: Traackr: social media. Intuit: small business, personal finance and tax software.
  • #10: Not only data size but also the rate at which the data comes in, for example Twitter. What is the tolerable delay? How complex is the processing of the data?
  • #17: I often think of map/reduce as the Marmite of MongoDB: people either love it or hate it. For that very reason we've produced the aggregation framework in 2.2, and it's only getting better in 2.4!
  • #18: $project, $match, $unwind, $group; $limit, $skip, $sort. No JavaScript code. $out. More operators coming soon.
  • #22: The original aggregation utility in MongoDB. Simplified view: from the MongoDB C++ runtime to the JS runtime. 1) You create a map function. 2) Map emits results; MongoDB then groups and sorts them. 3) MongoDB then passes the values to reduce. 4) Finalize is optional. Back to the C++ runtime.
  • #23: Summarise by hour and save that in a collection.
  • #24: Map and reduce need to return the same object shape, because reduce can be run again on its own output.
  • #25: V8 in 2.4 & multithreaded
  • #28: Jobs. Higher latency.
  • #36: The MongoDB Hadoop adapter allows you to stream data into and out of Hadoop, so you can scale data processing across many machines for batch processing.
  • #37: Another common use case we see is data warehousing; again, the connector allows you to utilise existing libraries via Hadoop.
  • #38: The third most common use case is an ETL (extract, transform, load) function, then putting the aggregated data into MongoDB for further analysis.
  • #42: Google, Amazon, and Facebook built custom tools to handle massive amounts of data. MongoDB led an open-source movement to provide a viable alternative to proprietary solutions for handling big data.
  • #43: Scale out horizontally, with sharding tools provided out of the box.
  • #44: Scale out horizontally, with sharding tools provided out of the box.
  • #45: Our next challenge is helping you make sense of your data.
  • #46: Map/Reduce: allows complex programmable aggregations. Aggregation Framework: easy and simple access to aggregation. Hadoop: the start of our integration with external tools. Storm: a distributed and fault-tolerant real-time computation system, used by Twitter, Groupon, etc.; more flexible, incremental processing. Disco: an open-source implementation of the Map-Reduce framework for distributed computing, developed by Nokia Research Center.
  • #47: Meetup, education.10gen.com