Real Time Analytics
Chad Tindel
chad.tindel@10gen.com
The goal
[Diagram: three Data Source boxes and a Real Time Analytics Engine]
Solution goals
Simple log storage
Design Pattern
Aggregation - Pipelines
• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Conceptually, the members of a collection
are passed through a pipeline to produce
a result
– Similar to a Unix command-line pipe
Aggregation Pipeline
Aggregation - Pipelines
db.collection.aggregate(
  [ {$match: … },
    {$group: … },
    {$limit: …}, etc.
  ]
)
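The Unix-pipe analogy can be sketched in plain JavaScript. This is only an in-memory illustration, not how MongoDB implements aggregation; `matchStage`, `limitStage`, `runPipeline` and the sample documents are hypothetical names introduced here.

```javascript
// In-memory analogue of an aggregation pipeline (not MongoDB's implementation).
// Each stage is a function from an array of documents to a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next, like `a | b | c` in a shell.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical sample documents.
const posts = [
  { author: "dave", score: 80 },
  { author: "ann", score: 95 },
  { author: "dave", score: 40 },
  { author: "dave", score: 60 },
];

const result = runPipeline(posts, [
  matchStage((d) => d.author === "dave"),
  limitStage(2),
]);
console.log(result);
```

Because every stage has the same shape (documents in, documents out), stages can be reordered or added without changing the runner.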
Pipeline Operations
• $match
– Uses a query predicate (like .find({…})) as a
filter
{ $match : { author : "dave" } }
{ $match : { score : { $gt : 50, $lte : 90 } } }
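To make the idea of a query predicate concrete, here is a small in-memory matcher. It is a sketch that assumes flat documents with number or string values and covers only equality, $gt, $gte, $lt and $lte; `matches` and `comparators` are hypothetical names, not part of any MongoDB API.

```javascript
// Hypothetical matcher covering only equality and four range operators.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === "object") {
      // Operator document: every listed operator must hold.
      return Object.entries(cond).every(([op, bound]) => comparators[op](value, bound));
    }
    return value === cond; // plain equality
  });
}

const scores = [{ score: 50 }, { score: 75 }, { score: 90 }, { score: 91 }];
const kept = scores.filter((d) => matches(d, { score: { $gt: 50, $lte: 90 } }));
console.log(kept);
```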
Pipeline Operations
• $project
– Uses a sample document to determine the
shape of the result (similar to .find()’s 2nd
optional argument)
• Include or exclude fields
• Compute new fields
– Arithmetic expressions, including built-in functions
– Pull fields from nested documents to the top
– Push fields from the top down into new virtual documents
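The reshaping that $project performs can be pictured with an ordinary function over one document. This is a hand-written sketch, not the $project syntax itself; the document and the field names (`stats`, `engagement`, `flags`) are made up for illustration.

```javascript
// Hypothetical input document; field names are illustrative only.
const post = { title: "Intro", stats: { views: 10, likes: 3 }, draft: false };

function projectPost(d) {
  return {
    title: d.title,                            // include an existing field
    views: d.stats.views,                      // pull a nested field to the top
    engagement: d.stats.views + d.stats.likes, // compute a new field
    flags: { draft: d.draft },                 // push a top-level field down
  };
}

const shaped = projectPost(post);
console.log(shaped);
```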
Pipeline Operations
• $unwind
– Hands out array elements one at a time
{ $unwind : "$myarray" }
• $unwind “streams” arrays
– Array values are doled out one at a time in the
context of their surrounding document
– Makes it possible to filter out elements before
returning
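A rough in-memory equivalent of $unwind, assuming plain JavaScript objects; `unwind` and the sample documents are hypothetical. Each array element is emitted together with a copy of its surrounding document, so later steps can filter individual elements.

```javascript
// In-memory sketch of $unwind: emit one copy of the document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

// Hypothetical sample documents.
const articles = [
  { _id: 1, title: "a", tags: ["x", "y"] },
  { _id: 2, title: "b", tags: [] },
];

const unwound = unwind(articles, "tags");
// Elements can now be filtered one by one, in the context of their parent.
const onlyY = unwound.filter((d) => d.tags === "y");
console.log(unwound, onlyY);
```

In this sketch a document with an empty array produces no output at all.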
Pipeline Operations
• $group
– Aggregates items into buckets defined by a
key
Grouping
• $group aggregation expressions
– Define a grouping key as the _id of the result
– Total grouped column values: $sum
– Average grouped column values: $avg
– Collect grouped column values in an array or
set: $push, $addToSet
– Other functions
• $min, $max, $first, $last
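The accumulators listed above can be imitated in memory as follows. This is a sketch under the assumption of numeric values and string keys; `groupBy` and the sample data are hypothetical, and the comments only indicate which accumulator each line loosely corresponds to.

```javascript
// In-memory sketch of $group with a few accumulators.
function groupBy(docs, keyFn, valueFn) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    if (!buckets.has(key)) buckets.set(key, { _id: key, total: 0, count: 0, values: [] });
    const bucket = buckets.get(key);
    const value = valueFn(doc);
    bucket.total += value;     // like $sum
    bucket.count += 1;
    bucket.values.push(value); // like $push
  }
  return [...buckets.values()].map((b) => ({
    _id: b._id,                        // the grouping key becomes _id
    total: b.total,
    avg: b.total / b.count,            // like $avg
    values: b.values,
    distinct: [...new Set(b.values)],  // like $addToSet
    max: Math.max(...b.values),        // like $max
  }));
}

// Hypothetical sample data.
const sales = [
  { region: "east", amount: 10 },
  { region: "west", amount: 5 },
  { region: "east", amount: 30 },
  { region: "east", amount: 10 },
];
const grouped = groupBy(sales, (d) => d.region, (d) => d.amount);
console.log(grouped);
```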
Pipeline Operations
• $sort
– Sort documents
– Sort specifications are the same as for cursor .sort() today,
e.g., $sort:{ key1: 1, key2: -1, …}
{ $sort : {"total":-1} }
Pipeline Operations
• $limit
– Only allow the specified number of documents
to pass
{ $limit : 20 }
Pipeline Operations
• $skip
– Skip over the specified number of documents
{ $skip : 10 }
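Taken together, $sort, $skip and $limit give simple paging. A minimal in-memory sketch with made-up totals:

```javascript
// Hypothetical grouped totals.
const totals = [
  { _id: "a", total: 5 },
  { _id: "b", total: 9 },
  { _id: "c", total: 1 },
  { _id: "d", total: 7 },
];

// Roughly: { $sort: { total: -1 } }, { $skip: 1 }, { $limit: 2 }
const page = [...totals]
  .sort((x, y) => y.total - x.total) // descending by total
  .slice(1)                          // skip the first document
  .slice(0, 2);                      // let only two documents pass
console.log(page);
```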
Computed Expressions
• Available in $project operations
• Prefix expression language
– Add two fields: $add:["$field1", "$field2"]
– Provide a value for a missing field: $ifNull:
["$field1", "$field2"]
– Nesting: $add:["$field1", {$ifNull:["$field2",
"$field3"]}]
(continued)
Computed Expressions (continued)
• String functions
– $toUpper, $toLower, $substr
• Date field extraction
– Get year, month, day, hour, etc, from ISODate
• Date arithmetic
• Null value substitution (like MySQL ifnull(),
Oracle nvl())
• Ternary conditional
– Return one of two values based on a predicate
• Other functions….
– And we can easily add more as required
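To show how a prefix expression language nests, here is a toy evaluator for four operators. It is an assumption-laden sketch: `evaluate` is a hypothetical function, only top-level field references such as "$field1" are resolved, and real MongoDB semantics (types, dotted paths, error cases) are not reproduced.

```javascript
// Hypothetical evaluator for a tiny subset of the prefix expression language.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // field reference such as "$field1"
  }
  if (expr === null || typeof expr !== "object") {
    return expr; // literal value
  }
  const [op, args] = Object.entries(expr)[0];
  switch (op) {
    case "$add":
      return args.map((a) => evaluate(a, doc)).reduce((sum, v) => sum + v, 0);
    case "$ifNull": {
      const value = evaluate(args[0], doc);
      return value === undefined || value === null ? evaluate(args[1], doc) : value;
    }
    case "$toUpper":
      return String(evaluate(args, doc)).toUpperCase();
    case "$cond": {
      // Ternary conditional in array form: [test, then, else]
      const [test, thenExpr, elseExpr] = args;
      return evaluate(test, doc) ? evaluate(thenExpr, doc) : evaluate(elseExpr, doc);
    }
    default:
      throw new Error("unsupported operator: " + op);
  }
}

const sample = { field1: 2, field3: 5, name: "dave" };
const sum = evaluate({ $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] }, sample);
const upper = evaluate({ $toUpper: "$name" }, sample);
console.log(sum, upper);
```

The operator always comes first and its arguments follow, which is what lets expressions nest to any depth.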
Sample data
Original Event Data
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
As JSON
doc = {
  _id: ObjectId('4f442120eb03305789000000'),
  host: "127.0.0.1",
  time: ISODate("2000-10-10T20:55:36Z"),
  path: "/apache_pb.gif",
  referer: "http://www.example.com/start.html",
  user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}
Insert to MongoDB
db.logs.insert( doc )
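One possible way to get from the raw log line to the document is a regular expression over the combined log format. The sketch below is an assumption about how such a converter might look; `parseLogLine`, `parseApacheTime` and the regular expressions are hypothetical and handle only well-formed lines.

```javascript
const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// Convert "10/Oct/2000:13:55:36 -0700" to a Date (an absolute UTC instant).
function parseApacheTime(s) {
  const m = /^(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(s);
  const wallClock = Date.UTC(+m[3], MONTHS[m[2]], +m[1], +m[4], +m[5], +m[6]);
  const offsetMinutes = (m[7] === "-" ? -1 : 1) * (+m[8] * 60 + +m[9]);
  return new Date(wallClock - offsetMinutes * 60000);
}

// host ident user [time] "method path protocol" status size "referer" "agent"
const LINE = /^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+) [^"]*" \d{3} (?:\d+|-) "([^"]*)" "([^"]*)"$/;

function parseLogLine(line) {
  const m = LINE.exec(line);
  if (!m) return null; // not a combined-format line
  return {
    host: m[1],
    time: parseApacheTime(m[2]),
    path: m[3],
    referer: m[4],
    user_agent: m[5],
  };
}

const line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';
const parsed = parseLogLine(line);
console.log(parsed);
```

The resulting object has the same fields as the sample document, minus `_id`, which MongoDB would generate on insert.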
Dynamic Queries
Find all logs for a URL
db.logs.find( { 'path' : '/index.html' } )
Find all logs for a time range
db.logs.find( { 'time' :
  { '$gte' : new Date(2012,0),
    '$lt' : new Date(2012,1) } } );
Find all logs for a host over a range of dates
db.logs.find( {
  'host' : '127.0.0.1',
  'time' : { '$gte' : new Date(2012,0),
             '$lt' : new Date(2012,1) } } );
Aggregation Framework
Requests per day by URL
db.logs.aggregate( [
  { '$match': {
      'time': {
        '$gte': new Date(2012,0),
        '$lt': new Date(2012,1) } } },
  { '$project': {
      'path': 1,
      'date': {
        'y': { '$year': '$time' },
        'm': { '$month': '$time' },
        'd': { '$dayOfMonth': '$time' } } } },
  { '$group': {
      '_id': {
        'p': '$path',
        'y': '$date.y',
        'm': '$date.m',
        'd': '$date.d' },
      'hits': { '$sum': 1 } } }
])
Aggregation Framework
{
  'ok': 1,
  'result': [
    { '_id': {'p':'/index.html','y': 2012,'m': 1,'d': 1 },'hits': 124 },
    { '_id': {'p':'/index.html','y': 2012,'m': 1,'d': 2 },'hits': 245 },
    { '_id': {'p':'/index.html','y': 2012,'m': 1,'d': 3 },'hits': 322 },
    { '_id': {'p':'/index.html','y': 2012,'m': 1,'d': 4 },'hits': 175 },
    { '_id': {'p':'/index.html','y': 2012,'m': 1,'d': 5 },'hits': 94 }
  ]
}
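The same match, project and group steps can be traced by hand over a few in-memory documents. This is only a sketch with invented log entries; it uses UTC date parts and does not call MongoDB.

```javascript
// Hypothetical log documents.
const logs = [
  { path: "/index.html", time: new Date("2012-01-01T10:00:00Z") },
  { path: "/index.html", time: new Date("2012-01-01T11:30:00Z") },
  { path: "/index.html", time: new Date("2012-01-02T09:00:00Z") },
  { path: "/about.html", time: new Date("2012-01-02T09:05:00Z") },
  { path: "/index.html", time: new Date("2012-02-01T00:00:00Z") }, // outside the range
];
const start = new Date(Date.UTC(2012, 0, 1));
const end = new Date(Date.UTC(2012, 1, 1));

const hitsByKey = new Map();
for (const log of logs) {
  if (log.time < start || log.time >= end) continue; // match step
  const id = {                                       // project step
    p: log.path,
    y: log.time.getUTCFullYear(),
    m: log.time.getUTCMonth() + 1, // 1-based month, like $month
    d: log.time.getUTCDate(),
  };
  const key = JSON.stringify(id);
  const entry = hitsByKey.get(key) || { _id: id, hits: 0 }; // group step
  entry.hits += 1;
  hitsByKey.set(key, entry);
}
const perDay = [...hitsByKey.values()];
console.log(perDay);
```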
Roll-ups with map-reduce
Design Pattern
Map Reduce – Map Phase
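Generate hourly rollups from log data
// Assumes each log document stores its timestamp in a `ts` field
// (the sample document above calls it `time`).
var map = function() {
  // Key: the page path plus the timestamp truncated to the hour
  var key = {
    p: this.path,
    d: new Date(
      this.ts.getFullYear(),
      this.ts.getMonth(),
      this.ts.getDate(),
      this.ts.getHours(),
      0, 0, 0) };
  emit( key, { hits: 1 } );
};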
Map Reduce – Reduce Phase
Generate hourly rollups from log data
var reduce = function(key, values) {
  // Sum the partial hit counts emitted for this key
  var r = { hits: 0 };
  values.forEach(function(v) {
    r.hits += v.hits;
  });
  return r;
};
Map Reduce
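Generate hourly rollups from log data
// last_run holds the cutoff saved by the previous run.
// '$gte' keeps a document stamped exactly at last_run from being skipped.
cutoff = new Date(2012,0,1)
query = { 'ts': { '$gte': last_run, '$lt': cutoff } }
db.logs.mapReduce( map, reduce, {
  'query': query,
  'out': { 'reduce' : 'stats.hourly' } } )
last_run = cutoff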
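The incremental behaviour of the 'reduce' output mode can be imitated in memory: new results are folded into whatever is already stored under the same key. The sketch below is not MongoDB's mapReduce; `mapReduce`, `mapHour`, `reduceHits` and the `hourly` Map are hypothetical, and `emit` is passed in as an argument instead of being a global.

```javascript
// Minimal in-memory imitation of mapReduce with out: { reduce: ... }.
function mapReduce(docs, map, reduce, out) {
  const emitted = new Map();
  const emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!emitted.has(k)) emitted.set(k, { key, values: [] });
    emitted.get(k).values.push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  for (const [k, { key, values }] of emitted) {
    let value = values.length === 1 ? values[0] : reduce(key, values);
    // "reduce" output mode: fold the new value into any existing result.
    if (out.has(k)) value = reduce(key, [out.get(k).value, value]);
    out.set(k, { _id: key, value });
  }
}

function mapHour(emit) {
  // Truncate the timestamp to the hour (UTC) to build the rollup key.
  const hour = new Date(Date.UTC(this.ts.getUTCFullYear(), this.ts.getUTCMonth(),
    this.ts.getUTCDate(), this.ts.getUTCHours()));
  emit({ p: this.path, d: hour.toISOString() }, { hits: 1 });
}

function reduceHits(key, values) {
  return { hits: values.reduce((n, v) => n + v.hits, 0) };
}

const hourly = new Map(); // stands in for the stats.hourly collection
mapReduce([
  { path: "/index.html", ts: new Date("2012-01-01T00:10:00Z") },
  { path: "/index.html", ts: new Date("2012-01-01T00:50:00Z") },
], mapHour, reduceHits, hourly);
mapReduce([
  { path: "/index.html", ts: new Date("2012-01-01T00:59:00Z") },
  { path: "/index.html", ts: new Date("2012-01-01T01:00:00Z") },
], mapHour, reduceHits, hourly);
console.log([...hourly.values()]);
```

This only works because the reduce function returns the same shape it receives, so its own output can be reduced again on a later run.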
Map Reduce Output
> db.stats.hourly.find()
{ '_id': {'p':'/index.html','d':ISODate("2012-01-01T00:00:00Z") },
  'value': { 'hits': 124 } }
{ '_id': {'p':'/index.html','d':ISODate("2012-01-01T01:00:00Z") },
  'value': { 'hits': 245 } }
{ '_id': {'p':'/index.html','d':ISODate("2012-01-01T02:00:00Z") },
  'value': { 'hits': 322 } }
{ '_id': {'p':'/index.html','d':ISODate("2012-01-01T03:00:00Z") },
  'value': { 'hits': 175 } }
... More ...
Chained Map Reduce
Collection 1: Raw Logs
  -> Map Reduce (runs every hour) ->
Collection 2: Hourly Stats
  -> Map Reduce (runs every day) ->
Collection 3: Daily Stats
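The second stage of the chain reads the hourly results rather than the raw logs. A minimal in-memory sketch of that daily roll-up, assuming hourly documents keyed by path and an ISO timestamp string (the document shape and values here are illustrative):

```javascript
// Hypothetical contents of the hourly stats collection.
const hourlyStats = [
  { _id: { p: "/index.html", d: "2012-01-01T00:00:00.000Z" }, value: { hits: 124 } },
  { _id: { p: "/index.html", d: "2012-01-01T01:00:00.000Z" }, value: { hits: 245 } },
  { _id: { p: "/index.html", d: "2012-01-02T00:00:00.000Z" }, value: { hits: 322 } },
];

// Second stage: re-key each hourly document by day and sum the hits.
const dailyStats = new Map();
for (const { _id, value } of hourlyStats) {
  const day = _id.d.slice(0, 10); // "YYYY-MM-DD" prefix of the ISO timestamp
  const key = _id.p + "|" + day;
  const current = dailyStats.get(key) || { _id: { p: _id.p, d: day }, value: { hits: 0 } };
  current.value.hits += value.hits;
  dailyStats.set(key, current);
}
console.log([...dailyStats.values()]);
```

Each stage touches far fewer documents than the one before it, which is the point of chaining.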
Pre-aggregated
documents
Design Pattern
Pre-Aggregation
Data for URL / Date
{
  _id: "20001010/site-1/apache_pb.gif",
  metadata: {
    date: ISODate("2000-10-10T00:00:00Z"),
    site: "site-1",
    page: "/apache_pb.gif" },
  daily: 5468426,
  hourly: {
    "0": 227850,
    "1": 210231,
    ...
    "23": 20457 },
  minute: {
    "0": 3612,
    "1": 3241,
    ...
    "1439": 2819 }
}
Pre-Aggregation
Data for URL / Date
from datetime import datetime, time

# dt_utc (event timestamp in UTC), site, page and db are assumed to be defined
id_daily = dt_utc.strftime('%Y%m%d/') + site + page
hour = dt_utc.hour
minute = dt_utc.minute
# Get a datetime that only includes date info
d = datetime.combine(dt_utc.date(), time.min)
query = {
    '_id': id_daily,
    'metadata': { 'date': d, 'site': site, 'page': page } }
update = { '$inc': {
    'daily' : 1,
    'hourly.%d' % (hour,): 1,
    # minute-of-day index (0..1439), matching the document layout above
    'minute.%d' % (hour * 60 + minute,): 1 } }
db.stats.daily.update_one(query, update, upsert=True)
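For readers following the JavaScript examples, the construction of the upsert can be sketched as a pure function. `buildHitUpdate` is a hypothetical helper; it assumes UTC timestamps and the flat minute-of-day layout (0 to 1439) shown in the pre-aggregated document, and it only builds the filter and update objects without talking to a database.

```javascript
// Hypothetical helper: build the upsert filter and $inc update for one page hit.
function buildHitUpdate(site, page, when) {
  const pad = (n) => String(n).padStart(2, "0");
  const ymd = when.getUTCFullYear() + pad(when.getUTCMonth() + 1) + pad(when.getUTCDate());
  const hour = when.getUTCHours();
  const minuteOfDay = hour * 60 + when.getUTCMinutes();
  return {
    filter: { _id: ymd + "/" + site + page },
    update: {
      $inc: {
        daily: 1,
        ["hourly." + hour]: 1,
        ["minute." + minuteOfDay]: 1,
      },
    },
  };
}

const { filter, update } = buildHitUpdate(
  "site-1", "/apache_pb.gif", new Date("2000-10-10T20:55:36Z"));
console.log(filter, update);
```

A single $inc touches the daily, hourly and minute counters at once, so each page hit costs one write.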
Pre-Aggregation
Data for URL / Date
db.stats.daily.findOne(
  {'metadata': {'date':dt,
                'site':'site-1',
                'page':'/index.html'}},
  { 'minute': 1 }
);
Solution Architect, 10gen
Schema Design by Chad Tindel, Solution Architect, 10gen

  • 2. The goal Real Time Analytics Engine Real Time Analytics Engine Data Sourc e Data Sourc e Data Sourc e
  • 5. Aggregation - PipelinesAggregation - Pipelines • Aggregation requests specify a pipeline • A pipeline is a series of operations • Conceptually, the members of a collection are passed through a pipeline to produce a result – Similar to a Unix command-line pipe
  • 7. Aggregation - PipelinesAggregation - Pipelines db.collection.aggregate( [ {$match: … }, {$group: … }, {$limit: …}, etc ]
  • 8. Pipeline OperationsPipeline Operations • $match – Uses a query predicate (like .find({…})) as a filter { $match : { author : "dave" } } { $match : { score : { $gt : 50, $lte : 90 } } }
  • 9. Pipeline OperationsPipeline Operations • $project – Uses a sample document to determine the shape of the result (similar to .find()’s 2nd optional argument) • Include or exclude fields • Compute new fields – Arithmetic expressions, including built-in functions – Pull fields from nested documents to the top – Push fields from the top down into new virtual documents
  • 10. Pipeline OperationsPipeline Operations • $unwind – Hands out array elements one at a time { $unwind : {"$myarray" } } • $unwind “streams” arrays – Array values are doled out one at time in the context of their surrounding document – Makes it possible to filter out elements before returning
  • 11. Pipeline Operations • $group – Aggregates items into buckets defined by a key
  • 12. Grouping • $group aggregation expressions – Define a grouping key as the _id of the result – Total grouped column values: $sum – Average grouped column values: $avg – Collect grouped column values in an array or set: $push, $addToSet – Other functions • $min, $max, $first, $last
  • 13. Pipeline Operations • $sort – Sort documents – Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …} { $sort : {"total":-1} }
  • 14. Pipeline Operations • $limit – Only allow the specified number of documents to pass { $limit : 20 }
  • 15. Pipeline Operations • $skip – Skip over the specified number of documents { $skip : 10 }
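The pipeline idea on the slides above (documents flowing through a series of stages, much like a Unix pipe) can be sketched in plain Python with generators. This is only an illustration of the concept, not how MongoDB implements it; the function names and the `posts` sample data are assumptions made up for the example.

```python
from collections import defaultdict
from itertools import islice


def match(docs, predicate):
    """$match: keep only documents for which the predicate holds."""
    return (d for d in docs if predicate(d))


def unwind(docs, field):
    """$unwind: emit one copy of the document per array element."""
    for d in docs:
        for item in d.get(field, []):
            yield {**d, field: item}


def group_sum(docs, key, value):
    """$group with $sum: total `value` per distinct `key`."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key]] += d[value]
    return ({"_id": k, "total": v} for k, v in totals.items())


def sort(docs, field, direction=1):
    """$sort: 1 is ascending, -1 is descending."""
    return iter(sorted(docs, key=lambda d: d[field], reverse=direction < 0))


def skip(docs, n):
    """$skip: drop the first n documents."""
    return islice(docs, n, None)


def limit(docs, n):
    """$limit: pass at most n documents."""
    return islice(docs, n)


# Hypothetical sample collection.
posts = [
    {"author": "dave", "tags": ["db", "nosql"], "score": 60},
    {"author": "dave", "tags": ["db"], "score": 80},
    {"author": "ann", "tags": ["db", "web"], "score": 95},
]

# Each stage consumes the output of the previous one.
stage = match(posts, lambda d: 50 < d["score"] <= 90)
stage = unwind(stage, "tags")
stage = group_sum(stage, "tags", "score")
stage = sort(stage, "total", -1)
result = list(limit(stage, 20))
```

Because every stage is lazy except the grouping and sorting steps, documents stream through one at a time, which mirrors the "hands out array elements one at a time" description of $unwind.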
  • 16. Computed Expressions • Available in $project operations • Prefix expression language – Add two fields: $add:["$field1", "$field2"] – Provide a value for a missing field: $ifNull:["$field1", "$field2"] – Nesting: $add:["$field1", {$ifNull:["$field2", "$field3"]}] (continued)
  • 17. Computed Expressions (continued) • String functions – toUpper, toLower, substr • Date field extraction – Get year, month, day, hour, etc., from ISODate • Date arithmetic • Null value substitution (like MySQL ifnull(), Oracle nvl()) • Ternary conditional – Return one of two values based on a predicate • Other functions… – And we can easily add more as required
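To make the prefix expression language concrete, here is a minimal sketch of an evaluator for three of the operators named above ($add, $ifNull, $toUpper). The `evaluate` function is hypothetical and covers only this tiny subset; it is meant to show how nesting works, not to reproduce the server's behaviour. Unlike a real implementation, it evaluates both arguments of $ifNull eagerly.

```python
def evaluate(expr, doc):
    """Evaluate a small subset of prefix expressions against one document."""
    if isinstance(expr, str) and expr.startswith("$"):
        # "$field" is a reference to a field of the current document.
        return doc.get(expr[1:])
    if isinstance(expr, dict):
        # An operator expression has exactly one key: the operator name.
        (op, args), = expr.items()
        if not isinstance(args, list):
            args = [args]
        values = [evaluate(a, doc) for a in args]
        if op == "$add":
            return sum(values)
        if op == "$ifNull":
            return values[0] if values[0] is not None else values[1]
        if op == "$toUpper":
            return str(values[0]).upper()
        raise ValueError("unsupported operator: " + op)
    # Anything else is treated as a literal value.
    return expr


doc = {"field1": 2, "field3": 5, "name": "dave"}

# field2 is missing, so $ifNull falls back to field3.
nested = {"$add": ["$field1", {"$ifNull": ["$field2", "$field3"]}]}
total = evaluate(nested, doc)
shout = evaluate({"$toUpper": "$name"}, doc)
```

The nested case shows why the inner operator has to be its own document: each operator expression is a single-key object whose value is the argument list.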
  • 18. Sample data
      Original Event Data
        127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
      As JSON
        doc = {
          _id: ObjectId('4f442120eb03305789000000'),
          host: "127.0.0.1",
          time: ISODate("2000-10-10T20:55:36Z"),
          path: "/apache_pb.gif",
          referer: "http://www.example.com/start.html",
          user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
        }
      Insert to MongoDB
        db.logs.insert( doc )
  • 19. Dynamic Queries
      Find all logs for a URL
        db.logs.find( { 'path' : '/index.html' } )
      Find all logs for a time range
        db.logs.find( { 'time' : { '$gte' : new Date(2012,0), '$lt' : new Date(2012,1) } } );
      Find all logs for a host over a range of dates
        db.logs.find( { 'host' : '127.0.0.1', 'time' : { '$gte' : new Date(2012,0), '$lt' : new Date(2012,1) } } );
  • 20. Aggregation Framework
      Requests per day by URL
        db.logs.aggregate( [
          { '$match': { 'time': { '$gte': new Date(2012,0), '$lt': new Date(2012,1) } } },
          { '$project': { 'path': 1, 'date': { 'y': { '$year': '$time' }, 'm': { '$month': '$time' }, 'd': { '$dayOfMonth': '$time' } } } },
          { '$group': { '_id': { 'p': '$path', 'y': '$date.y', 'm': '$date.m', 'd': '$date.d' }, 'hits': { '$sum': 1 } } }
        ] )
  • 21. Aggregation Framework
        { 'ok': 1,
          'result': [
            { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 1 }, 'hits': 124 },
            { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 2 }, 'hits': 245 },
            { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 3 }, 'hits': 322 },
            { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 4 }, 'hits': 175 },
            { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 5 }, 'hits': 94 }
          ]
        }
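The "requests per day by URL" computation can be mirrored in a few lines of plain Python, which may help when checking what the $match, $project and $group stages are expected to return. The `logs` list below is invented sample data, and the result is sorted only to make the output deterministic; the aggregation framework itself does not sort unless asked to.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log documents using the field names from the sample data slide.
logs = [
    {"path": "/index.html", "time": datetime(2012, 1, 1, 9, 30)},
    {"path": "/index.html", "time": datetime(2012, 1, 1, 17, 5)},
    {"path": "/index.html", "time": datetime(2012, 1, 2, 8, 0)},
    {"path": "/about.html", "time": datetime(2012, 2, 1, 8, 0)},  # outside the range
]

start, end = datetime(2012, 1, 1), datetime(2012, 2, 1)

# $match filters on the time range; the tuple plays the role of the
# projected date parts plus path that $group uses as its _id.
hits = Counter(
    (d["path"], d["time"].year, d["time"].month, d["time"].day)
    for d in logs
    if start <= d["time"] < end
)

result = [
    {"_id": {"p": p, "y": y, "m": m, "d": day}, "hits": n}
    for (p, y, m, day), n in sorted(hits.items())
]
```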
  • 23. Map Reduce – Map Phase
      Generate hourly rollups from log data
        var map = function() {
          var key = {
            p: this.path,
            d: new Date( this.ts.getFullYear(), this.ts.getMonth(), this.ts.getDate(), this.ts.getHours(), 0, 0, 0 )
          };
          emit( key, { hits: 1 } );
        }
  • 24. Map Reduce – Reduce Phase
      Generate hourly rollups from log data
        var reduce = function(key, values) {
          var r = { hits: 0 };
          values.forEach(function(v) { r.hits += v.hits; });
          return r;
        }
  • 25. Map Reduce
      Generate hourly rollups from log data
        cutoff = new Date(2012,0,1)
        query = { 'ts': { '$gt': last_run, '$lt': cutoff } }
        db.logs.mapReduce( map, reduce, { 'query': query, 'out': { 'reduce': 'stats.hourly' } } )
        last_run = cutoff
  • 26. Map Reduce Output
        > db.stats.hourly.find()
        { '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T00:00:00Z") }, 'value': { 'hits': 124 } }
        { '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T01:00:00Z") }, 'value': { 'hits': 245 } }
        { '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T02:00:00Z") }, 'value': { 'hits': 322 } }
        { '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T03:00:00Z") }, 'value': { 'hits': 175 } }
        ... More ...
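The incremental rollup on the Map Reduce slides relies on the 'reduce' output mode: when a key already exists in the output collection, the stored value is reduced together with the new values. A rough pure-Python sketch of that behaviour follows. The `map_reduce` helper and the dictionary standing in for the stats.hourly collection are assumptions for illustration, and the sketch calls the reduce function even for single-value keys, which is harmless here because summing is repeatable.

```python
from collections import defaultdict
from datetime import datetime


def map_fn(doc):
    """Emit one {hits: 1} per log entry, keyed by path and hour."""
    ts = doc["ts"]
    key = (doc["path"], datetime(ts.year, ts.month, ts.day, ts.hour))
    yield key, {"hits": 1}


def reduce_fn(key, values):
    """Sum the hit counts; the output has the same shape as the input values."""
    return {"hits": sum(v["hits"] for v in values)}


def map_reduce(docs, out):
    """Run map and reduce, merging into `out` like the 'reduce' output mode."""
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            emitted[key].append(value)
    for key, values in emitted.items():
        if key in out:
            # Re-reduce the stored value together with the new ones.
            values = [out[key]] + values
        out[key] = reduce_fn(key, values)
    return out


stats_hourly = {}  # stands in for the stats.hourly collection

# First run: two hits in the 00:00 hour.
map_reduce(
    [
        {"path": "/index.html", "ts": datetime(2012, 1, 1, 0, 5)},
        {"path": "/index.html", "ts": datetime(2012, 1, 1, 0, 40)},
    ],
    stats_hourly,
)
first_run = dict(stats_hourly)

# Second run: only new log entries are processed, and totals are merged.
map_reduce(
    [
        {"path": "/index.html", "ts": datetime(2012, 1, 1, 0, 59)},
        {"path": "/index.html", "ts": datetime(2012, 1, 1, 1, 10)},
    ],
    stats_hourly,
)
```

This is also why the reduce function returns a value shaped like the values it receives: its own output can be fed back in on a later run.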
  • 27. Chained Map Reduce [diagram: Collection 1: Raw Logs → Map Reduce (runs every hour) → Collection 2: Hourly Stats → Map Reduce (runs every day) → Collection 3: Daily Stats]
  • 29. Pre-Aggregation
      Data for URL / Date
        {
          _id: "20001010/site-1/apache_pb.gif",
          metadata: {
            date: ISODate("2000-10-10T00:00:00Z"),
            site: "site-1",
            page: "/apache_pb.gif"
          },
          daily: 5468426,
          hourly: { "0": 227850, "1": 210231, ... "23": 20457 },
          minute: { "0": 3612, "1": 3241, ... "1439": 2819 }
        }
  • 30. Pre-Aggregation
      Data for URL / Date
        from datetime import datetime, time

        id_daily = dt_utc.strftime('%Y%m%d/') + site + page
        hour = dt_utc.hour
        minute = dt_utc.minute
        # Get a datetime that only includes date info
        d = datetime.combine(dt_utc.date(), time.min)
        query = {
            '_id': id_daily,
            'metadata': { 'date': d, 'site': site, 'page': page } }
        # Minute keys run from 0 to 1439, matching the document layout
        update = { '$inc': {
            'daily': 1,
            'hourly.%d' % (hour,): 1,
            'minute.%d' % (hour * 60 + minute,): 1 } }
        db.stats.daily.update(query, update, upsert=True)
  • 31. Pre-Aggregation
      Data for URL / Date
        db.stats.daily.findOne(
          { 'metadata': { 'date': dt, 'site': 'site-1', 'page': '/index.html' } },
          { 'minute': 1 }
        );
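The pre-aggregation pattern boils down to building one upsert per hit that increments the daily, hourly and per-minute counters in a single document. The sketch below builds only the query and update documents, so it runs without a database. `build_upsert` is a hypothetical helper, and it assumes the flat minute-of-day keys ("0" through "1439") shown in the pre-aggregated document.

```python
from datetime import datetime, time


def build_upsert(dt_utc, site, page):
    """Return the (query, update) pair for one hit on `page` at `dt_utc`."""
    # Midnight of the same day identifies the daily document.
    day = datetime.combine(dt_utc.date(), time.min)
    query = {
        "_id": dt_utc.strftime("%Y%m%d/") + site + page,
        "metadata": {"date": day, "site": site, "page": page},
    }
    # Assumption: minute counters are keyed by minute of the day (0..1439).
    minute_of_day = dt_utc.hour * 60 + dt_utc.minute
    update = {
        "$inc": {
            "daily": 1,
            "hourly.%d" % dt_utc.hour: 1,
            "minute.%d" % minute_of_day: 1,
        }
    }
    return query, update


# The hit from the sample log line, expressed in UTC.
query, update = build_upsert(
    datetime(2000, 10, 10, 20, 55, 36), "site-1", "/apache_pb.gif"
)
```

With a live connection, the pair would be passed to an upsert call on the daily stats collection, so the first hit of the day creates the document and later hits only increment counters.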