Consulting Engineer, MongoDB
Bryan Reinero
#ConferenceHashTag
Time Series Data - Part 2
Aggregations in Action
Real Time Traffic Data Project
Our network of 16,000 speed sensors reports data every minute.
What we want from our data
• Charting and Trending
• Historical & Predictive Analysis
• Real Time Traffic Dashboard
Document Structure
{ _id: ObjectId("5382ccdd58db8b81730344e2"),
  linkId: 900006,
  date: ISODate("2014-03-12T17:00:00Z"),
  data: [
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    ...
  ],
  conditions: {
    status: "Snow / Ice Conditions",
    pavement: "Icy Spots",
    weather: "Light Snow"
  }
}
Sample Document Structure
A compound, unique index (on linkId and date) identifies the individual document.
Sample Document Structure
Storing the link ID and hour in _id saves an extra index, because _id is always indexed:
{ _id: "900006:14031217",
  data: [
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    { speed: NaN, time: NaN },
    ...
  ],
  conditions: {
    status: "Snow / Ice Conditions",
    pavement: "Icy Spots",
    weather: "Light Snow"
  }
}
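The sample _id packs the link ID and the hour (yymmddHH) into one string, so "900006:14031217" corresponds to link 900006 at 2014-03-12 17:00, the same hour as the date field in the first sample document. A minimal sketch of building such a key, assuming UTC timestamps; the helper names are hypothetical, not part of the talk:

```javascript
// Hypothetical helpers: build an hourly _id of the form "<linkId>:<yymmddHH>"
// from a Date, using its UTC fields.
function pad2(n) {
  return String(n).padStart(2, "0");
}

function hourlyId(linkId, date) {
  const yy = pad2(date.getUTCFullYear() % 100);
  const mm = pad2(date.getUTCMonth() + 1); // getUTCMonth() is 0-based
  const dd = pad2(date.getUTCDate());
  const hh = pad2(date.getUTCHours());
  return `${linkId}:${yy}${mm}${dd}${hh}`;
}

console.log(hourlyId(900006, new Date("2014-03-12T17:00:00Z"))); // 900006:14031217
```

Within one link the time part is fixed width, so lexicographic order on _id follows chronological order; that is what makes prefix range queries on _id meaningful.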
Sample Document Structure
Range queries match a prefix of _id, for example link 900006 in March 2014:
/^900006:1403/
The regex must be left-anchored and case-sensitive so that the query can use the _id index.
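A prefix range such as "link 900006, March 2014" can also be built programmatically. This sketch (hypothetical helper names) anchors the pattern with ^, escapes any regex metacharacters in the prefix, and omits the i flag so the match stays case-sensitive; in the mongo shell the resulting RegExp could be passed as { _id: re }.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Left-anchored ("^"), case-sensitive (no "i" flag) prefix match on _id.
function idPrefixRegex(linkId, timePrefix) {
  return new RegExp("^" + escapeRegex(`${linkId}:${timePrefix}`));
}

const march2014 = idPrefixRegex(900006, "1403");
console.log(march2014.source);                   // ^900006:1403
console.log(march2014.test("900006:14031217"));  // true
console.log(march2014.test("1900006:14031217")); // false
```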
Sample Document Structure
The data field is a pre-allocated, 60-element array of per-minute samples.
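Pre-allocation means the hour's document is created with all 60 slots at once, and each incoming sample then overwrites one slot, so the document does not grow as samples arrive. A sketch under that reading; the helper names are hypothetical, and the returned object is the kind of update document one might pass to an update call on linkData:

```javascript
// Build an hourly document whose 60 per-minute slots are filled with NaN
// placeholders up front, so later writes modify slots in place.
function preallocatedHour(id) {
  return {
    _id: id,
    data: Array.from({ length: 60 }, () => ({ speed: NaN, time: NaN })),
  };
}

// Update document for one per-minute sample: dot notation with a numeric
// index ("data.5") addresses a single array element.
function minuteUpdate(minute, speed, time) {
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
    throw new RangeError("minute must be an integer in 0..59");
  }
  return { $set: { [`data.${minute}`]: { speed: speed, time: time } } };
}

const hour = preallocatedHour("900006:14031217");
console.log(hour.data.length);                         // 60
console.log(JSON.stringify(minuteUpdate(5, 52, 237))); // {"$set":{"data.5":{"speed":52,"time":237}}}
```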
Charts
[Chart: values on a 0 to 70 scale plotted over time, from Mon Mar 10 2014 04:57 to Wed Mar 12 2014 10:46]
db.linkData.find( { _id : /^20484097:2014031/ } )
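To chart the documents returned by that query, each hourly document's per-minute array can be flattened into timestamped points. A sketch, assuming each document also carries the start of its hour in date (as the first sample document does) and uses NaN placeholders for minutes that never reported; the helper and sample data are invented for illustration:

```javascript
// Flatten hourly documents into chart points, one per filled minute.
function toChartPoints(docs) {
  const points = [];
  for (const doc of docs) {
    doc.data.forEach((sample, minute) => {
      if (Number.isNaN(sample.speed)) return; // slot never written
      const t = new Date(doc.date.getTime() + minute * 60 * 1000);
      points.push({ t: t.toISOString(), speed: sample.speed });
    });
  }
  // ISO-8601 strings of the same format sort chronologically.
  points.sort((a, b) => (a.t < b.t ? -1 : a.t > b.t ? 1 : 0));
  return points;
}

// Shortened example: a real document would have 60 slots.
const docs = [
  {
    date: new Date("2014-03-10T04:00:00Z"),
    data: [{ speed: 50, time: 230 }, { speed: NaN, time: NaN }, { speed: 48, time: 240 }],
  },
];
console.log(toChartPoints(docs)); // minutes 0 and 2; minute 1 is skipped
```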
Rollups
{ _id: "20484097:20140204",
  hours: [
    { speed: { sum: 1889, count: 60 },
      time: { sum: 20562, count: 60 },
      conditions: {
        status: "Snow / Ice Conditions",
        pavement: "Icy Spots",
        weather: "Light Snow"
      }
    },
    { speed: { sum: 1892, count: 60 },
      time: { sum: 20442, count: 60 },
      conditions: {
        status: "Snow / Ice Conditions",
        pavement: "Slush",
        weather: "Light Snow"
      }
    }
  ]
}
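Storing { sum, count } instead of a finished average keeps rollups composable: an hour's entry can be built from its minute samples, and coarser rollups can be built by adding sums and counts. A plain-JavaScript sketch of that arithmetic (hypothetical helper names), using the two hourly entries from the rollup document above:

```javascript
// Summarise one hour of per-minute samples as { sum, count } pairs,
// skipping NaN placeholders for minutes that never reported.
function rollupHour(data) {
  const speed = { sum: 0, count: 0 };
  const time = { sum: 0, count: 0 };
  for (const s of data) {
    if (!Number.isNaN(s.speed)) { speed.sum += s.speed; speed.count += 1; }
    if (!Number.isNaN(s.time)) { time.sum += s.time; time.count += 1; }
  }
  return { speed, time };
}

// Sums and counts combine exactly; averages could not be merged
// correctly without their counts.
function mergeRollups(a, b) {
  return {
    speed: { sum: a.speed.sum + b.speed.sum, count: a.speed.count + b.speed.count },
    time: { sum: a.time.sum + b.time.sum, count: a.time.count + b.time.count },
  };
}

const average = (r) => (r.count === 0 ? NaN : r.sum / r.count);

const day = mergeRollups(
  { speed: { sum: 1889, count: 60 }, time: { sum: 20562, count: 60 } },
  { speed: { sum: 1892, count: 60 }, time: { sum: 20442, count: 60 } }
);
console.log(day.speed); // { sum: 3781, count: 120 }
```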
Document retention
Doc per hour: 2 days
Doc per day: 2 months
Doc per month: 1 year
Analysis with the Aggregation Framework
Pipelining operations
grep | sort | uniq
Piping command-line operations
Pipelining operations
$match | $group | $sort
Piping aggregation operations: a stream of documents goes in, result documents come out
What is the average speed for a given road segment?
> db.linkData.aggregate( [
    // Select documents on the target segment
    { $match: { "_id" : /^20484097:/ } },
    // Keep only the fields we really need
    { $project: { "data.speed": 1, linkId: 1 } },
    // $unwind emits one document per element of the data array
    { $unwind: "$data" },
    // Use the handy $avg operator
    { $group: { _id: "$linkId", ave: { $avg: "$data.speed" } } }
  ] );
{ "_id" : 20484097, "ave" : 47.067650676506766 }
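To see what each stage contributes, the pipeline can be mimicked over an in-memory array. The sample documents and stage helpers below are invented for illustration; they reproduce the behaviour of $match with a left-anchored regex, $project, $unwind and $group with $avg for this one query only, not MongoDB's implementation:

```javascript
// Plain-JavaScript stand-ins for the four pipeline stages.
const match = (docs, re) => docs.filter((d) => re.test(d._id));
const project = (docs) =>
  docs.map((d) => ({ _id: d._id, linkId: d.linkId, data: d.data.map((s) => ({ speed: s.speed })) }));
// One output document per array element, with data replaced by that element.
const unwind = (docs) => docs.flatMap((d) => d.data.map((s) => ({ ...d, data: s })));
function groupAvg(docs) {
  const acc = new Map(); // linkId -> { sum, n }
  for (const d of docs) {
    const a = acc.get(d.linkId) || { sum: 0, n: 0 };
    a.sum += d.data.speed;
    a.n += 1;
    acc.set(d.linkId, a);
  }
  return [...acc].map(([linkId, a]) => ({ _id: linkId, ave: a.sum / a.n }));
}

const sample = [
  { _id: "20484097:14031217", linkId: 20484097, data: [{ speed: 40, time: 300 }, { speed: 50, time: 250 }] },
  { _id: "20484097:14031218", linkId: 20484097, data: [{ speed: 60, time: 200 }] },
  { _id: "900006:14031217", linkId: 900006, data: [{ speed: 10, time: 900 }] },
];
console.log(groupAvg(unwind(project(match(sample, /^20484097:/))))); // [ { _id: 20484097, ave: 50 } ]
```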
More Sophisticated Pipelines:
average speed with variance
// Excerpt: assumes earlier stages produced speeds (an array of samples)
// and meanSpd (their mean) on each document
{ "$project" : {
    mean: "$meanSpd",
    spdDiffSqrd : {
      "$map" : {
        "input": {
          "$map" : {
            "input" : "$speeds",
            "as" : "samp",
            "in" : { "$subtract" : [ "$$samp", "$meanSpd" ] }
          }
        },
        "as": "df",
        "in": { "$multiply": [ "$$df", "$$df" ] }
      }
    }
} },
{ "$unwind": "$spdDiffSqrd" },
{ "$group": { "_id": "$mean", "variance": { "$avg": "$spdDiffSqrd" } } }
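Because the final stage takes $avg of the squared deviations, this pipeline computes the population variance (dividing by n, not n - 1). The same arithmetic in plain JavaScript, with made-up sample speeds:

```javascript
// Population variance as the pipeline computes it: subtract the mean from
// every sample, square the differences, then average the squares.
function populationVariance(speeds) {
  const mean = speeds.reduce((s, x) => s + x, 0) / speeds.length;
  const sqDiffs = speeds.map((x) => (x - mean) * (x - mean));
  return { mean, variance: sqDiffs.reduce((s, x) => s + x, 0) / sqDiffs.length };
}

console.log(populationVariance([2, 4, 4, 4, 5, 5, 7, 9])); // { mean: 5, variance: 4 }
```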
Historic Analysis
How do weather and road conditions affect traffic?
The ask: what are the average speeds per weather, status and pavement condition?
MapReduce
function map() {
  for ( var i = 0; i < this.data.length; i++ ) {
    // Emit each sample once per condition. The value has the same
    // shape as the reduce output, so reduce can be re-applied safely.
    var value = { count : 1, speedSum : this.data[i].speed };
    emit( this.conditions.weather, value );
    emit( this.conditions.status, value );
    emit( this.conditions.pavement, value );
  }
}
Each sample yields one key/value pair per condition, for example:
"Snow", { count: 1, speedSum: 34 }
"Icy spots", { count: 1, speedSum: 34 }
"Delays", { count: 1, speedSum: 34 }
Values emitted for the same key are grouped before reduce is called:
Weather: "Rain", speed: 44
Weather: "Rain", speed: 39
Weather: "Rain", speed: 46
MapReduce
function reduce ( key, values ) {
  // values may contain the output of an earlier reduce call, so add up
  // counts rather than assuming each value is a single sample
  var result = { count : 0, speedSum : 0 };
  values.forEach( function( v ) {
    result.speedSum += v.speedSum;
    result.count += v.count;
  });
  return result;
}
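MongoDB may call reduce again on its own earlier output (re-reduce), so the emitted value and the reduce result need the same shape and the same answer regardless of grouping. This Node-runnable sketch checks that property for a { count, speedSum } design; unlike MongoDB's map, which reads this and calls a global emit, mapDoc here receives the document and an emit callback explicitly, and the tiny driver is hypothetical:

```javascript
// Map: one { count, speedSum } value per sample and per condition.
function mapDoc(doc, emit) {
  for (const sample of doc.data) {
    for (const key of [doc.conditions.weather, doc.conditions.status, doc.conditions.pavement]) {
      emit(key, { count: 1, speedSum: sample.speed });
    }
  }
}

// Reduce: output has the same shape as its inputs, so it can be
// re-applied to partial results without changing the answer.
function reduce(key, values) {
  const result = { count: 0, speedSum: 0 };
  for (const v of values) {
    result.count += v.count;
    result.speedSum += v.speedSum;
  }
  return result;
}

// Minimal driver: group emitted values by key, then reduce each group.
function mapReduce(docs) {
  const groups = new Map();
  const emit = (k, v) => groups.set(k, [...(groups.get(k) || []), v]);
  docs.forEach((d) => mapDoc(d, emit));
  return [...groups].map(([k, vs]) => ({ _id: k, value: reduce(k, vs) }));
}

const rain = [{ count: 1, speedSum: 44 }, { count: 1, speedSum: 39 }, { count: 1, speedSum: 46 }];
console.log(reduce("Rain", rain));                                        // { count: 3, speedSum: 129 }
console.log(reduce("Rain", [reduce("Rain", rain.slice(0, 2)), rain[2]])); // { count: 3, speedSum: 129 }
```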
Results
results: [
{
"_id" : "Generally Clear and Dry Conditions",
"value" : {
"count" : 902,
"speedSum" : 45100
}
},
{
"_id" : "Icy Spots",
"value" : {
"count" : 242,
"speedSum" : 9438
}
},
{
"_id" : "Light Snow",
"value" : {
"count" : 122,
"speedSum" : 7686
}
},
{
"_id" : "No Report",
"value" : {
"count" : 782,
"speedSum" : NaN
}
}
]
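The reduce output holds sums and counts, not averages; dividing speedSum by count per key gives the average speed (MongoDB's mapReduce also accepts a finalize function, which is one place this could be done server-side). A client-side sketch over the result values shown above, copied as-is; the NaN for "No Report" likely stems from unfilled NaN placeholders, and is mapped to null here:

```javascript
// Derive an average speed per condition from MapReduce output documents.
function averages(results) {
  return results.map((r) => {
    const avg = r.value.speedSum / r.value.count;
    return { condition: r._id, avgSpeed: Number.isFinite(avg) ? avg : null };
  });
}

const results = [
  { _id: "Generally Clear and Dry Conditions", value: { count: 902, speedSum: 45100 } },
  { _id: "Icy Spots", value: { count: 242, speedSum: 9438 } },
  { _id: "Light Snow", value: { count: 122, speedSum: 7686 } },
  { _id: "No Report", value: { count: 782, speedSum: NaN } },
];
console.log(averages(results).map((a) => a.avgSpeed)); // [ 50, 39, 63, null ]
```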
Processing Large Data Sets
• Need to break data into smaller pieces
• Process data across multiple nodes
[Diagram: the data set split into pieces processed on multiple Hadoop nodes]
Benefits of the Hadoop Connector
• Increased parallelism
• Access to analytics libraries
• Separation of concerns
• Integrates with existing tool chains
Real-time Dashboard
• Drivers will access the data via the web, mobile devices, and navigation systems
• We need to provide the current average speed, travel time, and weather per road segment
Current Real-Time Conditions
{ _id : "I-87:10656",
  description : "NYS Thruway Harriman Section Exits 14A - 16",
  update : ISODate("2013-10-10T23:06:37.000Z"),
  speeds : [ 52, 49, 45, 51, ... ],
  times : [ 237, 224, 246, 233, ... ],
  pavement: "Wet Spots",
  status: "Wet Conditions",
  weather: "Light Rain",
  averageSpeed: 50.23,
  averageTime: 234,
  maxSafeSpeed: 53.1,
  location : {
    "type" : "LineString",
    "coordinates" : [
      [ -74.056, 41.098 ],
      [ -74.077, 41.104 ]
    ]
  }
}
• speeds and times: the last ten minutes of samples
• averageSpeed, averageTime, maxSafeSpeed: pre-aggregated metrics
• location: the geo-spatially indexed road segment (a GeoJSON LineString)
Maintaining the current conditions
db.linksAvg.update(
  { "_id" : linkId },
  { "$set" : { "update" : date },
    "$push" : {
      "times" : { "$each" : [ time ], "$slice" : -10 },
      "speeds" : { "$each" : [ speed ], "$slice" : -10 }
    }
  })
Each update appends the new value to each array and trims it to the ten most recent elements, discarding the oldest.
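A $push with $each and a negative $slice appends the new values and then keeps only the last N elements ($slice must be used together with $each). This plain-JavaScript sketch mimics that behaviour and also recomputes a window average; recomputing averageSpeed this way is an assumption for illustration, since the update shown does not touch the average fields:

```javascript
// Mimic { $push: { field: { $each: values, $slice: -keep } } }: append the
// new values, then keep only the last `keep` elements.
function pushSlice(arr, values, keep = 10) {
  return arr.concat(values).slice(-keep);
}

// Recompute a pre-aggregated average from the retained window.
function windowAverage(arr) {
  return arr.length === 0 ? NaN : arr.reduce((s, x) => s + x, 0) / arr.length;
}

let speeds = [];
for (let minute = 1; minute <= 12; minute++) {
  speeds = pushSlice(speeds, [minute]); // one new sample per minute
}
console.log(speeds.join(","));      // 3,4,5,6,7,8,9,10,11,12
console.log(windowAverage(speeds)); // 7.5
```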
Putting it all together
Patterns common to time series data:
• You need to store and manage an incoming stream of data samples
• You need to compute derived data sets based on these samples
• You need low-latency access to up-to-date data
Introducing the High Volume Data Feed
HVDF: Reference Implementation
[Diagram: Screech, the High Volume Data Feed engine]
• REST Service API: incoming sample stream via POST /feed/channel/data; real-time queries via GET /feed/channeldata?time=XXX&range=YYY
• Processor plugins: inline, batch, stream
• Channel data storage: raw channel data, aggregated rollup T1, aggregated rollup T2
• Query processor, streaming spout, custom stream-processing logic
HVDF:
https://github.com/10gen-labs/hvdf
Hadoop Connector:
https://github.com/mongodb/mongo-hadoop
Consulting Engineer, MongoDB Inc.
Bryan Reinero
#MongoDBWorld
Thank You

Your startup on AWS - How to architect and maintain a Lean and Mean account J...
angelo60207
 
PyData - Graph Theory for Multi-Agent Integration
PyData - Graph Theory for Multi-Agent Integration
barqawicloud
 

MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Aggregation Framework and Hadoop

Editor's Notes

  • #3, #44, #45, #46: Reports (grouping, summing, averaging); analytics (incremental reporting, rollups); analysis (trends, segmentation, anomalies); analytics (regression, forecasting, filtering); warehousing (long-term storage and simplified querying).
  • #7, #8, #9, #10, #11, #13: Compound unique index on linkId and interval. An update field is used to identify new documents for aggregation.
  • #12, #18–#23, #39–#42: Priority: a floating-point number between 0 and 1000. The highest-priority member that is up to date wins an election, where "up to date" means within 10 seconds of the primary. If a higher-priority member catches up, it will force an election and win. Slave delay: the member lags behind the master by a configurable time delay, is automatically hidden from clients, and protects against operator errors such as fat-fingering or an application corrupting data.
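The notes for slides #7 to #13 identify each document by linkId plus an interval. In the mongo shell that compound unique index would be created with something like `db.linkData.createIndex({ linkId: 1, date: 1 }, { unique: true })`, where `linkData` is a hypothetical collection name. The deck's alternative layout folds both values into a single string `_id` such as "900006:14031217" and queries ranges with a left-anchored, case-sensitive regex. The sketch below shows that key scheme in plain JavaScript; the helper names `hourlyBucketId` and `bucketPrefixRegex` are hypothetical, and it assumes numeric link ids and UTC hours.

```javascript
// Sketch: build the "linkId:YYMMDDHH" _id used by the hourly-bucket layout,
// plus a left-anchored regex prefix for range queries on that key.
// Function names are hypothetical, for illustration only.

// Zero-pad a number to two digits.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Encode a sensor link id and the start of an hour (in UTC) as a string key.
// Example: linkId 900006 at 2014-03-12T17:00:00Z -> "900006:14031217".
function hourlyBucketId(linkId, date) {
  const yy = pad2(date.getUTCFullYear() % 100);
  const mm = pad2(date.getUTCMonth() + 1); // getUTCMonth() is 0-based
  const dd = pad2(date.getUTCDate());
  const hh = pad2(date.getUTCHours());
  return `${linkId}:${yy}${mm}${dd}${hh}`;
}

// Left-anchored, case-sensitive regex matching every bucket of one link whose
// key starts with the given date prefix (e.g. "1403" = March 2014).
// Assumes digit-only inputs, so no regex escaping is performed.
function bucketPrefixRegex(linkId, datePrefix) {
  return new RegExp(`^${linkId}:${datePrefix}`);
}

const id = hourlyBucketId(900006, new Date(Date.UTC(2014, 2, 12, 17)));
const march2014 = bucketPrefixRegex(900006, "1403");
console.log(id, march2014.test(id)); // prints "900006:14031217 true"
```

Keeping the anchor (`^`) and avoiding the case-insensitive flag is what lets a prefix match like `/^900006:1403/` stay an index range scan on `_id`, which is why the single-key layout can save the extra compound index.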
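The notes pair an update field with incremental reporting and rollups: only documents changed since the last run are re-aggregated. Server-side this would typically be an aggregation pipeline using `$match`, `$unwind` and `$group` with `$avg`; the sketch below does the same arithmetic in plain JavaScript so the logic is visible. It assumes a hypothetical `updated` Date on each document and the deck's pre-allocated 60-element `data` array, where minutes without a reading still hold `NaN` and must be skipped before averaging.

```javascript
// Sketch of an incremental rollup over hourly bucket documents.
// Assumptions (hypothetical, not from the deck): each document carries an
// `updated` Date, and `data` holds 60 per-minute { speed, time } samples
// pre-filled with NaN until a reading arrives.

// Average of the real (non-NaN) speed samples in one hourly bucket,
// or null when the hour has no readings yet.
function hourlyAverageSpeed(doc) {
  const speeds = doc.data
    .map((sample) => sample.speed)
    .filter((speed) => !Number.isNaN(speed));
  if (speeds.length === 0) return null;
  const total = speeds.reduce((sum, speed) => sum + speed, 0);
  return total / speeds.length;
}

// Roll up only documents changed since the previous run, keyed by _id.
function incrementalRollup(docs, lastRun) {
  const results = {};
  for (const doc of docs) {
    if (doc.updated > lastRun) {
      results[doc._id] = hourlyAverageSpeed(doc);
    }
  }
  return results;
}

// Pre-allocated hour: 60 placeholder samples, as in the deck's layout.
function emptyHour() {
  return Array.from({ length: 60 }, () => ({ speed: NaN, time: NaN }));
}

// Illustrative sample documents (made-up values).
const a = { _id: "900006:14031217", updated: new Date("2014-03-12T17:05:00Z"), data: emptyHour() };
a.data[0] = { speed: 40, time: 12 };
a.data[1] = { speed: 50, time: 10 };
const b = { _id: "900006:14031216", updated: new Date("2014-03-12T16:59:00Z"), data: emptyHour() };

const rollup = incrementalRollup([a, b], new Date("2014-03-12T17:00:00Z"));
console.log(rollup["900006:14031217"], "900006:14031216" in rollup); // prints "45 false"
```

Skipping `NaN` explicitly matters: `NaN` is numeric, so a naive average over the pre-allocated array would turn every partially filled hour into `NaN`.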
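The replication notes (slides #12, #18 to #23 and #39 to #42) describe member priority and delayed members. A minimal mongo shell configuration sketch follows, assuming an already initiated three-member replica set in which the third member should lag by one hour. The field is `secondaryDelaySecs` on MongoDB 5.0 and later and `slaveDelay` on older versions; the sketch also sets `priority: 0` and `hidden: true` explicitly, since a delayed member should never be elected primary.

```javascript
// mongo shell sketch; assumes a three-member replica set is already running
cfg = rs.conf();
cfg.members[0].priority = 2;                // preferred primary
cfg.members[1].priority = 1;                // ordinary electable secondary
cfg.members[2].priority = 0;                // delayed member must not become primary
cfg.members[2].hidden = true;               // keep it out of client reads
cfg.members[2].secondaryDelaySecs = 3600;   // one-hour lag (slaveDelay before 5.0)
rs.reconfig(cfg);
```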