SlideShare a Scribd company logo
Mongo Analytics – 
Learn aggregation by example 
Exploratory Analytics and 
Visualization using Flight Data 
www.jsonstudio.com
Analyzing Flight Data 
• JSON data imported from CSV downloaded from www.transtats.bts.gov 
• Sample document for a flight: 
{ 
"_id": { "$oid": "534205f61c479f6149a92709" }, 
"YEAR": 2013, "QUARTER": 1, 
"MONTH": 1, 
"DAY_OF_MONTH": 18, 
"DAY_OF_WEEK": 5, 
"FL_DATE": "2013-01-18”, 
"UNIQUE_CARRIER": "DL”, 
"AIRLINE_ID": 19790, 
"CARRIER": "DL", 
"TAIL_NUM": "N325US”, 
"FL_NUM": 1497, 
"ORIGIN_AIRPORT_ID": 14100, 
"ORIGIN_AIRPORT_SEQ_ID": 1410002, 
"ORIGIN_CITY_MARKET_ID": 34100, 
"ORIGIN": "PHL", 
"ORIGIN_CITY_NAME": "Philadelphia, PA", 
"ORIGIN_STATE_ABR": "PA”, 
"ORIGIN_STATE_FIPS": 42, 
"DEST_AIRPORT_ID": 13487, 
"DEST_AIRPORT_SEQ_ID": 1348702, 
"DEST_CITY_MARKET_ID": 31650, 
"DEST": "MSP", 
"DEST_CITY_NAME": "Minneapolis, MN", 
"DEST_STATE_ABR": "MN", 
"DEST_STATE_FIPS": 27, 
"DEST_STATE_NM": "Minnesota", 
"DEST_WAC": 63, 
"CRS_DEP_TIME": 805, 
"DEP_TIME": 758, 
"DEP_DELAY": -7, 
"DEP_DELAY_NEW": 0, 
"DEP_DEL15": 0, 
"DEP_DELAY_GROUP": -1, 
"DEP_TIME_BLK": "0800-0859", 
"TAXI_OUT": 24, 
"WHEELS_OFF": 822, 
"WHEELS_ON": 958, 
"TAXI_IN": 4, 
"CRS_ARR_TIME": 1015, 
"ARR_TIME": 1002, 
"ARR_DELAY": -13, 
"ARR_DELAY_NEW": 0, 
"ARR_DEL15": 0, 
"ARR_DELAY_GROUP": -1, 
"ARR_TIME_BLK": "1000-1059", 
"CANCELLED": 0, 
"CANCELLATION_CODE": "", 
"DIVERTED": 0, 
"CRS_ELAPSED_TIME": 190, 
"ACTUAL_ELAPSED_TIME": 184, 
"AIR_TIME": 156, 
"FLIGHTS": 1, 
"DISTANCE": 980, 
"DISTANCE_GROUP": 4, 
"CARRIER_DELAY": "", 
"WEATHER_DELAY": "", 
"NAS_DELAY": "", 
"SECURITY_DELAY": "", 
"LATE_AIRCRAFT_DELAY": "", 
"FIRST_DEP_TIME": "", 
"TOTAL_ADD_GTIME": "", 
"LONGEST_ADD_GTIME": "", 
"": "" 
} 
• We will build aggregation pipelines and visualize data using JSON Studio (www.jsonstudio.com) 
• You will fall madly in love with the Aggregation Framework and it’s capabilities
MongoDB aggregation steps/stages 
• Grouping 
• Matching/filtering 
• Projection 
• Sorting 
• Unwind 
• Limit, skip 
• Added in 2.6 
– Out 
– Redact
Who are the largest carriers?
Some Carrier Stats { 
"$group": { 
"_id": { 
"CARRIER": "$CARRIER" 
}, 
"_avg_DEP_DELAY": { 
"$avg": "$DEP_DELAY" 
}, 
"_avg_ARR_DELAY": { 
"$avg": "$ARR_DELAY" 
}, 
"_avg_DISTANCE_GROUP": { 
"$avg": "$DISTANCE_GROUP" 
}, 
"_avg_TAXI_IN": { 
"$avg": "$TAXI_IN" 
}, 
"_avg_TAXI_OUT": { 
"$avg": "$TAXI_OUT" 
} 
} 
} 
{ 
"_id": { 
"CARRIER": "9E" 
}, 
"_avg_DEP_DELAY": 8.45451754385965, 
"_avg_ARR_DELAY": 3.3237368838726744, 
"_avg_DISTANCE_GROUP": 2.2188688815622624, 
"_avg_TAXI_IN": 7.082464246424642, 
"_avg_TAXI_OUT": 20.558167120639663 
}
Which airports have the most cancellations?
Which carriers are most at fault for cancellations?
Arrival delays by distance
Delays by distance by carrier
Delays by distance by carrier – long haul only
Words of caution (courtesy of David Weisman)
Words of caution (courtesy of David Weisman)
What to do? 
“Touch” the data – e.g. Histograms
Words of caution (courtesy of David Weisman)
Words of caution (courtesy of David Weisman)
Order Does Matter 
http://docs.mongodb.org/manual/core/aggregation-pipeline-optimization/
An example for $unwind 
Count how many airports each carrier lands in 
{ 
"_id": { 
"$oid": "5383623b7bfb8767e2e9ca1f" 
}, 
"iata": "00M", 
"airport": "Thigpen ", 
"city": "Bay Springs", 
"state": "MS", 
"country": "USA", 
"lat": 31.95376472, 
"long": -89.23450472, 
"carriers": [ 
"AA", 
"UA", 
"DL", 
"BA" 
] 
} 
… 
[ 
{ 
"_id": { 
"$oid": "5383623b7bfb8767e2e9ca1f" 
}, 
"iata": "00M", 
"airport": "Thigpen ", 
"city": "Bay Springs", 
"state": "MS", 
"country": "USA", 
"lat": 31.95376472, 
"long": -89.23450472, 
"carriers": "AA" 
}, 
{ 
"_id": { 
"$oid": "542217ffc026b858b47a6640" 
}, 
"iata": "00M", 
"airport": "Thigpen ", 
"city": "Bay Springs", 
"state": "MS", 
"country": "USA", 
"lat": 31.95376472, 
"long": -89.23450472, 
"carriers": "UA" 
} 
… 
] 
[ 
{ 
"_id": { 
"carriers": "BA" 
}, 
"count": 10 
}, 
{ 
"_id": { 
"carriers": "DL" 
}, 
"count": 10 
} 
… 
] 
airports2 
$unwind $group
Hub airports – try1
Hub airports – try2
Hub airports – try 3 
{ $group: { _id: { ORIGIN: "$ORIGIN", CARRIER: "$CARRIER" }, count: { $sum: 1 } } }, 
{ $project: { airport: "$_id.ORIGIN", carrier: "$_id.CARRIER", "count": 1 } }, 
{ $match: { "count": { $gte: "$$hub_threshold" } } }, 
{ $group: { 
_id: { airport: "$airport" }, 
airlines: { $sum: 1 }, 
flights: { $sum: "$count" }, 
avg_airline: { $avg: "$count" }, 
max_airline: { $max: "$count" } } }, 
{ $project: { 
"airlines": 1, 
"flights": 1, 
"avg_airline": 1, 
"max_airline": 1, 
"avg_no_max": { $divide: [ { $subtract: [ "$flights", "$max_airline" ] }, "$airlines" ] } } }, 
{ $sort: { "flights": -1 } }
Hub airports
From-to Insensitive 
{ $group: { _id: { UNIQUE_CARRIER: "$UNIQUE_CARRIER", ORIGIN: "$ORIGIN", 
DEST: "$DEST" }, count: { $sum: 1 } } }, 
{ $match: { "count": { $gt: "$$count_threshold" } } }, 
{ $project: { _id_UNIQUE_CARRIER: "$_id.UNIQUE_CARRIER", "count": 1, 
rroute: { 
$cond: [ 
{ $lt: [ { $cmp: [ "$_id.ORIGIN", "$_id.DEST" ] }, 0 ] }, 
{ $concat: [ "$_id.ORIGIN", "$_id.DEST" ] }, 
{ $concat: [ "$_id.DEST", "$_id.ORIGIN" ] } 
] } } 
}, 
{ $group: { _id: { _id_UNIQUE_CARRIER: "$_id_UNIQUE_CARRIER", rroute: "$rroute" }, 
_sum_count: { $sum: "$count" } } }
Hub visualization (using routes – from/to, $$count=1, origin treemap)
Using “R” for Advanced Analytics 
• Using a MongoDB driver for “R” 
• Using the JSON Studio Gateway (including using aggregation output) 
install.packages("jSonarR") 
library(’jSonarR') 
con2 <- sonarR::new.SonarConnection('https://localhost:8443', 'localhost', 'flights', port=47017, username="ron", 
pwd=”<pwd>”) 
nyc_by_day <- sonarR::sonarAgg(con2, 'delays_by_day', 'NYCFlights', 
colClasses=c(X_avg_AirTime='numeric', X_avg_ArrDelay='numeric',X_avg_DepDelay='numeric')) 
lm.out = lm(nyc_by_day$X_sum_ArrDelay ~ nyc_by_day$X_sum_AirTime) 
MongoDB
Recommendation engine example: jsonstudio.com
NYC Flights – Quiz Questions 
• Of the three airports, who has the most flights? 
– Nyc1 
• Who has the most cancellations and highest cancellation ratio? 
– Nyc2 
• Taxi in/out times? 
– Nyc3 
• What about delays? 
– Nyc4 
• How do delays differ by month? 
– Nyc5 + nyc5 
– (summer vs. winter / bubble size vs. y-axis) 
• What about weather delays only? Which months are worse? Are the three airports 
equivalent? 
– Nyc7 + nyc7 
• Where can I fly to if I work for Boeing and am very loyal (and on which aicraft)? 
– Nyc8 + map
www.jsonstudio.com 
(download – presentation and eval copy) 
Discount code: MUGTX* 
(* Good for 1 month after event) 
ron@jsonar.com
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data

More Related Content

What's hot (20)

MongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
Jason Terpko
 
MongoDB Aggregation
MongoDB Aggregation
Amit Ghosh
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced Aggregation
Joe Drumgoole
 
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
MongoDB
 
MongoDB Aggregation Framework
MongoDB Aggregation Framework
Tyler Brock
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
Norberto Leite
 
Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1
Anuj Jain
 
Aggregation Framework
Aggregation Framework
MongoDB
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB
 
Aggregation in MongoDB
Aggregation in MongoDB
Kishor Parkhe
 
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB
 
Querying Nested JSON Data Using N1QL and Couchbase
Querying Nested JSON Data Using N1QL and Couchbase
Brant Burnett
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
MongoDB
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky
Data Con LA
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework
MongoDB
 
Data Governance with JSON Schema
Data Governance with JSON Schema
MongoDB
 
Building Your First MongoDB Application (Mongo Austin)
Building Your First MongoDB Application (Mongo Austin)
MongoDB
 
Using a mobile phone as a therapist - Superweek 2018
Using a mobile phone as a therapist - Superweek 2018
Peter Meyer
 
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
MongoDB
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
Jason Terpko
 
MongoDB Aggregation
MongoDB Aggregation
Amit Ghosh
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced Aggregation
Joe Drumgoole
 
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
MongoDB
 
MongoDB Aggregation Framework
MongoDB Aggregation Framework
Tyler Brock
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
Norberto Leite
 
Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1
Anuj Jain
 
Aggregation Framework
Aggregation Framework
MongoDB
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB
 
Aggregation in MongoDB
Aggregation in MongoDB
Kishor Parkhe
 
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB
 
Querying Nested JSON Data Using N1QL and Couchbase
Querying Nested JSON Data Using N1QL and Couchbase
Brant Burnett
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
MongoDB
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky
Data Con LA
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework
MongoDB
 
Data Governance with JSON Schema
Data Governance with JSON Schema
MongoDB
 
Building Your First MongoDB Application (Mongo Austin)
Building Your First MongoDB Application (Mongo Austin)
MongoDB
 
Using a mobile phone as a therapist - Superweek 2018
Using a mobile phone as a therapist - Superweek 2018
Peter Meyer
 
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
MongoDB
 

Similar to MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data (20)

Introduction to MongoDB for C# developers
Introduction to MongoDB for C# developers
Taras Romanyk
 
SDKs, the good the bad the ugly - Japan
SDKs, the good the bad the ugly - Japan
tristansokol
 
Using R for Building a Simple and Effective Dashboard
Using R for Building a Simple and Effective Dashboard
Andrea Gigli
 
Drupal Mobile
Drupal Mobile
Ruben Teijeiro
 
Agile Testing Days 2018 - API Fundamentals - postman collection
Agile Testing Days 2018 - API Fundamentals - postman collection
JoEllen Carter
 
Peggy elasticsearch應用
Peggy elasticsearch應用
LearningTech
 
Couchbase N1QL: Index Advisor
Couchbase N1QL: Index Advisor
Keshav Murthy
 
elasticsearch - advanced features in practice
elasticsearch - advanced features in practice
Jano Suchal
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
Karel Minarik
 
MongoDB 3.2 - Analytics
MongoDB 3.2 - Analytics
Massimo Brignoli
 
Understanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune Queries
Keshav Murthy
 
Map/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDB
Uwe Printz
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB
 
Joins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation Enhancements
Andrew Morgan
 
Spark - Citi Bike NYC
Spark - Citi Bike NYC
Sushmanth Sagala
 
Database api
Database api
InternetDevels
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
MongoDB
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @Moldcamp
Alexei Gorobets
 
1403 app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
Introduction to MongoDB for C# developers
Introduction to MongoDB for C# developers
Taras Romanyk
 
SDKs, the good the bad the ugly - Japan
SDKs, the good the bad the ugly - Japan
tristansokol
 
Using R for Building a Simple and Effective Dashboard
Using R for Building a Simple and Effective Dashboard
Andrea Gigli
 
Agile Testing Days 2018 - API Fundamentals - postman collection
Agile Testing Days 2018 - API Fundamentals - postman collection
JoEllen Carter
 
Peggy elasticsearch應用
Peggy elasticsearch應用
LearningTech
 
Couchbase N1QL: Index Advisor
Couchbase N1QL: Index Advisor
Keshav Murthy
 
elasticsearch - advanced features in practice
elasticsearch - advanced features in practice
Jano Suchal
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
Karel Minarik
 
Understanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune Queries
Keshav Murthy
 
Map/Confused? A practical approach to Map/Reduce with MongoDB
Map/Confused? A practical approach to Map/Reduce with MongoDB
Uwe Printz
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB
 
Joins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation Enhancements
Andrew Morgan
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
MongoDB
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @Moldcamp
Alexei Gorobets
 
1403 app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
Ad

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB
MongoDB
 
Ad

Recently uploaded (20)

vertical-cnc-processing-centers-drillteq-v-200-en.pdf
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
AmirStern2
 
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
angelo60207
 
The State of Web3 Industry- Industry Report
The State of Web3 Industry- Industry Report
Liveplex
 
Supporting the NextGen 911 Digital Transformation with FME
Supporting the NextGen 911 Digital Transformation with FME
Safe Software
 
Oracle Cloud Infrastructure Generative AI Professional
Oracle Cloud Infrastructure Generative AI Professional
VICTOR MAESTRE RAMIREZ
 
War_And_Cyber_3_Years_Of_Struggle_And_Lessons_For_Global_Security.pdf
War_And_Cyber_3_Years_Of_Struggle_And_Lessons_For_Global_Security.pdf
biswajitbanerjee38
 
AudGram Review: Build Visually Appealing, AI-Enhanced Audiograms to Engage Yo...
AudGram Review: Build Visually Appealing, AI-Enhanced Audiograms to Engage Yo...
SOFTTECHHUB
 
Bridging the divide: A conversation on tariffs today in the book industry - T...
Bridging the divide: A conversation on tariffs today in the book industry - T...
BookNet Canada
 
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Shashikant Jagtap
 
High Availability On-Premises FME Flow.pdf
High Availability On-Premises FME Flow.pdf
Safe Software
 
MuleSoft for AgentForce : Topic Center and API Catalog
MuleSoft for AgentForce : Topic Center and API Catalog
shyamraj55
 
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...
Anish Kumar
 
National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...
National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...
Safe Software
 
ENERGY CONSUMPTION CALCULATION IN ENERGY-EFFICIENT AIR CONDITIONER.pdf
ENERGY CONSUMPTION CALCULATION IN ENERGY-EFFICIENT AIR CONDITIONER.pdf
Muhammad Rizwan Akram
 
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Safe Software
 
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
NTT DATA Technology & Innovation
 
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Alliance
 
cnc-drilling-dowel-inserting-machine-drillteq-d-510-english.pdf
cnc-drilling-dowel-inserting-machine-drillteq-d-510-english.pdf
AmirStern2
 
FME for Distribution & Transmission Integrity Management Program (DIMP & TIMP)
FME for Distribution & Transmission Integrity Management Program (DIMP & TIMP)
Safe Software
 
Artificial Intelligence in the Nonprofit Boardroom.pdf
Artificial Intelligence in the Nonprofit Boardroom.pdf
OnBoard
 
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
AmirStern2
 
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
Your startup on AWS - How to architect and maintain a Lean and Mean account J...
angelo60207
 
The State of Web3 Industry- Industry Report
The State of Web3 Industry- Industry Report
Liveplex
 
Supporting the NextGen 911 Digital Transformation with FME
Supporting the NextGen 911 Digital Transformation with FME
Safe Software
 
Oracle Cloud Infrastructure Generative AI Professional
Oracle Cloud Infrastructure Generative AI Professional
VICTOR MAESTRE RAMIREZ
 
War_And_Cyber_3_Years_Of_Struggle_And_Lessons_For_Global_Security.pdf
War_And_Cyber_3_Years_Of_Struggle_And_Lessons_For_Global_Security.pdf
biswajitbanerjee38
 
AudGram Review: Build Visually Appealing, AI-Enhanced Audiograms to Engage Yo...
AudGram Review: Build Visually Appealing, AI-Enhanced Audiograms to Engage Yo...
SOFTTECHHUB
 
Bridging the divide: A conversation on tariffs today in the book industry - T...
Bridging the divide: A conversation on tariffs today in the book industry - T...
BookNet Canada
 
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Agentic AI: Beyond the Buzz- LangGraph Studio V2
Shashikant Jagtap
 
High Availability On-Premises FME Flow.pdf
High Availability On-Premises FME Flow.pdf
Safe Software
 
MuleSoft for AgentForce : Topic Center and API Catalog
MuleSoft for AgentForce : Topic Center and API Catalog
shyamraj55
 
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...
Anish Kumar
 
National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...
National Fuels Treatments Initiative: Building a Seamless Map of Hazardous Fu...
Safe Software
 
ENERGY CONSUMPTION CALCULATION IN ENERGY-EFFICIENT AIR CONDITIONER.pdf
ENERGY CONSUMPTION CALCULATION IN ENERGY-EFFICIENT AIR CONDITIONER.pdf
Muhammad Rizwan Akram
 
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Integration of Utility Data into 3D BIM Models Using a 3D Solids Modeling Wor...
Safe Software
 
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
NTT DATA Technology & Innovation
 
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Seminar: New Data: Passkey Adoption in the Workforce.pptx
FIDO Alliance
 
cnc-drilling-dowel-inserting-machine-drillteq-d-510-english.pdf
cnc-drilling-dowel-inserting-machine-drillteq-d-510-english.pdf
AmirStern2
 
FME for Distribution & Transmission Integrity Management Program (DIMP & TIMP)
FME for Distribution & Transmission Integrity Management Program (DIMP & TIMP)
Safe Software
 
Artificial Intelligence in the Nonprofit Boardroom.pdf
Artificial Intelligence in the Nonprofit Boardroom.pdf
OnBoard
 

MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data

  • 1. Mongo Analytics – Learn aggregation by example Exploratory Analytics and Visualization using Flight Data www.jsonstudio.com
  • 2. Analyzing Flight Data • JSON data imported from CSV downloaded from www.transtats.bts.gov • Sample document for a flight: { "_id": { "$oid": "534205f61c479f6149a92709" }, "YEAR": 2013, "QUARTER": 1, "MONTH": 1, "DAY_OF_MONTH": 18, "DAY_OF_WEEK": 5, "FL_DATE": "2013-01-18”, "UNIQUE_CARRIER": "DL”, "AIRLINE_ID": 19790, "CARRIER": "DL", "TAIL_NUM": "N325US”, "FL_NUM": 1497, "ORIGIN_AIRPORT_ID": 14100, "ORIGIN_AIRPORT_SEQ_ID": 1410002, "ORIGIN_CITY_MARKET_ID": 34100, "ORIGIN": "PHL", "ORIGIN_CITY_NAME": "Philadelphia, PA", "ORIGIN_STATE_ABR": "PA”, "ORIGIN_STATE_FIPS": 42, "DEST_AIRPORT_ID": 13487, "DEST_AIRPORT_SEQ_ID": 1348702, "DEST_CITY_MARKET_ID": 31650, "DEST": "MSP", "DEST_CITY_NAME": "Minneapolis, MN", "DEST_STATE_ABR": "MN", "DEST_STATE_FIPS": 27, "DEST_STATE_NM": "Minnesota", "DEST_WAC": 63, "CRS_DEP_TIME": 805, "DEP_TIME": 758, "DEP_DELAY": -7, "DEP_DELAY_NEW": 0, "DEP_DEL15": 0, "DEP_DELAY_GROUP": -1, "DEP_TIME_BLK": "0800-0859", "TAXI_OUT": 24, "WHEELS_OFF": 822, "WHEELS_ON": 958, "TAXI_IN": 4, "CRS_ARR_TIME": 1015, "ARR_TIME": 1002, "ARR_DELAY": -13, "ARR_DELAY_NEW": 0, "ARR_DEL15": 0, "ARR_DELAY_GROUP": -1, "ARR_TIME_BLK": "1000-1059", "CANCELLED": 0, "CANCELLATION_CODE": "", "DIVERTED": 0, "CRS_ELAPSED_TIME": 190, "ACTUAL_ELAPSED_TIME": 184, "AIR_TIME": 156, "FLIGHTS": 1, "DISTANCE": 980, "DISTANCE_GROUP": 4, "CARRIER_DELAY": "", "WEATHER_DELAY": "", "NAS_DELAY": "", "SECURITY_DELAY": "", "LATE_AIRCRAFT_DELAY": "", "FIRST_DEP_TIME": "", "TOTAL_ADD_GTIME": "", "LONGEST_ADD_GTIME": "", "": "" } • We will build aggregation pipelines and visualize data using JSON Studio (www.jsonstudio.com) • You will fall madly in love with the Aggregation Framework and it’s capabilities
  • 3. MongoDB aggregation steps/stages • Grouping • Matching/filtering • Projection • Sorting • Unwind • Limit, skip • Added in 2.6 – Out – Redact
  • 4. Who are the largest carriers?
  • 5. Some Carrier Stats { "$group": { "_id": { "CARRIER": "$CARRIER" }, "_avg_DEP_DELAY": { "$avg": "$DEP_DELAY" }, "_avg_ARR_DELAY": { "$avg": "$ARR_DELAY" }, "_avg_DISTANCE_GROUP": { "$avg": "$DISTANCE_GROUP" }, "_avg_TAXI_IN": { "$avg": "$TAXI_IN" }, "_avg_TAXI_OUT": { "$avg": "$TAXI_OUT" } } } { "_id": { "CARRIER": "9E" }, "_avg_DEP_DELAY": 8.45451754385965, "_avg_ARR_DELAY": 3.3237368838726744, "_avg_DISTANCE_GROUP": 2.2188688815622624, "_avg_TAXI_IN": 7.082464246424642, "_avg_TAXI_OUT": 20.558167120639663 }
  • 6. Which airports have the most cancellations?
  • 7. Which carriers are most at fault for cancellations?
  • 8. Arrival delays by distance
  • 9. Delays by distance by carrier
  • 10. Delays by distance by carrier – long haul only
  • 11. Words of caution (courtesy of David Weisman)
  • 12. Words of caution (courtesy of David Weisman)
  • 13. What to do? “Touch” the data – e.g. Histograms
  • 14. Words of caution (courtesy of David Weisman)
  • 15. Words of caution (courtesy of David Weisman)
  • 16. Order Does Matter http://docs.mongodb.org/manual/core/aggregation-pipeline-optimization/
  • 17. An example for $unwind Count how many airports each carrier lands in { "_id": { "$oid": "5383623b7bfb8767e2e9ca1f" }, "iata": "00M", "airport": "Thigpen ", "city": "Bay Springs", "state": "MS", "country": "USA", "lat": 31.95376472, "long": -89.23450472, "carriers": [ "AA", "UA", "DL", "BA" ] } … [ { "_id": { "$oid": "5383623b7bfb8767e2e9ca1f" }, "iata": "00M", "airport": "Thigpen ", "city": "Bay Springs", "state": "MS", "country": "USA", "lat": 31.95376472, "long": -89.23450472, "carriers": "AA" }, { "_id": { "$oid": "542217ffc026b858b47a6640" }, "iata": "00M", "airport": "Thigpen ", "city": "Bay Springs", "state": "MS", "country": "USA", "lat": 31.95376472, "long": -89.23450472, "carriers": "UA" } … ] [ { "_id": { "carriers": "BA" }, "count": 10 }, { "_id": { "carriers": "DL" }, "count": 10 } … ] airports2 $unwind $group
  • 20. Hub airports – try 3 { $group: { _id: { ORIGIN: "$ORIGIN", CARRIER: "$CARRIER" }, count: { $sum: 1 } } }, { $project: { airport: "$_id.ORIGIN", carrier: "$_id.CARRIER", "count": 1 } }, { $match: { "count": { $gte: "$$hub_threshold" } } }, { $group: { _id: { airport: "$airport" }, airlines: { $sum: 1 }, flights: { $sum: "$count" }, avg_airline: { $avg: "$count" }, max_airline: { $max: "$count" } } }, { $project: { "airlines": 1, "flights": 1, "avg_airline": 1, "max_airline": 1, "avg_no_max": { $divide: [ { $subtract: [ "$flights", "$max_airline" ] }, "$airlines" ] } } }, { $sort: { "flights": -1 } }
  • 22. From-to Insensitive { $group: { _id: { UNIQUE_CARRIER: "$UNIQUE_CARRIER", ORIGIN: "$ORIGIN", DEST: "$DEST" }, count: { $sum: 1 } } }, { $match: { "count": { $gt: "$$count_threshold" } } }, { $project: { _id_UNIQUE_CARRIER: "$_id.UNIQUE_CARRIER", "count": 1, rroute: { $cond: [ { $lt: [ { $cmp: [ "$_id.ORIGIN", "$_id.DEST" ] }, 0 ] }, { $concat: [ "$_id.ORIGIN", "$_id.DEST" ] }, { $concat: [ "$_id.DEST", "$_id.ORIGIN" ] } ] } } }, { $group: { _id: { _id_UNIQUE_CARRIER: "$_id_UNIQUE_CARRIER", rroute: "$rroute" }, _sum_count: { $sum: "$count" } } }
  • 23. Hub visualization (using routes – from/to, $$count=1, origin treemap)
  • 24. Using “R” for Advanced Analytics • Using a MongoDB driver for “R” • Using the JSON Studio Gateway (including using aggregation output) install.packages("jSonarR") library(’jSonarR') con2 <- sonarR::new.SonarConnection('https://localhost:8443', 'localhost', 'flights', port=47017, username="ron", pwd=”<pwd>”) nyc_by_day <- sonarR::sonarAgg(con2, 'delays_by_day', 'NYCFlights', colClasses=c(X_avg_AirTime='numeric', X_avg_ArrDelay='numeric',X_avg_DepDelay='numeric')) lm.out = lm(nyc_by_day$X_sum_ArrDelay ~ nyc_by_day$X_sum_AirTime) MongoDB
  • 26. NYC Flights – Quiz Questions • Of the three airports, who has the most flights? – Nyc1 • Who has the most cancellations and highest cancellation ratio? – Nyc2 • Taxi in/out times? – Nyc3 • What about delays? – Nyc4 • How do delays differ by month? – Nyc5 + nyc5 – (summer vs. winter / bubble size vs. y-axis) • What about weather delays only? Which months are worse? Are the three airports equivalent? – Nyc7 + nyc7 • Where can I fly to if I work for Boeing and am very loyal (and on which aicraft)? – Nyc8 + map
  • 27. www.jsonstudio.com (download – presentation and eval copy) Discount code: MUGTX* (* Good for 1 month after event) [email protected]