Graph Operations
With MongoDB
Charles Sarrazin
Senior Consulting Engineer, MongoDB
Agenda
01 MongoDB Introduction
02 Graph Use & Concepts
03 New Lookup Operators
04 Example Scenarios
05 Design & Performance Considerations
06 Wrap-up
MongoDB Introduction
Documents
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
• Fields have typed values (for example, Number)
• Fields can contain arrays
• Fields can contain an array of sub-documents
Query Language
db.collection.find({'city':'London'})
db.collection.find({'profession':{'$in':['banking','trader']}},{'surname':1,'profession':1})
db.collection.find({'cars.year':{'$lte':1968}}).sort({'surname':1}).limit(10)
db.collection.find({'cars.model':'Bentley','cars.year':{'$lt':1966}})
db.collection.find({'cars':{'$elemMatch':{'model':'Bentley','year':{'$lt':1966}}}})
db.collection.find({'location':{'$geoWithin': { '$geometry': {
'type': 'Polygon',
coordinates: [ <array-of-coordinates> ]
}}}})
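The two `cars` queries above differ in a way that is easy to miss. The dotted form lets each condition be satisfied by a different array element, while `$elemMatch` requires a single element to satisfy both. The following is a minimal plain-JavaScript sketch of that difference, runnable without a server. The helper names `matchesDotted` and `matchesElemMatch` are illustrative and are not MongoDB APIs.

```javascript
// Sample document from the "Documents" slide (elided fields omitted).
const paul = {
  surname: "Miller",
  cars: [
    { model: "Bentley", year: 1973 },
    { model: "Rolls Royce", year: 1965 },
  ],
};

// Dotted form: {'cars.model':'Bentley','cars.year':{'$lt':1966}}
// Each condition may be met by a different element of the array.
function matchesDotted(doc) {
  return (
    doc.cars.some((car) => car.model === "Bentley") &&
    doc.cars.some((car) => car.year < 1966)
  );
}

// $elemMatch form: one element must meet both conditions at once.
function matchesElemMatch(doc) {
  return doc.cars.some((car) => car.model === "Bentley" && car.year < 1966);
}

console.log(matchesDotted(paul));    // true  (Bentley exists, Rolls Royce is from 1965)
console.log(matchesElemMatch(paul)); // false (the Bentley itself is from 1973)
```

So the first query would return Paul even though his Bentley is not older than 1966; the `$elemMatch` query would not.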
Secondary Indexes
compound, geospatial, text, multikey, hashed,
unique, sparse, partial, TTL
Query Language
db.collection.aggregate([
  {$match: {'profession': {'$in': ['banking','trader']}}},
  {$addFields: {'surnameLower': {$toLower: "$surname"}, 'prof': {$ifNull: ["$prof","Unknown"]}}},
  {$group: { ... }},
  {$sort: { ... }},
  {$limit: <number>},
  {$match: { ... }},
  ...
])
Aggregation pipeline
Schema Design
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Embed in the same document
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  profession: ['banking', 'finance', 'trader']
}
cars:
{ owner_id: 146,
  model: 'Bentley',
  year: 1973,
  value: 100000, … },
{ owner_id: 146,
  model: 'Rolls Royce',
  year: 1965,
  value: 330000, … }
Separate collection with reference
Webinar: Working with Graph Data in MongoDB
Functionality Timeline
2.0 – 2.2
• Geospatial Polygon support
• Aggregation Framework
2.4 – 2.6
• New 2dsphere index
• Aggregation Framework efficiency optimisations
• Full text search
3.0 – 3.2
• Join functionality
• Increased geo accuracy
• New Aggregation operators
• Improved case insensitivity
3.4
• Recursive graph traversal
• Faceted search
• Multiple collations
MongoDB 3.4 - Multi-Model Database
• Document: Rich JSON Data Structures, Flexible Schema, Global Scale
• Relational: Left-Outer Join, Views, Schema Validation
• Key/Value: Horizontal Scale, In-Memory
• Search: Text Search, Multiple Languages, Faceted Search
• Binaries: Files & Metadata, Encrypted
• Graph: Graph & Hierarchical, Recursive Lookups
• GeoSpatial: GeoJSON, 2D & 2DSphere
Graph Use & Concepts
Common Use Cases
• Networks
• Social – circle of friends/colleagues
• Computer network – physical/virtual/application layer
• Mapping / Routes
• Shortest route A to B
• Cybersecurity & Fraud Detection
• Real-time fraud/scam recognition
• Personalisation/Recommendation Engine
• Product, social, service, professional etc.
Graph Key Concepts
• Vertices (nodes)
• Edges (relationships)
• Nodes have properties
• Relationships have name & direction
Relational DBs Lack Relationships
• “Relationships” are actually JOINs
• Raw business or storage logic and constraints – not semantic
• JOIN tables, sparse columns, null-checks
• More JOINS = degraded performance and flexibility
Relational DBs Lack Relationships
• How expensive/complex is:
– Find my friends?
– Find friends of my friends?
– Find mutual friends?
– Find friends of my friends of my friends?
– And so on…
Native Graph Database Strengths
• Relationships are first class citizens of the database
• Index-free adjacency
• Nodes “point” directly to other nodes
• Efficient relationship traversal
Native Graph Database Challenges
• Complex query languages
• Poorly optimized for non-traversal queries
• Difficult to express
• May be memory intensive
• Less often used as System Of Record
• Synchronisation with SOR required
• Increased operational complexity
• Consistency concerns
NoSQL DBs Lack Relationships
• “Flat” disconnected documents or key/value pairs
• “Foreign keys” inferred at application layer
• Data integrity/quality onus is on the application
• Some suggest it is difficult to model ANY relationships efficiently with aggregate stores
• However…
Friends Network – Document Style
{
_id: 0,
name: "Bob Smith",
friends: ["Anna Jones", "Chris Green"]
},
{
_id: 1,
name: "Anna Jones",
friends: ["Bob Smith", "Chris Green", "Joe Lee"]
},
{
_id: 2,
name: "Chris Green",
friends: ["Anna Jones", "Bob Smith"]
}
Schema Design – before $graphLookup
• Options
• Store an array of direct children in each node
• Store parent in each node
• Store parent and array of ancestors
• Trade-offs
• Simple queries…
• …vs simple updates
[Figure: example tree with nodes numbered 1 to 17]
Why MongoDB For Graph?
Lookup Operators
$lookup
Syntax
$lookup: {
from: <target lookup collection>,
localField: <field from the input document>,
foreignField: <field from the target collection to connect to>,
as: <field name for resulting array>
}
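As a rough illustration of what `$lookup` produces, the sketch below imitates the stage in plain JavaScript on the people/cars data from the Schema Design slides. It makes several assumptions: the owner's `_id` of 146 is inferred from `owner_id`, the second person is invented to show the left-outer behaviour, and `simulateLookup` is a hypothetical helper that only handles scalar join fields.

```javascript
// Hypothetical data modelled on the "Separate collection with reference" slide.
const people = [
  { _id: 146, first_name: "Paul", surname: "Miller" }, // _id assumed from owner_id
  { _id: 147, first_name: "Ann", surname: "Other" },   // invented owner with no cars
];
const cars = [
  { owner_id: 146, model: "Bentley", year: 1973 },
  { owner_id: 146, model: "Rolls Royce", year: 1965 },
];

// The stage as it might be passed to db.people.aggregate([lookupStage]).
const lookupStage = {
  $lookup: { from: "cars", localField: "_id", foreignField: "owner_id", as: "cars" },
};

// In-memory imitation of the stage (scalar join fields only).
function simulateLookup(input, foreign, { localField, foreignField, as }) {
  return input.map((doc) => ({
    ...doc,
    [as]: foreign.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const joined = simulateLookup(people, cars, lookupStage.$lookup);
console.log(joined.map((p) => [p.surname, p.cars.length]));
// [ [ 'Miller', 2 ], [ 'Other', 0 ] ]
```

Every input document is kept, and a document with no match gets an empty array, which is the left-outer-join behaviour listed for 3.2.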
$graphLookup
Syntax
$graphLookup: {
from: <target lookup collection>,
startWith: <expression for value to start from>,
connectToField: <field name in target collection to connect to>,
connectFromField: <field name in target collection to connect from – recurse from here>,
as: <field name for resulting array>,
maxDepth: <max number of iterations to perform>,
depthField: <field name for number of recursive iterations required to reach this node>,
restrictSearchWithMatch: <match condition to apply to lookup>
}
Things To Note
• startWith value is an expression
• Referencing value of a field requires the ‘$’ prefix
• Can do things like {$toLower: "$name" }
• Handles array fields automatically
• connectToField and connectFromField take field names
• restrictSearchWithMatch takes a standard query expression
Things To Note
• Cycles are automatically detected
• Can be used with 3.4 views:
• Define a view
• Recurse across existing view (‘base’ or ‘from’)
• Can be used multiple times per Aggregation pipeline
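To make the recursion and cycle handling concrete, here is a small in-memory imitation of `$graphLookup` in plain JavaScript. It is a sketch under stated assumptions, not the server's implementation: `simulateGraphLookup` is a hypothetical helper, `startWith` is passed as an already-evaluated value rather than an expression, `connectToField` is assumed to hold a scalar, and a `restrict` predicate stands in for `restrictSearchWithMatch`.

```javascript
// In-memory imitation of $graphLookup (connectFromField may hold an array).
function simulateGraphLookup(startDoc, collection, opts) {
  const { startWith, connectToField, connectFromField, as } = opts;
  const maxDepth = opts.maxDepth ?? Infinity;
  const restrict = opts.restrict ?? (() => true); // stand-in for restrictSearchWithMatch
  const toArray = (v) => (Array.isArray(v) ? v : v === undefined ? [] : [v]);

  const visited = new Set(); // documents already returned, so cycles terminate
  const results = [];
  let frontier = toArray(startWith);

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const doc of collection) {
      if (visited.has(doc) || !restrict(doc)) continue;
      if (!frontier.includes(doc[connectToField])) continue;
      visited.add(doc);
      results.push(opts.depthField ? { ...doc, [opts.depthField]: depth } : doc);
      next.push(...toArray(doc[connectFromField])); // values to recurse from
    }
    frontier = next;
  }
  return { ...startDoc, [as]: results };
}

// The three documents from the "Friends Network" slide.
const contacts = [
  { _id: 0, name: "Bob Smith", friends: ["Anna Jones", "Chris Green"] },
  { _id: 1, name: "Anna Jones", friends: ["Bob Smith", "Chris Green", "Joe Lee"] },
  { _id: 2, name: "Chris Green", friends: ["Anna Jones", "Bob Smith"] },
];
const bob = contacts[0];
const network = simulateGraphLookup(bob, contacts, {
  startWith: bob.friends,
  connectToField: "name",
  connectFromField: "friends",
  as: "socialNetwork",
});
const networkNames = network.socialNetwork.map((d) => d.name);
console.log(networkNames); // [ 'Anna Jones', 'Chris Green', 'Bob Smith' ]
```

The friend lists point back at each other, yet the loop ends because each document is returned at most once. "Joe Lee" is absent because these three sample documents contain no document with that name. The real stage does not guarantee the order of the result array.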
Schema Design – with $graphLookup
• Options
• Store immediate parent in each node
• Store immediate children in each node
• Traverse in multiple directions
• Recurse in same collection
• Join/recurse into another collection
[Figure: example tree with nodes numbered 1 to 17]
“So just how suitable is MongoDB for the many varied graph use cases I have, then?”
75% of use cases*
*based on beta test user feedback
Example Scenarios
Scenario: Calculate Friend Network
{
_id: 0,
name: "Bob Smith",
friends: ["Anna Jones", "Chris Green"]
},
{
_id: 1,
name: "Anna Jones",
friends: ["Bob Smith", "Chris Green", "Joe Lee"]
},
{
_id: 2,
name: "Chris Green",
friends: ["Anna Jones", "Bob Smith"]
}
Scenario: Calculate Friend Network
[
  {
    $match: { "name": "Bob Smith" }
  },
  {
    $graphLookup: {
      from: "contacts",
      startWith: "$friends",          // this field is an array
      connectToField: "name",
      connectFromField: "friends",
      as: "socialNetwork"
      // no maxDepth set, so recursion continues until no new documents are found
    }
  },
  {
    $project: { name: 1, friends: 1, socialNetwork: "$socialNetwork.name" }
  }
]
Scenario: Calculate Friend Network
{
"_id" : 0,
"name" : "Bob Smith",
"friends" : [
"Anna Jones",
"Chris Green"
],
"socialNetwork" : [
"Joe Lee",
"Fred Brown",
"Bob Smith",
"Chris Green",
"Anna Jones"
]
}
Array
Friends Network - Social
[Figure: graph of Bob Smith, Chris Green, Anna Jones and Joe Lee connected by "friends" edges, labelled "Recommendation?"; a second build of the slide adds "Acme Soda"]
Scenario: Determine Air Travel Options
[Figure: route map connecting ORD, JFK, BOS, PWM and LHR]
{ "_id" : 0, "airport" : "JFK", "connects" : [ "BOS", "ORD" ] }
{ "_id" : 1, "airport" : "BOS", "connects" : [ "JFK", "PWM" ] }
{ "_id" : 2, "airport" : "ORD", "connects" : [ "JFK" ] }
{ "_id" : 3, "airport" : "PWM", "connects" : [ "BOS", "LHR" ] }
{ "_id" : 4, "airport" : "LHR", "connects" : [ "PWM" ] }
Scenario: Determine Air Travel Options
Meet Lucy
{ "_id" : 0, "name" : "Lucy", "nearestAirport" : "JFK" }
[
  {
    "$match": {"name": "Lucy"}
  },
  {
    "$graphLookup": {
      from: "airports",
      startWith: "$nearestAirport",
      connectToField: "airport",
      connectFromField: "connects",
      maxDepth: 2,
      depthField: "numFlights",       // record the number of recursions
      as: "destinations"
    }
  }
]
{
  name: "Lucy",
  nearestAirport: "JFK",
  destinations: [
    { _id: 0, airport: "JFK", connects: ["BOS", "ORD"], numFlights: 0 },
    { _id: 1, airport: "BOS", connects: ["JFK", "PWM"], numFlights: 1 },
    { _id: 2, airport: "ORD", connects: ["JFK"], numFlights: 1 },
    { _id: 3, airport: "PWM", connects: ["BOS", "LHR"], numFlights: 2 }
  ]
}
numFlights shows how many flights each destination would take.
[Figure: route map of ORD, JFK, BOS, PWM and LHR, extended with ATL]
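The `numFlights` values can be checked by hand with a breadth-first search over the five airport documents. The sketch below does this in plain JavaScript. `hopsFrom` is a hypothetical helper, and its `maxDepth` argument is meant to play the same role as `maxDepth` in the pipeline for this data set.

```javascript
// The airport documents from the first "Determine Air Travel Options" slide.
const airports = [
  { _id: 0, airport: "JFK", connects: ["BOS", "ORD"] },
  { _id: 1, airport: "BOS", connects: ["JFK", "PWM"] },
  { _id: 2, airport: "ORD", connects: ["JFK"] },
  { _id: 3, airport: "PWM", connects: ["BOS", "LHR"] },
  { _id: 4, airport: "LHR", connects: ["PWM"] },
];

// Breadth-first search: minimum recursion depth at which each airport is found.
function hopsFrom(start, maxDepth = Infinity) {
  const byCode = new Map(airports.map((a) => [a.airport, a]));
  const hops = new Map([[start, 0]]); // the starting airport is found at depth 0
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const code of frontier) {
      for (const dest of byCode.get(code)?.connects ?? []) {
        // Only airports that have a document can appear in the result.
        if (byCode.has(dest) && !hops.has(dest)) {
          hops.set(dest, depth + 1);
          next.push(dest);
        }
      }
    }
    frontier = next;
  }
  return Object.fromEntries(hops);
}

console.log(hopsFrom("JFK", 2)); // { JFK: 0, BOS: 1, ORD: 1, PWM: 2 }
console.log(hopsFrom("JFK"));    // { JFK: 0, BOS: 1, ORD: 1, PWM: 2, LHR: 3 }
```

With a limit of 2 the result matches the slide, and LHR only appears once the limit is lifted.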
Scenario: Determine Air Travel Options
{ "_id" : 0, "airport" : "JFK", "connects" : [
    { "to" : "BOS", "airlines" : [ "UA", "AA" ] },
    { "to" : "ORD", "airlines" : [ "UA", "AA" ] },
    { "to" : "ATL", "airlines" : [ "AA", "DL" ] } ] }
{ "_id" : 1, "airport" : "BOS", "connects" : [
    { "to" : "JFK", "airlines" : [ "UA", "AA" ] },
    { "to" : "PWM", "airlines" : [ "AA" ] } ] }
{ "_id" : 2, "airport" : "ORD", "connects" : [
    { "to" : "JFK", "airlines" : [ "UA", "AA" ] } ] }
{ "_id" : 3, "airport" : "PWM", "connects" : [
    { "to" : "BOS", "airlines" : [ "AA" ] } ] }
Scenario: Determine Air Travel Options
[
  {
    "$match": {"name": "Lucy"}
  },
  {
    "$graphLookup": {
      from: "airports",
      startWith: "$nearestAirport",
      connectToField: "airport",
      connectFromField: "connects.to",
      maxDepth: 2,
      depthField: "numFlights",
      restrictSearchWithMatch: {"connects.airlines": "UA"},   // we've added a filter
      as: "UAdestinations"
    }
  }
]
{
"name" : "Lucy",
"from" : "JFK",
"UAdestinations" : [
{ "_id" : 2, "airport" : "ORD", "numFlights" : NumberLong(1) },
{ "_id" : 1, "airport" : "BOS", "numFlights" : NumberLong(1) }
]
}
Scenario: Product Categories
[Figure: product category tree with nodes Mugs, Kitchen & Dining, Commuter & Travel, Glassware & Drinkware, Outdoor Recreation, Camping Mugs, Running Thermos, Red Run Thermos, White Run Thermos and Blue Run Thermos]
Scenario: Product Categories
Get all children 2 levels deep – flat result
Scenario: Product Categories
Get all children 2 levels deep – nested result
Scenario: Article Recommendation
[Figure: graph of articles at Depth 0, Depth 1 and Depth 2, each node labelled with a content id and a conversion rate; the highlighted recommendations are Target #1 (best), Target #2 and Target #3]
Design & Performance
Considerations
The Tale of Two Biebers
VS
Follower Churn
• Everyone worries about scaling content
• But follow requests can be >> message send rates
• Twitter enforces per day follow limits
Edge Metadata
• Models – friends/followers
• Requirements typically start simple
• Add Groups, Favorites, Relationships
Options for Storing Graphs in MongoDB
Option One – Embedding Edges
Embedded Edge Arrays
• Storing connections with user (popular choice)
✓ Most compact form
✓ Efficient for reads
• However….
• User documents grow
• Upper limit on degree (document size)
• Difficult to annotate (and index) edge
{
"_id" : "djw",
"fullname" : "Darren Wood",
"country" : "Australia",
"followers" : [ "jsr", "ian"],
"following" : [ "jsr", "pete"]
}
Embedded Edge Arrays
• Creating Rich Graph Information
• Can become cumbersome
{
"_id" : "djw",
"fullname" : "Darren Wood",
"country" : "Australia",
"friends" : [
{"uid" : "jsr", "grp" : "school"},
{"uid" : "ian", "grp" : "work"} ]
}
{
"_id" : "djw",
"fullname" : "Darren Wood",
"country" : "Australia",
"friends" : [ "jsr", "ian"],
"group" : [ "school", "work"]
}
Option Two – Edge Collection
Edge Collections
• Document per edge
• Very flexible for adding edge data
> db.followers.findOne()
{
"_id" : ObjectId(…),
"from" : "djw",
"to" : "jsr"
}
> db.friends.findOne()
{
"_id" : ObjectId(…),
"from" : "djw",
"to" : "jsr",
"grp" : "work",
"ts" : Date("2013-07-10")
}
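Moving from embedded edge arrays to an edge collection is mostly a reshaping exercise. The sketch below shows one way to do it in plain JavaScript, turning the rich `friends` array from the earlier slide into one document per edge. `toEdgeDocuments` is a hypothetical helper, and `_id` is left for the database to assign.

```javascript
// Embedded form, as in the "Embedded Edge Arrays" slide.
const user = {
  _id: "djw",
  fullname: "Darren Wood",
  country: "Australia",
  friends: [
    { uid: "jsr", grp: "school" },
    { uid: "ian", grp: "work" },
  ],
};

// One edge document per friend; any extra keys travel along as edge metadata.
function toEdgeDocuments(u) {
  return u.friends.map(({ uid, ...metadata }) => ({
    from: u._id,
    to: uid,
    ...metadata,
  }));
}

const edges = toEdgeDocuments(user);
console.log(edges);
// [ { from: 'djw', to: 'jsr', grp: 'school' },
//   { from: 'djw', to: 'ian', grp: 'work' } ]
```

Because each edge is its own document, adding a field such as `ts` later does not change the shape or size of the user document.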
Edge Collection
Indexing Strategies
Finding Followers
Find followers in single edge collection :
> db.followers.find({from : "djw"}, {_id:0, to:1})
{
"to" : "jsr"
}
Using index :
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
• The index covers the query when searching on "from" for all followers
• Specify "unique" only if multiple edges between the same pair cannot exist
Finding Following
What about who a user is following?
Could use a reverse covered index :
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
{
"v" : 1,
"key" : { "to" : 1, "from" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "to_1_from_1"
}
Notice the flipped field order in the second index.
Wait! There may be an issue with the reverse index: SHARDING!
If we shard this collection by "from", looking up followers for a specific user is "targeted" to a single shard.
To find who the user is following, however, the query must scatter-gather to all shards.
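The targeted versus scatter-gather distinction can be illustrated with a toy router in plain JavaScript. Everything here is an assumption made for illustration: the cluster size, the character-sum hash, and the `shardsToQuery` helper only stand in for the real shard-key routing.

```javascript
const SHARD_COUNT = 3; // hypothetical cluster size

// Toy hash standing in for real shard-key routing on "from".
function shardFor(key) {
  let sum = 0;
  for (const ch of key) sum += ch.codePointAt(0);
  return sum % SHARD_COUNT;
}

// Shards a query must visit when the collection is sharded on "from".
function shardsToQuery(query) {
  if (typeof query.from === "string") {
    return [shardFor(query.from)]; // shard key present: targeted
  }
  return Array.from({ length: SHARD_COUNT }, (_, i) => i); // otherwise: scatter-gather
}

console.log(shardsToQuery({ from: "djw" }).length); // 1
console.log(shardsToQuery({ to: "djw" }).length);   // 3
```

A query that omits the shard key gives the router nothing to narrow the search with, whatever indexes exist on each shard.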
Dual Edge Collections
Dual Edge Collections
• When "following" queries are common
• Not always the case
• Consider overhead carefully
• Can use dual collections, one for each direction
• Edges are duplicated, with the fields reversed
• Can be sharded independently
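A minimal sketch of the dual-collection idea, using two in-memory arrays in place of real collections. The names `forwardEdges`, `reverseEdges` and `addEdge` are illustrative, and keeping the two writes consistent is left to the application in this sketch.

```javascript
// In-memory stand-ins for two edge collections, each of which could be sharded on "from".
const forwardEdges = []; // { from, to }
const reverseEdges = []; // the same edges with the fields swapped

// Every new edge is written twice, once per direction.
function addEdge(from, to) {
  forwardEdges.push({ from, to });
  reverseEdges.push({ from: to, to: from });
}

addEdge("djw", "jsr");
addEdge("ian", "jsr");

// Both directions are now answered by an equality match on "from".
const outgoing = forwardEdges.filter((e) => e.from === "djw").map((e) => e.to);
const incoming = reverseEdges.filter((e) => e.from === "jsr").map((e) => e.to);
console.log(outgoing); // [ 'jsr' ]
console.log(incoming); // [ 'djw', 'ian' ]
```

The cost is the one named in the bullets above: twice the writes and twice the storage, which is why the overhead should be weighed against how often the reverse query is needed.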
Wrap-up
MongoDB $graphLookup
• Efficient, index-based recursive queries
• Familiar, MongoDB query language
• Use a single System Of Record
• Cater for all query types
• No added operational overhead
• No synchronization requirements
• Reduced technology surface area

More Related Content

PDF
MongoDB Europe 2016 - Graph Operations with MongoDB
PDF
MongoDB Schema Design (Event: An Evening with MongoDB Houston 3/11/15)
PPTX
Apache Arrow: In Theory, In Practice
PPTX
Introduction to MongoDB
PDF
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
PDF
Introduction to MongoDB
PDF
Indexing
PPTX
The Basics of MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Schema Design (Event: An Evening with MongoDB Houston 3/11/15)
Apache Arrow: In Theory, In Practice
Introduction to MongoDB
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
Introduction to MongoDB
Indexing
The Basics of MongoDB

What's hot (20)

PPTX
Introduction to NoSQL
PDF
Data Modeling with Neo4j
PPT
Fast querying indexing for performance (4)
KEY
JSON-LD: JSON for Linked Data
PDF
Full-on Hypermedia APIs with Hydra
PDF
Inside MongoDB: the Internals of an Open-Source Database
PDF
Working with JSON Data in PostgreSQL vs. MongoDB
PDF
Time Series Data with InfluxDB
PDF
An introduction to MongoDB
PPTX
Indexing with MongoDB
PDF
Linux tuning to improve PostgreSQL performance
PDF
MongoDB World 2019: The Sights (and Smells) of a Bad Query
PDF
MySQL for beginners
PDF
Modern ETL Pipelines with Change Data Capture
PDF
An overview of Neo4j Internals
PDF
Atomicity In Redis: Thomas Hunter
PPTX
MongoDB
PDF
Graph database Use Cases
PPT
7. Key-Value Databases: In Depth
PPTX
elasticsearch_적용 및 활용_정리
Introduction to NoSQL
Data Modeling with Neo4j
Fast querying indexing for performance (4)
JSON-LD: JSON for Linked Data
Full-on Hypermedia APIs with Hydra
Inside MongoDB: the Internals of an Open-Source Database
Working with JSON Data in PostgreSQL vs. MongoDB
Time Series Data with InfluxDB
An introduction to MongoDB
Indexing with MongoDB
Linux tuning to improve PostgreSQL performance
MongoDB World 2019: The Sights (and Smells) of a Bad Query
MySQL for beginners
Modern ETL Pipelines with Change Data Capture
An overview of Neo4j Internals
Atomicity In Redis: Thomas Hunter
MongoDB
Graph database Use Cases
7. Key-Value Databases: In Depth
elasticsearch_적용 및 활용_정리
Ad

Viewers also liked (12)

PPTX
Back to Basics Webinar 1: Introduction to NoSQL
PDF
Using MongoDB as a high performance graph database
PPTX
Building a Directed Graph with MongoDB
PDF
Design, Scale and Performance of MapR's Distribution for Hadoop
PDF
Webinar: 10-Step Guide to Creating a Single View of your Business
PPTX
The Aggregation Framework
PPTX
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
PDF
Creating a Modern Data Architecture for Digital Transformation
PPTX
Back to Basics Webinar 3: Introduction to Replica Sets
PPTX
Seattle Scalability Meetup - Ted Dunning - MapR
PPTX
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
PPTX
Back to Basics: My First MongoDB Application
Back to Basics Webinar 1: Introduction to NoSQL
Using MongoDB as a high performance graph database
Building a Directed Graph with MongoDB
Design, Scale and Performance of MapR's Distribution for Hadoop
Webinar: 10-Step Guide to Creating a Single View of your Business
The Aggregation Framework
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
Creating a Modern Data Architecture for Digital Transformation
Back to Basics Webinar 3: Introduction to Replica Sets
Seattle Scalability Meetup - Ted Dunning - MapR
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
Back to Basics: My First MongoDB Application
Ad

Similar to Webinar: Working with Graph Data in MongoDB (20)

PPTX
Geoindexing with MongoDB
PDF
Which Questions We Should Have
PDF
Your Database Cannot Do this (well)
PPTX
MongoDB Schema Design: Practical Applications and Implications
PPTX
Schema Design
KEY
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
ODP
MongoDB - A Document NoSQL Database
PPTX
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
PPTX
Schema design mongo_boston
PPTX
Webinar: Schema Design
PPTX
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
PDF
MongoDB and Schema Design
KEY
Managing Social Content with MongoDB
PPTX
Conceptos básicos. Seminario web 1: Introducción a NoSQL
KEY
PPTX
Schema Design
PPTX
Intro to MongoDB (Extended Session)
PPTX
Intro to MongoDB Workshop
PDF
MongoDB Atlas Workshop - Singapore
Geoindexing with MongoDB
Which Questions We Should Have
Your Database Cannot Do this (well)
MongoDB Schema Design: Practical Applications and Implications
Schema Design
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
MongoDB - A Document NoSQL Database
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
Schema design mongo_boston
Webinar: Schema Design
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
MongoDB and Schema Design
Managing Social Content with MongoDB
Conceptos básicos. Seminario web 1: Introducción a NoSQL
Schema Design
Intro to MongoDB (Extended Session)
Intro to MongoDB Workshop
MongoDB Atlas Workshop - Singapore

More from MongoDB (20)

PDF
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
PDF
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
PDF
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
PDF
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
PDF
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
PDF
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
PDF
MongoDB SoCal 2020: MongoDB Atlas Jump Start
PDF
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
PDF
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
PDF
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
PDF
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
PDF
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
PDF
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
PDF
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
PDF
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
PDF
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
PDF
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
PDF
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
PDF
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
PDF
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...

Recently uploaded (20)

PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
A comparative analysis of optical character recognition models for extracting...
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
gpt5_lecture_notes_comprehensive_20250812015547.pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Approach and Philosophy of On baking technology
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PPTX
Machine Learning_overview_presentation.pptx
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Unlocking AI with Model Context Protocol (MCP)
Per capita expenditure prediction using model stacking based on satellite ima...
A comparative analysis of optical character recognition models for extracting...
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
NewMind AI Weekly Chronicles - August'25-Week II
MIND Revenue Release Quarter 2 2025 Press Release
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Assigned Numbers - 2025 - Bluetooth® Document
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
gpt5_lecture_notes_comprehensive_20250812015547.pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
MYSQL Presentation for SQL database connectivity
Programs and apps: productivity, graphics, security and other tools
Approach and Philosophy of On baking technology
Diabetes mellitus diagnosis method based random forest with bat algorithm
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Machine Learning_overview_presentation.pptx
The AUB Centre for AI in Media Proposal.docx
Reach Out and Touch Someone: Haptics and Empathic Computing
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton

Webinar: Working with Graph Data in MongoDB

  • 1. Graph Operations With MongoDB Charles Sarrazin Senior Consulting Engineer, MongoDB
  • 2. Charles Sarrazin Senior Consulting Engineer, MongoDB Graph Operations With MongoDB
  • 3. Agenda MongoDB Introduction 01 New Lookup Operators 03Graph Use & Concepts 02 Example Scenarios 04 Wrap-up 06Design & Performance Considerations 05
  • 5. Documents { first_name: ‘Paul’, surname: ‘Miller’, cell: 447557505611, city: ‘London’, location: [45.123,47.232], profession: [‘banking’, ‘finance’, ‘trader’], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } Fields can contain an array of sub-documents Fields Typed field values Fields can contain arrays Number
  • 7. Query Language db.collection.aggregate ( [ {$match:{'profession':{'$in':['banking','trader']}}}, {$addFields:{'surnameLower':{$toLower:"$surname"},'prof':{$ifNull:["$prof","Unknown"]}}, {$group: { ... } }, {$sort: { ... } }, {$limit: { ... } }, {$match: { ... } }, ... ] ) Aggregation pipeline
  • 8. Schema Design { first_name: ‘Paul’, surname: ‘Miller’, cell: 447557505611, city: ‘London’, location: [45.123,47.232], profession: [‘banking’, ‘finance’, ‘trader’], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } Embed same document
  • 9. Schema Design { first_name: ‘Paul’, surname: ‘Miller’, cell: 447557505611, city: ‘London’, location: [45.123,47.232], profession: [‘banking’, ‘finance’, ‘trader’], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } Embed same document { first_name: ‘Paul’, surname: ‘Miller’, cell: 447557505611, city: ‘London’, location: [45.123,47.232], profession: [‘banking’, ‘finance’, ‘trader’] } cars: { owner_id: 146 model: ‘Bentley’, year: 1973, value: 100000, … }, { owner_id: 146 model: ‘Rolls Royce’, year: 1965, value: 330000, … } Separate Collection with reference
  • 11. Functionality Timeline 2.0 – 2.2 Geospatial Polygon support Aggregation Framework New 2dsphere index Aggregation Framework efficiency optimisations Full text search 2.4 – 2.6 3.0 – 3.2 Join functionality Increased geo accuracy New Aggregation operators Improved case insensitivity Recursive graph traversal Faceted search Multiple collations 3.4
  • 12. MongoDB 3.4 - Multi-Model Database Document Rich JSON Data Structures Flexible Schema Global Scale Relational Left-Outer Join Views Schema Validation Key/Value Horizontal Scale In-Memory Search Text Search Multiple Languages Faceted Search Binaries Files & Metadata Encrypted Graph Graph & Hierarchical Recursive Lookups GeoSpatial GeoJSON 2D & 2DSphere
  • 13. Graph Use & Concepts
  • 14. Common Use Cases • Networks • Social – circle of friends/colleagues • Computer network – physical/virtual/application layer • Mapping / Routes • Shortest route A to B • Cybersecurity & Fraud Detection • Real-time fraud/scam recognition • Personalisation/Recommendation Engine • Product, social, service, professional etc.
  • 15. Graph Key Concepts • Vertices (nodes) • Edges (relationships) • Nodes have properties • Relationships have name & direction
  • 16. Relational DBs Lack Relationships • “Relationships” are actually JOINs • Raw business or storage logic and constraints – not semantic • JOIN tables, sparse columns, null-checks • More JOINS = degraded performance and flexibility
  • 17. Relational DBs Lack Relationships • How expensive/complex is: – Find my friends? – Find friends of my friends? – Find mutual friends? – Find friends of my friends of my friends? – And so on…
  • 18. Native Graph Database Strengths • Relationships are first class citizens of the database • Index-free adjacency • Nodes “point” directly to other nodes • Efficient relationship traversal
  • 19. Native Graph Database Challenges • Complex query languages • Poorly optimized for non-traversal queries • Difficult to express • May be memory intensive • Less often used as System Of Record • Synchronisation with SOR required • Increased operational complexity • Consistency concerns
  • 20. NoSQL DBs Lack Relationships • “Flat” disconnected documents or key/value pairs • “Foreign keys” inferred at application layer • Data integrity/quality onus is on the application • Suggestions re difficulty of modeling ANY relationships efficiently with aggregate stores. • However…
  • 21. Friends Network – Document Style { _id: 0, name: "Bob Smith", friends: ["Anna Jones", "Chris Green"] }, { _id: 1, name: "Anna Jones", friends: ["Bob Smith", "Chris Green", "Joe Lee"] }, { _id: 2, name: "Chris Green", friends: ["Anna Jones", "Bob Smith"] }
  • 22. Schema Design – before $graphLookup • Options • Store an array of direct children in each node • Store parent in each node • Store parent and array of ancestors • Trade-offs • Simple queries… • …vs simple updates 5 13 14 16 176 3 15121094 2 7 8 11 1
  • 23. Why MongoDB For Graph?
  • 26. Syntax $lookup: { from: <target lookup collection>, localField: <field from the input document>, foreignField: <field from the target collection to connect to>, as: <field name for resulting array> }
  • 28. Syntax $graphLookup: { from: <target lookup collection>, startWith: <expression for value to start from>, connectToField: <field name in target collection to connect to>, connectFromField: <field name in target collection to connect from – recurse from here>, as: <field name for resulting array>, maxDepth: <max number of iterations to perform>, depthField: <field name for number of recursive iterations required to reach this node>, restrictSearchWithMatch: <match condition to apply to lookup> }
  • 29. Things To Note • startWith value is an expression • Referencing value of a field requires the ‘$’ prefix • Can do things like {$toLower: "$name" } • Handles array fields automatically • connectToField and connectFromField take field names • restrictSearchWithMatch takes a standard query expressions
  • 30. Things To Note • Cycles are automatically detected • Can be used with 3.4 views: • Define a view • Recurse across existing view (‘base’ or ‘from’) • Can be used multiple times per Aggregation pipeline
  • 31. Schema Design – before $graphLookup • Options • Store an array of direct children in each node • Store parent in each node • Store parent and array of ancestors • Trade-offs • Simple queries… • …vs simple updates 5 13 14 16 176 3 15121094 2 7 8 11 1
  • 32. • Options • Store immediate parent in each node • Store immediate children in each node • Traverse in multiple directions • Recurse in same collection • Join/recurse into another collection 5 13 14 16 176 3 15121094 2 7 8 11 1 Schema Design – with $graphLookup
  • 33. "So just how suitable is MongoDB for the many varied graph use cases I have then?" 75% of use cases* (*based on beta test user feedback)
  • 35. Scenario: Calculate Friend Network { _id: 0, name: "Bob Smith", friends: ["Anna Jones", "Chris Green"] }, { _id: 1, name: "Anna Jones", friends: ["Bob Smith", "Chris Green", "Joe Lee"] }, { _id: 2, name: "Chris Green", friends: ["Anna Jones", "Bob Smith"] }
  • 36. Scenario: Calculate Friend Network [ { $match: { "name": "Bob Smith" } }, { $graphLookup: { from: "contacts", startWith: "$friends", connectToField: "name", connectFromField: "friends", as: "socialNetwork" } }, { $project: { name: 1, friends: 1, socialNetwork: "$socialNetwork.name" } } ] • This field is an array • No maxDepth set
  • 37. Scenario: Calculate Friend Network { "_id" : 0, "name" : "Bob Smith", "friends" : [ "Anna Jones", "Chris Green" ], "socialNetwork" : [ "Joe Lee", "Fred Brown", "Bob Smith", "Chris Green", "Anna Jones" ] } Array
  • 38. Friends Network - Social [Diagram: Bob Smith, Chris Green, Anna Jones and Joe Lee linked by "friends" edges; Recommendation?]
  • 39. Friends Network - Social [Diagram: Bob Smith, Chris Green, Anna Jones and Joe Lee linked by "friends" edges; Recommendation? Acme Soda]
  • 40. Scenario: Determine Air Travel Options [Diagram: airports ORD, JFK, BOS, PWM, LHR] { "_id" : 0, "airport" : "JFK", "connects" : [ "BOS", "ORD" ] } { "_id" : 1, "airport" : "BOS", "connects" : [ "JFK", "PWM" ] } { "_id" : 2, "airport" : "ORD", "connects" : [ "JFK" ] } { "_id" : 3, "airport" : "PWM", "connects" : [ "BOS", "LHR" ] } { "_id" : 4, "airport" : "LHR", "connects" : [ "PWM" ] }
  • 41. Scenario: Determine Air Travel Options Meet Lucy { "_id" : 0, "name" : "Lucy", "nearestAirport" : "JFK" }
  • 42. Scenario: Determine Air Travel Options [ { "$match": {"name":"Lucy"} }, { "$graphLookup": { from: "airports", startWith: "$nearestAirport", connectToField: "airport", connectFromField: "connects", maxDepth: 2, depthField: "numFlights", as: "destinations" } } ] • Record the number of recursions
  • 43. Scenario: Determine Air Travel Options { name: "Lucy", nearestAirport: "JFK", destinations: [ { _id: 0, airport: "JFK", connects: ["BOS", "ORD"], numFlights: 0 }, { _id: 1, airport: "BOS", connects: ["JFK", "PWM"], numFlights: 1 }, { _id: 2, airport: "ORD", connects: ["JFK"], numFlights: 1 }, { _id: 3, airport: "PWM", connects: ["BOS", "LHR"], numFlights: 2 } ] } • How many flights this would take
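The air travel result can be reproduced by hand with a breadth-first walk over the five airport documents from the scenario. This is a minimal in-memory sketch, not the aggregation stage itself: `destinationsFrom` is a hypothetical helper, and the ordering of its output is an artifact of the sketch rather than something the `as` array should be assumed to guarantee.

```javascript
// Airport documents as shown in the scenario.
const airports = [
  { _id: 0, airport: "JFK", connects: ["BOS", "ORD"] },
  { _id: 1, airport: "BOS", connects: ["JFK", "PWM"] },
  { _id: 2, airport: "ORD", connects: ["JFK"] },
  { _id: 3, airport: "PWM", connects: ["BOS", "LHR"] },
  { _id: 4, airport: "LHR", connects: ["PWM"] },
];

// Breadth-first walk: `connects` is followed, `airport` is matched,
// and the loop counter plays the role of depthField ("numFlights").
function destinationsFrom(start, maxDepth) {
  const visited = new Set();
  const destinations = [];
  let frontier = [start];
  for (let numFlights = 0; numFlights <= maxDepth; numFlights++) {
    const reached = airports.filter(
      (a) => frontier.includes(a.airport) && !visited.has(a.airport)
    );
    for (const a of reached) {
      visited.add(a.airport);
      destinations.push({ ...a, numFlights });
    }
    frontier = reached.flatMap((a) => a.connects);
  }
  return destinations;
}

const lucy = { _id: 0, name: "Lucy", nearestAirport: "JFK" };
const result = destinationsFrom(lucy.nearestAirport, 2);
console.log(result.map((d) => `${d.airport}:${d.numFlights}`).join(" "));
// prints "JFK:0 BOS:1 ORD:1 PWM:2"
```

LHR is missing because it sits at depth 3; raising the limit to 3 brings it in.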
  • 45. Scenario: Determine Air Travel Options { "_id" : 0, "airport" : "JFK", "connects" : [ { "to" : "BOS", "airlines" : [ "UA", "AA" ] }, { "to" : "ORD", "airlines" : [ "UA", "AA" ] }, { "to" : "ATL", "airlines" : [ "AA", "DL" ] } ] } { "_id" : 1, "airport" : "BOS", "connects" : [ { "to" : "JFK", "airlines" : [ "UA", "AA" ] }, { "to" : "PWM", "airlines" : [ "AA" ] } ] } { "_id" : 2, "airport" : "ORD", "connects" : [ { "to" : "JFK", "airlines" : [ "UA", "AA" ] } ] } { "_id" : 3, "airport" : "PWM", "connects" : [ { "to" : "BOS", "airlines" : [ "AA" ] } ] }
  • 46. Scenario: Determine Air Travel Options [ { "$match": {"name":"Lucy"} }, { "$graphLookup": { from: "airports", startWith: "$nearestAirport", connectToField: "airport", connectFromField: "connects.to", maxDepth: 2, depthField: "numFlights", restrictSearchWithMatch: {"connects.airlines":"UA"}, as: "UAdestinations" } } ] • We've added a filter
  • 47. Scenario: Determine Air Travel Options { "name" : "Lucy", "from" : "JFK", "UAdestinations" : [ { "_id" : 2, "airport" : "ORD", "numFlights" : NumberLong(1) }, { "_id" : 1, "airport" : "BOS", "numFlights" : NumberLong(1) } ] }
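The effect of the filter can be sketched in memory as an extra predicate applied to each candidate document during the walk. This rests on the assumption that restrictSearchWithMatch tests whole documents (here: "has at least one connection flown by UA") rather than individual edges. `search` and `fliesUA` are hypothetical names. Unlike the output on slide 47, the sketch also returns the JFK start document at depth 0.

```javascript
// Airport documents with per-connection airlines, as in the scenario.
const airports = [
  { _id: 0, airport: "JFK", connects: [
    { to: "BOS", airlines: ["UA", "AA"] },
    { to: "ORD", airlines: ["UA", "AA"] },
    { to: "ATL", airlines: ["AA", "DL"] } ] },
  { _id: 1, airport: "BOS", connects: [
    { to: "JFK", airlines: ["UA", "AA"] },
    { to: "PWM", airlines: ["AA"] } ] },
  { _id: 2, airport: "ORD", connects: [{ to: "JFK", airlines: ["UA", "AA"] }] },
  { _id: 3, airport: "PWM", connects: [{ to: "BOS", airlines: ["AA"] }] },
];

// Stand-in for restrictSearchWithMatch: {"connects.airlines": "UA"}.
const fliesUA = (doc) => doc.connects.some((c) => c.airlines.includes("UA"));

// Same breadth-first walk as an unrestricted search, except that a document
// is only accepted (and only expanded further) if it passes `restrict`.
function search(start, maxDepth, restrict) {
  const visited = new Set();
  const found = [];
  let frontier = [start];
  for (let numFlights = 0; numFlights <= maxDepth; numFlights++) {
    const reached = airports.filter(
      (a) => frontier.includes(a.airport) && !visited.has(a.airport) && restrict(a)
    );
    for (const a of reached) {
      visited.add(a.airport);
      found.push({ airport: a.airport, numFlights });
    }
    frontier = reached.flatMap((a) => a.connects.map((c) => c.to));
  }
  return found;
}

const restricted = search("JFK", 2, fliesUA);
const unrestricted = search("JFK", 2, () => true);
console.log(restricted, unrestricted);
```

PWM drops out of the restricted run because its only connection is flown by AA. ATL never appears in either run because the sample data has no ATL document to match.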
  • 48. Scenario: Product Categories Mugs Kitchen & Dining Commuter & Travel Glassware & Drinkware Outdoor Recreation Camping Mugs Running Thermos Red Run Thermos White Run Thermos Blue Run Thermos
  • 49. Scenario: Product Categories Get all children 2 levels deep – flat result
  • 50. Scenario: Product Categories Get all children 2 levels deep – nested result
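The slides do not show the category documents or the pipelines, so the following is only a sketch of the difference between the two result shapes. It assumes each category stores its parent's name and that a flat list of descendants has already been fetched. The category names, the `parent` and `depth` fields and the `nest` helper are all hypothetical.

```javascript
// Hypothetical flat result: every descendant of "Outdoor" down to two levels,
// each carrying its parent reference and the depth at which it was reached.
const flat = [
  { name: "Camping", parent: "Outdoor", depth: 0 },
  { name: "Hiking", parent: "Outdoor", depth: 0 },
  { name: "Tents", parent: "Camping", depth: 1 },
  { name: "Stoves", parent: "Camping", depth: 1 },
];

// Rebuild the hierarchy by grouping each entry under its parent.
// Assumes the flat list describes a tree (no cycles).
function nest(rootName, entries) {
  const childrenOf = (parent) =>
    entries
      .filter((c) => c.parent === parent)
      .map((c) => ({ name: c.name, children: childrenOf(c.name) }));
  return { name: rootName, children: childrenOf(rootName) };
}

const nested = nest("Outdoor", flat);
console.log(JSON.stringify(nested, null, 2));
```

The flat shape is what a single recursive lookup naturally yields. The nested shape needs this kind of regrouping step, whether in the pipeline or in application code.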
  • 51. Scenario: Article Recommendation [Diagram: article graph at Depth 0, Depth 1 and Depth 2; each node labeled with content id and conversion rate; edges are recommendations]
  • 52. Scenario: Article Recommendation [Diagram: article graph at Depth 0, Depth 1 and Depth 2; each node labeled with content id and conversion rate; edges are recommendations] • Target #1 (best), Target #2, Target #3 • Recommendations for Target #1 • Recommendation for Targets #2 and #3
  • 56. The Tale of Two Biebers VS
  • 57. Follower Churn • Everyone worries about scaling content • But follow requests can be >> message send rates • Twitter enforces per-day follow limits
  • 58. Edge Metadata • Models – friends/followers • Requirements typically start simple • Add Groups, Favorites, Relationships
  • 59. Options for Storing Graphs in MongoDB
  • 60. Option One – Embedding Edges
  • 61. Embedded Edge Arrays • Storing connections with user (popular choice) ✓ Most compact form ✓ Efficient for reads • However… • User documents grow • Upper limit on degree (document size) • Difficult to annotate (and index) edges { "_id" : "djw", "fullname" : "Darren Wood", "country" : "Australia", "followers" : [ "jsr", "ian"], "following" : [ "jsr", "pete"] }
  • 62. Embedded Edge Arrays • Creating Rich Graph Information • Can become cumbersome { "_id" : "djw", "fullname" : "Darren Wood", "country" : "Australia", "friends" : [ {"uid" : "jsr", "grp" : "school"}, {"uid" : "ian", "grp" : "work"} ] } { "_id" : "djw", "fullname" : "Darren Wood", "country" : "Australia", "friends" : [ "jsr", "ian"], "group" : [ "school", "work"] }
  • 63. Option Two – Edge Collection
  • 64. Edge Collections • Document per edge • Very flexible for adding edge data > db.followers.findOne() { "_id" : ObjectId(…), "from" : "djw", "to" : "jsr" } > db.friends.findOne() { "_id" : ObjectId(…), "from" : "djw", "to" : "jsr", "grp" : "work", "ts" : Date("2013-07-10") }
  • 66. Finding Followers • Find followers in a single edge collection: > db.followers.find({from : "djw"}, {_id:0, to:1}) { "to" : "jsr" } • Using index: { "v" : 1, "key" : { "from" : 1, "to" : 1 }, "unique" : true, "ns" : "socialite.followers", "name" : "from_1_to_1" } • Covered index when searching on "from" for all followers • "unique": specify only if multiple edges cannot exist
  • 67. Finding Following • What about who a user is following? • Could use a reverse covered index: { "v" : 1, "key" : { "from" : 1, "to" : 1 }, "unique" : true, "ns" : "socialite.followers", "name" : "from_1_to_1" } { "v" : 1, "key" : { "to" : 1, "from" : 1 }, "unique" : true, "ns" : "socialite.followers", "name" : "to_1_from_1" } • Notice the flipped field order here • Wait! There may be an issue with the reverse index…
  • 68. Finding Following • SHARDING! { "v" : 1, "key" : { "from" : 1, "to" : 1 }, "unique" : true, "ns" : "socialite.followers", "name" : "from_1_to_1" } { "v" : 1, "key" : { "to" : 1, "from" : 1 }, "unique" : true, "ns" : "socialite.followers", "name" : "to_1_from_1" } • If we shard this collection by "from", looking up followers for a specific user is "targeted" to a shard • To find who the user is following, however, it must scatter-gather the query to all shards
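The targeted versus scatter-gather distinction can be illustrated with a toy router. This is a simplified model, not MongoDB's chunk-based routing: the shard count, the hash function and the `shardsToQuery` helper are assumptions made for the sketch. The only point it makes is that a query can be sent to one shard when its filter contains the shard key, and must go to all shards otherwise.

```javascript
const SHARD_COUNT = 3; // hypothetical cluster size

// Toy placement function: hash the shard key value onto a shard number.
// Real routing uses chunk ranges; this stand-in only needs to be deterministic.
function shardFor(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) % SHARD_COUNT;
  return h;
}

// Which shards must a query visit when the edge collection is sharded by "from"?
function shardsToQuery(filter) {
  if (filter.from !== undefined) {
    return [shardFor(filter.from)]; // targeted: shard key is in the filter
  }
  // Scatter-gather: without the shard key, any shard may hold matching edges.
  return Array.from({ length: SHARD_COUNT }, (_, i) => i);
}

console.log(shardsToQuery({ from: "djw" }).length); // one shard
console.log(shardsToQuery({ to: "djw" }).length);   // all shards
```

A reverse index on { to, from } speeds up the lookup on each shard, but it does not reduce how many shards have to be asked.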
  • 70. Dual Edge Collections • When "following" queries are common • Not always the case • Consider overhead carefully • Can use dual collections storing • One for each direction • Edges are duplicated in reverse • Can be sharded independently
  • 72. MongoDB $graphLookup • Efficient, index-based recursive queries • Familiar MongoDB query language • Use a single System Of Record • Cater for all query types • No added operational overhead • No synchronization requirements • Reduced technology surface area
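A minimal sketch of the dual-collection idea, using two in-memory arrays in place of collections. The names `forward`, `reverse`, `addEdge` and `neighbours` are hypothetical. The sketch ignores the harder part, which is keeping the two writes consistent if one of them fails.

```javascript
// Two stand-in "collections": the same edges, stored once per direction.
const forward = []; // { from, to } as written
const reverse = []; // the same edge with the endpoints swapped

// Every new edge costs two writes: this is the overhead to weigh.
function addEdge(from, to) {
  forward.push({ from, to });
  reverse.push({ from: to, to: from });
}

// Both directions are now answered by a lookup on `from`, so each
// collection can be indexed and sharded on `from` independently.
const neighbours = (collection, user) =>
  collection.filter((e) => e.from === user).map((e) => e.to);

addEdge("djw", "jsr");
addEdge("djw", "pete");
addEdge("ian", "djw");

console.log(neighbours(forward, "djw")); // edges leaving djw
console.log(neighbours(reverse, "djw")); // edges arriving at djw
```

With this layout, neither query has to search on a field that is not the leading key of its own collection.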
  • 73. Graph Operations With MongoDB Charles Sarrazin Senior Consulting Engineer, MongoDB