Data Modeling Deep Dive
Data Modeling: 
Four use cases 
Toji George 
Solutions Architect 
MongoDB Inc.
Agenda 
• 4 Real World Schemas 
– Inbox 
– History 
– Indexed Attributes 
– Multiple Identities 
• Conclusions
In MongoDB 
Application Development requires Good Schema 
Design 
Success comes from Proper Data Structure 
“Schema-less”?
#1 – Message Inbox
Let's get social
Sending Messages 
?
Design Goals 
• Efficiently send new messages to recipients 
• Efficiently read inbox
Reading My Inbox 
?
Three (of many) Approaches 
• Fan out on Read 
• Fan out on Write 
• Fan out on Write with Bucketing
Fan out on read 
// Shard on "from" 
sh.shardCollection( "mongodbdays.inbox", { from: 1 } ) 
// Make sure we have an index to handle inbox reads 
db.inbox.ensureIndex( { to: 1, sent: 1 } ) 
msg = { 
from: "Joe", 
to: [ "Bob", "Jane" ], 
sent: new Date(), 
message: "Hi!", 
} 
// Send a message 
db.inbox.save( msg ) 
// Read my inbox 
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )
Fan out on read – I/O 
[Diagram: Send Message and Read Inbox I/O across Shard 1, Shard 2 and Shard 3]
Considerations 
• Write: One document per message sent 
• Read: Find all messages with my own name in 
the recipient field 
• Read: Requires scatter-gather on sharded 
cluster 
• A lot of random I/O on a shard to find everything
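The considerations above can be sketched in memory with plain JavaScript, without a real cluster. The three-element `shards` array, the toy `shardFor` routing function and the helper names are assumptions made for illustration only; they merely mimic how sharding on `from` forces an inbox read to visit every shard.

```javascript
// Hypothetical stand-in for a 3-shard cluster: each shard is an array of documents.
const shards = [[], [], []];

// Toy routing function (an assumption, not MongoDB's real chunk routing):
// pick a shard from the character codes of the shard key value.
function shardFor(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % shards.length;
}

// Fan out on read: one document per message, routed by sender.
function send(msg) {
  shards[shardFor(msg.from)].push(msg);
}

// An inbox read cannot be routed by "from", so every shard is scanned.
function readInbox(user) {
  let shardsVisited = 0;
  const found = [];
  for (const shard of shards) {
    shardsVisited += 1;
    for (const doc of shard) {
      if (doc.to.includes(user)) found.push(doc);
    }
  }
  found.sort((a, b) => b.sent - a.sent); // newest first
  return { found, shardsVisited };
}

send({ from: "Joe", to: ["Bob", "Jane"], sent: new Date(2013, 2, 1), message: "Hi!" });
send({ from: "Jane", to: ["Bob"], sent: new Date(2013, 2, 2), message: "Lunch?" });

const inbox = readInbox("Bob");
console.log(inbox.found.map((m) => m.message), inbox.shardsVisited);
```

Each send writes a single document, but the read reports three shards visited no matter where the messages landed.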
Fan out on write 
// Shard on "recipient" and "sent" 
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } ) 
msg = { 
from: "Joe", 
to: [ "Bob", "Jane" ], 
sent: new Date(), 
message: "Hi!", 
} 
// Send a message 
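msg.to.forEach( function( recipient ) { 
    // copy the message so each recipient gets its own document 
    var copy = Object.assign( {}, msg, { recipient: recipient } ); 
    db.inbox.insert( copy ); 
} ); 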
// Read my inbox 
db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )
Fan out on write – I/O 
[Diagram: Send Message and Read Inbox I/O across Shard 1, Shard 2 and Shard 3]
Considerations 
• Write: One document per recipient 
• Read: Find all of the messages with me as the 
recipient 
• Can shard on recipient, so inbox reads hit one 
shard 
• But still lots of random I/O on the shard
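A rough in-memory sketch of the same trade-off, in plain JavaScript. The `inboxByRecipient` map is a hypothetical stand-in for a collection sharded on `recipient`; it is not how MongoDB stores data, only a way to show that the write fans out while the read stays in one place.

```javascript
// Hypothetical in-memory "collection": one partition of documents per recipient.
const inboxByRecipient = new Map();

// Fan out on write: store one copy of the message per recipient.
function send(msg) {
  for (const recipient of msg.to) {
    // Copy the message so each stored document is independent.
    const copy = { ...msg, recipient };
    if (!inboxByRecipient.has(recipient)) inboxByRecipient.set(recipient, []);
    inboxByRecipient.get(recipient).push(copy);
  }
}

// Reading an inbox touches only the recipient's own partition.
function readInbox(user) {
  const docs = inboxByRecipient.get(user) || [];
  return [...docs].sort((a, b) => b.sent - a.sent); // newest first
}

send({ from: "Joe", to: ["Bob", "Jane"], sent: new Date(2013, 2, 1), message: "Hi!" });
send({ from: "Jane", to: ["Bob"], sent: new Date(2013, 2, 2), message: "Lunch?" });

console.log(readInbox("Bob").length, readInbox("Jane").length, readInbox("Joe").length);
```

Two sends produce three stored documents here, which is the write amplification the slide accepts in exchange for single-shard reads.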
Fan out on write with buckets 
// Shard on "owner / sequence" 
sh.shardCollection( "mongodbdays.inbox", 
{ owner: 1, sequence: 1 } ) 
sh.shardCollection( "mongodbdays.users", { user_name: 1 } ) 
msg = { 
from: "Joe", 
to: [ "Bob", "Jane" ], 
sent: new Date(), 
message: "Hi!", 
}
Fan out on write with buckets 
// Send a message 
for ( recipient in msg.to ) { 
    count = db.users.findAndModify({ 
        query: { user_name: msg.to[recipient] }, 
        update: { "$inc": { "msg_count": 1 } }, 
        upsert: true, 
        new: true }).msg_count; 
    sequence = Math.floor(count / 50); 
    db.inbox.update( 
        { owner: msg.to[recipient], sequence: sequence }, 
        { $push: { "messages": msg } }, 
        { upsert: true } ); 
} 
// Read my inbox 
db.inbox.find( { owner: "Joe" } ) 
.sort ( { sequence: -1 } ).limit( 2 )
Fan out on write with buckets 
• Each “inbox” document is an array of messages 
• Append a message onto “inbox” of recipient 
• Bucket inboxes so there aren’t too many 
messages per document 
• Can shard on recipient, so inbox reads hit one 
shard 
• 1 or 2 documents to read the whole inbox
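The bucketing arithmetic can be tried out in memory. This is a sketch in plain JavaScript, assuming two `Map` objects as stand-ins for the `users` counter and the `inbox` collection; the `deliver` helper is hypothetical and only mirrors the increment-then-divide logic of the send loop.

```javascript
const BUCKET_SIZE = 50; // same divisor as Math.floor(count / 50) on the slide

const msgCounts = new Map(); // stands in for users.msg_count
const buckets = new Map();   // key "owner/sequence" -> { owner, sequence, messages }

function deliver(owner, msg) {
  // Mirror of findAndModify with $inc and new: true: increment, then read.
  const count = (msgCounts.get(owner) || 0) + 1;
  msgCounts.set(owner, count);

  const sequence = Math.floor(count / BUCKET_SIZE);
  const key = owner + "/" + sequence;

  // Mirror of update with upsert: create the bucket if it is missing.
  if (!buckets.has(key)) buckets.set(key, { owner, sequence, messages: [] });
  buckets.get(key).messages.push(msg);
  return sequence;
}

for (let i = 1; i <= 120; i++) {
  deliver("Bob", { from: "Joe", message: "msg " + i });
}

// Newest bucket first, as in sort({ sequence: -1 }).
const bobBuckets = [...buckets.values()]
  .filter((b) => b.owner === "Bob")
  .sort((a, b) => b.sequence - a.sequence);
console.log(bobBuckets.map((b) => [b.sequence, b.messages.length]));
```

Because the counter is incremented before the division, bucket 0 ends up with 49 messages and later buckets with up to 50.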
Fan out on write with buckets – I/O 
[Diagram: Send Message and Read Inbox I/O across Shard 1, Shard 2 and Shard 3]
#2 - History
Design Goals 
• Need to retain a limited amount of history e.g. 
– Hours, Days, Weeks 
– May be a legislative requirement (e.g. HIPAA, SOX, 
DPA) 
• Need to query efficiently by 
– match 
– ranges
3 (of many) approaches 
• Bucket by Number of messages 
• Fixed size array 
• Bucket by date + TTL collections
Bucket by number of messages 
db.inbox.find() 
{ owner: "Joe", sequence: 25, 
messages: [ 
{ from: "Joe", 
to: [ "Bob", "Jane" ], 
sent: ISODate("2013-03-01T09:59:42.689Z"), 
message: "Hi!" 
}, 
… 
] } 
// Query with a date range 
db.inbox.find ({owner: "friend1", 
messages: { 
$elemMatch: {sent:{$gte: ISODate("…") }}}}) 
// Remove elements based on a date 
db.inbox.update({owner: "friend1" }, 
{ $pull: { messages: { 
sent: { $gte: ISODate("…") } } } } )
Considerations 
• Shrinking documents, space can be reclaimed 
with 
– db.runCommand ( { compact: '<collection>' } ) 
• Removing the document after the last element in 
the array has been removed 
– { "_id" : …, "messages" : [ ], "owner" : 
"friend1", "sequence" : 0 }
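A small in-memory sketch of the clean-up step, in plain JavaScript. It assumes that retention means dropping messages older than a cutoff date, and the `expire` helper and sample data are hypothetical; against a real collection this would be a `$pull` followed by a remove of the emptied bucket documents.

```javascript
// Hypothetical in-memory buckets for one owner.
let buckets = [
  { owner: "friend1", sequence: 0, messages: [
    { sent: new Date("2013-01-05T00:00:00Z"), message: "old" } ] },
  { owner: "friend1", sequence: 1, messages: [
    { sent: new Date("2013-02-20T00:00:00Z"), message: "older" },
    { sent: new Date("2013-03-01T00:00:00Z"), message: "recent" } ] },
];

// Drop messages sent before the cutoff, then drop buckets left with an empty array.
function expire(allBuckets, cutoff) {
  return allBuckets
    .map((b) => ({ ...b, messages: b.messages.filter((m) => m.sent >= cutoff) }))
    .filter((b) => b.messages.length > 0);
}

buckets = expire(buckets, new Date("2013-02-25T00:00:00Z"));
console.log(buckets.map((b) => [b.sequence, b.messages.map((m) => m.message)]));
```

Bucket 0 disappears entirely, which avoids leaving behind a document like the empty one shown above.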
Fixed size array 
msg = { 
from: "Your Boss", 
to: [ "Bob" ], 
sent: new Date(), 
message: "CALL ME NOW!" 
} 
// 2.4 Introduces $each, $sort and $slice for $push 
db.messages.update( 
{ _id: 1 }, 
{ $push: { messages: { $each: [ msg ], 
$sort: { sent: 1 }, 
$slice: -50 } 
} 
} 
)
Considerations 
• Need to compute the size of the array based on 
retention period
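One way to estimate the array size is to multiply an expected message rate by the retention period. The sketch below is plain JavaScript under that assumption; `arraySizeFor`, `pushCapped` and the rate of 2.5 messages a day are illustrative, and `pushCapped` only imitates `$push` with `$each`, `$sort` and a negative `$slice` in memory.

```javascript
// Estimate how many array slots cover a retention period.
// messagesPerDay is an assumed rate supplied by the application.
function arraySizeFor(messagesPerDay, retentionDays) {
  return Math.ceil(messagesPerDay * retentionDays);
}

// In-memory mirror of $push with $each, $sort: { sent: 1 } and $slice: -size:
// append, sort oldest first, keep only the last `size` elements.
function pushCapped(messages, newMessages, size) {
  const all = messages.concat(newMessages);
  all.sort((a, b) => a.sent - b.sent);
  return all.slice(-size);
}

const size = arraySizeFor(2.5, 7); // 2.5 messages a day, kept for a week
let messages = [];
for (let day = 1; day <= 30; day++) {
  const msg = { sent: new Date(2013, 0, day), message: "day " + day };
  messages = pushCapped(messages, [msg], size);
}
console.log(size, messages.length, messages[0].message);
```

If the real rate exceeds the estimate, the slice silently discards messages that are still inside the retention window, so the estimate should err on the high side.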
TTL Collections 
// messages: one doc per user per day 
db.inbox.findOne() 
{ 
_id: 1, 
to: "Joe", 
sequence: ISODate("2013-02-04T00:00:00.392Z"), 
messages: [ ] 
} 
// Auto expires data after 31536000 seconds = 1 year 
db.inbox.ensureIndex( { sequence: 1 }, 
{ expireAfterSeconds: 31536000 } )
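The numbers on this slide can be derived rather than hard-coded. The sketch below is plain JavaScript; the helper names are hypothetical, and `isExpired` only approximates the eligibility rule, since the real removal is carried out by a background task and is not immediate.

```javascript
// Seconds in a retention period expressed in days.
function expireAfterSeconds(days) {
  return days * 24 * 60 * 60;
}

// Bucket key for "one doc per user per day": midnight UTC of the send date.
function dayBucket(sent) {
  return new Date(Date.UTC(sent.getUTCFullYear(), sent.getUTCMonth(), sent.getUTCDate()));
}

// A document becomes eligible for TTL removal once its indexed date
// is older than the configured number of seconds.
function isExpired(sequence, now, ttlSeconds) {
  return now.getTime() - sequence.getTime() > ttlSeconds * 1000;
}

const ttl = expireAfterSeconds(365);
const bucket = dayBucket(new Date("2013-02-04T15:30:00Z"));
console.log(ttl, bucket.toISOString());
console.log(isExpired(bucket, new Date("2014-02-05T00:00:00Z"), ttl));
```

Bucketing by day means a whole day of messages expires together, so retention is accurate to the day rather than to the message.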
#3 – Indexed Attributes
Design Goal 
• Application needs to store a variable number of 
attributes e.g. 
– User defined Form 
– Meta Data tags 
• Queries needed 
– Equality 
– Range based 
• Need to be efficient, regardless of the number of 
attributes
2 (of many) Approaches 
• Attributes as Embedded Document 
• Attributes as Objects in an Array
Attributes as a sub-document 
db.files.insert( { _id: "local.0", 
attr: { type: "text", size: 64, 
created: ISODate("...") } } ) 
db.files.insert( { _id: "local.1", 
attr: { type: "text", size: 128} } ) 
db.files.insert( { _id: "mongod", 
attr: { type: "binary", size: 256, 
created: ISODate("...") } } ) 
// Need to create an index for each item in the sub-document 
db.files.ensureIndex( { "attr.type": 1 } ) 
db.files.find( { "attr.type": "text"} ) 
// Can perform range queries 
db.files.ensureIndex( { "attr.size": 1 } ) 
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
Considerations 
• Each attribute needs an Index 
• Each time you extend, you add an index 
• Lots and lots of indexes
Attributes as objects in array 
db.files.insert( {_id: "local.0", 
attr: [ { type: "text" }, 
{ size: 64 }, 
{ created: ISODate("...") } ] } ) 
db.files.insert( { _id: "local.1", 
attr: [ { type: "text" }, 
{ size: 128 } ] } ) 
db.files.insert( { _id: "mongod", 
attr: [ { type: "binary" }, 
{ size: 256 }, 
{ created: ISODate("...") } ] } ) 
db.files.ensureIndex( { attr: 1 } )
Considerations 
• Only one index needed on attr 
• Can support range queries, etc. 
• Index can be used only once per query
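To see why one index on `attr` can serve every attribute, here is a toy model in plain JavaScript. The flat `index` array is an assumption standing in for a multikey index (a real index is a sorted tree, not a scanned list), and `findEqual` and `findRange` are hypothetical helpers.

```javascript
const files = [
  { _id: "local.0", attr: [{ type: "text" }, { size: 64 }] },
  { _id: "local.1", attr: [{ type: "text" }, { size: 128 }] },
  { _id: "mongod", attr: [{ type: "binary" }, { size: 256 }] },
];

// Toy multikey index: one entry per array element, whatever the attribute name.
const index = [];
for (const doc of files) {
  for (const element of doc.attr) {
    const [name, value] = Object.entries(element)[0];
    index.push({ name, value, id: doc._id });
  }
}

// Equality lookup on any attribute through the single index.
function findEqual(name, value) {
  return index.filter((e) => e.name === name && e.value === value).map((e) => e.id);
}

// Range lookup (low exclusive, high inclusive) through the same index.
function findRange(name, low, high) {
  return index
    .filter((e) => e.name === name && e.value > low && e.value <= high)
    .map((e) => e.id);
}

console.log(findEqual("type", "text"));
console.log(findRange("size", 64, 16384));
```

Adding a new attribute to a file only adds entries to the existing index; no new index has to be created.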
#4 – Multiple Identities
Design Goal 
• Ability to look up by a number of different 
identities e.g. 
- Username 
- Email address 
- FB handle 
- LinkedIn URL
2 (of many) approaches 
• Identifiers in a single document 
• Separate Identifiers from Content
Single document by user 
db.users.findOne() 
{ _id: "joe", 
email: "joe@example.com", 
fb: "joe.smith", // facebook 
li: "joe.e.smith", // linkedin 
other: {…} 
} 
// Shard collection by _id 
sh.shardCollection( "mongodbdays.users", { _id: 1 } ) 
// Create indexes on each key 
db.users.ensureIndex( { email: 1} ) 
db.users.ensureIndex( { fb: 1 } ) 
db.users.ensureIndex( { li: 1 } )
Read by _id (shard key) 
find( { _id: "joe"} ) 
Shard 1 Shard 2 Shard 3
Read by email (non-shard key) 
find( { email: "joe@example.com" } ) 
Shard 1 Shard 2 Shard 3
Considerations 
• Lookup by shard key is routed to 1 shard 
• Lookup by other identifier is scatter gathered 
across all shards 
• Secondary keys cannot have a unique index
Document per identity 
// Create unique index 
db.identities.ensureIndex( { identifier : 1} , { unique: true} ) 
// Create one identity document per identifier of the user 
db.identities.save( 
{ identifier : { hndl: "joe" }, user: "1200-42" } ) 
db.identities.save( 
{ identifier : { email: "joe@abc.com" }, user: "1200-42" } ) 
db.identities.save( 
{ identifier : { li: "joe.e.smith" }, user: "1200-42" } ) 
// Shard collection by identifier 
sh.shardCollection( "mydb.identities", { identifier : 1 } ) 
// Create unique index 
db.users.ensureIndex( { _id: 1} , { unique: true} ) 
// Shard collection by _id 
sh.shardCollection( "mydb.users", { _id: 1 } )
Read requires 2 reads 
db.identities.find({"identifier" : { "hndl" 
: "joe" }}) 
db.users.find( { _id: "1200-42"} ) 
Shard 1 Shard 2 Shard 3
Considerations 
• Lookup to Identities is a routed query 
• Lookup to Users is a routed query 
• Unique indexes available 
• Must do two queries per lookup
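The two-step lookup can be sketched in memory with plain JavaScript. The two `Map` objects are assumptions standing in for the `identities` and `users` collections, the `name` field is invented sample data, and `addIdentity` and `findUser` are hypothetical helpers.

```javascript
// In-memory stand-ins for the two collections.
const identities = new Map(); // "kind:value" -> user id (plays the unique index)
const users = new Map();      // user id -> user document

function addIdentity(kind, value, userId) {
  const key = kind + ":" + value;
  // Mirror of the unique index: reject a second document with the same identifier.
  if (identities.has(key)) throw new Error("duplicate identifier: " + key);
  identities.set(key, userId);
}

// Two reads per lookup: identifier -> user id, then user id -> user document.
function findUser(kind, value) {
  const userId = identities.get(kind + ":" + value);
  return userId === undefined ? null : users.get(userId) || null;
}

users.set("1200-42", { _id: "1200-42", name: "Joe Smith" }); // name is sample data
addIdentity("hndl", "joe", "1200-42");
addIdentity("email", "joe@abc.com", "1200-42");
addIdentity("li", "joe.e.smith", "1200-42");

console.log(findUser("email", "joe@abc.com")._id, findUser("fb", "nobody"));
```

Both reads are keyed lookups, which corresponds to two routed queries on a sharded cluster rather than one scatter-gather.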
Conclusion
Summary 
• Multiple ways to model a domain problem 
• Understand the key use cases of your app 
• Balance between ease of query vs. ease of write 
• Reduce random I/O where possible for better 
performance