Technical Director, 10gen
@jonnyeight alvin@10gen.com alvinonmongodb.com
Alvin Richards
#MongoDBdays
Schema Design
3 Real World Use Cases
I'm planning a Trip to LA…
Agenda
• Why is schema design important
• 3 Real World Schemas
– Inbox
– Indexed Attributes
– Multiple Identities
• Conclusions
Why is Schema Design important?
• Largest factor for a performant system
• Schema design with MongoDB is different
• RDBMS – "What answers do I have?"
• MongoDB – "What question will I have?"
• Must consider use case with schema
#1 - Message Inbox
Let’s get
Social
Sending Messages
?
Reading my Inbox
?
Design Goals
• Efficiently send new messages to recipients
• Efficiently read inbox
3 Approaches (there are more)
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
Fan out on read – Send
Message
Shard 1 Shard 2 Shard 3
Send
Message
db.inbox.save(
{ to: [ "Bob", "Jane" ], … } )
Fan out on read – Inbox Read
Shard 1 Shard 2 Shard 3
Read
Inbox
db.inbox.find( { to: "Bob" } )
// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
// Send a message
db.inbox.save( msg )
// Read my inbox
db.inbox.find( { to: "Bob" } ).sort( { sent: -1 } )
Fan out on read
Considerations
✔ 1 document per message sent
✔ Multiple recipients in an array key
✔ Reading inbox finds all messages with my own name in the recipient field
✖ Requires scatter-gather on sharded cluster
✖ Then a lot of random IO on a shard to find everything
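The trade-off in these considerations can be illustrated without a cluster. What follows is a minimal sketch in plain JavaScript, assuming a hypothetical in-memory array of shards and a toy `shardFor` hash (neither is MongoDB API). It only mimics how sharding on `from` stores one document per message and forces an inbox read to visit every shard.

```javascript
// Hypothetical in-memory model of fan out on read; not MongoDB API.
var SHARD_COUNT = 3;

// Toy hash standing in for shard-key routing (assumption for illustration).
function shardFor(key) {
  var sum = 0;
  for (var i = 0; i < key.length; i++) {
    sum += key.charCodeAt(i);
  }
  return sum % SHARD_COUNT;
}

// Build SHARD_COUNT empty shards.
function createShards() {
  var shards = [];
  for (var i = 0; i < SHARD_COUNT; i++) {
    shards.push([]);
  }
  return shards;
}

// Send: one document per message, placed by the sender ("from").
function send(shards, msg) {
  shards[shardFor(msg.from)].push(msg);
  return 1; // number of documents written
}

// Read: the recipient is not the shard key, so every shard must be asked.
function readInbox(shards, user) {
  var shardsVisited = 0;
  var messages = [];
  shards.forEach(function (shard) {
    shardsVisited++;
    shard.forEach(function (msg) {
      if (msg.to.indexOf(user) !== -1) {
        messages.push(msg);
      }
    });
  });
  return { shardsVisited: shardsVisited, messages: messages };
}
```

In this model a send always writes a single document, while `readInbox` reports that all three shards were visited, which is the scatter-gather cost named above.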
Fan out on write – Send
Message
Shard 1 Shard 2 Shard 3
Send
Message
db.inbox.save(
{ to: "Bob", …} )
Fan out on write– Read Inbox
Shard 1 Shard 2 Shard 3
Read
Inbox
db.inbox.find( { to: "Bob" } )
// Shard on "to" (the single recipient of each copy) and "sent"
sh.shardCollection( "mongodbdays.inbox", { to: 1, sent: 1 } )
msg = {
from: "Joe",
recipient: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
// Send a message: save one copy per recipient
msg.recipient.forEach( function( recipient ) {
// Build a fresh document each time so every copy gets its own _id
db.inbox.save( { from: msg.from, to: recipient, sent: msg.sent, message: msg.message } );
} )
// Read my inbox
db.inbox.find( { to: "Bob" } ).sort( { sent: -1 } )
Fan out on write
Considerations
✖ 1 document per recipient per message
✔ Reading inbox is finding all of the messages with me as the recipient
✔ Can shard on recipient, so inbox reads hit one shard
✖ But still lots of random IO on the shard
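For comparison, here is a minimal sketch of fan out on write in plain JavaScript. The `routeToShard` hash and the in-memory shard arrays are hypothetical stand-ins for shard-key routing, not MongoDB API. The sketch writes one copy per recipient and reads from a single shard.

```javascript
// Hypothetical in-memory model of fan out on write; not MongoDB API.
// Toy hash standing in for shard-key routing (assumption for illustration).
function routeToShard(key, shardCount) {
  var sum = 0;
  for (var i = 0; i < key.length; i++) {
    sum += key.charCodeAt(i);
  }
  return sum % shardCount;
}

// Send: copy the message once per recipient, routed by the recipient ("to").
function sendFanOut(shards, msg) {
  msg.recipient.forEach(function (recipient) {
    var copy = { from: msg.from, to: recipient, sent: msg.sent, message: msg.message };
    shards[routeToShard(recipient, shards.length)].push(copy);
  });
  return msg.recipient.length; // number of documents written
}

// Read: the recipient is the routing key, so exactly one shard is consulted.
function readInboxRouted(shards, user) {
  var shard = shards[routeToShard(user, shards.length)];
  return shard.filter(function (doc) {
    return doc.to === user;
  });
}
```

The write cost grows with the number of recipients, but the read touches only the shard that owns the reader's name.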
Fan out on write with buckets
• Each “inbox” document is an array of messages
• Append a message onto “inbox” of recipient
• Bucket inbox documents so there are not too many messages per document
• Can shard on recipient, so inbox reads hit one shard
• A few documents to read the whole inbox
Bucketed fan out on write -
Send
Shard 1 Shard 2 Shard 3
Send
Message
db.inbox.update(
{ to: "Bob"}, { $push: { msg: … } }
)
Bucketed fan out on write -
Read
Shard 1 Shard 2 Shard 3
Read
Inbox
db.inbox.find( { to: "Bob" } )
// Shard on "to / sequence" (the fields used by the upsert below)
sh.shardCollection( "mongodbdays.inbox", { to: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
// Send a message
for( recipient in msg.to) {
count = db.users.findAndModify({
query: { user_name: msg.to[recipient] },
update: { "$inc": { "msg_count": 1 } },
upsert: true,
new: true }).msg_count;
sequence = Math.floor(count / 50);
db.inbox.update( { to: msg.to[recipient], sequence: sequence },
{ $push: { "messages": msg } },
{ upsert: true } );
}
// Read my inbox
db.inbox.find( { to: "Bob" } ).sort ( { sequence: -1 } ).limit( 2 )
Fan out on write – with
buckets
Considerations
✔ Fewer documents per recipient
✔ Reading inbox is just finding a few buckets
✔ Can shard on recipient, so inbox reads hit one shard
✖ But still some random IO on the shard
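The bucket arithmetic can be checked in isolation. This is a small sketch in plain JavaScript that mirrors the `Math.floor(count / 50)` expression from the slide. The helper names `bucketFor` and `deliver` are hypothetical and only count how many messages land in each bucket.

```javascript
// Bucket size taken from the slide's Math.floor(count / 50).
var BUCKET_SIZE = 50;

// Bucket (sequence) number for the message with the given running count.
function bucketFor(count) {
  return Math.floor(count / BUCKET_SIZE);
}

// Simulate delivering totalMessages to one recipient.
// The counter is incremented before it is read, so counts start at 1.
function deliver(totalMessages) {
  var buckets = {}; // sequence -> number of messages in that bucket
  for (var count = 1; count <= totalMessages; count++) {
    var sequence = bucketFor(count);
    buckets[sequence] = (buckets[sequence] || 0) + 1;
  }
  return buckets;
}
```

Because the count starts at 1, bucket 0 ends up with 49 messages and every later full bucket with 50. Using `Math.floor((count - 1) / 50)` instead would fill the first bucket to exactly 50; whether that matters depends on the application.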
But…
• What if I do not / cannot retain all history?
– Space limited: Hours, Days, Weeks, $$$
– Legislative limited: HIPAA, SOX, DPA
3 Approaches (there are more)
• Bucket by Number of messages – just seen that
• Fixed size Array
• Bucket by Date + TTL Collections
// Query with a date range
db.inbox.find ( { owner: "Joe",
messages: {
$elemMatch: { sent: { $gte: ISODate("2013-04-04…") }}}})
// Remove elements older than a date, across all of the owner's buckets
db.inbox.update( { owner: "Joe" },
{ $pull:
{ messages: { sent: { $lt: ISODate("2013-04-04…") } } } },
{ multi: true } )
Inbox – Bucket by # messages
Considerations
✔ Limited to a known range of messages
✖ Shrinking documents
• space can be reclaimed with
db.runCommand ( { compact: '<collection>' } )
✖ The document must be removed explicitly after the last element in the array has been removed
– { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
msg = {
from: "Your Boss",
to: [ "Bob" ],
sent: new Date(),
message: "CALL ME NOW!"
}
// 2.4 Introduces $each, $sort and $slice for $push
db.messages.update(
{ _id: 1 },
{ $push: { messages: { $each: [ msg ],
$sort: { sent: 1 },
$slice: -50
}
}
}
)
Maintain the latest – Fixed
Size Array
Push this object
onto the array
Sort the resulting
array by "sent"
Limit the array to
50 elements
Considerations
✔ Limited to a known # of messages
✖ Need to compute the size of the array based on retention period
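The three steps annotated on the slide (push, sort, limit) can be imitated in plain JavaScript to see what the array looks like afterwards. This is only a sketch of the observable behaviour, not the server implementation. The function name `pushSortSlice` is hypothetical and `limit` is assumed to be a positive number.

```javascript
// Plain-JavaScript imitation of $push with $each, $sort and $slice.
// Returns a new array; the input arrays are not modified.
function pushSortSlice(messages, newMessages, limit) {
  // $each: append every new element
  var combined = messages.concat(newMessages);
  // $sort: { sent: 1 } orders by "sent", oldest first
  combined.sort(function (a, b) {
    return a.sent - b.sent;
  });
  // $slice: -limit keeps only the last `limit` elements (the newest)
  return combined.slice(-limit);
}
```

With a limit of 50 this keeps the 50 most recent messages: a newer message pushes out the oldest one, and a message older than everything already stored is dropped immediately.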
// messages: one doc per user per day
db.inbox.findOne()
{
_id: 1,
to: "Joe",
sequence: ISODate("2013-02-04T00:00:00.392Z"),
messages: [ ]
}
// Auto expires data after 31536000 seconds = 1 year
db.inbox.ensureIndex( { sequence: 1 },
{ expireAfterSeconds: 31536000 } )
TTL Collections
Considerations
✔ Limited to a known range of messages
✔ Automatic purge of expired data
No need to have a CRON task, etc. to do this
✖ Per Collection basis
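The `expireAfterSeconds` value is plain arithmetic on the retention period. Below is a minimal sketch in plain JavaScript. The helpers `retentionSeconds` and `isExpired` are hypothetical and only mirror the rule that a document becomes eligible for removal once its indexed date plus the retention period is in the past.

```javascript
// Convert a retention period in days to a value for expireAfterSeconds.
function retentionSeconds(days) {
  return days * 24 * 60 * 60;
}

// Hypothetical check mirroring the TTL rule for a single document.
// indexedDate and now are Date objects.
function isExpired(indexedDate, expireAfterSeconds, now) {
  return indexedDate.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}
```

`retentionSeconds(365)` gives the 31536000 used on the slide. On a real server the purge is done by a periodic background task, so an expired document may remain visible for a short time after it becomes eligible.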
#2 – Indexed Attributes
Design Goal
• Application needs to store a variable number of attributes e.g.
– User defined Form
– Meta Data tags
• Queries needed
– Equality
– Range based
• Need to be efficient, regardless of the number of
attributes
2 Approaches (there are more)
• Attributes
• Attributes as Objects in an Array
// Flexible set of attributes
db.files.insert( { _id:"mongod",
attr: { type: "binary", size: 256,
created: ISODate("2013-04-01T18:13:42.689Z") } } )
// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text"} )
// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
Attributes
Considerations
✔ Attributes can be queried via an Index
✔ Equality & Range queries supported
✖ Each attribute needs an Index
✖ Each time you extend, you add an index
✖ Single index is used (unless you have $or)
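To make the "one index per attribute" cost concrete, here is a small sketch in plain JavaScript. The helper `indexSpecsFor` is hypothetical. It lists the key documents that would each need their own `ensureIndex` call for a given `attr` sub-document.

```javascript
// For an attr sub-document, list one index key document per attribute,
// using dot notation for the embedded field name.
function indexSpecsFor(attr) {
  return Object.keys(attr).map(function (name) {
    var spec = {};
    spec["attr." + name] = 1;
    return spec;
  });
}
```

For the `mongod` example document this yields three key documents (`attr.type`, `attr.size`, `attr.created`), and every attribute added later adds another.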
// Flexible set of attributes, each attribute is an object
db.files.insert( { _id: "mongod",
attr: [ { type: "binary" },
{ size: 256 },
{ created: ISODate("2013-04-01T18:13:42.689Z") } ] } )
db.files.ensureIndex( { attr: 1 } )
Attributes as Objects in Array
// Range queries
db.files.find( { attr: { $gt: { size:64 }, $lte: { size: 16384 } } } )
db.files.find( { attr:
{ $gte: { created: ISODate("2013-02-01T00:00:01.689Z") } } } )
// Multiple condition – Only the first predicate on the query can use the Index
// ensure that this is the most selective.
// Index Intersection will allow multiple indexes, see SERVER-3071
db.files.find( { $and: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
{ attr: { $gt: { size:128 }, $lte: { size: 16384 } } }
] } )
// Each $or can use an index
db.files.find( { $or: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
{ attr: { $gt: { size:128 }, $lte: { size: 16384 } } }
] } )
Queries
Considerations
✔ Attributes can be queried via a Single index
✔ New attributes do not need extra Indexes
✔ Equality & Range queries supported
✖ $and can only use a Single Index
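Moving from the sub-document form to the array form is a mechanical reshaping. This sketch in plain JavaScript shows it with a hypothetical helper, `toAttributeArray`, that turns each attribute into its own single-key object so that one index on `attr` covers all of them.

```javascript
// Reshape { type: "binary", size: 256 } into [ { type: "binary" }, { size: 256 } ].
function toAttributeArray(attr) {
  return Object.keys(attr).map(function (name) {
    var entry = {};
    entry[name] = attr[name];
    return entry;
  });
}
```

A new attribute then becomes just another array element, with no extra index to create.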
#3 – Multiple Identities
Design Goal
• Ability to look up by a number of different
identities e.g.
• Username
• Email address
• FB Handle
• LinkedIn URL
2 Approaches (there are more)
• Multiple Identifiers in a single document
• Separate Identifiers from Content
db.users.findOne()
{ _id: "joe",
email: "joe@example.com",
fb: "joe.smith", // facebook
li: "joe.e.smith", // linkedin
other: {…}
}
// Shard collection by _id
sh.shardCollection("mongodbdays.users", { _id: 1 } )
// Create indexes on each key
db.users.ensureIndex( { email: 1} )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )
Single Document by User
Read by _id (shard key)
Shard 1 Shard 2 Shard 3
find( { _id: "joe"} )
Read by email (non-shard key)
Shard 1 Shard 2 Shard 3
find( { email: "joe@example.com" } )
Considerations
✔ Lookup by shard key is routed to 1 shard
✖ Lookup by other identifier is scatter gathered across all shards
✖ Secondary keys cannot have a unique index
// Create a document that holds all the other user attributes
db.users.save( { _id: "1200-42", ... } )
// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )
// Create a document for each of the user's identities
db.identities.save( { identifier : { hndl: "joe" }, user: "1200-42" } )
db.identities.save( { identifier : { email: "joe@example.com" }, user: "1200-42" } )
db.identities.save( { identifier : { li: "joe.e.smith" }, user: "1200-42" } )
// Shard collection by identifier
sh.shardCollection( "mongodbdays.identities", { identifier : 1 } )
// Create unique index
db.identities.ensureIndex( { identifier : 1} , { unique: true} )
db.users.ensureIndex( { _id: 1} , { unique: true} )
Document per Identity
Read requires 2 queries
Shard 1 Shard 2 Shard 3
db.identities.find({"identifier" : {
"hndl" : "joe" }})
db.users.find( { _id: "1200-42"}
)
Considerations
✔ Multiple queries, but always routed
✔ Lookup to Identities is a routed query
✔ Lookup to Users is a routed query
✔ Unique indexes available
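The two-query read path can be sketched in plain JavaScript. The in-memory `store` object and the helpers `identityKey`, `addIdentity` and `findUser` are hypothetical stand-ins for the `identities` and `users` collections, not MongoDB API. The duplicate check only mimics what the unique index on `identifier` would enforce.

```javascript
// Hypothetical in-memory stand-ins for the identities and users collections.
function createStore() {
  return { identities: {}, users: {} };
}

// Serialize an identifier such as { email: "joe@example.com" } so that
// different identifier types with the same value stay distinct.
function identityKey(identifier) {
  var type = Object.keys(identifier)[0];
  return type + ":" + identifier[type];
}

// Register an identity; rejects duplicates like a unique index would.
function addIdentity(store, identifier, userId) {
  var key = identityKey(identifier);
  if (Object.prototype.hasOwnProperty.call(store.identities, key)) {
    throw new Error("duplicate identifier: " + key);
  }
  store.identities[key] = userId;
}

// Look a user up by any identity in two steps.
function findUser(store, identifier) {
  var userId = store.identities[identityKey(identifier)]; // query 1: identities
  if (userId === undefined) {
    return null;
  }
  return store.users[userId] || null; // query 2: users
}
```

Each step looks up by the key its collection is sharded on, which is why both queries can be routed to a single shard.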
Conclusion
Summary
• Multiple ways to model a domain problem
• Understand the key use cases of your app
• Balance between ease of query vs. ease of write
• Avoid Random IO
• Avoid Scatter / Gather query pattern
Thank You
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presented by Alvin Richards, 10Gen Technical Director for EMEA, 10gen

  • 1. Technical Director, 10gen @jonnyeight [email protected] alvinonmongodb.com Alvin Richards #MongoDBdays Schema Design 3 Real World Use Cases
  • 2. I'm planning a Trip to LA…
  • 3. Single Table En Agenda • Why is schema design important • 3 Real World Schemas – Inbox – IndexedAttributes – Multiple Identities • Conclusions
  • 4. Why is Schema Design important? • Largest factor for a performant system • Schema design with MongoDB is different • RBMS – "What answers do I have?" • MongoDB – "What question will I have?" • Must consider use case with schema
  • 5. #1 - Message Inbox
  • 9. Design Goals • Efficiently send new messages to recipients • Efficiently read inbox
  • 10. 3 Approaches (there are more) • Fan out on Read • Fan out on Write • Fan out on Write with Bucketing
  • 11. Fan out on read – Send Message Shard 1 Shard 2 Shard 3 Send Message db.inbox.save( { to: [ "Bob", "Jane" ], … } )
  • 12. Fan out on read – Inbox Read Shard 1 Shard 2 Shard 3 Read Inbox db.inbox.find( { to: "Bob" } )
  • 13. // Shard on "from" db.shardCollection( "mongodbdays.inbox", { from: 1 } ) // Make sure we have an index to handle inbox reads db.inbox.ensureIndex( { to: 1, sent: 1 } ) msg = { from: "Joe", to: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!", } // Send a message db.inbox.save( msg ) // Read my inbox db.inbox.find( { to: "Bob" } ).sort( { sent: -1 } ) Fan out on read
  • 14. Considerations 1 document per message sent Multiple recipients in an array key Reading inbox finds all messages with my own name in the recipient field ✖Requires scatter-gather on sharded cluster ✖Then a lot of random IO on a shard to find everything
  • 15. Fan out on write – Send Message Shard 1 Shard 2 Shard 3 Send Message db.inbox.save( { to: "Bob", …} )
  • 16. Fan out on write– Read Inbox Shard 1 Shard 2 Shard 3 Read Inbox db.inbox.find( { to: "Bob" } )
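  • 17. Fan out on write
        // Shard on "to" (the individual recipient) and "sent";
        // an array field such as "recipient" cannot be a shard key
        sh.shardCollection( "mongodbdays.inbox", { to: 1, sent: 1 } )
        msg = {
            from: "Joe",
            recipient: [ "Bob", "Jane" ],
            sent: new Date(),
            message: "Hi!"
        }
        // Send a message: one document per recipient.
        // for...in yields array indexes, so look the name up by index, and
        // save a fresh copy each time so the documents do not share one _id
        for ( i in msg.recipient ) {
            copy = { from: msg.from, to: msg.recipient[i], sent: msg.sent, message: msg.message }
            db.inbox.save( copy );
        }
        // Read my inbox
        db.inbox.find( { to: "Bob" } ).sort( { sent: -1 } )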
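The send step on this slide writes one inbox document per recipient. A minimal sketch of just that step in plain JavaScript, with no database involved: `fanOutOnWrite` is a hypothetical helper name, not something from the talk or from MongoDB. It uses `map` because `for...in` over an array yields indexes rather than values, and it builds a new object per recipient so no two copies share state.

```javascript
// Hypothetical helper: build one inbox document per recipient.
function fanOutOnWrite(msg) {
  // map() visits the recipient names themselves, not their array indexes.
  return msg.recipient.map(function (name) {
    return { from: msg.from, to: name, sent: msg.sent, message: msg.message };
  });
}

const copies = fanOutOnWrite({
  from: "Joe",
  recipient: ["Bob", "Jane"],
  sent: new Date(),
  message: "Hi!"
});

// Each element of `copies` would be saved as its own document.
copies.forEach(function (doc) {
  console.log(doc.to + " <- " + doc.message);
});
```

Each copy carries a single `to` value, which is what lets the inbox read target one recipient.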
  • 18. Considerations ✖ 1 document per recipient per message • Reading inbox is finding all of the messages with me as the recipient • Can shard on recipient, so inbox reads hit one shard ✖ But still lots of random IO on the shard
  • 19. Fan out on write with buckets • Each "inbox" document is an array of messages • Append a message onto "inbox" of recipient • Bucket inbox documents so there are not too many messages per document • Can shard on recipient, so inbox reads hit one shard • A few documents to read the whole inbox
  • 20. Bucketed fan out on write - Send Shard 1 Shard 2 Shard 3 Send Message db.inbox.update( { to: "Bob"}, { $push: { msg: … } } )
  • 21. Bucketed fan out on write - Read Shard 1 Shard 2 Shard 3 Read Inbox db.inbox.find( { to: "Bob" } )
  • 22. Fan out on write – with buckets
        // Shard on "owner / sequence"
        sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
        sh.shardCollection( "mongodbdays.users", { user_name: 1 } )
        msg = {
            from: "Joe",
            to: [ "Bob", "Jane" ],
            sent: new Date(),
            message: "Hi!"
        }
        // Send a message
        for ( recipient in msg.to ) {
            // Atomically increment the recipient's message counter
            count = db.users.findAndModify({
                query: { user_name: msg.to[recipient] },
                update: { "$inc": { "msg_count": 1 } },
                upsert: true,
                new: true }).msg_count;
            // Roughly 50 messages per bucket
            sequence = Math.floor(count / 50);
            // Match on the shard key fields (owner, sequence)
            db.inbox.update(
                { owner: msg.to[recipient], sequence: sequence },
                { $push: { "messages": msg } },
                { upsert: true } );
        }
        // Read my inbox: the two most recent buckets
        db.inbox.find( { owner: "Bob" } ).sort( { sequence: -1 } ).limit( 2 )
  • 23. Considerations • Fewer documents per recipient • Reading inbox is just finding a few buckets • Can shard on recipient, so inbox reads hit one shard ✖ But still some random IO on the shard
  • 24. But… • What if I do not / cannot retain all history? – Space limited: Hours, Days, Weeks, $$$ – Legislative limited: HIPAA, SOX, DPA
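The bucket number on this slide comes from integer division of the per-user message counter by the bucket size. A small sketch of that arithmetic in plain JavaScript; `bucketSequence` and `BUCKET_SIZE` are illustrative names, and the value 50 is taken from the slide.

```javascript
// Bucket size used on the slide.
const BUCKET_SIZE = 50;

// Hypothetical helper: map a running message count to a bucket sequence number.
function bucketSequence(msgCount, bucketSize) {
  return Math.floor(msgCount / bucketSize);
}

// The counter starts at 1 after the first increment, so counts 1..49 share
// bucket 0 and count 50 opens bucket 1.
[1, 49, 50, 99, 100].forEach(function (count) {
  console.log(count + " -> bucket " + bucketSequence(count, BUCKET_SIZE));
});
```

Because the counter is incremented before the division, the first bucket holds 49 messages and every later bucket holds 50.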
  • 25. 3 Approaches (there are more) • Bucket by Number of messages – just seen that • Fixed size Array • Bucket by Date + TTL Collections
  • 26. Inbox – Bucket by # messages
        // Query with a date range
        db.inbox.find( { owner: "Joe",
                         messages: { $elemMatch: { sent: { $gte: ISODate("2013-04-04…") } } } } )
        // Remove elements based on a date
        db.inbox.update( { owner: "Joe" },
                         { $pull: { messages: { sent: { $gte: ISODate("2013-04-04…") } } } } )
  • 27. Considerations • Limited to a known range of messages ✖ Shrinking documents: space can be reclaimed with db.runCommand( { compact: '<collection>' } ) ✖ Removing the document after the last element in the array has been removed – { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
  • 28. Maintain the latest – Fixed Size Array
        msg = {
            from: "Your Boss",
            to: [ "Bob" ],
            sent: new Date(),
            message: "CALL ME NOW!"
        }
        // 2.4 introduces $each, $sort and $slice for $push
        db.messages.update(
            { _id: 1 },
            { $push: { messages: { $each: [ msg ],        // Push this object onto the array
                                   $sort: { sent: 1 },    // Sort the resulting array by "sent"
                                   $slice: -50 } } } )    // Limit the array to 50 elements
  • 29. Considerations ✔ Limited to a known # of messages ✖ Need to compute the size of the array based on retention period
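The three modifiers on this slide apply in order: append, sort, then trim. The sketch below mimics that sequence on an in-memory array in plain JavaScript; it is an illustration of the effect, not how the server implements it, and `pushSortSlice` is a hypothetical name. It assumes `keep` is a positive number.

```javascript
// Hypothetical in-memory equivalent of $push with $each, $sort: { sent: 1 }, $slice: -keep.
function pushSortSlice(messages, newMessages, keep) {
  // $each: append every new message
  const merged = messages.concat(newMessages);
  // $sort: ascending by "sent" (Date objects subtract to a millisecond difference)
  merged.sort(function (a, b) { return a.sent - b.sent; });
  // $slice with a negative value: keep only the last `keep` elements
  return merged.slice(-keep);
}

// Build an inbox that already holds 50 messages, one per second.
let inbox = [];
for (let i = 1; i <= 50; i++) {
  inbox.push({ sent: new Date(i * 1000), message: "msg " + i });
}

// A newer message pushes the oldest one out.
inbox = pushSortSlice(inbox, [{ sent: new Date(51 * 1000), message: "CALL ME NOW!" }], 50);
console.log(inbox.length, inbox[0].message, inbox[inbox.length - 1].message);
```

Since the array is sorted before it is trimmed, a late-arriving message older than everything retained is dropped immediately.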
  • 30. TTL Collections
        // messages: one doc per user per day
        db.messages.findOne()
        {
            _id: 1,
            to: "Joe",
            sequence: ISODate("2013-02-04T00:00:00.392Z"),
            messages: [ ]
        }
        // Auto expires data after 31536000 seconds = 1 year
        db.messages.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
  • 31. Considerations ✔ Limited to a known range of messages ✔ Automatic purge of expired data ✔ No need to have a cron job, etc. to do this ✖ Per Collection basis
  • 32. #2 – Indexed Attributes
  • 33. Design Goal • Application needs to store a variable number of attributes e.g. – User defined Form – Meta Data tags • Queries needed – Equality – Range based • Need to be efficient, regardless of the number of attributes
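Two small calculations sit behind this slide: converting a retention period into `expireAfterSeconds`, and deriving a per-day bucket key from a timestamp. A sketch of both in plain JavaScript; `retentionSeconds` and `dayBucket` are hypothetical helper names, and truncating to UTC midnight is an assumption about how the daily `sequence` value might be produced.

```javascript
// Hypothetical helper: retention period in days -> value for expireAfterSeconds.
function retentionSeconds(days) {
  return days * 24 * 60 * 60;
}

// Hypothetical helper: truncate a timestamp to midnight UTC so that all
// messages sent on the same day map to the same bucket key.
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

console.log(retentionSeconds(365));
console.log(dayBucket(new Date("2013-02-04T13:45:10.392Z")).toISOString());
```

With 365 days the first helper yields the 31536000 seconds shown on the slide.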
  • 34. 2 Approaches (there are more) • Attributes • Attributes as Objects in an Array
  • 35. Attributes
        // Flexible set of attributes
        db.files.insert( {
            _id: "mongod",
            attr: {
                type: "binary",
                size: 256,
                created: ISODate("2013-04-01T18:13:42.689Z")
            }
        } )
        // Need to create an index for each item in the sub-document
        db.files.ensureIndex( { "attr.type": 1 } )
        db.files.find( { "attr.type": "text" } )
        // Can perform range queries
        db.files.ensureIndex( { "attr.size": 1 } )
        db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
  • 36. Considerations • Attributes can be queried via an Index • Equality & Range queries supported ✖ Each attribute needs an Index ✖ Each time you extend, you add an index ✖ Single index is used (unless you have $or)
  • 37. Attributes as Objects in Array
        // Flexible set of attributes, each attribute is an object
        db.files.insert( {
            _id: "mongod",
            attr: [
                { type: "binary" },
                { size: 256 },
                { created: ISODate("2013-04-01T18:13:42.689Z") }
            ]
        } )
        db.files.ensureIndex( { attr: 1 } )
  • 38. Queries
        // Range queries
        db.files.find( { attr: { $gt: { size: 64 }, $lte: { size: 16384 } } } )
        db.files.find( { attr: { $gte: { created: ISODate("2013-02-01T00:00:01.689Z") } } } )
        // Multiple conditions: only the first predicate in the query can use the index,
        // so ensure that it is the most selective.
        // Index Intersection will allow multiple indexes, see SERVER-3071
        db.files.find( { $and: [
            { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
            { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } }
        ] } )
        // Each $or can use an index
        db.files.find( { $or: [
            { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
            { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } }
        ] } )
  • 39. Considerations ✔ Attributes can be queried via a Single index ✔ New attributes do not need extra Indexes ✔ Equality & Range queries supported ✖ $and can only use a Single Index
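Moving from the sub-document form to the array form is a mechanical reshaping: each key/value pair becomes its own single-key object. A minimal sketch in plain JavaScript; `toAttributeArray` is a hypothetical helper name, and the sample values mirror the slide.

```javascript
// Hypothetical helper: turn { type: "binary", size: 256 } into
// [ { type: "binary" }, { size: 256 } ] so one index on the array covers every attribute.
function toAttributeArray(attrs) {
  return Object.keys(attrs).map(function (key) {
    const pair = {};
    pair[key] = attrs[key];
    return pair;
  });
}

const attrArray = toAttributeArray({ type: "binary", size: 256 });
console.log(JSON.stringify(attrArray));
```

A new attribute then only adds one more element to the array instead of requiring another index.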
  • 40. #3 – Multiple Identities
  • 41. Design Goal • Ability to look up by a number of different identities e.g. • Username • Email address • FB Handle • LinkedIn URL
  • 42. 2 Approaches (there are more) • Multiple Identifiers in a single document • Separate Identifiers from Content
  • 43. Single Document by User
        db.users.findOne()
        {
            _id: "joe",
            email: "[email protected]",
            fb: "joe.smith",     // facebook
            li: "joe.e.smith",   // linkedin
            other: {…}
        }
        // Shard collection by _id
        sh.shardCollection( "mongodbdays.users", { _id: 1 } )
        // Create indexes on each key
        db.users.ensureIndex( { email: 1 } )
        db.users.ensureIndex( { fb: 1 } )
        db.users.ensureIndex( { li: 1 } )
  • 44. Read by _id (shard key) Shard 1 Shard 2 Shard 3 find( { _id: "joe"} )
  • 45. Read by email (non-shard key) Shard 1 Shard 2 Shard 3 find( { email: "[email protected]" } )
  • 46. Considerations ✔ Lookup by shard key is routed to 1 shard ✖ Lookup by other identifier is scatter gathered across all shards ✖ Secondary keys cannot have a unique index
  • 47. Document per Identity
        // Create a document that holds all the other user attributes
        db.users.save( { _id: "1200-42", ... } )
        // Shard collection by _id
        sh.shardCollection( "mongodbdays.users", { _id: 1 } )
        // Create one identity document for each of the user's identifiers
        db.identities.save( { identifier : { hndl: "joe" }, user: "1200-42" } )
        db.identities.save( { identifier : { email: "[email protected]" }, user: "1200-42" } )
        db.identities.save( { identifier : { li: "joe.e.smith" }, user: "1200-42" } )
        // Shard collection by identifier
        sh.shardCollection( "mongodbdays.identities", { identifier : 1 } )
        // Create unique index
        db.identities.ensureIndex( { identifier : 1 }, { unique: true } )
        db.users.ensureIndex( { _id: 1 }, { unique: true } )
  • 48. Read requires 2 queries Shard 1 Shard 2 Shard 3 db.identities.find({"identifier" : { "hndl" : "joe" }}) db.users.find( { _id: "1200-42"} )
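The two-query read can be pictured as two keyed lookups: identifier to user id, then user id to user document. The sketch below models that with in-memory `Map` objects in plain JavaScript; the maps stand in for the two collections, `addIdentity` and `findUser` are hypothetical names, and the duplicate check only imitates what a unique index would enforce.

```javascript
// Stand-ins for the two collections (assumption: in-memory only).
const identities = new Map(); // "kind:value" -> user id
const users = new Map();      // user id -> user document

// Register an identifier; reject duplicates as a unique index would.
function addIdentity(kind, value, userId) {
  const key = kind + ":" + value;
  if (identities.has(key)) {
    throw new Error("duplicate identifier: " + key);
  }
  identities.set(key, userId);
}

// Lookup 1: identifier -> user id. Lookup 2: user id -> user document.
function findUser(kind, value) {
  const userId = identities.get(kind + ":" + value);
  if (userId === undefined) {
    return null;
  }
  return users.get(userId) || null;
}

users.set("1200-42", { _id: "1200-42" });
addIdentity("hndl", "joe", "1200-42");
addIdentity("li", "joe.e.smith", "1200-42");

console.log(findUser("hndl", "joe"));
console.log(findUser("hndl", "nobody"));
```

Both lookups are by key, which corresponds to each query being routed to a single shard in the sharded design.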
  • 49. Considerations ✔ Multiple queries, but always routed ✔ Lookup to Identities is a routed query ✔ Lookup to Users is a routed query ✔ Unique indexes available
  • 51. Summary • Multiple ways to model a domain problem • Understand the key use cases of your app • Balance between ease of query vs. ease of write • Avoid Random IO • Avoid Scatter / Gather query pattern
  • 52. Technical Director, 10gen @jonnyeight alvin@10gen.com alvinonmongodb.com Alvin Richards #MongoDBdays Thank You