Schema design

Christian Kvalheim - christkv@10gen.com

Topics
Introduction
• Working with documents
• Evolving a schema
• Queries and indexes
• Rich Documents

Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues

Ways to model data:

http://www.flickr.com/photos/42304632@N00/493639870/

Terminology

RDBMS     MongoDB
Table     Collection
Row(s)    JSON Document
Index     Index
Join      Embedding & Linking

Schema-design criteria

How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce
• Aggregation (coming soon)

Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle

Considerations
• No Joins
• Document writes are atomic

A simple start
post = {author: "Hergé",
date: new Date(),
text: "Destination Moon",
tags: ["comic", "adventure"]}

> db.blog.save(post)

Map the documents to your application.

Find the document
> db.blog.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
text: "Destination Moon",
tags: [ "comic", "adventure" ]
}

Note:
• _id must be unique, but can be anything you'd like
• Defaults to a BSON ObjectId if one is not supplied

Add an index, find via index
> db.blog.ensureIndex({author: 1})
> db.blog.find({author: 'Hergé'})

{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}

Secondary index on "author"

Examine the query plan
> db.blog.find({"author": 'Hergé'}).explain()
{
"cursor" : "BtreeCursor author_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"indexBounds" : {
"author" : [
[
"Hergé",
"Hergé"
]
]
}
}

Multi-key indexes
// Build an index on the 'tags' array
> db.blog.ensureIndex({tags: 1})

// find posts with a specific tag
// (This will use an index!)
> db.blog.find({tags: 'comic'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}
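A rough way to picture a multi-key index: one index entry per array element, each pointing back at its document. The sketch below is plain in-memory JavaScript, not MongoDB's actual B-tree; the function name buildMultikeyIndex and the second and third sample posts are made up for illustration.

```javascript
// In-memory sketch of a multi-key index: one entry per array element.
// Illustration only; the real index is a B-tree maintained by the server.
const posts = [
  { _id: 1, text: "Destination Moon", tags: ["comic", "adventure"] },
  { _id: 2, text: "Hypothetical second post", tags: ["comic"] },
  { _id: 3, text: "Hypothetical untagged post", tags: [] },
];

// Build value -> [_id, ...] by emitting one entry for every array element.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field] ?? []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildMultikeyIndex(posts, "tags");

// Looking up a tag touches only the index, not every document.
const comicIds = tagIndex.get("comic") ?? [];
const adventureIds = tagIndex.get("adventure") ?? [];
console.log(comicIds, adventureIds);
```

A post with an empty tags array contributes no entries here, which is a simplification made for the sketch.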

Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte

Update operators:
$set, $inc, $push, $pop, $pull, $pushAll, $pullAll

Extending the schema

http://nysi.org.uk/kids_stuff/rocket/rocket.htm

Extending the Schema
new_comment = {author: "Chris",
date: new Date(),
text: "great book",
votes: 5}

> db.blog.update(
{text: "Destination Moon" },

{"$push": {comments: new_comment},
"$inc": {comments_count: 1}
})

Extending the Schema
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
text : "Destination Moon",
tags : [ "comic", "adventure" ],
comments : [{
author : "Chris",
date : ISODate("2012-01-23T14:31:53.848Z"),
text : "great book",
votes : 5
}],
comments_count: 1
}
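To make the effect of the update concrete, here is a small plain-JavaScript sketch of what $push and $inc do to a single document when applied together. The helper applyUpdate is hypothetical and covers only these two operators; on the server the whole update is applied to the document atomically.

```javascript
// Sketch of $push and $inc applied to one document (hypothetical helper).
function applyUpdate(doc, update) {
  const next = { ...doc };
  for (const [field, value] of Object.entries(update.$push ?? {})) {
    // $push appends, creating the array if the field does not exist yet.
    next[field] = [...(next[field] ?? []), value];
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    // $inc on a missing field starts from 0.
    next[field] = (next[field] ?? 0) + amount;
  }
  return next;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Chris", text: "great book", votes: 5 };

const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});
console.log(updated.comments.length, updated.comments_count);
```

Neither comments nor comments_count existed before the update, so no migration step is needed: the schema grows when the first comment arrives.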

The 'dot' operator
// create index on nested documents:
> db.blog.ensureIndex({"comments.author": 1})

> db.blog.find({"comments.author":"Chris"})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
...
}

The 'dot' operator

// create index on comment votes:
> db.blog.ensureIndex({"comments.votes": 1})

// find all posts with any comments with
// more than 50 votes
> db.blog.find({"comments.votes": {$gt: 50}})

The 'dot' operator

// find last 5 posts:
> db.blog.find().sort({"date":-1}).limit(5)

// find the top 10 commented posts:
> db.blog.find().sort({"comments_count":-1}).limit(10)

When sorting, check if you need an index...

Watch for full table scans
{
"cursor" : "BasicCursor",
"nscanned" : 250003,
"nscannedObjects" : 250003,
"n" : 10,
"scanAndOrder" : true,
"millis" : 335,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

Rich Documents

http://www.flickr.com/photos/diorama_sky/2975796332

Rich Documents

• Intuitive
• Developer friendly
• Encapsulates whole objects
• Performant
• Scalable

Common Patterns

http://www.flickr.com/photos/colinwarren/158628063

Inheritance

http://www.flickr.com/photos/dysonstarr/5098228295

Single Table Inheritance - RDBMS
• Shapes table

id   type     area   radius   d   length   width
1    circle   3.14   1
2    square   4               2
3    rect     10                  5        2

Single Table Inheritance - MongoDB
> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1}
{ _id: "2", type: "square", area: 4, d: 2}
{ _id: "3", type: "rect", area: 10, length: 5,
width: 2}

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create sparse index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})
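One way to think about the sparse index: it only holds entries for documents that actually have the indexed field, so squares and rectangles take up no space in it. A minimal in-memory sketch, with buildSparseIndex as a made-up helper name:

```javascript
// In-memory sketch of a sparse index on "radius" (illustration only).
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Only documents that contain the field get an index entry.
function buildSparseIndex(docs, field) {
  return docs
    .filter(doc => field in doc)
    .map(doc => ({ key: doc[field], _id: doc._id }))
    .sort((a, b) => a.key - b.key);
}

const radiusIndex = buildSparseIndex(shapes, "radius");

// "radius > 0" answered from the index entries alone.
const hits = radiusIndex.filter(entry => entry.key > 0).map(entry => entry._id);
console.log(radiusIndex.length, hits);
```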

One to Many

http://www.flickr.com/photos/j-fish/6502708899/

One to Many
Embedded Array / Array Keys

• $slice operator returns a subset of the array
• Some queries are hard, e.g. finding the latest comments across all documents

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
tags : [ "comic", "adventure" ],
comments : [{
author : "Chris",
date : ISODate("2012-01-23T14:31:53.848Z"),
text : "great book",
votes : 5
}],
comments_count: 1
}

One to Many
Normalized (2 collections)

• Most flexible
• More queries

One to Many - Normalized
// Posts collection
{ _id : 1000,
author : "Hergé",
date: ISODate("2012-01-23T14:01:00.117Z"),
}
// Comments collection
{ _id : 1,
blog : 1000,
author : "Chris",
date : ISODate("2012-01-23T14:31:53.848Z"),
...
}
> blog = db.blogs.findOne({text: "Destination Moon"});
> db.comments.find({blog: blog._id});

One to Many - patterns

• Embedded Array / Array Keys
• Normalized

Embedding vs. Referencing

• Embed when the 'many' objects always appear
with their parent.

• Reference when you need more flexibility.
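The trade-off can be sketched with plain JavaScript objects standing in for collections. The helper names and the second comment are assumptions made for the example.

```javascript
// Embedded: comments live inside the post, so one read returns everything.
const embeddedPost = {
  _id: 1000,
  author: "Hergé",
  comments: [{ author: "Chris", text: "great book" }],
};

// Referenced: comments are separate documents that point back at the post.
const posts = [{ _id: 1000, author: "Hergé" }];
const comments = [
  { _id: 1, blog: 1000, author: "Chris", text: "great book" },
  { _id: 2, blog: 2000, author: "Fred", text: "hypothetical comment on another post" },
];

// One lookup: the comments arrive with their parent.
function commentsEmbedded(post) {
  return post.comments;
}

// Second lookup: after finding the post, query the comments by post id.
function commentsReferenced(postId) {
  return comments.filter(c => c.blog === postId);
}

// Referencing keeps cross-post queries simple, e.g. all comments by one author.
const byFred = comments.filter(c => c.author === "Fred");

console.log(commentsEmbedded(embeddedPost).length, commentsReferenced(posts[0]._id).length, byFred.length);
```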

Many to Many

http://www.flickr.com/photos/pats0n/6013379192

Many - Many
Example:

• Product can be in many categories
• Category can have many products

Many to Many
// Products
{ _id: 10,
name: "Destination Moon",
category_ids: [20, 30]}

// Categories
{ _id: 20,
name: "comic",
product_ids: [10, 11, 12]}
{ _id: 30,
name: "adventure",
product_ids: [10]}

// All categories for a given product
> db.categories.find({"product_ids": 10})

Alternative
// Products
{ _id: 10,
name: "Destination Moon",
category_ids: [20, 30]}
// Categories
{ _id: 20,
name: "comic"}

// All products for a given category
> db.products.find({"category_ids": 20})

// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
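The two lookups in this alternative can be sketched in memory as follows. Arrays stand in for collections, and the second product is a made-up sample.

```javascript
// Only products hold the links; categories stay small.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Hypothetical second product", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "comic" },
  { _id: 30, name: "adventure" },
];

// One query: match on the array of category ids.
function productsInCategory(categoryId) {
  return products.filter(p => p.category_ids.includes(categoryId));
}

// Two queries: fetch the product, then fetch its categories by id.
function categoriesOfProduct(productId) {
  const product = products.find(p => p._id === productId);
  if (!product) return [];
  return categories.filter(c => product.category_ids.includes(c._id));
}

const comicProductIds = productsInCategory(20).map(p => p._id);
const moonCategoryNames = categoriesOfProduct(10).map(c => c.name);
console.log(comicProductIds, moonCategoryNames);
```

The cost of dropping product_ids from categories is the extra round trip in categoriesOfProduct; the benefit is that each link is stored in only one place.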

Trees

http://www.flickr.com/photos/cubagallery/5949819558

Trees
Hierarchical information

Trees
Embedded Tree
{ comments : [{
author : "Chris", text : "...",
replies : [{
author : "Fred", text : "...",
replies : []
}]
}]
}

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 16MB limit
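"Hard to search" shows up as soon as you look for a comment at an unknown depth: the application has to walk the tree itself. A minimal sketch, with findByAuthor as a hypothetical helper:

```javascript
// Embedded tree: replies are nested inside the comments they answer.
const post = {
  comments: [{
    author: "Chris", text: "...",
    replies: [{ author: "Fred", text: "...", replies: [] }],
  }],
};

// Depth-first walk collecting every comment by the given author.
function findByAuthor(nodes, author, depth = 0, found = []) {
  for (const node of nodes) {
    if (node.author === author) found.push({ author: node.author, depth });
    findByAuthor(node.replies ?? [], author, depth + 1, found);
  }
  return found;
}

const fredComments = findByAuthor(post.comments, "Fred");
console.log(fredComments);
```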

Array of Ancestors

A
  B
    C
    D
  E
    F

// Store all ancestors of a node
{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }

// find all messages in threads that include "b"
> db.msg_tree.find({"thread": "b"})

// find all direct replies to "b"
> db.msg_tree.find({"replyTo": "b"})

// find all ancestors of "f":
> threads = db.msg_tree.findOne({"_id": "f"}).thread
> db.msg_tree.find({"_id": {$in: threads}})
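The three queries can be checked against the sample tree with a plain in-memory sketch. An array stands in for the msg_tree collection, and the helper names are made up.

```javascript
// The sample tree from the slides, as plain objects.
const msgTree = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
];

const ids = docs => docs.map(d => d._id);

// Everything below a node: its id appears in the ancestors array.
const descendantsOf = id => msgTree.filter(m => (m.thread ?? []).includes(id));

// Direct children only.
const directRepliesTo = id => msgTree.filter(m => m.replyTo === id);

// Ancestors: read the node's array, then fetch those ids.
const ancestorsOf = id => {
  const node = msgTree.find(m => m._id === id);
  const thread = node?.thread ?? [];
  return msgTree.filter(m => thread.includes(m._id));
};

console.log(ids(descendantsOf("b")), ids(directRepliesTo("b")), ids(ancestorsOf("f")));
```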

Array of Ancestors
Store hierarchy as a path expression

• Separate each node by a delimiter, e.g. "/"
• Use text search to find parts of a tree
{ comments: [
{ author: "Kyle", text: "initial post",
path: "" },
{ author: "Jim", text: "jim’s comment",
path: "jim" },
{ author: "Kyle", text: "Kyle’s reply to Jim",
path : "jim/kyle"} ] }

// Find the conversations Jim was part of
> db.blog.find({"comments.path": /^jim/i})
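A small in-memory sketch of the prefix match. It also shows one caveat: a bare /^jim/ matches any path that merely starts with those letters. The "jimmy" entry and the subtreeOf helper are assumptions added for illustration, and subtreeOf expects names without regular-expression metacharacters.

```javascript
// Comments with materialized paths, as in the slide, plus one made-up entry.
const comments = [
  { author: "Kyle", text: "initial post", path: "" },
  { author: "Jim", text: "jim's comment", path: "jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "jim/kyle" },
  { author: "Jimmy", text: "hypothetical unrelated comment", path: "jimmy" },
];

// Loose prefix match, same pattern as the query above.
const loose = comments.filter(c => /^jim/i.test(c.path));

// Stricter variant: the prefix must end at a delimiter or at the end of the path.
function subtreeOf(root) {
  const pattern = new RegExp(`^${root}(/|$)`, "i");
  return comments.filter(c => pattern.test(c.path));
}

const strict = subtreeOf("jim");
console.log(loose.length, strict.length);
```

Anchoring the pattern at the start of the string is what allows such a query to use an index on the path field; the case-insensitive flag may limit that benefit.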

Queues

http://www.flickr.com/photos/deanspic/4960440218

Queue
Requirements
• See jobs waiting, jobs in progress
• Ensure that each job is started once and only once
// Queue document
{ in_progress: false,
priority: 1,
message: "Rich documents FTW!"
...
}

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
query: {in_progress: false},
sort: {priority: -1},
update: {$set: {in_progress: true,
started: new Date()}}})
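The claim step can be pictured with an in-memory sketch. Single-threaded JavaScript makes the find-and-mark trivially indivisible here; on the server, findAndModify is what provides that guarantee for one document. The claimNextJob helper and the sample jobs are made up for illustration.

```javascript
// In-memory stand-in for the jobs collection.
const jobs = [
  { _id: 1, in_progress: false, priority: 1, message: "low priority" },
  { _id: 2, in_progress: false, priority: 5, message: "high priority" },
  { _id: 3, in_progress: true, priority: 9, message: "already running" },
];

// Pick the highest-priority waiting job and mark it as started in one step.
function claimNextJob(queue, now = new Date()) {
  const waiting = queue
    .filter(j => !j.in_progress)
    .sort((a, b) => b.priority - a.priority);
  const job = waiting[0];
  if (!job) return null;
  const before = { ...job }; // findAndModify returns the pre-update document by default
  job.in_progress = true;
  job.started = now;
  return before;
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
const third = claimNextJob(jobs);
console.log(first._id, second._id, third);
```

Because a claimed job no longer matches in_progress: false, a second worker asking for work gets the next job rather than the same one.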

Anti Patterns

http://www.flickr.com/photos/51838104@N02/5841690990

Anti patterns
• Careless indexing
• Large, deeply nested documents
• Multiple types for a key
• One size fits all collections
• One collection per user

Summary
• Schema design is different in MongoDB
• Basic data design principles stay the same
• Focus on how the app manipulates data
• Rapidly evolve schema to meet your requirements
• Enjoy your new freedom, use it wisely :-)

download at mongodb.org

conferences, appearances, and meetups
http://www.10gen.com/events

Facebook | Twitter | LinkedIn
http://bit.ly/mongofb @mongodb http://linkd.in/joinmongo

support, training, and this talk brought to you by
