Schema Design

Christian Kvalheim - christkv@10gen.com
Topics
 Introduction
• Working with documents
• Evolving a schema
• Queries and indexes
• Rich Documents
Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues
Ways to model data:
http://www.flickr.com/photos/42304632@N00/493639870/
Relational
Rich Document
Terminology
  RDBMS       MongoDB

  Table       Collection

  Row(s)      JSON Document

  Index       Index

  Join        Embedding & Linking
Schema-design criteria
How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce
• Aggregation (coming soon)

Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle

Considerations
• No Joins
• Document writes are atomic
Destination Moon
A simple start
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}

> db.blog.save(post)


Map the documents to your application.
Find the document
> db.blog.find()
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: ISODate("2012-01-23T14:01:00.117Z"),
    text: "Destination Moon",
    tags: [ "comic", "adventure" ]
  }

Note:
• _id must be unique, but can be anything you'd like
• Default BSON ObjectId if one is not supplied
Add an index, find via index
> db.blog.ensureIndex({author: 1})
> db.blog.find({author: 'Hergé'})

   { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
     author: "Hergé",
     date: ISODate("2012-01-23T14:01:00.117Z"),
      ...
    }


Secondary index on "author"
Examine the query plan
> db.blog.find({"author": 'Hergé'}).explain()
{
	    "cursor" : "BtreeCursor author_1",
	    "nscanned" : 1,
	    "nscannedObjects" : 1,
	    "n" : 1,
	    "millis" : 0,
	    "indexBounds" : {
	    	   "author" : [
	    	   	    [
	    	   	    	   "Hergé",
	    	   	    	   "Hergé"
	    	   	    ]
	    	   ]
	    }
}
Multi-key indexes
// Build an index on the 'tags' array
> db.blog.ensureIndex({tags: 1})

// find posts with a specific tag
// (This will use an index!)
> db.blog.find({tags: 'comic'})
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
     author: "Hergé",
     date: ISODate("2012-01-23T14:01:00.117Z"),
     ...
   }
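One way to picture why the tag query can use the index is to think of a multikey index as holding one entry per array element. The sketch below is a plain JavaScript simulation that runs outside MongoDB; `buildMultikeyIndex` and the sample posts are hypothetical names made up for illustration, not MongoDB internals or API.

```javascript
// Plain-JavaScript illustration only; this is not how MongoDB stores indexes.
// Hypothetical sample data modelled on the blog post above.
const posts = [
  { _id: 1, text: "Destination Moon", tags: ["comic", "adventure"] },
  { _id: 2, text: "Explorers on the Moon", tags: ["comic"] },
  { _id: 3, text: "Untagged draft", tags: [] },
];

// Build one index entry per array element: tag -> list of matching _ids.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildMultikeyIndex(posts, "tags");

// Looking up a tag touches only its index entry, not every document.
console.log(tagIndex.get("comic"));     // [ 1, 2 ]
console.log(tagIndex.get("adventure")); // [ 1 ]
```

A document with two tags appears under both keys, which is why a query on either tag finds it.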
Query operators
 Conditional operators:
 $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
 $lt, $lte, $gt, $gte

 Update operators:
   $set, $inc, $push, $pop, $pull, $pushAll, $pullAll
Extending the Schema
http://nysi.org.uk/kids_stuff/rocket/rocket.htm
new_comment = {author: "Chris",
               date: new Date(),
               text: "great book",
               votes: 5}

> db.blog.update(
     {text: "Destination Moon"},
     {"$push": {comments: new_comment},
      "$inc": {comments_count: 1}})
Extending the Schema
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Hergé",
      date: ISODate("2012-01-23T14:01:00.117Z"),
      text : "Destination Moon",
      tags : [ "comic", "adventure" ],
      comments : [{
	     	   author : "Chris",
	     	   date : ISODate("2012-01-23T14:31:53.848Z"),
	     	   text : "great book",
          votes : 5
	     }],
      comments_count: 1
    }
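The update pushes the comment and bumps the counter in a single document write, so the array and `comments_count` stay in step. As a rough sketch of the effect on one document, here is a plain JavaScript version; `applyUpdate` is a hypothetical helper that understands only `$push` and `$inc`, not a MongoDB API.

```javascript
// Plain-JavaScript illustration of what the update does to one document.
// applyUpdate is a hypothetical helper, limited to $push and $inc.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$push || {})) {
    // A missing array is created on the first push.
    result[field] = [...(result[field] || []), value];
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // A missing counter starts from 0.
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Chris", text: "great book", votes: 5 };

const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});

console.log(updated.comments.length, updated.comments_count); // 1 1
console.log("comments" in post); // false
```

The original post never declared `comments` or `comments_count`; both fields appear on first use, which is what lets the schema evolve without a migration step.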
The 'dot' operator
// create index on nested documents:
> db.blog.ensureIndex({"comments.author": 1})

> db.blog.find({"comments.author":"Chris"})
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
     author: "Hergé",
     date: ISODate("2012-01-23T14:01:00.117Z"),
     ...
   }
The 'dot' operator

// create index on comment votes:
> db.blog.ensureIndex({"comments.votes": 1})

// find all posts with any comments with
// more than 50 votes
> db.blog.find({"comments.votes": {$gt: 50}})
The 'dot' operator

// find last 5 posts:
> db.blog.find().sort({"date":-1}).limit(5)

// find the top 10 commented posts:
> db.blog.find().sort({"comments_count":-1}).limit(10)

When sorting, check if you need an index...
Watch for full table scans
{
	    "cursor" : "BasicCursor",
	    "nscanned" : 250003,
	    "nscannedObjects" : 250003,
	    "n" : 10,
	    "scanAndOrder" : true,
	    "millis" : 335,
	    "nYields" : 0,
	    "nChunkSkips" : 0,
	    "isMultiKey" : false,
	    "indexOnly" : false,
	    "indexBounds" : {
	    	
	    }
}
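The gap between `nscanned` and `n` in that output is the warning sign: every document was read and sorted to return ten. The following plain JavaScript sketch contrasts the two access paths on a hypothetical 1000-document collection; the function names and the sorted-array "index" are assumptions for illustration (a real index is a B-tree, not an array).

```javascript
// Hypothetical collection of 1000 posts; (i * 37) % 1000 gives distinct counts.
const blogPosts = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  comments_count: (i * 37) % 1000,
}));

// Without an index: examine every document, then sort in memory
// (the "scanAndOrder" case in the explain output).
function topWithoutIndex(docs, n) {
  let scanned = 0;
  const all = docs.map((d) => { scanned += 1; return d; });
  all.sort((a, b) => b.comments_count - a.comments_count);
  return { docs: all.slice(0, n), scanned };
}

// With an index kept in descending order: read only the first n entries.
function topWithIndex(sortedIndex, n) {
  const docs = sortedIndex.slice(0, n);
  return { docs, scanned: docs.length };
}

const descendingIndex = [...blogPosts].sort((a, b) => b.comments_count - a.comments_count);
const slow = topWithoutIndex(blogPosts, 10);
const fast = topWithIndex(descendingIndex, 10);

console.log(slow.scanned, fast.scanned); // 1000 10
```

Both paths return the same ten posts; only the amount of work differs.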
Rich Documents
http://www.flickr.com/photos/diorama_sky/2975796332

• Intuitive
• Developer friendly
• Encapsulates whole objects
• Performant
• Scalable
Common Patterns
http://www.flickr.com/photos/colinwarren/158628063
Inheritance
http://www.flickr.com/photos/dysonstarr/5098228295
Single Table Inheritance - RDBMS
• Shapes table
  id   type     area   radius   d   length   width
  1    circle   3.14   1
  2    square   4               2
  3    rect     10                  5        2
Single Table Inheritance - MongoDB
> db.shapes.find()
 { _id: "1", type: "circle", area: 3.14, radius: 1}
 { _id: "2", type: "square", area: 4, d: 2}
 { _id: "3", type: "rect", area: 10, length: 5,
   width: 2}
// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create sparse index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})
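Only circles carry a `radius`, so a sparse index on that field needs entries for circles alone. The sketch below simulates that idea in plain JavaScript; `buildSparseIndex` is a hypothetical helper for illustration and says nothing about MongoDB's on-disk format.

```javascript
// Hypothetical in-memory copy of the shapes documents above.
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// A sparse index holds entries only for documents that contain the field.
function buildSparseIndex(docs, field) {
  return docs
    .filter((doc) => field in doc)
    .map((doc) => ({ key: doc[field], _id: doc._id }))
    .sort((a, b) => a.key - b.key);
}

const radiusIndex = buildSparseIndex(shapes, "radius");
console.log(radiusIndex.length); // 1

// radius > 0 can be answered from the index entries alone.
const matches = radiusIndex.filter((e) => e.key > 0).map((e) => e._id);
console.log(matches); // [ '1' ]
```

With many shape types in one collection, the index stays proportional to the number of circles rather than the number of shapes.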
One to Many
http://www.flickr.com/photos/j-fish/6502708899/
Embedded Array / Array Keys

• $slice operator to return a subset of the array
• Some queries are hard,
    e.g. finding the latest comments across all documents
One to Many
    Embedded Array / Array Keys
    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Hergé",
      date: ISODate("2012-01-23T14:01:00.117Z"),
      text : "Destination Moon",
      tags : [ "comic", "adventure" ],
      comments : [{
	     	   author : "Chris",
	     	   date : ISODate("2012-01-23T14:31:53.848Z"),
	     	   text : "great book",
          votes : 5
	     }],
      comments_count: 1
    }
One to Many
Normalized (2 collections)

• Most flexible
• More queries
One to Many - Normalized
 // Posts collection
 { _id : 1000,
    author : "Hergé",
    date: ISODate("2012-01-23T14:01:00.117Z"),
    text : "Destination Moon",
  }
  // Comments collection
  { _id : 1,
     blog : 1000,
     author : "Chris",
     date : ISODate("2012-01-23T14:31:53.848Z"),
     ...
  }
> blog = db.blogs.findOne({text: "Destination Moon"});
> db.comments.find({blog: blog._id});
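With two collections the application performs the join itself: one lookup for the post, a second for its comments. A minimal plain JavaScript sketch of that two-step pattern follows; the arrays stand in for the collections and `postWithComments` is a hypothetical helper name.

```javascript
// Hypothetical in-memory stand-ins for the posts and comments collections.
const postsCollection = [
  { _id: 1000, author: "Hergé", text: "Destination Moon" },
  { _id: 1001, author: "Hergé", text: "Explorers on the Moon" },
];
const commentsCollection = [
  { _id: 1, blog: 1000, author: "Chris", text: "great book" },
  { _id: 2, blog: 1001, author: "Fred", text: "a fine sequel" },
  { _id: 3, blog: 1000, author: "Fred", text: "agreed" },
];

// Query 1: find the parent post. Query 2: find comments that reference it.
function postWithComments(title) {
  const post = postsCollection.find((p) => p.text === title);
  if (!post) return null;
  const comments = commentsCollection.filter((c) => c.blog === post._id);
  return { ...post, comments };
}

const result = postWithComments("Destination Moon");
console.log(result.comments.map((c) => c._id)); // [ 1, 3 ]
```

This is the "more queries" cost of the normalized layout; in exchange, comments can be queried on their own, for example the latest comments across all posts.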
One to Many - patterns
• Embedded Array / Array Keys
• Normalized
Embedding vs. Referencing

• Embed when the 'many' objects always appear
 with their parent.

• Reference when you need more flexibility.
Many to Many
http://www.flickr.com/photos/pats0n/6013379192
Example:

• Product can be in many categories
• Category can have many products
Many to Many
 // Products
 { _id: 10,
   name: "Destination Moon",
   category_ids: [20, 30]}
  // Categories
 { _id: 20,
   name: "comic",
   product_ids:[10, 11, 12]}
 { _id: 30,
   name: "adventure",
   product_ids:[10]}

//All categories for a given product
> db.categories.find({"product_ids": 10})
Alternative
 // Products
 { _id: 10,
   name: "Destination Moon",
   category_ids: [20, 30]}
  // Categories
 { _id: 20,
   name: "comic"}

//All products for a given category
> db.products.find({"category_ids": 20})

// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
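Storing the references on one side only means one direction is a single query and the other takes two steps. The plain JavaScript sketch below mirrors both lookups on in-memory arrays; the second product and the helper names are hypothetical additions for illustration.

```javascript
// Hypothetical in-memory stand-ins for the two collections.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "comic" },
  { _id: 30, name: "adventure" },
];

// All products for a given category: one match on the array of references.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// All categories for a given product: fetch the product, then an $in-style lookup.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p._id));   // [ 10, 11 ]
console.log(categoriesOfProduct(10).map((c) => c.name)); // [ 'comic', 'adventure' ]
```

Compared with keeping `product_ids` on categories as well, there is only one list to update when a product changes category.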
Trees
http://www.flickr.com/photos/cubagallery/5949819558
Hierarchical information
Trees
 Embedded Tree
  { comments : [{
        author : "Chris", text : "...",
        replies : [{
            author : "Fred", text : "...",
            replies : []
        }]
    }]
  }

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 16MB limit
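The "hard to search" drawback shows up as soon as a match can sit at any depth: the application has to walk the tree itself. A small plain JavaScript sketch of such a walk follows; `findByAuthor` is a hypothetical helper, and the document mirrors the embedded tree above.

```javascript
// Hypothetical post document with an embedded reply tree.
const postDoc = {
  comments: [
    {
      author: "Chris", text: "...",
      replies: [{ author: "Fred", text: "...", replies: [] }],
    },
  ],
};

// Collect every comment by an author, at any nesting depth.
function findByAuthor(comments, author, depth = 0, found = []) {
  for (const comment of comments) {
    if (comment.author === author) found.push({ author, depth });
    findByAuthor(comment.replies || [], author, depth + 1, found);
  }
  return found;
}

console.log(findByAuthor(postDoc.comments, "Fred")); // [ { author: 'Fred', depth: 1 } ]
```

The whole document has to be loaded before the walk can start, which is the "single document" advantage and the "partial results" drawback seen from the other side.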
Array of Ancestors
// Tree: A is the root; B and E reply to A; C and D reply to B; F replies to E
// Store all ancestors of a node
{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }
// find all messages that have "b" as an ancestor
> db.msg_tree.find({"thread": "b"})
// find all direct replies to "b"
> db.msg_tree.find({"replyTo": "b"})
// find all ancestors of "f":
> threads = db.msg_tree.findOne({"_id": "f"}).thread
> db.msg_tree.find({"_id": {$in: threads}})
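The pattern depends on each new reply copying its parent's ancestor list and appending the parent. The plain JavaScript sketch below shows that step and the three lookups on an in-memory array; `makeReply`, `descendantsOf`, `directRepliesTo`, and `ancestorsOf` are hypothetical helper names.

```javascript
// Hypothetical in-memory copy of the msg_tree documents.
const messages = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
];

// A new reply inherits the parent's ancestors, then adds the parent itself.
function makeReply(id, parent) {
  return { _id: id, thread: [...(parent.thread || []), parent._id], replyTo: parent._id };
}

// Everything below a node: its id appears somewhere in the ancestor array.
const descendantsOf = (id) =>
  messages.filter((m) => (m.thread || []).includes(id)).map((m) => m._id);

// Direct children only.
const directRepliesTo = (id) =>
  messages.filter((m) => m.replyTo === id).map((m) => m._id);

// Ancestors are read straight off the stored array.
const ancestorsOf = (id) => {
  const message = messages.find((m) => m._id === id);
  return message ? message.thread || [] : [];
};

const g = makeReply("g", messages.find((m) => m._id === "f"));
console.log(g.thread);             // [ 'a', 'e', 'f' ]
console.log(descendantsOf("a"));   // [ 'b', 'c', 'd', 'e', 'f' ]
console.log(directRepliesTo("a")); // [ 'b', 'e' ]
console.log(ancestorsOf("f"));     // [ 'a', 'e' ]
```

The trade-off is that moving a subtree means rewriting the ancestor array of every node in it.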
Array of Ancestors
Store hierarchy as a path expression

 • Separate each node by a delimiter, e.g. "/"
 • Use a regular expression on the path to find parts of a tree
{ comments: [
     { author: "Kyle", text: "initial post",
       path: "" },
     { author: "Jim", text: "jim’s comment",
       path: "jim" },
     { author: "Kyle", text: "Kyle’s reply to Jim",
       path : "jim/kyle"} ] }

// Find the conversations that Jim started
> db.blogs.find({"comments.path": /^jim/i})
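A prefix match on the path selects a node together with everything beneath it. The plain JavaScript sketch below applies that to the comments array; `subtree` and `escapeRegex` are hypothetical helpers, and the `(/|$)` boundary is an added assumption so that a prefix of `jim` does not also match a path such as `jimmy`.

```javascript
// Hypothetical in-memory copy of the embedded comments, plus one extra path.
const comments = [
  { author: "Kyle", text: "initial post", path: "" },
  { author: "Jim", text: "jim's comment", path: "jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "jim/kyle" },
  { author: "Jimmy", text: "unrelated thread", path: "jimmy" },
];

// Escape regex metacharacters so a path segment is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Select a node and its descendants with an anchored prefix match.
function subtree(prefix) {
  const pattern = new RegExp("^" + escapeRegex(prefix) + "(/|$)", "i");
  return comments.filter((c) => pattern.test(c.path));
}

console.log(subtree("jim").map((c) => c.path)); // [ 'jim', 'jim/kyle' ]
```

Anchoring at the start of the string matters: an unanchored pattern would also pick up `jim` appearing deeper in someone else's thread.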
Queues
http://www.flickr.com/photos/deanspic/4960440218
Queue
Requirements
• See jobs waiting, jobs in progress
• Ensure that each job is started once and only once
// Queue document
{ in_progress: false,
  priority: 1,
  message: "Rich documents FTW!"
  ...
}
// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
               query:  {in_progress: false},
               sort:   {priority: -1},
               update: {$set: {in_progress: true,
                               started: new Date()}}})
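The once-and-only-once requirement holds because selecting the job and flagging it happen as one operation. The plain JavaScript sketch below imitates that on an in-memory array; `claimNextJob` is a hypothetical stand-in for the `findAndModify` call, and it returns the job as it looked before the update, as `findAndModify` does by default.

```javascript
// Hypothetical in-memory job queue.
const jobs = [
  { _id: 1, in_progress: false, priority: 1, message: "low" },
  { _id: 2, in_progress: false, priority: 5, message: "high" },
  { _id: 3, in_progress: true, priority: 9, message: "already running" },
];

// Select the highest-priority waiting job and mark it in one step.
// In this single-threaded sketch nothing can run between select and update.
function claimNextJob(queue, now = new Date()) {
  const waiting = queue
    .filter((job) => !job.in_progress)
    .sort((a, b) => b.priority - a.priority);
  if (waiting.length === 0) return null;
  const job = waiting[0];
  const before = { ...job }; // snapshot of the pre-update state
  job.in_progress = true;
  job.started = now;
  return before;
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
const third = claimNextJob(jobs);
console.log(first._id, second._id, third); // 2 1 null
```

Each call hands out a different job, and a job already in progress is never selected again, even though its priority is the highest.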
Anti Patterns
http://www.flickr.com/photos/51838104@N02/5841690990
• Careless indexing
• Large, deeply nested documents
• Multiple types for a key
• One size fits all collections
• One collection per user
Summary
• Schema design is different in MongoDB
• Basic data design principles stay the same
• Focus on how the app manipulates data
• Rapidly evolve schema to meet your requirements
• Enjoy your new freedom, use it wisely :-)
download at mongodb.org

conferences, appearances, and meetups
http://www.10gen.com/events

Facebook: http://bit.ly/mongofb | Twitter: @mongodb | LinkedIn: http://linkd.in/joinmongo

support, training, and this talk brought to you by
More Related Content

PDF
10gen Presents Schema Design and Data Modeling
PPTX
Building a Scalable Inbox System with MongoDB and Java
PPTX
MongoDB Schema Design: Four Real-World Examples
PDF
MongoDB Schema Design
PPTX
Webinar: Schema Design
KEY
Schema Design with MongoDB
PPTX
Webinar: General Technical Overview of MongoDB for Dev Teams
PPTX
Dev Jumpstart: Schema Design Best Practices
10gen Presents Schema Design and Data Modeling
Building a Scalable Inbox System with MongoDB and Java
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design
Webinar: Schema Design
Schema Design with MongoDB
Webinar: General Technical Overview of MongoDB for Dev Teams
Dev Jumpstart: Schema Design Best Practices

What's hot (20)

PPTX
Back to Basics Webinar 3: Schema Design Thinking in Documents
PPTX
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
KEY
Mongo db presentation
PDF
Mongo DB schema design patterns
PDF
MongoDB Europe 2016 - Debugging MongoDB Performance
KEY
Managing Social Content with MongoDB
PPTX
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
KEY
MongoDB - Introduction
PPTX
Agg framework selectgroup feb2015 v2
KEY
MongoDB, PHP and the cloud - php cloud summit 2011
PPTX
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
PDF
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
PPTX
Back to Basics Webinar 2: Your First MongoDB Application
PDF
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
PDF
Agile Schema Design: An introduction to MongoDB
PPTX
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...
PPT
MongoDB Schema Design
PPTX
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
PPTX
Introduction to MongoDB and Hadoop
PDF
MongoSV Schema Workshop
Back to Basics Webinar 3: Schema Design Thinking in Documents
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
Mongo db presentation
Mongo DB schema design patterns
MongoDB Europe 2016 - Debugging MongoDB Performance
Managing Social Content with MongoDB
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
MongoDB - Introduction
Agg framework selectgroup feb2015 v2
MongoDB, PHP and the cloud - php cloud summit 2011
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Back to Basics Webinar 2: Your First MongoDB Application
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
Agile Schema Design: An introduction to MongoDB
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...
MongoDB Schema Design
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Introduction to MongoDB and Hadoop
MongoSV Schema Workshop
Ad

Viewers also liked (17)

PDF
Lessons from 4 years of driver develoment
PPT
Jessica Oughton Athlete Prospectus
PPS
Presentació del curs de tast de vi "Del celler al paladar"
PDF
Node.js and ruby
PPTX
Storage talk
PPS
Mp Ciekawe Fotografie
PPTX
Restarting Enterprise Architecture in the age of Digital Transformation
PPT
Tema 6 La construccion del estado liberal 1833_1868
PDF
The Web Hacking Incidents Database Annual
PPS
Vai um planner aí?
PPS
PDF
New in MongoDB 2.6
KEY
Mongodb intro
KEY
Mongo db ecommerce
PPT
Els Invertebrats
PPT
Viral Marketing Strategies, Graphing Social Patterns East Presented by Jeff R...
PPTX
It4it state of the forum ogsfo partner pavilion jan 2016
Lessons from 4 years of driver develoment
Jessica Oughton Athlete Prospectus
Presentació del curs de tast de vi "Del celler al paladar"
Node.js and ruby
Storage talk
Mp Ciekawe Fotografie
Restarting Enterprise Architecture in the age of Digital Transformation
Tema 6 La construccion del estado liberal 1833_1868
The Web Hacking Incidents Database Annual
Vai um planner aí?
New in MongoDB 2.6
Mongodb intro
Mongo db ecommerce
Els Invertebrats
Viral Marketing Strategies, Graphing Social Patterns East Presented by Jeff R...
It4it state of the forum ogsfo partner pavilion jan 2016
Ad

Similar to Schema design (20)

KEY
Schema Design (Mongo Austin)
PDF
Intro to MongoDB and datamodeling
PDF
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
PDF
Latinoware
PPTX
Schema design mongo_boston
PPTX
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
PPTX
Schema Design
PPTX
Webinar: Schema Design
PDF
Schema & Design
PPTX
Document databases
PPTX
Webinar: Back to Basics: Thinking in Documents
PDF
Schema Design
PDF
Schema Design
PPT
Building Your First MongoDB App ~ Metadata Catalog
PPTX
Schema Design
PDF
The emerging world of mongo db csp
KEY
Schema Design
PDF
Mongo db
PDF
MongoDB Schema Design
KEY
Modeling Data in MongoDB
Schema Design (Mongo Austin)
Intro to MongoDB and datamodeling
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
Latinoware
Schema design mongo_boston
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Schema Design
Webinar: Schema Design
Schema & Design
Document databases
Webinar: Back to Basics: Thinking in Documents
Schema Design
Schema Design
Building Your First MongoDB App ~ Metadata Catalog
Schema Design
The emerging world of mongo db csp
Schema Design
Mongo db
MongoDB Schema Design
Modeling Data in MongoDB

Recently uploaded (20)

PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
A comparative analysis of optical character recognition models for extracting...
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PPTX
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
PDF
Spectral efficient network and resource selection model in 5G networks
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PPTX
Spectroscopy.pptx food analysis technology
PPTX
Tartificialntelligence_presentation.pptx
PDF
A comparative study of natural language inference in Swahili using monolingua...
PPTX
OMC Textile Division Presentation 2021.pptx
PDF
Empathic Computing: Creating Shared Understanding
PPTX
1. Introduction to Computer Programming.pptx
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Encapsulation theory and applications.pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PDF
Univ-Connecticut-ChatGPT-Presentaion.pdf
PDF
Getting Started with Data Integration: FME Form 101
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
A comparative analysis of optical character recognition models for extracting...
Accuracy of neural networks in brain wave diagnosis of schizophrenia
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
Spectral efficient network and resource selection model in 5G networks
Digital-Transformation-Roadmap-for-Companies.pptx
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
Assigned Numbers - 2025 - Bluetooth® Document
Spectroscopy.pptx food analysis technology
Tartificialntelligence_presentation.pptx
A comparative study of natural language inference in Swahili using monolingua...
OMC Textile Division Presentation 2021.pptx
Empathic Computing: Creating Shared Understanding
1. Introduction to Computer Programming.pptx
Per capita expenditure prediction using model stacking based on satellite ima...
Encapsulation theory and applications.pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Group 1 Presentation -Planning and Decision Making .pptx
Univ-Connecticut-ChatGPT-Presentaion.pdf
Getting Started with Data Integration: FME Form 101

Schema design

  • 2. Topics Introduction • Working with documents • Evolving a schema • Queries and indexes • Rich Documents
  • 3. Topics Introduction • Working with documents • Evolving a schema • Queries and indexes • Rich Documents Common patterns • Single table inheritance • One-to-Many & Many-to-Many • Trees • Queues
  • 4. Ways to model data: http://www.flickr.com/photos/42304632@N00/493639870/
  • 7. Terminology RDBMS MongoDB Table Collection Row(s) JSON Document Index Index Join Embedding & Linking
  • 8. Schema-design criteria How can we manipulate Access Patterns? this data? • Dynamic Queries • Read / Write Ratio • Secondary Indexes • Types of updates • Atomic Updates • Types of queries • Map Reduce • Data life-cycle • Aggregation (coming soon) Considerations • No Joins • Document writes are atomic
  • 10. A simple start post = {author: "Hergé", date: new Date(), text: "Destination Moon", tags: ["comic", "adventure"]} > db.blog.save(post) Map the documents to your application.
  • 11. Find the document > db.blog.find() { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: ISODate("2012-01-23T14:01:00.117Z"), text: "Destination Moon", tags: [ "comic", "adventure" ] } Note: • _id must be unique, but can be anything you'd like • Default BSON ObjectId if one is not supplied
  • 12. Add an index, find via index > db.blog.ensureIndex({author: 1}) > db.blog.find({author: 'Hergé'}) { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: ISODate("2012-01-23T14:01:00.117Z"), ... } Secondary index on "author"
  • 13. Examine the query plan > db.blogs.find({"author": 'Hergé'}).explain() { "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } }
  • 14. Multi-key indexes // Build an index on the 'tags' array > db.blog.ensureIndex({tags: 1}) // find posts with a specific tag // (This will use an index!) > db.blog.find({tags: 'comic'}) { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: ISODate("2012-01-23T14:01:00.117Z"), ... }
  • 15. Query operators Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists,$type, .. $lt, $lte, $gt, $gte, $ne Update operators: $set, $inc, $push, $pop, $pull, $pushAll, $pullAll
  • 16. Extending the schema http://nysi.org.uk/kids_stuff/rocket/rocket.htm
  • 17. Extending the Schema new_comment = {author: "Chris", date: new Date(), text: "great book", votes: 5} > db.blog.update( {text: "Destination Moon" }, {"$push": {comments: new_comment}, "$inc": {comments_count: 1} })
  • 18. Extending the Schema { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date: ISODate("2012-01-23T14:01:00.117Z"), text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [{ author : "Chris", date : ISODate("2012-01-23T14:31:53.848Z"), text : "great book", votes : 5 }], comments_count: 1 }
  • 19. Extending the Schema { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date: ISODate("2012-01-23T14:01:00.117Z"), text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [{ author : "Chris", date : ISODate("2012-01-23T14:31:53.848Z"), text : "great book", votes : 5 }], comments_count: 1 }
  • 20. The 'dot' operator // create index on nested documents: > db.blog.ensureIndex({"comments.author": 1}) > db.blog.find({"comments.author":"Chris"}) { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: ISODate("2012-01-23T14:01:00.117Z"), ... }
  • 21. The 'dot' operator // create index comment votes: > db.blog.ensureIndex({"comments.votes": 1}) // find all posts with any comments with // more than 50 votes > db.blog.find({"comments.votes": {$gt: 50}})
  • 22. The 'dot' operator // find last 5 posts: > db.blog.find().sort({"date":-1}).limit(5) // find the top 10 commented posts: > db.blog.find().sort({"comments_count":-1}).limit(10) When sorting, check if you need an index...
  • 23. Watch for full table scans { "cursor" : "BasicCursor", "nscanned" : 250003, "nscannedObjects" : 250003, "n" : 10, "scanAndOrder" : true, "millis" : 335, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } }
  • 24. Watch for full table scans { "cursor" : "BasicCursor", "nscanned" : 250003, "nscannedObjects" : 250003, "n" : 10, "scanAndOrder" : true, "millis" : 335, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } }
  • 26. Rich Documents • Intuitive • Developer friendly • Encapsulates whole objects • Performant • They are scalable
  • 27. Common Patterns http://www.flickr.com/photos/colinwarren/158628063
  • 30. Single Table Inheritance - RDBMS • Shapes table id type area radius d length width 1 circle 3.14 1 2 square 4 2 3 rect 10 5 2
  • 31. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1", type: "circle", area: 3.14, radius: 1} { _id: "2", type: "square", area: 4, d: 2} { _id: "3", type: "rect", area: 10, length: 5, width: 2}
  • 32. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1", type: "circle", area: 3.14, radius: 1} { _id: "2", type: "square", area: 4, d: 2} { _id: "3", type: "rect", area: 10, length: 5, width: 2} // find shapes where radius > 0 > db.shapes.find({radius: {$gt: 0}})
  • 33. Single Table Inheritance - MongoDB > db.shapes.find() { _id: "1", type: "circle", area: 3.14, radius: 1} { _id: "2", type: "square", area: 4, d: 2} { _id: "3", type: "rect", area: 10, length: 5, width: 2} // find shapes where radius > 0 > db.shapes.find({radius: {$gt: 0}}) // create sparse index > db.shapes.ensureIndex({radius: 1}, {sparse: true})
  • 36. One to Many Embedded Array / Array Keys • $slice operator to return subset of array • some queries hard e.g find latest comments across all documents
  • 37. One to Many Embedded Array / Array Keys { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date: ISODate("2012-01-23T14:01:00.117Z"), text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [{ author : "Chris", date : ISODate("2012-01-23T14:31:53.848Z"), text : "great book", votes : 5 }], comments_count: 1 }
  • 38. One to Many Embedded Array / Array Keys { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date: ISODate("2012-01-23T14:01:00.117Z"), text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [{ author : "Chris", date : ISODate("2012-01-23T14:31:53.848Z"), text : "great book", votes : 5 }], comments_count: 1 }
  • 39. One to Many Embedded Array / Array Keys { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date: ISODate("2012-01-23T14:01:00.117Z"), text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [{ author : "Chris", date : ISODate("2012-01-23T14:31:53.848Z"), text : "great book", votes : 5 }], comments_count: 1 }
  • 40. One to Many Normalized (2 collections) • Most flexible • More queries
  • 41. One to Many - Normalized // Posts collection { _id : 1000, author : "Hergé", date: ISODate("2012-01-23T14:01:00.117Z"), text : "Destination Moon", } // Comments collection { _id : 1, blog : 1000, author : "Chris", date : ISODate("2012-01-23T14:31:53.848Z"), ... } > blog = db.blogs.find({text: "Destination Moon"}); > db.comments.find({blog: blog._id});
  • 42. One to Many - patterns • Embedded Array / Array Keys • Embedded Array / Array Keys • Normalized
  • 43. Embedding vs. Referencing • Embed when the 'many' objects always appear with their parent. • Reference when you need more flexibility.
  • 45. Many - Many Example: • Product can be in many categories • Category can have many products
  • 46. Many to Many // Products { _id: 10, name: "Destination Moon", category_ids: [20, 30]}
  • 47. Many to Many // Products { _id: 10, name: "Destination Moon", category_ids: [20, 30]} // Categories { _id: 20, name: "comic", product_ids:[10, 11, 12]} { _id: 30, name: "adventure", product_ids:[10]}
  • 48. Many to Many // Products { _id: 10, name: "Destination Moon", category_ids: [20, 30]} // Categories { _id: 20, name: "comic", product_ids:[10, 11, 12]} { _id: 30, name: "adventure", product_ids:[10]} //All categories for a given product > db.categories.find({"product_ids": 10})
  • 49. Alternative // Products { _id: 10, name: "Destination Moon", category_ids: [20, 30]} // Categories { _id: 20, name: "comic"}
  • 50. Alternative // Products { _id: 10, name: "Destination Moon", category_ids: [20, 30]} // Categories { _id: 20, name: "comic"} //All products for a given category > db.products.find({"category_ids": 20})
  • 51. Alternative
      // Products
      { _id: 10, name: "Destination Moon", category_ids: [20, 30]}
      // Categories
      { _id: 20, name: "comic"}
      // All products for a given category
      > db.products.find({"category_ids": 20})
      // All categories for a given product
      > product = db.products.findOne({_id: some_id})
      > db.categories.find({_id: {$in: product.category_ids}})
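With the links stored only on the product side, one direction is a single array-membership match and the other needs a load followed by an `$in`-style match. A minimal in-memory sketch in plain JavaScript, assuming arrays in place of collections; the second product and both helper names are hypothetical.

```javascript
// In-memory stand-ins for the products and categories collections (sample data).
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Another title", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "comic" },
  { _id: 30, name: "adventure" },
];

// Same idea as find({category_ids: id}): match documents whose array contains the id.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Same idea as the two-step lookup: load the product, then match categories by id list.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p.name));
console.log(categoriesOfProduct(10).map((c) => c.name));
```

Keeping the id list on one side only means a link is written in a single document, at the price of the second query in one direction.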
  • 54. Trees
      Embedded Tree
      { comments : [{
          author : "Chris", text : "...",
          replies : [{
            author : "Fred", text : "...",
            replies : []
          }]
        }]
      }
      Pros: Single Document, Performance, Intuitive
      Cons: Hard to search, Partial Results, 16MB limit
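One way to read "Hard to search": replies nest to an arbitrary depth, so locating a comment means walking the tree in application code. A minimal sketch in plain JavaScript under that assumption; `post` copies the shape of the slide's document and `findByAuthor` is a hypothetical helper.

```javascript
// Sample embedded tree shaped like the slide's document (illustrative data).
const post = {
  comments: [
    {
      author: "Chris",
      text: "...",
      replies: [{ author: "Fred", text: "...", replies: [] }],
    },
  ],
};

// Hypothetical helper: collect every comment by a given author at any depth.
function findByAuthor(nodes, author, found = []) {
  for (const node of nodes) {
    if (node.author === author) found.push(node);
    // Descend into nested replies; a missing array is treated as empty.
    findByAuthor(node.replies || [], author, found);
  }
  return found;
}

console.log(findByAuthor(post.comments, "Fred").length);
```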
  • 58. Array of Ancestors
      [Diagram: tree rooted at A; B and E are children of A; C and D are children of B; F is a child of E]
      // Store all ancestors of a node
      { _id: "a" }
      { _id: "b", thread: [ "a" ], replyTo: "a" }
      { _id: "c", thread: [ "a", "b" ], replyTo: "b" }
      { _id: "d", thread: [ "a", "b" ], replyTo: "b" }
      { _id: "e", thread: [ "a" ], replyTo: "a" }
      { _id: "f", thread: [ "a", "e" ], replyTo: "e" }
      // find all messages that have "b" among their ancestors
      > db.msg_tree.find({"thread": "b"})
      // find all direct replies to "b"
      > db.msg_tree.find({"replyTo": "b"})
      // find all ancestors of "f"
      > threads = db.msg_tree.findOne({"_id": "f"}).thread
      > db.msg_tree.find({"_id": {$in: threads}})
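The three queries on this slide can be traced in plain JavaScript over the same six documents. This is an in-memory sketch, not the MongoDB API; `msgTree` stands in for the `msg_tree` collection and the helper names are hypothetical.

```javascript
// The six documents from the slide, held in an array instead of a collection.
const msgTree = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
];

const ids = (docs) => docs.map((d) => d._id);

// Same idea as find({thread: id}): nodes that list id among their ancestors.
function descendantsOf(id) {
  return msgTree.filter((m) => (m.thread || []).includes(id));
}

// Same idea as find({replyTo: id}): nodes whose immediate parent is id.
function directRepliesTo(id) {
  return msgTree.filter((m) => m.replyTo === id);
}

// Same idea as findOne({_id: id}).thread followed by find({_id: {$in: thread}}).
function ancestorsOf(id) {
  const node = msgTree.find((m) => m._id === id);
  const thread = (node && node.thread) || []; // the root has no thread array
  return msgTree.filter((m) => thread.includes(m._id));
}

console.log(ids(descendantsOf("b")));
console.log(ids(directRepliesTo("a")));
console.log(ids(ancestorsOf("f")));
```

Storing the full ancestor list on every node is what lets each of these be answered without walking the tree level by level.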
  • 59. Array of Ancestors
      Store hierarchy as a path expression
      • Separate each node by a delimiter, e.g. "/"
      • Use text search to find parts of a tree
      { comments: [
          { author: "Kyle", text: "initial post", path: "" },
          { author: "Jim", text: "jim’s comment", path: "jim" },
          { author: "Kyle", text: "Kyle’s reply to Jim", path: "jim/kyle" } ] }
      // Find the conversations Jim was part of
      // (path lives inside the comments array, so the query uses dot notation)
      > db.blogs.find({"comments.path": /^jim/i})
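A bare prefix pattern such as `/^jim/i` also matches sibling paths that merely start with the same letters. The sketch below, in plain JavaScript over an in-memory array, contrasts the loose prefix match with a delimiter-aware one. The "Jimmy" entry is added purely for illustration, both helper names are hypothetical, and `prefixMatch` assumes the prefix contains no regex metacharacters.

```javascript
// Comments from the slide plus one extra entry added for illustration.
const comments = [
  { author: "Kyle", text: "initial post", path: "" },
  { author: "Jim", text: "jim's comment", path: "jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "jim/kyle" },
  { author: "Jimmy", text: "unrelated thread", path: "jimmy" },
];

const DELIMITER = "/";

// Loose match, same idea as the /^jim/i pattern: any path starting with the prefix.
function prefixMatch(prefix) {
  const re = new RegExp("^" + prefix, "i");
  return comments.filter((c) => re.test(c.path));
}

// Stricter match: the node itself, or a path that continues with the delimiter.
function subtree(prefix) {
  const p = prefix.toLowerCase();
  return comments.filter((c) => {
    const path = c.path.toLowerCase();
    return path === p || path.startsWith(p + DELIMITER);
  });
}

console.log(prefixMatch("jim").length); // 3: "jim", "jim/kyle", and "jimmy"
console.log(subtree("jim").length); // 2: "jim" and "jim/kyle"
```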
  • 63. Queue Requirements
      • See jobs waiting, jobs in progress
      • Ensure that each job is started once and only once
      // Queue document
      { in_progress: false,
        priority: 1,
        message: "Rich documents FTW!"
        ... }
      // find highest priority job and mark as in-progress
      job = db.jobs.findAndModify({
              query: {in_progress: false},
              sort: {priority: -1},
              update: {$set: {in_progress: true,
                              started: new Date()}}})
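The claim step on this slide selects the highest-priority waiting job and marks it in one operation. A minimal in-memory sketch of that logic in plain JavaScript; the `jobs` array, its sample entries, and `claimNextJob` are hypothetical. Here the select-and-mark cannot be interrupted because the code runs on a single thread; in MongoDB the once-and-only-once guarantee comes from `findAndModify` matching and updating the document atomically.

```javascript
// In-memory stand-in for the jobs collection (sample data).
const jobs = [
  { _id: 1, in_progress: false, priority: 1, message: "low priority" },
  { _id: 2, in_progress: false, priority: 5, message: "high priority" },
];

// Hypothetical helper mirroring the findAndModify call: pick the waiting job
// with the highest priority, mark it in progress, and return it.
function claimNextJob(now = new Date()) {
  const waiting = jobs.filter((j) => !j.in_progress); // query: {in_progress: false}
  if (waiting.length === 0) return null; // nothing left to claim
  // sort: {priority: -1}, then take the first result
  const job = waiting.reduce((best, j) => (j.priority > best.priority ? j : best));
  job.in_progress = true; // update: {$set: {in_progress: true, started: now}}
  job.started = now;
  return job;
}
```

Calling `claimNextJob()` repeatedly hands out each job exactly once, highest priority first, and returns `null` once the queue is empty. Jobs waiting and jobs in progress remain visible by filtering on `in_progress`.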
  • 65. Anti patterns • Careless indexing • Large, deeply nested documents • Multiple types for a key • One size fits all collections • One collection per user
  • 66. Summary • Schema design is different in MongoDB • Basic data design principles stay the same • Focus on how the app manipulates data • Rapidly evolve schema to meet your requirements • Enjoy your new freedom, use it wisely :-)
  • 67. download at mongodb.org
      conferences, appearances, and meetups: http://www.10gen.com/events
      Facebook: http://bit.ly/mongofb | Twitter: @mongodb | LinkedIn: http://linkd.in/joinmongo
      support, training, and this talk brought to you by

Editor's Notes

  • #4: Explain why.
  • #5:
      * 3rd Normal Form - determining a table's degree of vulnerability to logical inconsistencies
      * The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies
  • #6: The path to scaling an RDBMS tends towards denormalization
  • #7:
      * No joins for scalability - doing joins across shards in SQL is highly inefficient and difficult to perform.
      * MongoDB is geared for easy scaling - going from a single node to a distributed cluster is easy.
      * Little or no application code changes are needed to scale from a single node to a sharded cluster.
  • #8:
      * Questions about database features inform our schema design
      Access Patterns
      * Less of an issue for normalized databases
      * MongoDB document models can be rich; it's flexible
  • #9: To review simple schema design we'll use a simple blog example.
  • #10: Notice Hergé - UTF-8 support is native
  • #14:
      * Can create indexes for arrays / objects
      * In the relational world - you'd have to do joins
      * Object modelled directly to MongoDB
  • #15:
      * Rich query language
      * Powerful - can do range queries $lt and $gt
      * Update - can update parts of documents
  • #17: upserts - $push, $inc
  • #19:
      * Allows easy access to embedded documents / arrays
      * Also can do positional: comments.0.author
  • #20: Range queries still use indexes
  • #22:
      * Full collection scan
      * scanAndOrder - reorders
  • #24:
      * If a document is always presented as a whole, a single doc gives performance benefits
      * A single doc is not a panacea - as we'll see
  • #25: As with nature, common patterns emerge when modeling data
  • #28:
      * Leaves nulls in the table
      * Not intuitive
  • #29: Single table inheritance is clean and intuitive in MongoDB
  • #30: Single table inheritance is clean and intuitive in MongoDB
  • #32:
      * One author one Blog Entry
      * Many authors for one Blog Entry
          ** Delete the blog - don't delete the author(s)
          ** Delete the blog - delete the author(s) - aka cascading delete
  • #50: Also a one-to-many pattern
  • #65: Update: will update in_progress and add started
  • #66: Update: will update in_progress and add started
  • #68: Limits on number of namespaces
  • #69:
      * Schema is specific to application / data usage
      * Think future - data change / how you are going to query