NoSQL, NO PROBLEM: USING AZURE DOCUMENTDB
{
"name": "Ken Cenerelli",
"twitter": "@KenCenerelli",
"e-mail": "Ken_Cenerelli@Outlook.com",
"hashtags": ["#DevTeach", "#DocumentDB"]
}
ABOUT ME
Twitter: @KenCenerelli
Email: Ken_Cenerelli@Outlook.com
Blog: kencenerelli.wordpress.com
LinkedIn: linkedin.com/in/kencenerelli
Bio:
 Content Developer / Programmer Writer
 Microsoft MVP - Visual Studio and Development Technologies
 Microsoft TechNet Wiki Guru
 Co-Organizer of CTTDNUG
 Technical reviewer of multiple books
2
ROAD MAP
1. Overview
2. The Resource Model
3. Modeling Your Data
4. Performance
5. Developing with DocumentDB & Demos {“aka”: “The Good Stuff”}
6. Pricing
7. Wrap-up
3
WHAT IS NoSQL?
 NoSQL → Not Only SQL
 No up-front (schema) design
 Easier to scale horizontally
 Easier to develop iteratively
 Types & Examples:
 Document databases: DocumentDB, MongoDB, CouchDB
 Key-value stores: Redis
 Graph stores: Neo4J, Giraph
 Wide-column: Cassandra, HBase
4
WHAT IS AZURE DOCUMENTDB?
 NoSQL document database fully managed by Microsoft Azure
 Part of the NoSQL family of databases
 For rapid development of cloud-designed apps (web, mobile,
gaming, IoT)
 Store and query schema-agnostic JSON data with a SQL-like grammar
 Fast, predictable performance
 Transactionally process multiple documents via server-side JavaScript (stored procedures and triggers)
 Tunable consistency levels
 Built with familiar tools – REST, JSON, JavaScript
5
6
WHERE DOES IT FIT IN THE AZURE
FAMILY?
7
WHEN TO USE DOCUMENTDB?
 In General
 You don’t want to do replication and scale-out by yourself
 You want ACID transactions
 You want to have tunable consistency
 You want to do rapid development where models can evolve
 You want to utilize your .NET, JavaScript and MongoDB skills
 Compared to relational databases
 You don’t want predefined columns
 Compared to other document stores
 You want to use a SQL-like grammar
8
WHEN TO NOT USE DOCUMENTDB?
 If your data has complex relationships
 If your data has rigid schemas
 If your data has complex transactions
 If your data needs aggregation
 If your data needs encrypted storage
 If you’re planning to move your entire data store to DocumentDB
 If you do not want your data to be locked into Azure
9
DOCUMENTDB USE CASES
 User generated content
 Blog posts, chat sessions, ratings, comments, feedback, polls
 Catalog data
 User accounts, product catalogs, device registries for IoT
 Logging and Time-series data
 Event logs, input source for data analytics jobs performed offline
 Gaming
 In-game stats, social media integration, and high-score leaderboards
 User preferences data
 Modern web and mobile applications
 IoT and Device sensor data
 Ingest bursts of data from device sensors, ad-hoc querying and offline analytics
10
RESOURCE MODEL
11
[Diagram: DocumentDB resource model]
RESOURCE MODEL
[Diagram: DocumentDB resource model]
RESOURCE MODEL
[Diagram: DocumentDB resource model]
* collection != table of homogeneous entities
collection ~ a data partition
RESOURCE MODEL
14
[Diagram: DocumentDB resource model]
{
"id" : "123",
"name" : "joe",
"age" : 30,
"address" : {
"street" : "some st"
}
}
RESOURCE MODEL
15
[Diagram: DocumentDB resource model]
RESOURCE ADDRESSING
 Native REST Interface
 Each resource has a permanent unique ID
 API URL:
 https://{database account}.documents.azure.com
 Document Path:
 /dbs/{database id}/colls/{collection id}/docs/{document id}
16
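The addressing scheme above can be sketched in a few lines of Python. `account_endpoint` and `document_link` are hypothetical helpers, not SDK functions, and "contoso", "store" and "products" are made-up ids; the sketch only shows how the account URL and the resource path combine into one document address. A real REST request would also need authorization headers, which are out of scope here.

```python
def account_endpoint(account: str) -> str:
    """Base API URL for a database account (the account name is a placeholder)."""
    return f"https://{account}.documents.azure.com"


def document_link(database_id: str, collection_id: str, document_id: str) -> str:
    """Build the /dbs/{db}/colls/{coll}/docs/{doc} path pattern from the slide."""
    for part in (database_id, collection_id, document_id):
        # Assumption: an empty id, or one containing '/', would make the path ambiguous.
        if not part or "/" in part:
            raise ValueError(f"invalid resource id: {part!r}")
    return f"/dbs/{database_id}/colls/{collection_id}/docs/{document_id}"


url = account_endpoint("contoso") + document_link("store", "products", "123")
print(url)
```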
DOCUMENTDB JSON DOCUMENTS
JSON
 Intersection of most
modern type systems
JSON values
 Self-describable,
self-contained values
 Are trivially serialized
to/from text
17
{
"locations":
[
{"country": "Germany", "city": "Berlin"},
{"country": "France", "city": "Paris"}
],
"headquarters": "Belgium",
"exports": [{"city": "Moscow"}, {"city": "Athens"}]
}
a JSON document, as a tree
[Diagram: root → locations → 0 (country: Germany, city: Berlin), 1 (country: France, city: Paris); root → headquarters: Belgium; root → exports → 0 (city: Moscow), 1 (city: Athens)]
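To make the "document as a tree" view concrete, this Python sketch (an illustration, not DocumentDB's actual indexer) parses the sample document and lists every root-to-leaf path, using array positions as the 0/1 edge labels from the diagram. It also checks that the value survives a round trip to and from text.

```python
import json

# The sample document from the slide, as JSON text.
TEXT = """
{
  "locations": [
    {"country": "Germany", "city": "Berlin"},
    {"country": "France", "city": "Paris"}
  ],
  "headquarters": "Belgium",
  "exports": [{"city": "Moscow"}, {"city": "Athens"}]
}
"""


def leaf_paths(node, prefix=()):
    """Yield (path, value) for every leaf; array indices become path segments."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from leaf_paths(child, prefix + (index,))
    else:
        yield prefix, node


doc = json.loads(TEXT)
paths = dict(leaf_paths(doc))
for path, value in paths.items():
    print("/" + "/".join(str(p) for p in path), "=", value)

# Serializing back to text and re-parsing yields an equal value.
assert json.loads(json.dumps(doc)) == doc
```

The first line printed is `/locations/0/country = Germany`; seven leaf paths are printed in total.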
DATA MODELING WITH RDBMS
18
Doing it the RDBMS way: normalize everything!
To query a Person, joins to the related tables are needed:
SELECT p.name, p.lastName, p.age, cd.detail,
cdt.type, a.street, a.city, a.state, a.zip
FROM Person p
INNER JOIN Address a
ON a.person_id = p.id
INNER JOIN ContactDetail cd
ON cd.person_id = p.id
INNER JOIN ContactDetailType cdt
ON cd.type_id = cdt.id
Writes require multiple table updates
DATA MODELING WITH
DENORMALIZATION
19
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"addresses": [
{
"line1": "100 Some Street",
"line2": "Unit 1",
"city": "Seattle",
"state": "WA",
"zip": 98012 }
],
"contactDetails": [
{"email": "thomas@andersen.com"},
{"phone": "+1 555 555-5555", "extension": 5555}
]
}
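Try to model your entity as a self-contained document
Generally, use embedded data models when:
 Entities have "contains" relationships
 Entities have one-to-few relationships
 Embedded data changes infrequently
 Embedded data won't grow without bound
 Embedded data is integral to the data in the document
Embedding typically provides better read performance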
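A minimal Python sketch of the embedded model, assuming the normalized rows have already been fetched. The row shapes and the `embed_person` helper are hypothetical: the related address and contact rows are folded into one self-contained document, so a single read returns everything the RDBMS query above needed joins for.

```python
from typing import Any


def embed_person(person: dict[str, Any],
                 addresses: list[dict[str, Any]],
                 contacts: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold one-to-few child rows into a single self-contained document."""
    pid = person["id"]
    return {
        "id": str(pid),
        "firstName": person["firstName"],
        "lastName": person["lastName"],
        # Embedded rows no longer need their foreign key.
        "addresses": [
            {k: v for k, v in a.items() if k != "person_id"}
            for a in addresses if a["person_id"] == pid
        ],
        # Each contact row becomes a {type: detail} object, as on the slide.
        "contactDetails": [
            {c["type"]: c["detail"]}
            for c in contacts if c["person_id"] == pid
        ],
    }


person = {"id": 1, "firstName": "Thomas", "lastName": "Andersen"}
addresses = [{"person_id": 1, "line1": "100 Some Street", "city": "Seattle", "state": "WA"}]
contacts = [
    {"person_id": 1, "type": "email", "detail": "thomas@andersen.com"},
    {"person_id": 1, "type": "phone", "detail": "+1 555 555-5555"},
]
doc = embed_person(person, addresses, contacts)
```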
DATA MODELING WITH
REFERENCING
20
In general, use normalized data models when:
 Write performance is more important than read performance
 Representing one-to-many relationships
 Representing many-to-many relationships
 Related data changes frequently
 Related data changes frequently
Provides more flexibility than
embedding
More round trips to read data
{
"id": "xyz",
"username": "user xyz"
}
{
"id": "address_xyz",
"userid": "xyz",
"address" : {
…
}
}
{
"id": "contact_xyz",
"userid": "xyz",
"email" : "user@user.com",
"phone" : "555 5555"
}
Normalizing typically provides better write performance
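With referencing, the application reassembles the entity itself. In this Python sketch an in-memory dict stands in for the collection, and the hypothetical `CountingStore` counts reads to show the extra round trips. It assumes the `address_<id>` / `contact_<id>` id convention seen in the slide's sample documents; the address contents are made up.

```python
class CountingStore:
    """In-memory stand-in for a collection that counts read round trips."""

    def __init__(self, docs):
        self._docs = {d["id"]: d for d in docs}
        self.reads = 0

    def read(self, doc_id):
        self.reads += 1
        return self._docs[doc_id]


def load_user(store, user_id):
    """Assemble a user from three referenced documents (assumed id convention)."""
    user = dict(store.read(user_id))
    user["address"] = store.read(f"address_{user_id}")["address"]
    contact = store.read(f"contact_{user_id}")
    user["email"] = contact["email"]
    return user


store = CountingStore([
    {"id": "xyz", "username": "user xyz"},
    {"id": "address_xyz", "userid": "xyz", "address": {"city": "Seattle"}},
    {"id": "contact_xyz", "userid": "xyz", "email": "user@user.com", "phone": "555 5555"},
])
user = load_user(store, "xyz")
print(store.reads)  # three reads, versus one for an embedded document
```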
HYBRID MODELS: DENORMALIZE +
REFERENCE
21
No magic bullet!
Think about how your data is
going to be written and read
then model accordingly
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"countOfBooks": 3,
"books": [1, 2, 3],
"images": [
{"thumbnail": "http://....png"},
{"profile": "http://....png"}
]
}
{
"id": 1,
"name": "DocumentDB 101",
"authors": [
{"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
{"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
]
}
Author document
Book document
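The cost of the hybrid model is that duplicated fields must be kept in sync. This Python sketch (a hypothetical helper over in-memory documents) updates an author's thumbnail and then fans the change out to every book that embeds a copy of it, following the references in the author's `books` array. In a real collection, each changed book would be a separate write.

```python
def update_author_thumbnail(author: dict, books_by_id: dict, new_url: str) -> int:
    """Update the author's own thumbnail, then every copy embedded in its books.

    Returns how many embedded copies were changed.
    """
    author["images"] = [
        {"thumbnail": new_url} if "thumbnail" in image else image
        for image in author["images"]
    ]
    # The slide stores the author id as a string ("1") in the author document
    # but as a number (1) inside the book document, so compare as strings.
    author_id = str(author["id"])
    changed = 0
    for book_id in author["books"]:  # references held by the author document
        book = books_by_id.get(book_id)
        if book is None:
            continue  # referenced book not loaded in this example
        for embedded in book["authors"]:
            if str(embedded["id"]) == author_id:
                embedded["thumbnail"] = new_url
                changed += 1
    return changed


author = {"id": "1", "books": [1, 2, 3],
          "images": [{"thumbnail": "old.png"}, {"profile": "p.png"}]}
books = {
    1: {"id": 1, "authors": [{"id": 1, "thumbnail": "old.png"},
                             {"id": 2, "thumbnail": "w.png"}]},
    2: {"id": 2, "authors": [{"id": 1, "thumbnail": "old.png"}]},
}
changed = update_author_thumbnail(author, books, "new.png")
```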
DATA MODELLING TIPS
 Map properties to JSON types
 Prefer smaller documents (<16KB) for smaller footprint, less IO,
lower RU charges
 Maximum size is 512KB – watch unbounded arrays leading to
document bloat
 Store metadata on attachments, reference binary data/free text as
external links
 Prefer sparse properties – skip rather than explicit null
 Use fullName = "Azure DocumentDB" instead of firstName =
"Azure" AND lastName = "DocumentDB"
22
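Two of the tips above, sparse properties and document size, can be checked client-side. This Python sketch is an approximation under stated assumptions: it measures size as compact UTF-8 JSON, which may differ from how the service counts bytes (system properties are added on write). The thresholds are the 16 KB guideline and 512 KB maximum from the slide; the helper names are hypothetical.

```python
import json
from typing import Any

MAX_DOCUMENT_BYTES = 512 * 1024       # maximum size quoted on the slide
PREFERRED_DOCUMENT_BYTES = 16 * 1024  # "prefer smaller documents" guideline


def drop_nulls(value: Any) -> Any:
    """Recursively omit properties whose value is None (sparse properties)."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value


def serialized_size(doc: dict) -> int:
    """Approximate size as compact UTF-8 JSON (assumption, see lead-in)."""
    text = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def check_document(doc: dict) -> str:
    size = serialized_size(doc)
    if size > MAX_DOCUMENT_BYTES:
        return "too large"
    if size > PREFERRED_DOCUMENT_BYTES:
        return "allowed but large"
    return "ok"


doc = drop_nulls({"id": "1", "fullName": "Azure DocumentDB", "middleName": None})
print(doc, serialized_size(doc), check_document(doc))
```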
TUNABLE CONSISTENCY
 Set at the account level
 Can be overridden per request (relaxed to a weaker level only)
 Levels:
 Strong
 Session (default option)
 Bounded Staleness
 Eventual
23
Strong consistency; slow write speeds ↔ Weak consistency; fast write speeds
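The ordering on this slide, from strong (slower writes) to eventual (faster writes), can be captured in a small Python model. The `Consistency` enum and `effective_level` helper are hypothetical. The rule that a per-request override may only weaken the account default reflects DocumentDB's documented behaviour as I understand it, so treat it as an assumption to verify against the current documentation.

```python
from enum import IntEnum
from typing import Optional


class Consistency(IntEnum):
    """The four levels from the slide, ordered from strongest (0) to weakest (3)."""
    STRONG = 0
    BOUNDED_STALENESS = 1
    SESSION = 2
    EVENTUAL = 3


def effective_level(account_default: Consistency,
                    requested: Optional[Consistency] = None) -> Consistency:
    """Level a request would run at, given the account default and an optional override.

    Assumption: an override may only relax (weaken) the account default.
    """
    if requested is None:
        return account_default
    if requested < account_default:
        raise ValueError(
            f"{requested.name} is stronger than the account default {account_default.name}"
        )
    return requested
```

With the default Session level, a request can drop to Eventual but cannot ask for Strong.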
INDEXING
 Automatic indexing of documents and their properties when added
to the collection
 Instantly queryable by property using a SQL-like grammar
 No need to define secondary indices / schema hints for indexing
24
Indexing Modes
 Consistent: default mode; index updated synchronously on writes
 Lazy: useful for bulk ingestion scenarios
Indexing Policies
 Automatic: default
 Manual: can manually opt out of automatic indexing
Indexing Types
 Hash: for equality queries; strings and numbers
 Range: for comparison queries; numbers
INDEXING POLICIES
25
Configuration | Level | Options
Automatic | Per collection | True (default) or False; override with each document write
Indexing Mode | Per collection | Consistent or Lazy; Lazy for eventual updates/bulk ingestion
Included and excluded paths | Per path | Individual path or recursive includes (? and *)
Indexing Type | Per path | Supports Hash (default) and Range; Hash for equality, Range for range queries
Indexing Precision | Per path | Supports 3–7 per path; trade-off between storage, query RUs and write RUs
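The "?" and "*" path forms for included and excluded paths can be modelled with a short Python sketch. This is a simplified model, not the service's implementation: "/prop/?" is treated as matching the scalar value at that path, "/prop/*" as matching everything beneath it, and the most specific (longest) matching pattern is assumed to decide, with an exclusion winning a tie.

```python
def matches(pattern: str, path: str) -> bool:
    """'/a/?' matches the scalar value at /a; '/a/*' matches /a and everything below it."""
    if pattern.endswith("/?"):
        return path == pattern[:-2]
    if pattern.endswith("/*"):
        prefix = pattern[:-2]
        return prefix == "" or path == prefix or path.startswith(prefix + "/")
    return path == pattern


def is_indexed(path: str, included: list[str], excluded: list[str]) -> bool:
    """Assumption: the longest matching pattern decides; exclusions win ties."""
    candidates = [(len(p), True) for p in included if matches(p, path)]
    candidates += [(len(p), False) for p in excluded if matches(p, path)]
    if not candidates:
        return False
    # Sort key: pattern length first, then prefer exclusions on equal length.
    best = max(candidates, key=lambda c: (c[0], not c[1]))
    return best[1]
```

For example, with everything included ("/*") and one large property excluded ("/largeBlob/*"), only paths under /largeBlob are left out of the index.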
DOCUMENTDB FOR DEVELOPERS
 Promotes code-first development
 Resilient to iterative schema changes
 Low impedance mismatch as an object / JSON store; no ORM required
 Richer query and indexing
 Has a REST API
 Available SDKs and libraries:
 .NET (LINQ queries are supported)
 Node.js
 JavaScript
 Python
 Java
 JavaScript for server-side app logic
26
QUERYING LIMITATIONS
 Queries run within a single collection
 Besides filtering, ORDER BY and TOP are supported
 No aggregation yet
 No COUNT
 No GROUP BY
 No SUM, AVG, etc.
SQL for queries only
 No batch UPDATE, DELETE or CREATE
27
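Without COUNT, SUM or GROUP BY, aggregates have to be computed by the client over the returned documents. A minimal Python sketch, assuming `results` holds the documents a query returned (the data and the `group_sum` helper are hypothetical). For large collections this means pulling every matching document, which costs RUs.

```python
from collections import defaultdict


def group_sum(docs, key, field):
    """Client-side stand-in for SELECT key, COUNT(*), SUM(field) ... GROUP BY key."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for doc in docs:
        group = doc.get(key)
        counts[group] += 1
        # Sparse documents may omit the field; treat a missing value as 0.
        totals[group] += doc.get(field, 0)
    return {g: (counts[g], totals[g]) for g in counts}


results = [
    {"id": "1", "category": "brakes", "price": 40.0},
    {"id": "2", "category": "brakes", "price": 60.0},
    {"id": "3", "category": "filters"},
]
print(group_sum(results, "category", "price"))
```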
DEMO TIME!
28
REQUEST UNITS
 DocumentDB unit of scale
 Throughput (in terms of rate of transactions / second)
 Measured in Request Units (RUs)
 1 RU/s = the throughput needed to read one 1 KB document per second
 2,000 requests per second allowed
 The RU charge of a request depends on the size of the document
 For example, uploading a large JSON document consumes more RUs than reading a 1 KB one
 Max throughput per collection, measured in RUs per second per
collection, is 250,000 RUs/second
29
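A back-of-the-envelope Python sketch of RU budgeting. Only the 1 RU read charge comes from the slide; the write and query charges are hypothetical placeholders. Real charges are reported by the service per request and vary with document size, indexing and query complexity.

```python
import math

# Per-operation charges in RUs. Only read_1kb comes from the slide;
# the other values are hypothetical placeholders for illustration.
CHARGES = {"read_1kb": 1.0, "write_1kb": 5.0, "query": 10.0}


def required_ru_per_second(ops_per_second: dict[str, float]) -> float:
    """Sum of (operations per second x RU charge per operation)."""
    return sum(rate * CHARGES[op] for op, rate in ops_per_second.items())


def max_rate(provisioned_ru: float, op: str) -> int:
    """How many operations of one kind fit in a provisioned RU/s budget."""
    return math.floor(provisioned_ru / CHARGES[op])


workload = {"read_1kb": 500, "write_1kb": 100, "query": 20}
print(required_ru_per_second(workload))  # 500*1 + 100*5 + 20*10 = 1200.0
print(max_rate(2000, "write_1kb"))       # 400
```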
REQUEST UNITS
30
Request Unit (RU) is the normalized currency (% Memory, % IOPS, % CPU)
Replica gets a fixed budget of Request Units
[Diagram: resources and resource sets (documents, SQL queries, stored procedures and their arguments) drawing on a replica's RU budget]
Predictable Performance
NOT ALL REQUEST UNITS ARE
CREATED EQUALLY
31
PRICING
 Standard pricing tier with
hourly billing
 99.95% availability
 Adjustable performance
levels
 Each collection has 10 GB of SSD storage
 Limit of 100 collections (1 TB) for each account; can be adjusted
 http://bit.do/documentdb-pricing
32
LIMITATIONS & QUOTA
33
Entity | Quota
Accounts | 5 (soft)
DBs / Account | 100
Document storage per collection | 250 GB
Collections / DB | 100 (soft)
Request document size | 512 KB
Permissions / Account | 2M
Stored Procedures, Triggers & UDFs / collection | 25
Max Execution Time / Stored Procedure or Trigger | 5 seconds
ID Length | 255 chars
AND, OR / query | 20
https://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/
SUMMARY
 Collections != Tables
 De-normalize data where appropriate
 Tuning / Performance
 Consistency Levels
 Indexing Policies
 Understand Query Costs / Limits / Avoid Scans
34
DESIGNING A DOCUMENTDB APP
[Diagram: six numbered design steps]
35
RESOURCES
 Query Playground: aka.ms/docdbplayground
 Data Import Tool: aka.ms/docdbimport
 Docs & Tutorials: aka.ms/documentdb-docs
 Code Samples: aka.ms/documentdb-samples
 Cheat Sheet: aka.ms/docdbcheatsheet
 Blog: aka.ms/documentdb-blog
 Twitter: @documentdb
36
QUESTIONS?
37
@KenCenerelli
Ken_Cenerelli@Outlook.com
Please complete the session evaluation to win
prizes!
CLD101: NoSQL, No Problem: Use Azure DocumentDB
38
Credit:



Editor's Notes

  • #5: Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents. Examples: MongoDB, Azure DocumentDB. They work best with hierarchical documents that are entirely or almost entirely self-contained. Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4J and Giraph. Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as 'integer', which adds functionality. Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
  • #6: In DocumentDB, you persist the whole object in the database, then query the whole object and display it. Service: no need for containers or VMs to run it; you provision the service, which gives fine-grained control. Indexing: the index structure is built right away so you can start immediately; you can shape the data; no need to spend time on ERDs or large schemas to persist data; indexes evolve as models do. A DocumentDB database can grow to hundreds of terabytes or even petabytes, across thousands of nodes in data centers.
  • #8: Table Storage (key-value): Azure Blob Storage can be used to store full user profiles, including images. Azure Tables is a cheap, scalable NoSQL solution, good for key-value patterns stored at scale. It runs on spinning disks and has higher latency toward the 95th percentile, and it has no secondary indexes either.
  • #9: MongoDB protocol support means existing MongoDB drivers can be used. You can leverage existing skills and tools to interact with a DocumentDB service, because it supports the native MongoDB wire protocol.
  • #10: Complex relationships – obvious (denormalized data and flex schema) Rigid schema – for instance if your data was imported from XML docs Complex transactions – supports some but can’t cross boundaries Aggregation – since some properties could be present or not, it’s not a good fit for data aggregation Encrypted data storage – protocol is encrypted, data is not. No current in-built mechanisms for encrypted data storage Moving entire datastore to Azure DocumentDB – use as supplement not replacement Azure specific – although adding protocol support for MongoDB
  • #11: UGC in social media applications is a blend of free-form text, properties, tags and relationships not bounded by rigid structure. Content such as chats, comments, and posts can be stored in DocumentDB without requiring transformations or complex object-to-relational mapping layers. Data properties can be added or modified easily to match requirements as developers iterate over the application code, thus promoting rapid development. Catalog Data - Attributes for this data may vary and can change over time to fit application requirements. Consider an example of a product catalog for an automotive parts supplier. Every part may have its own attributes in addition to the common attributes that all parts share. Furthermore, attributes for a specific part can change the following year when a new model is released. As a JSON document store, DocumentDB supports flexible schemas and allows you to represent data with nested properties, and thus it is well suited for storing product catalog data. Logging - Perform ad-hoc queries over a subset of data for troubleshooting. A subset of data is first retrieved from the logs, typically by time series. Then, a drill-down is performed by filtering the dataset with error levels or error messages. Long-running data analytics jobs are performed offline over a large volume of log data. Examples of this use case include server availability analysis, application error analysis, and clickstream data analysis. Typically, Hadoop is used to perform these types of analyses with data from DocumentDB using the Hadoop connector. Gaming - Handle updating profiles and stats from millions of simultaneous gamers, with millisecond reads and writes to help avoid any lag; automatic indexing allows filtering against multiple different properties in real time. User preferences data - Most modern web and mobile applications come with complex views and experiences. 
These views and experiences are usually dynamic, catering to user preferences or moods and branding needs. Hence, applications need to be able to retrieve personalized settings effectively in order to render UI elements and experiences quickly. IoT - Bursts of data can be ingested by Azure Event Hubs as it offers high throughput data ingestion with low latency. Data ingested that needs to be processed for real time insight can be funneled to Azure Stream Analytics for real time analytics. Data can be loaded into DocumentDB for ad-hoc querying. Once the data is loaded into DocumentDB, the data is ready to be queried. The data in DocumentDB can be used as reference data as part of real time analytics. In addition, data can further be refined and processed by connecting DocumentDB data to HDInsight for Pig, Hive or Map/Reduce jobs. Refined data is then loaded back to DocumentDB for reporting.
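To make the catalog scenario concrete, here is a minimal Python sketch that models two catalog entries as plain dictionaries standing in for JSON documents. The part IDs, field names and the find_by_spec helper are hypothetical; the point is that documents in the same collection can carry different nested attributes, and a filter simply skips documents that lack the property.

```python
# Two hypothetical catalog documents: both share id/name/price/category,
# but each carries its own nested "specs" and optional extra fields.
CATALOG = [
    {
        "id": "part-1001",
        "name": "Brake pad",
        "price": 39.99,
        "category": "brakes",
        "specs": {"material": "ceramic", "modelYears": [2015, 2016]},
    },
    {
        "id": "part-2002",
        "name": "Spark plug",
        "price": 7.49,
        "category": "ignition",
        "specs": {"gapMm": 1.1},
        "warrantyYears": 2,  # present on this part only
    },
]


def find_by_spec(docs, spec_name, value):
    """Return the ids of documents whose nested specs[spec_name] equals value.

    Documents that lack the spec (or lack "specs" entirely) are skipped
    rather than raising an error, much as a schema-agnostic store
    tolerates missing properties.
    """
    return [
        doc["id"]
        for doc in docs
        if doc.get("specs", {}).get(spec_name) == value
    ]


print(find_by_spec(CATALOG, "material", "ceramic"))  # ['part-1001']
```

In DocumentDB's SQL grammar, a roughly equivalent filter would be SELECT c.id FROM c WHERE c.specs.material = 'ceramic'.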
  • #12: The database account provides the keys and the resource location (endpoint) for databases and their contents.
  • #14: A collection is a container of JSON documents and the associated JavaScript application logic (stored procedures, triggers and UDFs). Collections are isolated and independent from one another. Queries run against a single collection and return documents from that collection.
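Because stored procedures, triggers and UDFs live inside a collection alongside its documents, each is addressed by a path under that collection. The sketch below builds ID-based resource links of the form dbs/{db}/colls/{coll}/...; the database and collection IDs in the usage line are hypothetical, and the segment names (docs, sprocs, triggers, udfs) should be checked against the DocumentDB REST reference.

```python
# Path segment for each kind of resource that lives inside a collection.
CHILD_SEGMENTS = {
    "document": "docs",
    "stored_procedure": "sprocs",
    "trigger": "triggers",
    "udf": "udfs",
}


def collection_link(db_id: str, coll_id: str) -> str:
    """ID-based link to a collection, e.g. dbs/mydb/colls/mycoll."""
    return f"dbs/{db_id}/colls/{coll_id}"


def child_link(db_id: str, coll_id: str, kind: str, child_id: str) -> str:
    """ID-based link to a document, stored procedure, trigger or UDF."""
    if kind not in CHILD_SEGMENTS:
        raise ValueError(f"unknown resource kind: {kind!r}")
    return f"{collection_link(db_id, coll_id)}/{CHILD_SEGMENTS[kind]}/{child_id}"


# Hypothetical IDs, for illustration only.
print(child_link("blogdb", "posts", "stored_procedure", "bulkImport"))
# dbs/blogdb/colls/posts/sprocs/bulkImport
```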
  • #15: The read consistency level of documents follows the consistency policy set on the database account.
  • #16: To keep poorly performing code from affecting the server, DocumentDB uses bounded execution, which limits how long server-side code can run.
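A common way to live within bounded execution is for server-side code to stop when it can do no more work in the current call, return how far it got, and let the client call it again from that point. DocumentDB stored procedures are written in JavaScript; the Python below only simulates that continuation loop, with a hypothetical per-call budget (a count of operations) standing in for the server's time limit.

```python
def run_bounded_batch(items, start, budget):
    """Simulate one bounded server-side call.

    Processes items from index `start` until the items run out or the
    per-call budget (a stand-in for the execution time limit) is used up,
    then returns the processed items and the index to resume from.
    """
    processed = []
    pos = start
    while pos < len(items) and len(processed) < budget:
        processed.append(items[pos])
        pos += 1
    return processed, pos


def run_to_completion(items, budget):
    """Client-side loop: keep re-invoking the bounded call until done."""
    if budget < 1:
        raise ValueError("budget must allow at least one operation per call")
    stored, pos, calls = [], 0, 0
    while pos < len(items):
        batch, pos = run_bounded_batch(items, pos, budget)
        stored.extend(batch)
        calls += 1
    return stored, calls


stored, calls = run_to_completion(list(range(10)), budget=4)
print(len(stored), calls)  # 10 3
```

The design choice is that the client, not the server, owns the loop: each server call does a bounded amount of work, and the continuation index makes the overall job resumable.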
  • #19: Normalized (referenced) data requires multiple reads to assemble an entity but typically provides faster write speeds.
  • #20: Embedded (denormalized) data: data from related entities is queried together, so only one round trip to the server is needed for all of it.
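The read/write trade-off between the two preceding notes can be illustrated with a small dictionary-backed stand-in for a collection. This hypothetical sketch stores blog posts either with a reference to a separate author document (normalized) or with the author embedded, and counts reads and writes; it models round trips only, not real DocumentDB latency or request charges.

```python
class CountingStore:
    """Dict-backed stand-in for a collection that counts reads and writes."""

    def __init__(self, docs):
        self.docs = {d["id"]: d for d in docs}
        self.reads = 0
        self.writes = 0

    def read(self, doc_id):
        self.reads += 1
        return self.docs[doc_id]

    def write(self, doc):
        self.writes += 1
        self.docs[doc["id"]] = doc


def load_post_normalized(store, post_id):
    """Normalized: read the post, then follow authorId (two reads)."""
    post = store.read(post_id)
    author = store.read(post["authorId"])
    return {**post, "author": author}


def load_post_embedded(store, post_id):
    """Embedded: the author is already inside the post (one read)."""
    return store.read(post_id)


def rename_author_normalized(store, author_id, new_name):
    """Normalized: only the single author document changes (one write)."""
    store.write({**store.read(author_id), "name": new_name})


def rename_author_embedded(store, author_id, new_name):
    """Embedded: every post holding a copy of the author must be rewritten.

    The scan over store.docs stands in for a query.
    """
    for doc in list(store.docs.values()):
        if doc.get("author", {}).get("id") == author_id:
            store.write({**doc, "author": {**doc["author"], "name": new_name}})


author = {"id": "author-7", "name": "Sample Author"}
norm = CountingStore([
    {"id": "post-1", "title": "Hello", "authorId": "author-7"},
    {"id": "post-2", "title": "Again", "authorId": "author-7"},
    author,
])
emb = CountingStore([
    {"id": "post-1", "title": "Hello", "author": dict(author)},
    {"id": "post-2", "title": "Again", "author": dict(author)},
])
load_post_normalized(norm, "post-1")
load_post_embedded(emb, "post-1")
print(norm.reads, emb.reads)  # 2 1
rename_author_normalized(norm, "author-7", "New Name")
rename_author_embedded(emb, "author-7", "New Name")
print(norm.writes, emb.writes)  # 1 2
```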
  • #21: Cases that call for references rather than embedding:
    One-to-many relationships (unbounded): a blog article's comments array could have millions of entries.
    Many-to-many relationships: for example, one speaker could have multiple sessions and each session could have multiple speakers.
    Related data that changes with differing volatility: the article document does not change, but its view or like counts do.
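For the many-to-many speaker/session case, one common approach is to keep speakers and sessions as separate documents that hold arrays of each other's ids, so a speaker's details are stored once instead of being copied into every session. The field names (sessionIds, speakerIds) and the helpers below are hypothetical, with plain dictionaries standing in for collections.

```python
SPEAKERS = {
    "spk-1": {"id": "spk-1", "name": "Speaker A", "sessionIds": ["ses-1", "ses-2"]},
    "spk-2": {"id": "spk-2", "name": "Speaker B", "sessionIds": ["ses-2"]},
}
SESSIONS = {
    "ses-1": {"id": "ses-1", "title": "Intro to DocumentDB", "speakerIds": ["spk-1"]},
    "ses-2": {"id": "ses-2", "title": "Modeling JSON data", "speakerIds": ["spk-1", "spk-2"]},
}


def speakers_for(session_id, sessions=SESSIONS, speakers=SPEAKERS):
    """Resolve a session's speaker references into speaker names."""
    return [speakers[sid]["name"] for sid in sessions[session_id]["speakerIds"]]


def sessions_for(speaker_id, sessions=SESSIONS, speakers=SPEAKERS):
    """Resolve a speaker's session references into session titles."""
    return [sessions[sid]["title"] for sid in speakers[speaker_id]["sessionIds"]]


def references_consistent(sessions=SESSIONS, speakers=SPEAKERS):
    """Check that every session->speaker link has a matching speaker->session link."""
    return all(
        ses_id in speakers[spk_id]["sessionIds"]
        for ses_id, ses in sessions.items()
        for spk_id in ses["speakerIds"]
    )


print(speakers_for("ses-2"))  # ['Speaker A', 'Speaker B']
```

Keeping both sides of each reference in sync is the application's job in this design, which is why a check like references_consistent can be useful.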
  • #22: Use an Amazon or Best Buy product page as the example. The product document can embed product info that also contains a current review summary, so the summary doesn't have to be retrieved separately each time. When you want the individual ratings or reviews, you use a different query.
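The product-page pattern in this note can be sketched as a product document that embeds a small review summary while the full reviews live in separate documents. In this hypothetical Python sketch (the field names, the 1 to 5 rating scale and the helper functions are all assumptions), adding a review stores the review document and updates the embedded running average, so showing the summary needs only the product read.

```python
def new_product(product_id, name):
    """A product document with an embedded, initially empty review summary."""
    return {
        "id": product_id,
        "name": name,
        "reviewSummary": {"count": 0, "averageRating": None},
    }


def add_review(product, reviews, rating, text):
    """Store the full review separately and refresh the embedded summary.

    `reviews` is a list standing in for separate review documents that are
    queried only when the full reviews are needed.
    """
    if not 1 <= rating <= 5:  # assumed 1-5 rating scale
        raise ValueError("rating must be between 1 and 5")
    reviews.append({
        "id": f"{product['id']}-review-{len(reviews) + 1}",
        "productId": product["id"],
        "rating": rating,
        "text": text,
    })
    summary = product["reviewSummary"]
    count = summary["count"]
    previous_avg = summary["averageRating"] if count else 0.0
    # Incremental mean: new_avg = (old_avg * n + rating) / (n + 1)
    summary["count"] = count + 1
    summary["averageRating"] = (previous_avg * count + rating) / (count + 1)


def reviews_for(reviews, product_id):
    """The separate query: fetch the full reviews for one product."""
    return [r for r in reviews if r["productId"] == product_id]


product = new_product("prod-42", "Headphones")
reviews = []
add_review(product, reviews, 4, "Good")
add_review(product, reviews, 5, "Great")
print(product["reviewSummary"])  # {'count': 2, 'averageRating': 4.5}
```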
  • #24: There is a trade-off between performance and consistency.
    Strong: a write is only visible after it is committed durably by the majority quorum of replicas. Strong consistency provides absolute guarantees on data consistency but offers the lowest level of read and write performance.
    Session: you can read your own writes. A read request under session consistency is issued against a replica that can serve the client-requested version (part of the session cookie). Session consistency provides predictable read consistency within a session while offering the lowest-latency writes. Reads are also low latency because, except in rare cases, a read is served by a single replica.
    Bounded staleness: provides more predictable read consistency behavior while offering the lowest-latency writes. Because reads are acknowledged by a majority quorum, read latency is not the lowest the system offers. This is a stronger guarantee than Session or Eventual.
    Eventual: the weakest form of consistency, in which a client may, over time, get values that are older than ones it has already seen. In the absence of further writes, the replicas within the group eventually converge. A read request is served by any secondary replica. Eventual consistency provides the weakest read consistency but offers the lowest latency for both reads and writes.
  • #27: DocumentDB supports a RESTful protocol. Any DocumentDB operation can be performed from any HTTP client, as long as the request URL points to a valid DocumentDB resource and the headers contain the required authentication information.
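As an illustration of the authentication information those headers carry, here is a minimal Python sketch that builds a master-key authorization token with only the standard library. It assumes the scheme described in the DocumentDB REST documentation: an HMAC-SHA256 signature, keyed with the base64-decoded master key, over the lowercased verb, lowercased resource type, resource link and lowercased date, URL-encoded into type=master&ver=1.0&sig=.... The x-ms-version value is an assumption; verify it and the exact string-to-sign against the current REST reference before relying on this.

```python
import base64
import hashlib
import hmac
import urllib.parse
from email.utils import formatdate

# Assumed REST API version header value; check the REST reference.
API_VERSION = "2015-12-16"


def master_key_auth_token(verb, resource_type, resource_link, date, master_key_b64):
    """Build the URL-encoded master-key Authorization header value."""
    string_to_sign = (
        f"{verb.lower()}\n"
        f"{resource_type.lower()}\n"
        f"{resource_link}\n"
        f"{date.lower()}\n"
        "\n"
    )
    key = base64.b64decode(master_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("utf-8")
    return urllib.parse.quote(f"type=master&ver=1.0&sig={signature}", safe="")


def build_headers(verb, resource_type, resource_link, master_key_b64):
    """Headers for one request; x-ms-date must match the date that was signed."""
    date = formatdate(usegmt=True)  # RFC 1123, e.g. "Tue, 01 Nov 1994 08:12:31 GMT"
    return {
        "Authorization": master_key_auth_token(
            verb, resource_type, resource_link, date, master_key_b64
        ),
        "x-ms-date": date,
        "x-ms-version": API_VERSION,
    }
```

For example, reading a single document would use verb GET, resource type docs and a link such as dbs/{db}/colls/{coll}/docs/{doc}, sent to the account endpoint followed by that link.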
  • #30: Throughput: the number of items passing through a system in a given amount of time.
  • #32: Every request you make returns a response that shows its Request Unit (RU) charge, whether the action succeeds or fails. You can see the charge in Query Explorer. Writes are more expensive than reads.
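To track that charge programmatically, read the x-ms-request-charge header that DocumentDB returns on each REST response. This sketch totals charges per operation type from a log of (operation, headers) pairs; the log format, operation labels and charge values in the usage lines are made up. Headers are plain dicts here, so the lookup is case-sensitive, unlike the case-insensitive header mappings most HTTP clients provide.

```python
from collections import defaultdict


def charge_by_operation(log):
    """Sum request-unit charges per operation label.

    `log` is an iterable of (operation, response_headers) pairs, where
    response_headers is a dict of HTTP response headers. A missing
    x-ms-request-charge header is counted as 0.
    """
    totals = defaultdict(float)
    for operation, headers in log:
        totals[operation] += float(headers.get("x-ms-request-charge", "0"))
    return dict(totals)


# Made-up charges, for illustration only.
log = [
    ("read", {"x-ms-request-charge": "1"}),
    ("write", {"x-ms-request-charge": "5.5"}),
    ("read", {"x-ms-request-charge": "1"}),
]
print(charge_by_operation(log))  # {'read': 2.0, 'write': 5.5}
```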
  • #33:
    Standard: pay as you go, allows partitioning, storage measured in GB, 400 to 250K RU/s of throughput, 250 GB maximum storage (can be increased).
    S1, S2, S3: predefined performance levels with a 10 GB storage maximum and a throughput maximum of 2.5K RU/s.
    You can have a mix of S1, S2 and S3 collections in a database; for example, archive data in an S1 and keep active data in an S3.
    Once you create an account there is a minimum S1 charge ($25) even with no collections created. When you create a collection, you satisfy that charge and no longer pay the separate $25 fee.
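Provisioned throughput and per-request charges combine simply: the number of requests of a given kind that fit in one second is the provisioned RU/s divided by that request's RU charge. The sketch below does that arithmetic; the 5 RU write charge in the usage line is a hypothetical figure, since actual charges depend on factors such as document size, indexing and consistency level. Requests beyond the provisioned rate are throttled, with DocumentDB responding with HTTP 429.

```python
import math


def max_requests_per_second(provisioned_ru_per_s: float, ru_per_request: float) -> int:
    """How many requests with a given RU charge fit in the provisioned RU/s."""
    if ru_per_request <= 0:
        raise ValueError("ru_per_request must be positive")
    return math.floor(provisioned_ru_per_s / ru_per_request)


# 400 RU/s is the Standard minimum from the note above; 5 RU per write is hypothetical.
print(max_requests_per_second(400, 5))  # 80
```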