NoSQL Data Modeling Using JSON Documents –
A Practical Approach
David Segleau
Dir. Technical Product Marketing
Couchbase
©2016 Couchbase Inc. 2
About the speaker – David Segleau
David Segleau
Director, Technical Product Marketing
Couchbase (since Nov 2015)
Experience:
- Database guy
- Couchbase, Oracle, Sleepycat, Informix, Illustra, Teradata
- Tech Marketing, VP Eng, Prod Mgmt, QA, Support, Training, Docs
- Technology is only useful when it’s deployed
- Expertise:
- Database server technology, RDBMS, and NoSQL
Today’s agenda
§ What is Couchbase?
§ Why NoSQL?
§ Identifying the right application
§ Modeling your data
§ Accessing your data
§ Migrating your data
§ Q & A
What is Couchbase?
Couchbase delivers the Data Platform for the Digital Economy
• Products: Couchbase Server & Couchbase Mobile
• Open source NoSQL, JSON document database
• Founded 2010
• 500+ enterprise customers, including 20+ Fortune 100
UNIFIED ADMINISTRATION
UNIFIED PROGRAMMING INTERFACE
Data | Query | Index | Search | Mobile | Replication | Analytics
{N1QL}
Who is using Couchbase?
• 6 of the top 10 e-commerce companies in the US
• 3 of the 3 GDS companies
• 3 of the 10 airlines
• 6 of the top 10 US & European broadcast companies
• 6 of the top 10 online casino gaming companies
• 6 of the top 10 financial services companies in the US
Who is using Couchbase?
§ Gannett, publisher of 90+ media properties, replaced relational database
technology with NoSQL to power its digital publishing platform.
§ eBay, with over 2 billion page views per day, uses Couchbase + RDBMS for
their Listing cache, and Couchbase as database of record for Token
management.
§ Cars.com, with over 30 million visits per month, replaced SQL Server with
NoSQL to store customer and vehicle data.
§ Marriott deployed NoSQL to modernize its hotel reservation system that
supports $38 billion in annual bookings.
§ Equifax uses Couchbase to generate insights from historic credit data,
leveraging the JSON documents to represent complex data objects without
normalization.
What is NoSQL?
§ No SQL?
§ Not only SQL?
✔ Non-relational
§ Distributed (most)
– Scaled out, not up
• Elasticity and commodity hardware
– Partitioned and replicated
• Scalability, performance, availability
§ Schema-less (most)
– Flexible model
– JSON (some)
§ Multi-model
– Key-value & Document
– Columnar & Graph
– Graph & Key-value
Why are they using NoSQL?
Technology Drivers
§ Customers are going online
§ The internet is connecting everything
§ Big Data is getting bigger
§ Applications are moving to the cloud
§ The world has gone mobile
Technical Needs
§ Develop with agility
– Flexibility + Simplicity
– Easier + Faster
§ Operate at any scale
– Elasticity + Availability
– Performance at scale
– Always-on, global deployment
Business Needs
§ Innovate and compete
– Faster time to market
– Reduced costs (operational + hardware)
– Increased revenue
NoSQL vs. RDBMS
§ Replace or Complement? → It depends
– Replace: NoSQL is often the operational
database of record
– Complement: NoSQL adds perf, scale, and
availability to legacy RDBMS
§ Most customers use RDBMS and NoSQL
§ NoSQL is adding RDBMS features
– Security, Query Language, Analytics
§ RDBMS is adding NoSQL features
– Sharding, JSON, Distributed Processing
Why migrate from an RDBMS to NoSQL?
§ Easier to scale
3 nodes to 100s, 1 data center to many, commodity hardware
§ Better performance
Integrated caching, memory-optimized indexes, memory-based replication
§ Up to 40x lower cost
Open source, subscription-based, per instance (not per core)
§ Greater agility
JSON-based data model, SQL-based query language
§ Cross-platform
Runs on Windows or Linux (Red Hat, Ubuntu, Debian, etc.)
How do you get started?
1. Identify the right application
2. Model your data
3. Access your data
4. Migrate your data
5. Q&A
Identifying the right application
The right application has one or more of the following characteristics or requirements:
✔ Innovate and iterate faster
✔ Send and receive JSON
✔ Provide low latency at any throughput
✔ Support many concurrent users
✔ Support users anywhere and everywhere
✔ Be available 24x7
✔ Store terabytes of data
✔ Read and write to multiple data centers
[Diagram: an Application made up of Services, backed by RDBMS and NoSQL]
Examples:
• High performance, high availability caching service
• Independent application with a narrow scope
• Logical or physical service within a large application
• Global service that powers multiple applications
Model your data
Demystifying terminology
Relational | NoSQL (Couchbase)
Failover Cluster | Cluster
Availability Group | Cluster
Database | Bucket
Table | Bucket
Row (Tuple) | Document (JSON)
Primary Key | Object ID
IDENTITY or Sequence | Counter
Indexed View | View
SQL | N1QL
Data Modeling Approaches
Relational: Required Normalization
schema enforced by DB
same fields in all records
• Minimize data inconsistencies (one item = one location)
• Reduced duplicated data
• Preserve storage resources
NoSQL: Relaxed Normalization
schema implied by structure
fields may be empty, duplicate, or missing
• Optimized based on access patterns
• Flexible, based on application requirements
• Supports clustered architecture
• Reduced server overhead
What and Why JSON?
• What is JSON?
– Schema flexibility
– Lightweight data interchange format
– Based on JavaScript
– Programming language independent
– Field names must be unique
• Why JSON?
– Less verbose
– Can represent Objects and Arrays
(including nested documents)
No impedance mismatch between a JSON Document and a Java Object
Modeling your data: Fixed vs. self-describing schema
Modeling your data: The flexibility of JSON
Same document type,
Different fields
• Different types
• Optional
• On demand
Tip: Add a version field to track changes.
{"docType": "user", "docVersion": "1", …}
{"docType": "user", "docVersion": "2", …}
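One way an application might use the version field is to upgrade older documents as it reads them. The sketch below is only an illustration under stated assumptions: the deck does not say what changed between version 1 and version 2, so the fields "name", "firstName" and "lastName" and the class UserDocUpgrader are hypothetical, and a plain Map stands in for a JSON document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: upgrades a "user" document to the newest shape on read.
// The fields "name", "firstName" and "lastName" are illustrative assumptions.
final class UserDocUpgrader {

    static final String CURRENT_VERSION = "2";

    // Returns a copy of the document in the version 2 shape; the input is not modified.
    static Map<String, Object> upgrade(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        // Documents written before the version field existed are treated as version 1.
        String version = String.valueOf(out.getOrDefault("docVersion", "1"));
        if ("1".equals(version)) {
            // Assumed change between versions: a single "name" field was split in two.
            String name = String.valueOf(out.getOrDefault("name", ""));
            int space = name.indexOf(' ');
            out.put("firstName", space < 0 ? name : name.substring(0, space));
            out.put("lastName", space < 0 ? "" : name.substring(space + 1));
            out.remove("name");
            out.put("docVersion", CURRENT_VERSION);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> v1 = new HashMap<>();
        v1.put("docType", "user");
        v1.put("docVersion", "1");
        v1.put("name", "Shane Smith");
        System.out.println(upgrade(v1));
    }
}
```

Because the upgrade happens in application code, version 1 and version 2 documents can coexist in the same bucket while the data is rewritten gradually.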
Modeling your data: Changing the data model
Relational database
• Modify the database schema
• Modify the application code (e.g., Java)
• Modify the interface (e.g., HTML5/JS)
Document database
• Modify the interface (e.g., HTML5/JS)
Modeling your data: Object IDs
Best Practices
• Natural Keys
• Human Readable
• Deterministic
• Semantic
Examples
• author::shane
• author::shane::blogs
• blog::nosql_fueled_hadoop
• blog::nosql_fueled_hadoop::comments
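Keys like the examples above can be built by a small helper so that every part of the application derives the same key from the same input. This is a minimal sketch, not a Couchbase API: the class DocKeys, its method names, and the slug rule (lowercase, non-alphanumeric runs replaced by "_") are assumptions chosen to reproduce the example keys.

```java
import java.util.Locale;

// Hypothetical key builder: deterministic, human-readable keys with "::" separators.
final class DocKeys {
    private static final String SEP = "::";

    // Lowercases the text and replaces each run of non-alphanumerics with one underscore.
    static String slug(String text) {
        return text.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "_")
                .replaceAll("^_+|_+$", "");
    }

    static String author(String name) { return "author" + SEP + slug(name); }

    static String authorBlogs(String name) { return author(name) + SEP + "blogs"; }

    static String blog(String title) { return "blog" + SEP + slug(title); }

    static String blogComments(String title) { return blog(title) + SEP + "comments"; }

    public static void main(String[] args) {
        System.out.println(author("Shane"));                      // author::shane
        System.out.println(authorBlogs("Shane"));                 // author::shane::blogs
        System.out.println(blog("NoSQL Fueled Hadoop"));          // blog::nosql_fueled_hadoop
        System.out.println(blogComments("NoSQL Fueled Hadoop"));  // blog::nosql_fueled_hadoop::comments
    }
}
```

Because the key can be computed from data the application already has, related documents can be fetched by key without running a query first.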
What about identity columns?
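// Atomically increment the counter document and read the new value.
Document<Long> nextAuthorIdDoc = bucket.counter("authorIdCounter", 1);
Long nextAuthorId = nextAuthorIdDoc.content();
// Build the document key from the generated ID.
String authDocId = "author::" + nextAuthorId; // author::101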
Tip: Increment the counter by 10, 20, etc. instead of doing it for every insert.
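The tip can be read as reserving a block of IDs per counter call and handing them out locally. The sketch below illustrates that idea only; AuthorKeyAllocator is a hypothetical class, and an AtomicLong stands in for the counter document so the example runs without a cluster. In a real application the LongUnaryOperator would wrap the counter call instead.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

// Hypothetical allocator: touches the shared counter once per block of IDs.
final class AuthorKeyAllocator {
    private final LongUnaryOperator counter; // takes a delta, returns the counter value after the increment
    private final long blockSize;
    private long next = 1;   // next ID to hand out
    private long limit = 0;  // last ID reserved in the current block

    AuthorKeyAllocator(LongUnaryOperator counter, long blockSize) {
        this.counter = counter;
        this.blockSize = blockSize;
    }

    synchronized String nextKey() {
        if (next > limit) {
            // One increment reserves the IDs (limit - blockSize + 1) through limit.
            limit = counter.applyAsLong(blockSize);
            next = limit - blockSize + 1;
        }
        return "author::" + next++;
    }

    public static void main(String[] args) {
        AtomicLong shared = new AtomicLong(100); // stand-in for the counter document
        AuthorKeyAllocator allocator = new AuthorKeyAllocator(shared::addAndGet, 10);
        for (int i = 0; i < 12; i++) {
            System.out.println(allocator.nextKey());
        }
    }
}
```

The trade-off is gaps: IDs left unused in a block when the process stops are never handed out, so this fits cases where keys must be unique but not consecutive.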
Modeling your data: Relationships
Bottom up / "BelongsTo": Author ← Blog (FK) ← Comment (FK). Each child holds a foreign key to its parent.
Top down / "Has": Author (FK x2) → Blog (FK x2) → Comment. Each parent holds the keys of its children.
Modeling your data: Relationships - Related or Nested
Modeling your data: Strategies and best practices
If … | Then …
Relationship is one-to-one or one-to-many | Store related data as nested objects
Relationship is many-to-one or many-to-many | Store related data as separate documents
Data reads are mostly parent fields | Store children as separate documents
Data reads are mostly parent + child fields | Store children as nested objects
Data writes are mostly parent or child (not both) | Store children as separate documents
Data writes are mostly parent and child (both) | Store children as nested objects
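The read-side rows of the table can be made concrete by counting document fetches for each design. The sketch below is an illustration only: BlogModels and its Store class are hypothetical, a HashMap stands in for a bucket, and the document shapes are simplified assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BlogModels {

    // In-memory stand-in for a bucket: document key -> document body. Counts reads.
    static final class Store {
        private final Map<String, Map<String, Object>> docs = new HashMap<>();
        int reads = 0;

        void put(String key, Map<String, Object> doc) { docs.put(key, doc); }

        Map<String, Object> get(String key) {
            reads++;
            return docs.get(key);
        }
    }

    // Nested design: the comments are stored inside the blog document.
    @SuppressWarnings("unchecked")
    static List<String> commentsNested(Store store, String blogKey) {
        Map<String, Object> blog = store.get(blogKey);
        List<String> texts = new ArrayList<>();
        for (Map<String, Object> c : (List<Map<String, Object>>) blog.get("comments")) {
            texts.add((String) c.get("text"));
        }
        return texts;
    }

    // Referenced design: the blog stores comment keys; each comment is its own document.
    @SuppressWarnings("unchecked")
    static List<String> commentsReferenced(Store store, String blogKey) {
        Map<String, Object> blog = store.get(blogKey);
        List<String> texts = new ArrayList<>();
        for (String key : (List<String>) blog.get("comments")) {
            texts.add((String) store.get(key).get("text"));
        }
        return texts;
    }

    // Builds a document from alternating field names and values.
    static Map<String, Object> doc(Object... kv) {
        Map<String, Object> m = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.put("blog::nested", doc("docType", "blog",
                "comments", Arrays.asList(doc("text", "First"), doc("text", "Second"))));
        store.put("blog::referenced", doc("docType", "blog",
                "comments", Arrays.asList("comment::1", "comment::2")));
        store.put("comment::1", doc("docType", "comment", "text", "First"));
        store.put("comment::2", doc("docType", "comment", "text", "Second"));

        System.out.println(commentsNested(store, "blog::nested") + " reads=" + store.reads);
        store.reads = 0;
        System.out.println(commentsReferenced(store, "blog::referenced") + " reads=" + store.reads);
    }
}
```

Reading parent and children together costs one fetch when they are nested and one fetch per child plus one when they are referenced. On the write side, the referenced design lets a single comment change without rewriting the blog document.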
Modeling your data: Strategies and best practices
§ Are there a lot of concurrent writes or continuous updates?
– Store children as separate documents
Blog
§ Thread
  § Comment
  § Comment
§ Thread
  § Comment
  § Comment
Blog
{
  "docType": "blog",
  "author": "author::shane",
  "title": "Couchbase Wins",
  "threads": [
    "blog::couchbase_wins::threads::001",
    "blog::couchbase_wins::threads::002"
  ]
}
Thread
{
  "docType": "thread",
  "comments": [
    {
      "visitor": "Brendan Bond",
      "text": "This blog is amazing!",
      "replies": [
        {
          "user": "Dustin Johnson",
          "text": "No, it is not."
        }
      ]
    }
  ]
}
Some JSON Design Choices
• Couchbase Server neither enforces nor validates any particular document structure
• Choices that impact JSON document design:
– Single Root Attributes vs. Document type
– Objects vs. Arrays
– Array Element Types
– Timestamp Formats
– Property Names
– Empty and Null Property Values vs. Missing Properties
– JSON Schema Options
• See "Agile document modeling and data structures" from Couchbase Connect16 On-Demand Recordings
Access your data
Accessing your data: Options
[Diagram: access options and what they operate on]
• Key-Value (CRUD)
• N1QL (Query)
• Views (Query)
• Full Text (Search)
• Geospatial (Search)
Operating on: Documents, Indexes, MapReduce
We'll focus on N1QL for now.
Accessing your data – N1QL queries: Capabilities
Feature SQL N1QL
JOIN ✔ ✔
TRANSFORM ✔ ✔
FILTER ✔ ✔
AGGREGATE ✔ ✔
SORT ✔ ✔
SUBQUERIES ✔ ✔
PAGINATION ✔ ✔
OPERATORS ✔ ✔
FUNCTIONS ✔ ✔
Accessing your data: N1QL queries – referenced data
Accessing your data: N1QL queries – nested data
Accessing your data: N1QL queries – CRUD
Accessing your data: N1QL queries – indexes
Simple
Compound
Functional
Partial
Couchbase Index Options
# | Index Type | Description
1 | Primary Index | Index on the document key on the whole bucket
2 | Simple Index | Index on the key-value or document-key
3 | Composite Index | Index on more than one key-value
4 | Functional Index | Index on function or expression on key-values
5 | Partial Index | Index subset of items in the bucket; uses WHERE clause
6 | Array Index | Index individual elements of the arrays
7 | Memory Optimized Index | Index that is pinned in memory; defined when the cluster is configured
8 | Covering Index | Index that can resolve a query 100% within the index
9 | Duplicate Index | Ability to create a copy of the index on specific nodes within the cluster, thereby providing load balancing and failover; uses WITH { "nodes": } clause
Accessing your data: Indexing Considerations
Relational | Couchbase
Indexes are synchronous; index & data are in sync | Indexes are asynchronous; index updates lag behind the data; application specifies read consistency
Indexes slow down write operations | Indexes do not affect write throughput
Index load balancing for queries can only be implemented in the application | Index load balancing for queries is automatic, based on index signature
Indexes contend with other memory usage | Memory Optimized indexes are pinned in memory and provide low latency and high mutation throughput
Understanding your Query Plan: Explain
§ EXPLAIN shows the query plan, i.e., the exact steps N1QL
will take to execute the query
cbq> EXPLAIN INSERT INTO default VALUES ("1", { "make" : "Toyota"});
"plan": {
  "#operator": "Sequence",
  "~children": [
    {
      "#operator": "ValueScan",
      "values": "[[\"1\", {\"make\": \"Toyota\"}]]"
    },
    {
      "#operator": "Parallel",
      "maxParallelism": 1,
      "~child": {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "SendInsert",
Accessing your data: Strategies and best practices
Concept | Strategies & Best Practices
Key-Value Operations provide the best possible performance
• Create an effective key naming strategy
• Create an optimized data model
Incremental MapReduce (Views) are well suited to aggregation
• Ideal for large data sets
• Data set can be used to create complex view indexes
N1QL queries provide the most flexibility – everything else
• Query data regardless of how it is modeled
• Remember to create secondary indexes, leverage covering indexes where possible
Migrate your data
So many options! Remember the KISS principle
1) Identify the requirements
• ETL vs. Data cleanse vs. Data enrichment
• Duration vs. Resources
• Data governance
2) Pick your strategy
• Batch vs. Incremental
• Single threaded vs. multi-threaded
3) Pick your tools
• Data migration tools (Informatica, Looker, Talend)
• BYO-tool (PHP & Python scripts, Hadoop, Spark)
• KISS with Couchbase
  – Export to CSV; Import as documents; Use N1QL to transform & insert into new bucket
  – Use SQL to transform & export; Insert into Couchbase
• Best Practices
  – Align with your data model
  – Plan for failure (bad source data, hardware failure, resource limitations)
  – Ensure interruptible, restartable, logged, predictable
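The best practices above (plan for bad source data; keep the run interruptible, restartable and logged) can be sketched for the CSV route. This is a minimal illustration, not a migration tool: CsvMigration is a hypothetical class, a Map stands in for the target bucket, the CSV split is naive (no quoted fields), and the "id" column and "author" document type are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CsvMigration {

    // Turns one CSV line into a document, or returns null when the row is malformed.
    // Naive split: assumes no quoted fields or embedded commas.
    static Map<String, Object> toDocument(String[] header, String line) {
        String[] cells = line.split(",", -1);
        if (cells.length != header.length) {
            return null;
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("docType", "author"); // assumed document type
        for (int i = 0; i < header.length; i++) {
            doc.put(header[i], cells[i]);
        }
        return doc;
    }

    // Migrates up to batchSize rows starting at checkpoint and returns the new checkpoint.
    static int migrateBatch(String[] header, List<String> rows, int checkpoint, int batchSize,
                            Map<String, Map<String, Object>> target, List<String> rejected) {
        int end = Math.min(rows.size(), checkpoint + batchSize);
        for (int i = checkpoint; i < end; i++) {
            Map<String, Object> doc = toDocument(header, rows.get(i));
            if (doc == null) {
                rejected.add("row " + i + ": " + rows.get(i)); // log bad source data, keep going
                continue;
            }
            // Writing by deterministic key makes a replayed batch overwrite, not duplicate.
            target.put("author::" + doc.get("id"), doc);
        }
        return end;
    }

    public static void main(String[] args) {
        String[] header = {"id", "name"};
        List<String> rows = Arrays.asList("101,Shane", "bad-row", "102,Dana");
        Map<String, Map<String, Object>> target = new LinkedHashMap<>();
        List<String> rejected = new ArrayList<>();

        int checkpoint = 0; // in a real run this would be persisted between batches
        while (checkpoint < rows.size()) {
            checkpoint = migrateBatch(header, rows, checkpoint, 2, target, rejected);
        }
        System.out.println(target.keySet() + " rejected=" + rejected.size());
    }
}
```

Persisting the checkpoint after each batch is what makes the run restartable, and the key-based write keeps a replayed batch from creating duplicates.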
How can you sync NoSQL and relational?
§ 1. Application Code (Manual)
§ 2. Replication (Automatic)
– From NoSQL to relational
– From relational to NoSQL
[Diagram] NoSQL to relational: Couchbase → DCP Stream → Producer → Kafka Queue → Consumer → RDBMS
[Diagram] Relational to NoSQL: RDBMS → GoldenGate → Handler → Couchbase
https://github.com/mahurtado/CouchbaseGoldenGateAdapter
Data Modeling Best Practices Recap
• Pick the right application
• Focus on SOA, application/use case specific
• Drive data model from data access patterns
• Use document type and version ID fields
• Create optimized, understandable keys
• Weigh nested, referenced or mixed designs
• Add indexes: Simple, Compound, Functional, Partial, Array, Covering, Memory Optimized
• Match the data access method to requirements
• N1QL, Key-value, Views
• Proof of Concept
• Focus, Success Criteria, Review Architecture
Questions?
Want to learn more?
Getting Started guide:
http://www.couchbase.com/get-started-developing-nosql
Download Couchbase software:
http://www.couchbase.com/nosql-databases/downloads
Free Online Training
http://training.couchbase.com/online
“Why NoSQL” white paper
http://www.couchbase.com/nosql-resources/why-nosql
Additional Resources
§ General Docs: http://docs.couchbase.com
§ Developer Portal: http://developer.couchbase.com
§ Couchbase Labs: https://github.com/couchbaselabs
§ Query Portal: http://query.couchbase.com
§ Sample Applications:
§ https://github.com/couchbaselabs?utf8=%E2%9C%93&query=try
§ https://github.com/couchbaselabs?utf8=%E2%9C%93&query=beer
§ Blog: http://blog.couchbase.com
§ Forum: http://forums.couchbase.com
Additional Resources – Data Modeling
Webinar: The Why, When, and How of NoSQL: A Practical Approach
Webinar: Relational to NoSQL: How to Get Started from SQL Server
Presentation: Data Modeling with Couchbase Server
Connect16 On Demand Recordings
• Agile document modeling and data structures
• Migrating from relational – Data modeling and access
• LINQing to data: Easing the transition from SQL
• Tuning for Performance: Indexes and Queries
Documentation: Data Modeling with JSON
Training class: CD210 Couchbase NoSQL Data Modeling, Querying, and Tuning Using N1QL
Thank you

More Related Content

What's hot (20)

PDF
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
PDF
Snowflake Company Presentation
AndrewJiang18
 
PPTX
Key-Value NoSQL Database
Heman Hosainpana
 
PDF
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
Databricks
 
PPTX
MySQL Multi Master Replication
Moshe Kaplan
 
PPTX
Dynamodb Presentation
advaitdeo
 
ODP
Nonrelational Databases
Udi Bauman
 
PDF
Session découverte de la Data Virtualization
Denodo
 
PDF
The delta architecture
Prakash Chockalingam
 
PPTX
SQL Azure the database in the cloud
Eduardo Castro
 
PDF
Modeling data and best practices for the Azure Cosmos DB.
Mohammad Asif
 
PPTX
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Nathan Bijnens
 
PDF
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summits
 
PDF
Fast analytics kudu to druid
Worapol Alex Pongpech, PhD
 
PDF
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
confluent
 
PDF
Elastic Observability
FaithWestdorp
 
PPTX
Comparing three data ingestion approaches where Apache Kafka integrates with ...
HostedbyConfluent
 
PDF
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
DataStax Academy
 
PPTX
Liquibase case study
Vivek Dhayalan
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Snowflake Company Presentation
AndrewJiang18
 
Key-Value NoSQL Database
Heman Hosainpana
 
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
Databricks
 
MySQL Multi Master Replication
Moshe Kaplan
 
Dynamodb Presentation
advaitdeo
 
Nonrelational Databases
Udi Bauman
 
Session découverte de la Data Virtualization
Denodo
 
The delta architecture
Prakash Chockalingam
 
SQL Azure the database in the cloud
Eduardo Castro
 
Modeling data and best practices for the Azure Cosmos DB.
Mohammad Asif
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Nathan Bijnens
 
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summits
 
Fast analytics kudu to druid
Worapol Alex Pongpech, PhD
 
Architecture Patterns for Event Streaming (Nick Dearden, Confluent) London 20...
confluent
 
Elastic Observability
FaithWestdorp
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
HostedbyConfluent
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
DataStax Academy
 
Liquibase case study
Vivek Dhayalan
 

Similar to Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach (20)

PDF
The Why, When, and How of NoSQL - A Practical Approach
DATAVERSITY
 
PDF
Slides: Moving from a Relational Model to NoSQL
DATAVERSITY
 
PDF
moving_from_relational_to_nosql_couchbase_2016
Richard (Rick) Nelson
 
PDF
Couchbase overview033113long
Jeff Harris
 
PDF
Couchbase overview033113long
Jeff Harris
 
PDF
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
Big Data Spain
 
PDF
Couchbase Overview Nov 2013
Jeff Harris
 
PPTX
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL
Keshav Murthy
 
PPTX
Couchbase Data Platform | Big Data Demystified
Omid Vahdaty
 
PPTX
Json data modeling june 2017 - pittsburgh tech fest
Matthew Groves
 
PDF
Softshake 2013: Introduction to NoSQL with Couchbase
Tugdual Grall
 
PDF
No sql data-storage for-your-ios-apps-using-couchbase-mobile
Priya Rajagopal
 
PPTX
I Have a NoSQL toaster - DC - August 2017
Matthew Groves
 
PDF
NoSQL - Vital Open Source Ingredient for Modern Success
Arun Gupta
 
PDF
NoSQL, the Vital Open Source Ingredient for Modern Success
All Things Open
 
ODP
Couchbase - Introduction
Knoldus Inc.
 
PPTX
Query in Couchbase. N1QL: SQL for JSON
Keshav Murthy
 
PDF
JSON Data Modeling in Document Database
DATAVERSITY
 
PDF
CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML)
Ortus Solutions, Corp
 
PDF
I have a NoSQL Toaster - ConnectJS - October 2016
Matthew Groves
 
The Why, When, and How of NoSQL - A Practical Approach
DATAVERSITY
 
Slides: Moving from a Relational Model to NoSQL
DATAVERSITY
 
moving_from_relational_to_nosql_couchbase_2016
Richard (Rick) Nelson
 
Couchbase overview033113long
Jeff Harris
 
Couchbase overview033113long
Jeff Harris
 
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
Big Data Spain
 
Couchbase Overview Nov 2013
Jeff Harris
 
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL
Keshav Murthy
 
Couchbase Data Platform | Big Data Demystified
Omid Vahdaty
 
Json data modeling june 2017 - pittsburgh tech fest
Matthew Groves
 
Softshake 2013: Introduction to NoSQL with Couchbase
Tugdual Grall
 
No sql data-storage for-your-ios-apps-using-couchbase-mobile
Priya Rajagopal
 
I Have a NoSQL toaster - DC - August 2017
Matthew Groves
 
NoSQL - Vital Open Source Ingredient for Modern Success
Arun Gupta
 
NoSQL, the Vital Open Source Ingredient for Modern Success
All Things Open
 
Couchbase - Introduction
Knoldus Inc.
 
Query in Couchbase. N1QL: SQL for JSON
Keshav Murthy
 
JSON Data Modeling in Document Database
DATAVERSITY
 
CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML)
Ortus Solutions, Corp
 
I have a NoSQL Toaster - ConnectJS - October 2016
Matthew Groves
 
Ad

More from DATAVERSITY (20)

PDF
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
DATAVERSITY
 
PDF
Data at the Speed of Business with Data Mastering and Governance
DATAVERSITY
 
PDF
Exploring Levels of Data Literacy
DATAVERSITY
 
PDF
Building a Data Strategy – Practical Steps for Aligning with Business Goals
DATAVERSITY
 
PDF
Make Data Work for You
DATAVERSITY
 
PDF
Data Catalogs Are the Answer – What is the Question?
DATAVERSITY
 
PDF
Data Catalogs Are the Answer – What Is the Question?
DATAVERSITY
 
PDF
Data Modeling Fundamentals
DATAVERSITY
 
PDF
Showing ROI for Your Analytic Project
DATAVERSITY
 
PDF
How a Semantic Layer Makes Data Mesh Work at Scale
DATAVERSITY
 
PDF
Is Enterprise Data Literacy Possible?
DATAVERSITY
 
PDF
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
DATAVERSITY
 
PDF
Emerging Trends in Data Architecture – What’s the Next Big Thing?
DATAVERSITY
 
PDF
Data Governance Trends - A Look Backwards and Forwards
DATAVERSITY
 
PDF
Data Governance Trends and Best Practices To Implement Today
DATAVERSITY
 
PDF
2023 Trends in Enterprise Analytics
DATAVERSITY
 
PDF
Data Strategy Best Practices
DATAVERSITY
 
PDF
Who Should Own Data Governance – IT or Business?
DATAVERSITY
 
PDF
Data Management Best Practices
DATAVERSITY
 
PDF
MLOps – Applying DevOps to Competitive Advantage
DATAVERSITY
 
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
DATAVERSITY
 
Exploring Levels of Data Literacy
DATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
DATAVERSITY
 
Make Data Work for You
DATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
DATAVERSITY
 
Data Modeling Fundamentals
DATAVERSITY
 
Showing ROI for Your Analytic Project
DATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
DATAVERSITY
 
Is Enterprise Data Literacy Possible?
DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
DATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
DATAVERSITY
 
2023 Trends in Enterprise Analytics
DATAVERSITY
 
Data Strategy Best Practices
DATAVERSITY
 
Who Should Own Data Governance – IT or Business?
DATAVERSITY
 
Data Management Best Practices
DATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
DATAVERSITY
 
Ad

Recently uploaded (20)

PPTX
Smarter Governance with AI: What Every Board Needs to Know
OnBoard
 
PPTX
Curietech AI in action - Accelerate MuleSoft development
shyamraj55
 
PDF
Why aren't you using FME Flow's CPU Time?
Safe Software
 
PDF
Darley - FIRST Copenhagen Lightning Talk (2025-06-26) Epochalypse 2038 - Time...
treyka
 
PDF
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
 
PDF
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
PPTX
Practical Applications of AI in Local Government
OnBoard
 
PDF
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
Edge AI and Vision Alliance
 
PPTX
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
PDF
Hyderabad MuleSoft In-Person Meetup (June 21, 2025) Slides
Ravi Tamada
 
PDF
FME as an Orchestration Tool with Principles From Data Gravity
Safe Software
 
PPTX
01_Approach Cyber- DORA Incident Management.pptx
FinTech Belgium
 
PPTX
reInforce 2025 Lightning Talk - Scott Francis.pptx
ScottFrancis51
 
PPTX
𝙳𝚘𝚠𝚗𝚕𝚘𝚊𝚍—Wondershare Filmora Crack 14.0.7 + Key Download 2025
sebastian aliya
 
PDF
2025_06_18 - OpenMetadata Community Meeting.pdf
OpenMetadata
 
PDF
Automating the Geo-Referencing of Historic Aerial Photography in Flanders
Safe Software
 
PDF
5 Things to Consider When Deploying AI in Your Enterprise
Safe Software
 
PDF
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
PDF
ArcGIS Utility Network Migration - The Hunter Water Story
Safe Software
 
PDF
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
 
Smarter Governance with AI: What Every Board Needs to Know
OnBoard
 
Curietech AI in action - Accelerate MuleSoft development
shyamraj55
 
Why aren't you using FME Flow's CPU Time?
Safe Software
 
Darley - FIRST Copenhagen Lightning Talk (2025-06-26) Epochalypse 2038 - Time...
treyka
 
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
 
The Future of Product Management in AI ERA.pdf
Alyona Owens
 
Practical Applications of AI in Local Government
OnBoard
 
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
Edge AI and Vision Alliance
 
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
Hyderabad MuleSoft In-Person Meetup (June 21, 2025) Slides
Ravi Tamada
 
FME as an Orchestration Tool with Principles From Data Gravity
Safe Software
 
01_Approach Cyber- DORA Incident Management.pptx
FinTech Belgium
 
reInforce 2025 Lightning Talk - Scott Francis.pptx
ScottFrancis51
 
𝙳𝚘𝚠𝚗𝚕𝚘𝚊𝚍—Wondershare Filmora Crack 14.0.7 + Key Download 2025
sebastian aliya
 
2025_06_18 - OpenMetadata Community Meeting.pdf
OpenMetadata
 
Automating the Geo-Referencing of Historic Aerial Photography in Flanders
Safe Software
 
5 Things to Consider When Deploying AI in Your Enterprise
Safe Software
 
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
ArcGIS Utility Network Migration - The Hunter Water Story
Safe Software
 
Redefining Work in the Age of AI - What to expect? How to prepare? Why it mat...
Malinda Kapuruge
 

Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach

  • 1. NoSQL Data Modeling Using JSON Documents – A Practical Approach David Segleau Dir.Technical Product Marketing Couchbase
  • 2. ©2016 Couchbase Inc. 2 About the speaker – David Segleau David Segleau DirectorTechnical Product Marketing Couchbase (since Nov 2015) Experience: - Database guy - Couchbase, Oracle, Sleepycat, Informix, Illustra,Teradata - Tech Marketing,VP Eng, Prod Mgmt, QA, Support,Training, Docs - Technology is only useful when it’s deployed - Expertise: - Database server technology, RDBMS, and NoSQL
  • 3. ©2016 Couchbase Inc. 3 Today’s agenda § What is Couchbase? § Why NoSQL? § Identifying the right application § Modeling your data § Accessing your data § Migrating your data § Q & A
  • 4. ©2016 Couchbase Inc. 4 What is Couchbase? Couchbase delivers the Data Platform for the Digital Economy • Products: Couchbase Server & Couchbase Mobile • Open source NoSQL, JSON document database • Founded 2010 • 500+ enterprise customers, including 20+ Fortune 100 UNIFIED ADMINISTRATION UNIFIED PROGRAMMING INTERFACE Data Query Index SearchMobileReplication Analytics {N1QL}
  • 5. ©2016 Couchbase Inc. 5 Who is using Couchbase? 6 of the top 10 ECOMMERCE COMPANIES IN THE US 3 of the 3 GDS COMPANIES 3 of the 10 AIRLINES 6 of the top 10 US & EUROPEAN BROADCAST COMPANIES 6 of the top 10 ONLINE CASINO GAMING COMPANIES 6 of the top 10 FIN SERVICES COMPANIES IN THE US
  • 6. ©2016 Couchbase Inc. 6 Who is using Couchbase? § Gannett, publisher of 90+ media properties, replaced relational database technology with NoSQL to power its digital publishing platform. § eBay, with over 2 billion page views per day, uses Couchbase + RDBMS for their Listing cache, and Couchbase as database of record forToken management. § Cars.com, with over 30 million visits per month, replaced SQL Server with NoSQL to store customer and vehicle data. § Marriott deployed NoSQL to modernize its hotel reservation system that supports $38 billion in annual bookings. § Equifax uses Couchbase to generate insights from historic credit data, leveraging the JSON documents to represent complex data objects without normalization.
  • 7. ©2016 Couchbase Inc. 7 What is NoSQL? § No SQL? § Not only SQL? üNon relational § Distributed (most) – Scaled out, not up • Elasticity and commodity hardware – Partitioned and replicated • Scalability, performance, availability § Schema-less (most) – Flexible model – JSON (some) § Multi-model – Key-value & Document – Columnar & Graph – Graph & Key-value
  • 8. ©2016 Couchbase Inc. 8 Why are they using NoSQL? Technology Drivers § Customers are going online § The internet is connecting everything § Big Data is getting bigger § Applications are moving to the cloud § The world has gone mobile Technical Needs § Develop with agility – Flexibility + Simplicity – Easier + Faster § Operate at any scale – Elasticity + Availability – Performance at scale – Always-on, global deployment Business Needs § Innovate and compete – Faster time to market – Reduced costs (operational + hardware) – Increased revenue
  • 9. ©2016 Couchbase Inc. 9 NoSQL vs. RDBMS § Replace or Complement? à It depends – Replace: NoSQL is often the operational database of record – Complement: NoSQL adds perf, scale, and availability to legacy RDBMS § Most customers use RDBMS and NoSQL § NoSQL is adding RDBMS features – Security, Query Language, Analytics § RDBMS is adding NoSQL features – Sharding, JSON, Distributed Processing
  • 10. ©2016 Couchbase Inc. 10 Why migrate from an RDBMS to NoSQL? § Easier to scale 3 nodes to 100s, 1 data center to many, commodity hardware § Better performance Integrated caching, memory-optimized indexes, memory-based replication § Up to 40x lower cost Open source, subscription-based, per instance (not per core) § Greater agility JSON-based data model, SQL-based query language § Cross-platform Runs onWindows or Linux (Red Hat, Ubuntu, Debian, etc.)
  • 11. ©2016 Couchbase Inc. 11 How do you get started? 1. Identify the right application 2. Model your data 3. Access your data 4. Migrate your data 5. Q&A
  • 12. ©2016 Couchbase Inc. 12 Identifying the right application
  • 13. ©2016 Couchbase Inc. 13 Identifying the right application Have one or more of the following characteristics or requirements: ü Innovate and iterate faster ü Send and receive JSON ü Provide low latency at any throughput ü Support many concurrent users ü Supports users anywhere and everywhere ü Be available 24x7 ü Store terabytes of data ü Read and write to multiple data centers Service RDBMS Service Service NoSQL Application Examples: Ø High performance, high availability caching service Ø Independent application with a narrow scope Ø Logical or physical service within a large application Ø Global service that powers multiple applications
  • 14. ©2016 Couchbase Inc. 14 Model your data
  • 15. ©2016 Couchbase Inc. 15 Demystifying terminology Relational NoSQL (Couchbase) Failover Cluster Cluster Availability Group Cluster Database Bucket Table Bucket Row (Tuple) Document (JSON) Primary Key Object ID IDENTITY or Sequence Counter IndexedView View SQL N1QL
  • 16. ©2016 Couchbase Inc. 16 Data Modeling Approaches NoSQL Relaxed Normalization schema implied by structure fields may be empty, duplicate, or missing Relational Required Normalization schema enforced by DB same fields in all records • Minimize data inconsistencies (one item = one location) • Reduced duplicated data • Preserve storage resources • Optimized based on access patterns • Flexible, based on application requirements • Supports clustered architecture • Reduced server overhead
  • 17. What and Why JSON?
    • What is JSON?
      – Schema flexibility
      – Lightweight data interchange format
      – Based on JavaScript
      – Programming language independent
      – Field names must be unique
    • Why JSON?
      – Less verbose
      – Can represent objects and arrays (including nested documents)
    No impedance mismatch between a JSON document and a Java object
  • 18. Modeling your data: Fixed vs. self-describing schema
  • 19. Modeling your data: The flexibility of JSON
    Same document type, different fields
    • Different types
    • Optional
    • On demand
    Tip: Add a version field to track changes.
    {"docType": "user", "docVersion": "1", …}
    {"docType": "user", "docVersion": "2", …}
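The version tip can be put to work when documents are read back. The following is a minimal sketch, assuming documents are handled as plain maps and that version 2 of a user document split a single `name` field into `firstName` and `lastName`. Only `docType` and `docVersion` come from the slide; the class name, the other field names, and the schema change itself are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical upgrade-on-read helper. Only docType and docVersion come from
// the slide; the name/firstName/lastName fields are assumed for illustration.
class UserDocUpgrader {

    // Returns a copy of the document brought up to version "2".
    static Map<String, Object> upgrade(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        if ("1".equals(out.get("docVersion"))) {
            // Assumed schema change: version 2 splits "name" into two fields.
            Object name = out.remove("name");
            String[] parts = name == null
                    ? new String[0]
                    : name.toString().trim().split("\\s+", 2);
            out.put("firstName", parts.length > 0 ? parts[0] : "");
            out.put("lastName", parts.length > 1 ? parts[1] : "");
            out.put("docVersion", "2");
        }
        return out;
    }
}
```

Documents already at version 2 pass through unchanged, so old and new versions can live side by side in the same bucket while the application reads both.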
  • 20. Modeling your data: Changing the data model
    Relational database
    • Modify the database schema
    • Modify the application code (e.g., Java)
    • Modify the interface (e.g., HTML5/JS)
    Document database
    • Modify the interface (e.g., HTML5/JS)
  • 21. Modeling your data: Object IDs
    Best practices
    • Natural keys
    • Human readable
    • Deterministic
    • Semantic
    Examples
    • author::shane
    • author::shane::blogs
    • blog::nosql_fueled_hadoop
    • blog::nosql_fueled_hadoop::comments
    What about identity columns?
    Document<Long> nextAuthorIdDoc = bucket.counter("authorIdCounter", 1);
    Long nextAuthorId = nextAuthorIdDoc.content();
    String authDocId = "author::" + nextAuthorId; // author::101
    Tip: Increment the counter by 10, 20, etc. to reserve a block of IDs, instead of incrementing it for every insert.
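The key convention and the block-increment tip can be sketched together. This is an illustration only, assuming an in-memory `AtomicLong` stands in for the counter document so the example runs without a cluster. `KeyNames` and `BlockIdAllocator` are hypothetical names, not part of any Couchbase SDK.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: joins key parts with the "::" delimiter used on the slide.
class KeyNames {
    static String key(String... parts) {
        return String.join("::", parts);
    }
}

// Hypothetical allocator: reserves IDs in blocks so the shared counter is
// incremented once per block instead of once per insert.
class BlockIdAllocator {
    private final AtomicLong sharedCounter; // stand-in for the counter document
    private final long blockSize;
    private long next = 1;   // next ID to hand out
    private long limit = 0;  // last ID of the reserved block; 0 forces a first reservation

    BlockIdAllocator(AtomicLong sharedCounter, long blockSize) {
        this.sharedCounter = sharedCounter;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next > limit) {
            // One round trip reserves blockSize IDs, analogous to
            // bucket.counter("authorIdCounter", blockSize) on the slide.
            limit = sharedCounter.addAndGet(blockSize);
            next = limit - blockSize + 1;
        }
        return next++;
    }
}
```

With the counter at 100 and a block size of 10, the first call reserves 101 through 110 and returns 101, giving the key `author::101`. The trade-off is that IDs left unused in a block are lost if the process stops, so the sequence may have gaps.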
  • 22. Modeling your data: Relationships
    [Diagram: Author, Blog, and Comment documents linked in two ways]
    • Bottom up / "BelongsTo": Author; Blog (FK), Blog (FK); Comment (FK), Comment (FK)
    • Top down / "Has": Author (FK x2); Blog, Blog (FK x2); Comment, Comment
  • 23. Modeling your data: Relationships - Related or Nested
  • 24. Modeling your data: Strategies and best practices
    If … | Then …
    Relationship is one-to-one or one-to-many | Store related data as nested objects
    Relationship is many-to-one or many-to-many | Store related data as separate documents
    Data reads are mostly parent fields | Store children as separate documents
    Data reads are mostly parent + child fields | Store children as nested objects
    Data writes are mostly parent or child (not both) | Store children as separate documents
    Data writes are mostly parent and child (both) | Store children as nested objects
  • 25. Modeling your data: Strategies and best practices
    § Are there a lot of concurrent writes, continuous updates?
    § Store children as separate documents
    Blog
    § Thread
      § Comment
      § Comment
    § Thread
      § Comment
      § Comment
    Blog
    {
      "docType": "blog",
      "author": "author::shane",
      "title": "Couchbase Wins",
      "threads": [
        "blog::couchbase_wins::threads::001",
        "blog::couchbase_wins::threads::002"
      ]
    }
    Thread
    {
      "docType": "thread",
      "comments": [
        {
          "visitor": "Brendan Bond",
          "text": "This blog is amazing!",
          "replies": [
            { "user": "Dustin Johnson", "text": "No, it is not." }
          ]
        }
      ]
    }
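The referenced design on this slide can be sketched in application code. This is a minimal illustration, assuming an in-memory map stands in for the bucket's key-value store. `BlogStore` and its methods are hypothetical, and the document shapes follow the Blog and Thread examples on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a bucket: document key -> document body.
class BlogStore {
    final Map<String, Map<String, Object>> bucket = new HashMap<>();

    // The blog stores only the keys of its threads, not the threads themselves.
    void putBlog(String key, String author, String title, String... threadKeys) {
        Map<String, Object> blog = new HashMap<>();
        blog.put("docType", "blog");
        blog.put("author", author);
        blog.put("title", title);
        blog.put("threads", new ArrayList<String>(Arrays.asList(threadKeys)));
        bucket.put(key, blog);
    }

    void putThread(String key) {
        Map<String, Object> thread = new HashMap<>();
        thread.put("docType", "thread");
        thread.put("comments", new ArrayList<Map<String, Object>>());
        bucket.put(key, thread);
    }

    // Adding a comment rewrites only the thread document; the blog is untouched.
    @SuppressWarnings("unchecked")
    void addComment(String threadKey, String visitor, String text) {
        Map<String, Object> comment = new HashMap<>();
        comment.put("visitor", visitor);
        comment.put("text", text);
        List<Map<String, Object>> comments =
                (List<Map<String, Object>>) bucket.get(threadKey).get("comments");
        comments.add(comment);
    }

    // Follows the key references stored in the blog to load its threads.
    @SuppressWarnings("unchecked")
    List<Map<String, Object>> threadsOf(String blogKey) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (String threadKey : (List<String>) bucket.get(blogKey).get("threads")) {
            result.add(bucket.get(threadKey));
        }
        return result;
    }
}
```

Because each thread is its own document, concurrent commenters on different threads never write to the same document, which is the reason the slide gives for separating children under heavy write load. The cost is one extra lookup per thread when reading the whole blog.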
  • 26. Some JSON Design Choices
    • Couchbase Server neither enforces nor validates any particular document structure
    • Choices that impact JSON document design:
      – Single root attributes vs. document type
      – Objects vs. arrays
      – Array element types
      – Timestamp formats
      – Property names
      – Empty and null property values vs. missing properties
      – JSON Schema options
    • See "Agile document modeling and data structures" from Couchbase Connect16 On-Demand Recordings
  • 27. Access your data
  • 28. Accessing your data: Options
    • Key-Value (CRUD): Documents
    • N1QL (Query): Indexes
    • Views (Query): MapReduce
    • Full Text (Search): Indexes
    • Geospatial (Search): MapReduce
    We'll focus on N1QL for now.
  • 29. Accessing your data – N1QL queries: Capabilities
    Feature | SQL | N1QL
    JOIN | ✔ | ✔
    TRANSFORM | ✔ | ✔
    FILTER | ✔ | ✔
    AGGREGATE | ✔ | ✔
    SORT | ✔ | ✔
    SUBQUERIES | ✔ | ✔
    PAGINATION | ✔ | ✔
    OPERATORS | ✔ | ✔
    FUNCTIONS | ✔ | ✔
  • 30. Accessing your data: N1QL queries – referenced data
  • 31. Accessing your data: N1QL queries – nested data
  • 32. Accessing your data: N1QL queries – CRUD
  • 33. Accessing your data: N1QL queries – indexes
    Simple, Compound, Functional, Partial
  • 34. Couchbase Index Options
    # | Index Type | Description
    1 | Primary Index | Index on the document key on the whole bucket
    2 | Simple Index | Index on the key-value or document-key
    3 | Composite Index | Index on more than one key-value
    4 | Functional Index | Index on function or expression on key-values
    5 | Partial Index | Index subset of items in the bucket (uses WHERE clause)
    6 | Array Index | Index individual elements of the arrays
    7 | Memory Optimized Index | Index that is pinned in memory (defined when the cluster is configured)
    8 | Covering Index | Index that lets a query be resolved 100% within the index
    9 | Duplicate Index | Ability to create a copy of the index on specific nodes within the cluster, thereby providing load balancing and failover (uses WITH { "nodes": } clause)
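Several of the index types above differ only in the shape of the `CREATE INDEX` statement. The sketch below assembles such statements as strings, the way an application might before sending them to the query service. The `IndexDdl` helper, the bucket name `content`, and the index and field names are all assumptions for illustration; only the statement forms reflect N1QL syntax.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles N1QL CREATE INDEX statements as strings.
class IndexDdl {

    // keys are index key expressions; where may be null for a non-partial index.
    static String createIndex(String name, String bucket, List<String> keys, String where) {
        StringBuilder sb = new StringBuilder();
        sb.append("CREATE INDEX ").append(name)
          .append(" ON `").append(bucket).append("`(")
          .append(String.join(", ", keys)).append(")");
        if (where != null) {
            sb.append(" WHERE ").append(where); // a WHERE clause makes the index partial
        }
        return sb.toString();
    }

    // One statement per index type; bucket and field names are illustrative.
    static List<String> examples() {
        return Arrays.asList(
            "CREATE PRIMARY INDEX ON `content`",                                        // primary
            createIndex("idx_author", "content", Arrays.asList("author"), null),        // simple
            createIndex("idx_type_author", "content",
                    Arrays.asList("docType", "author"), null),                          // composite
            createIndex("idx_title_lower", "content",
                    Arrays.asList("LOWER(title)"), null),                               // functional
            createIndex("idx_blog_title", "content",
                    Arrays.asList("title"), "docType = \"blog\""),                      // partial
            createIndex("idx_threads", "content",
                    Arrays.asList("DISTINCT ARRAY t FOR t IN threads END"), null));     // array
    }
}
```

A query such as `SELECT title FROM content WHERE docType = "blog" AND title LIKE "Couch%"` could in principle be covered by the partial index above, since every field it references appears in the index definition.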
  • 35. Accessing your data: Indexing Considerations
    Relational | Couchbase
    Indexes are synchronous; index and data are in sync | Indexes are asynchronous; index updates lag behind the data; application specifies read consistency
    Indexes slow down write operations | Indexes do not affect write throughput
    Index load balancing for queries can only be implemented in the application | Index load balancing for queries is automatic, based on index signature
    Indexes contend with other memory usage | Memory Optimized indexes are pinned in memory and provide low latency and high mutation throughput
  • 36. Understanding your Query Plan: Explain
    § EXPLAIN shows the query plan, i.e., the exact steps N1QL plans to take to execute the query
    cbq> EXPLAIN INSERT INTO default VALUES ("1", { "make" : "Toyota"});
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "ValueScan",
          "values": "[["1", {""make"": "Toyota"}]]"
        },
        {
          "#operator": "Parallel",
          "maxParallelism": 1,
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "SendInsert",
  • 37. Accessing your data: Strategies and best practices
    Key-Value operations provide the best possible performance
    • Create an effective key naming strategy
    • Create an optimized data model
    Incremental MapReduce (Views) is well suited to aggregation
    • Ideal for large data sets
    • Data set can be used to create complex view indexes
    N1QL queries provide the most flexibility – everything else
    • Query data regardless of how it is modeled
    • Remember to create secondary indexes, leverage covering indexes where possible
  • 38. Migrate your data
  • 39. So many options! Remember the KISS principle
    1) Identify the requirements
       • ETL vs. data cleanse vs. data enrichment
       • Duration vs. resources
       • Data governance
    2) Pick your strategy
       • Batch vs. incremental
       • Single-threaded vs. multi-threaded
    3) Pick your tools
       • Data migration tools (Informatica, Looker, Talend)
       • BYO-tool (PHP & Python scripts, Hadoop, Spark)
       • KISS with Couchbase
         – Export to CSV; import as documents; use N1QL to transform & insert into new bucket
         – Use SQL to transform & export; insert into Couchbase
    Best practices
       • Align with your data model
       • Plan for failure (bad source data, hardware failure, resource limitations)
       • Ensure the migration is interruptible, restartable, logged, and predictable
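The "export to CSV, import as documents" route can be sketched as a small conversion step. This is an illustration under stated assumptions: the CSV has a header row and no quoted commas, keys follow the `docType::id` convention from the Object IDs slide, and bad rows are collected rather than aborting the run. `CsvToDocuments` is a hypothetical name and is not a Couchbase import tool.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical converter: simple CSV lines (header first, no quoted commas)
// become key -> document pairs ready to be written to a bucket.
class CsvToDocuments {

    static Map<String, Map<String, String>> convert(
            List<String> lines, String docType, String idColumn, List<String> rejected) {
        Map<String, Map<String, String>> docs = new LinkedHashMap<>();
        if (lines.isEmpty()) {
            return docs;
        }
        String[] header = lines.get(0).split(",", -1);
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            String[] cells = line.split(",", -1);
            if (cells.length != header.length) {
                rejected.add(line); // plan for bad source data: log it and keep going
                continue;
            }
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("docType", docType);
            for (int c = 0; c < header.length; c++) {
                doc.put(header[c].trim(), cells[c].trim());
            }
            String id = doc.get(idColumn);
            if (id == null || id.isEmpty()) {
                rejected.add(line); // no usable key for this row
                continue;
            }
            docs.put(docType + "::" + id, doc); // deterministic, human-readable key
        }
        return docs;
    }
}
```

Because the key is derived from the row itself, rerunning the conversion after an interruption produces the same keys, so a restarted import overwrites documents instead of duplicating them.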
  • 40. How can you sync NoSQL and relational?
    § 1. Application code (manual)
    § 2. Replication (automatic)
      – From NoSQL to relational [Diagram components: Couchbase, DCP Stream, Producer, Kafka Queue, Consumer, RDBMS]
      – From relational to NoSQL [Diagram components: RDBMS, GoldenGate, Handler, Couchbase]
    https://github.com/mahurtado/CouchbaseGoldenGateAdapter
  • 41. Data Modeling Best Practices Recap
    • Pick the right application
    • Focus on SOA, application/use case specific
    • Drive data model from data access patterns
    • Use document type, version ID
    • Create optimized, understandable keys
    • Weigh nested, referenced, or mixed designs
    • Add indexes: Simple, Compound, Functional, Partial, Array, Covering, Memory Optimized
    • Match the data access method to requirements
    • N1QL, Key-value, Views
    • Proof of concept
    • Focus, success criteria, review architecture
  • 42. Questions?
  • 43. Want to learn more?
    Getting Started guide: http://www.couchbase.com/get-started-developing-nosql
    Download Couchbase software: http://www.couchbase.com/nosql-databases/downloads
    Free online training: http://training.couchbase.com/online
    "Why NoSQL" white paper: http://www.couchbase.com/nosql-resources/why-nosql
  • 44. Additional Resources
    § General Docs: http://docs.couchbase.com
    § Developer Portal: http://developer.couchbase.com
    § Couchbase Labs: https://github.com/couchbaselabs
    § Query Portal: http://query.couchbase.com
    § Sample Applications:
      § https://github.com/couchbaselabs?utf8=%E2%9C%93&query=try
      § https://github.com/couchbaselabs?utf8=%E2%9C%93&query=beer
    § Blog: http://blog.couchbase.com
    § Forum: http://forums.couchbase.com
  • 45. Additional Resources – Data Modeling
    Webinar: The Why, When, and How of NoSQL: A Practical Approach
    Webinar: Relational to NoSQL: How to Get Started from SQL Server
    Presentation: Data Modeling with Couchbase Server
    Connect16 On Demand Recordings
    • Agile document modeling and data structures
    • Migrating from relational – Data modeling and access
    • LINQing to data: Easing the transition from SQL
    • Tuning for Performance: Indexes and Queries
    Documentation: Data Modeling with JSON
    Training class: CD210 Couchbase NoSQL Data Modeling, Querying, and Tuning Using N1QL
  • 46. Thank you