Open source, high performance database




Data Distribution Theory

Will LaForest
Senior Director of 10gen Federal
will@10gen.com
@WLaForest




                                         1
• I will be talking about distributing data across
  machines connected by a reliable network
• I will NOT be talking about on-disk arrangements
  (well maybe a little)
• I will NOT be talking about replication
   – This has some overlaps but in most respects can be
     considered orthogonal


• There is a ton of implementation minutiae from
  technology to technology that I will try to avoid

                                                          2
• Need to scale for more
• Cost effective to scale horizontally by distributing
• Fundamentally limited by some resource
   –   Memory
   –   IO Capacity
   –   Disk
   –   CPU
• Lots of systems need to distribute
   –   Web servers/app servers
   –   File systems
   –   Databases
   –   Caches
                                                         3
• It's always been this way
   – From time to time people forget
        • Stateful MTS Objects
        • Stateful EJB

• Concurrent access of data not as simple
   – We will set aside fencing/locking for another day
• RDBMS not built for distributed computing
   –   Not surprising since the theory is from 40 years ago
   –   Model works because joins are fast
   –   BUT generically efficient distributed joins are difficult
   –   Ditto for distributed transactions

                                                               4
• Given a data record, what node do I store it on?
• Round robin / “random”
   –   Evenly distribute across a set of servers
   –   Doesn’t take into account rebalancing
   –   Expiring a lot of data? Not too bad (MVCC, expiring cache)
   –   MarkLogic
• Hash based (not talking the pipe)
   – Many search engines & caches
   – Amazon Dynamo (Cassandra*)
• Range based
   – BigTable (HBase)
   – MongoDB
                                                                    5
• Distribute on the hash of some attribute
• Simple way is hash(att) mod N
   – What happens when N changes (we add a node)?
• The industry standard is consistent hashing
• Pros
   – Evenly distributes across nodes
   – Avoid hot spots
   – Great for high write throughput
• Cons
   – No data locality
   – Scattered reads on each node
   – Scatter gather on all queries
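
To make the "what happens when N changes" question concrete, here is a minimal sketch, not taken from the talk, that counts how many keys land on a different node when a hash(att) mod N cluster grows from 4 to 5 nodes. The key names, the node counts, and the helper names (stable_hash, node_for, fraction_moved) are all assumptions chosen for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a string to an integer with MD5 (stable across runs, unlike built-in hash())."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def node_for(key: str, n_nodes: int) -> int:
    """The simple scheme from the slide: hash(att) mod N."""
    return stable_hash(key) % n_nodes


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose node assignment changes when N goes from old_n to new_n."""
    moved = sum(1 for k in keys if node_for(k, old_n) != node_for(k, new_n))
    return moved / len(keys)


# Hypothetical workload: 10,000 made-up keys.
keys = [f"user-{i}" for i in range(10_000)]
frac = fraction_moved(keys, 4, 5)
print(f"{frac:.0%} of keys change nodes going from 4 to 5 nodes")
```

With a uniform hash, a key keeps its node only when the two remainders agree, which is about 1 key in 5 for this pair of sizes, so most of the data has to move. That reshuffling is the cost consistent hashing is meant to reduce.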
                                                    6
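[Figure: a hash ring with positions starting at 0; circles mark nodes 1 to 4
and letters A to E mark data points around the ring]
• Circles represent nodes: hash(hostname)
• Letters represent data points: hash(attribute)
• What's wrong with this specific ring?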
                                                        7
• Use a hash algorithm with even
  distribution (like MD5 or SHA-1)
• Use multiple points or replicas on
  the hash ring
• Instead of just hash(“Host1”)
• hash(“Host1-1”) .. hash(“Host1-r”)
• Running simulations you get a
  plot that looks like this (see Tom
  White reference)
• Based on 10 nodes and 10k points

[Plot: standard deviations (y-axis, ticks 5 to 100) against replicas
(x-axis, ticks 1 to 500)]
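
A simulation along these lines can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code behind the plot: the HashRing class, the ring_hash and load_spread helpers, the key names, and the choice to report the spread as a percentage of the mean load are all hypothetical. It places each host at hash(“Host-1”) .. hash(“Host-r”) on an MD5 ring and assigns every key to the first point at or after its own hash, wrapping around at the end.

```python
import bisect
import hashlib
import statistics
from collections import Counter


def ring_hash(key: str) -> int:
    """Position on the ring: MD5 digest read as an integer."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with `replicas` points per node (illustrative only)."""

    def __init__(self, nodes, replicas):
        # One ring point per (node, replica index), e.g. hash("Host1-1") .. hash("Host1-r").
        points = [(ring_hash(f"{node}-{i}"), node)
                  for node in nodes
                  for i in range(1, replicas + 1)]
        points.sort()
        self._positions = [pos for pos, _ in points]
        self._owners = [node for _, node in points]

    def node_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping past the end to index 0.
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        return self._owners[idx % len(self._owners)]


def load_spread(replicas, n_nodes=10, n_keys=10_000):
    """Standard deviation of per-node key counts, as a percentage of the mean count."""
    nodes = [f"Host{i}" for i in range(1, n_nodes + 1)]
    ring = HashRing(nodes, replicas)
    counts = Counter(ring.node_for(f"key-{i}") for i in range(n_keys))
    loads = [counts[node] for node in nodes]
    return 100 * statistics.pstdev(loads) / statistics.mean(loads)


for r in (1, 5, 20, 100, 500):
    print(f"replicas={r:>3}  spread={load_spread(r):.1f}% of mean load")
```

The exact numbers depend on the hash and the key names, but the spread should shrink as the number of points per node grows, because each node's share becomes the sum of many small arcs instead of one arc of arbitrary length.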

                                                                                               8
[Figure: a hash ring with points labeled node-replica (1-1 through 1-R,
2-1 through 2-R, 3-1 through 3-R) interleaved around the circle]
• We have R replicas for each node
• The hash ring could also be used to determine replicas by using the
  same strategy with data
                                                                    9
• Also known as sharding
• Distribute based upon an attribute (the key)
   – Or multiple keys (compound)
• Pros
   – Better for reads
   – Data locality so…
   – Querying/reads with shard attribute terms avoid scatter
   – Data can be arranged in contiguous blocks
   – If indexing is hash based, range queries are only possible on the key


                                                                  10
• Cons
  – Requires more consideration a priori
  – Pick the right shard key
  – Can develop hot spots
  – Leads to more data balancing activities
• Chunking can be done on many levels
  – BigTable breaks into tablets
  – MongoDB uses “chunks”




                                              11
• Pick a key(s) to partition on
• Map the key space to the nodes
• Range to node mappings
  adjusted to keep data as
  distributed as possible
• In this example we are
  partitioning by Last Name
• What happens if we partition
  by hash(attribute)?

[Figure: the key space from -∞ to ∞ divided into ranges, with the last
names Abrams, Isaac, LaForest, Meyer, and Scheich marked along it]
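
As a rough illustration of mapping a key space to ranges, the sketch below treats the last names from the example as split points. That reading of the figure is an assumption, and SPLIT_POINTS, chunk_for, and chunks_for_range are hypothetical names; a real system would also adjust the boundaries over time to keep data balanced.

```python
import bisect

# Assumed split points, borrowed from the names in the example for illustration.
# Chunk 0 covers (-inf, "Abrams"), chunk 1 covers ["Abrams", "Isaac"), and so on;
# the last chunk covers ["Scheich", +inf).
SPLIT_POINTS = ["Abrams", "Isaac", "LaForest", "Meyer", "Scheich"]


def chunk_for(last_name: str) -> int:
    """Index of the range that holds this key (lower bounds are inclusive)."""
    return bisect.bisect_right(SPLIT_POINTS, last_name)


def chunks_for_range(low: str, high: str) -> list:
    """Chunks a range query on [low, high] must visit: a contiguous run, not all of them."""
    return list(range(chunk_for(low), chunk_for(high) + 1))


print(chunk_for("Jones"))                  # falls between "Isaac" and "LaForest"
print(chunks_for_range("Kelly", "Lopez"))  # only the chunks overlapping the range
```

If the same names were placed by hash(attribute) instead, neighbouring names would land on unrelated nodes, so a query like the one above would have to be scattered to every node.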


                                                                     12
• Use a key with a high cardinality
   – Sufficient granularity to your “chunks”
• What are your write vs read requirements?
• Read and query king?
   1. Shard key should be something most of your queries use
   2. Also something that distributes reads evenly (avoiding
      read hotspots)
   3. Read scaling can sometimes be accommodated by
      replication
• Write throughput biggest concern?
   1. You may want to consider partitioning on hash
   2. Avoid hot spots
   3. What happens if you shard on a systematically increasing key?
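
One way to explore the last question is a small sketch, under assumed numbers, that inserts 1,000 new records with ever-increasing integer ids into a 4-node cluster. The split points, the node count, and the helper names are hypothetical, and the hash placement uses plain mod N only to keep the comparison short.

```python
import bisect
import hashlib
from collections import Counter

N_NODES = 4
# Hypothetical range boundaries, chosen when the data set held ids 0..999.
SPLIT_POINTS = [250, 500, 750]


def range_node(key: int) -> int:
    """Range partitioning: node index of the range containing the key."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def hash_node(key: int) -> int:
    """Hash partitioning (simple mod N) on the same key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % N_NODES


# New writes all carry ids larger than anything seen before.
new_ids = range(1000, 2000)
range_load = Counter(range_node(k) for k in new_ids)
hash_load = Counter(hash_node(k) for k in new_ids)

print("range partitioned:", dict(range_load))
print("hash partitioned: ", dict(hash_load))
```

Under range partitioning every new id is greater than the last boundary, so all writes hit the final node until the ranges are rebalanced. Hashing the key spreads the same writes across all four nodes, at the price of losing range locality.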

                                                                13
• Consistent Hashing and Random Trees
  – One of the original papers on consistent hashing
• Tom White: Consistent Hashing
  – Great blog post on consistent hashing




                                                       14
