A New MongoDB Sharding Architecture for Higher Availability and Better Resource Utilization
Leif Walsh 
@leifwalsh
A Traditional MongoDB Cluster
• 3 shards.
• 3 replicas per shard.
• 3x write throughput.
• 3x read throughput.
• 1 node per shard can go down without losing availability.
• Each shard's data can survive destruction of 2 of its nodes.
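General MongoDB Cluster
With S shards and R replicas per shard:
• Sx write throughput.
• Rx read throughput.
• ⌊(R−1)/2⌋ nodes per shard can go down without losing availability (a majority of each replica set must stay up).
• Data can survive destruction of R−1 nodes per shard.
• S×R hardware & maintenance cost.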
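
The general-cluster figures above can be tabulated with a small sketch. The names `ClusterProperties` and `traditional_cluster` are made up for illustration, and the availability bound assumes majority-based elections, which gives 1 for the 3-replica example on the earlier slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterProperties:
    write_throughput: int              # multiples of one node's write throughput
    read_throughput: int               # multiples of one node's read throughput
    tolerated_failures_per_shard: int  # nodes that can be down with the shard still available
    survivable_losses_per_shard: int   # nodes that can be destroyed without losing data
    machines: int                      # hardware and maintenance cost


def traditional_cluster(shards: int, replicas: int) -> ClusterProperties:
    """Properties of a cluster with `shards` shards and `replicas` replicas per shard."""
    return ClusterProperties(
        write_throughput=shards,
        read_throughput=replicas,
        # Assumption: a strict majority of each replica set must stay up.
        tolerated_failures_per_shard=(replicas - 1) // 2,
        survivable_losses_per_shard=replicas - 1,
        machines=shards * replicas,
    )


print(traditional_cluster(shards=3, replicas=3))
```

For S = 3 and R = 3 this reproduces the 9-machine cluster from the first slides.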
TokuMX: MongoDB with Fractal Trees 
• MongoDB fork. 
• Compression, performance, transactions. 
• Details about Fractal Trees after lunch.
TokuMX: MongoDB with Fractal Trees 
• Read-free Replication 
• Fast Updates 
• Optimized Sharding Migrations 
• Ark Consensus for Replication Failover 
• Partitioned Collections 
• Clustering Indexes & Primary Keys 
• tokutek.com/tokumx
Fractal Tree 
Performance Basics 
Writes are cheap: 
• O(1/B) I/Os per op. 
• ≈10k/s 
Reads are expensive: 
• Ω(1) I/O per op. 
• ≈100/s
Read-free Replication
An update is a read plus a write: the primary reads the old value, then writes the new one.
Secondaries can trust the primary's result, so they only do the write.
Measured by I/O utilization, secondaries are very cheap compared to primaries.
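
A toy model may make the asymmetry concrete. Everything here is hypothetical (`ToyNode`, `primary_increment`, `secondary_apply`, and the tuple used as a log entry); the slides do not describe TokuMX's actual oplog format, so this is only a sketch of the idea that the primary pays for the read while the secondary applies the result blindly.

```python
class ToyNode:
    """In-memory key-value store that counts its reads and writes."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.data.get(key)

    def write(self, key, value):
        self.writes += 1
        self.data[key] = value


def primary_increment(primary, key, amount):
    """Update on the primary: read the old value, write the new one."""
    old = primary.read(key) or 0
    new = old + amount
    primary.write(key, new)
    # Hypothetical log entry: it carries the resulting value, not the operation.
    return (key, new)


def secondary_apply(secondary, entry):
    """Replication on the secondary: trust the primary, write without reading."""
    key, value = entry
    secondary.write(key, value)


primary, secondary = ToyNode(), ToyNode()
for _ in range(3):
    secondary_apply(secondary, primary_increment(primary, "counter", 1))

print(primary.reads, primary.writes)      # reads and writes done by the primary
print(secondary.reads, secondary.writes)  # the secondary never reads
```

Since reads are the expensive operation in the Fractal Tree cost model above, a node that only writes has most of its I/O budget left over.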
A Traditional TokuMX Cluster
• 9 machines, only 3x throughput benefit.
• Secondaries are under-utilized.
A TokuMX Cluster With Read-free Replication
• 3x write throughput.
• 3x read throughput.
• (maybe separately)
• 1 node can go down without losing availability.
• Data can survive destruction of 2 nodes.
• Only 3x hardware cost, down from 9x.
Dynamo Architecture
• Developed at Amazon.
• Used by Cassandra, Riak, Voldemort.
• Many components; this talk focuses on data partitioning.
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
• Servers are equal peers, not separate primaries and secondaries.
• Servers store overlapping subsets of the data (MongoDB shards store disjoint subsets).
• Data partitioning is determined by consistent hashing.
Dynamo Partitioning
• N servers in a ring.
• hash(K) is a location around the ring.
• Store data for K on the next R servers on the ring.
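
The placement rule above can be sketched in a few lines. This is a minimal illustration, not Dynamo's implementation: the `Ring` class, the server names, and the use of MD5 as the hash are all assumptions, and it leaves out the virtual nodes that Dynamo uses to even out load.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, servers, replicas):
        self.replicas = replicas
        # Each server sits at the ring position given by hashing its name.
        self.points = sorted((ring_position(s), s) for s in servers)
        self.positions = [position for position, _ in self.points]

    def servers_for(self, key):
        """Return the next R servers clockwise from hash(key)."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        n = len(self.points)
        count = min(self.replicas, n)
        return [self.points[(start + i) % n][1] for i in range(count)]


ring = Ring(["server-a", "server-b", "server-c", "server-d", "server-e"], replicas=3)
print(ring.servers_for("user:42"))
```

Because each key lands on R consecutive servers, neighbouring servers on the ring hold overlapping subsets of the data.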
Dynamo Partitioning
• All nodes accept writes: ~linear write scaling.
• Data replicated R times: Rx read performance/reliability.
Dynamo-style Sharding in TokuMX
• Each node is primary for some chunks, secondary for others.
• Nodes store overlapping subsets of the data set.
• S primaries in the ring: Sx write throughput.
• R copies of each chunk on separate machines: Rx read throughput, availability & recovery guarantees.
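
One way to picture this layout is a function that places each replica set on consecutive machines around the ring. The function names and the `rs_<node>` naming are hypothetical, and the sketch assumes one replica set per machine with its primary on that machine and its secondaries on the following R−1 machines.

```python
def ring_layout(nodes, replicas):
    """Map each replica set to its member machines; the first member is the primary."""
    n = len(nodes)
    return {
        f"rs_{node}": [nodes[(i + k) % n] for k in range(replicas)]
        for i, node in enumerate(nodes)
    }


def roles_per_node(layout):
    """Count (primaries, secondaries) hosted on each machine."""
    counts = {}
    for members in layout.values():
        for position, node in enumerate(members):
            primaries, secondaries = counts.get(node, (0, 0))
            if position == 0:
                primaries += 1
            else:
                secondaries += 1
            counts[node] = (primaries, secondaries)
    return counts


layout = ring_layout(["A", "B", "C"], replicas=3)
print(layout)
print(roles_per_node(layout))
```

With 3 machines and R = 3, every machine hosts one primary and two cheap secondaries, which is where the 3x hardware cost (instead of 9x) comes from.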
Dynamo-style Sharding in TokuMX
• Adding a node:
– Move one secondary from each of the next 2 nodes to the new node.
– Initialize a new replica set on the new node and the next 2 nodes.
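
The two steps for adding a node can be derived by comparing the ring layout before and after the insertion. This sketch assumes R = 3 and one replica set per machine, keyed by the machine holding its primary; `ring_layout` and `add_node` are illustrative names, not part of TokuMX or MongoDB.

```python
def ring_layout(nodes, replicas=3):
    """Replica set keyed by its primary's machine, placed on consecutive machines."""
    n = len(nodes)
    return {
        node: [nodes[(i + k) % n] for k in range(replicas)]
        for i, node in enumerate(nodes)
    }


def add_node(nodes, new_node, after):
    """Insert new_node after `after`; return the new ring, the secondaries
    that must move as (replica set, from, to), and the new replica set's members."""
    index = nodes.index(after) + 1
    new_nodes = nodes[:index] + [new_node] + nodes[index:]
    before = ring_layout(nodes)
    updated = ring_layout(new_nodes)
    moves = []
    for replica_set, old_members in before.items():
        removed = sorted(set(old_members) - set(updated[replica_set]))
        added = sorted(set(updated[replica_set]) - set(old_members))
        moves.extend((replica_set, src, dst) for src in removed for dst in added)
    return new_nodes, moves, updated[new_node]


ring, moves, new_set = add_node(["A", "B", "C", "D", "E"], "X", after="B")
print(ring)     # the ring with X inserted after B
print(moves)    # one secondary leaves each of the 2 machines that follow X
print(new_set)  # the new replica set: X plus the 2 machines that follow it
```

In this model the only machines touched are the new node and its two successors, which matches the two steps on the slide.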
Future Work
Chunk balancer is not sophisticated:
• Adding/removing machines is rough, overloads the machine’s neighbors.
• Can we use ideas from Cassandra & Riak to improve this?
MongoDB architecture requires managing multiple processes on each machine.
• We can do better with good tools. Talk to me if you want to write them.
Thanks! 
Come to my talk after lunch for details about Fractal Trees.
leif@tokutek.com 
@leifwalsh 
tokutek.com/tokumx 
slidesha.re/13pxgH8
