Understanding Distributed Databases Scalability
Scalability & Performance
Scalability
An overloaded term that has been perverted by technical marketing. Definition: the ability of a database to improve performance when more resources are added.
Performance
Also an overloaded term, with different meanings depending on the context (see the definitions that follow).
(Figure: Throughput and Response time)
Database Performance Metrics
Throughput: the number of operations per unit of time (e.g., transactions per second, operations per second, queries per second).
Response time: the time from submitting an operation (e.g., a transaction, a query, or an individual row operation) until receiving the answer.
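To make the two metrics concrete, here is a minimal Python sketch that computes both from a log of completed operations. It assumes each operation is recorded as a hypothetical (submit_time, completion_time) pair in seconds; the function name `summarize` and the log format are illustrative only, not part of any real benchmarking tool.

```python
from statistics import mean

def summarize(ops):
    """Return (throughput in ops/s, mean response time in s).

    ops: list of (submit_time, completion_time) pairs in seconds.
    Throughput counts completed operations over the observed window,
    from the first submission to the last completion.
    """
    if not ops:
        raise ValueError("no operations recorded")
    window = max(done for _, done in ops) - min(sub for sub, _ in ops)
    if window <= 0:
        raise ValueError("observation window must be positive")
    throughput = len(ops) / window
    response_time = mean(done - sub for sub, done in ops)
    return throughput, response_time

# Four back-to-back operations observed over a 2-second window.
log = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5), (1.5, 2.0)]
tput, rt = summarize(log)
print(f"{tput:.1f} ops/s, {rt:.2f} s mean response time")
```

The two metrics are distinct: two operations that run concurrently, each taking 1 second, give 2 ops/s of throughput while the response time of each stays at 1 second.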
Scalability
The ability of a database to deliver better performance when more resources are added.
Speedup
The ability of a database to reduce response time when more resources are added.
Vertical vs Horizontal Scalability
Vertical scalability: adding more resources (CPU, memory, and disk) to a centralized database yields more throughput.
Horizontal scalability: adding more nodes to a distributed database (in a cluster) yields more throughput.
Scalability Factor
Do all databases scale the same? Can we measure and compare their scalability?
The scalability factor measures scalability: the scale-up factor for vertical scalability and the scale-out factor for horizontal scalability.
The scale-out factor gives the throughput of a cluster of a given size normalized to the throughput of a single node. Equivalently, it is the ratio of the throughput of the database with n cluster nodes to its throughput with one node.
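Written as a formula, the scale-out factor for n nodes is T(n) / T(1), where T(n) is the throughput measured with n nodes running the same workload. A minimal Python sketch, assuming the throughputs have already been measured; the function name and the numbers in the example are made up for illustration and are not benchmark results.

```python
def scale_out_factors(throughput_by_nodes):
    """Map each cluster size n to the scale-out factor T(n) / T(1).

    throughput_by_nodes: dict {n: measured throughput with n nodes};
    it must contain the single-node baseline under key 1.
    """
    baseline = throughput_by_nodes[1]
    return {n: t / baseline for n, t in sorted(throughput_by_nodes.items())}

# Hypothetical measurements in transactions per second.
measured = {1: 1_000, 2: 1_900, 4: 3_600, 8: 6_400}
for n, factor in scale_out_factors(measured).items():
    print(f"n={n}: scale-out factor {factor:.2f} (linear would be {n})")
```

Computed on a single machine before and after adding CPU, memory, or disk, the same ratio would serve as a scale-up factor.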
Types of Scalability
What is the optimal scalability? What is the worst?
Scalability can be linear or logarithmic, but it can also be null or even negative.
Some databases have negative scalability: adding more nodes to the system yields lower throughput than a single node delivers.
Many databases have sublinear scalability.
Often, scalability is null for write-only workloads and logarithmic for mixed read/write workloads.
Linear scalability is the optimal case: with a cluster of n nodes, you get n times the throughput of a single node. For instance, if a single node delivers 1,000 transactions per second, a cluster of 100 nodes delivers 100,000 transactions per second.
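One way to read a measured scale-out factor against these categories is sketched below in Python. The function name and the 5% tolerance used to decide "null" and "linear" are arbitrary assumptions for illustration; real measurements are noisy and deserve several runs before drawing conclusions.

```python
def classify(n, factor, tolerance=0.05):
    """Classify a scale-out factor T(n)/T(1) measured with n > 1 nodes."""
    if n <= 1:
        raise ValueError("need a cluster of more than one node")
    if factor < 1 - tolerance:
        return "negative"      # the cluster is slower than a single node
    if factor <= 1 + tolerance:
        return "null"          # extra nodes add no throughput
    if factor >= n * (1 - tolerance):
        return "linear"        # n nodes deliver about n times one node
    return "sublinear"         # some gain but less than n times (includes logarithmic)

# The linear example from the text: 1,000 tps on one node, 100,000 tps on 100 nodes.
print(classify(100, 100_000 / 1_000))
print(classify(4, 0.8))
print(classify(4, 2.5))
```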
Logarithmic Scale-Out
Logarithmic scale-out results from wasting capacity on redundant work and/or contention.
Open source databases such as MariaDB rely on cluster replication (see our blog post on Cluster Replication). Cluster replication yields logarithmic scalability: since every write is executed by all nodes, only the read fraction of the workload benefits from additional nodes.
Shared-disk databases also have logarithmic scalability: the concurrency control protocol must lock the disk pages to be written, which causes substantial contention that grows with the cluster size.
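Why does cluster replication flatten the curve? A back-of-the-envelope model, assuming identical nodes, every write applied on every node, reads spread evenly, reads and writes equally expensive, and no contention or network cost (all simplifications): with a write fraction w, each node must absorb w + (1 - w)/n of the cluster's load, so the scale-out factor is 1 / (w + (1 - w)/n). The function name is hypothetical.

```python
def replicated_scale_out(n, write_fraction):
    """Scale-out factor of n fully replicated nodes under a simplified model.

    Assumptions (not measurements): identical nodes, writes applied on
    every node, reads spread evenly, reads and writes equally expensive,
    no contention or network overhead.
    """
    if n < 1 or not 0.0 <= write_fraction <= 1.0:
        raise ValueError("n >= 1 and 0 <= write_fraction <= 1 required")
    # Each node executes all writes but only 1/n of the reads.
    per_node_share = write_fraction + (1.0 - write_fraction) / n
    return 1.0 / per_node_share

for w in (0.0, 0.2, 1.0):
    factors = [round(replicated_scale_out(n, w), 2) for n in (1, 2, 4, 8, 16)]
    print(f"write fraction {w:.0%}: {factors}")
```

In this model, with 20% writes the factor can never exceed 5 (the limit is 1/w) no matter how many nodes are added, and with 100% writes it stays at 1, which matches null scalability for write-only workloads. Shared-disk contention is not modeled here.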
Linear Scalability
Key-value stores (see our blog post on NoSQL) typically provide linear scalability because they are very simple: they do not address the hard problem of scaling transaction management (the so-called ACID properties).
Very few transactional databases exhibit linear scalability (since this blog series is vendor agnostic, we do not discuss them here).
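A toy illustration of why a simple key-value store can scale linearly: if each key is assigned to exactly one node by hashing, every get or put touches a single node and no work is duplicated, so the load spreads across the cluster. This is a hypothetical in-memory sketch (the class and method names are invented here); it deliberately has no multi-key transactions, which is precisely the hard ACID part that key-value stores avoid.

```python
import zlib
from collections import Counter

class PartitionedKV:
    """Hypothetical hash-partitioned key-value store with n in-memory nodes."""

    def __init__(self, n_nodes):
        self.nodes = [dict() for _ in range(n_nodes)]
        self.ops_per_node = Counter()

    def _owner(self, key):
        # crc32 is deterministic across runs, unlike Python's salted hash().
        return zlib.crc32(key.encode()) % len(self.nodes)

    def put(self, key, value):
        node = self._owner(key)
        self.ops_per_node[node] += 1      # exactly one node does the work
        self.nodes[node][key] = value

    def get(self, key):
        node = self._owner(key)
        self.ops_per_node[node] += 1
        return self.nodes[node].get(key)

store = PartitionedKV(4)
for i in range(10_000):
    store.put(f"user:{i}", i)
# Each put was handled by a single node; the per-node counts show the split.
print(sorted(store.ops_per_node.items()))
```

Growing this sketch's cluster would change the modulus and move most keys; real systems typically use consistent hashing or range partitioning to limit that movement, which is ignored here.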
Types of Speedup
Speedup can also show different behaviors, from null to linear.
Linear speedup means that the response time obtained with one node is divided by n with n nodes.
Null speedup means, for instance, that a given query always exhibits the same response time, whether it runs on one node or more.
Null speedup happens in a database without a parallel/OLAP query engine (i.e., without intra-query parallelism): with inter-query parallelism only, each node can process a subset of the queries, but each query can only be executed by a single node.
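The two behaviors can be contrasted with an idealized Python model, assuming a query whose work is perfectly divisible and ignoring coordination cost: with intra-query parallelism the query's work is split across n nodes, so its response time is divided by n; with inter-query parallelism only, the whole query still runs on one node. Speedup is then RT(1) / RT(n). The function names are hypothetical.

```python
def response_time(work_seconds, n_nodes, intra_query_parallelism):
    """Idealized response time of one query on a cluster of n_nodes."""
    if intra_query_parallelism:
        return work_seconds / n_nodes   # the query is split across all nodes
    return work_seconds                 # the query runs on a single node

def speedup(work_seconds, n_nodes, intra_query_parallelism):
    """Speedup = response time with 1 node / response time with n nodes."""
    return (response_time(work_seconds, 1, intra_query_parallelism)
            / response_time(work_seconds, n_nodes, intra_query_parallelism))

for n in (1, 2, 4, 8):
    print(n, speedup(60.0, n, True), speedup(60.0, n, False))
```

Inter-query parallelism can still raise throughput, since different nodes run different queries at the same time; it just does not shorten any single query.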
Main Takeaways
1. The two main metrics for measuring the performance of a database are throughput and response time. Throughput measures the number of operations (transactions, queries, inserts) per unit of time; response time measures how long it takes to execute a particular operation.
2. Scalability is the ability of the database to handle bigger loads with more resources. In a distributed database, we talk about horizontal scalability, where more resources means more nodes. In a centralized database, we talk about vertical scalability, where more resources means more CPU, memory, and disk.
3. Speedup is related to scalability but is a different concept: it refers to the ability to reduce response time by adding more resources. Again, it can be horizontal for a distributed database or vertical for a centralized database.
4. Scalability and speedup can be of different kinds. Negative and null are of no interest. Logarithmic scalability can be better, but only for a few nodes and a high proportion of reads. Linear scalability is optimal, since each new node contributes the same amount of additional load that can be handled.
References
[Özsu & Valduriez 2020] Tamer Özsu, Patrick Valduriez. Principles of Distributed Database Systems, 4th Edition. Springer, 2020.
Relevant Posts from the Blog
How To Measure Scalability and Performance
Cluster Replication
Shared Nothing Architectures
NoSQL
About
About the authors:
Dr. Ricardo Jimenez-Peris is the CEO and founder of LeanXcale. Before founding LeanXcale, he spent over 25 years as a researcher in distributed database systems, authored over 100 scientific publications, and served as director of the Distributed Systems Lab and as a university professor of distributed systems.
Dr. Patrick Valduriez is a researcher at INRIA and co-author of the book “Principles of Distributed Database Systems”, which has educated legions of students and engineers in this field. More recently, he has become Scientific Advisor of LeanXcale.
About this blog series:
This blog series aims to educate database practitioners on topics that are commonly not well understood, often due to false or confusing marketing messages. The blog provides the foundations and tools that let readers actually evaluate database systems, learn their real capabilities, and compare the performance of the different alternatives for their target workload. The blog is vendor agnostic and does not mention specific vendors; open source databases are sometimes mentioned to illustrate concepts.
About LeanXcale:
LeanXcale is a startup making a NewSQL database. Since the blog is vendor agnostic, we do not talk about LeanXcale itself. Readers interested in LeanXcale can visit the LeanXcale website.