Consistency and Availability
tradeoff in database clusters
Grokking Techtalk 40
About Me
● About Me
● Introduce Segmentation Platform
2
About Me
● Joined Grab 2 years ago; currently the lead engineer of the Segmentation Platform project
● Lead the Research Database group in Grokking Lab
3
About Segmentation Platform (SegP)
● Technology
○ Programming languages: Go, Java, Scala
○ Batch processing (Spark, Scala)
○ Caching (Redis)
○ Message queues (SQS, Kafka)
○ Relational database (MySQL)
○ Non-relational databases (Cassandra, DynamoDB, Elasticsearch)
● Team's scope
○ Feature development. Coordinate with business owners to develop a platform for
segmentation, similar to segment.io but for internal users.
○ Batch data processing.
○ Real-time traffic. Build and maintain gRPC APIs to serve online traffic.
4
What we'll discuss in this talk
● CAP theorem
● The cluster architecture of Redis, Elasticsearch, and Cassandra
● How the C-A tradeoff is reflected in their designs
5
CAP Theorem
6
Consistency
The system is considered consistent if v1 is
returned to Client 2 when the read request (2)
happens after the write request (1).
[Diagram: Client 1 sends (1) Update v=v1 to the DB system; Client 2 then sends (2) Get v; the stored value was previously v0.]
7
Availability
When a request is sent, there is an algorithm
designed to handle that request, consisting of
some steps.
If the system cannot go through the algorithm
designed for that request, it is considered
"not available" to that client.
[Diagram: (1) a client sends a request; (2) the system either goes through the algorithm defined for this request, or it cannot; (3) the system returns 2xx or 4xx on success, and 500 when it cannot.]
8
Network partition
A network partition happens when some of
the nodes cannot communicate properly with
each other and believe that the others are offline.
For example, Node 1 cannot communicate
with Node 2, so Node 1 thinks that Node 2 is
offline. But Node 2 is still alive and still
serving requests.
[Diagram: Node 1 and Node 2 are cut off from each other; Client 1 talks to Node 1 and Client 2 talks to Node 2.]
9
CAP Theorem
A distributed database has three very desirable properties:
1. Tolerance towards Network Partition
2. Consistency
3. Availability
The CAP theorem states: You can have at most two of these properties for any shared-data system
[Diagram: Venn diagram of Consistency, Availability, and Partition tolerance.]
10
Redis Cluster
11
What is Redis
12
- Stands for Remote Dictionary Server
- Is a fast, open-source, in-memory key-value data store for use as a database, cache,
message broker, and queue.
- Delivers sub-millisecond response times, enabling millions of requests per second for real-time applications in Gaming, Ad-Tech, Financial Services, Healthcare, and IoT.
- Popular choice for caching, session management, gaming, leaderboards, real-time analytics,
geospatial, ride-hailing, chat/messaging, media streaming, and pub/sub apps.
Redis cluster - Multi-master
Each key is hashed into one of 16384 hash slots. Depending on
the slot, the value is read from (and written to) the node
assigned that slot (token).
[Diagram: the client hashes keys to slots (5 -> 18, 6 -> 8003); Redis node 1 owns slots 1-8000 and stores 5 -> "ho chi minh"; Redis node 2 owns slots 8001-16384 and stores 6 -> "ha noi".]
13
Redis cluster - Master/Replica
Redis uses asynchronous replication, with
asynchronous replica-to-master
acknowledgement of the amount of data
processed.
A master can have multiple replicas.
Clients write to the master, but can read from
replicas.
[Diagram: Client 1 sends a write command to the Redis master; the master sends async updates to two Redis replicas; Client 2 sends a read command to a replica.]
Ref: https://redis.io/topics/replication
14
C-A tradeoff
Redis uses asynchronous replication
by default, which means that, by default,
it is AP.
If a network partition happens between the
master and a replica, we will see
inconsistent data.
[Diagram: Client 1 sends a write command to the Redis master; the async updates cannot reach the replicas during the partition; Client 2's read command against a replica returns stale data.]
15
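To make the stale-read scenario concrete, here is a toy, purely in-memory sketch of asynchronous replication. None of this is Redis code; the node type and the "pending replication" step are invented for illustration.

```go
package main

import "fmt"

// Toy model of async replication: the master acks a write before the replica
// has applied it. All names are invented; this is not Redis code.
type node struct{ data map[string]string }

func main() {
	master := node{data: map[string]string{"v": "v0"}}
	replica := node{data: map[string]string{"v": "v0"}}

	// Client 1 writes to the master. The master acknowledges immediately;
	// the replication of this write is still "in flight".
	master.data["v"] = "v1"
	pendingReplication := func() { replica.data["v"] = master.data["v"] }

	// Client 2 reads from the replica before replication is applied
	// (e.g. because of lag or a partition): it sees the stale value.
	fmt.Println("replica read before replication:", replica.data["v"]) // v0

	pendingReplication()
	fmt.Println("replica read after replication:", replica.data["v"]) // v1
}
```

Because the master acknowledges the write before the replica applies it, the system stays available during the partition but can serve stale reads, which is the AP behaviour described above.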
Elasticsearch cluster
16
What is Elasticsearch
17
● Elasticsearch is a distributed, open source search and analytics engine for all types of data,
including textual, numerical, geospatial, structured, and unstructured.
● Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V.
(now known as Elastic).
Roles in Elasticsearch Cluster
● Master node: manages the overall operation of the cluster and keeps track of the cluster state.
● Data node: stores and searches data. Performs all data-related operations (indexing, searching, aggregating) on local shards.
● Coordinator node: delegates client requests to the shards on the data nodes, collects and aggregates the results into one final result, and sends this result back to the client.
[Diagram: Client 1 and Client 2 send requests to a coordinator node, which fans out to the data nodes; a master node oversees the cluster.]
18
Primary and Replica shards
Steps for primary shards:
● Validate the incoming operation and reject it if structurally invalid
● Execute the operation locally
● Forward the operation to each replica in the current in-sync copies set
● Once all replicas have successfully performed the operation and responded to the primary, the primary acknowledges the successful completion of the request to the client
[Diagram: the coordinating node routes a sample document (user1, auth, a1, "login from homepage") to the destination node holding primary shard p1; the primary forwards the operation to its replicas r11 and r12 (likewise p2 -> r21 and r22).]
19
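A minimal sketch of that write path, assuming a made-up in-memory model (the shard type and indexDocument function are ours, not Elasticsearch's): the primary validates, applies locally, forwards to every in-sync replica, and only acknowledges once all of them confirm, so one unreachable replica blocks the acknowledgement.

```go
package main

import (
	"errors"
	"fmt"
)

// Toy model of the primary-shard write path described above.
// All types and names are invented for illustration.
type shard struct {
	name      string
	docs      map[string]string
	reachable bool // false simulates a partitioned replica
}

func (s *shard) apply(id, doc string) error {
	if !s.reachable {
		return errors.New(s.name + ": unreachable")
	}
	s.docs[id] = doc
	return nil
}

// indexDocument mimics: validate, execute locally, forward to in-sync
// replicas, acknowledge only when every replica has confirmed.
func indexDocument(primary *shard, inSyncReplicas []*shard, id, doc string) error {
	if id == "" || doc == "" { // validate
		return errors.New("structurally invalid operation")
	}
	if err := primary.apply(id, doc); err != nil { // execute locally
		return err
	}
	for _, r := range inSyncReplicas { // forward to each in-sync replica
		if err := r.apply(id, doc); err != nil {
			return fmt.Errorf("replica failed, request not acknowledged: %w", err)
		}
	}
	return nil // acknowledge success to the client
}

func main() {
	p1 := &shard{name: "p1", docs: map[string]string{}, reachable: true}
	r11 := &shard{name: "r11", docs: map[string]string{}, reachable: true}
	r12 := &shard{name: "r12", docs: map[string]string{}, reachable: false} // partitioned

	err := indexDocument(p1, []*shard{r11, r12}, "a1", "login from homepage")
	fmt.Println("write acknowledged?", err == nil, "|", err)
}
```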
C-A tradeoff
If a network partition happens, the primary shard cannot
write to its replica shards, which leads to the primary shard
becoming unavailable.
By default, Elasticsearch is more CP.
[Diagram: the destination node holding primary shards p1 and p2 cannot forward operations to replicas r11, r12, r21, and r22 across the partition.]
20
Cassandra
21
What is Cassandra
22
Apache Cassandra is an open source, distributed NoSQL database that began internally at
Facebook and was released as an open-source project in July 2008.
Cassandra delivers continuous availability (zero downtime), high performance, and linear
scalability that modern applications require, while also offering operational simplicity and effortless
replication across data centers and geographies.
Cassandra Ring Cluster
- Each node is assigned a range of tokens
- A client can connect to any node to write; that node becomes the coordinator node
- Partition keys are hashed into a token. The coordinator uses the token to decide which node should store the data
[Diagram: an eight-node ring with token ranges 1-20, 21-40, 41-60, 61-80, 81-100, 101-120, 121-140, 141-160; the sample row (user1, auth, a1, "login from homepage") hashes to token 44, so the coordinator node routes it to the destination node owning range 41-60.]
23
Replication Factor
- Replication Factor (RF) = number of copies we want to store
- Replication nodes are determined by the Replication Strategy
- Simple strategy = the next two nodes on the ring become the replication nodes
[Diagram: the same ring; the row with token 44 goes to the destination node owning range 41-60, and is replicated to the next two nodes clockwise.]
24
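The two previous slides can be stitched together in a short sketch. Everything here is illustrative: the ranges mirror the diagram, the node names are invented, and real hashing (Cassandra's Murmur3 partitioner) is skipped.

```go
package main

import "fmt"

// Toy ring: eight nodes, each owning a contiguous token range, as in the diagram.
type ringNode struct {
	name     string
	rangeEnd int // node owns tokens up to and including rangeEnd
}

var ring = []ringNode{
	{"n1", 20}, {"n2", 40}, {"n3", 60}, {"n4", 80},
	{"n5", 100}, {"n6", 120}, {"n7", 140}, {"n8", 160},
}

// ownerIndex finds the node whose token range contains the token.
func ownerIndex(token int) int {
	for i, n := range ring {
		if token <= n.rangeEnd {
			return i
		}
	}
	return 0 // wrap around
}

// replicasFor mimics the simple strategy: the owner plus the next rf-1 nodes clockwise.
func replicasFor(token, rf int) []string {
	start := ownerIndex(token)
	out := make([]string, 0, rf)
	for i := 0; i < rf; i++ {
		out = append(out, ring[(start+i)%len(ring)].name)
	}
	return out
}

func main() {
	token := 44 // pretend the partition key "user1" hashed to 44
	fmt.Println("token 44 replicas (RF=3):", replicasFor(token, 3))
	// Output: token 44 replicas (RF=3): [n3 n4 n5]
}
```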
Data Consistency
[Diagram: replicas A, B, and C hold the row with token 44. Client 1 writes data with token 44 through coordinator C1; Client 2 reads data with token 44 through coordinator C2. After the write, A and C hold v2 while B still holds v1.]
- Client 1 connects to coordinator C1 to write; C1 writes the data to the three replicas, but the write fails at node B.
- Client 2 then connects to coordinator C2 to read the same data.
What would happen?
25
Consistency Level (Write)
Level: One
- Read: Returns a response from the closest replica, as determined by the snitch. By default, a read repair runs in the background to make the other replicas consistent.
- Write: A write must be written to the commit log and memtable of at least one replica node.
Level: Quorum
- Read: Returns the record after a quorum of replicas has responded.
- Write: A write must be written to the commit log and memtable on a quorum of replica nodes.
Level: All
- Read: Returns the record after all replicas have responded. The read operation will fail if a replica does not respond.
- Write: A write must be written to the commit log and memtable on all replica nodes in the cluster for that partition.
26
Write with CL=ALL
[Diagram: Client 1 writes data with token 44 through coordinator C1; replicas A and C store v2, but the write to B fails and B still holds v1.]
Write with CL=ALL:
- All replicas succeeded -> success
- Any replica failed -> failed
Result: Failed (node B did not acknowledge the write)
27
Write with CL=QUORUM
[Diagram: Client 1 writes data with token 44 through coordinator C1; replicas A and C store v2, but the write to B fails and B still holds v1.]
Quorum = floor(RF / 2) + 1 = 2 (for RF = 3)
- At least two replicas succeeded -> success
- Fewer than two succeeded -> failed
Result: Success (A and C acknowledged, even though B failed)
28
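A tiny sketch of that arithmetic (the function names are ours, not Cassandra's):

```go
package main

import "fmt"

// quorum returns the number of replica acknowledgements required
// for CL=QUORUM: floor(RF/2) + 1.
func quorum(rf int) int { return rf/2 + 1 }

func main() {
	fmt.Println("RF=3 quorum:", quorum(3)) // 2
	fmt.Println("RF=5 quorum:", quorum(5)) // 3

	// The write above: RF=3, replicas A and C acknowledged, B failed.
	acks := 2
	fmt.Println("CL=QUORUM write succeeds:", acks >= quorum(3)) // true
	fmt.Println("CL=ALL    write succeeds:", acks >= 3)         // false
}
```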
Consistency Level (Read)
Level: One
- Read: Returns a response from the closest replica, as determined by the snitch. By default, a read repair runs in the background to make the other replicas consistent.
- Write: A write must be written to the commit log and memtable of at least one replica node.
Level: Quorum
- Read: Returns the record after a quorum of replicas has responded.
- Write: A write must be written to the commit log and memtable on a quorum of replica nodes.
Level: All
- Read: Returns the record after all replicas have responded. The read operation will fail if a replica does not respond.
- Write: A write must be written to the commit log and memtable on all replica nodes in the cluster for that partition.
29
Write=QUORUM, Read=One
[Diagram: the write with token 44 reached replicas A and C (v2) but not B (still v1); Client 2 reads through coordinator C2 from a single replica.]
Potentially inconsistent read: if Client 2's read hits node B, it receives stale data.
W (QUORUM) + R (ONE) -> eventually consistent
30
Write=QUORUM, Read=QUORUM
[Diagram: the write with token 44 reached replicas A and C (v2) but not B (still v1); Client 2 reads through coordinator C2 from a quorum of replicas.]
Any quorum of replicas must include at least one node holding v2, so the read always returns v2.
W (QUORUM) + R (QUORUM) -> consistent
31
Write=All, Read=One
[Diagram: the write with token 44 succeeded on all replicas A, B, and C (all hold v2); Client 2 reads through coordinator C2 from a single replica.]
The write only succeeds once every replica holds v2, so a read from any single replica returns v2.
W (ALL) + R (ONE) -> consistent
32
Summary of Read and Write CL
WRITE    READ     Consistent?    Read Availability    Write Availability
All      All      Consistent     Low                  Low
Quorum   All      Consistent     Low                  Medium
One      All      Consistent     Low                  High
All      Quorum   Consistent     Medium               Low
Quorum   Quorum   Consistent     Medium               Medium
One      Quorum   Inconsistent   Medium               High
All      One      Consistent     High                 Low
Quorum   One      Inconsistent   High                 Medium
One      One      Inconsistent   High                 High
33
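The pattern in the table follows the usual rule of thumb that reads are strongly consistent when the write and read replica sets must overlap, i.e. W + R > RF. A short sketch (function names are ours, for RF = 3):

```go
package main

import "fmt"

// replicasFor maps a consistency level to the number of replicas it touches
// for a given replication factor. Names are illustrative.
func replicasFor(level string, rf int) int {
	switch level {
	case "ONE":
		return 1
	case "QUORUM":
		return rf/2 + 1
	case "ALL":
		return rf
	}
	return 0
}

// stronglyConsistent reports whether every read is guaranteed to see the
// latest successful write: the read and write replica sets must overlap.
func stronglyConsistent(write, read string, rf int) bool {
	return replicasFor(write, rf)+replicasFor(read, rf) > rf
}

func main() {
	rf := 3
	for _, w := range []string{"ALL", "QUORUM", "ONE"} {
		for _, r := range []string{"ALL", "QUORUM", "ONE"} {
			fmt.Printf("W=%-6s R=%-6s -> consistent: %v\n", w, r, stronglyConsistent(w, r, rf))
		}
	}
}
```

Running it reproduces the "Consistent / Inconsistent" column of the table above.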
Summary
Redis: Availability > Consistency
Cassandra: Tweakable availability and consistency
Elasticsearch: Availability < Consistency
34
Q&A
35