Introduction to Galera Cluster and Codership
Created by Codership Oy
  Our founders have participated in 3 MySQL cluster developments since 2003.
  Started Galera work in 2007. Based on the PhD thesis of Fernando Pedone.
  1.0 in 2011. Percona & MariaDB in 2012.
  Galera is free & open source. Support and consulting by Codership & partners.
Galera in a nutshell
  True multi-master: read & write to any node
  Synchronous replication
  No slave lag
  No integrity issues
  No master-slave failovers or VIP needed
  Multi-threaded slave, no performance penalty
  Automatic node provisioning
  Elastic: easy scale-out & scale-in, all nodes read-write
[Figure: three Master nodes in a Galera cluster]
Sysbench disk bound (20GB data / 6GB buffer), tps
  EC2 w local disk
-  Note: pretty poor I/O here
  Blue vs red: innodb_flush_log_at_trx_commit > 66% improvement
  Scale-out factors:
2N = 0.5 x 1N
4N = 0.5 x 2N
[Chart: Sysbench disk bound, 20GB data / 6GB InnoDB buffer, tps]
http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
Galera vs other HA solutions
Galera is like...
  MySQL replication without integrity issues or slave lag
  DRBD/SAN without failover downtime and performance penalty
  Oracle RAC without failover downtime
  NDB, but you get to keep InnoDB
[Figure: HA solutions compared. Axis labels: Failover downtime (Slow, Fast), Data integrity (Poor, Solid), 99 %, 99.999...%. Plotted: Galera, NDB, MySQL replication, DRBD, RAC, SAN, Backups]
Active-Active DB = best with Load Balancer
  HAProxy, GLB, Cisco, F5...
  Pictured: Load balancer on each app server
-  No Single Point of Failure
-  One less layer of network components
-  PHP and JDBC drivers provide this built-in!
jdbc:mysql:loadbalance://10.0.0.1,10.0.0.2,10.0.0.3/<database>?loadBalanceBlacklistTimeout=5000
  Or: Separate HW or SW load balancer
-  Centralized administration
-  What if LB fails?
[Figure: a load balancer (LB) on each app server in front of three MySQL nodes in a Galera cluster]
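The idea of a load balancer inside each app server can be sketched without any particular driver. This is a minimal Python illustration, not the JDBC or PHP driver's implementation: the `NodePool` class and its method names are hypothetical, and the blacklist timeout only imitates the intent of `loadBalanceBlacklistTimeout` (skip a failed node for a while, then try it again).

```python
import itertools
import time


class NodePool:
    """Round-robin over cluster nodes, skipping nodes that failed recently."""

    def __init__(self, hosts, blacklist_timeout_ms=5000, clock=time.monotonic):
        self._hosts = list(hosts)
        self._timeout = blacklist_timeout_ms / 1000.0
        self._clock = clock
        self._blacklisted_until = {}  # host -> time when it may be tried again
        self._cycle = itertools.cycle(self._hosts)

    def mark_failed(self, host):
        """Call this when a connection attempt or a query on `host` fails."""
        self._blacklisted_until[host] = self._clock() + self._timeout

    def next_host(self):
        """Return the next usable host, or raise if every node is blacklisted."""
        now = self._clock()
        for _ in range(len(self._hosts)):
            host = next(self._cycle)
            if self._blacklisted_until.get(host, 0.0) <= now:
                return host
        raise RuntimeError("all nodes are blacklisted")
```

Because every Galera node accepts reads and writes, the pool needs no notion of a master: any host it returns is a valid target.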
Some other architectures
[Figure: two Galera architectures, labelled "Whole stack cluster" and "Virtual IP failover" (a VIP in front of the MySQL nodes)]
Quorum
  Galera uses quorum-based failure handling:
-  When cluster partitioning is detected, the majority partition "has quorum" and can continue
-  A minority partition cannot commit transactions, but will attempt to reconnect to the primary partition
-  Note: 50% is not a majority! => Minimum 3 nodes recommended.
  Load balancer will notice errors & remove node from pool
[Figure: load balancers (LB) in front of three MySQL nodes in a Galera cluster]
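The majority rule is easy to check by hand. The sketch below assumes every node carries equal weight; `has_quorum` is a hypothetical helper for illustration, not a Galera API.

```python
def has_quorum(partition_size, cluster_size):
    """A partition may continue only if it holds a strict majority of nodes."""
    return 2 * partition_size > cluster_size


# A 2-node cluster that splits 1/1 leaves no side with a majority,
# while a 3-node cluster that splits 2/1 keeps the 2-node side running.
for cluster_size, partition_size in [(2, 1), (3, 2), (3, 1), (4, 2)]:
    state = "continues" if has_quorum(partition_size, cluster_size) else "stops committing"
    print(f"{partition_size} of {cluster_size} nodes: {state}")
```

This is why an even split (50%) stops both halves, and why three nodes is the smallest cluster that survives the loss of one node.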
WAN replication
  Works fine
  Use higher timeouts and send windows
  No impact on reads
  No impact within a transaction
  Adds 100-300 ms to commit latency
  No major impact on tps
  Quorum between data centers
-  3 data centers
-  Distribute nodes evenly
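A rough model shows why extra commit latency need not reduce total tps. It assumes each connection runs transactions one after another and the servers are not otherwise saturated; the function names and the example figures (10 ms of local work, 100 ms of added commit latency) are illustrative assumptions, not measurements from these slides.

```python
import math


def max_tps(connections, local_ms, commit_latency_ms):
    """Upper bound on throughput when each connection commits serially."""
    return connections * 1000 / (local_ms + commit_latency_ms)


def connections_needed(target_tps, local_ms, commit_latency_ms):
    """Concurrent connections required to sustain target_tps in this model."""
    return math.ceil(target_tps * (local_ms + commit_latency_ms) / 1000)


print(round(max_tps(1, 10, 100), 1))     # a single connection is latency bound
print(connections_needed(500, 10, 100))  # more connections hide the latency
```

In this model a single connection slows down noticeably, but aggregate throughput is recovered by running more connections in parallel.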
WAN with MySQL asynchronous replication
  You can mix Galera replication and MySQL replication
  Good option on poor WAN
  Remember to watch out for slave lag, etc...
  "Channel failover" if a master node crashes
  Mixed replication is useful when you want an async slave (such as time-delayed, filtered, multi-source...)
Who is using Galera?
Extra slides
Migration checklist
  Are your tables InnoDB?
  Make sure all tables have a Primary Key
  Watch out for Triggers and Events
Tip: Don't make too many changes at once. Migrate to InnoDB first, run a month in production, then migrate to Galera.
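The checklist can be turned into queries against `information_schema`. The sketch below is one possible way to do it: `CHECKS` and `run_checks` are hypothetical names, the list of system schemas to skip is an assumption, and `cursor` is any Python DB-API cursor connected to the server being migrated.

```python
# Schemas that ship with the server and are not part of the migration (assumed list).
SYSTEM_SCHEMAS = "('mysql', 'information_schema', 'performance_schema', 'sys')"

CHECKS = {
    # Tables that still use MyISAM or another non-InnoDB engine.
    "non_innodb_tables": f"""
        SELECT table_schema, table_name, engine
        FROM information_schema.tables
        WHERE table_type = 'BASE TABLE'
          AND engine <> 'InnoDB'
          AND table_schema NOT IN {SYSTEM_SCHEMAS}""",
    # Tables with no PRIMARY KEY constraint.
    "tables_without_primary_key": f"""
        SELECT t.table_schema, t.table_name
        FROM information_schema.tables AS t
        LEFT JOIN information_schema.table_constraints AS c
          ON c.table_schema = t.table_schema
         AND c.table_name = t.table_name
         AND c.constraint_type = 'PRIMARY KEY'
        WHERE t.table_type = 'BASE TABLE'
          AND c.constraint_name IS NULL
          AND t.table_schema NOT IN {SYSTEM_SCHEMAS}""",
    # Triggers and events to review by hand.
    "triggers": f"""
        SELECT trigger_schema, trigger_name
        FROM information_schema.triggers
        WHERE trigger_schema NOT IN {SYSTEM_SCHEMAS}""",
    "events": """
        SELECT event_schema, event_name
        FROM information_schema.events""",
}


def run_checks(cursor):
    """Run every checklist query and return {check name: list of rows}."""
    findings = {}
    for name, sql in CHECKS.items():
        cursor.execute(sql)
        findings[name] = list(cursor.fetchall())
    return findings
```

An empty list for the first two checks means the schema passes that item; triggers and events are only listed, since whether they are a problem depends on what they do.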
A MySQL Galera cluster is...
[Figure: layers of a MySQL Galera node. MySQL with InnoDB and MyISAM, Replication API, Wsrep API, Galera group comm library, connected to the other MySQL nodes]
  SHOW STATUS LIKE "wsrep%"
  SHOW VARIABLES ...
  Snapshot State Transfer: mysqldump, rsync, xtrabackup, etc...
http://www.codership.com/downloads/download-mysqlgalera
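Understanding the transaction sequence in Galera
[Figure: sequence diagram between Master and Slave. The user transaction (BEGIN, SELECT, UPDATE, COMMIT) runs on the master. At COMMIT the write set passes through group communication (=> GTID) and is certified on both nodes. On success the master does the InnoDB commit and COMMIT returns after a commit delay, while the slave applies the write set and does its InnoDB commit. On failure the master does a ROLLBACK and the slave discards the write set.]
  Virtual synchrony = committed events written to InnoDB after a small delay
  Optimistic locking between nodes = risk for deadlocks
  Certification = deterministic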
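Why certification can be deterministic is easier to see in a toy model. The sketch below is a simplification and not Galera's implementation: `Certifier`, the row-key strings and the bookkeeping are hypothetical. It assumes group communication delivers write sets to every node in the same order, and that a write set fails if any row it touches was written by a transaction certified after the write set's own snapshot.

```python
class Certifier:
    """Toy certification: the first committer wins on conflicting rows."""

    def __init__(self):
        self.seqno = 0          # position in the global order of write sets
        self.last_writer = {}   # row key -> seqno of the last certified write

    def certify(self, snapshot_seqno, keys):
        """Return the write set's seqno, or None if it must be rolled back."""
        self.seqno += 1
        for key in keys:
            if self.last_writer.get(key, 0) > snapshot_seqno:
                return None  # a conflicting write was certified after our snapshot
        for key in keys:
            self.last_writer[key] = self.seqno
        return self.seqno


# Two nodes see the same ordered stream, so they reach the same verdicts
# without exchanging any further messages.
stream = [(0, {"t1:1"}), (0, {"t1:1"}), (1, {"t1:1"}), (0, {"t1:2"})]
node_a, node_b = Certifier(), Certifier()
verdicts_a = [node_a.certify(s, k) for s, k in stream]
verdicts_b = [node_b.certify(s, k) for s, k in stream]
print(verdicts_a == verdicts_b, verdicts_a)
```

The second write set in the stream conflicts with the first and is rejected on both nodes, which is the "deadlock" the originating client sees at COMMIT.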
What if I only have 2 nodes?
Galera Arbitrator (garbd)
  Acts as a 3rd node in a cluster but doesn't store the data.
  Run it on an app server.
  Run it on any other available server.
  Note: Do not run a 3rd node in a VM on the same hypervisor as other Galera nodes. (Why?)
Master-slave clustering
  Pacemaker, Heartbeat, etc...
-  Manual failover?
  Still better than MySQL replication or DRBD: Hot standby, multi-threaded slave...
  Prioritize data integrity:
set global wsrep_on=0 # (at failover)
  Prioritize failover speed:
pc.ignore_quorum=on # (at startup)
Optimistic locking cluster-wide
  ...theoretical chance of deadlocks
-  In most cases less than 1 out of 10,000 trx
-  Correct solution: Catch exceptions in app and retry
-  Design: Avoid hot-spots in tables
-  Workaround: Directing all writes (or all problematic writes) to a single node brings back 100% InnoDB compatibility
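"Catch exceptions in app and retry" could look like the sketch below. `DeadlockError` is a stand-in for whatever exception your driver raises for MySQL error 1213 (ER_LOCK_DEADLOCK), and `run_with_retry` with its parameters is a hypothetical helper, not part of any driver.

```python
import random
import time


class DeadlockError(Exception):
    """Stand-in for the driver's deadlock exception (MySQL error 1213)."""


def run_with_retry(transaction, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run transaction(); on a deadlock, wait briefly and run it again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
            # Randomised, growing pause so competing clients do not retry in lockstep.
            sleep(base_delay * attempt * random.random())
```

The `transaction` callable must run the whole transaction from BEGIN to COMMIT, because a rolled-back transaction has to be replayed from the start, not just its last statement.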
Snapshot options
SST = Full snapshot
  mysqldump & rsync will block donor
-  Dedicate 1 node to act as donor
  Xtrabackup is a non-blocking option
  Really big databases
-  wsrep_sst_method=skip + manual backup & restore
-  wsrep_sst_method=fedex :-)
IST = Incremental State Transfer
  Logic: IST is preferred over SST
  gcache.size <= DB size
gcache.size >= wsrep_replicated_bytes * <outage duration>
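The two sizing rules can be worked through with numbers. The sketch assumes the `wsrep_replicated_bytes` term is read as a rate: the status counter is cumulative, so it is sampled twice and divided by the interval. The function names and the example figures are illustrative assumptions.

```python
def write_rate(bytes_start, bytes_end, interval_seconds):
    """Replicated bytes per second, from two samples of the status counter."""
    return (bytes_end - bytes_start) / interval_seconds


def recommended_gcache_bytes(db_size_bytes, rate_bytes_per_s, outage_seconds):
    """Enough cache to cover the outage, but never more than the database itself."""
    needed = rate_bytes_per_s * outage_seconds
    return min(db_size_bytes, needed)


# Example: the counter grew by 60 MB in one minute, and a node may be down for an hour.
rate = write_rate(1_000_000, 61_000_000, 60)
print(recommended_gcache_bytes(20 * 10**9, rate, 3600))
```

The upper bound reflects the first rule on the slide: once the cache would have to be as large as the database, a full SST costs no more than an IST.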
Benchmarks
http://codership.com/info/benchmarks
Sysbench disk bound (20GB data / 6GB buffer), tps
  EC2 w local disk
-  Note: pretty poor I/O here
  Blue vs red: turning off innodb_flush_log_at_trx_commit gives 66% improvement
  Scale-out factors:
2N = 0.5 x 1N
4N = 0.5 x 2N
  5th node was EC2 weakness. Later test scaled a little more up to 8 nodes
http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
Sysbench disk bound (20GB data / 6GB buffer), latency
  As before
  Not syncing InnoDB decreases latency
  Scale-out decreases latency
  Galera does not add latency overhead
http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
Galera and NDB shootout: sysbench "out of the box"
  Galera is 4x better
Ok, so what does this really mean?
  That Galera is better...
-  For this workload
-  With default settings (Severalnines)
-  Pretty user friendly and general purpose
  NDB
-  Excels at key-value and heavy-write workloads (which sysbench is not)
-  Would benefit here from PARTITION BY RANGE
http://codership.com/content/whats-difference-kenneth
Drupal on Galera: baseline w single server
  Drupal, Apache, PHP, MySQL 5.1
  JMeter
-  3 types of users: poster, commenter, reader
-  Gaussian (15, 7) think time
  Large EC2 instance
  Ideal scalability: linear until tipping point at 140-180 users
-  Constrained by Apache/PHP CPU utilization
-  Could scale out by adding more Apache in front of single MySQL
http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
Drupal on Galera: Scale-out with 1-4 Galera nodes (tps)
  Drupal, Apache, PHP, MySQL 5.1 w Galera
  1-4 identical nodes
-  Whole stack cluster
-  MySQL connection to localhost
  Multiply nr of users
-  180, 360, 540, 720
  3 nodes = linear scalability, 4 nodes still near-linear
  Minimal latency overhead
http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
Drupal on Galera: Scale-out with 1-4 Galera nodes (latency)
  Like before
  Constant nr of users
-  180, 180, 180, 180
  Scaling from 1 to 2
-  drastically reduces latency
-  tps back to linear scalability
  Scaling to 3 and 4
-  No more tps as there was no bottleneck.
-  Slightly better latency
-  Note: No overhead from additional nodes!
http://codership.com/content/scaling-drupal-stack-galera-part-2-mystery-failed-login
WAN replication, EC2 eu-west + us-east, tps
[Chart labels: client eu-west, db in us-east]
http://codership.com/content/synchronous-replication-loves-you-again
WAN replication, EC2 eu-west + us-east, latency
[Chart labels: client eu-west, db in us-east]
http://codership.com/content/synchronous-replication-loves-you-again
Conclusion: WAN only adds commit latency, which is usually ok
EU-west <-> US-east
-  90 ms
-  "best case"
EU <-> JPN
-  275 ms
EU <-> JPN <-> USA
-  295 ms
You can choose latency between:
-  user and web server = ok
-  web server and db = bad
-  db and db = great!
http://codership.com/content/synchronous-replication-loves-you-again
http://www.mysqlperformanceblog.com/2012/01/11/making-the-impossible-3-nodes-intercontinental-replication/
