Patroni: Kubernetes-native PostgreSQL companion
PGConf APAC 2018
Singapore
ALEXANDER KUKUSHKIN
23-03-2018
ABOUT ME
Alexander Kukushkin
Database Engineer @ZalandoTech
Email: alexander.kukushkin@zalando.de
Twitter: @cyberdemn
ZALANDO
15 markets
6 fulfillment centers
20 million active customers
3.6 billion € net sales 2016
165 million visits per month
12,000 employees in Europe
FACTS & FIGURES
> 300 databases on premise
> 150 on AWS EC2
> 200 on K8S
AGENDA
● Bot pattern and Patroni
● Postgres-operator
● Patroni on Kubernetes, first attempt
● Kubernetes-native Patroni
● Live demo
Bot pattern and Patroni
● Small Python daemon
● Implements the “bot” pattern
● Runs next to PostgreSQL
● Decides on promotion/demotion
● Uses a DCS to run leader election and keep the cluster state
DCS
● Distributed Consensus/Configuration Store (key-value)
● Uses Raft (Etcd, Consul) or ZAB (ZooKeeper)
● A write succeeds only if a majority of nodes acknowledge it (quorum)
● Supports atomic operations (CompareAndSet)
● Can expire objects after a TTL
http://thesecretlivesofdata.com/raft/
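To make the CompareAndSet and TTL properties concrete, here is a minimal single-process sketch. `ToyDCS`, its method names and the injectable clock are hypothetical; it models only the atomic operations and key expiry from the slide, not Raft/ZAB replication or quorum writes.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Entry:
    value: str
    expires_at: float  # absolute time on the store's clock


class ToyDCS:
    """In-memory stand-in for a DCS: atomic CAS plus TTL expiry, nothing else."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Entry] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry.expires_at <= self._clock():
            self._data.pop(key, None)  # lazily drop expired keys
            return None
        return entry.value

    def create(self, key: str, value: str, ttl: float) -> bool:
        """Succeeds only if the key does not exist yet (prevExists=False)."""
        if self.get(key) is not None:
            return False
        self._data[key] = Entry(value, self._clock() + ttl)
        return True

    def compare_and_set(self, key: str, value: str, ttl: float, prev_value: str) -> bool:
        """Succeeds only if the current value equals prev_value; renews the TTL."""
        if self.get(key) != prev_value:
            return False
        self._data[key] = Entry(value, self._clock() + ttl)
        return True
```

In the bot pattern the leader would call something like `compare_and_set("/leader", "A", ttl=30, prev_value="A")` on every HA-loop iteration, while standbys can only succeed with `create()` after the key has expired.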
Bot pattern: leader alive
● Node A (primary): UPDATE("/leader", "A", ttl=30, prevValue="A") -> success
● Node B (standby): WATCH(/leader)
● Node C (standby): WATCH(/leader)
● DCS: /leader: "A", ttl: 30
Bot pattern: master dies, leader key holds
● Node A (primary): dies
● Node B (standby): WATCH(/leader)
● Node C (standby): WATCH(/leader)
● DCS: /leader: "A", ttl: 17
Bot pattern: leader key expires
● Node B (standby): Notify(/leader, expired=true)
● Node C (standby): Notify(/leader, expired=true)
● DCS: /leader: "A", ttl: 0
Bot pattern: who will be the next master?
● Node B (standby):
○ GET A:8008/patroni -> failed/timeout
○ GET C:8008/patroni -> wal_position: 100
● Node C (standby):
○ GET A:8008/patroni -> failed/timeout
○ GET B:8008/patroni -> wal_position: 100
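The decision on this slide can be sketched as a pure function. This is a simplification: `is_healthiest` is a hypothetical name, WAL positions are plain integers, and only the comparison shown on the slide is modeled (real Patroni applies further checks, such as `maximum_lag_on_failover`).

```python
from typing import Dict, Optional


def is_healthiest(my_wal_position: int, peers: Dict[str, Optional[int]]) -> bool:
    """Decide whether this standby may take part in the leader race.

    peers maps member name -> WAL position reported by GET <member>:8008/patroni,
    or None when the request failed or timed out (like node A on the slide).
    Unreachable members are ignored; any reachable peer strictly ahead of us
    disqualifies us. Ties are allowed, so equally healthy nodes all race.
    """
    for position in peers.values():
        if position is None:
            continue  # failed/timeout: that member cannot lead anyway
        if position > my_wal_position:
            return False
    return True
```

With the values from the slide, both B and C return `True`, which is why the next step is a race among equals.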
Bot pattern: leader race among equals
● Node B (standby): CREATE("/leader", "B", ttl=30, prevExists=False) -> FAIL
● Node C (standby): CREATE("/leader", "C", ttl=30, prevExists=False) -> SUCCESS
● DCS: /leader: "C", ttl: 30
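Because CREATE with prevExists=False is atomic in the DCS, at most one of several equally healthy standbys can win. The sketch below imitates that with a lock-protected dict and concurrent threads; `AtomicStore` and `race` are hypothetical names, and the TTL is omitted for brevity.

```python
import threading
from typing import Dict, List, Optional


class AtomicStore:
    """Tiny stand-in for a DCS: create() is atomic thanks to a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, str] = {}

    def create(self, key: str, value: str) -> bool:
        """Like CREATE(key, value, prevExists=False): fails if key is present."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._data.get(key)


def race(store: AtomicStore, candidates: List[str]) -> Dict[str, bool]:
    """Let every candidate try to create /leader at the same moment."""
    results: Dict[str, bool] = {}
    barrier = threading.Barrier(len(candidates))  # release all attempts together

    def attempt(name: str) -> None:
        barrier.wait()
        results[name] = store.create("/leader", name)

    threads = [threading.Thread(target=attempt, args=(n,)) for n in candidates]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Which candidate wins is nondeterministic, but exactly one `create()` returns `True`, and only that node goes on to promote.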
Bot pattern: promote and continue replication
● Node B (standby): WATCH(/leader), continues replication
● Node C: promote -> primary
● DCS: /leader: "C", ttl: 30
DCS STRUCTURE
● /service/cluster-name/
○ config {"postgresql":{"parameters":{"max_connections":300}}}
○ initialize "6303731710761975832" (database system identifier)
○ members/
■ dbnode1 {"role":"replica","state":"running","conn_url":"postgres://172.17.0.2:5432/postgres"}
■ dbnode2 {"role":"master","state":"running","conn_url":"postgres://172.17.0.3:5432/postgres"}
○ leader dbnode2
○ optime/
■ leader "67393608" # ← absolute WAL position
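Reading this layout from a client could look like the sketch below, assuming the keys have already been fetched into a flat dict (the cluster name `demo` and the helper names are hypothetical). It also shows that the absolute WAL position stored under `optime/leader` converts to PostgreSQL's `X/X` LSN notation by splitting it into its high and low 32 bits: 67393608 is `0/4045848`.

```python
import json
from typing import Dict

# Keys from the slide, flattened into a dict; "demo" stands in for cluster-name.
EXAMPLE_KEYS: Dict[str, str] = {
    "/service/demo/config": '{"postgresql":{"parameters":{"max_connections":300}}}',
    "/service/demo/initialize": "6303731710761975832",
    "/service/demo/members/dbnode1":
        '{"role":"replica","state":"running","conn_url":"postgres://172.17.0.2:5432/postgres"}',
    "/service/demo/members/dbnode2":
        '{"role":"master","state":"running","conn_url":"postgres://172.17.0.3:5432/postgres"}',
    "/service/demo/leader": "dbnode2",
    "/service/demo/optime/leader": "67393608",
}


def lsn_from_position(position: int) -> str:
    """Format an absolute WAL position the way PostgreSQL prints a pg_lsn."""
    return "%X/%X" % (position >> 32, position & 0xFFFFFFFF)


def leader_conn_url(keys: Dict[str, str], cluster: str) -> str:
    """Follow the leader key to the member key and return its conn_url."""
    prefix = f"/service/{cluster}/"
    leader = keys[prefix + "leader"]
    member = json.loads(keys[prefix + "members/" + leader])
    return member["conn_url"]
```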
AWS DEPLOYMENT
KUBERNETES
“Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units (Pods) for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community.”
kubernetes.io
Spilo & Patroni on K8S v1
● StatefulSet: demo
○ Pod: demo-0, role: replica (own Node and PersistentVolume)
○ Pod: demo-1, role: master (own Node and PersistentVolume)
● Secret: demo
● Patroni: UPDATE(), WATCH()
● Service: demo, labelSelector: role=master
● Service: demo-replica, labelSelector: role=replica
Spilo & Patroni on K8S v1
● Deploy Etcd on Kubernetes
● Deploy Spilo with a PetSet (the old name for StatefulSet)
● Quickly hack a callback script for Patroni which labels the Pod it runs in with the current role (master, replica)
● Use Services with labelSelectors for traffic routing
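A minimal sketch of such a callback, not the script Spilo actually shipped. It assumes that Patroni invokes callback scripts with the action, the role and the cluster name as arguments, that `kubectl` is available in the container, that the Pod's ServiceAccount is allowed to patch Pods, and that `HOSTNAME` equals the Pod name (the Kubernetes default).

```python
#!/usr/bin/env python3
"""Hypothetical Patroni callback that labels the current Pod with its role."""
import os
import subprocess
import sys
from typing import List


def label_command(pod_name: str, role: str) -> List[str]:
    """Build the kubectl command that (over)writes the Pod's role label."""
    return ["kubectl", "label", "pod", pod_name, f"role={role}", "--overwrite"]


def main(argv: List[str]) -> int:
    # Assumed calling convention: <script> <action> <role> <cluster name>
    if len(argv) != 4:
        print(f"usage: {argv[0]} ACTION ROLE CLUSTER", file=sys.stderr)
        return 1
    action, role = argv[1], argv[2]
    if action not in ("on_start", "on_role_change"):
        return 0  # nothing to relabel for other events
    pod_name = os.environ["HOSTNAME"]  # assumption: hostname == Pod name
    return subprocess.run(label_command(pod_name, role)).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The Services `demo` and `demo-replica` then pick up the relabeled Pods through their `role=master` and `role=replica` selectors.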
Can we get rid of Etcd?
● Use a labelSelector to find all Kubernetes objects associated with the given cluster
○ Pods: cluster members
○ ConfigMaps or Endpoints: to keep the configuration
● On every iteration of the HA loop we update labels and metadata on these objects (the same way as we update keys in Etcd)
● It is even possible to do CAS operations using the K8S API
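The CAS operation relies on Kubernetes optimistic concurrency: every object carries `metadata.resourceVersion`, and an update sent with a stale version is rejected with HTTP 409 Conflict. The in-memory `FakeApiServer` below is a hypothetical stand-in that reproduces just that rule so the behavior can be tried without a cluster; the `leader` annotation is an assumed name.

```python
import copy
from typing import Dict


class Conflict(Exception):
    """Stands in for the HTTP 409 Conflict returned by the API server."""


class FakeApiServer:
    """Stores objects with an opaque resourceVersion, like the K8S API."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}
        self._version = 0

    def create(self, name: str, annotations: Dict[str, str]) -> dict:
        self._version += 1
        obj = {"metadata": {"name": name, "resourceVersion": str(self._version),
                            "annotations": dict(annotations)}}
        self._objects[name] = obj
        return copy.deepcopy(obj)

    def get(self, name: str) -> dict:
        return copy.deepcopy(self._objects[name])

    def replace(self, obj: dict) -> dict:
        """Reject the write if the caller read a stale version (CAS)."""
        name = obj["metadata"]["name"]
        current = self._objects[name]
        if obj["metadata"]["resourceVersion"] != current["metadata"]["resourceVersion"]:
            raise Conflict(name)
        self._version += 1
        stored = copy.deepcopy(obj)
        stored["metadata"]["resourceVersion"] = str(self._version)
        self._objects[name] = stored
        return copy.deepcopy(stored)
```

Two members that read the same version and both try to write: the first write wins, the second gets `Conflict` and must re-read before deciding what to do.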
No K8S API for expiring objects
How to do leader election?
Do it on the client side!
● The leader periodically updates a ConfigMap or Endpoint
○ The update must happen as a CAS operation
○ Demote to read-only if the update fails
● All other members check that the leader ConfigMap (or Endpoint) is being updated
○ If there are no updates during the TTL => do leader election
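One way to implement the "no updates during the TTL" check on the client side, sketched under assumptions: `LeaderObserver` is hypothetical, the leader record is any opaque value that changes on every renewal (for example the object's resourceVersion), and each member measures the time since it last saw a change on its own monotonic clock. Clocks never have to agree, but they must tick at a similar rate, which is the clock-skew-rate limitation listed among the cons of this approach.

```python
import time
from typing import Callable, Optional


class LeaderObserver:
    """Decides when a non-leader member may start a leader election."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._last_record: Optional[str] = None
        self._last_change = clock()

    def observe(self, record: Optional[str]) -> bool:
        """Feed the latest leader record; return True if the leader looks dead.

        None means there is no leader record at all. Any change of the record
        counts as a renewal and restarts the local TTL countdown.
        """
        now = self._clock()
        if record != self._last_record:
            self._last_record = record
            self._last_change = now
        return record is None or now - self._last_change > self.ttl
```

Because only the moment of observation matters, a leader whose clock is minutes off is still treated correctly as long as it keeps changing the record.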
Kubernetes-native Patroni
● StatefulSet: demo
○ Pod: demo-0, role: replica (own Node and PersistentVolume)
○ Pod: demo-1, role: master (own Node and PersistentVolume)
● Endpoint: demo, Service: demo
● Endpoint: demo-config
● Secret: demo
● Patroni: UPDATE(), WATCH()
● Service: demo-replica, labelSelector: role=replica
DEMO TIME
Kubernetes API as DCS
PROS
● No dependency on Etcd
● When using an Endpoint for leader election, we can also maintain subsets with the IP of the leader Pod
● 100% Kubernetes-native solution
CONS
● Can’t tolerate an arbitrary clock skew rate
● OpenShift doesn’t allow putting IPs from the Pod range into the Endpoint
● The SLA for the K8S API on GCE promises only 99.5% availability
DEPLOYMENT
How to deploy it
● kubectl create -f your-cluster.yaml
● Use Patroni Helm Chart + Spilo
● Use postgres-operator
POSTGRES-OPERATOR
● Creates the CustomResourceDefinition Postgresql and watches it
● When a new Postgresql object is created, deploys a new cluster
○ Creates Secrets, Endpoints, Services and a StatefulSet
● When a Postgresql object is updated, updates the StatefulSet
○ and does a rolling upgrade
● Periodically syncs running clusters with their manifests
● When a Postgresql object is deleted, cleans everything up
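At its core this is a reconcile loop that diffs desired manifests against running clusters. The real postgres-operator is written in Go and reacts to watch events; the Python sketch below only illustrates the diffing step, using a hypothetical, heavily reduced `ClusterSpec`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical, heavily simplified view of a Postgresql manifest."""
    instances: int
    image: str


def reconcile(desired: Dict[str, ClusterSpec],
              running: Dict[str, ClusterSpec]) -> List[Tuple[str, str]]:
    """Compare manifests with running clusters and list the actions to take."""
    actions: List[Tuple[str, str]] = []
    for name, spec in sorted(desired.items()):
        if name not in running:
            # new object: create Secrets, Endpoints, Services and StatefulSet
            actions.append(("create", name))
        elif running[name] != spec:
            # changed object: update the StatefulSet, then rolling upgrade
            actions.append(("update", name))
    for name in sorted(running):
        if name not in desired:
            actions.append(("delete", name))  # manifest gone: clean everything up
    return actions
```

Running the same function periodically, not only on events, is what the "periodically syncs" bullet describes: drift gets corrected even if a watch event was missed.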
DEPLOYMENT WITH OPERATOR
CLUSTER STATUS
Diagram: a Kubernetes cluster with a database deployer, a PostgreSQL manifest, the Postgres operator pod and its config map, cluster secrets, a StatefulSet, Spilo pods running Patroni, an Endpoint, a Service and a client application. Labeled arrows: create, watch, deploy, and "update with actual master role".
Monitoring & Backups
● Things to monitor:
○ Pods status (via K8S API)
○ Patroni & PostgreSQL state
○ Replication state and lag
● Always do backups!
○ And always test them!
For every Pod in the cluster, GET http://$POD_IP:8008/patroni, check that state=running and compare xlog_position with the master's.
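The check above could be scripted as follows. This sketch assumes each status has already been normalized to flat `state`, `role` and integer `xlog_position` fields (the actual JSON layout returned by `/patroni` differs between Patroni versions); `check_cluster` and `max_lag_bytes` are hypothetical names.

```python
import json
import urllib.request
from typing import Dict, List


def fetch_status(pod_ip: str, timeout: float = 2.0) -> dict:
    """GET http://$POD_IP:8008/patroni and decode the JSON body."""
    url = f"http://{pod_ip}:8008/patroni"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_cluster(statuses: Dict[str, dict], max_lag_bytes: int) -> List[str]:
    """Return human-readable problems; an empty list means healthy."""
    masters = [s for s in statuses.values() if s.get("role") == "master"]
    if len(masters) != 1:
        return [f"expected exactly one master, found {len(masters)}"]
    master_pos = masters[0]["xlog_position"]
    problems: List[str] = []
    for pod, status in sorted(statuses.items()):
        if status.get("state") != "running":
            problems.append(f"{pod}: state={status.get('state')}")
        elif master_pos - status["xlog_position"] > max_lag_bytes:
            lag = master_pos - status["xlog_position"]
            problems.append(f"{pod}: lag {lag} bytes")
    return problems
```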
Our learnings
● We run Kubernetes on top of AWS infrastructure
○ Availability of the K8S API in our case is very close to 100%
○ PersistentVolume (EBS) attach/detach is sometimes buggy and slow
● Kubernetes cluster upgrade
○ Requires rotating all nodes and can cause multiple switchovers
■ Solved thanks to postgres-operator: now only one switchover is needed
● Kubernetes node autoscaler
○ Sometimes terminates the nodes where Spilo/Patroni/PostgreSQL runs
■ Patroni handles it gracefully by doing a switchover
LINKS
● Patroni: https://github.com/zalando/patroni
● Patroni Documentation: https://patroni.readthedocs.io
● Spilo: https://github.com/zalando/spilo
● Helm chart: https://github.com/unguiculus/charts/tree/feature/patroni/incubator/patroni
● Postgres-operator: https://github.com/zalando-incubator/postgres-operator
Thank you!
