PostgreSQL + Kafka
The Delight of Change Data Capture
Jeff Klukas - Data Engineer at Simple
1
2
Overview
Commit logs: what are they?
Write-ahead logging (WAL)
Commit logs as a data store
Demo: change data capture
Use cases
3
https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/
Commit Logs
4
Ordered Immutable Durable
In practice, old logs can be deleted or archived
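A minimal sketch of the first two properties, plus the retention caveat, assuming a toy in-memory log. The class and method names are made up for illustration, and durability is deliberately not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CommitLog:
    """Toy append-only log (illustrative names, not a real API)."""
    base_offset: int = 0  # offset of the oldest record still retained
    _records: List[bytes] = field(default_factory=list)

    def append(self, record: bytes) -> int:
        """Append a record and return its offset; existing records are never modified."""
        self._records.append(record)
        return self.base_offset + len(self._records) - 1

    def read_from(self, offset: int) -> List[Tuple[int, bytes]]:
        """Return (offset, record) pairs in log order, starting at `offset`."""
        if offset < self.base_offset:
            raise LookupError(f"offset {offset} was already deleted or archived")
        start = offset - self.base_offset
        return [(offset + i, r) for i, r in enumerate(self._records[start:])]

    def truncate_before(self, offset: int) -> None:
        """Drop records older than `offset`; offsets of the remaining records do not change."""
        drop = max(0, min(offset - self.base_offset, len(self._records)))
        del self._records[:drop]
        self.base_offset += drop
```

Readers track their own position as an offset, so deleting old records does not disturb anyone who has already read past them.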
6
Write-Ahead Logging (WAL)
7
“WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage”
– https://www.postgresql.org/docs/current/static/wal-intro.html
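The ordering in that quote can be sketched with a toy key-value store. This is not how PostgreSQL implements WAL; `TinyStore` and its JSON-lines log format are assumptions made for illustration, and torn or partial log writes are not handled.

```python
import json
import os


class TinyStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.data = {}  # stands in for the "data files"
        self._recover()

    def _recover(self) -> None:
        # Replay the log to rebuild state after a restart or crash.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as wal:
            for line in wal:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value) -> None:
        # 1. Describe the change in the log and flush it to permanent storage.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only now apply the change to the data itself.
        self.data[key] = value
```

Because the log is written first, a new `TinyStore` opened on the same path can rebuild every acknowledged change by replaying it.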
8
“Logical decoding is the process of extracting all persistent changes to a database's tables into a coherent, easy to understand format which can be interpreted without detailed knowledge of the database's internal state.”
– https://www.postgresql.org/docs/9.4/static/logicaldecoding-explanation.html
9
10
Topic Partitions
11
Topics
12
Compacted Topics
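A rough sketch of what a compacted topic converges to, assuming Kafka's documented semantics: the latest value per key is kept, and a null value acts as a tombstone. Real compaction runs in the background and keeps tombstones around for a while; `compact` is a hypothetical helper that shows only the end state.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Message = Tuple[str, Optional[str]]  # (key, value); a value of None is a tombstone


def compact(messages: Iterable[Message]) -> List[Message]:
    """Keep the latest message per key, in log order; drop keys whose latest value is None."""
    latest: Dict[str, Tuple[int, Optional[str]]] = {}
    for position, (key, value) in enumerate(messages):
        latest[key] = (position, value)  # later messages overwrite earlier ones
    survivors = sorted(
        (position, key, value)
        for key, (position, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]
```

With one message per row change, keyed by primary key, the compacted topic ends up holding one message per live row.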
13
https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/
14
INSERT INTO transactions
VALUES (56789, 20.00);
Bottled Water - Message Key
{ "transaction_id": { "int": 56789 } }
Bottled Water - Message Value
{
  "transaction_id": {"int": 56789},
  "amount": {"double": 20.00}
}
15
UPDATE transactions
SET amount = 25.00
WHERE transaction_id = 56789;
Bottled Water - Message Key
{ "transaction_id": { "int": 56789 } }
Bottled Water - Message Value
{
  "transaction_id": {"int": 56789},
  "amount": {"double": 25.00}
}
16
DELETE FROM transactions
WHERE transaction_id = 56789;
Bottled Water - Message Key
{ "transaction_id": { "int": 56789 } }
Bottled Water - Message Value
null
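The mapping in the three examples above (primary key as message key, full row as message value, null value on delete) can be sketched as follows. This is not Bottled Water's actual implementation; `to_message`, `avro_json`, and the type-name table are hypothetical helpers that only mimic the JSON rendering shown on the slides.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical mapping from Python types to Avro type names, for this sketch only.
AVRO_TYPE_NAMES = {int: "int", float: "double", str: "string"}


def avro_json(row: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap each non-null field as {"<type>": value}, matching the slides' rendering."""
    return {
        name: None if value is None else {AVRO_TYPE_NAMES[type(value)]: value}
        for name, value in row.items()
    }


def to_message(operation: str, row: Dict[str, Any],
               key_columns: List[str]) -> Tuple[str, Optional[str]]:
    """Return (key, value) for one row change; a delete yields a None value."""
    key = json.dumps(avro_json({c: row[c] for c in key_columns}))
    if operation == "delete":
        return key, None
    return key, json.dumps(avro_json(row))
```

Inserts and updates for the same row share a key, so a consumer (or a compacted topic) can treat the most recent value as the row's current state.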
17
Use Cases
[Architecture diagram, built up one slide at a time; each line lists what the next slide adds]
tx-service, tx-postgres
+ tx-pgkafka, Kafka topic: tx-pgkafka
+ demux-service
+ Kafka topic: customers-table, Kafka topic: transactions-table
+ activity-service, activity-postgres, activity-pgkafka, Kafka topic: activity-pgkafka
+ Amazon Redshift (Data Warehouse), Amazon S3 (Data Lake), analytics-service
The complete diagram is then shown three more times, labeled in turn: Change Data Capture, Messaging, Analytics
26
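One way to picture the demux-service step, where the single tx-pgkafka topic fans out into per-table topics. The message format is an assumption: each message is taken to be JSON with a `table` field, which the slides do not show. The Kafka consumer and producer are left out so the routing logic stands alone.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def demux(messages: Iterable[str]) -> Dict[str, List[dict]]:
    """Group change messages by destination topic, preserving order within each table."""
    topics: Dict[str, List[dict]] = defaultdict(list)
    for raw in messages:
        change = json.loads(raw)
        # Topic naming follows the slides: customers-table, transactions-table.
        topics[f"{change['table']}-table"].append(change)
    return dict(topics)
```

Downstream consumers can then subscribe only to the tables they care about.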
Recap
Commit logs: what are they?
Write-ahead logging (WAL)
Commit logs as a data store
Demo: change data capture
Use cases
27
Also See…
• Blog post on Simple’s CDC pipeline
• https://www.simple.com/engineering
• Bottled Water: https://github.com/confluentinc/bottledwater-pg
• Debezium (CDC to Kafka from Postgres, MySQL, or MongoDB)
• http://debezium.io/
• https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka
• https://www.confluent.io/kafka-summit-sf17/
• Martin Kleppmann, Making Sense of Stream Processing eBook
Thank You
28
Extras
29
30
The Dual Write Problem
https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/
31
Redshift Architecture
Amazon Redshift
Replicating to Redshift
32
33
Table Schema
CREATE TABLE pgkafka_txservice_transactions (
  pg_lsn NUMERIC(20,0) ENCODE raw,
  pg_txn_id BIGINT ENCODE lzo,
  pg_operation CHAR(6) ENCODE bytedict,
  pg_txn_timestamp TIMESTAMP ENCODE lzo,
  ingestion_timestamp TIMESTAMP ENCODE lzo,
  transaction_id INT ENCODE lzo,
  amount NUMERIC(18,2) ENCODE lzo
)
DISTKEY (transaction_id)
SORTKEY (transaction_id, pg_lsn, pg_operation);
34
Deduplication
-- Keep one row per pg_lsn: the most recently ingested copy.
CREATE TABLE deduped (LIKE pgkafka_txservice_transactions);
INSERT INTO deduped
SELECT pg_lsn, pg_txn_id, pg_operation, pg_txn_timestamp,
       ingestion_timestamp, transaction_id, amount
FROM (
  SELECT *, ROW_NUMBER()
    OVER (PARTITION BY pg_lsn ORDER BY ingestion_timestamp DESC) AS rn
  FROM pgkafka_txservice_transactions
) AS ranked
WHERE rn = 1;
DROP TABLE pgkafka_txservice_transactions;
ALTER TABLE deduped RENAME TO pgkafka_txservice_transactions;
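The same deduplication rule, sketched outside the warehouse: one row per pg_lsn, preferring the latest ingestion_timestamp. Rows are plain dicts using the column names from the table schema; `dedupe_by_lsn` is a hypothetical helper. On ties it keeps the first row seen, whereas ROW_NUMBER() picks arbitrarily.

```python
from typing import Dict, Iterable, List


def dedupe_by_lsn(rows: Iterable[dict]) -> List[dict]:
    """Return one row per pg_lsn (the most recently ingested), ordered by pg_lsn."""
    newest: Dict[int, dict] = {}
    for row in rows:
        kept = newest.get(row["pg_lsn"])
        if kept is None or row["ingestion_timestamp"] > kept["ingestion_timestamp"]:
            newest[row["pg_lsn"]] = row
    return sorted(newest.values(), key=lambda r: r["pg_lsn"])
```

Since pg_lsn identifies a position in the Postgres log, two rows with the same pg_lsn are treated as the same change delivered more than once.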
35
View of Current State
-- Latest row per transaction_id, unless that row is a delete.
CREATE VIEW current_txservice_transactions AS
SELECT transaction_id, amount
FROM (
  SELECT *, ROW_NUMBER()
      OVER (PARTITION BY transaction_id
            ORDER BY pg_lsn, pg_operation) AS n,
    COUNT(*)
      OVER (PARTITION BY transaction_id ROWS BETWEEN
            UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS c
  FROM pgkafka_txservice_transactions) AS ranked
WHERE n = c
AND pg_operation <> 'delete';
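The view's logic, sketched in plain code: for each transaction_id take the last change by (pg_lsn, pg_operation), and leave the row out if that last change is a delete. The operation values 'insert' and 'update' are assumptions; only 'delete' appears in the SQL. `current_state` is a hypothetical helper.

```python
from typing import Dict, Iterable


def current_state(rows: Iterable[dict]) -> Dict[int, float]:
    """Map transaction_id to its current amount, skipping rows whose last change is a delete."""
    last: Dict[int, dict] = {}
    for row in rows:
        kept = last.get(row["transaction_id"])
        order = (row["pg_lsn"], row["pg_operation"])
        if kept is None or order >= (kept["pg_lsn"], kept["pg_operation"]):
            last[row["transaction_id"]] = row
    return {
        tid: row["amount"]
        for tid, row in last.items()
        if row["pg_operation"] != "delete"
    }
```

Replaying the insert, update, and delete from the earlier slides gives 20.00, then 25.00, then no row at all for transaction 56789.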
