MariaDB MaxScale
Streaming Changes to Kafka
in Real Time
Markus Mäkelä
Massimiliano Pinto
What Is Real-Time Analytics?
How Real-Time Analytics Differs From Batch Analytics
Batch
● Data-oriented process
● Scope is static
● Data is complete
● Output reflects input
Real-Time
● Time-oriented process
● Scope is dynamic
● Data is incremental
● Output reflects changes in input
Change Data Capture
The MariaDB MaxScale CDC System
What Is Change Data Capture in MaxScale?
● Captures changes in committed data
○ MariaDB replication protocol awareness
● Stored as Apache Avro
○ Compact and efficient serialization format
● Simple data streaming service
○ Provides continuous data streams
What Does the CDC System Consist Of?
● Binlog replication relay (a.k.a. Binlog Server)
● Data conversion service
● CDC protocol
● Kafka producer
Replication Proxy Layer
The Binlogrouter Module
Binlog Events
● The master database sends events from its binlog files
● Events sent are a binary representation of the binlog file contents with a header prepended
● Once all events have been sent, the master pauses until new events are ready to be sent
Binlog Event Details
Pos | Event_type | Server_id | End_log_pos | Info
378 | Gtid | 10122 | 420 | BEGIN GTID 0-11-10045
420 | Table_map | 10122 | 465 | table_id: 18 (test.t4)
465 | Write_rows_v1 | 10122 | 503 | table_id: 18 flags: STMT_END_F
503 | Xid | 10122 | 534 | COMMIT /* xid=823 */
Transaction -- TRX1
BEGIN;
INSERT INTO test.t4 VALUES (101);
COMMIT;
Receiving Binlog Events
(Diagram: MariaDB Master Server → Replication Protocol → MaxScale Binlog Server, writing mysql-bin.01045)
● MariaDB replication slave registration allows MaxScale to receive binlog events from the master
● Binlog events are stored in binlog files, the same way the master server stores them
Row-based replication with a full row image is required on the master:
set global binlog_format='ROW';
set global binlog_row_image='full';
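The slave registration above is driven by the MaxScale configuration; a minimal maxscale.cnf sketch of the binlog relay, assuming MaxScale 2.0-era parameter names (service/listener names, port, and paths are illustrative and may differ between versions):

```ini
# Binlog relay: MaxScale registers as a replication slave of the master
# and stores the received events in local binlog files.
[Replication-Service]
type=service
router=binlogrouter
router_options=server_id=4000,binlogdir=/var/lib/maxscale,mariadb10-compatibility=1
user=maxuser
passwd=maxpwd

# Listener that downstream slaves (and the Avro conversion service) use.
[Replication-Listener]
type=listener
service=Replication-Service
protocol=MySQLClient
port=5308
```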
Binlog to Avro Conversion
The Avrorouter Module
Apache Avro™
● A data serialization format
○ Consists of a file header and one or more data blocks
● Specifies an Object Container file format
● Efficient storage of high volume data
○ Schema always stored with data
○ Compact integer representation
○ Supports compression
● Easy to process in parallel due to how the data blocks are stored
● Tooling for Avro is readily available
○ Easy to extract and load into other systems
Source: http://avro.apache.org/
Avro File Conversion
(Diagram: mysql-bin.01045 → AVRO converter → AVRO_file_001, AVRO_file_002 → Data Warehouse Platforms)
● Binlog files are converted to Avro file containers
○ One per database table
● On schema changes a new file sequence is created
● Tunable flow of events
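The conversion pipeline above is likewise configured in maxscale.cnf; a hedged sketch pairing the avrorouter with a CDC listener (service names and port are illustrative; the `source` parameter follows later MaxScale releases, while older ones point at the binlog and Avro directories via router_options):

```ini
# Avro conversion service: reads the binlog files written by the
# binlog relay and converts them into Avro file containers.
[Avro-Service]
type=service
router=avrorouter
source=Replication-Service

# CDC listener that streaming clients (e.g. cdc.py) connect to.
[CDC-Listener]
type=listener
service=Avro-Service
protocol=CDC
port=4001
```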
Avro Schema
{
  "type": "record",
  "namespace": "MaxScaleChangeDataSchema.avro",
  "name": "ChangeRecord",
  "fields": ...
}
• Defines how the data is stored
• Contains some static fields
• MaxScale records are always named ChangeRecord in the MaxScaleChangeDataSchema.avro namespace
Avro Schema - Fields
"fields": [
  { "name": "domain", "type": "int" },
  { "name": "server_id", "type": "int" },
  { "name": "sequence", "type": "int" },
  { "name": "event_number", "type": "int" },
  { "name": "timestamp", "type": "int" },
  { "name": "event_type", "type": { "type": "enum", "name": "EVENT_TYPES",
    "symbols": [ "insert", "update_before", "update_after", "delete" ] } },
  … More fields …
]
• MaxScale adds six default fields
○ Three GTID components
○ Event index inside transaction
○ Event timestamp
○ Type of captured event
• A list of field information
• Constructed from standard Avro data types
Avro Schema - Fields
"fields": [
  { "name": "domain", "type": "int" },
  { "name": "server_id", "type": "int" },
  { "name": "sequence", "type": "int" },
  { "name": "event_number", "type": "int" },
  { "name": "timestamp", "type": "int" },
  { "name": "event_type", "type": {
      "type": "enum",
      "name": "EVENT_TYPES",
      "symbols": [ "insert", "update_before", "update_after", "delete" ]
    }
  },
  { "name": "id", "type": "int", "real_type": "int", "length": -1 },
  { "name": "data", "type": "string", "real_type": "varchar", "length": 255 }
]
CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255));
Avro schema file db1.tbl1.000001.avsc
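A small self-contained Python sketch of how such a schema file can be read: it embeds the example schema above as a string and separates MaxScale's six metadata fields from the table's own columns (the `METADATA_FIELDS` set and variable names are illustrative, not part of MaxScale):

```python
import json

# Schema as generated for CREATE TABLE t1 (id INT ..., data VARCHAR(255))
SCHEMA = '''
{
  "type": "record",
  "namespace": "MaxScaleChangeDataSchema.avro",
  "name": "ChangeRecord",
  "fields": [
    {"name": "domain", "type": "int"},
    {"name": "server_id", "type": "int"},
    {"name": "sequence", "type": "int"},
    {"name": "event_number", "type": "int"},
    {"name": "timestamp", "type": "int"},
    {"name": "event_type", "type": {"type": "enum", "name": "EVENT_TYPES",
      "symbols": ["insert", "update_before", "update_after", "delete"]}},
    {"name": "id", "type": "int", "real_type": "int", "length": -1},
    {"name": "data", "type": "string", "real_type": "varchar", "length": 255}
  ]
}
'''

# The six fields MaxScale always prepends to every ChangeRecord.
METADATA_FIELDS = {"domain", "server_id", "sequence",
                   "event_number", "timestamp", "event_type"}

schema = json.loads(SCHEMA)
columns = [f for f in schema["fields"] if f["name"] not in METADATA_FIELDS]
print([c["name"] for c in columns])       # → ['id', 'data']
print([c["real_type"] for c in columns])  # → ['int', 'varchar']
```

Note that `real_type` and `length` are MaxScale extensions carrying the original SQL type, not standard Avro keys.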
Data Streaming
The CDC Protocol
Data Streaming in MaxScale
• Provide real-time transactional data to a data lake for analytics
• Capture changed data from the binary log events
• Stream from MariaDB to CDC clients in real time
CDC Protocol
● Register as change data client
● Receive change data records
● Query last GTID
● Query change data record statistics
● One client receives an event stream for one table
(Diagram: CDC clients connect over the Change Data Listener Protocol)
CDC Client
● Simple Python 3 command line client for the CDC protocol
● Continuous stream consumer
○ A building block for more complex systems
○ Outputs newline delimited JSON or raw Avro data
● Shipped as a part of MaxScale 2.0
CDC Client - Example Output
[alex@localhost ~]$ cdc.py --user maxuser --password maxpwd db1.tbl1
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "name": "ChangeRecord",
"fields": [{"name": "domain", "type": "int"}, {"name": "server_id", "type": "int"},
{"name": "sequence", "type": "int"}, {"name": "event_number", "type": "int"}, {"name":
"timestamp", "type": "int"}, {"name": "event_type", "type": {"type": "enum", "name":
"EVENT_TYPES", "symbols": ["insert", "update_before", "update_after", "delete"]}},
{"name": "id", "type": "int", "real_type": "int", "length": -1},
{"name": "data", "type": "string", "real_type": "varchar", "length": 255}]}
• Schema is sent first
• Events come after the schema
• New schema sent if the schema changes
CDC Client - Example Output
{"sequence": 2, "server_id": 3000, "data": "Hello", "event_type": "insert", "id": 1, "domain": 0, "timestamp": 1490878875, "event_number": 1}
{"sequence": 3, "server_id": 3000, "data": "world!", "event_type": "insert", "id": 2, "domain": 0, "timestamp": 1490878880, "event_number": 1}
{"sequence": 4, "server_id": 3000, "data": "Hello", "event_type": "update_before", "id": 1, "domain": 0, "timestamp": 1490878914, "event_number": 1}
{"sequence": 4, "server_id": 3000, "data": "Greetings", "event_type": "update_after", "id": 1, "domain": 0, "timestamp": 1490878914, "event_number": 2}
{"sequence": 5, "server_id": 3000, "data": "world!", "event_type": "delete", "id": 2, "domain": 0, "timestamp": 1490878929, "event_number": 1}
INSERT INTO t1 (data) VALUES ("Hello"); -- TRX1
INSERT INTO t1 (data) VALUES ("world!"); -- TRX2
UPDATE t1 SET data = "Greetings" WHERE id = 1; -- TRX3
DELETE FROM t1 WHERE id = 2; -- TRX4
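Because every record carries the three GTID components plus event_number, a consumer can reconstruct transaction boundaries from the stream; a minimal sketch using the sample records above (the helper names are illustrative):

```python
import json
from itertools import groupby

# Sample newline-delimited JSON as emitted by cdc.py (records from above).
LINES = [
    '{"sequence": 4, "server_id": 3000, "data": "Hello", "event_type": "update_before", "id": 1, "domain": 0, "timestamp": 1490878914, "event_number": 1}',
    '{"sequence": 4, "server_id": 3000, "data": "Greetings", "event_type": "update_after", "id": 1, "domain": 0, "timestamp": 1490878914, "event_number": 2}',
    '{"sequence": 5, "server_id": 3000, "data": "world!", "event_type": "delete", "id": 2, "domain": 0, "timestamp": 1490878929, "event_number": 1}',
]

events = [json.loads(line) for line in LINES]

# The GTID (domain-server_id-sequence) identifies the transaction;
# event_number orders the events inside it (update_before/update_after
# arrive as a pair within the same transaction).
def gtid(e):
    return (e["domain"], e["server_id"], e["sequence"])

transactions = {
    key: [e["event_type"] for e in sorted(grp, key=lambda e: e["event_number"])]
    for key, grp in groupby(events, key=gtid)
}
print(transactions)
# → {(0, 3000, 4): ['update_before', 'update_after'], (0, 3000, 5): ['delete']}
```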
CDC Client - Example Output
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "name": "ChangeRecord",
"fields": [{"name": "domain", "type": "int"}, {"name": "server_id", "type": "int"}, {"name":
"sequence", "type": "int"}, {"name": "event_number", "type": "int"}, {"name": "timestamp",
"type": "int"}, {"name": "event_type", "type": {"type": "enum", "name": "EVENT_TYPES",
"symbols": ["insert", "update_before", "update_after", "delete"]}}, {"name": "id", "type": "int",
"real_type": "int", "length": -1}, {"name": "data", "type": "string", "real_type": "varchar", "length":
255}, {"name": "account_balance", "type": "float", "real_type": "float", "length": -1}]}
{"domain": 0, "server_id": 3000, "sequence": 7, "event_number": 1, "timestamp": 1496682140,
"event_type": "insert", "id": 3, "data": "New Schema", "account_balance": 25.0}
ALTER TABLE t1 ADD COLUMN account_balance FLOAT;
INSERT INTO t1 (data, account_balance) VALUES ("New Schema", 25.0);
Kafka Producer
The CDC Kafka Producer
Why Kafka?
[vagrant@maxscale ~]$ ./bin/kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic MyTopic
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "fields": [{"type": "int", "name": "domain"}, {"type": "int", "name":
"server_id"}, {"type": "int", "name": "sequence"}, {"type": "int", "name": "event_number"},
{"type": "int", "name": "timestamp"},
{"type": {"symbols": ["insert", "update_before", "update_after", "delete"], "type": "enum", "name": "EVENT_TYPES"}, "name": "event_type"},
{"type": "int", "name": "id", "real_type": "int", "length": -1}], "name": "ChangeRecord"}
{"domain": 0, "event_number": 1, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 1}
{"domain": 0, "event_number": 2, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 2}
● Isolation of producers and consumers
○ Data can be produced and consumed at any
time
● Good for intermediate storage of streams
○ Data is stored until it is processed
○ Distributed storage makes data persistent
● Widely supported for real-time analytics
○ Druid
○ Apache Storm
● Tooling for Kafka already exists
CDC Kafka Producer
● A Proof-of-Concept Kafka Producer
● Reads JSON generated by the MaxScale CDC Client
● Publishes JSON records to a Kafka cluster
● Simple usage
cdc.py -u maxuser -pmaxpwd -h 127.0.0.1 -P 4001 test.t1 |
cdc_kafka_producer.py --kafka-broker=127.0.0.1:9092 --kafka-topic=MyTopic
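The producer half of that pipe can be sketched without a live broker by duck-typing the client: the function below mirrors a kafka-python-style `send(topic, value)` call, with a stub standing in for `kafka.KafkaProducer(bootstrap_servers=...)` (the stub and function names are illustrative, not MaxScale's actual implementation):

```python
import json

def publish_cdc_lines(producer, lines, topic):
    """Forward newline-delimited JSON records from cdc.py to Kafka.

    `producer` only needs a kafka-python style send(topic, value) method.
    """
    sent = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue               # skip blank lines between records
        json.loads(line)           # raise early on malformed records
        producer.send(topic, line.encode("utf-8"))
        sent += 1
    return sent

class StubProducer:
    """Local stand-in for kafka.KafkaProducer, used here for testing."""
    def __init__(self):
        self.records = []
    def send(self, topic, value):
        self.records.append((topic, value))

stub = StubProducer()
n = publish_cdc_lines(stub, ['{"id": 1}', '', '{"id": 2}'], "MyTopic")
print(n)  # → 2
```

In real use the stub would be replaced by an actual Kafka producer and `lines` by the stdout of cdc.py, as in the pipe above.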
From MaxScale to Kafka
(Diagram: MaxScale → CDC Client → CDC Consumer/Kafka Producer → Kafka, with records delivered over the Change Data Listener Protocol)
Everything Together
(Diagram: MariaDB Master → MaxScale Binlog Server (mysql-bin.01045) → AVRO converter (AVRO_file_001, AVRO_file_002) → Change Data Capture Listener → AVRO streaming → CDC clients)
MaxScale for Streaming Changes
The MaxScale solution provides:
● Easy replication setup from a MariaDB database
● Integrated and configurable Avro file conversion
● Easy data streaming to compatible solutions
● Ready-to-use Python scripts
Thank you