From the Trenches:
Improving Kafka Connect Source
Connector Ingestion from 7 Hours to 30
Minutes
Improving Kafka Connect Ingestion
Kafka Summit London 2024
Rafael Natali
/rafaelnatali
@rafaelmnatali
marionete.co.uk
PROBLEMS
INEFFICIENT
SLOW
HOPELESS
FAULTY
SLUGGISH
UNWORKABLE
USELESS
INVESTIGATION
MONITORING
Enable JMX Metrics
Integrate Prometheus + Grafana
Overall view of Kafka Connect
DOCUMENTATION
https://www.confluent.io/en-gb/blog/how-to-increase-throughput-on-kafka-connect-source-connectors/
record-send-total: 20,000,000 records between 17:00 and 00:00 (7h)
batch-size-avg: <16 KB*
records-per-request-avg: 35
* Kafka producer default value for batch.size (16384 bytes)
ASSUMPTION
Increasing the batch.size will make the ingestion faster.
TESTING
BATCH.SIZE INCREASE
batch.size = number of records * average record size in bytes
batch.size = 1500 * 493 bytes
batch.size = 739500 bytes
"producer.override.batch.size": 739500
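The sizing rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names and the placeholder connector name are hypothetical, and only the 1500-record target, the 493-byte average record size, and the "producer.override.batch.size" key come from the slides. Whether a worker accepts producer.override.* settings depends on its connector.client.config.override.policy.

```python
import json


def batch_size_bytes(records_per_batch: int, avg_record_bytes: int) -> int:
    """batch.size = number of records * average record size in bytes."""
    return records_per_batch * avg_record_bytes


def with_batch_size_override(connector_config: dict, batch_size: int) -> dict:
    """Return a copy of the connector config with the producer override set."""
    updated = dict(connector_config)
    updated["producer.override.batch.size"] = batch_size
    return updated


# Figures from the slides: target of 1500 records per batch, 493 bytes per record.
target = batch_size_bytes(1500, 493)

# Hypothetical base config; "example-source" is a placeholder, not from the talk.
base_config = {"name": "example-source"}
config = with_batch_size_override(base_config, target)

print(json.dumps(config, indent=2))
```

The computed value matches the 739500 bytes on the slide. The base config is copied rather than mutated so the original settings stay available for comparison runs.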
RESULTS
records-per-request-avg: 1500
batch-size-avg: 600 KB
record-send-total: 20,000,000 records between 09:15 and 09:45 (30min)
SUMMARY
                          Before    After
Ingestion time            7h        30min
batch-size-avg            <16 KB    600 KB
records-per-request-avg   35        1500
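To put the summary figures in perspective, the short Python sketch below derives throughput and an approximate produce-request count from the numbers on the slides (20,000,000 records, 7h vs 30min, 35 vs 1500 records per request). The function names are illustrative, and the request counts are rough estimates because records-per-request-avg is itself an average.

```python
TOTAL_RECORDS = 20_000_000  # record-send-total from the slides


def records_per_second(total_records: int, duration_seconds: int) -> float:
    """Average ingestion throughput over the whole run."""
    return total_records / duration_seconds


def approx_request_count(total_records: int, records_per_request: int) -> float:
    """Rough number of produce requests, assuming the average holds throughout."""
    return total_records / records_per_request


before_rps = records_per_second(TOTAL_RECORDS, 7 * 3600)  # 7h run
after_rps = records_per_second(TOTAL_RECORDS, 30 * 60)    # 30min run
speedup = after_rps / before_rps

requests_before = approx_request_count(TOTAL_RECORDS, 35)
requests_after = approx_request_count(TOTAL_RECORDS, 1500)

print(f"before: {before_rps:.0f} records/s, ~{requests_before:.0f} requests")
print(f"after:  {after_rps:.0f} records/s, ~{requests_after:.0f} requests")
print(f"speedup: {speedup:.0f}x")
```

Under these assumptions the run goes from roughly 794 to roughly 11,111 records per second, a 14x speedup, while the estimated number of requests drops from about 571,000 to about 13,000.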
