From Message to
Cluster
A Realworld Introduction to Kafka Capacity Planning.
Jason “Jase” Bell - @jasonbelldata
https://digitalis.io
MeetupCat is my spirit animal.
Flight Mode is ON! You may……
• Heckle.
• Ask Questions.
• Heckle More.
• Talk about steak.
• Heckle again.
What I’m Going To
Cover
• The Old Days.
• The Now Times.
• The Stuff We Don’t Talk About
• The Message
• What I Usually Ask For
• Retention
• Estimated Capacity
• Compression
• Stress Testing
• Network and Disk Throughput
• Topic Partitions
• Kafka Connect
• KSQL
• Replicator
• Parting Thoughts…..
• ———————————————————
• Rapturous Applause
• Encore (Probably Eye of the Tiger……)
The Old Days
The Now Times
The Stuff We Don’t
Talk About
We think we know what
we need from our Kafka
Cluster
The Message
{
  "text": "RT @PostGradProblem: In preparation for the NFL lockout, I will be spending twice as much time analyzing my fantasy baseball team during ...",
  "truncated": true,
  "in_reply_to_user_id": null,
  "in_reply_to_status_id": null,
  "favorited": false,
  "source": "<a href=\"http://twitter.com/\" rel=\"nofollow\">Twitter for iPhone</a>",
  "in_reply_to_screen_name": null,
  "in_reply_to_status_id_str": null,
  "id_str": "54691802283900928",
  "entities": {
    "user_mentions": [
      {
        "indices": [3, 19],
        "screen_name": "PostGradProblem",
        "id_str": "271572434",
        "name": "PostGradProblems",
        "id": 271572434
      }
    ],
    "urls": [ ],
    "hashtags": [ ]
  },
  "contributors": null,
  "retweeted": false,
  "in_reply_to_user_id_str": null,
  "place": null,
  "retweet_count": 4,
  "created_at": "Sun Apr 03 23:48:36 +0000 2011",
  "retweeted_status": {
    "text": "In preparation for the NFL lockout, I will be spending twice as much time analyzing my fantasy baseball team during company time. #PGP",
    "truncated": false,
    "in_reply_to_user_id": null,
    "in_reply_to_status_id": null,
    "favorited": false,
    "source": "<a href=\"http://www.hootsuite.com\" rel=\"nofollow\">HootSuite</a>",
    "in_reply_to_screen_name": null,
    "in_reply_to_status_id_str": null,
    "id_str": "54640519019642881",
    "entities": {
      "user_mentions": [ ],
      "urls": [ ],
      "hashtags": [
Twitter JSON Payload ~6 KB
What I Usually Ask
For
What I’ll Ask Team For…
• Average Message Size - (6 KB)
• Estimated Daily Quantity - (10,000,000/d)
• Any Peak Per Hour Quantity - (1,250,000)
• Desired Replication Factor - (4)
• Desired Partitions - (10)
• Minimum In-sync Replicas - (2)
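One way to keep these answers honest is to capture them in a small structure and sanity-check them before doing any sizing. The sketch below is only an illustration: the class name `TopicRequirements` and its methods are hypothetical, and the two checks (minimum in-sync replicas no larger than the replication factor, peak hour no larger than the daily total) are assumptions about what is worth validating, not rules from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicRequirements:
    """Hypothetical container for the six answers requested from a team."""
    avg_message_kb: float
    daily_messages: int
    peak_hourly_messages: int
    replication_factor: int
    partitions: int
    min_insync_replicas: int

    def validate(self) -> None:
        # min.insync.replicas larger than the replication factor can never be met.
        if not 1 <= self.min_insync_replicas <= self.replication_factor:
            raise ValueError("min in-sync replicas must be between 1 and the replication factor")
        # A single hour cannot carry more messages than the whole day.
        if self.peak_hourly_messages > self.daily_messages:
            raise ValueError("peak hourly quantity exceeds the daily quantity")

    def peak_to_average_ratio(self) -> float:
        # How much busier the peak hour is than an average hour of the day.
        average_hourly = self.daily_messages / 24
        return self.peak_hourly_messages / average_hourly


example = TopicRequirements(
    avg_message_kb=6,
    daily_messages=10_000_000,
    peak_hourly_messages=1_250_000,
    replication_factor=4,
    partitions=10,
    min_insync_replicas=2,
)
example.validate()
print(f"peak hour is {example.peak_to_average_ratio():.1f}x the average hour")
```

With the example answers, the peak hour carries about three times the average hourly load, which is a reminder that sizing on the daily average alone would understate the busiest period.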
Estimated Capacity
(Message size x 3) x Daily Qty
x 1.4 (add 40%)
= Volume per replicated broker.
(6 KB x 3) x 10,000,000 = 180,000,000 KB
x 1.4 (add 40%)
= 252,000,000 KB
= 252 GB (decimal units)
Roughly translates to 2.92 MB/sec when averaged over 24 hours.
The x3 gives me a payload size with key,
headers, timestamp and the value. It’s just a
rough calculation.
Adding 40% overhead will give you some
breathing space when someone does a
stress test and doesn’t tell you…..
Retention
252 GB/day x 14 days retention
= 3,528 GB, roughly 3.5 TB per broker.
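The capacity and retention arithmetic can be sketched in a few lines. This is a minimal illustration, not the speaker's tooling: the function names are hypothetical, and it assumes decimal units throughout (1 GB = 1,000,000 KB, 1 TB = 1,000 GB), so its results can differ by a few percent from hand calculations that mix factors of 1,000 and 1,024.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_kb(avg_message_kb, daily_messages, payload_multiplier=3, headroom=1.4):
    """(message size x 3) x daily quantity x 1.4, in KB per day."""
    return avg_message_kb * payload_multiplier * daily_messages * headroom


def kb_to_gb(kb):
    # Decimal units: 1 GB = 1,000,000 KB (an assumption of this sketch).
    return kb / 1_000_000


def average_rate_mb_per_sec(daily_kb):
    # Daily volume spread evenly across 24 hours, in decimal MB/sec.
    return daily_kb / 1_000 / SECONDS_PER_DAY


def retained_tb(daily_kb, retention_days):
    # Disk needed to hold `retention_days` of data, in decimal TB.
    return kb_to_gb(daily_kb) * retention_days / 1_000


daily_kb = daily_volume_kb(avg_message_kb=6, daily_messages=10_000_000)
print(f"daily volume : {kb_to_gb(daily_kb):.2f} GB")
print(f"average rate : {average_rate_mb_per_sec(daily_kb):.2f} MB/sec")
print(f"14-day store : {retained_tb(daily_kb, 14):.2f} TB")
```

Keeping the multiplier and headroom as parameters makes it easy to rerun the estimate once real message sizes have been measured on the cluster.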
Estimated Capacity
df -h is your friend…..
du -h . is also your friend…..
Compression
Producer configuration compression.type defaults to “none”.
Options are gzip, snappy, lz4 and zstd.
Expect ~20%-40% message compression depending on the algorithm used.
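To see what that range means for disk, the sketch below applies a 20% to 40% reduction to a daily volume. It assumes the percentage is a reduction in stored size, which is one reading of the figure above; the function names and the 250 GB/day input are hypothetical, and real ratios depend heavily on the payload, so measure on your own data.

```python
def compressed_volume_gb(uncompressed_gb, reduction):
    """Volume left after removing `reduction` (0.0-1.0) of the original size."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return uncompressed_gb * (1 - reduction)


def compressed_range_gb(uncompressed_gb, low=0.20, high=0.40):
    """Best-case and worst-case volume for the assumed 20%-40% reduction."""
    best = compressed_volume_gb(uncompressed_gb, high)
    worst = compressed_volume_gb(uncompressed_gb, low)
    return best, worst


# Hypothetical uncompressed daily volume, for illustration only.
best, worst = compressed_range_gb(250)
print(f"expect between {best:.0f} GB and {worst:.0f} GB per day after compression")
```

Planning against the worst case of the range keeps the estimate on the safe side.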
Stress Testing
kafka-producer-perf-test --topic TOPIC --record-size SIZE_IN_BYTES
$ bin/kafka-producer-perf-test --topic testtopic --record-size 1000 --num-records 10000 --throughput 1000 --producer-props bootstrap.servers=localhost:9092
5003 records sent, 1000.4 records/sec (0.95 MB/sec), 1.6 ms avg latency, 182.0 ms max latency.
10000 records sent, 998.801438 records/sec (0.95 MB/sec), 1.12 ms avg latency, 182.00 ms max latency, 1 ms 50th, 2 ms 95th, 19 ms 99th, 23 ms 99.9th.
kafka-consumer-perf-test --broker-list host1:port1,host2:port2 --topic TOPIC
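A stress test is more useful when its target rate comes from the expected peak rather than a round number. The sketch below derives a records-per-second target from a peak hourly quantity and assembles a producer test command using only the flags shown above. The function name, the 6,000-byte record size, the 300-second duration and the broker address are all assumptions made for illustration.

```python
import math
import shlex


def perf_test_command(topic, record_size_bytes, peak_hourly_messages,
                      duration_seconds, bootstrap_servers):
    """Build a kafka-producer-perf-test command sized to the peak hour."""
    # Spread the peak hour evenly over 3,600 seconds and round up.
    throughput = math.ceil(peak_hourly_messages / 3600)
    # Enough records to sustain that rate for the requested duration.
    num_records = throughput * duration_seconds
    args = [
        "bin/kafka-producer-perf-test",
        "--topic", topic,
        "--record-size", str(record_size_bytes),
        "--num-records", str(num_records),
        "--throughput", str(throughput),
        "--producer-props", f"bootstrap.servers={bootstrap_servers}",
    ]
    return shlex.join(args)


cmd = perf_test_command(
    topic="testtopic",
    record_size_bytes=6000,          # assumed: roughly a 6 KB message
    peak_hourly_messages=1_250_000,
    duration_seconds=300,            # assumed: five-minute run
    bootstrap_servers="localhost:9092",
)
print(cmd)
```

Real peaks are rarely spread evenly across the hour, so treat the derived rate as a floor and test above it.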
Network and Disk
Throughput
• D - Data to be written (MB/sec)
• R - Replication Factor
• C - Number of Consumer Groups (readers for each write)
The Volume of Writes: (D * R)
Reads happen internally by the replicas, this gives us:
The Volume of Reads within Replication: ((R - 1) * D)
Adding the consumers we end up with:
The Volume of Reads including Consumers: (((R + C) - 1) * D)
We have memory! We have Caching!
M/(D * R) = seconds of writes cached, where M is the memory available for caching (MB).
We have to assume that consumers might drop from the cache, that consumers are running
slower than expected, or even that replicas might restart due to failure, patching or
rolling restarts.
Lagging Readers L = R + C - 1
Disk Throughput: D * R + L * D
Network (reads) Throughput: ((R + C - 1) * D)
Network (writes) Throughput: D * R
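The formulas above translate directly into code. The sketch below is one way to evaluate them together; the names `ThroughputEstimate` and `estimate_throughput` are hypothetical, M is taken to be the memory available for caching in MB, and the input values in the example are illustrative rather than taken from a real cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThroughputEstimate:
    disk_mb_s: float           # D * R + L * D
    network_reads_mb_s: float  # (R + C - 1) * D
    network_writes_mb_s: float # D * R
    cache_seconds: float       # M / (D * R)


def estimate_throughput(d, r, c, m):
    """d: MB/sec written, r: replication factor, c: consumer groups, m: cache MB."""
    if d <= 0 or r < 1 or c < 0 or m < 0:
        raise ValueError("d must be positive, r at least 1, c and m non-negative")
    writes = d * r
    # Worst case: every replica follower and consumer group reads from disk.
    lagging_readers = r + c - 1
    return ThroughputEstimate(
        disk_mb_s=writes + lagging_readers * d,
        network_reads_mb_s=(r + c - 1) * d,
        network_writes_mb_s=writes,
        cache_seconds=m / writes,
    )


# Illustrative inputs: 3 MB/sec written, RF 4, 2 consumer groups, 32,000 MB cache.
estimate = estimate_throughput(d=3.0, r=4, c=2, m=32_000)
print(estimate)
```

The disk figure is deliberately pessimistic: it assumes no read is served from the page cache, which is the situation during a rolling restart or when consumers fall far behind.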
Topic Partitions
You can set partitions either when creating
the topic (--partitions n) or afterwards.
Having a large number of partitions will have effects on Zookeeper znodes.
• More network requests
• If leader or broker goes down it may affect startup
time as the broker returns to the cluster.
If you need to reduce partitions, create a new topic with the lower partition count; an existing topic’s partition count can only be increased.
Kafka Connect
The latency trap…..
Think about second and third order
consequences if a connector were to fail.
What is the impact?
KSQL
ksqlDB
Baseline Server Requirements
• 4 CPU Cores
• 32 GB RAM
• 100 GB SSD Disk
• 1 Gbit Network
Default Outbound Topic Assumptions
• Partition Count of 4
• Replication Factor of 1
(These settings can be modified within your CREATE query)
Some queries will require repartitioning
and intermediate topics for certain
operations, taking all available records.
Processing Small Messages/Many Columns
= CPU Saturation
Processing Large Messages/Few Columns
= Network Saturation
Replicator
Data Centre to Data Centre is going to lead to increased network latency.
On producers and consumers, tune send.buffer.bytes and receive.buffer.bytes.
On brokers, tune socket.send.buffer.bytes and socket.receive.buffer.bytes.
Parting Thoughts
1. Consumer Group Lag Reports are your guiding light.
(If you have Rundeck, set up a scheduled job to email
you the log output.)
kafka-consumer-groups --bootstrap-server BROKER_ADDRESS --describe --group CONSUMER_GROUP
2. Kafka is about trade-offs, from the producer right the
way through to the consumer (and beyond).
There’s no right or wrong answer, just
experimentation, monitoring and learning.
3. While securing Kafka is important, there is also a
cost: verifying certificates takes up CPU
resources.
Your throughput will be affected.
4. The Kafka ecosystem has gained features over
the last few years. This has led to increased topic
and disk space usage that needs to be factored into
capacity planning calculations.
“Can you create me a topic please?”
Thank you.
Many thanks to Shay and David for organising, everyone who attended and sent
kind wishes. Lastly, a huge thank you to MeetupCat.
Photo supplied by @jbfletch_
