© 2024 NetApp, Inc. All rights reserved.
Kafka Summit, Bangalore 2024
Superpower your Apache Kafka®
applications development
with complementary
open source technologies
Paul Brebner
Instaclustr Technology Evangelist
Focus on complementary technologies –
different to Kafka
“Colours seem more brilliant when they are in contrast
with their complementary colours.” Monet
Complementary Colours
Matisse, Goldfish -
Red/Green
complementary colors
(Source: Wikimedia)
Contrasting flowers from the Bengaluru market
Bengaluru market flowers (Paul Brebner)
Complementary Kafka Technologies
Cassandra PostgreSQL
Superset
Camel
Cadence
OpenTelemetry
TensorFlow
RisingWave
LLMs
Guava EventBus
Kubernetes
Prometheus
Grafana
Parallel Consumer
OpenSearch + Dashboard
Matisse, Goldfish - Red/Green
complementary colors
(Source: Wikimedia)
Cf. analogous Kafka technologies
• Apache Pulsar, Flink, Storm, Spark Streaming, Beam,
ActiveMQ, RocketMQ, StreamPark, RisingWave etc.
Van Gogh, Sunflowers on
Yellow Background,
(Source: Wikimedia)
But we will look at
RisingWave
Approach
Use Cases
Technologies
Superpowers
0. Apache Kafka®
Apache Kafka®
Postal Delivery Service
Railway Post Office:
Mail bags snatched by speeding train
(Source: Wikimedia CCL)
Apache Kafka visual introduction
My first Kafka talk: Visual introduction to a Kafka postal service
Christmas tree lights simulation
Christmas 2017
My first Kafka demo application
100% Kafka
A simple simulation –
to start with
Use case 1: “Kongo” IoT logistics simulation
• Real-time logistics
• IoT transportation and rules checking
• Complex simulation
Design 1: Pure Kafka, many topics
1000s of locations (warehouses, trucks)
and millions of goods
Each location has a topic
and multiple consumer groups
(all goods at that location)
7,000 TPS → SLOW!
Many topics/partitions (without increasing
cluster resources) reduced throughput on
older versions of Kafka
1. Guava EventBus
Guava EventBus
Telegram messengers
(Source: Wikimedia CCL)
Design 2: One topic + Guava EventBus for notifications
Single topic, one consumer group
Kafka supplemented with Guava
Event Bus to handle high fan-out
notifications
1.2M TPS → FAST!
Uber’s Cadence can be/has been
used for scalable notifications
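A minimal Python sketch of the Design 2 idea: one consumer reads a single topic and passes each event to an in-process event bus, which fans it out to the subscribers registered for that location. The original design used Guava EventBus in Java; the `LocationEventBus` class, its method names, and the event fields below are hypothetical stand-ins, not the Guava API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Event = dict  # hypothetical event shape: {"location": str, "payload": ...}


class LocationEventBus:
    """In-process publish/subscribe keyed by location (stand-in for an event bus)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def register(self, location: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[location].append(handler)

    def post(self, event: Event) -> int:
        """Deliver the event to every handler at its location; return the fan-out count."""
        handlers = self._subscribers.get(event["location"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


def consume(events: Iterable[Event], bus: LocationEventBus) -> int:
    """Stand-in for the single Kafka consumer loop over the one topic."""
    return sum(bus.post(event) for event in events)
```

The fan-out to many goods at one location happens in memory, so Kafka only needs the one topic and one consumer group.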
Use case 2: Anomaly detection at scale
One of these things is not like the others…
(Source: Shutterstock)
Streaming anomaly detection
Incoming Event Stream
Run Anomaly Check – Quickly!
Persist new event
Get previous 50 events for key
Run algorithm
Fast writes → Cassandra
Application scaling → Kubernetes
Initially single threaded consumers
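The check steps above can be sketched in Python as follows. This is an illustration under stated assumptions: an in-memory bounded deque stands in for the Cassandra table, and a simple z-score test stands in for the anomaly algorithm, which the slides do not name. The function name and threshold are hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

WINDOW = 50  # "previous 50 events for key" from the slide

# Stand-in for the Cassandra table: newest events per key, bounded to WINDOW.
history = defaultdict(lambda: deque(maxlen=WINDOW))


def check_event(key: str, value: float, threshold: float = 3.0) -> bool:
    """Return True if value looks anomalous against the key's previous events."""
    previous = list(history[key])   # get previous (up to) 50 events for key
    history[key].append(value)      # persist new event
    if len(previous) < 2:
        return False                # not enough history to judge
    mu, sigma = mean(previous), pstdev(previous)
    if sigma == 0:
        return value != mu
    # Assumed algorithm: flag values more than `threshold` deviations away.
    return abs(value - mu) / sigma > threshold
```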
2. Apache Cassandra®
Apache Cassandra®
Fast Writes
Office typing pool, 1918
(Source: Wikimedia)
Apache Cassandra®
What?
• NoSQL horizontally scalable key-value database
Superpowers
• Fast writes (lots of typewriters)
• Wide column store
• Good for ML feature stores
• Clustering columns
• Good for hierarchical data modeling (e.g. geospatial)
• In-built multi-DC replication
3. Kubernetes
Kubernetes
Greek Triremes ruled the seas
Captained by Helmsmen (Kubernetes)
(Source: Wikimedia)
Kubernetes
What?
• Automation of containerized applications
Superpowers
• Available on public clouds (e.g. AWS EKS)
• Ephemeral Pods are the unit of concurrency
• Easy to scale applications with more or fewer Pods
But scalability isn’t great
4. Prometheus
5. Grafana
Prometheus + Grafana
Abacus counting
(Source: Wikimedia)
Prometheus + Grafana
What?
• Prometheus: Monitoring and alerting
• Grafana: Graphing
Superpowers
• Instrumentation or agents (exporters) to expose application metrics
• Time series data with counter, gauge, histogram, and summary metrics
• Instaclustr monitoring API supports Prometheus metrics for Apache Kafka clusters
• Integration of Kafka cluster metrics and Kafka application metrics (e.g. producers and
consumers) is powerful
→ Metrics suggested optimizations
Slow Kafka consumers problem
Slow consumers require more partitions/consumers
(Source: Getty Images)
Little’s Law: Concurrency (Partitions=Consumers) = Time x Throughput
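As a worked example of Little's Law in this setting, the sketch below computes how many partitions (and therefore consumers) a single-threaded design needs for a given per-event processing time and target throughput. The function name and the numbers are hypothetical, chosen only to show the arithmetic.

```python
import math


def required_concurrency(latency_ms: int, throughput_per_second: int) -> int:
    """Little's Law: concurrency = time in system x throughput (rounded up)."""
    return math.ceil(latency_ms * throughput_per_second / 1000)


# Hypothetical numbers: 20 ms per event at 500 events/s needs 10 consumers.
print(required_concurrency(20, 500))  # → 10
```

Doubling the per-event time doubles the required partitions and consumers, which is why slow consumers are a scaling problem.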
2 pool solution
The famous Bondi Ocean Pool in Sydney Australia has 2 pools
(Source: Shutterstock)
Optimize consumer speed/concurrency using a 2-stage pipeline
Fewer consumers (around 100) give higher throughput, a surprise!
Hint: fewer partitions
1. Minimize polling time (thread pool 1)
2. Maximize anomaly detector concurrency (thread pool 2)
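A minimal sketch of the two-pool idea in Python, assuming batches of records have already been fetched: the first pool only hands records off, and the second pool runs the slow check concurrently. The function and parameter names are hypothetical, and real Kafka concerns such as offset commits and back-pressure are left out.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(batches, detect, pollers=1, detectors=8):
    """Stage 1 only hands records off; stage 2 runs the slow check concurrently."""
    results = []
    with ThreadPoolExecutor(max_workers=detectors) as detector_pool, \
         ThreadPoolExecutor(max_workers=pollers) as poller_pool:

        def poll(batch):
            # Stage 1: keep polling time minimal by only submitting work.
            return [detector_pool.submit(detect, record) for record in batch]

        # Stage 2 results are collected in the original record order.
        for futures in poller_pool.map(poll, batches):
            results.extend(f.result() for f in futures)
    return results
```

Because the detector pool is sized independently of the poller pool, concurrency for the slow step no longer depends on the number of partitions.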
19 billion checks/day after tuning
6. Kafka Parallel
Consumer
Kafka Parallel Consumer
Jacquard Loom, Berlin
Makes multiple ribbons
concurrently
(Source: Paul Brebner)
Kafka Parallel Consumer: Multi-threaded consumer
• Multiple ordering options (cf. default Kafka, which only guarantees order within partitions):
PARTITION → KEY → UNORDERED (increasing concurrency →)
• Concurrency from 1 to lots, depending on client resources and partition/key space sizes
• KEY has higher concurrency than PARTITION and preserves order per key, a reasonable compromise
• Higher concurrency for fewer partitions/consumers
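To show why the ordering mode bounds concurrency, the sketch below counts how many groups of records could be processed independently under each mode. It is a conceptual illustration, not the Parallel Consumer API: the `Record` type and `independent_lanes` function are hypothetical, and actual concurrency is also capped by the client's thread pool and resources.

```python
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    partition: int
    key: str


def independent_lanes(records: Iterable[Record], mode: str) -> int:
    """Count the groups of records that may be processed concurrently."""
    records = list(records)
    if mode == "PARTITION":   # order kept per partition
        return len({r.partition for r in records})
    if mode == "KEY":         # order kept per key
        return len({r.key for r in records})
    if mode == "UNORDERED":   # no ordering constraint between records
        return len(records)
    raise ValueError(f"unknown ordering mode: {mode}")
```

With 10 partitions and 100 keys, PARTITION allows at most 10 lanes and KEY at most 100, while UNORDERED is limited only by the records in flight.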
Experimental results
3, 50, and 200 times improvement, unordered best
1 consumer
10 partitions
100 keys
10ms latency
Use case 3: Pipelines
Berlin “Beer” (?) Pipeline
(Source: Paul Brebner)
Kafka® Connect data pipelines
REST Tidal Data to OpenSearch
REST Tidal Data to PostgreSQL + Superset
Alternative sinks
Kafka Connectors
7. OpenSearch
8. Dashboard
OpenSearch + Dashboard
Library of Congress
Card Division 1919
(city block long)
(Source: Wikimedia)
OpenSearch + Dashboard
What?
• Open source version of Elasticsearch
• Based on Lucene—powerful and scalable text searching
Superpowers
• Ingestion, indexing, and searching of JSON documents
• Complex linguistic and geospatial queries
• Integrated dashboard for visualization
9. PostgreSQL®
PostgreSQL®
Elephant vs. tree
Elephants are powerful
(Source: Adobe Stock)
PostgreSQL®
What?
• Powerful SQL database
Superpowers
• Extensible
• JSONB+GIN indexes (efficient storage and search of JSON)
10. Apache Superset™
Apache Superset™
Superhero Supersets
All superheroes (B) are a
superset of those who
use weapons (A)
(Source: Adobe Stock)
Apache Superset™
What?
• Powerful data visualization tool
Superpowers
• Reads from SQL sources
• Lots of visualization and graph types, including geospatial
11. Apache Camel™
Apache Camel™
Camel train
(Source: Adobe Stock)
Apache Camel™
What?
• Apache Camel – integration framework
• Apache Camel Kafka Connectors
Superpowers
• Large number of open source Kafka Connectors—179 sources and sinks
• Auto-generated from Camel components
Use case 4: Drone delivery
(Source: Adobe Stock)
12. Uber’s Cadence®
Cadence®
Railway signal“man”
(signalwoman!)
(Source: Wikimedia)
Uber’s Cadence®
What?
• Scalable code-as-workflows engine
Superpowers
• Sequenced, stateful, long-running, scheduled steps
• Scalable and reliable using event-sourcing
o Workflows are fail-proof: history is replayed up to the point of failure, then execution resumes
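The replay behaviour can be illustrated with a small Python sketch. This is a conceptual model of event sourcing, not the Cadence client API: the `run_workflow` function, the step names, and the `fail_at` crash simulation are all hypothetical.

```python
def run_workflow(steps, history, fail_at=None):
    """Run steps in order, skipping those already recorded in history.

    steps: list of (name, function) pairs; history: list of (name, result)
    pairs persisted so far. fail_at simulates a crash before that step.
    """
    completed = dict(history)            # replay: recover recorded results
    for name, func in steps:
        if name in completed:
            continue                     # already done before the failure
        if name == fail_at:
            raise RuntimeError(f"worker crashed before {name}")
        result = func(completed)
        history.append((name, result))   # persist the event before moving on
        completed[name] = result
    return completed
```

Running the same workflow again with the persisted history skips the completed steps and resumes at the step that failed.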
Drone delivery application
Computationally
expensive mission
critical
calculations
Kafka microservices integration
of fast/slow systems
Drone waypoint flight calculations
Returning to base leg
• Drone flight path is computed in an activity
• Using location, distance, bearing, speed,
and charge
• Every 10 seconds
• On failure, the drone won’t crash and will
continue flying from the last location
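One way such a step could be computed is the standard great-circle destination formula, sketched below in Python. This is an assumption about the calculation, not the talk's actual activity code: the function name is hypothetical, and battery charge, which the slide also lists as an input, is not modelled.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def next_location(lat_deg, lon_deg, bearing_deg, speed_mps, seconds=10.0):
    """Great-circle destination after flying at a fixed bearing and speed."""
    delta = speed_mps * seconds / EARTH_RADIUS_M      # angular distance (radians)
    lat1, lon1 = math.radians(lat_deg), math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(theta))
    lon2 = lon1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(lat1),
                             math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)
```

Because each call depends only on the last recorded location, a restarted activity can continue from that point.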
Uber’s Cadence + Apache Kafka = similarities
Cadence (Workflows) | Kafka (Streaming Events)
Scalable (event sourcing) | Scalable (partitions, cluster)
Persistent (event sourcing) | Persistent (event replaying)
Reliable workflow execution (event sourcing) | Reliable event delivery
Asynchronous signals | Asynchronous events
Open source | Open source
Available as a managed service | Available as a managed service
Uber’s Cadence =
Orchestration (synchronous/timed sequences)
(Source: Getty Images)
Different architectural
(musical) styles
Apache Kafka =
Choreography (asynchronous)
Different architectural
(musical) styles
(Source: Getty Images)
Combined Cadence + Kafka = Ballet!
Integrated in a
new style
Cadence + Kafka = Complementary timescales
(Source: Getty Images)
Cadence + Kafka = Complementary timescales
Cadence (Slow Workflows) | Kafka (Fast Streaming Events)
Synchronous events | Asynchronous events
Stateful flows | Stateless events
Sequences | One-off events
Slow/long running flows | Fast/instantaneous events
Sleep/schedule events | Real-time processing of events
Complex flow logic | Complex stream processing (Kafka Streams)
Cadence + Kafka =
Integration → Drone Ballet
Drone show, Japan
(Source: Getty Images)
How many drones can we fly?
(Source: Shutterstock)
Cluster details (vCPUs):
Client (8), Cadence (6), Cassandra (18)
Load test:
2,000 drones + 2,000 orders = 4,000 workflows
20 Drones flying
Purple = base
Black = drone
Orange = shop
Red = delivery location
Green = successful delivery
Use case 5: Streaming ML
(Source: Getty Images) (Source: Getty Images)
Busy! Not Busy!
Shop busy/not busy prediction
Drone learning problem
Kafka Streams
Kafka Streams computes
aggregated hourly shop and order
details →
Busy/NotBusy categorization
Sent to TensorFlow
Train model to predict shop
busy/not busy an hour ahead
Simulation produces streaming
spatiotemporal data (drone and
order state and locations)
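The hourly aggregation and labelling step can be sketched in plain Python as below. The original used Kafka Streams; this stand-in only shows the windowing logic, and the `hourly_labels` function, the input shape, and the busy threshold are hypothetical.

```python
from collections import Counter

BUSY_THRESHOLD = 10  # hypothetical: orders per shop per hour that count as "Busy"


def hourly_labels(orders, threshold=BUSY_THRESHOLD):
    """orders: iterable of (shop_id, epoch_seconds). Returns {(shop, hour): label}."""
    # Tumbling one-hour windows: integer-divide the timestamp by 3600.
    counts = Counter((shop, ts // 3600) for shop, ts in orders)
    return {window: ("Busy" if n >= threshold else "NotBusy")
            for window, n in counts.items()}
```

Each labelled window would then become one training example for the model that predicts the label an hour ahead.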
13. TensorFlow
TensorFlow
What does the
future hold?
(Source: Adobe Stock)
TensorFlow
What?
• Neural network ML library
Superpowers
• Supports incremental ML
• From streaming Kafka data
TensorFlow
Watch out for
• ML over streaming spatiotemporal data with concept drifts is tricky
o Time/space bias
- Wild model accuracy oscillation
o Concept shift can result in very low-accuracy models initially
- Train/use multiple models
Use case 6:
Santa’s elves' toy and box packing
Kafka Streams, ChatGPT, RisingWave, and OpenTelemetry
Streaming joins to match toys and boxes
(Source: Adobe Stock)
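A streaming join that matches toys and boxes can be sketched in Python as below. This is a conceptual illustration rather than Kafka Streams or RisingWave code: the `ToyBoxMatcher` class is hypothetical, joining on a `size` field is an assumption, and there is no time window, which a real stream join would normally have.

```python
from collections import defaultdict, deque


class ToyBoxMatcher:
    """Pairs each toy with a box of the same size, buffering whichever arrives first."""

    def __init__(self):
        self._waiting = {"toy": defaultdict(deque), "box": defaultdict(deque)}

    def on_event(self, kind, size, item_id):
        """kind is "toy" or "box". Returns a (toy_id, box_id) pair or None."""
        other = "box" if kind == "toy" else "toy"
        queue = self._waiting[other][size]
        if queue:
            partner = queue.popleft()    # oldest unmatched item of the other kind
            return (item_id, partner) if kind == "toy" else (partner, item_id)
        self._waiting[kind][size].append(item_id)   # buffer until a partner arrives
        return None
```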
14. OpenTelemetry
OpenTelemetry
X-ray vision!
(Source: Wikimedia Public Domain)
OpenTelemetry
• OpenTelemetry is the new standard for distributed tracing
• Combines tracing (OpenTracing), metrics, and logs
• Automatic instrumentation
• Lots of open source visualization tools
- Jaeger, SigNoz, Uptrace, etc.
• Used in new client monitoring KIP-714
- Kafka 3.7.0
SigNoz service map for
toy+boxes application
15. RisingWave
RisingWave
Wave processing
(Source: Adobe Stock)
RisingWave
What?
• Stream processing database—also as a service
Superpowers
• Stateful stream processing
o SQL syntax
o Using cloud native storage
o Potential replacement for Kafka Streams
• PostgreSQL compatible
o Works with Apache Superset for visualization
16. LLMs
LLMs
The Answer?
(Source: Wikimedia)
LLMs/GenAI
• E.g. ChatGPT
- not open source
+ there may be suitable open source alternatives
for code generation
• Worked well to generate
+ Kafka clients
+ Kafka Streams DSL
+ and test-cases
• Not as accurate for RisingWave
- lack of examples?
Bonus Technologies from my Instaclustr colleagues
● Kafka benchmarking
○ Apache JMeter for Kafka benchmarking (Thanks to Anup Shirolkar)
○ OpenMessaging (Thanks to Alastair Daivis)
● Strimzi – a Kafka Operator for Kubernetes, and Debezium (CDC using Kafka Connect)
(Thanks to Felix Alipaz-Dicke)
● Kafka GUIs (Thanks to Ana-Maria Minda)
○ Kafdrop
○ AKHQ
○ UI for Apache Kafka
○ These all work with Kafka + Instaclustr console and provide complementary features
Ballet pattern → Hanoi street intersection pattern
● A working integrated synchronous + asynchronous system
I survived as a pedestrian!
Try us out
• We offer Apache Kafka and
these open source technologies
as a managed service
• You can use the others with our
managed services
• FREE 30-day trial of developer-
sized clusters
Paul Brebner | Instaclustr Technology Evangelist
www.Instaclustr.com/paul-brebner → All my blogs
Thank You!

Superpower Your Apache Kafka Applications Development with Complementary Open Source Technologies

  • 1. © 2024 NetApp, Inc. All rights reserved. © 2024 NetApp, Inc. All rights reserved. Kafka Summit, Bangalore 2024 Superpower your Apache Kafka® applications development with complementary open source technologies Paul Brebner Instaclustr Technology Evangelist
  • 2. © 2024 NetApp, Inc. All rights reserved. Focus on complementary technologies – different to Kafka “Colours seem more brilliant when they are in contrast with their complementary colours.” Monet
  • 3. © 2024 NetApp, Inc. All rights reserved. Complementary Colours Matisse, Goldfish - Red/Green complementary colors (Source: Wikimedia)
  • 4. © 2024 NetApp, Inc. All rights reserved. Contrasting flowers from the Bengaluru market Bengaluru market flowers (Paul Brebner)
  • 5. © 2024 NetApp, Inc. All rights reserved. Complementary Kafka Technologies Cassandra PostgreSQL Superset Camel Cadence OpenTelemetry TensorFlow RisingWave LLMs Guava EventBus Kubernetes Prometheus Grafana Parallel Consumer OpenSearch + Dashboard Matisse, Goldfish - Red/Green complementary colors (Source: Wikimedia)
  • 6. © 2024 NetApp, Inc. All rights reserved. C.f. analogous Kafka technologies • Apache Pulsar, Flink, Storm, Spark Streaming, Beam, ActiveMQ, RocketMQ, StreamPark, RisingWave etc. Van Gogh, Sunflowers on Yellow Background, (Source: Wikimedia) But we will look at RisingWave
  • 7. © 2024 NetApp, Inc. All rights reserved. Approach Use Cases Technologies Superpowers
  • 8. © 2024 NetApp, Inc. All rights reserved. 0. Apache Kafka®
  • 9. © 2024 NetApp, Inc. All rights reserved. Apache Kafka® Postal Delivery Service Railway Post Office: Mail bags snatched by speeding train (Source: Wikimedia CCL)
  • 10. © 2024 NetApp, Inc. All rights reserved. Apache Kafka visual introduction My first Kafka talk: Visual introduction to a Kafka postal service
  • 11. © 2024 NetApp, Inc. All rights reserved. Christmas tree lights simulation Christmas 2017 My first Kafka demo application 100% Kafka A simple simulation – to start with
  • 12. © 2024 NetApp, Inc. All rights reserved. Use case 1: “Kongo” IoT logistics simulation • Real-time logistics • IoT transportation and rules checking • Complex simulation
  • 13. © 2024 NetApp, Inc. All rights reserved. Design 1: Pure Kafka, many topics 1000s of locations (warehouses, trucks) and millions of goods Each location has a topic and multiple consumer groups (all goods at that location) 7,000 TPS → SLOW! Many topics/partitions (without increasing cluster resources) reduced throughput on older versions of Kafka
  • 14. © 2024 NetApp, Inc. All rights reserved. 1. Guava EventBus
  • 15. © 2024 NetApp, Inc. All rights reserved. Guava EventBus Telegram messengers (Source: Wikimedia CCL)
  • 16. © 2024 NetApp, Inc. All rights reserved. Design 2: One topic + Guava EventBus for notifications Single topic, one consumer group Kafka supplemented with Guava Event Bus to handle high fan-out notifications 1.2M TPS → FAST! Uber’s Cadence can be/has been used for scalable notifications
  • 17. © 2024 NetApp, Inc. All rights reserved. Use case 2: Anomaly detection at scale One of these things is not like the others… (Source: Shutterstock)
  • 18. © 2024 NetApp, Inc. All rights reserved. Streaming anomaly detection Incoming Event Stream Run Anomaly Check – Quickly! Persist new event Get previous 50 events for key Run algorithm Fast writes → Cassandra Application scaling → Kubernetes Initially single threaded consumers
  • 19. © 2024 NetApp, Inc. All rights reserved. 2. Apache Cassandra®
  • 20. © 2024 NetApp, Inc. All rights reserved. Apache Cassandra® Fast Writes Office typing pool, 1918 (Source: Wikimedia)
  • 21. © 2024 NetApp, Inc. All rights reserved. Apache Cassandra® What? • NoSQL horizontally scalable key-value database Superpowers • Fast writes (lots of typewriters) • Wide column store • Good for ML feature stores • Clustering columns • Good for hierarchical data modeling (eg. Geospatial) • In-built multi-DC replication
  • 22. © 2024 NetApp, Inc. All rights reserved. 3. Kubernetes
  • 23. © 2024 NetApp, Inc. All rights reserved. Kubernetes Greek Triremes ruled the seas Captained by Helmsmen (Kubernetes) (Source: Wikimedia)
  • 24. © 2024 NetApp, Inc. All rights reserved. Kubernetes What? • Automation of containerized applications Superpowers • Available on public clouds (E.g. AWS EKS) • Ephemeral Pods are the unit of concurrency • Easy to scale applications with more or less Pods
  • 25. © 2024 NetApp, Inc. All rights reserved. But scalability isn’t great
  • 26. © 2024 NetApp, Inc. All rights reserved. 4. Prometheus 5. Grafana
  • 27. © 2024 NetApp, Inc. All rights reserved. Kubernetes Abacus counting (Source: Wikimedia)
  • 28. © 2024 NetApp, Inc. All rights reserved. Prometheus + Grafana What? • Prometheus: Monitoring and alerting • Grafana: Graphing Superpowers • Instrumentation or agents (exporters) to expose application metrics • Time series data with counter, gauge, histogram, and summary metrics • Instaclustr monitoring API supports Prometheus metrics for Apache Kafka clusters • Integration of Kafka Cluster metrics and Kafka application (e.g. producers and consumers) is powerful à Metrics suggested optimizations
  • 29. © 2024 NetApp, Inc. All rights reserved. Slow Kafka consumers problem Slow consumers require more partitions/consumers (Source: Getty Images) Little’s Law: Concurrency (Partitions=Consumers) = Time x Throughput
  • 30. © 2024 NetApp, Inc. All rights reserved. 2 pool solution The famous Bondi Ocean Pool in Sydney Australia has 2 pools (Source: Shutterstock)
  • 31. © 2024 NetApp, Inc. All rights reserved. Optimize consumer speed/concurrency using 2 stage pipeline Less consumers (around 100) gives higher throughput— a surprise! Hint: Less partitions 1. Minimize polling time (thread pool 1) 2. Maximize anomaly detector concurrency (thread pool 2) 1 2
  • 32. © 2024 NetApp, Inc. All rights reserved. 19 billion checks/day after tuning
  • 33. © 2024 NetApp, Inc. All rights reserved. 6. Kafka Parallel Consumer
  • 34. © 2024 NetApp, Inc. All rights reserved. Kafka Parallel Consumer Jacquard Loom, Berlin Makes multiple ribbons concurrently (Source: Paul Brebner)
  • 35. © 2024 NetApp, Inc. All rights reserved. Kafka Parallel Consumer: Multi-threaded consumer • Multiple ordering options—c.f. default Kafka only guarantees order within partitions! PARTITION → KEY → UNORDERED Increasing concurrency → • Concurrency from 1 to lots—depends on client resources, and partitions/key space sizes • KEY has higher concurrency than partition and is ordered by KEY— reasonable compromise • Higher concurrency for less partitions/consumers
  • 36. © 2024 NetApp, Inc. All rights reserved. Experimental results 3, 50, and 200 times improvement, unordered best 1 consumer 10 partitions 100 keys 10ms latency
  • 37. Use case 3: Pipelines Berlin “Beer” (?) Pipeline (Source: Paul Brebner)
  • 38. Kafka® Connect data pipelines REST Tidal Data to OpenSearch REST Tidal Data to PostgreSQL + Superset Alternative sinks Kafka Connectors
  • 39. 7. OpenSearch 8. Dashboard
  • 40. OpenSearch + Dashboard Library of Congress Card Division 1919 (city block long) (Source: Wikimedia)
  • 41. OpenSearch + Dashboard
      What?
      • Open source version of Elasticsearch
      • Based on Lucene: powerful and scalable text searching
      Superpowers
      • Ingestion, indexing, and searching of JSON documents
      • Complex linguistic and geospatial queries
      • Integrated dashboard for visualization
  • 42. 9. PostgreSQL®
  • 43. PostgreSQL® Elephant vs. tree Elephants are powerful (Source: Adobe Stock)
  • 44. PostgreSQL®
      What?
      • Powerful SQL database
      Superpowers
      • Extensible
      • JSONB + GIN indexes (efficient storage and search of JSON)
  • 45. 10. Apache Superset™
  • 46. Apache Superset™ Superhero Supersets All superheroes (B) are a superset of those who use weapons (A) (Source: Adobe Stock)
  • 47. Apache Superset™
      What?
      • Powerful data visualization tool
      Superpowers
      • Reads from SQL sources
      • Lots of visualization and graph types, including geospatial
  • 48. 11. Apache Camel™
  • 49. Apache Camel™ Camel train (Source: Adobe Stock)
  • 50. Apache Camel™
      What?
      • Apache Camel: integration framework
      • Apache Camel Kafka Connectors
      Superpowers
      • Large number of open source Kafka Connectors: 179 sources and sinks
      • Auto-generated from Camel components
  • 51. Use case 4: Drone delivery (Source: Adobe Stock)
  • 52. 12. Uber’s Cadence®
  • 53. Cadence® Railway signal“man” (signalwoman!) (Source: Wikimedia)
  • 54. Uber’s Cadence®
      What?
      • Scalable code-as-workflows engine
      Superpowers
      • Sequenced, stateful, long-running, scheduled steps
      • Scalable and reliable using event sourcing
        o Workflows are failproof: history is replayed up to the point of failure, then execution resumes
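The replay-and-resume behaviour can be illustrated with a toy event-sourced runner. This is a minimal sketch of the general idea only: `run_workflow`, the step names, and the `fail_at` switch are hypothetical and are not the Cadence API.

```python
def run_workflow(steps, history, fail_at=None):
    """Run steps in order, recording each completed step in `history`.

    On a re-run, steps already present in `history` are replayed from the
    record (skipped) rather than executed again, so work resumes at the
    first step that never completed.
    """
    executed = []
    for i, (name, fn) in enumerate(steps):
        if i < len(history):
            continue  # replay: the result is already in the history
        if fail_at == name:
            raise RuntimeError(f"worker crashed before completing {name}")
        history.append((name, fn()))
        executed.append(name)
    return executed

steps = [
    ("pick_up", lambda: "picked"),
    ("fly", lambda: "arrived"),
    ("drop", lambda: "delivered"),
]
history = []  # stands in for the durable event store
try:
    run_workflow(steps, history, fail_at="fly")  # simulated crash mid-workflow
except RuntimeError:
    pass
resumed = run_workflow(steps, history)  # a new worker picks up the same history
print(resumed)
```

The second run executes only the steps that had not completed, which is the sense in which the workflow survives a worker failure.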
  • 55. Drone delivery application Computationally expensive mission critical calculations Kafka microservices integration of fast/slow systems
  • 56. Drone way point flight calculations
      Returning to base leg
      • Drone flight path is computed in an activity
      • Using location, distance, bearing, speed, and charge
      • Every 10 seconds
      • On failure, the drone won’t crash and will continue flying from the last location
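One way such a 10-second step could be computed is with the standard great-circle destination formula. The slides do not say which formula the application uses, so this is a sketch under that assumption; `next_location`, the spherical Earth radius, and the sample speed are illustrative, and battery charge is left out.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def next_location(lat_deg, lon_deg, bearing_deg, speed_mps, seconds=10):
    """Position after flying at a constant bearing and speed for `seconds`."""
    d = speed_mps * seconds / EARTH_RADIUS_M  # angular distance in radians
    lat1, lon1, theta = map(math.radians, (lat_deg, lon_deg, bearing_deg))
    lat2 = math.asin(
        math.sin(lat1) * math.cos(d) + math.cos(lat1) * math.sin(d) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(d) * math.cos(lat1),
        math.cos(d) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)

# Hypothetical leg: due north from the equator at 20 m/s for one 10-second step.
lat, lon = next_location(0.0, 0.0, bearing_deg=0.0, speed_mps=20.0)
print(lat, lon)
```

If each step's result is recorded by the workflow engine, a restarted worker can continue from the last computed location instead of recomputing the whole path.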
  • 57. Uber’s Cadence + Apache Kafka = similarities
      Cadence (Workflows) | Kafka (Streaming Events)
      Scalable (event sourcing) | Scalable (partitions, cluster)
      Persistent (event sourcing) | Persistent (event replaying)
      Reliable workflow execution (event sourcing) | Reliable event delivery
      Asynchronous signals | Asynchronous events
      Open source | Open source
      Available as a managed service | Available as a managed service
  • 58. Uber’s Cadence = Orchestration (synchronous/timed sequences) (Source: Getty Images) Different architectural (musical) styles
  • 59. Apache Kafka = Choreography (asynchronous) Different architectural (musical) styles (Source: Getty Images)
  • 60. Combined Cadence + Kafka = Ballet! Integrated in a new style
  • 61. Cadence + Kafka = Complementary timescales (Source: Getty Images)
  • 62. Cadence + Kafka = Complementary timescales
      Cadence (Slow Workflows) | Kafka (Fast Streaming Events)
      Synchronous events | Asynchronous events
      Stateful flows | Stateless events
      Sequences | One-off events
      Slow/long running flows | Fast/instantaneous events
      Sleep/schedule events | Real-time processing of events
      Complex flow logic | Complex stream processing (Kafka Streams)
  • 63. Cadence + Kafka = Integration → Drone Ballet Drone show, Japan (Source: Getty Images)
  • 64. How many drones can we fly? (Source: Shutterstock)
  • 65. Cluster Details (VCPUS): Client (8), Cadence (6), Cassandra (18)
  • 66. Load test: 2,000 drones + 2,000 orders = 4,000 workflows
  • 67. 20 Drones flying Purple = base Black = drone Orange = shop Red = delivery location Green = successful delivery
  • 68. Use case 5: Streaming ML (Source: Getty Images) (Source: Getty Images) Busy! Not Busy! Shop busy/not busy prediction
  • 69. Drone learning problem
      Simulation produces streaming spatiotemporal data (drone and order state and locations)
      Kafka Streams computes aggregated hourly shop and order details → Busy/NotBusy categorization
      Sent to TensorFlow
      Train model to predict shop busy/not busy an hour ahead
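The hourly aggregation and labelling step can be sketched in plain Python. The talk uses Kafka Streams for this; the version below only mirrors the logic with an hourly tumbling window. The event shape, `label_hours`, and the threshold of 3 orders per hour are assumptions for illustration.

```python
from collections import Counter

BUSY_THRESHOLD = 3  # hypothetical: orders per shop per hour that count as "Busy"

def label_hours(orders):
    """orders: iterable of (shop_id, epoch_seconds).

    Counts orders per shop in hourly tumbling windows and returns
    {(shop_id, window_start_seconds): "Busy" or "NotBusy"}.
    """
    counts = Counter((shop, ts - ts % 3600) for shop, ts in orders)
    return {
        window: ("Busy" if n >= BUSY_THRESHOLD else "NotBusy")
        for window, n in counts.items()
    }

orders = [("A", 0), ("A", 100), ("A", 3599), ("A", 3600), ("B", 10)]
labels = label_hours(orders)
print(labels)
```

Each labelled window then becomes one training example, with the label from the following hour as the prediction target.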
  • 70. 13. TensorFlow
  • 71. TensorFlow What does the future hold? (Source: Adobe Stock)
  • 72. TensorFlow
      What?
      • Neural network ML library
      Superpowers
      • Supports incremental ML
      • From streaming Kafka data
  • 73. TensorFlow
      Watch out for
      • ML over streaming spatiotemporal data with concept drifts is tricky
        o Time/space bias
          - Wild model accuracy oscillation
        o Concept shift can result in very low-accuracy models initially
          - Train/use multiple models
  • 74. Use case 6: Santa’s elves’ toy and box packing
      Kafka Streams, ChatGPT, RisingWave, and OpenTelemetry
      Streaming joins to match toys and boxes (Source: Adobe Stock)
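The matching logic behind a streaming join can be sketched with two buffers, one per stream. This is a simplified illustration: the join key (a size), the event tuples, and `match_stream` are assumptions, and the time window that Kafka Streams stream-stream joins require is omitted.

```python
from collections import defaultdict, deque

def match_stream(events):
    """events: iterable of (kind, key, item_id) with kind "toy" or "box".

    Buffers unmatched items per key; when an item arrives and the other
    stream has a waiting item with the same key, emits (key, toy_id, box_id).
    """
    waiting = {"toy": defaultdict(deque), "box": defaultdict(deque)}
    matches = []
    for kind, key, item_id in events:
        other = "box" if kind == "toy" else "toy"
        if waiting[other][key]:
            partner = waiting[other][key].popleft()  # oldest waiting partner first
            toy, box = (item_id, partner) if kind == "toy" else (partner, item_id)
            matches.append((key, toy, box))
        else:
            waiting[kind][key].append(item_id)
    return matches

events = [
    ("toy", "small", "t1"),
    ("box", "large", "b1"),
    ("box", "small", "b2"),
    ("toy", "large", "t2"),
    ("toy", "small", "t3"),  # stays buffered: no small box left
]
matches = match_stream(events)
print(matches)
```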
  • 75. 14. OpenTelemetry
  • 76. OpenTelemetry X-ray vision! (Source: Wikimedia Public Domain)
  • 77. OpenTelemetry
      • OpenTelemetry is the new standard for distributed tracing
      • Combines tracing (OpenTracing), metrics, and logs
      • Automatic instrumentation
      • Lots of open source visualization tools: Jaeger, SigNoz, Uptrace, etc.
      • Used in new client monitoring (KIP-714, Kafka 3.7.0)
  • 78. SigNoz service map for toy+boxes application
  • 79. 15. RisingWave
  • 80. RisingWave Wave processing (Source: Adobe Stock)
  • 81. RisingWave
      What?
      • Stream processing database, also available as a service
      Superpowers
      • Stateful stream processing
        o SQL syntax
        o Using cloud native storage
        o Potential replacement for Kafka Streams
      • PostgreSQL compatible
        o Works with Apache Superset for visualization
  • 82. 16. LLMs
  • 83. LLMs The Answer? (Source: Wikimedia)
  • 84. LLMs/GenAI
      • E.g. ChatGPT: not open source
        o There may be suitable open source alternatives for code generation
      • Worked well to generate
        o Kafka clients
        o Kafka Streams DSL
        o Test cases
      • Not as accurate for RisingWave (lack of examples?)
  • 85. Bonus Technologies from my Instaclustr colleagues
      ● Kafka benchmarking
        ○ Apache JMeter for Kafka benchmarking (Thanks to Anup Shirolkar)
        ○ OpenMessaging (Thanks to Alastair Daivis)
      ● Strimzi, a Kafka Operator for Kubernetes, and Debezium (CDC using Kafka Connect) (Thanks to Felix Alipaz-Dicke)
      ● Kafka GUIs (Thanks to Ana-Maria Minda)
        ○ Kafdrop
        ○ AKHQ
        ○ UI for Apache Kafka
        ○ These all work with Kafka + Instaclustr console and provide complementary features
  • 86. Ballet pattern → Hanoi street intersection pattern
      ● A working integrated synchronous + asynchronous system
  • 87. I survived as a pedestrian!
  • 88. Try us out
      • We offer Apache Kafka and these open source technologies as a managed service
      • You can use the others with our managed services
      • FREE 30-day trial of developer-sized clusters
  • 89. Paul Brebner | Instaclustr Technology Evangelist
      www.Instaclustr.com/paul-brebner → All my blogs
      Thank You!