Paul Brebner
Technology Evangelist
www.instaclustr.com
sales@instaclustr.com
© Instaclustr Pty Limited, 2022 [https://www.instaclustr.com/company/policies/terms-conditions/].
Except as permitted by the copyright law applicable to you, you may not reproduce, distribute,
publish, display, communicate or transmit any of the content of this
document, in any form, by any means, without the prior written
permission of Instaclustr Pty Limited.
In this Visual Introduction to Kafka, we’re going to build a Postal Service.
We’ll learn about Kafka Producers, Consumers, Topics, Partitions, Keys, Records, Delivery Semantics (guaranteed delivery, and who gets what messages), Consumer Groups, Kafka Connect and Streams!
©Instaclustr Pty Limited 2019, 2021, 2022
What is Kafka?
Kafka is a distributed stream processing system. It allows distributed producers to send messages to distributed consumers via a Kafka cluster.
Kafka has lots of benefits:
- It’s Fast: it has high throughput and low latency
- It’s Scalable: it’s horizontally scalable; to scale, just add nodes and partitions
- It’s Reliable: it’s distributed and fault tolerant
- It’s Durable (Zero Data Loss): messages are persisted to disk in an immutable log
- It’s Open Source: an Apache project
- And it’s available as an Instaclustr Managed Service on multiple cloud platforms
But the usual Kafka diagram (right) is a bit monochrome and boring.
This visual introduction will be more colourful and it’s going to be an extended story…
Let’s build a modern-day, fully electronic postal service to send messages from A to B.
First, we need an “A”. A is a producer; it sends a message…
To B, the consumer, the recipient of the message.
Due to the decline in “snail mail” volumes, direct deliveries have been CANCELED.
Actually, not.
Instead we have “Poste Restante”.
Poste Restante is not a post office in a restaurant; it’s called general delivery in the US. The mail is delivered to a post office, and they hold it for you until you call for it.
Consumers poll for messages by visiting the counter at the post office.
Image: La Poste Restante, Francois-Auguste Biard (Wikimedia)
Kafka topics act like a Post Office. What are the benefits?
- Disconnected delivery: the consumer doesn’t need to be available to receive messages
- Less effort for the messaging service: it only has to deliver to a few locations, not many consumer addresses
- It can scale better and handle more complex delivery semantics!
First let’s see how it scales. What if there are many consumers for a topic?
Kafka Topics have 1 or more Partitions. Partitions function like multiple counters and enable high concurrency.
A single counter introduces delays and limits concurrency. More counters increase concurrency and reduce delays.
Let’s see what a message looks like.
In Kafka a message is called a Record and is a bit like a letter.
The topic is the destination, the North Pole.
The “Postmark” includes a timestamp, the partition the record was sent to, and its offset within that partition.
Time semantics are flexible: either the time of event creation, ingestion, or processing.
There’s also a thing called a Key, which is optional. It refines the destination, so it’s a bit like the rest of the address.
We want this letter sent to Santa, not just a random Elf.
And the value is the contents (just a byte array).
Kafka Producers and Consumers need to have a shared serializer and de-serializer for both the key and value.
Record so far: Topic, Key (optional, selects the Partition), Value (Content), plus the postmark (timestamp, offset, partition).
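To make the letter analogy concrete, here is a minimal sketch of a record in plain Python. It is not the Kafka client API: the `Record` class, the `postmark` helper, the topic name `north-pole` and the UTF-8 serializer pair are all hypothetical names chosen for illustration. It only shows the parts named above: a topic, an optional key, a value held as bytes, and a postmark that is filled in once the record has been stored.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Illustrative stand-in for a Kafka record (not the real client class)."""
    topic: str                          # the destination
    value: bytes                        # the contents, just a byte array
    key: Optional[bytes] = None         # optional, refines the destination
    timestamp_ms: Optional[int] = None  # postmark fields, unknown until stored
    partition: Optional[int] = None
    offset: Optional[int] = None


def serialize(text: str) -> bytes:
    """Producer side: turn text into bytes (assumed shared UTF-8 encoding)."""
    return text.encode("utf-8")


def deserialize(data: bytes) -> str:
    """Consumer side: the matching de-serializer."""
    return data.decode("utf-8")


def postmark(record: Record, partition: int, offset: int, timestamp_ms: int) -> Record:
    """Return a copy of the record with its postmark filled in."""
    return replace(record, partition=partition, offset=offset, timestamp_ms=timestamp_ms)


letter = Record(
    topic="north-pole",
    key=serialize("Santa"),
    value=serialize("Dear Santa, please bring otters."),
)
stamped = postmark(letter, partition=2, offset=41, timestamp_ms=1_600_000_000_000)
```

The producer and consumer only agree because both use the same encoding; if the consumer decoded with a different scheme, the value would arrive intact but unreadable.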
Kafka doesn’t look inside the value, but the Producer and Consumer do, and the Consumer can try to make sense of the message. (Can you?!)
Image: Dear Santa by Zack Poitras / http://theinclusive.net/article.php?id=268
Next let’s look at delivery semantics.
For example, do we care if the message actually arrives or not?
Yes we do! Guaranteed message delivery is desirable.
Last century, homing pigeons were prone to getting lost or eaten by predators, so the same message was sent with several pigeons.
How does Kafka guarantee delivery?
A Message (M1) is written to a broker (2).
The message is always persisted to disk. This makes it resilient to power failure.
The message is also replicated on multiple brokers; 3 is typical.
And it makes the message resilient to the loss of some servers (all but one).
Finally the producer gets an acknowledgement once the message is persisted and replicated (configurable for the number of replicas, and sync or async).
The message is now available from more than one broker in case some fail.
This also increases the read concurrency as partitions are spread over multiple brokers.
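The persist, replicate, acknowledge sequence can be sketched as a toy model. This is a simplification under stated assumptions, not Kafka’s replication protocol: in Kafka the producer writes to a partition leader, followers copy from it, and the producer’s `acks` setting (0, 1 or all) decides how many acknowledgements to wait for. The `Broker` class and `send` function below are hypothetical and simply count how many brokers stored a copy.

```python
class Broker:
    """Toy broker: an in-memory list stands in for the on-disk log."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.alive = True
        self.disk = []

    def persist(self, message):
        """Store the message; a failed broker stores nothing."""
        if not self.alive:
            return False
        self.disk.append(message)
        return True


def send(message, replicas, required_acks):
    """Write the message to every replica and report whether enough
    copies were stored for the producer to be acknowledged."""
    copies = sum(1 for broker in replicas if broker.persist(message))
    return copies >= required_acks


if __name__ == "__main__":
    brokers = [Broker(1), Broker(2), Broker(3)]
    print(send("M1", brokers, required_acks=3))  # all three brokers are up
    brokers[2].alive = False
    print(send("M2", brokers, required_acks=2))  # two copies are enough
    print(send("M3", brokers, required_acks=3))  # not enough copies
```

Asking for more acknowledgements trades latency for safety: the producer waits longer, but a confirmed message survives the loss of more brokers.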
Now let’s look at another aspect of delivery semantics.
Who gets the messages and how many times are messages delivered?
Kafka is “pub-sub”. It’s loosely coupled: producers and consumers don’t know about each other.
Filtering, or which consumers get which messages, is topic based.
- Producers send messages to topics.
- Consumers subscribe to topics of interest, e.g. parties.
- When they poll they only receive messages sent to those topics.
Consumers subscribed to Topic “Parties” poll to receive messages from “Parties”. None of these consumers will receive messages sent to the “Work” topic, as they are not subscribed to it.
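Topic-based filtering can be sketched in a few lines of plain Python. The `PostOffice` and `Consumer` classes are hypothetical stand-ins, not Kafka client classes; the sketch assumes one log per topic and one read position per subscribed topic.

```python
from collections import defaultdict


class PostOffice:
    """Holds one append-only list of messages per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def send(self, topic, message):
        self.topics[topic].append(message)


class Consumer:
    """Remembers how far it has read in each topic it subscribed to."""

    def __init__(self, post_office, subscriptions):
        self.post_office = post_office
        self.positions = {topic: 0 for topic in subscriptions}

    def poll(self):
        """Return only new messages from subscribed topics."""
        received = []
        for topic in list(self.positions):
            log = self.post_office.topics[topic]
            received.extend(log[self.positions[topic]:])
            self.positions[topic] = len(log)
        return received


if __name__ == "__main__":
    office = PostOffice()
    party_goer = Consumer(office, ["parties"])
    office.send("parties", "Pool party")
    office.send("work", "Status meeting")
    print(party_goer.poll())  # only the message sent to "parties"
```

The producer never names a recipient; it only names a topic, which is what keeps the two sides loosely coupled.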
A few more details and we can see how this works.
Kafka works like Amish barn raising.
Partitions and a consumer group share work across multiple consumers; the more partitions a topic has, the more consumers it supports.
Image: Paul Cyr ©2018 NorthernMainePhotos.com
Kafka also works like Clones.
It supports delivery of the same message to multiple consumers with consumer groups.
Kafka doesn’t throw messages away as soon as they are delivered, so the same message can be delivered to multiple consumer groups.
Image: Shutterstock.com
Consumers subscribed to the “Parties” topic are allocated partitions. When they poll they will only get messages from their allocated partitions.
This enables consumers in the same group to share the work around. Each consumer gets only a subset of the available messages.
Multiple groups enable message broadcasting. Messages are duplicated across groups, as each consumer group receives a copy of each message.
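Sharing within a group and broadcasting across groups both fall out of one idea: each group keeps its own read position per partition. The sketch below illustrates that idea in plain Python. The `Topic` and `ConsumerGroup` classes, the member names, and the simple modulo allocation of partitions to members are assumptions for illustration; Kafka’s own partition assignment strategies are more involved.

```python
class Topic:
    """A topic is a list of partitions; each partition is an append-only log."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, partition, message):
        self.partitions[partition].append(message)


class ConsumerGroup:
    """Each group tracks its own offsets, so groups never steal from each other."""

    def __init__(self, topic, members):
        self.topic = topic
        count = len(topic.partitions)
        self.offsets = [0] * count
        # Assumed allocation: partition p belongs to member p modulo group size.
        self.owner = [members[p % len(members)] for p in range(count)]

    def poll(self, member):
        """Return new messages from the partitions allocated to this member."""
        received = []
        for partition, owner in enumerate(self.owner):
            if owner != member:
                continue
            log = self.topic.partitions[partition]
            received.extend(log[self.offsets[partition]:])
            self.offsets[partition] = len(log)
        return received


if __name__ == "__main__":
    parties = Topic(num_partitions=2)
    group_a = ConsumerGroup(parties, ["a1", "a2"])  # two consumers share the work
    group_b = ConsumerGroup(parties, ["b1"])        # one consumer gets everything
    parties.send(0, "message 1")
    parties.send(1, "message 2")
    print(group_a.poll("a1"), group_a.poll("a2"), group_b.poll("b1"))
```

With two partitions a third member of `group_a` would own nothing and sit idle, which is why more partitions allow more consumers per group.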
Which messages are delivered to which consumers?
The final aspect of delivery semantics is to do with message keys.
If a message has a key, then Kafka uses Partition based delivery.
Messages with the same key are always sent to the same partition and therefore the same consumer. And the order (within partitions) is guaranteed.
But if the key is null, then Kafka uses round robin delivery. Each message is delivered to the next partition.
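The two routing rules can be sketched as a small partitioner. This is an illustration, not Kafka’s implementation: the Java client hashes keys with murmur2, whereas the hypothetical `SketchPartitioner` below uses CRC32 from Python’s standard library as a stand-in, and newer client versions may keep keyless records on one partition per batch rather than strictly rotating.

```python
import itertools
import zlib


class SketchPartitioner:
    """Chooses a partition: hash of the key if present, else round robin."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key):
        if key is None:
            # No key: each message goes to the next partition in turn.
            return next(self._round_robin)
        # Same key bytes always give the same hash, hence the same partition.
        return zlib.crc32(key) % self.num_partitions


if __name__ == "__main__":
    partitioner = SketchPartitioner(num_partitions=6)
    print([partitioner.partition(None) for _ in range(7)])
    print(partitioner.partition(b"Pool Party"), partitioner.partition(b"Pool Party"))
```

Because the hash is taken modulo the partition count, changing the number of partitions can move a key to a different partition, so per-key ordering only holds while the count stays fixed.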
Let’s look at a concrete example with two consumer groups:
- Group 1: Nerds, which has multiple consumers (Bill, Paul, Penny, Kate, Millie, Jenny)
- Group 2: The Pugsters, which has a single consumer, Zug
Image: Shutterstock.com
Image: Nenad Aksic / Shutterstock.com
Looking at the case where there’s No Key first (round robin).
Consumers subscribe to “Parties”: Consumer 1 (Bill), Consumer 2 (Jenny) and so on up to Consumer n in Group “Nerds”, and Consumer 1 (Zug from The Pugsters) in Group “Pugsters”.
Each message (1, 2, etc.) is sent to the next partition, and consumers allocated to that partition will receive the message when they next poll.
Here’s what actually happens. We’re not showing the producer, topics, or partitions for simplicity. You’ll have to imagine them.
1. Both Groups subscribe to Topic “parties” (assuming 6 partitions, each consumer in the Nerds group gets 1 partition; Zug gets them all)
2. The Producer sends a record with the value “Pool party - Invitation” to the “parties” topic (there’s no key)
3. Bill and Zug receive a copy of the invitation and plan to attend
4. The Producer sends another record with the value “Pool party - Canceled”
5. In the Nerds group, Jenny gets the message this time as it’s round robin, and Zug gets it as he’s the only consumer in his group:
▶ Jenny ignores it as she didn’t get the original invite
▶ Bill wastes his time trying to go (as he doesn’t know it’s canceled)
▶ The rest of the gang aren’t surprised at not receiving any invites and stay home to do some hacking
Zug plans something else fun instead… A jam session with his band
Image: Shutterstock.com
How does it work if there is a Key?
The key is hashed to a partition, so the Message is always sent to that partition. Assume there are 3 messages, and messages 1 and 2 are hashed to the same partition.
As before, Consumer 1 (Bill), Consumer 2 (Jenny) and so on up to Consumer n in Group “Nerds”, and Consumer 1 (Zug) in Group “Pugsters”, subscribe to “Parties”.
Here’s what happens with a key, assuming that the key is the “title” of the message (“Pool Party”), and the value is invitation or canceled.
1. As before, both Groups subscribe to Topic “parties”
2. The Producer sends a record with the key equal to “Pool Party” and the value equal to “Invitation” to the “parties” topic
3. As before, Bill and Zug receive a copy of the invitation and plan to attend
4. The Producer sends another record with the same key but with the value “Canceled” to the “parties” topic
5. This time, Bill and Zug receive the cancelation (the same consumers, as the key is identical)
6. The Producer sends out another invitation to a Halloween party. The key is different this time.
7. Jenny receives the Halloween invitation as the key is different and the record is sent to Jenny’s partition. Zug is the only consumer in his group so he gets every record no matter what partition it’s sent to.
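The two walkthroughs can be replayed with a short simulation. It is a sketch under stated assumptions: 6 partitions as in the walkthrough, the Nerds listed so that Bill owns the first partition and Jenny the second, and CRC32 standing in for Kafka’s key hash. Because the hash differs from Kafka’s, the sketch only demonstrates that one key always reaches one Nerd, not which Nerd that is. The `deliver` function and its return shape are hypothetical.

```python
import zlib

NERDS = ["Bill", "Jenny", "Paul", "Penny", "Kate", "Millie"]  # one partition each
NUM_PARTITIONS = 6  # as assumed in the walkthrough


def deliver(records):
    """Route (key, value) pairs and return (value, nerd, "Zug") per record.

    Zug is the only Pugster, so he owns every partition and gets every record.
    """
    next_partition = 0
    deliveries = []
    for key, value in records:
        if key is None:
            # No key: round robin over the partitions.
            partition = next_partition
            next_partition = (next_partition + 1) % NUM_PARTITIONS
        else:
            # Key present: stand-in hash picks the partition.
            partition = zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS
        deliveries.append((value, NERDS[partition], "Zug"))
    return deliveries


no_key = deliver([(None, "Invitation"), (None, "Canceled")])
with_key = deliver([("Pool Party", "Invitation"), ("Pool Party", "Canceled")])

if __name__ == "__main__":
    print(no_key)    # invitation and cancelation reach different Nerds
    print(with_key)  # both records reach the same Nerd
```

Without a key, Bill gets the invitation and Jenny the cancelation; with the “Pool Party” key, whoever receives the invitation also receives the cancelation.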
This time Zug gets dressed up and has fun at the party.
Image: Shutterstock.com
But wait! There’s more: event reprocessing (time travel)!
Kafka stores message streams on disk, so Consumers can go back and request the same messages they’ve already received, earlier messages, or ignore some messages etc.
Image: Shutterstock.com
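Time travel works because reading does not remove anything; a consumer just moves its position in the log. The sketch below models that with an in-memory list. The `PartitionLog` and `RewindableConsumer` classes are hypothetical; Kafka’s consumer API offers seek-style operations for the same purpose, and real logs are only kept as long as the retention settings allow.

```python
class PartitionLog:
    """Append-only log; the index of each record is its offset."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record


class RewindableConsumer:
    """Reads forward from a position that it can move at will."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self):
        """Return everything from the current position to the end of the log."""
        batch = self.log.records[self.position:]
        self.position = len(self.log.records)
        return batch

    def seek(self, offset):
        """Jump back to re-read, or forward to skip messages."""
        if not 0 <= offset <= len(self.log.records):
            raise ValueError("offset outside the log")
        self.position = offset


if __name__ == "__main__":
    log = PartitionLog()
    for value in ["Invitation", "Canceled", "Halloween"]:
        log.append(value)
    zug = RewindableConsumer(log)
    print(zug.poll())  # all three records
    zug.seek(0)
    print(zug.poll())  # the same three records again
    zug.seek(2)
    print(zug.poll())  # skips the first two
```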
So Zug can go “back to the future”!
But! The postal system is global and heterogeneous.
How can post offices be connected?
Compressed Air?
Underground pneumatic tubes delivered mail between postal facilities in USA cities in the 1900s (Source: Wikimedia Commons)
Kafka Connect enables message flows across heterogeneous systems.
From Sources to Sinks via a Lake (Kafka)
Kafka Connect Architecture: Source and Sink Connectors
Form Pipelines (Berlin)
For Beer? (Source: Paul Brebner)
Example of a Kafka Connect IoT pipeline (Source: Paul Brebner):
Tidal Data → Kafka → Elasticsearch → Kibana
REST source connector → Tides Topic → Elasticsearch sink connector → Tides Index
REST call, JSON result:
{"metadata": {
"id":"8724580",
"name":"Key West",
"lat":"24.5508",
"lon":"-81.8081"},
"data":[{
"t":"2020-09-24 04:18",
"v":"0.597"}]}
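To show what a sink might do with that JSON result, here is a small sketch that flattens one tides record into per-reading documents. It is not the Elasticsearch sink connector’s behaviour: the `to_documents` function and the output field names (`station_id`, `location`, `level` and so on) are hypothetical, and reading `v` as a numeric level is an assumption based on the sample.

```python
import json

RAW = (
    '{"metadata": {"id":"8724580","name":"Key West",'
    '"lat":"24.5508","lon":"-81.8081"},'
    '"data":[{"t":"2020-09-24 04:18","v":"0.597"}]}'
)


def to_documents(payload):
    """Turn one JSON result into a list of flat documents, one per reading."""
    record = json.loads(payload)
    station = record["metadata"]
    documents = []
    for reading in record["data"]:
        documents.append({
            "station_id": station["id"],
            "station_name": station["name"],
            # The sample carries numbers as strings, so convert them.
            "location": {"lat": float(station["lat"]), "lon": float(station["lon"])},
            "time": reading["t"],
            "level": float(reading["v"]),
        })
    return documents


if __name__ == "__main__":
    for document in to_documents(RAW):
        print(document)
```

Converting the string fields to numbers up front matters for the last stage of the pipeline, since charts and maps need numeric and geographic types rather than text.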
Connectors require Configuration
What’s Still Missing? Mail Sorting
Source: https://commons.wikimedia.org/wiki/File:Royal_mail_sorting.jpg
Automated!
Source: https://commons.wikimedia.org/wiki/File:Post_Sorting_Machine_(4479045801).jpg
At Scale!
(Source: https://commons.wikimedia.org/w/index.php?search=mail+sorting&title=Special:MediaSearch&go=Go&type=image)
Kafka Streams: Topics in, Topics out, via Streams
All Kafka APIs: Producer, Consumer, Connect, Streams
Simple Streams Topology (Source: Shutterstock)
Complex Streams Topology: Join, Group, Filter, Aggregate, etc.
Kafka Streams = Rapids!? (Source: Shutterstock)
Streams get complicated quickly! One way to keep dry… (Source: Shutterstock)
Or, This Diagram Which Explains The Order of Streams DSL Operations
(Source: https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html)
Cluedo Kafka Streams Example
Tracks who’s in what rooms and when, and emits a list of suspects without an alibi.
Dr Black has been murdered in the Billiard Room with a Candlestick! Whodunnit?!
[KSTREAM-FILTER-0000000024]: Conservatory: Professor Plum has no alibi
[KSTREAM-FILTER-0000000024]: Library: Colonel Mustard has no alibi
[KSTREAM-FILTER-0000000024]: Billiard Room: Mrs White has no alibi
Topology of Cluedo Streams Example
This tool is very useful for visualizing and debugging streams: https://zz85.github.io/kafka-streams-viz/
Some Kafka Use Cases
Example 1 - Kafka “Kongo” Logistics IoT Application: Goods, Warehouses, Trucks, Sensors and Rules
Detect transportation and storage violations in real-time
And Kafka Streams to prevent Truck Overloading
Example 2 - One of these things is not like the others (Source: Shutterstock)
Massively Scalable Anomaly Detection with Kafka and Cassandra (Source: Shutterstock)
19 Billion Checks/day with 470 CPU Cores
[Chart: Anomaly checks/day (billion) vs. Total CPU Cores, labeled “19 Billion”]
Example 3 - Which Came First? State or Events?
Change Data Capture (CDC) with Debezium and Kafka Connect (Source: Shutterstock)
State to Events and back to State again
That’s it for this short visual introduction to Apache Kafka.
For more information please have a look at the Apache Kafka docs, the Instaclustr Blogs, and check out our free Kafka trial.
Apache Kafka: https://kafka.apache.org/
“Gently down the Stream”, another “Visual” introduction to Kafka, with Otters!: www.gentlydownthe.stream
All of my blogs (Cassandra, Kafka, MirrorMaker, Spark, Zookeeper,
OpenSearch, Redis, PostgreSQL, Debezium, Cadence, etc)
www.instaclustr.com/paul-brebner/
Kafka Streams Cluedo Example (part of “Kongo” Kafka intro series)
www.instaclustr.com/blog/kongo-5-3-apache-kafka-streams-examples/
Kafka Connect Pipeline Series (Tides data processing)
www.instaclustr.com/blog/kafka-connect-pipelines-conclusion-pipeline-series-part-10/
Kafka Xmas Tree Lights Simulation (my 1st Kafka program)
www.instaclustr.com/blog/seasons-greetings-instaclustr-kafka-christmas-tree-light-simulation/
Instaclustr’s Managed Kafka (Free Trial)
www.instaclustr.com/platform/managed-apache-kafka/
Instaclustr Blogs

A Visual Introduction to Apache Kafka

  • 1. Paul Brebner Technology Evangelist www.instaclustr.com [email protected] © Instaclustr Pty Limited, 2022 [https://www.instaclustr.com/ company/policies/terms-conditions/]. Except as permitted by the copyright law applicable to you, you may not reproduce, distribute, publish, display, communicate or transmit any of the content of this document, in any form, but any means, without the prior written permission of Instaclustr Pty Limited.
  • 2. In this Visual Introduction to Kafka, we’re going to build a Postal Service We’ll learn about Kafka Producers, Consumers, Topics, Partitions, Keys, Records, Delivery Semantics (Guaranteed delivery, and who gets what messages), Consumer Groups, Kafka Connect and Streams! ©Instaclustr Pty Limited 2019, 2021, 2022
  • 3. Kafka is a distributed streams processing system, it allows distributed producers to send messages to distributed consumers via a Kafka cluster. What is ©Instaclustr Pty Limited 2019, 2021, 2022 Kafka?
  • 4. Kafka has lots of benefits: It’s Fast: It has high throughput and low latency It’s Scalable: It’s horizontally scalable, to scale just add nodes and partitions It’s Reliable: It’s distributed and fault tolerant It has Zero Data Loss: Messages are persisted to disk with an immutable log It’s Open Source: An Apache project And it’s available as an Instaclustr Managed Service: On multiple cloud platforms Managed Service Fast Scalable Reliable Durable Open Source ©Instaclustr Pty Limited 2019, 2021, 2022
  • 5. But the usual Kafka diagram (right) is a bit monochrome and boring. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 6. This visual introduction will be more colourful and it’s going to be an extended story… ©Instaclustr Pty Limited 2019, 2021, 2022
  • 7. Let’s build a modern day fully electronic postal service T o send messages from A to B Postal Service A B ©Instaclustr Pty Limited 2019, 2021, 2022
  • 8. T o B, the consumer, the recipient of the message. A is a producer, it sends a message… First, we need an “A”. ©Instaclustr Pty Limited 2019, 2021, 2022
  • 9. Due to the decline in “snail mail” volumes, direct deliveries have been canceled. Actually, not. [Stamp: CANCELED]
  • 10. Instead we have “Poste Restante”. Poste Restante is not a post office in a restaurant; it’s called general delivery in the US. The mail is delivered to a post office, and they hold it for you until you call for it. Consumers poll for messages by visiting the counter at the post office. Image: La Poste Restante, Francois-Auguste Biard (Wikimedia)
  • 11. Kafka topics act like a Post Office. What are the benefits? Disconnected delivery: the consumer doesn’t need to be available to receive messages. There’s less effort for the messaging service: it only has to deliver to a few locations, not many consumer addresses. And it can scale better and handle more complex delivery semantics!
  • 12. First let’s see how it scales. What if there are many consumers for a topic? A single counter introduces delays and limits concurrency. More counters increase concurrency and reduce delays. Kafka Topics have 1 or more Partitions. Partitions function like multiple counters and enable high concurrency.
  • 13. Let’s see what a message looks like. In Kafka a message is called a Record and is a bit like a letter. The topic is the destination, the North Pole. [Envelope addressed to: Santa, North Pole]
  • 14. The “Postmark” includes a timestamp, the offset (the record’s position within its partition), and the partition it was sent to. Time semantics are flexible: either the time of event creation, ingestion, or processing. [Envelope labels: Topic; timestamp, offset, partition]
  • 15. There’s also a thing called a Key, which is optional. It refines the destination, so it’s a bit like the rest of the address. We want this letter sent to Santa, not just a random Elf. [Envelope labels: Topic; Key; Partition (optional); timestamp, offset, partition]
  • 16. And the value is the contents (just a byte array). Kafka Producers and Consumers need to have a shared serializer and de-serializer for both the key and value. [Envelope labels: Topic; Key; Partition (optional); Value (Content); timestamp, offset, partition]
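To make the envelope analogy concrete, here is a minimal sketch of a record as a plain Python data structure. It is a toy model, not the Kafka client API: the `Record` class, its field names, and the `serialize`/`deserialize` helpers are hypothetical, and UTF-8 is simply an assumed encoding that the producer and consumer have agreed on.

```python
from dataclasses import dataclass
from typing import Optional


def serialize(text: Optional[str]) -> Optional[bytes]:
    """Producer side: turn a string into bytes (assumed UTF-8)."""
    return None if text is None else text.encode("utf-8")


def deserialize(data: Optional[bytes]) -> Optional[str]:
    """Consumer side: must mirror whatever the producer's serializer did."""
    return None if data is None else data.decode("utf-8")


@dataclass(frozen=True)
class Record:
    topic: str                       # the destination ("North Pole")
    value: bytes                     # the contents; opaque bytes to the broker
    key: Optional[bytes] = None      # optional; refines the destination
    # "Postmark" fields, only known once the record has been written:
    partition: Optional[int] = None
    offset: Optional[int] = None
    timestamp_ms: Optional[int] = None


letter = Record(
    topic="north-pole",
    key=serialize("Santa"),
    value=serialize("Dear Santa, I would like a pony."),
)

# The consumer can only read the letter because it uses the matching decoder.
print(deserialize(letter.key), "<-", deserialize(letter.value))
```

If the consumer decoded with a different scheme than the producer encoded with, the bytes would still arrive intact but the contents might be unreadable, which is why the serializer pair has to be shared.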
  • 17. Kafka doesn’t look inside the value, but the Producer and Consumer do, and the Consumer can try and make sense of the message (Can you?!) Image: Dear Santa by Zack Poitras / http://theinclusive.net/article.php?id=268
  • 18. Next let’s look at delivery semantics. For example, do we care if the message actually arrives or not?
  • 19. Yes we do! Guaranteed message delivery is desirable. Last century, homing pigeons were prone to getting lost or eaten by predators, so the same message was sent with several pigeons.
  • 20. How does Kafka guarantee delivery? A Message (M1) is written to a broker (2). The message is always persisted to disk. This makes it resilient to power failure. [Diagram: Producer sends M1 to Broker 2; Brokers 1 and 3 are empty]
  • 21. The message is also replicated on multiple brokers; 3 is typical. [Diagram: M1 on Brokers 1, 2 and 3]
  • 22. This makes it resilient to the loss of some servers (all but one). [Diagram: only Broker 1 remains, still holding M1]
  • 23. Finally the producer gets an acknowledgement once the message is persisted and replicated (configurable for number, and sync or async). The message is now available from more than one broker in case some fail. This also increases the read concurrency, as partitions are spread over multiple brokers.
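The write, replicate, acknowledge sequence on the last four slides can be sketched as a small in-memory simulation. This is a toy model under stated assumptions: the `Broker` class and `produce` function are hypothetical, the leader pushes copies to followers (real Kafka followers fetch from the leader), and there is no leader election. The `acks` argument loosely mirrors the producer's `acks` setting (`0`, `1`, or `all`).

```python
class Broker:
    def __init__(self, name):
        self.name = name
        self.log = []        # stands in for the broker's on-disk log
        self.alive = True

    def append(self, message):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.log.append(message)


def produce(message, leader, followers, acks="all"):
    """Write to the leader, copy to live followers, and return how many
    copies were confirmed before the producer was acknowledged."""
    leader.append(message)               # raises if the leader is down
    confirmed = 1
    for follower in followers:
        if follower.alive:
            follower.append(message)
            confirmed += 1
    if acks == 0:
        return 0             # producer did not wait for any confirmation
    if acks == 1:
        return 1             # acknowledged after the leader's write only
    return confirmed         # "all": leader plus every live follower


b1, b2, b3 = Broker("Broker 1"), Broker("Broker 2"), Broker("Broker 3")

# M1 is written to Broker 2 and replicated to Brokers 1 and 3.
copies = produce("M1", leader=b2, followers=[b1, b3], acks="all")

# Lose all but one server: the message is still available.
b2.alive = False
b3.alive = False
survivors = [b.name for b in (b1, b2, b3) if b.alive and "M1" in b.log]
print(copies, survivors)
```

The trade-off the slide hints at is visible in the return value: waiting for fewer confirmations is faster for the producer, but the acknowledgement then says less about how many copies exist.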
  • 24. Now let’s look at another aspect of delivery semantics. Who gets the messages, and how many times are messages delivered?
  • 25. Kafka is “pub-sub”. It’s loosely coupled: producers and consumers don’t know about each other. [Diagram: one Producer, four Consumers]
  • 26. Filtering, or which consumers get which messages, is topic based. - Producers send messages to topics. - Consumers subscribe to topics of interest, e.g. parties. - When they poll they only receive messages sent to those topics. None of these consumers will receive messages sent to the “Work” topic. [Diagram: a Producer, Topic “Parties” and Topic “Work”; four Consumers subscribed to “Parties” poll to receive its messages and are not subscribed to “Work”]
  • 27. A few more details and we can see how this works. Kafka works like Amish barn raising. Partitions and a consumer group share work across multiple consumers; the more partitions a topic has, the more consumers it supports. Image: Paul Cyr ©2018 NorthernMainePhotos.com
  • 28. Kafka also works like Clones. It supports delivery of the same message to multiple consumers with consumer groups. Kafka doesn’t throw messages away as soon as they are delivered, so the same message can be delivered to multiple consumer groups. Image: Shutterstock.com
  • 29. Consumers subscribed to the “parties” topic are allocated partitions. When they poll they will only get messages from their allocated partitions. [Diagram: Producer; Topic “Parties” with Partitions 1, 2, … n; two Consumer Groups]
  • 30. This enables consumers in the same group to share the work around. Each consumer gets only a subset of the available messages. [Diagram caption: Consumers share work within groups]
  • 31. Multiple groups enable message broadcasting. Messages are duplicated across groups, as each consumer group receives a copy of each message. [Diagram caption: Messages are duplicated across Consumer groups]
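The two behaviours on these slides (sharing within a group, broadcasting across groups) can be sketched with plain dictionaries. This is a toy model, not the Kafka consumer API: the `assign` and `poll_all` helpers and the group and consumer names are hypothetical, and the simple round-robin allocation is an assumption (real Kafka offers several partition assignment strategies).

```python
from collections import defaultdict


def assign(partitions, consumers):
    """Spread partition ids over one group's consumers, round robin."""
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)


def poll_all(topic, consumers):
    """Each consumer reads only the partitions allocated to it."""
    received = {consumer: [] for consumer in consumers}
    for consumer, partitions in assign(sorted(topic), consumers).items():
        for partition in partitions:
            received[consumer].extend(topic[partition])
    return received


# Topic "parties" with 3 partitions, each an ordered list of messages.
topic = {0: ["m0", "m3"], 1: ["m1", "m4"], 2: ["m2", "m5"]}

groups = {
    "nerds": ["bill", "jenny", "paul"],   # three consumers share the work
    "pugsters": ["zug"],                  # one consumer gets everything
}

delivered = {name: poll_all(topic, members) for name, members in groups.items()}

for name, per_consumer in delivered.items():
    print(name, per_consumer)
```

Within "nerds" each message reaches exactly one consumer, while "pugsters" independently receives its own copy of every message.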
  • 32. Which messages are delivered to which consumers? The final aspect of delivery semantics is to do with message keys. If a message has a key, then Kafka uses partition-based delivery. Messages with the same key are always sent to the same partition and therefore the same consumer. And the order (within partitions) is guaranteed.
  • 33. But if the key is null, then Kafka uses round robin delivery. Each message is delivered to the next partition.
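A minimal sketch of the two partitioning rules just described: hash the key when there is one, otherwise take the next partition in turn. The `choose_partition` function and `NUM_PARTITIONS` are hypothetical names, and CRC32 is only a stand-in hash chosen because it ships with Python; Kafka's own client uses a different hash function, and recent client versions may batch keyless records onto one partition at a time rather than strictly alternating.

```python
import itertools
import zlib

NUM_PARTITIONS = 6                       # assumed partition count
_next_partition = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key):
    """Pick a partition for a record, given its (already serialized) key."""
    if key is None:
        # No key: round robin, each record goes to the next partition.
        return next(_next_partition)
    # Key present: same key -> same hash -> same partition, every time.
    return zlib.crc32(key) % NUM_PARTITIONS


pool = choose_partition(b"Pool Party")
again = choose_partition(b"Pool Party")
keyless = [choose_partition(None) for _ in range(3)]

print("same key, same partition:", pool == again)
print("keyless records went to partitions:", keyless)
```

Note that the mapping depends on the partition count, so in this sketch changing `NUM_PARTITIONS` would move existing keys to different partitions.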
  • 34. Let’s look at a concrete example with two consumer groups. Group 1: Nerds, which has multiple consumers (Bill, Paul, Penny, Kate, Millie, Jenny). Group 2: The Pugsters, which has a single consumer, Zug. Images: Shutterstock.com; Nenad Aksic / Shutterstock.com
  • 35. Looking at the case where there’s No Key first (round robin). Consumers subscribe to “Parties”. Each message (1, 2, etc.) is sent to the next partition, and consumers allocated to that partition will receive the message when they poll next. [Diagram: Producer; Topic “Parties” with Partitions 1, 2, … n; Group “Nerds” with Consumer 1 (Bill), Consumer 2 (Jenny), … Consumer n; Group “Pugsters” with Consumer 1 (Zug)]
  • 36. Here’s what actually happens. We’re not showing the producer, topics, or partitions for simplicity. You’ll have to imagine them.
  • 37. Step 1 (no key): Both groups subscribe to topic “parties” (assuming 6 partitions, each consumer in the Nerds group gets 1 partition; Zug gets them all).
  • 38. Step 2 (no key): The Producer sends a record with the value “Pool party - Invitation” to the “parties” topic (there’s no key).
  • 39. Step 3 (no key): Bill and Zug receive a copy of the invitation and plan to attend.
  • 40. Step 4 (no key): The Producer sends another record with the value “Pool party - Canceled”.
  • 41. Step 5 (no key): In the Nerds group, Jenny gets the message this time as it’s round robin, and Zug gets it as he’s the only consumer in his group: ▶ Jenny ignores it as she didn’t get the original invite ▶ Bill wastes his time trying to go (as he doesn’t know it’s canceled) ▶ The rest of the gang aren’t surprised at not receiving any invites and stay home to do some hacking
  • 42. Zug plans something else fun instead… A jam session with his band. Image: Shutterstock.com
  • 43. How does it work if there is a Key? Consumers subscribe to “Parties”. The key is hashed to a partition, so the message is always sent to that partition. Assume there are 3 messages, and messages 1 and 2 are hashed to the same partition. [Diagram: Producer; Topic “Parties” with Partitions 1, 2, … n; Group “Nerds” with Consumer 1 (Bill), Consumer 2 (Jenny), … Consumer n; Group “Pugsters” with Consumer 1 (Zug)]
  • 44. Here’s what happens with a key, assuming that the key is the “title” of the message (“Pool Party”), and the value is invitation or canceled. Step 1: As before, both groups subscribe to topic “parties”. Step 2: The Producer sends a record with the key equal to “Pool Party” and the value equal to “Invitation” to the “parties” topic.
  • 45. Step 3 (with key): As before, Bill and Zug receive a copy of the invitation and plan to attend.
  • 46. Step 4 (with key): The Producer sends another record with the same key but with the value “canceled” to the “parties” topic.
  • 47. Step 5 (with key): This time, Bill and Zug receive the cancelation (the same consumers, as the key is identical).
  • 48. Step 6 (with key): The Producer sends out another invitation to a Halloween party. The key is different this time.
  • 49. Step 7 (with key): Jenny receives the Halloween invitation as the key is different and the record is sent to Jenny’s partition. Zug is the only consumer in his group so he gets every record no matter what partition it’s sent to.
  • 50. This time Zug gets dressed up and has fun at the party. Image: Shutterstock.com
  • 51. But wait! There’s more: event reprocessing (time travel)! Kafka stores message streams on disk, so Consumers can go back and request the same messages they’ve already received, earlier messages, or ignore some messages etc. Image: Shutterstock.com
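Reprocessing works because reading does not remove anything from the log: a consumer only tracks a position (its offset) and can move it. The sketch below is a toy model, not the Kafka client API: `PartitionLog` and `Consumer` are hypothetical classes, and the `seek` method here only imitates the idea of repositioning that real clients offer.

```python
class PartitionLog:
    """Append-only list standing in for one partition's on-disk log."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1      # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    def __init__(self, log):
        self.log = log
        self.position = 0                  # next offset to read

    def poll(self, max_records=10):
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)        # reading never deletes records
        return batch

    def seek(self, offset):
        self.position = offset


log = PartitionLog()
for value in ["invitation", "canceled", "halloween"]:
    log.append(value)

zug = Consumer(log)
first_pass = zug.poll()     # everything, in order
zug.seek(0)
replay = zug.poll()         # the same messages again ("time travel")
zug.seek(2)
skipped = zug.poll()        # ignore the first two messages

print(first_pass, replay, skipped, sep="\n")
```

In this sketch the log grows forever; a real cluster bounds it with retention settings, so how far back a consumer can travel depends on how long records are kept.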
  • 52. So Zug can go “back to the future”!
  • 53. But! The postal system is global and heterogeneous.
  • 54. How can post offices be connected?
  • 55. Compressed Air? Underground pneumatic tubes delivered mail between postal facilities in US cities in the 1900s (Source: Wikimedia Commons)
  • 56. Kafka Connect enables message flows across heterogeneous systems: from Sources to Sinks via a Lake (Kafka).
  • 57. Kafka Connect Architecture: Source and Sink Connectors
  • 58. Form Pipelines (Berlin) (Source: Paul Brebner)
  • 59. For Beer? (Source: Paul Brebner)
  • 60. Example of a Kafka Connect IoT pipeline: Tidal Data → Kafka → Elasticsearch → Kibana. Components shown: REST call returning a JSON result, REST source connector, Tides Topic, Elasticsearch sink connector, Tides Index. JSON result: {"metadata": {"id":"8724580", "name":"Key West", "lat":"24.5508", "lon":"-81.8081"}, "data":[{"t":"2020-09-24 04:18", "v":"0.597"}]} ©Instaclustr Pty Limited, 2021
  • 61. Connectors require Configuration. Example of a Kafka Connect IoT pipeline: Tidal Data → Kafka → Elasticsearch → Kibana. Components shown: REST call returning a JSON result, REST source connector, Tides Topic, Elasticsearch sink connector, Tides Index. JSON result: {"metadata": {"id":"8724580", "name":"Key West", "lat":"24.5508", "lon":"-81.8081"}, "data":[{"t":"2020-09-24 04:18", "v":"0.597"}]}
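To show the shape of the data moving through the Tides pipeline, here is a small sketch that parses the JSON result from the slide and flattens it into one document per reading, roughly what a search index would want. It is an illustration only: the `to_documents` function and the output field names are hypothetical, and nothing here claims to be how the connectors in the actual pipeline are configured. The sample uses straight quotes throughout so that it is valid JSON.

```python
import json

# The REST result shown on the slide (one station, one reading).
raw = (
    '{"metadata": {"id":"8724580", "name":"Key West", '
    '"lat":"24.5508", "lon":"-81.8081"}, '
    '"data":[{"t":"2020-09-24 04:18", "v":"0.597"}]}'
)


def to_documents(payload):
    """Flatten one REST result into one flat document per reading."""
    body = json.loads(payload)
    meta = body["metadata"]
    return [
        {
            "station_id": meta["id"],
            "station_name": meta["name"],
            # Numbers arrive as strings, so convert them explicitly.
            "location": {"lat": float(meta["lat"]), "lon": float(meta["lon"])},
            "time": reading["t"],
            "level": float(reading["v"]),
        }
        for reading in body["data"]
    ]


docs = to_documents(raw)
print(json.dumps(docs[0], indent=2))
```

The explicit string-to-number conversion is the kind of detail that typically has to be handled somewhere between source and sink, whether in connector configuration, an index mapping, or code like this.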
  • 65. All Kafka APIs: Producer, Consumer, Connect, Streams. Kafka Streams: Topics in, Topics out, via Streams
  • 66. Simple Streams Topology (Source: Shutterstock)
  • 68. Kafka Streams = Rapids!? (Source: Shutterstock)
  • 69. Streams get complicated quickly! One way to keep dry… (Source: Shutterstock)
  • 70. Or, This Diagram Which Explains The Order of Streams DSL Operations (Source: https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html)
  • 71. Cluedo Kafka Streams Example: tracks who’s in what rooms and when, and emits a list of suspects without an alibi. Dr Black has been murdered in the Billiard Room with a Candlestick! Whodunnit?! [KSTREAM-FILTER-0000000024]: Conservatory: Professor Plum has no alibi [KSTREAM-FILTER-0000000024]: Library: Colonel Mustard has no alibi [KSTREAM-FILTER-0000000024]: Billiard Room: Mrs White has no alibi
  • 72. Topology of Cluedo Streams Example. This tool is very useful for visualizing and debugging streams: https://zz85.github.io/kafka-streams-viz/
  • 73. Some Kafka Use Cases
  • 74. Example 1 - Kafka “Kongo” Logistics IoT Application: Goods, Warehouses, Trucks, Sensors and Rules
  • 75. Detect transportation and storage violations in real-time
  • 76. And Kafka Streams to prevent Truck Overloading (Source: Shutterstock)
  • 77. Example 2 - One of these things is not like the others (Source: Shutterstock)
  • 78. Massively Scalable Anomaly Detection with Kafka and Cassandra
  • 79. 19 Billion Checks/day with 470 CPU Cores [Chart: anomaly checks/day (billion) against total CPU cores, reaching 19 billion]
  • 80. Example 3 - Which Came First? State or Events? (Source: Shutterstock)
  • 81. Change Data Capture (CDC) with Debezium and Kafka Connect: State to Events and back to State again
  • 82. That’s it for this short visual introduction to Apache Kafka. For more information please have a look at the Apache Kafka docs, the Instaclustr Blogs, and check out our free Kafka trial. Apache Kafka: https://kafka.apache.org/ “Gently down the Stream”, another “Visual” introduction to Kafka, with Otters: www.gentlydownthe.stream
  • 83. Instaclustr Blogs. All of my blogs (Cassandra, Kafka, MirrorMaker, Spark, Zookeeper, OpenSearch, Redis, PostgreSQL, Debezium, Cadence, etc.): www.instaclustr.com/paul-brebner/ Kafka Streams Cluedo Example (part of “Kongo” Kafka intro series): www.instaclustr.com/blog/kongo-5-3-apache-kafka-streams-examples/ Kafka Connect Pipeline Series (Tides data processing): www.instaclustr.com/blog/kafka-connect-pipelines-conclusion-pipeline-series-part-10/ Kafka Xmas Tree Lights Simulation (my 1st Kafka program): www.instaclustr.com/blog/seasons-greetings-instaclustr-kafka-christmas-tree-light-simulation/ Instaclustr’s Managed Kafka (Free Trial): www.instaclustr.com/platform/managed-apache-kafka/