STREAMING WITH KAFKA
Publish/Subscribe Messaging with Kafka
What is streaming?
■ So far we’ve really just talked about processing historical, existing big data
– Sitting on HDFS
– Sitting in a database
■ But how does new data get into your cluster? Especially if it’s “Big data”?
– New log entries from your web servers
– New sensor data from your IoT system
– New stock trades
■ Streaming lets you publish this data, in real time, to your cluster.
– And you can even process it in real time as it comes in!
Two problems
■ How to get data from many different sources flowing into your cluster
■ Processing it when it gets there
■ First, let’s focus on the first problem
Enter Kafka
■ Kafka is a general-purpose publish/subscribe messaging system
■ Kafka servers store all incoming messages from publishers for some period of
time, and publish them to a stream of data called a topic.
■ Kafka consumers subscribe to one or more topics, and receive data as it’s
published
■ A stream / topic can have many different consumers, each with its own
position in the stream
■ It’s not just for Hadoop
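The points above (messages are stored, consumers subscribe to topics, and each consumer keeps its own position) can be sketched with a toy in-memory model. This is not Kafka's API: `ToyBroker`, `publish`, `poll`, and the topic and consumer names are all hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a Kafka server (hypothetical, not the real API)."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> ordered messages
        self.offsets = defaultdict(int)   # (consumer, topic) -> next index to read

    def publish(self, topic, message):
        # The broker stores the message; reading it later does not remove it.
        self.topics[topic].append(message)

    def poll(self, consumer, topic):
        # Return everything this consumer has not seen yet, then advance
        # only this consumer's position.
        start = self.offsets[(consumer, topic)]
        messages = self.topics[topic][start:]
        self.offsets[(consumer, topic)] = len(self.topics[topic])
        return messages


broker = ToyBroker()
broker.publish("weblogs", "GET /index.html")
broker.publish("weblogs", "GET /about.html")

first_read = broker.poll("dashboard", "weblogs")   # both messages so far
broker.publish("weblogs", "GET /contact.html")
second_read = broker.poll("dashboard", "weblogs")  # only the new message
late_reader = broker.poll("archiver", "weblogs")   # own position: all three
```

Because the broker keeps the messages rather than handing them off, a consumer that shows up late can still read the stream from the start. A real Kafka server would also expire messages after the configured retention period, which this sketch leaves out.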
Kafka architecture
[Diagram: a Kafka Cluster linked to Producers (apps), Consumers (apps), Stream Processors (apps), and Connectors (DBs)]
How Kafka scales
Image: kafka.apache.org
■ Kafka itself may be distributed among
many processes on many servers
– Will distribute the storage of stream
data as well
■ Consumers may also be distributed
– Consumers of the same group will
have messages distributed amongst
them
– Consumers of different groups will get
their own copy of each message
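The consumer-group behaviour above can be illustrated with a small simulation. In Kafka the unit of distribution within a group is the partition; the `deliver` function, the round-robin assignment, and the group and consumer names below are assumptions made for this sketch, not Kafka's actual assignment logic.

```python
def deliver(partitions, groups):
    """Assign each partition to one member per group; return member -> messages.

    partitions: list of lists, one list of messages per partition
    groups: dict of group name -> list of consumer names (unique overall)
    """
    received = {}
    for members in groups.values():
        for name in members:
            received[name] = []
        for index, messages in enumerate(partitions):
            # Within a group, each partition has exactly one owner.
            owner = members[index % len(members)]
            received[owner].extend(messages)
    return received


partitions = [["a1", "a2"], ["b1"], ["c1", "c2"], ["d1"]]
groups = {
    "analytics": ["analytics-1", "analytics-2"],  # two members share the work
    "audit": ["audit-1"],                         # one member gets everything
}
received = deliver(partitions, groups)
```

Here the two `analytics` consumers split the messages between them, while `audit-1`, being in a different group, receives its own copy of every message.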
Let’s play
■ Start Kafka on our sandbox
■ Set up a topic
– Publish some data to it, and watch it get consumed
■ Set up a file connector
– Monitor a log file and publish additions to it
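The idea behind the file connector step, watching a log file and publishing only what was added, can be sketched without a running Kafka. This is a minimal illustration, not the connector's implementation: `read_new_lines`, the `access.log` file name, and the `published` list standing in for a topic are all hypothetical.

```python
import os
import tempfile


def read_new_lines(path, position):
    """Return (complete lines appended since `position`, new byte position)."""
    with open(path, "rb") as handle:
        handle.seek(position)
        data = handle.read()
    # Only hand over complete lines; a partial last line waits for next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, position + end


published = []  # stand-in for a Kafka topic

with tempfile.TemporaryDirectory() as folder:
    log_path = os.path.join(folder, "access.log")

    with open(log_path, "w", encoding="utf-8") as log:
        log.write("first entry\n")
    lines, position = read_new_lines(log_path, 0)
    published.extend(lines)

    # New entries arrive; only the additions are picked up on the next pass.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("second entry\nthird entry\n")
    lines, position = read_new_lines(log_path, position)
    published.extend(lines)
```

Remembering the byte position between passes is what keeps entries that were already published from being sent again.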
