INTRODUCING APACHE KAFKA – SCALABLE, RELIABLE EVENT BUS & MESSAGE QUEUE
Maarten Smeets & Lucas Jellema
09 February 2017, Nieuwegein
AGENDA
• Introduction & overview
• Demo
• Hands-on part 1 – producing and consuming messages (pub/sub)
• Dinner
• Kafka: some history, a peek under the hood, role in architecture and use cases
• Kafka and Oracle
• Hands-on part 2 – more complex scenarios and some background & admin
SENDING MESSAGES TO CONSUMERS
• Dependency on producer at design time and at run time
• Deal with multiple consumers?
• Synchronous (blocking) waits
• (how to) Cross technology realms
• (how to) Cross host, location, clouds
• Availability of consumers
• Message delivery guarantees
• Scaling, high (peak) volumes
MESSAGING – TO DECOUPLE PUB AND SUB
MESSAGING AS WE KNOW IT
• JMS, Oracle Advanced Queuing, IBM MQ, MS MQ, RabbitMQ,
MQTT, XMPP, WebSockets, …
• Challenges
• Costs
• Scalability (size and speed)
• (lack of) Distribution (and therefore availability)
• Complexity of infrastructure
• Message delivery guarantees
• Lack of technology openness
• Deal with temporarily offline consumers
• Retain history
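KAFKA TERMINOLOGY
• Topic
• Message: a key, a value and a timestamp; to Kafka, key and value are just byte arrays
• Broker
• Producer
• Consumer
[Diagram: Producer → Topic (on a Broker) → Consumer; each message carries a key, a value and a time]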
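Since the broker only sees byte arrays, turning application objects into bytes and back is the clients' job. The sketch below is a hypothetical model of that idea: `Record`, `json_serializer` and `json_deserializer` are illustrative names, not Kafka client classes (the real Java client plugs in `Serializer`/`Deserializer` implementations on the producer and consumer).

```python
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Toy model of a Kafka message: key and value are opaque byte arrays."""
    key: Optional[bytes]
    value: bytes
    # Creation time in milliseconds since the epoch.
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


def json_serializer(obj) -> bytes:
    # Hypothetical helper: turns an application object into the value bytes.
    # The broker stores these bytes as-is and never interprets them.
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def json_deserializer(data: bytes):
    # Hypothetical helper: the consumer must know how the bytes were encoded.
    return json.loads(data.decode("utf-8"))


country = {"name": "Netherlands", "continent": "Europe"}
record = Record(key=b"Europe", value=json_serializer(country))
print(record.key, json_deserializer(record.value)["name"])  # b'Europe' Netherlands
```

Producer and consumer have to agree on the encoding; the broker cannot check it.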
CONSUMING
• Messages are available to consumers only once they have been committed
• Kafka does not push; consumers pull (unlike JMS)
• Reading does not destroy messages (unlike a JMS topic)
• (Some) history is available
• Offline consumers can catch up
• Consumers can re-consume from the past
• Delivery guarantees
• Ordering is maintained (within a partition)
• At-least-once (per consumer) by default; at-most-once and exactly-once can be implemented
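Pull-based, non-destructive reading can be illustrated with a toy in-memory partition log. This is a hypothetical sketch (`PartitionLog` and `PullConsumer` are made-up names, not Kafka classes): records stay in the log after they are read, and every consumer keeps its own offset, so a late consumer can catch up and an existing one can rewind.

```python
from typing import List


class PartitionLog:
    """Toy append-only log for a single partition (not Kafka's implementation)."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, value: str) -> int:
        self._records.append(value)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 10) -> List[str]:
        # Reading never removes anything; it just returns a slice from `offset`.
        return self._records[offset:offset + max_records]


class PullConsumer:
    """Each consumer tracks its own position; the log knows nothing about it."""

    def __init__(self, log: PartitionLog, offset: int = 0) -> None:
        self.log = log
        self.offset = offset

    def poll(self) -> List[str]:
        batch = self.log.read(self.offset)  # the consumer asks; nothing is pushed
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        self.offset = offset  # rewind (or skip ahead) at any time


log = PartitionLog()
for value in ["a", "b", "c"]:
    log.append(value)

early = PullConsumer(log)
print(early.poll())       # ['a', 'b', 'c']
late = PullConsumer(log)  # starts later, still sees the full history
print(late.poll())        # ['a', 'b', 'c']
early.seek(1)             # re-consume from the past
print(early.poll())       # ['b', 'c']
```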
WHAT’S SO SPECIAL?
• Durable
• Scalable
• High volume
• High speed
• Available
• Distributed
• Open
• Quick start
• Free (no license costs)
HISTORY
• Around 2010 – creation at LinkedIn
• It was designed to provide a high-performance, scalable messaging system which could handle multiple consumers, many types of data [at high volumes and peaks], and provide for the availability & persistence of clean, structured data […] in real time.
• 2011 – open sourced under the Apache Incubator
• October 2012 – top-level project of the Apache Software Foundation
• 2014 – several original Kafka engineers founded Confluent
• 2015 – introduction of Kafka Connect (0.9)
• 2016
• Introduction of Kafka Streams (0.10)
• October: most recent stable release, 0.10.1
• Kafka is used by many large corporations:
• Walmart, Cisco, Netflix, PayPal, LinkedIn, eBay, Spotify, Uber, Sift Science
• And embraced by many software vendors & cloud providers
USE CASES
• Messaging & Queuing
• Handle fast data (IoT, social media, web clicks, infra metrics, …)
• Receive and save – low latency, high volume
• Log aggregation
• Event Sourcing and Commit Log
• Stream processing
• Single enterprise event backbone
• Connect business processes, applications, microservices
PLAYS NICE WITH & ARCHITECTURE
SOME NUMBERS
KAFKA INCARNATIONS
• Kafka Docker images
• From Confluent, Spotify, Wurstmeister
• Cloud:
• CloudKarafka
• IBM Bluemix Message Hub
• AWS supports Kafka (but tends to promote Amazon Kinesis Streams)
• Google runs Kafka (though it pushes Google Pub/Sub)
• Bitnami VMs for many cloud providers such as Azure, GCP, AWS, OPC
• Kafka connectors in many platforms
• Azure IoT Hub, Google Pub/Sub, Mule Anypoint Connector, …
• Oracle ….
KAFKA ECOSYSTEM
• Confluent
• Open source: native clients, Camus (link to Hadoop), REST Proxy, Schema Registry
• Enterprise: Kafka Ops Dashboard/Control Center, Auto Data Balancing, Multi-Datacenter Replication
• Community
• Connectors
• Client libraries
• …
KAFKA CONNECT
• Kafka Connect is a framework for connectors (aka adapters) that provide bridges for
• Producing from specific technologies to Kafka
• Consuming from Kafka to specific technologies
• For example:
• JDBC
• Hadoop
KAFKA CONNECT – CONNECTORS
KAFKA STREAMS
• Real-time event [stream] processing integrated into Kafka
• Aggregations & Top-N
• Time windows
• Continuous queries
• Latest state (event sourcing)
• Turn a stream (of changes) into a table (of most recent or current state)
• Part of the state can be quite old
• A Kafka Streams client holds state locally (e.g. in memory)
• That state can always be recreated from the topic partition logs
• Note: Kafka Streams is relatively new
• Only Java clients are supported
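Turning a stream of changes into a table amounts to folding a changelog into the latest value per key. This is a hypothetical plain-Python sketch, not the Kafka Streams API (in Java the idea surfaces as `KTable`). The example keys are made up, and a `None` value is treated as a delete marker, mirroring Kafka's convention that a record with a null value is a tombstone. Because the table is derived purely from the stream, replaying the topic rebuilds it, which is why local state can always be recreated.

```python
from typing import Dict, Iterable, Optional, Tuple

Change = Tuple[str, Optional[int]]  # (key, new value); None marks a deletion


def stream_to_table(changes: Iterable[Change]) -> Dict[str, int]:
    """Fold a changelog stream into a table holding the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)  # tombstone: key no longer present
        else:
            table[key] = value    # later events overwrite earlier ones
    return table


changes = [("stock:apple", 10), ("stock:pear", 5), ("stock:apple", 7),
           ("stock:pear", None), ("stock:plum", 3)]
print(stream_to_table(changes))  # {'stock:apple': 7, 'stock:plum': 3}
```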
KAFKA STREAMS
[Diagram: a stream processing flow: Topic → Filter → Aggregate → Join (with a second Topic) → Map (Xform) → Publish → Topic]
EXAMPLE OF KAFKA STREAMS
[Diagram: Topic → SelectKey → AggregateByKey → Join (with a second Topic) → Map (Xform) → Publish to Topic: Top3CountrySizePerContinent]
• Input: CountryMessage (Continent, Name, Population, Size)
• SelectKey: set Continent as key
• AggregateByKey: update the Top 3 biggest countries, as JSON
• Further enrichment: size in square miles, % of entire continent; total area for each continent
[Diagram: countries2.csv → Producer → Topic (on a Broker) → SelectKey (set Continent as key) → AggregateByKey (update Top 3 biggest countries) → Map (Xform) → Publish to Topic: Top3CountrySizePerContinent]
EXAMPLE OF KAFKA STREAMS
[Diagram: Topic (CountryMessage: Continent, Name, Population, Size) → SelectKey (set Continent as key) → AggregateByKey (update Top 3 biggest countries, as JSON) → Publish to Topic: Top3CountrySizePerContinent → Print]
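What the simplified topology (SelectKey, AggregateByKey, Publish) computes can be simulated in plain Python. This is not Kafka Streams code (which is Java); `top3_updates` and the sample records are hypothetical. Each incoming country is re-keyed by continent, merged into that continent's running top 3, and the updated aggregate is emitted as JSON, roughly as a stateful aggregation emits a new result per input record.

```python
import heapq
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def top3_updates(countries: Iterable[dict]) -> Iterator[Tuple[str, str]]:
    """Yield (continent, JSON list of the 3 largest countries) per input record."""
    top: Dict[str, List[dict]] = defaultdict(list)  # state per key
    for country in countries:
        key = country["continent"]  # SelectKey: set Continent as key
        # AggregateByKey: merge the new country, keep only the 3 biggest by size.
        top[key] = heapq.nlargest(3, top[key] + [country], key=lambda c: c["size"])
        # Publish: emit the updated aggregate for this key as JSON.
        yield key, json.dumps([c["name"] for c in top[key]])


# Hypothetical sample records (made-up names and sizes).
sample = [
    {"name": "A", "continent": "Europe", "size": 10},
    {"name": "B", "continent": "Europe", "size": 40},
    {"name": "C", "continent": "Asia", "size": 25},
    {"name": "D", "continent": "Europe", "size": 30},
    {"name": "E", "continent": "Europe", "size": 20},
]
updates = list(top3_updates(sample))
latest = dict(updates)  # what a reader keeping only the newest value per key sees
print(latest)  # {'Europe': '["B", "D", "E"]', 'Asia': '["C"]'}
```

Every input record produces one output update, so the output topic holds a history of top-3 lists; only the newest per continent is the current answer.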
PARTITIONS
• Topics are configured with a number of partitions
• Storage, serialization, replication, availability and ordering guarantees are all at partition level
• Each partition is an ordered, immutable sequence of records that is continually appended to
• A producer can specify the destination partition to write to
• Alternatively, the partition is derived from the message key (a hash of the key, so equal keys land in the same partition) or, for messages without a key, chosen to spread the load
• Multiple partitions can be written to at the same time
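The three ways of choosing a partition can be sketched as a toy partitioner. This is a hypothetical model, not Kafka's code: Kafka's default partitioner hashes keys with murmur2 (and newer clients use a "sticky" strategy for unkeyed messages), whereas CRC32 and plain round-robin are stand-ins chosen here for simplicity. The property that matters survives the swap: the same key always maps to the same partition, which is what preserves per-key ordering.

```python
import itertools
import zlib
from typing import Optional


class ToyPartitioner:
    """Explicit partition > hash(key) mod N > round-robin for unkeyed messages."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._next = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes], explicit: Optional[int] = None) -> int:
        if explicit is not None:   # the producer chose the partition itself
            return explicit
        if key is not None:        # same key -> same partition -> ordered per key
            return zlib.crc32(key) % self.num_partitions  # stand-in for murmur2
        return next(self._next)    # no key: spread the load


p = ToyPartitioner(num_partitions=3)
print([p.partition(b"Europe") for _ in range(3)])  # the same partition three times
print([p.partition(None) for _ in range(4)])       # [0, 1, 2, 0]
```

Note that changing the number of partitions changes `hash % N`, so existing keys may move to a different partition.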
PRODUCING MESSAGES
• The producer determines the partition for each message
• Note: it talks to the broker that is the leader for that partition
• Messages can be produced one by one or in batches
• Batching trades latency for throughput
• A single request can carry batches for different topics & partitions
• Messages can be compressed
• Producers can configure the required acknowledgement level (from the broker)
• None (do not wait for the leader at all)
• Wait for the leader to write the message to its log
• Wait for all in-sync replicas to have the message
• Note: messages are serialized to byte arrays as the wire format
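Producer-side batching can be pictured as a buffer holding one batch per topic-partition. The sketch below is a hypothetical simplification (`ToyAccumulator` and its record-count limit are illustrative; real batches are bounded in bytes and are also sent when a linger timer expires). It shows why batching is a latency/throughput trade-off: a record may wait until its batch fills up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


class ToyAccumulator:
    """Hypothetical producer-side buffer: one batch per (topic, partition)."""

    def __init__(self, max_batch_records: int) -> None:
        # Real batches are bounded in bytes (batch.size), not record count.
        self.max_batch_records = max_batch_records
        self.batches: Dict[TopicPartition, List[bytes]] = defaultdict(list)

    def append(self, topic: str, partition: int, value: bytes) -> None:
        self.batches[(topic, partition)].append(value)

    def drain(self, force: bool = False) -> Dict[TopicPartition, List[bytes]]:
        """Hand over full batches, or all batches when `force` is set.

        `force` stands in for a linger timer expiring: bigger batches mean
        more throughput, waiting for them means more latency.
        """
        ready = {tp: batch for tp, batch in self.batches.items()
                 if force or len(batch) >= self.max_batch_records}
        for tp in ready:
            del self.batches[tp]
        # Batches for partitions led by the same broker can share one request.
        return ready


acc = ToyAccumulator(max_batch_records=2)
acc.append("countries", 0, b"a")
acc.append("countries", 1, b"b")
acc.append("countries", 0, b"c")
print(acc.drain())            # {('countries', 0): [b'a', b'c']}
print(acc.drain(force=True))  # {('countries', 1): [b'b']}
```

In the real producer, the three acknowledgement levels above correspond to the `acks` setting (`0`, `1` and `all`), while `batch.size`, `linger.ms` and `compression.type` control batching and compression.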
CONSUMING
• A consumer pulls from a topic
• Consuming can be done in parallel to producing
• And many consumers can consume at the same time
• Each consumer has a message offset per partition
• That can be different across consumers
• That can be adjusted at any time
• Delivery guarantees
• At-least-once (per consumer) by default: commit the offset only after the messages have been processed
• At-most-once (commit the offset before processing) and exactly-once (for example: store the offset in the same transaction that processes the messages) can be implemented
• Message retention
• Time-based (at least for … time)
• Size-based (log files can be no larger than … MB/GB/TB)
• Key-based aka log compaction (retain at least the latest message for each primary key value)
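The difference between at-least-once and at-most-once comes down to whether the offset is committed after or before processing. The hypothetical simulation below (`consume` and `run` are illustrative, not a Kafka API) crashes the consumer between the two steps and then restarts it from the last committed offset.

```python
from typing import Callable, List


def consume(log: List[str], committed: int, process: Callable[[str], None],
            commit_first: bool, crash_at: int = -1) -> int:
    """Process records from offset `committed`; return the committed offset.

    commit_first=False: at-least-once (process, then commit).
    commit_first=True:  at-most-once (commit, then process).
    `crash_at` simulates the consumer dying between the two steps at that offset.
    """
    for offset in range(committed, len(log)):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return committed  # committed but never processed: record lost
            process(log[offset])
        else:
            process(log[offset])
            if offset == crash_at:
                return committed  # processed but not committed: redelivered
            committed = offset + 1
    return committed


log = ["a", "b", "c"]


def run(commit_first: bool) -> List[str]:
    seen: List[str] = []
    position = consume(log, 0, seen.append, commit_first, crash_at=1)
    consume(log, position, seen.append, commit_first)  # restart after the crash
    return seen


print(run(commit_first=False))  # ['a', 'b', 'b', 'c'] -> 'b' processed twice
print(run(commit_first=True))   # ['a', 'c']           -> 'b' never processed
```

Exactly-once needs the processing result and the new offset to be stored atomically, for example in one database transaction, which this sketch does not model.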
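Key-based retention (log compaction) can be pictured with a toy function. It is a simplified, hypothetical model: `compact` is not a Kafka API, and the real log cleaner works on log segments in the background and leaves the active segment alone. For each key only the most recent record survives, surviving records keep their original offsets, and a record with a null value acts as a delete marker (tombstone).

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, str, Optional[str]]  # (offset, key, value); None = tombstone


def compact(log: List[Entry]) -> List[Entry]:
    """Keep only the latest entry per key; surviving entries keep their offsets."""
    latest: Dict[str, int] = {}
    for offset, key, _ in log:
        latest[key] = offset  # last write per key wins
    # Tombstones (value None) are kept here; Kafka removes them only after
    # a grace period (topic setting delete.retention.ms).
    return [entry for entry in log if latest[entry[1]] == entry[0]]


log = [(0, "a", "1"), (1, "b", "1"), (2, "a", "2"),
       (3, "c", "1"), (4, "b", None), (5, "a", "3")]
print(compact(log))  # [(3, 'c', '1'), (4, 'b', None), (5, 'a', '3')]
```

A consumer reading the compacted log from offset 0 still ends up with the latest state for every key, which is what makes compacted topics usable as a table.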
CONSUMER GROUPS FOR PARALLEL MESSAGE PROCESSING
• Multiple consumers can be in the same consumer group
• They collaborate on processing messages from a topic (horizontal scalability)
• Each partition is assigned to exactly one consumer in the group; a consumer can own several partitions, and consumers beyond the number of partitions stay idle
• So each message is delivered to only one consumer in the group
• Consumers outside the consumer group can pull from the same topic & partition
• And process the same messages
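How a group divides a topic's partitions among its members can be sketched as below. The function is a hypothetical Python model loosely following Kafka's range assignment strategy; other strategies (round-robin, sticky) exist, and the real assignment is computed during a group rebalance. The invariant it demonstrates matches the slide: every partition has exactly one owner within the group.

```python
from typing import Dict, List


def range_assign(partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Split one topic's partitions over the members of a consumer group.

    Sorted members get contiguous ranges; the first (partitions % members)
    members get one extra partition. Surplus members get nothing.
    """
    members = sorted(consumers)
    per_member, extra = divmod(partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(5, ["c2", "c1"]))        # {'c1': [0, 1, 2], 'c2': [3, 4]}
print(range_assign(2, ["c1", "c2", "c3"]))  # {'c1': [0], 'c2': [1], 'c3': []}
```

The second call shows why adding consumers beyond the partition count does not add parallelism.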
CLUSTER – RELIABLE, SCALABLE
• A cluster consists of multiple brokers, possibly on multiple server nodes
• Each node runs
• Apache ZooKeeper to keep track of the cluster
• One or more Kafka brokers
• Each with its own set of storage logs
• Each partition lives on one or more brokers (and sets of logs)
• Defined through the topic replication factor
• One is the leader, the others are follower replicas
• Clients communicate about a partition with the broker that holds the leader replica for that partition
• Changes are written by the leader, then replicated to the followers; a change counts as committed once all in-sync replicas have it
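How a replication factor spreads copies of each partition over the brokers can be sketched with a simple round-robin placement. This is a hypothetical simplification (`place_replicas` is not Kafka's algorithm, which also randomizes the starting broker, staggers followers and can take racks into account); the first broker in each list plays the role of the preferred leader.

```python
from typing import Dict, List


def place_replicas(partitions: int, brokers: List[int],
                   replication_factor: int) -> Dict[int, List[int]]:
    """Put each partition's replicas on distinct brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(brokers)
    return {p: [brokers[(p + r) % n] for r in range(replication_factor)]
            for p in range(partitions)}


layout = place_replicas(partitions=4, brokers=[1, 2, 3], replication_factor=2)
print(layout)  # {0: [1, 2], 1: [2, 3], 2: [3, 1], 3: [1, 2]}
leaders = {p: replicas[0] for p, replicas in layout.items()}
print(leaders)  # {0: 1, 1: 2, 2: 3, 3: 1}
```

Spreading leaders over brokers spreads client traffic too, since clients talk to the leader of each partition.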
[Diagram: four brokers, each hosting partitions of the topic]
CLUSTER – RELIABLE, SCALABLE (2)
• ZooKeeper has a list of all brokers and a list of all topics and partitions (with leader and ISR)
• The leader has a list of all live followers that are caught up (in-sync replicas, or ISR)
• Follower replicas consume messages from the leader to synchronize
• Similar to normal message consumers
• Note: producers requesting full acknowledgement get the ack once all in-sync follower replicas have fetched the message
• With replication factor N, up to N-1 replicas can fail without loss of committed messages
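The link between the ISR, "committed" and full acknowledgement can be sketched from the leader's point of view. This is a hypothetical model (`ToyPartitionLeader` is not Kafka's replication code): the leader tracks each replica's log end offset, and a record counts as committed, and thus acknowledged under `acks=all`, once every replica still in the ISR has it. That boundary is the high watermark.

```python
from typing import Dict


class ToyPartitionLeader:
    """Leader's view of one partition: log end offset (LEO) per replica."""

    def __init__(self, leader: str, followers: list) -> None:
        self.leader = leader
        self.leo: Dict[str, int] = {r: 0 for r in [leader, *followers]}
        self.isr = set(self.leo)  # initially every replica is in sync
        self.log: list = []

    def append(self, value: str) -> int:
        self.log.append(value)
        self.leo[self.leader] = len(self.log)
        return len(self.log) - 1  # offset of the new record

    def follower_fetch(self, follower: str) -> None:
        # Followers pull like consumers; catching up moves their LEO.
        self.leo[follower] = len(self.log)

    def drop_from_isr(self, replica: str) -> None:
        # A follower that falls too far behind (or dies) leaves the ISR.
        self.isr.discard(replica)

    @property
    def high_watermark(self) -> int:
        # Offsets below this are on every in-sync replica: committed.
        return min(self.leo[r] for r in self.isr)

    def acked_with_acks_all(self, offset: int) -> bool:
        return offset < self.high_watermark


p = ToyPartitionLeader("b1", ["b2", "b3"])
off = p.append("m0")
print(p.high_watermark)            # 0: only the leader has m0 so far
p.follower_fetch("b2")
print(p.acked_with_acks_all(off))  # False: b3 is in the ISR but still behind
p.drop_from_isr("b3")
print(p.acked_with_acks_all(off))  # True: every remaining ISR member has m0
```

The last step shows the trade-off: a shrinking ISR lets writes proceed with fewer copies. The topic/broker setting `min.insync.replicas` bounds how far the ISR may shrink before `acks=all` writes are rejected.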
ORACLE AND KAFKA
• On premises
• Service Bus Kafka transport (demo!)
• Stream Analytics Kafka Adapter (demo!)
• GoldenGate for Big Data handler for Kafka
• Data Integrator (coming soon)
• Cloud
• Elastic Big Data & Streaming platform
• Event Hub (coming soon)
GOLDENGATE FOR BIG DATA
DATA INTEGRATOR
ELASTIC BIG DATA & STREAMING PLATFORM
EVENT HUB
HANDS ON PART 2
• Continue part 1
• Java and/or Node consuming/producing
• Some Admin & advanced stuff
• Partitions
• Multiple producers, multiple consumers
• New consumer, go back in time
• Expiration of messages
• Multi-broker, Cluster configuration, ZooKeeper
• Resources: https://github.com/MaartenSmeets/kafka-workshop
• Blog: technology.amis.nl
On Oracle, Cloud, SQL, PL/SQL, Java, JavaScript, Continuous Delivery, SOA, BPM & more
• Email: maarten.smeets@amis.nl, lucas.jellema@amis.nl
• Twitter: @MaartenSmeetsNL, @lucasjellema
• smeetsm, lucas-jellema
• www.amis.nl, info@amis.nl
+31 306016000
Edisonbaan 15, Nieuwegein
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message Queue


Editor's Notes

  • #22: http://stackoverflow.com/questions/35861501/kafka-in-docker-not-working Docker images from Confluent: https://hub.docker.com/r/confluent/kafka/
  • #23: http://docs.confluent.io/2.0.0/platform.html
  • #24: http://docs.confluent.io/2.0.0/platform.html https://www.confluent.io/blog/apache-kafka-getting-started/