Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Cassandra and Kafka Support on AWS/EC2
Cloudurable Support around Cassandra
and Kafka running in EC2
Brief introduction to Kafka Streaming Platform
Cassandra / Kafka Support in EC2/AWS
Kafka
Introduction to Kafka messaging
What is Kafka?
❖ Distributed Streaming Platform
❖ Publish and Subscribe to streams of records
❖ Fault tolerant storage
❖ Process records as they occur
Kafka Usage
❖ Build real-time streaming data pipelines
❖ Enable in-memory microservices (actors, Akka, Vert.x,
QBit)
❖ Build real-time streaming applications that react to
streams
❖ Real-time data analytics
❖ Transform, react, aggregate, join real-time data flows
Kafka Use Cases
❖ Metrics / KPIs gathering
❖ Aggregate statistics from many sources
❖ Event Sourcing
❖ Used with microservices (in-memory) and actor systems
❖ Commit Log
❖ External commit log for distributed systems. Replicates
data between nodes; nodes re-sync from the log to restore state
❖ Real-time data analytics, Stream Processing, Log
Aggregation, Messaging, Click-stream tracking, Audit trail,
etc.
Who uses Kafka?
❖ LinkedIn: Activity data and operational metrics
❖ Twitter: Uses it as part of Storm – stream processing
infrastructure
❖ Square: Kafka as a bus to move all system events to
various Square data centers (logs, custom events,
metrics, and so on). Outputs to Splunk, Graphite, and
Esper-like alerting systems
❖ Spotify, Uber, Tumblr, Goldman Sachs, PayPal, Box,
Cisco, CloudFlare, DataDog, LucidWorks, MailChimp,
Netflix, etc.
Kafka Fundamentals
❖ Records have a key, value and timestamp
❖ Topic: a stream of records
❖ Log: topic storage on disk
❖ Partition / Segments (parts of Topic Log)
❖ Producer API to produce a stream of records
❖ Consumer API to consume a stream of records
❖ Broker: a Kafka server process. Many brokers running on many servers
form a Kafka cluster
❖ ZooKeeper: Does coordination of brokers and consumers. Consistent file
system for configuration information and leadership election
Kafka: Topics, Producers, and Consumers
[Diagram: three Producers send records to a Topic in the Kafka Cluster; three Consumers read records from the Topic]
ZooKeeper does coordination for Kafka Consumer and Kafka Cluster
[Diagram: three Producers and three Consumers connected to a Topic spread over three Kafka Brokers, with ZooKeeper coordinating]
Kafka Extensions
❖ Streams API to transform, aggregate, process records
from a stream and produce derivative streams
❖ Connector API: reusable producers and consumers
(e.g., stream of changes from DynamoDB)
Kafka Connectors and Streams
[Diagram: Kafka Cluster in the center, surrounded by Apps acting as Producers, Apps acting as Consumers, Apps using Streams, and DBs attached through Connectors]
Kafka Polyglot clients / Wire
protocol
❖ Clients and servers communicate using a wire protocol
over TCP
❖ Protocol is versioned
❖ Maintains backwards compatibility
❖ Many languages supported
Topics and Logs
❖ Topic is a stream of records
❖ Topics stored in log
❖ Log broken up into partitions and segments
❖ Topic is a category or stream name
❖ Topics are pub/sub
❖ Can have zero or many consumers (subscribers)
❖ Topics are broken up into partitions for speed and size
Topic Partitions
❖ Topics are broken up into partitions
❖ Partition is usually decided by key of record
❖ Key of record determines which partition
❖ Partitions are used to scale Kafka across many servers
❖ Record sent to correct partition by key
❖ Partitions are used to facilitate parallel consumers
❖ Records are consumed in parallel up to the number of
partitions
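The key-to-partition routing described on this slide can be sketched as follows. This is a minimal illustration, not Kafka's partitioner: the Java client hashes the serialized key with murmur2, while this sketch substitutes CRC32 from the Python standard library. The names `partition_for` and `route` are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index (CRC32 stands in for Kafka's hash)."""
    return zlib.crc32(key) % num_partitions


def route(records, num_partitions):
    """Group (key, value) records by partition, preserving arrival order."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)
```

Because the mapping depends only on the key, all records with the same key land in the same partition and keep their relative order there.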
Partition Log
❖ Partition is ordered, immutable sequence of records that is
continually appended to—a structured commit log
❖ Records in partitions are assigned sequential id number
called the offset
❖ Offset identifies each record within the partition
❖ Topic Partitions allow Kafka log to scale beyond a size that
will fit on a single server
❖ Topic partition must fit on servers that host it, but topic can
span many partitions hosted by many servers
❖ Topic Partitions are the unit of parallelism - each partition is
consumed by only one consumer in a consumer group at a time
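A partition log with sequential offsets can be modeled in a few lines. This is an in-memory sketch for illustration only (Kafka stores partitions as segment files on disk); the class name `PartitionLog` and its methods are hypothetical.

```python
class PartitionLog:
    """Append-only sequence of records; a record's offset is its position."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        """Append a record at the end of the log and return its offset."""
        offset = len(self._records)
        self._records.append((key, value))
        return offset

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]
```

Existing records are never modified; new records only ever go on the end, which is what makes the offset a stable identifier within the partition.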
Kafka Topic Partitions Layout
[Diagram: four partitions (Partition 0 to Partition 3), each an ordered row of offsets running from older to newer; partitions differ in length, and writes are appended at the newer end]
Kafka Record retention
❖ Kafka cluster retains all published records, whether or not
they have been consumed, subject to a retention policy
❖ Time based – configurable retention period
❖ Size based
❖ Compaction
❖ Retention period can be, for example, three days, two weeks, or a month
❖ Records are available for consumption until discarded by time,
size or compaction
❖ Consumption speed not impacted by size
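The three retention modes can be sketched as below. This is a simplified model under stated assumptions: real Kafka discards whole log segments rather than individual records, and the behavior is driven by topic settings such as `retention.ms`, `retention.bytes`, and `cleanup.policy`. The function names and record layouts here are hypothetical.

```python
from collections import deque


def apply_retention(records, now_ms, retention_ms=None, retention_bytes=None):
    """Drop the oldest records until time and size limits are satisfied.

    records: deque of (timestamp_ms, payload_bytes), oldest first.
    """
    if retention_ms is not None:
        while records and now_ms - records[0][0] > retention_ms:
            records.popleft()
    if retention_bytes is not None:
        total = sum(len(payload) for _, payload in records)
        while records and total > retention_bytes:
            _, payload = records.popleft()
            total -= len(payload)
    return records


def compact(records):
    """Keep only the latest value per key, preserving log order.

    records: list of (key, value), oldest first.
    """
    latest = {}
    for offset, (key, _) in enumerate(records):
        latest[key] = offset
    return [rec for offset, rec in enumerate(records) if latest[rec[0]] == offset]
```

Time and size limits trim from the old end of the log, while compaction removes superseded values anywhere in the log and keeps the most recent record for every key.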
Kafka Consumers / Producers
[Diagram: Producers append to Partition 0 (offsets 0 to 11); Consumer Group A and Consumer Group B each read from their own position in the partition]
Consumers remember the offset where they left off.
Consumer groups each have their own offset.
Kafka Partition Distribution
❖ Each partition has leader server and zero or more follower
servers
❖ Leader handles all read and write requests for partition
❖ Followers replicate leader, and take over if leader dies
❖ Used for parallel consumer handling within a group
❖ Partitions of log are distributed over the servers in the Kafka
cluster with each server handling data and requests for a share
of partitions
❖ Each partition can be replicated across a configurable number of
Kafka servers
❖ Used for fault tolerance
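Leader failover for a single partition can be illustrated with a toy model. This is only a sketch: in a real cluster the new leader is elected from the in-sync replicas, not simply taken as the next server in a list. The class name `PartitionReplicas` and the server names are hypothetical.

```python
class PartitionReplicas:
    """Toy model of one partition's replicas: the first live replica leads."""

    def __init__(self, replicas):
        self.replicas = list(replicas)  # preference order, assumed for this sketch
        self.live = set(replicas)

    @property
    def leader(self):
        """Return the first live replica, or None if every replica is down."""
        for server in self.replicas:
            if server in self.live:
                return server
        return None

    def fail(self, server):
        """Mark a server as dead; a follower takes over if it was the leader."""
        self.live.discard(server)
```

With a replication factor of three, the partition stays available after two server failures in this model, and becomes unavailable only when the last replica is gone.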
Kafka Producers
❖ Producers send records to topics
❖ Producer picks which partition to send record to per
topic
❖ Can be done in a round-robin fashion
❖ Can be based on priority
❖ Typically based on key of record
❖ Important: Producer picks partition
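Producer-side partition selection might look like the sketch below, which falls back to round-robin when a record has no key. It is an assumption-laden illustration, not the Kafka client's partitioner: CRC32 replaces Kafka's murmur2 hash, and the class name `SketchPartitioner` is hypothetical.

```python
import itertools
import zlib


class SketchPartitioner:
    """Pick a partition on the producer side: by key if present, else round-robin."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, key=None):
        """Return the partition index for a record with the given key."""
        if key is None:
            return next(self._round_robin)  # spread keyless records evenly
        return zlib.crc32(key) % self.num_partitions  # same key, same partition
```

The broker plays no part in this decision, which is the point of the slide's last bullet: the producer decides where each record goes.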
Kafka Consumers
❖ Consumers are grouped into a Consumer Group
❖ Consumer group has a unique name
❖ Each consumer group is a subscriber
❖ Each consumer group maintains its own offset
❖ Multiple subscribers = multiple consumer groups
❖ A Record is delivered to one Consumer in a Consumer Group
❖ Each consumer in a consumer group takes records, and only one
consumer in the group gets a given record
❖ Consumers in Consumer Group load balance record
consumption
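The per-group offset behavior can be sketched with a single-partition topic. This is a simplified in-memory model: real Kafka tracks offsets per group and per partition. The class name `Topic` and its methods are hypothetical.

```python
class Topic:
    """Single-partition topic where each consumer group tracks its own offset."""

    def __init__(self):
        self.log = []
        self.group_offsets = {}  # group name -> next offset to read

    def publish(self, record):
        """Append a record to the end of the log."""
        self.log.append(record)

    def poll(self, group, max_records=1):
        """Return the next records for a group and advance that group's offset."""
        start = self.group_offsets.get(group, 0)
        batch = self.log[start:start + max_records]
        self.group_offsets[group] = start + len(batch)
        return batch
```

Two groups polling the same topic each see every record, because reading in one group does not move the other group's offset.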
2 server Kafka cluster hosting 6 partitions (P0-P5)
[Diagram: Kafka Cluster with Server 1 hosting P2, P3, P4 and Server 2 hosting P0, P1, P5; Consumer Group A and Consumer Group B each contain three consumers (C0, C1, C3) that share the partitions]
Kafka Consumer
Consumption
❖ Kafka Consumer consumption divides partitions over consumer
instances
❖ Each Consumer is exclusive consumer of a "fair share" of partitions
❖ Consumer membership in group is handled by the Kafka protocol
dynamically
❖ If new Consumers join Consumer group they get share of partitions
❖ If Consumer dies, its partitions are split among remaining live
Consumers in group
❖ Order is only guaranteed within a single partition
❖ Since records are typically assigned to a partition by key, order
per partition is sufficient for most use cases
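The "fair share" assignment and the effect of a consumer joining or dying can be sketched as a pure function that is re-run whenever group membership changes. This is a simple round-robin illustration; Kafka clients ship several assignment strategies and this sketch does not reproduce any of them exactly. The function name `assign` is hypothetical.

```python
def assign(partitions, consumers):
    """Spread partitions over consumers round-robin.

    Returns {consumer: [partitions]}; every partition has exactly one owner.
    """
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    ordered = sorted(consumers)
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment
```

Calling `assign` again with one consumer removed redistributes that consumer's partitions among the survivors. If there are more consumers than partitions, the extra consumers receive nothing and sit idle.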
Kafka vs JMS Messaging
❖ It is a bit like both Queues and Topics in JMS
❖ Within a consumer group, Kafka behaves like a queue: records are
load balanced across the group's consumers, like a JMS queue
❖ Across consumer groups, Kafka behaves like a pub/sub topic:
Consumer Groups act like subscriptions
❖ Broadcast to multiple consumer groups
❖ By design Kafka is better suited for scale due to its partitioned topic log
❖ Also, tracking the position in the log moves to the client/consumer side
instead of the broker, so less tracking is required by the Broker
❖ Handles parallel consumers better
Kafka scalable message
storage
❖ Kafka acts as a good storage system for records/messages
❖ Records written to Kafka topics are persisted to disk and
replicated to other servers for fault-tolerance
❖ Kafka Producers can wait on acknowledgement
❖ Write not complete until fully replicated
❖ Kafka disk structures scale well
❖ Writing in large streaming batches is fast
❖ Clients/Consumers control read position (offset)
❖ Kafka acts like high-speed file system for commit log storage,
replication
Kafka Stream Processing
❖ Kafka for Stream Processing
❖ Kafka enables real-time processing of streams
❖ Kafka supports stream processors
❖ A stream processor takes continual streams of records from input topics, performs
some processing, transformation, or aggregation on the input, and produces one or more
output streams
❖ A video player app might take in input streams of videos watched and videos paused,
and output a stream of user preferences, then tailor new video recommendations based
on recent user activity or on the aggregate activity of many users to see which new videos are
hot
❖ Kafka Streams API solves hard problems with out-of-order records, aggregating across
multiple streams, joining data from multiple streams, allowing for stateful computations,
and more
❖ Streams API builds on core Kafka primitives and has a life of its own
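The video player example can be sketched as a tiny stateful stream processor. This does not use the Kafka Streams API (a Java library); it is a plain Python generator standing in for one, and the record layout `(user, action, video)` and the function name `watch_counts` are assumptions made for illustration.

```python
from collections import Counter


def watch_counts(events):
    """Consume (user, action, video) records; emit (video, running_count)
    for every 'watched' event, ignoring other actions such as 'paused'."""
    counts = Counter()  # state kept across records, as in a stateful aggregation
    for user, action, video in events:
        if action != "watched":
            continue
        counts[video] += 1
        yield video, counts[video]
```

Each output record is itself part of a stream, so a downstream processor could consume these running counts to decide which videos are currently hot.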
Using Kafka Single
Node
Run Kafka
❖ Run ZooKeeper
❖ Run Kafka Server/Broker
❖ Create Kafka Topic
❖ Run producer
❖ Run consumer
Run ZooKeeper
Run Kafka Server
Create Kafka Topic
Kafka Producer
Kafka Consumer
Running Kafka Producer and
Consumer
Lab 1-A: Use Kafka
Use Kafka to send and receive messages
Use single server version of Kafka
Using Kafka Cluster
Running many nodes
❖ Modify properties files
❖ Change port
❖ Change Kafka log location
❖ Start up many Kafka server instances
❖ Create Replicated Topic
Stay tuned


Editor's Notes

  • #8: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By