Apache Kafka 
Introduction 
http://kafka.apache.org/
Joe Stein 
• Developer, Architect & Technologist 
• Founder & Principal Consultant => Big Data Open Source Security LLC - http://stealth.ly
Big Data Open Source Security LLC provides professional services and product solutions for the collection, 
storage, transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and 
distributed systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data 
Infrastructure Components to use but also how to change their existing (or build new) systems to work with 
them. 
• Apache Kafka Committer & PMC member 
• Blog & Podcast - http://allthingshadoop.com
• Twitter @allthingshadoop
Apache Kafka 
• Apache Kafka 
o http://kafka.apache.org
• Apache Kafka Source Code
o https://github.com/apache/kafka
• Documentation
o http://kafka.apache.org/documentation.html
• Wiki
o https://cwiki.apache.org/confluence/display/KAFKA/Index
Kafka decouples data-pipelines
Topics & Partitions
A high-throughput distributed messaging system 
rethought as a distributed commit log.
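To make the "commit log" idea concrete, here is a minimal in-memory sketch, not Kafka code. It assumes a topic is a fixed set of append-only partitions, that a record's key picks its partition, and that each record is addressed by its offset. The class name `ToyCommitLog` and the CRC32-based partitioner are hypothetical choices made for illustration only.

```python
from zlib import crc32


class ToyCommitLog:
    """In-memory stand-in for one topic: a list of append-only partitions."""

    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Hypothetical partitioner: CRC32 of the key modulo the partition count.
        return crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append a record to the key's partition and return (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition, offset, max_records=10):
        """Return up to max_records starting at offset; reading removes nothing."""
        return self.partitions[partition][offset:offset + max_records]


log = ToyCommitLog(num_partitions=2)
# Records with the same key land in the same partition, in append order.
positions = [log.append("user-1", f"event-{i}") for i in range(3)]
```

Because every record for `user-1` goes to one partition and offsets only grow, a reader of that partition sees the events in the order they were written. Ordering across different partitions is not defined in this sketch.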
More! 
• Producers - ** push ** 
o Batching 
o Compression 
o Sync (Ack), Async (auto batch) 
o Replication 
o Sequential writes, guaranteed ordering within each partition 
• Consumers - ** pull ** 
o No state held by broker 
o Consumers control reading from the stream 
• Zero Copy for producers and consumers to and from the broker
http://kafka.apache.org/documentation.html#maximizingefficiency
• Messages stay on disk after they are consumed; they are deleted on TTL expiry or by compaction
https://kafka.apache.org/documentation.html#compaction
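The pull model and compaction described above can be sketched in a few lines. This is a toy model under stated assumptions, not the broker's implementation: one partition is a plain Python list, the consumer alone remembers its next offset, and compaction keeps only the most recent record per key. The names `PullConsumer` and `compact` are hypothetical.

```python
def compact(records):
    """Keep only the latest record for each key, in offset order."""
    latest = {}
    for offset, (key, value) in enumerate(records):
        latest[key] = (offset, value)  # a later offset overwrites an earlier one
    return sorted((offset, key, value) for key, (offset, value) in latest.items())


class PullConsumer:
    """The consumer, not the broker, tracks the next offset to read."""

    def __init__(self, partition_log):
        self.partition_log = partition_log
        self.offset = 0

    def poll(self, max_records=2):
        batch = self.partition_log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Rewinding is possible because consumed records are not deleted.
        self.offset = offset


partition_log = [("k1", "a"), ("k2", "b"), ("k1", "c")]
consumer = PullConsumer(partition_log)
first = consumer.poll()
consumer.seek(0)
replayed = consumer.poll()
compacted = compact(partition_log)
```

Here `first` and `replayed` hold the same two records, which is the point of "messages stay on disk": consuming does not remove anything. After `compact`, only the last value written for each key remains, along with its original offset.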
Client Libraries 
Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients
• Python - Pure Python implementation with full protocol support. Consumer and Producer 
implementations included, GZIP and Snappy compression supported. 
• C - High performance C library with full protocol support 
• C++ - Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset. 
• Go (aka golang) - Pure Go implementation with full protocol support. Consumer and Producer
implementations included, GZIP and Snappy compression supported. 
• Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy 
compression supported. Ruby 1.9.3 and up (CI runs MRI 2. 
• Clojure - Clojure DSL for the Kafka API 
• JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation 
• stdin & stdout 
Wire Protocol Developers Guide 
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
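For anyone writing a client against the wire protocol, the request framing can be sketched as follows. It assumes the original (non-flexible) request header described in the protocol guide: a big-endian int32 size prefix, then int16 API key, int16 API version, int32 correlation id, and a client id encoded as an int16 length plus bytes. The helper names and the client id `demo-client` are made up for illustration, and the sketch only builds bytes; it does not open a connection.

```python
import struct

METADATA_API_KEY = 3  # API key of the Metadata request in the protocol guide


def encode_string(text):
    """Protocol string: int16 length followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_request(api_key, api_version, correlation_id, client_id, body=b""):
    """Build one size-prefixed request frame (all integers big-endian)."""
    header = struct.pack(">hhi", api_key, api_version, correlation_id)
    payload = header + encode_string(client_id) + body
    return struct.pack(">i", len(payload)) + payload


# Metadata v0 body: an int32-counted array of topic names (here: zero topics).
body = struct.pack(">i", 0)
frame = encode_request(METADATA_API_KEY, 0, 1, "demo-client", body)
```

The size prefix counts every byte after itself, so a reader can pull exactly one request off the socket before decoding it. Newer protocol versions use a different header layout, so check the guide for the version you target.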
Really Quick Start (Scala) 
1) Install Vagrant http://www.vagrantup.com/
2) Install VirtualBox https://www.virtualbox.org/
3) git clone https://github.com/stealthly/scala-kafka
4) cd scala-kafka 
5) vagrant up 
Zookeeper will be running on 192.168.86.5 
BrokerOne will be running on 192.168.86.10 
All the tests in ./src/test/scala/* should pass (which is also /vagrant/src/test/scala/* in the vm) 
6) ./gradlew test
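If the tests fail right after `vagrant up`, it can help to first confirm that the two VMs accept connections. The sketch below is one way to do that. The IP addresses come from the steps above, but the ports are assumptions (2181 and 9092 are the usual ZooKeeper and Kafka defaults; the repository's Vagrant setup may use others), and the function names are hypothetical.

```python
import socket

# Addresses from the quick start; the ports are assumed defaults, not confirmed.
ENDPOINTS = {
    "zookeeper": ("192.168.86.5", 2181),
    "brokerOne": ("192.168.86.10", 9092),
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(endpoints, timeout=2.0):
    """Map each endpoint name to whether its host:port accepts connections."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

Calling `check_endpoints(ENDPOINTS)` from the host machine returns a dictionary of booleans. A `False` entry suggests the VM is not up yet or the port differs from the assumed default.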
Really Quick Start (Go) 
1) Install Vagrant http://www.vagrantup.com/
2) Install VirtualBox https://www.virtualbox.org/
3) git clone https://github.com/stealthly/go-kafka
4) cd go-kafka 
5) vagrant up 
6) vagrant ssh brokerOne 
7) cd /vagrant 
8) sudo ./test.sh
Questions? 
/******************************************* 
Joe Stein 
Founder, Principal Consultant 
Big Data Open Source Security LLC 
http://www.stealth.ly
Twitter: @allthingshadoop 
********************************************/

