SlideShare a Scribd company logo
Introduction to Apache Kafka
And Real-Time ETL
for DBAs and others who are interested in new ways
of working with relational databases 1
About Myself
• Gwen Shapira – SystemArchitect @Confluent
• Committer @Apache Kafka,Apache Sqoop
• Author of “HadoopApplicationArchitectures”, “Kafka – The Definitive Guide”
• Previously:
• Software Engineer @ Cloudera
• OracleACE Director
• Senior Consultants @ Pythian
• DBA@ Mercury Interactive
• Find me:
• gwen@confluent.io
• @gwenshap
There’s a Book on That!
Apache Kafka is
publish-subscribe messaging
rethought as a
distributed commit log.
turned into
a stream data platform
An Optical Illusion
• Write-ahead Logs
• So What is Kafka?
• Awesome use-case for Kafka
• Data streams and real-time ETL
• Where can you learn more
We’ll talk about:
Write-Ahead Logging (WAL)
a standard method for ensuring data
integrity… changes to data files … must
be written only after those changes
have been logged… in the event of a
crash we will be able to recover the
database using the log.
Important Point
The write-ahead log is the only reliable
source of information about current state of
the database.
WAL is used for
• Recover consistent state of a database
• Replicate the database (Streaming Replication, Hot Standby)
If you look far enough into archived logs – you can reconstruct the
entire database.
kafka for db as postgres
That’s nice, but what is Kafka?
Kafka provides a fast, distributed, highly scalable,
highly available, publish-subscribe messaging system.
Based on the tried and true log structure.
In turn this solves part of a much harder problem:
Communication and integration between
components of large software systems
The Basics
•Messages are organized into topics
•Producers push messages
•Consumers pull messages
•Kafka runs in a cluster. Nodes are called
brokers
Topics, Partitions and Logs
Each partition is a log
Each Broker has many partitions
Partition 0 Partition 0
Partition 1 Partition 1
Partition 2
Partition 1
Partition 0
Partition 2 Partion 2
Producers load balance between partitions
Partition 0
Partition 1
Partition 2
Partition 1
Partition 0
Partition 2
Partition 0
Partition 1
Partion 2
Client
Producers load balance between partitions
Partition 0
Partition 1
Partition 2
Partition 1
Partition 0
Partition 2
Partition 0
Partition 1
Partion 2
Client
Consumers
Consumer Group Y
Consumer Group X
Consumer
Kafka Cluster
Topic
Partition A (File)
Partition B (File)
Partition C (File)
Consumer
Consumer
Consumer
Order retained with in
partition
Order retained with in
partition but not over
partitionsOffSetX
OffSetX
OffSetX
OffSetYOffSetYOffSetY
Off sets are kept per
consumer group
Kafka “Magic” – Why is it so fast?
• 250M Events per sec on one node at 3ms latency
• Scales to any number of consumers
• Stores data for set amount of time –
Without tracking who read what data
• Replicates – but no need to sync to disk
• Zero-copy writes from memory / disk to network
How do people use Kafka?
• As a message bus
• As a buffer for replication systems
• As reliable feed for event processing
• As a buffer for event processing
• Decouple apps from databases
But really,
how do they use Kafka?
21
Raise your hand if this sounds familiar
“My next project was to get a working Hadoop setup…
Having little experience in this area, we naturally
budgeted a few weeks for getting data in and out, and
the rest of our time for implementing fancy algorithms.
“
--Jay Kreps, Kafka PMC
23
Client Source
Data Pipelines Start like this.
24
Client Source
Client
Client
Client
Then we reuse them
25
Client Backend
Client
Client
Client
Then we add consumers to the
existing sources
Another
Backend
26
Client Backend
Client
Client
Client
Then it starts to look like this
Another
Backend
Another
Backend
Another
Backend
27
Client Backend
Client
Client
Client
With maybe some of this
Another
Backend
Another
Backend
Another
Backend
Queues decouple systems:
Adding new systems doesn’t require changing
Existing systems
This is where we are trying to get
29
Source System Source System Source System Source System
Kafka decouples Data Pipelines
Hadoop Security Systems
Real-time
monitoring
Data Warehouse
Kafka
Producers
Brokers
Consumers
Kafka decouples Data Pipelines
Important notes:
• Producers and Consumers dont need to know about each other
• Performance issues on Consumers dont impact Producers
• Consumers are protected from herds of Producers
• Lots of flexibility in handling load
• Messages are available for anyone –
lots of new use cases, monitoring, audit, troubleshooting
http://www.slideshare.net/gwenshap/queues-pools-caches
My Favorite Use Cases
• Shops consume inventory updates
• Clicking around an online shop? Your clicks go to Kafka and
recommendations come back.
• Flagging credit card transactions as fraudulent
• Flagging game interactions as abuse
• Least favorite: Surge pricing in Uber
• Huge list of users at kafka.apache.org
31
Got it!
But what about real-time ETL?
Remember This?
33
Source System Source System Source System Source System
Kafka decouples Data Pipelines
Hadoop Security Systems
Real-time
monitoring
Data Warehouse
Kafka
Producers
Brokers
Consumers
Kafka is smack in middle of all Data Pipelines
If data flies into Kafka in real time
Why wait 24h before pulling it into a DWH?
34
35
Why Kafka makes real-time ETL better?
• Can integrate with any data source
• RDBMS, NoSQL, Applications, web applications, logs
• Consumers can be real-time
But they do not have to
• Reading and writing to/from Kafka is cheap
• So this is a great place to store intermediate state
• You can fix mistakes by rereading some of the data again
• Same data in same order
• Adding more pipelines / aggregations has no impact on source systems =
low risk
36
It is all valuable data
Raw data
Raw data Clean data
Aggregated dataClean data Enriched data
Filtered data
Dash
board
Report
Data
scientist
Alerts
OMG
• Producers
• Log4J
• Rest Proxy
• BottledWater
• KafkaConnect and its connectors ecosystem
• Other ecosystem
OK, but how does my data get into Kafka
kafka for db as postgres
kafka for db as postgres
• However you want:
• You just consume data, modify it, and produce it back
• Built into Kafka:
• Kprocessor
• Kstream
• Popular choices:
• Storm
• SparkStreaming
But wait, how do we process the data?
One more thing...
Schema is a MUST HAVE for
data integration
Need More Kafka?
• https://kafka.apache.org/documentation.html
• My video tutorial:
http://shop.oreilly.com/product/0636920038603.do
• http://www.michael-noll.com/blog/2014/08/18/apache-kafka-
training-deck-and-tutorial/
• Our website:
http://confluent.io
• Oracle guide to real-time ETL:
http://www.oracle.com/technetwork/middleware/data-
integrator/overview/best-practices-for-realtime-data-wa-132882.pdf

More Related Content

What's hot (20)

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
Gwen (Chen) Shapira
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Gwen (Chen) Shapira
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Christopher Curtin
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
Amita Mirajkar
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
Knoldus Inc.
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
Edureka!
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Guozhang Wang
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Joe Stein
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
Gwen (Chen) Shapira
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
jimriecken
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
Joe Stein
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
Joe Stein
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Erik Onnen
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
datamantra
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
Brian Ritchie
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
Michael Noll
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIA
La Cuisine du Web
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Gwen (Chen) Shapira
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Christopher Curtin
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
Amita Mirajkar
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
Knoldus Inc.
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
Edureka!
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Guozhang Wang
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Joe Stein
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
Gwen (Chen) Shapira
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
jimriecken
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
Joe Stein
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
Joe Stein
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Erik Onnen
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
datamantra
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
Brian Ritchie
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
Michael Noll
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIA
La Cuisine du Web
 

Similar to kafka for db as postgres (20)

Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Edureka!
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For Beginners
Riby Varghese
 
Distributed messaging through Kafka
Distributed messaging through KafkaDistributed messaging through Kafka
Distributed messaging through Kafka
Dileep Kalidindi
 
Devoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basDevoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en bas
Florent Ramiere
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafka
confluent
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
Syed Hadoop
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Michael Noll
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Data Con LA
 
Introduction_to_Kafka - A brief Overview.pdf
Introduction_to_Kafka - A brief Overview.pdfIntroduction_to_Kafka - A brief Overview.pdf
Introduction_to_Kafka - A brief Overview.pdf
ssuserc49ec4
 
Apache Kafka
Apache Kafka Apache Kafka
Apache Kafka
Worapol Alex Pongpech, PhD
 
Beyond the brokers - A tour of the Kafka ecosystem
Beyond the brokers - A tour of the Kafka ecosystemBeyond the brokers - A tour of the Kafka ecosystem
Beyond the brokers - A tour of the Kafka ecosystem
Damien Gasparina
 
Beyond the Brokers: A Tour of the Kafka Ecosystem
Beyond the Brokers: A Tour of the Kafka EcosystemBeyond the Brokers: A Tour of the Kafka Ecosystem
Beyond the Brokers: A Tour of the Kafka Ecosystem
confluent
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
Slim Baltagi
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
confluent
 
Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafka
AmitDhodi
 
Beyond the brokers - Un tour de l'écosystème Kafka
Beyond the brokers - Un tour de l'écosystème KafkaBeyond the brokers - Un tour de l'écosystème Kafka
Beyond the brokers - Un tour de l'écosystème Kafka
Florent Ramiere
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Samuel Kerrien
 
Apache KAfka
Apache KAfkaApache KAfka
Apache KAfka
Pedro Alcantara
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
Mohammed Fazuluddin
 
Kafkha real time analytics platform.pptx
Kafkha real time analytics platform.pptxKafkha real time analytics platform.pptx
Kafkha real time analytics platform.pptx
dummyuseage1
 
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics RedefinedApache Kafka with Spark Streaming: Real-time Analytics Redefined
Apache Kafka with Spark Streaming: Real-time Analytics Redefined
Edureka!
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For Beginners
Riby Varghese
 
Distributed messaging through Kafka
Distributed messaging through KafkaDistributed messaging through Kafka
Distributed messaging through Kafka
Dileep Kalidindi
 
Devoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basDevoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en bas
Florent Ramiere
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafka
confluent
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
Syed Hadoop
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Michael Noll
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Data Con LA
 
Introduction_to_Kafka - A brief Overview.pdf
Introduction_to_Kafka - A brief Overview.pdfIntroduction_to_Kafka - A brief Overview.pdf
Introduction_to_Kafka - A brief Overview.pdf
ssuserc49ec4
 
Beyond the brokers - A tour of the Kafka ecosystem
Beyond the brokers - A tour of the Kafka ecosystemBeyond the brokers - A tour of the Kafka ecosystem
Beyond the brokers - A tour of the Kafka ecosystem
Damien Gasparina
 
Beyond the Brokers: A Tour of the Kafka Ecosystem
Beyond the Brokers: A Tour of the Kafka EcosystemBeyond the Brokers: A Tour of the Kafka Ecosystem
Beyond the Brokers: A Tour of the Kafka Ecosystem
confluent
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
Slim Baltagi
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
confluent
 
Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafka
AmitDhodi
 
Beyond the brokers - Un tour de l'écosystème Kafka
Beyond the brokers - Un tour de l'écosystème KafkaBeyond the brokers - Un tour de l'écosystème Kafka
Beyond the brokers - Un tour de l'écosystème Kafka
Florent Ramiere
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Samuel Kerrien
 
Kafkha real time analytics platform.pptx
Kafkha real time analytics platform.pptxKafkha real time analytics platform.pptx
Kafkha real time analytics platform.pptx
dummyuseage1
 
Ad

More from PivotalOpenSourceHub (20)

Zettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum DatabaseZettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum Database
PivotalOpenSourceHub
 
New Security Framework in Apache Geode
New Security Framework in Apache GeodeNew Security Framework in Apache Geode
New Security Framework in Apache Geode
PivotalOpenSourceHub
 
Apache Geode Clubhouse - WAN-based Replication
Apache Geode Clubhouse - WAN-based ReplicationApache Geode Clubhouse - WAN-based Replication
Apache Geode Clubhouse - WAN-based Replication
PivotalOpenSourceHub
 
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
PivotalOpenSourceHub
 
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
PivotalOpenSourceHub
 
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
PivotalOpenSourceHub
 
#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design
PivotalOpenSourceHub
 
#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode Adaptor#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode Adaptor
PivotalOpenSourceHub
 
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
PivotalOpenSourceHub
 
#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future
PivotalOpenSourceHub
 
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
PivotalOpenSourceHub
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
PivotalOpenSourceHub
 
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
PivotalOpenSourceHub
 
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
PivotalOpenSourceHub
 
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
PivotalOpenSourceHub
 
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
PivotalOpenSourceHub
 
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
PivotalOpenSourceHub
 
#GeodeSummit - Design Tradeoffs in Distributed Systems
#GeodeSummit - Design Tradeoffs in Distributed Systems#GeodeSummit - Design Tradeoffs in Distributed Systems
#GeodeSummit - Design Tradeoffs in Distributed Systems
PivotalOpenSourceHub
 
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
PivotalOpenSourceHub
 
Building Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodeBuilding Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache Geode
PivotalOpenSourceHub
 
Zettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum DatabaseZettaset Elastic Big Data Security for Greenplum Database
Zettaset Elastic Big Data Security for Greenplum Database
PivotalOpenSourceHub
 
New Security Framework in Apache Geode
New Security Framework in Apache GeodeNew Security Framework in Apache Geode
New Security Framework in Apache Geode
PivotalOpenSourceHub
 
Apache Geode Clubhouse - WAN-based Replication
Apache Geode Clubhouse - WAN-based ReplicationApache Geode Clubhouse - WAN-based Replication
Apache Geode Clubhouse - WAN-based Replication
PivotalOpenSourceHub
 
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode
PivotalOpenSourceHub
 
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
#GeodeSummit Keynote: Creating the Future of Big Data Through 'The Apache Way"
PivotalOpenSourceHub
 
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R...
PivotalOpenSourceHub
 
#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design
PivotalOpenSourceHub
 
#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode Adaptor#GeodeSummit - Redis to Geode Adaptor
#GeodeSummit - Redis to Geode Adaptor
PivotalOpenSourceHub
 
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
PivotalOpenSourceHub
 
#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future
PivotalOpenSourceHub
 
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
PivotalOpenSourceHub
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
PivotalOpenSourceHub
 
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
PivotalOpenSourceHub
 
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
PivotalOpenSourceHub
 
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
PivotalOpenSourceHub
 
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
PivotalOpenSourceHub
 
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
PivotalOpenSourceHub
 
#GeodeSummit - Design Tradeoffs in Distributed Systems
#GeodeSummit - Design Tradeoffs in Distributed Systems#GeodeSummit - Design Tradeoffs in Distributed Systems
#GeodeSummit - Design Tradeoffs in Distributed Systems
PivotalOpenSourceHub
 
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
PivotalOpenSourceHub
 
Building Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodeBuilding Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache Geode
PivotalOpenSourceHub
 
Ad

Recently uploaded (20)

apidays New York 2025 - The Future of Small Business Lending with Open Bankin...
apidays New York 2025 - The Future of Small Business Lending with Open Bankin...apidays New York 2025 - The Future of Small Business Lending with Open Bankin...
apidays New York 2025 - The Future of Small Business Lending with Open Bankin...
apidays
 
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays
 
Advanced_English_Pronunciation_in_Use.pdf
Advanced_English_Pronunciation_in_Use.pdfAdvanced_English_Pronunciation_in_Use.pdf
Advanced_English_Pronunciation_in_Use.pdf
leogoemmanguyenthao
 
MEDIA_LITERACY_INDEX_OF_EDUCATORS_ENG.pdf
MEDIA_LITERACY_INDEX_OF_EDUCATORS_ENG.pdfMEDIA_LITERACY_INDEX_OF_EDUCATORS_ENG.pdf
MEDIA_LITERACY_INDEX_OF_EDUCATORS_ENG.pdf
OlhaTatokhina1
 
METHODS OF DATA COLLECTION (Research methodology)
METHODS OF DATA COLLECTION (Research methodology)METHODS OF DATA COLLECTION (Research methodology)
METHODS OF DATA COLLECTION (Research methodology)
anwesha248
 
apidays New York 2025 - Using GraphQL SDL files as executable API Contracts b...
apidays New York 2025 - Using GraphQL SDL files as executable API Contracts b...apidays New York 2025 - Using GraphQL SDL files as executable API Contracts b...
apidays New York 2025 - Using GraphQL SDL files as executable API Contracts b...
apidays
 
apidays Singapore 2025 - What exactly are AI Agents by Aki Ranin (Earthshots ...
apidays Singapore 2025 - What exactly are AI Agents by Aki Ranin (Earthshots ...apidays Singapore 2025 - What exactly are AI Agents by Aki Ranin (Earthshots ...
apidays Singapore 2025 - What exactly are AI Agents by Aki Ranin (Earthshots ...
apidays
 
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays
 
Mining Presentation Online Courses for Student
Mining Presentation Online Courses for StudentMining Presentation Online Courses for Student
Mining Presentation Online Courses for Student
Rizki229625
 
apidays New York 2025 - CIAM in the wild by Michael Gruen (Layr)
apidays New York 2025 - CIAM in the wild by Michael Gruen (Layr)apidays New York 2025 - CIAM in the wild by Michael Gruen (Layr)
apidays New York 2025 - CIAM in the wild by Michael Gruen (Layr)
apidays
 
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays
 
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays
 
unit- 5 Biostatistics and Research Methodology.pdf
unit- 5 Biostatistics and Research Methodology.pdfunit- 5 Biostatistics and Research Methodology.pdf
unit- 5 Biostatistics and Research Methodology.pdf
KRUTIKA CHANNE
 
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays
 
Retort Instrumentation laboratory practi
Retort Instrumentation laboratory practiRetort Instrumentation laboratory practi
Retort Instrumentation laboratory practi
ADINDADYAHMUKHLASIN
 
1-2. Lab Introduction to Linux environment.ppt
1-2. Lab Introduction to Linux environment.ppt1-2. Lab Introduction to Linux environment.ppt
1-2. Lab Introduction to Linux environment.ppt
Wahajch
 
[Eddie Lee] Capstone Project - AI PM Bootcamp - DataFox.pdf
[Eddie Lee] Capstone Project - AI PM Bootcamp - DataFox.pdf[Eddie Lee] Capstone Project - AI PM Bootcamp - DataFox.pdf
[Eddie Lee] Capstone Project - AI PM Bootcamp - DataFox.pdf
Eddie Lee
 
apidays Singapore 2025 - Enhancing Developer Productivity with UX (Government...
apidays Singapore 2025 - Enhancing Developer Productivity with UX (Government...apidays Singapore 2025 - Enhancing Developer Productivity with UX (Government...
apidays Singapore 2025 - Enhancing Developer Productivity with UX (Government...
apidays
 
apidays New York 2025 - Beyond Webhooks: The Future of Scalable API Event Del...
apidays New York 2025 - Beyond Webhooks: The Future of Scalable API Event Del...apidays New York 2025 - Beyond Webhooks: The Future of Scalable API Event Del...
apidays New York 2025 - Beyond Webhooks: The Future of Scalable API Event Del...
apidays
 
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays
 
apidays New York 2025 - The Future of Small Business Lending with Open Bankin...
apidays New York 2025 - The Future of Small Business Lending with Open Bankin...apidays New York 2025 - The Future of Small Business Lending with Open Bankin...
apidays New York 2025 - The Future of Small Business Lending with Open Bankin...
apidays
 
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays New York 2025 - Boost API Development Velocity with Practical AI Tool...
apidays
 
Advanced_English_Pronunciation_in_Use.pdf
Advanced_English_Pronunciation_in_Use.pdfAdvanced_English_Pronunciation_in_Use.pdf
Advanced_English_Pronunciation_in_Use.pdf
leogoemmanguyenthao
 
MEDIA_LITERACY_INDEX_OF_EDUCATORS_ENG.pdf
MEDIA_LITERACY_INDEX_OF_EDUCATORS_ENG.pdfMEDIA_LITERACY_INDEX_OF_EDUCATORS_ENG.pdf
MEDIA_LITERACY_INDEX_OF_EDUCATORS_ENG.pdf
OlhaTatokhina1
 
METHODS OF DATA COLLECTION (Research methodology)
METHODS OF DATA COLLECTION (Research methodology)METHODS OF DATA COLLECTION (Research methodology)
METHODS OF DATA COLLECTION (Research methodology)
anwesha248
 
apidays New York 2025 - Using GraphQL SDL files as executable API Contracts b...
apidays New York 2025 - Using GraphQL SDL files as executable API Contracts b...apidays New York 2025 - Using GraphQL SDL files as executable API Contracts b...
apidays New York 2025 - Using GraphQL SDL files as executable API Contracts b...
apidays
 
apidays Singapore 2025 - What exactly are AI Agents by Aki Ranin (Earthshots ...
apidays Singapore 2025 - What exactly are AI Agents by Aki Ranin (Earthshots ...apidays Singapore 2025 - What exactly are AI Agents by Aki Ranin (Earthshots ...
apidays Singapore 2025 - What exactly are AI Agents by Aki Ranin (Earthshots ...
apidays
 
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays Singapore 2025 - 4 Identity Essentials for Scaling SaaS in Large Orgs...
apidays
 
Mining Presentation Online Courses for Student
Mining Presentation Online Courses for StudentMining Presentation Online Courses for Student
Mining Presentation Online Courses for Student
Rizki229625
 
apidays New York 2025 - CIAM in the wild by Michael Gruen (Layr)
apidays New York 2025 - CIAM in the wild by Michael Gruen (Layr)apidays New York 2025 - CIAM in the wild by Michael Gruen (Layr)
apidays New York 2025 - CIAM in the wild by Michael Gruen (Layr)
apidays
 
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays New York 2025 - Building Scalable AI Systems by Sai Prasad Veluru (Ap...
apidays
 
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays New York 2025 - Breaking Barriers: Lessons Learned from API Integrati...
apidays
 
unit- 5 Biostatistics and Research Methodology.pdf
unit- 5 Biostatistics and Research Methodology.pdfunit- 5 Biostatistics and Research Methodology.pdf
unit- 5 Biostatistics and Research Methodology.pdf
KRUTIKA CHANNE
 
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays New York 2025 - Two tales of API Change Management by Eric Koleda (Coda)
apidays
 
Retort Instrumentation laboratory practi
Retort Instrumentation laboratory practiRetort Instrumentation laboratory practi
Retort Instrumentation laboratory practi
ADINDADYAHMUKHLASIN
 
1-2. Lab Introduction to Linux environment.ppt
1-2. Lab Introduction to Linux environment.ppt1-2. Lab Introduction to Linux environment.ppt
1-2. Lab Introduction to Linux environment.ppt
Wahajch
 
[Eddie Lee] Capstone Project - AI PM Bootcamp - DataFox.pdf
[Eddie Lee] Capstone Project - AI PM Bootcamp - DataFox.pdf[Eddie Lee] Capstone Project - AI PM Bootcamp - DataFox.pdf
[Eddie Lee] Capstone Project - AI PM Bootcamp - DataFox.pdf
Eddie Lee
 
apidays Singapore 2025 - Enhancing Developer Productivity with UX (Government...
apidays Singapore 2025 - Enhancing Developer Productivity with UX (Government...apidays Singapore 2025 - Enhancing Developer Productivity with UX (Government...
apidays Singapore 2025 - Enhancing Developer Productivity with UX (Government...
apidays
 
apidays New York 2025 - Beyond Webhooks: The Future of Scalable API Event Del...
apidays New York 2025 - Beyond Webhooks: The Future of Scalable API Event Del...apidays New York 2025 - Beyond Webhooks: The Future of Scalable API Event Del...
apidays New York 2025 - Beyond Webhooks: The Future of Scalable API Event Del...
apidays
 
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays New York 2025 - Fast, Repeatable, Secure: Pick 3 with FINOS CCC by Le...
apidays
 

kafka for db as postgres

  • 1. Introduction to Apache Kafka And Real-Time ETL for DBAs and others who are interested in new ways of working with relational databases 1
  • 2. About Myself • Gwen Shapira – SystemArchitect @Confluent • Committer @Apache Kafka,Apache Sqoop • Author of “HadoopApplicationArchitectures”, “Kafka – The Definitive Guide” • Previously: • Software Engineer @ Cloudera • OracleACE Director • Senior Consultants @ Pythian • DBA@ Mercury Interactive • Find me: • [email protected] • @gwenshap
  • 3. There’s a Book on That!
  • 4. Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. turned into a stream data platform An Optical Illusion
  • 5. • Write-ahead Logs • So What is Kafka? • Awesome use-case for Kafka • Data streams and real-time ETL • Where can you learn more We’ll talk about:
  • 6. Write-Ahead Logging (WAL) a standard method for ensuring data integrity… changes to data files … must be written only after those changes have been logged… in the event of a crash we will be able to recover the database using the log.
  • 7. Important Point The write-ahead log is the only reliable source of information about current state of the database.
  • 8. WAL is used for • Recover consistent state of a database • Replicate the database (Streaming Replication, Hot Standby) If you look far enough into archived logs – you can reconstruct the entire database.
  • 10. That’s nice, but what is Kafka?
  • 11. Kafka provides a fast, distributed, highly scalable, highly available, publish-subscribe messaging system. Based on the tried and true log structure. In turn this solves part of a much harder problem: Communication and integration between components of large software systems
  • 12. The Basics •Messages are organized into topics •Producers push messages •Consumers pull messages •Kafka runs in a cluster. Nodes are called brokers
  • 15. Each Broker has many partitions Partition 0 Partition 0 Partition 1 Partition 1 Partition 2 Partition 1 Partition 0 Partition 2 Partion 2
  • 16. Producers load balance between partitions Partition 0 Partition 1 Partition 2 Partition 1 Partition 0 Partition 2 Partition 0 Partition 1 Partion 2 Client
  • 17. Producers load balance between partitions Partition 0 Partition 1 Partition 2 Partition 1 Partition 0 Partition 2 Partition 0 Partition 1 Partion 2 Client
  • 18. Consumers Consumer Group Y Consumer Group X Consumer Kafka Cluster Topic Partition A (File) Partition B (File) Partition C (File) Consumer Consumer Consumer Order retained with in partition Order retained with in partition but not over partitionsOffSetX OffSetX OffSetX OffSetYOffSetYOffSetY Off sets are kept per consumer group
  • 19. Kafka “Magic” – Why is it so fast? • 250M Events per sec on one node at 3ms latency • Scales to any number of consumers • Stores data for set amount of time – Without tracking who read what data • Replicates – but no need to sync to disk • Zero-copy writes from memory / disk to network
  • 20. How do people use Kafka? • As a message bus • As a buffer for replication systems • As reliable feed for event processing • As a buffer for event processing • Decouple apps from databases
  • 21. But really, how do they use Kafka? 21
  • 22. Raise your hand if this sounds familiar “My next project was to get a working Hadoop setup… Having little experience in this area, we naturally budgeted a few weeks for getting data in and out, and the rest of our time for implementing fancy algorithms. “ --Jay Kreps, Kafka PMC
  • 25. 25 Client Backend Client Client Client Then we add consumers to the existing sources Another Backend
  • 26. 26 Client Backend Client Client Client Then it starts to look like this Another Backend Another Backend Another Backend
  • 27. 27 Client Backend Client Client Client With maybe some of this Another Backend Another Backend Another Backend
  • 28. Queues decouple systems: Adding new systems doesn’t require changing Existing systems
  • 29. This is where we are trying to get 29 Source System Source System Source System Source System Kafka decouples Data Pipelines Hadoop Security Systems Real-time monitoring Data Warehouse Kafka Producers Brokers Consumers Kafka decouples Data Pipelines
  • 30. Important notes: • Producers and Consumers dont need to know about each other • Performance issues on Consumers dont impact Producers • Consumers are protected from herds of Producers • Lots of flexibility in handling load • Messages are available for anyone – lots of new use cases, monitoring, audit, troubleshooting http://www.slideshare.net/gwenshap/queues-pools-caches
  • 31. My Favorite Use Cases • Shops consume inventory updates • Clicking around an online shop? Your clicks go to Kafka and recommendations come back. • Flagging credit card transactions as fraudulent • Flagging game interactions as abuse • Least favorite: Surge pricing in Uber • Huge list of users at kafka.apache.org 31
  • 32. Got it! But what about real-time ETL?
  • 33. Remember This? 33 Source System Source System Source System Source System Kafka decouples Data Pipelines Hadoop Security Systems Real-time monitoring Data Warehouse Kafka Producers Brokers Consumers Kafka is smack in middle of all Data Pipelines
  • 34. If data flies into Kafka in real time Why wait 24h before pulling it into a DWH? 34
  • 35. 35
  • 36. Why Kafka makes real-time ETL better? • Can integrate with any data source • RDBMS, NoSQL, Applications, web applications, logs • Consumers can be real-time But they do not have to • Reading and writing to/from Kafka is cheap • So this is a great place to store intermediate state • You can fix mistakes by rereading some of the data again • Same data in same order • Adding more pipelines / aggregations has no impact on source systems = low risk 36
  • 37. It is all valuable data Raw data Raw data Clean data Aggregated dataClean data Enriched data Filtered data Dash board Report Data scientist Alerts OMG
  • 38. • Producers • Log4J • Rest Proxy • BottledWater • KafkaConnect and its connectors ecosystem • Other ecosystem OK, but how does my data get into Kafka
  • 41. • However you want: • You just consume data, modify it, and produce it back • Built into Kafka: • Kprocessor • Kstream • Popular choices: • Storm • SparkStreaming But wait, how do we process the data?
  • 43. Schema is a MUST HAVE for data integration
  • 44. Need More Kafka? • https://kafka.apache.org/documentation.html • My video tutorial: http://shop.oreilly.com/product/0636920038603.do • http://www.michael-noll.com/blog/2014/08/18/apache-kafka- training-deck-and-tutorial/ • Our website: http://confluent.io • Oracle guide to real-time ETL: http://www.oracle.com/technetwork/middleware/data- integrator/overview/best-practices-for-realtime-data-wa-132882.pdf

Editor's Notes

  • #14: Topics are partitioned, each partition ordered and immutable. Messages in a partition have an ID, called Offset. Offset uniquely identifies a message within a partition
  • #15: Kafka retains all messages for fixed amount of time. Not waiting for acks from consumers. The only metadata retained per consumer is the position in the log – the offset So adding many consumers is cheap On the other hand, consumers have more responsibility and are more challenging to implement correctly And “batching” consumers is not a problem
  • #16: 3 partitions, each replicated 3 times.
  • #17: The choose how many replicas must ACK a message before its considered committed. This is the tradeoff between speed and reliability
  • #18: The choose how many replicas must ACK a message before its considered committed. This is the tradeoff between speed and reliability
  • #19: can read from one or more partition leader. You can’t have two consumers in same group reading the same partition. Leaders obviously do more work – but they are balanced between nodes We reviewed the basic components on the system, and it may seem complex. In the next section we’ll see how simple it actually is to get started with Kafka.
  • #25: Then we end up adding clients to use that source.
  • #26: But as we start to deploy our applications we realizet hat clients need data from a number of sources. So we add them as needed.
  • #28: But over time, particularly if we are segmenting services by function, we have stuff all over the place, and the dependencies are a nightmare. This makes for a fragile system.
  • #30: Kafka is a pub/sub messaging system that can decouple your data pipelines. Most of you are probably familiar with it’s history at LinkedIn and they use it as a high throughput relatively low latency commit log. It allows sources to push data without worrying about what clients are reading it. Note that producer push, and consumers pull. Kafka itself is a cluster of brokers, which handles both persisting data to disk and serving that data to consumer requests.
  • #34: Kafka is a pub/sub messaging system that can decouple your data pipelines. Most of you are probably familiar with it’s history at LinkedIn and they use it as a high throughput relatively low latency commit log. It allows sources to push data without worrying about what clients are reading it. Note that producer push, and consumers pull. Kafka itself is a cluster of brokers, which handles both persisting data to disk and serving that data to consumer requests.
  • #40: Logical decoding output client API
  • #44: Sorry, but “Schema on Read” is kind of B.S. We admit that there is a schema, but we want to “ingest fast”, so we shift the burden to the readers. But the data is written once and read many many times by many different people. They each need to figure this out on their own? This makes no sense. Also, how are you going to validate the data without a schema?
  • #45: https://github.com/schema-repo/schema-repo There’s no data dictionary for Kafka
  • #53: There are many options for handling excessing user requests. The only thing that is not an option – throw everything at the database and let the DB queue the excessive load