SlideShare a Scribd company logo
Real-Time Distributed and Reactive Systems
with Apache Kafka and Apache Accumulo
Joe Stein
• Developer, Architect & Technologist
• Founder & Principal Consultant => Big Data Open Source Security LLC - http://stealth.ly
Big Data Open Source Security LLC provides professional services and product solutions for the collection, storage,
transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and distributed
systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data Infrastructure
Components to use but also how to change their existing (or build new) systems to work with them.
• CEO => Elodina, Inc.
Expanding BDOSS from just consulting, Elodina is an ISV & SaaS provider of stream solutions & open source software.
Elodina helps make data streams actionable.
• Apache Kafka Committer & PMC member
• Blog & Podcast - http://allthingshadoop.com
• Twitter @allthingshadoop
Overview
● Real-time distributed reactive systems
● Quick Intro to Apache Kafka
● Quick Intro to Apache Mesos
● Kafka on Mesos
● Accumulo & HDFS on Mesos
● Real-time distributed reactive systems
● Bringing it all together with Accumulo
Real-Time Distributed and Reactive Systems
A distributed system for asynchronous stream processing with
non-blocking back pressure where complex event processing
systems can influence the response without coupling the
business logic of processing. The response can be calculated
by parallel operations with concurrent orthogonal processing
engines computing their influence towards the final result.
Real-Time Distributed and Reactive Systems
http://kafka.apache.org
Apache Kafka
• Apache Kafka
o http://kafka.apache.org
• Apache Kafka Source Code
o https://github.com/apache/kafka
• Documentation
o http://kafka.apache.org/documentation.html
• Wiki
o https://cwiki.apache.org/confluence/display/KAFKA/Index
Producers, Consumers, Brokers
• Producers - ** push **
o Batching
o Compression
o Sync (Ack), Async (auto batch)
o Replication
o Sequential writes, guaranteed ordering within each partition
• Consumers - ** pull **
o No state held by broker
o Consumers control reading from the stream
• Zero Copy for producers and consumers to and from the broker
http://kafka.apache.org/documentation.html#maximizingefficiency
• Message stay on disk when consumed, deletes on TTL or compaction
https://kafka.apache.org/documentation.html#compaction
Kafka decouples data-pipelines
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Client Libraries
Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients
• Python - Pure Python implementation with full protocol support. Consumer and Producer
implementations included, GZIP and Snappy compression supported.
• C - High performance C library with full protocol support
• C++ - Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset.
• Go (aka golang) Pure Go implementation with full protocol support. Consumer and Producer
implementations included, GZIP and Snappy compression supported.
• Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy
compression supported. Ruby 1.9.3 and up (CI runs MRI 2.
• Clojure - Clojure DSL for the Kafka API
• JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation
• stdin & stdout
Wire Protocol Developers Guide
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
Really Quick Start (Scala)
1) Install Vagrant http://www.vagrantup.com/
2) Install Virtual Box https://www.virtualbox.org/
3) git clone https://github.com/stealthly/scala-kafka
4) cd scala-kafka
5) vagrant up
Zookeeper will be running on 192.168.86.5
BrokerOne will be running on 192.168.86.10
All the tests in ./src/test/scala/* should pass (which is also /vagrant/src/test/scala/* in the vm)
6) ./gradlew test
Really Quick Start (Go)
1) Install Vagrant http://www.vagrantup.com/
2) Install Virtual Box https://www.virtualbox.org/
3) git clone https://github.com/stealthly/go-kafka
4) cd go-kafka
5) vagrant up
6) vagrant ssh brokerOne
7) cd /vagrant
8) sudo ./test.sh
Apache Mesos
http://mesos.apache.org
Origins
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
http://static.usenix.org/event/nsdi11/tech/full_papers/Hindman_new.pdf
Google Borg - https://research.google.com/pubs/pub43438.html
Google Omega: flexible, scalable schedulers for large compute clusters
http://eurosys2013.tudos.org/wp-
content/uploads/2013/paper/Schwarzkopf.pdf
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Static Partition == Idle Resources
Operating System === Datacenter
Mesos => data center “kernel”
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Apache Mesos
● Scalability to 10,000s of nodes
● Fault-tolerant replicated master and slaves using ZooKeeper
● Support for Docker containers
● Native isolation between tasks with Linux Containers
● Multi-resource scheduling (memory, CPU, disk, and ports)
● Java, Python and C++ APIs for developing new parallel applications
● Web UI for viewing cluster state
Sample Frameworks
C++ - https://github.com/apache/mesos/tree/master/src/examples
Java - https://github.com/apache/mesos/tree/master/src/examples/java
Python - https://github.com/apache/mesos/tree/master/src/examples/python
Scala - https://github.com/mesosphere/scala-sbt-mesos-framework.g8
Go - https://github.com/mesosphere/mesos-go
Kafka on Mesos
● The Mesos Kafka framework https://github.com/mesos/kafka
○ Smart broker.id assignment.
○ Preservation of broker placement.
○ Ability to-do configuration changes.
○ Rolling restarts.
○ Auto-scaling the cluster up and down.
Accumulo on Mesos
No framework yet, but you can use Marathon, no problem!
Marathon https://github.com/mesosphere/marathon is a cluster-
wide init and control system for services in cgroups or docker
based on Apache Mesos
HDFS on Mesos https://github.com/mesosphere/hdfs (more on
this in a bit)
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems
Real-Time Distributed and Reactive Systems
Where does Accumulo fit in?
● Iterators
○ Accumulo iterators are a real time processing framework with
“reduce like” functionality
● Multi HDFS Volume Support
○ Spin up HDFS clusters when they are needed
● Streaming Large Blobs
○ Post files in producers, process and respond to scans
● More!
Real-Time Distributed and Reactive Systems
Questions?
/*******************************************
Joe Stein
CEO, Elodina, Inc
http://www.stealth.ly
Twitter: @allthingshadoop
********************************************/

More Related Content

What's hot (20)

Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
dave_revell
 
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINEKafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
kawamuray
 
Deploying Docker Containers at Scale with Mesos and Marathon
Deploying Docker Containers at Scale with Mesos and MarathonDeploying Docker Containers at Scale with Mesos and Marathon
Deploying Docker Containers at Scale with Mesos and Marathon
Discover Pinterest
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
PivotalOpenSourceHub
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
Gwen (Chen) Shapira
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Joe Stein
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
Nathan Handler
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
Joe Stein
 
kafka
kafkakafka
kafka
Ariel Moskovich
 
Hadoop on-mesos
Hadoop on-mesosHadoop on-mesos
Hadoop on-mesos
Henry Cai 蔡明航
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
Mohammed Fazuluddin
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
confluent
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
Joe Stein
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
Rahul Jain
 
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
StreamNative
 
Make 2016 your year of SMACK talk
Make 2016 your year of SMACK talkMake 2016 your year of SMACK talk
Make 2016 your year of SMACK talk
DataStax Academy
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
Ayyappadas Ravindran (Appu)
 
Kafka
KafkaKafka
Kafka
shrenikp
 
[Demo session] 관리형 Kafka 서비스 - Oracle Event Hub Service
[Demo session] 관리형 Kafka 서비스 - Oracle Event Hub Service[Demo session] 관리형 Kafka 서비스 - Oracle Event Hub Service
[Demo session] 관리형 Kafka 서비스 - Oracle Event Hub Service
Oracle Korea
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...
kawamuray
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
dave_revell
 
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINEKafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
kawamuray
 
Deploying Docker Containers at Scale with Mesos and Marathon
Deploying Docker Containers at Scale with Mesos and MarathonDeploying Docker Containers at Scale with Mesos and Marathon
Deploying Docker Containers at Scale with Mesos and Marathon
Discover Pinterest
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
Gwen (Chen) Shapira
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Joe Stein
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
Nathan Handler
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
Joe Stein
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
confluent
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
Joe Stein
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
Rahul Jain
 
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
StreamNative
 
Make 2016 your year of SMACK talk
Make 2016 your year of SMACK talkMake 2016 your year of SMACK talk
Make 2016 your year of SMACK talk
DataStax Academy
 
[Demo session] 관리형 Kafka 서비스 - Oracle Event Hub Service
[Demo session] 관리형 Kafka 서비스 - Oracle Event Hub Service[Demo session] 관리형 Kafka 서비스 - Oracle Event Hub Service
[Demo session] 관리형 Kafka 서비스 - Oracle Event Hub Service
Oracle Korea
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...
kawamuray
 

Viewers also liked (20)

Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
[Harvard CS264] 01 - Introduction
[Harvard CS264] 01 - Introduction[Harvard CS264] 01 - Introduction
[Harvard CS264] 01 - Introduction
npinto
 
Business Track: Building a Personalized Mobile App Experience Using MongoDB a...
Business Track: Building a Personalized Mobile App Experience Using MongoDB a...Business Track: Building a Personalized Mobile App Experience Using MongoDB a...
Business Track: Building a Personalized Mobile App Experience Using MongoDB a...
MongoDB
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Samuel Kerrien
 
Functional Reactive Programming / Compositional Event Systems
Functional Reactive Programming / Compositional Event SystemsFunctional Reactive Programming / Compositional Event Systems
Functional Reactive Programming / Compositional Event Systems
Leonardo Borges
 
Authorization - it's not just about who you are
Authorization - it's not just about who you areAuthorization - it's not just about who you are
Authorization - it's not just about who you are
David Brossard
 
CORBA
CORBACORBA
CORBA
Object-Frontier Software Pvt. Ltd
 
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
MongoDB
 
The rise of microservices - containers and orchestration
The rise of microservices - containers and orchestrationThe rise of microservices - containers and orchestration
The rise of microservices - containers and orchestration
Andrew Morgan
 
XACML for Developers - Updates, New Tools, & Patterns for the Eager #IAM Deve...
XACML for Developers - Updates, New Tools, & Patterns for the Eager #IAM Deve...XACML for Developers - Updates, New Tools, & Patterns for the Eager #IAM Deve...
XACML for Developers - Updates, New Tools, & Patterns for the Eager #IAM Deve...
David Brossard
 
jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011
Joe Stein
 
Data Pipeline with Kafka
Data Pipeline with KafkaData Pipeline with Kafka
Data Pipeline with Kafka
Peerapat Asoktummarungsri
 
Storing Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsStoring Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite Columns
Joe Stein
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
Max Alexejev
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
Joe Stein
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
Joe Stein
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
Joe Stein
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
Joe Stein
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
Joe Stein
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
[Harvard CS264] 01 - Introduction
[Harvard CS264] 01 - Introduction[Harvard CS264] 01 - Introduction
[Harvard CS264] 01 - Introduction
npinto
 
Business Track: Building a Personalized Mobile App Experience Using MongoDB a...
Business Track: Building a Personalized Mobile App Experience Using MongoDB a...Business Track: Building a Personalized Mobile App Experience Using MongoDB a...
Business Track: Building a Personalized Mobile App Experience Using MongoDB a...
MongoDB
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Samuel Kerrien
 
Functional Reactive Programming / Compositional Event Systems
Functional Reactive Programming / Compositional Event SystemsFunctional Reactive Programming / Compositional Event Systems
Functional Reactive Programming / Compositional Event Systems
Leonardo Borges
 
Authorization - it's not just about who you are
Authorization - it's not just about who you areAuthorization - it's not just about who you are
Authorization - it's not just about who you are
David Brossard
 
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
Webinar: How Penton Uses MongoDB As an Analytics Platform within their Drupal...
MongoDB
 
The rise of microservices - containers and orchestration
The rise of microservices - containers and orchestrationThe rise of microservices - containers and orchestration
The rise of microservices - containers and orchestration
Andrew Morgan
 
XACML for Developers - Updates, New Tools, & Patterns for the Eager #IAM Deve...
XACML for Developers - Updates, New Tools, & Patterns for the Eager #IAM Deve...XACML for Developers - Updates, New Tools, & Patterns for the Eager #IAM Deve...
XACML for Developers - Updates, New Tools, & Patterns for the Eager #IAM Deve...
David Brossard
 
jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011
Joe Stein
 
Storing Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsStoring Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite Columns
Joe Stein
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
Max Alexejev
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
Joe Stein
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
Joe Stein
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
Joe Stein
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
Joe Stein
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
Joe Stein
 
Ad

Similar to Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo (20)

Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
Evans Ye
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Joe Stein
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
Peter Clapham
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
Peter Clapham
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
C4Media
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
DataStax Academy
 
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
 Openstack - An introduction/Installation - Presented at Dr Dobb's conference... Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Rahul Krishna Upadhyaya
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
Michael Noll
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
Fabio Fumarola
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h base
hdhappy001
 
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Timothy Spann
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Michael Noll
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
confluent
 
Real time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solrReal time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solr
Timothy Spann
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
Ceph Community
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Timothy Spann
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
Evans Ye
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Joe Stein
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
Peter Clapham
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
C4Media
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
DataStax Academy
 
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
 Openstack - An introduction/Installation - Presented at Dr Dobb's conference... Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Rahul Krishna Upadhyaya
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
Michael Noll
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
Fabio Fumarola
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h base
hdhappy001
 
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Timothy Spann
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Michael Noll
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
confluent
 
Real time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solrReal time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solr
Timothy Spann
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
Ceph Community
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Timothy Spann
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Ad

More from Joe Stein (8)

SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
Joe Stein
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
Joe Stein
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache Mesos
Joe Stein
 
Building and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosBuilding and Deploying Application to Apache Mesos
Building and Deploying Application to Apache Mesos
Joe Stein
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache Kafka
Joe Stein
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
Joe Stein
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
Joe Stein
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With Python
Joe Stein
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
Joe Stein
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
Joe Stein
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache Mesos
Joe Stein
 
Building and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosBuilding and Deploying Application to Apache Mesos
Building and Deploying Application to Apache Mesos
Joe Stein
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache Kafka
Joe Stein
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
Joe Stein
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
Joe Stein
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With Python
Joe Stein
 

Recently uploaded (20)

How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdfHow Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
Rejig Digital
 
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
Jasper Oosterveld
 
TimeSeries Machine Learning - PyData London 2025
TimeSeries Machine Learning - PyData London 2025TimeSeries Machine Learning - PyData London 2025
TimeSeries Machine Learning - PyData London 2025
Suyash Joshi
 
If You Use Databricks, You Definitely Need FME
If You Use Databricks, You Definitely Need FMEIf You Use Databricks, You Definitely Need FME
If You Use Databricks, You Definitely Need FME
Safe Software
 
Data Virtualization: Bringing the Power of FME to Any Application
Data Virtualization: Bringing the Power of FME to Any ApplicationData Virtualization: Bringing the Power of FME to Any Application
Data Virtualization: Bringing the Power of FME to Any Application
Safe Software
 
Introduction to Internet of things .ppt.
Introduction to Internet of things .ppt.Introduction to Internet of things .ppt.
Introduction to Internet of things .ppt.
hok12341073
 
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and ImplementationAI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
Christine Shepherd
 
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
Edge AI and Vision Alliance
 
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Safe Software
 
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...
Anish Kumar
 
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
vertical-cnc-processing-centers-drillteq-v-200-en.pdfvertical-cnc-processing-centers-drillteq-v-200-en.pdf
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
AmirStern2
 
Dancing with AI - A Developer's Journey.pptx
Dancing with AI - A Developer's Journey.pptxDancing with AI - A Developer's Journey.pptx
Dancing with AI - A Developer's Journey.pptx
Elliott Richmond
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI FoundationsOracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
Your startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean accountYour startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean account
angelo60207
 
DevOps in the Modern Era - Thoughtfully Critical Podcast
DevOps in the Modern Era - Thoughtfully Critical PodcastDevOps in the Modern Era - Thoughtfully Critical Podcast
DevOps in the Modern Era - Thoughtfully Critical Podcast
Chris Wahl
 
Oracle Cloud Infrastructure Generative AI Professional
Oracle Cloud Infrastructure Generative AI ProfessionalOracle Cloud Infrastructure Generative AI Professional
Oracle Cloud Infrastructure Generative AI Professional
VICTOR MAESTRE RAMIREZ
 
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy SurveyTrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc
 
MCP vs A2A vs ACP: Choosing the Right Protocol | Bluebash
MCP vs A2A vs ACP: Choosing the Right Protocol | BluebashMCP vs A2A vs ACP: Choosing the Right Protocol | Bluebash
MCP vs A2A vs ACP: Choosing the Right Protocol | Bluebash
Bluebash
 
“How Qualcomm Is Powering AI-driven Multimedia at the Edge,” a Presentation f...
“How Qualcomm Is Powering AI-driven Multimedia at the Edge,” a Presentation f...“How Qualcomm Is Powering AI-driven Multimedia at the Edge,” a Presentation f...
“How Qualcomm Is Powering AI-driven Multimedia at the Edge,” a Presentation f...
Edge AI and Vision Alliance
 
Establish Visibility and Manage Risk in the Supply Chain with Anchore SBOM
Establish Visibility and Manage Risk in the Supply Chain with Anchore SBOMEstablish Visibility and Manage Risk in the Supply Chain with Anchore SBOM
Establish Visibility and Manage Risk in the Supply Chain with Anchore SBOM
Anchore
 
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdfHow Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
How Advanced Environmental Detection Is Revolutionizing Oil & Gas Safety.pdf
Rejig Digital
 
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
Jasper Oosterveld
 
TimeSeries Machine Learning - PyData London 2025
TimeSeries Machine Learning - PyData London 2025TimeSeries Machine Learning - PyData London 2025
TimeSeries Machine Learning - PyData London 2025
Suyash Joshi
 
If You Use Databricks, You Definitely Need FME
If You Use Databricks, You Definitely Need FMEIf You Use Databricks, You Definitely Need FME
If You Use Databricks, You Definitely Need FME
Safe Software
 
Data Virtualization: Bringing the Power of FME to Any Application
Data Virtualization: Bringing the Power of FME to Any ApplicationData Virtualization: Bringing the Power of FME to Any Application
Data Virtualization: Bringing the Power of FME to Any Application
Safe Software
 
Introduction to Internet of things .ppt.
Introduction to Internet of things .ppt.Introduction to Internet of things .ppt.
Introduction to Internet of things .ppt.
hok12341073
 
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and ImplementationAI Agents in Logistics and Supply Chain Applications Benefits and Implementation
AI Agents in Logistics and Supply Chain Applications Benefits and Implementation
Christine Shepherd
 
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
Edge AI and Vision Alliance
 
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025Developing Schemas with FME and Excel - Peak of Data & AI 2025
Developing Schemas with FME and Excel - Peak of Data & AI 2025
Safe Software
 
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...
Scaling GenAI Inference From Prototype to Production: Real-World Lessons in S...
Anish Kumar
 
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
vertical-cnc-processing-centers-drillteq-v-200-en.pdfvertical-cnc-processing-centers-drillteq-v-200-en.pdf
vertical-cnc-processing-centers-drillteq-v-200-en.pdf
AmirStern2
 
Dancing with AI - A Developer's Journey.pptx
Dancing with AI - A Developer's Journey.pptxDancing with AI - A Developer's Journey.pptx
Dancing with AI - A Developer's Journey.pptx
Elliott Richmond
 
Oracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI FoundationsOracle Cloud Infrastructure AI Foundations
Oracle Cloud Infrastructure AI Foundations
VICTOR MAESTRE RAMIREZ
 
Your startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean accountYour startup on AWS - How to architect and maintain a Lean and Mean account
Your startup on AWS - How to architect and maintain a Lean and Mean account
angelo60207
 
DevOps in the Modern Era - Thoughtfully Critical Podcast
DevOps in the Modern Era - Thoughtfully Critical PodcastDevOps in the Modern Era - Thoughtfully Critical Podcast
DevOps in the Modern Era - Thoughtfully Critical Podcast
Chris Wahl
 
Oracle Cloud Infrastructure Generative AI Professional
Oracle Cloud Infrastructure Generative AI ProfessionalOracle Cloud Infrastructure Generative AI Professional
Oracle Cloud Infrastructure Generative AI Professional
VICTOR MAESTRE RAMIREZ
 
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy SurveyTrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc
 
MCP vs A2A vs ACP: Choosing the Right Protocol | Bluebash
MCP vs A2A vs ACP: Choosing the Right Protocol | BluebashMCP vs A2A vs ACP: Choosing the Right Protocol | Bluebash
MCP vs A2A vs ACP: Choosing the Right Protocol | Bluebash
Bluebash
 
“How Qualcomm Is Powering AI-driven Multimedia at the Edge,” a Presentation f...
“How Qualcomm Is Powering AI-driven Multimedia at the Edge,” a Presentation f...“How Qualcomm Is Powering AI-driven Multimedia at the Edge,” a Presentation f...
“How Qualcomm Is Powering AI-driven Multimedia at the Edge,” a Presentation f...
Edge AI and Vision Alliance
 
Establish Visibility and Manage Risk in the Supply Chain with Anchore SBOM
Establish Visibility and Manage Risk in the Supply Chain with Anchore SBOMEstablish Visibility and Manage Risk in the Supply Chain with Anchore SBOM
Establish Visibility and Manage Risk in the Supply Chain with Anchore SBOM
Anchore
 

Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo

  • 1. Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
  • 2. Joe Stein • Developer, Architect & Technologist • Founder & Principal Consultant => Big Data Open Source Security LLC - http://stealth.ly Big Data Open Source Security LLC provides professional services and product solutions for the collection, storage, transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and distributed systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data Infrastructure Components to use but also how to change their existing (or build new) systems to work with them. • CEO => Elodina, Inc. Expanding BDOSS from just consulting, Elodina is an ISV & SaaS provider of stream solutions & open source software. Elodina helps make data streams actionable. • Apache Kafka Committer & PMC member • Blog & Podcast - http://allthingshadoop.com • Twitter @allthingshadoop
  • 3. Overview ● Real-time distributed reactive systems ● Quick Intro to Apache Kafka ● Quick Intro to Apache Mesos ● Kafka on Mesos ● Accumulo & HDFS on Mesos ● Real-time distributed reactive systems ● Bringing it all together with Accumulo
  • 4. Real-Time Distributed and Reactive Systems A distributed system for asynchronous stream processing with non-blocking back pressure where complex event processing systems can influence the response without coupling the business logic of processing. The response can be calculated by parallel operations with concurrent orthogonal processing engines computing their influence towards the final result.
  • 5. Real-Time Distributed and Reactive Systems
  • 7. Apache Kafka • Apache Kafka o http://kafka.apache.org • Apache Kafka Source Code o https://github.com/apache/kafka • Documentation o http://kafka.apache.org/documentation.html • Wiki o https://cwiki.apache.org/confluence/display/KAFKA/Index
  • 8. Producers, Consumers, Brokers • Producers - ** push ** o Batching o Compression o Sync (Ack), Async (auto batch) o Replication o Sequential writes, guaranteed ordering within each partition • Consumers - ** pull ** o No state held by broker o Consumers control reading from the stream • Zero Copy for producers and consumers to and from the broker http://kafka.apache.org/documentation.html#maximizingefficiency • Message stay on disk when consumed, deletes on TTL or compaction https://kafka.apache.org/documentation.html#compaction
  • 11. Client Libraries Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients • Python - Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported. • C - High performance C library with full protocol support • C++ - Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset. • Go (aka golang) Pure Go implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported. • Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy compression supported. Ruby 1.9.3 and up (CI runs MRI 2. • Clojure - Clojure DSL for the Kafka API • JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation • stdin & stdout Wire Protocol Developers Guide https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
  • 12. Really Quick Start (Scala) 1) Install Vagrant http://www.vagrantup.com/ 2) Install Virtual Box https://www.virtualbox.org/ 3) git clone https://github.com/stealthly/scala-kafka 4) cd scala-kafka 5) vagrant up Zookeeper will be running on 192.168.86.5 BrokerOne will be running on 192.168.86.10 All the tests in ./src/test/scala/* should pass (which is also /vagrant/src/test/scala/* in the vm) 6) ./gradlew test
  • 13. Really Quick Start (Go) 1) Install Vagrant http://www.vagrantup.com/ 2) Install Virtual Box https://www.virtualbox.org/ 3) git clone https://github.com/stealthly/go-kafka 4) cd go-kafka 5) vagrant up 6) vagrant ssh brokerOne 7) cd /vagrant 8) sudo ./test.sh
  • 15. Origins Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center http://static.usenix.org/event/nsdi11/tech/full_papers/Hindman_new.pdf Google Borg - https://research.google.com/pubs/pub43438.html Google Omega: flexible, scalable schedulers for large compute clusters http://eurosys2013.tudos.org/wp- content/uploads/2013/paper/Schwarzkopf.pdf
  • 17. Static Partition == Idle Resources
  • 18. Operating System === Datacenter
  • 19. Mesos => data center “kernel”
  • 21. Apache Mesos ● Scalability to 10,000s of nodes ● Fault-tolerant replicated master and slaves using ZooKeeper ● Support for Docker containers ● Native isolation between tasks with Linux Containers ● Multi-resource scheduling (memory, CPU, disk, and ports) ● Java, Python and C++ APIs for developing new parallel applications ● Web UI for viewing cluster state
  • 22. Sample Frameworks C++ - https://github.com/apache/mesos/tree/master/src/examples Java - https://github.com/apache/mesos/tree/master/src/examples/java Python - https://github.com/apache/mesos/tree/master/src/examples/python Scala - https://github.com/mesosphere/scala-sbt-mesos-framework.g8 Go - https://github.com/mesosphere/mesos-go
  • 23. Kafka on Mesos ● The Mesos Kafka framework https://github.com/mesos/kafka ○ Smart broker.id assignment. ○ Preservation of broker placement. ○ Ability to-do configuration changes. ○ Rolling restarts. ○ Auto-scaling the cluster up and down.
  • 24. Accumulo on Mesos No framework yet, but you can use Marathon, no problem! Marathon https://github.com/mesosphere/marathon is a cluster- wide init and control system for services in cgroups or docker based on Apache Mesos HDFS on Mesos https://github.com/mesosphere/hdfs (more on this in a bit)
  • 26. Real-Time Distributed and Reactive Systems
  • 27. Real-Time Distributed and Reactive Systems
  • 28. Where does Accumulo fit in? ● Iterators ○ Accumulo iterators are a real time processing framework with “reduce like” functionality ● Multi HDFS Volume Support ○ Spin up HDFS clusters when they are needed ● Streaming Large Blobs ○ Post files in producers, process and respond to scans ● More!
  • 29. Real-Time Distributed and Reactive Systems
  • 30. Questions? /******************************************* Joe Stein CEO, Elodina, Inc http://www.stealth.ly Twitter: @allthingshadoop ********************************************/