@tgrall#Devoxx #sparkstreaming
Build a Time Series Application
with Spark and HBase
Tugdual Grall
@tgrall
MapR
Carol McDonald
@caroljmcdonald
MapR
Agenda
• Time Series
• Apache Spark & Spark Streaming
• Apache HBase
• Lab
About the Lab
• Use Spark & HBase in a MapR cluster
• Option 1: use a sandbox (VirtualBox VM located on the USB key)
• Option 2: use a cloud instance (SSH/SCP only)
• Content:
• Option 1: spark-streaming-hbase-workshop.zip on the USB key
• Option 2: download the zip from
https://github.com/tgrall/spark-streaming-hbase-workshop
Time Series
What is a Time Series?
• Stuff with timestamps
• sensor measurements
• system stats
• log files
• …
Got Some Examples?
What do we need to do?
• Acquire
• Measurement, transmission, reception
• Store
• Individually, or grouped for some amount of time
• Retrieve
• Ad hoc, flexible, correlate and aggregate
• Analyze and visualize
• We facilitate this via retrieval
Acquisition
Not usually our problem
• Sensors
• Data collection: agents, Raspberry Pi
• Transmission: via LAN/WAN, mobile networks, satellites
• Receipt into the system: a listening daemon or a queue, or, depending on the use case, writing directly to the database
Storage Choice
• Flat files
• Great for rapid ingest with massive data
• Handles essentially any data type
• Less good for data requiring frequent updates
• Harder to find specific ranges
• Traditional RDBMS
• Ingests up to ~10,000 rows/sec; prefers well-structured (numerical) data; expensive
• NoSQL (such as MapR-DB or HBase)
• Easily handles 10,000 rows/sec/node, with linear scaling as nodes are added
• Handles wide variety of data
• Good for frequent updates
• Easily scanned in a range
Specific Example
Consider oil drilling rigs
• When drilling wells, there are *lots* of moving parts
• Typically a drilling rig makes about 10K samples/s
• Temperatures, pressures, magnetics, machine vibration levels, salinity, voltage, currents, many others
• A typical project has 100 rigs
General Outline
10K samples / second / rig
x 100 rigs
= 1M samples / second
• But wait, there’s more
• Suppose you want to test your system
• Perhaps with a year of data
• And you want to load that data in << 1 year
• 100x real-time = 100M samples / second
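The sizing arithmetic above can be checked in a few lines of Scala. This is a minimal sketch: the object and field names are illustrative, and the one-year test set (a 365-day year) is an assumption based on the "a year of data" example.

```scala
// Back-of-the-envelope ingest sizing for the drilling-rig example.
object RigThroughput {
  val samplesPerSecondPerRig: Long = 10000L // ~10K samples/s per rig (from the slide)
  val rigs: Long = 100L                     // typical project size (from the slide)
  val speedup: Long = 100L                  // replay test data at 100x real time

  // Steady-state ingest rate: 10K * 100 = 1M samples/s
  val realTimeRate: Long = samplesPerSecondPerRig * rigs

  // Rate needed to replay historical data at 100x real time: 100M samples/s
  val replayRate: Long = realTimeRate * speedup

  // Assumed test set: one 365-day year of data
  val secondsPerYear: Long = 365L * 24 * 3600
  val samplesPerYear: Long = realTimeRate * secondsPerYear

  // Wall-clock days needed to load that year at 100x real time
  val replayDays: Double = 365.0 / speedup
}
```

Under these assumptions a year is about 3.15e13 samples, and replaying it at 100x real time takes about 3.65 days rather than a year.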
Data Storage
• Typical time window is one hour
• Column names are offsets in the time window
• Find the series-uid in a separate table
Key                    | 13  | 43  | 73  | 103 | …
…
series-uid.time-window | 4.5 | 5.2 | 6.1 | 4.9 | …
…
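One way to compute the row key and column name for a sample under this layout is sketched below. Assumptions not stated on the slide: timestamps are epoch seconds, the time-window part of the key is the window's start time in epoch seconds, and column offsets are in seconds. The object and method names are hypothetical. With those choices, samples at 3613, 3643, 3673 and 3703 land in row series-uid.3600 under columns 13, 43, 73 and 103, matching the table.

```scala
// Sketch of a "one row per series per hour" layout.
// Assumptions: epoch-second timestamps, window start used in the key,
// column names are the offset in seconds inside the window.
object TimeSeriesKey {
  val WindowSeconds: Long = 3600L // one-hour time window

  // Start of the one-hour window containing ts (floor to the hour).
  def windowStart(ts: Long): Long = ts - Math.floorMod(ts, WindowSeconds)

  // Row key: series-uid.time-window
  def rowKey(seriesUid: String, ts: Long): String = s"$seriesUid.${windowStart(ts)}"

  // Column name: offset of the sample inside its window (0 to 3599).
  def columnName(ts: Long): String = Math.floorMod(ts, WindowSeconds).toString

  // Group (timestamp, value) samples into rows: rowKey -> (column -> value).
  def toRows(seriesUid: String, samples: Seq[(Long, Double)]): Map[String, Map[String, Double]] =
    samples
      .groupBy { case (ts, _) => rowKey(seriesUid, ts) }
      .map { case (key, rowSamples) =>
        key -> rowSamples.map { case (ts, v) => columnName(ts) -> v }.toMap
      }
}
```

Because HBase sorts row keys as bytes, decimal timestamps only sort chronologically while they have the same number of digits; a production schema might prefer fixed-width or binary timestamps.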
Why do we need NoSQL / HBase?
Figure: storage model, RDBMS vs. HBase. In the relational model, data is split across separate normalized tables (Key, colB, colC), and the relational model becomes a bottleneck at scale.
• RDBMS: distributed joins and transactions do not scale
• HBase: data that is accessed together is stored together
HBase is a Column-Family-Oriented Database
• Data that is accessed together is stored together:
• RowKey is the primary index
• Column families group similar data by row key
RowKey                 | CF_DATA (raw data) | CF_STATS (stats)
                       | colA  colB  colC   | colA  colB  colC
series-abc.time-window | val   val          | val   val
series-efg.time-window | val                | val
HBase is a Distributed Database
Figure: a table split across the cluster into three regions; each region covers one key range (xxxx to xxxx) and stores column families CF1 and CF2 (colA, colB, colC) for the rows in that range.
• Put and Get by key
• Data is automatically distributed across the cluster
• The key range is used for horizontal partitioning
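Key-range partitioning can be modelled with a sorted list of region boundaries. This is a toy sketch, not HBase's region-location code; the split keys and names are hypothetical, and it assumes ASCII row keys so that String.compareTo agrees with HBase's byte ordering.

```scala
// Toy model of key-range (horizontal) partitioning; not the HBase client API.
// splits holds sorted region boundaries: region 0 covers keys below splits(0),
// region i covers [splits(i - 1), splits(i)), and the last region is unbounded.
// Assumption: ASCII row keys, so String.compareTo matches HBase byte order.
object KeyRangePartitioning {
  def regionFor(rowKey: String, splits: IndexedSeq[String]): Int =
    // Number of boundaries at or below the key = index of the region holding it.
    // A linear scan keeps the sketch short; a binary search would also work.
    splits.count(boundary => boundary.compareTo(rowKey) <= 0)

  // Group row keys by the region they fall into.
  def assign(rowKeys: Seq[String], splits: IndexedSeq[String]): Map[Int, Seq[String]] =
    rowKeys.groupBy(key => regionFor(key, splits))
}
```

Because the row keys of one series share a prefix, consecutive time windows of that series tend to fall into the same region, which keeps range scans over one series local.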
Basic Table Operations
• Create the table and define its column families before data is imported
• but not the row keys or the number/names of columns
• Low-level API, technically more demanding
• Basic data access operations (CRUD):
put: inserts data into rows (both create and update)
get: reads data from one row
scan: reads data from a range of rows
delete: deletes a row, or specific columns or column families within it
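To make the put/get/scan/delete semantics concrete, here is a toy in-memory table. It is not the HBase client API (which works with Put, Get, Scan and Delete objects against a Table); it only mimics the data model described above: rows sorted by row key, column families declared up front, and cells addressed by family and qualifier. Versions and timestamps are ignored, and all names are illustrative.

```scala
import scala.collection.immutable.TreeMap

// Toy in-memory model of the HBase data model; NOT the HBase client API.
// Rows are sorted by row key; each row maps column family -> qualifier -> value.
object ToyHBase {
  type Row = Map[String, Map[String, Double]]

  final case class Table(families: Set[String],
                         rows: TreeMap[String, Row] = TreeMap.empty[String, Row]) {

    // put: create or update one cell; the column family must be declared up front.
    def put(rowKey: String, family: String, qualifier: String, value: Double): Table = {
      require(families.contains(family), s"unknown column family: $family")
      val row: Row = rows.getOrElse(rowKey, Map.empty[String, Map[String, Double]])
      val cells = row.getOrElse(family, Map.empty[String, Double]) + (qualifier -> value)
      copy(rows = rows + (rowKey -> (row + (family -> cells))))
    }

    // get: read a single row by key.
    def get(rowKey: String): Option[Row] = rows.get(rowKey)

    // scan: read the contiguous range of rows with startRow <= key < stopRow.
    def scan(startRow: String, stopRow: String): Seq[(String, Row)] =
      rows.range(startRow, stopRow).toSeq

    // delete: remove a whole row.
    def delete(rowKey: String): Table = copy(rows = rows - rowKey)
  }
}
```

A prefix scan for one series can use start row `series-abc.` and stop row `series-abc/`, since '/' immediately follows '.' in ASCII.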
Learn More
• Free Online Training: http://learn.mapr.com
• DEV 320 - Apache HBase Data Model and Architecture
• DEV 325 - Apache HBase Schema Design
• DEV 330 - Developing Apache HBase Applications: Basics
• DEV 335 - Developing Apache HBase Applications: Advanced
What is Spark?
• Cluster Computing Platform
• Extends the MapReduce model with:
• Streaming
• Interactive analytics
• In-memory execution
What is Spark?
Fast
• Up to 100x faster than MapReduce on some workloads
Figure: logistic regression running time in Hadoop vs. Spark
What is Spark?
Ease of Development
• Write programs quickly
• More Operators
• Interactive Shell
• Less Code
What is Spark?
Multi Language Support
• Scala
• Python
• Java
• SparkR
What is Spark?
Deployment Flexibility
• Deployment
• Local
• Standalone
• YARN
• Mesos
• Storage
• HDFS
• MapR-FS
• S3
• Cassandra
Unified Platform
Spark Core (general execution engine) underpins:
• Spark SQL
• Spark Streaming (streaming)
• MLlib (machine learning)
• GraphX (graph computation)
Spark Components
Figure: the Driver Program (application) holds a SparkContext, which connects to a Cluster Manager; the Cluster Manager allocates Executors on Worker nodes, and each Executor runs Tasks.
Spark Resilient Distributed Datasets
Figure: Sensor RDD. sc.textFile creates an RDD whose partitions (P1 to P4) are distributed across Executors on Worker nodes (W); each partition holds a subset of the input lines, e.g. P1 begins "8213034705, 95, 2.927373, jake7870, 0…".
Spark Resilient Distributed Datasets
Figure: a transformation such as filter() creates a new RDD from an existing RDD; an action such as count() returns a value.
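The transformation/action split amounts to lazy evaluation. The analogy below uses plain Scala iterators rather than Spark, so it runs without a cluster: `Iterator.filter`, like an RDD transformation, only describes the work, and `size`, like the `count()` action, forces it. The object and method names are illustrative.

```scala
// Plain-Scala analogy (not Spark): a lazy "transformation" followed by an "action".
object LazyPipeline {
  // Returns (predicate calls before the action, calls after it, matching count).
  def run(readings: Seq[Double], threshold: Double): (Int, Int, Int) = {
    var evaluated = 0 // how many times the filter predicate has run
    // "Transformation": builds a lazy iterator; the predicate has not run yet.
    val filtered = readings.iterator.filter { r => evaluated += 1; r > threshold }
    val before = evaluated // still 0: nothing has been computed
    // "Action": consuming the iterator runs the predicate on every element.
    val count = filtered.size
    (before, evaluated, count)
  }
}
```

For example, `LazyPipeline.run(Seq(1.0, 5.0, 9.0), 4.0)` reports no predicate calls before the action, three after it, and a count of two.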
Spark Streaming
What is Streaming?
• Data Stream:
• Unbounded sequence of data arriving continuously
• Stream processing:
• Low-latency processing, querying, and analysis of real-time streaming data
Why Spark Streaming
• Many applications must process streaming data
• With the following requirements:
• Results in near-real-time
• Handle large workloads
• Latencies of a few seconds
• Use Cases
• Website statistics, monitoring
• IoT
• Fraud detection
• Social network trends
• Advertising click monetization
Figure: time-stamped data (sensors, system metrics, events, log files, stock tickers, user activity), arriving at high volume and velocity, is written with put operations as data for real-time monitoring.
What is Spark Streaming?
• Enables scalable, high-throughput, fault-tolerant stream processing of live data
• Extension of the core Spark API
Figure: data flows from data sources through Spark Streaming to data sinks
Spark Streaming Architecture
• Divides the data stream into batches of X seconds
• The result is called a DStream: a sequence of RDDs
Figure: the input data stream enters Spark Streaming, which produces a DStream of RDD batches, one per batch interval: data from time 0 to 1 becomes the RDD @ time 1, data from time 1 to 2 the RDD @ time 2, and data from time 2 to 3 the RDD @ time 3.
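Batching can be modelled in plain Scala to show what "batches of X seconds" means. This is a toy sketch, not how Spark Streaming is implemented: each event is assigned to batch `timeMs / intervalMs`, and the same operation (here a count) runs on every batch, the way a DStream operation runs on every RDD. Times are assumed to be non-negative milliseconds, and the names are hypothetical.

```scala
// Toy model of micro-batching (not Spark's implementation).
object MicroBatch {
  final case class Event(timeMs: Long, line: String)

  // Batch k covers [k * intervalMs, (k + 1) * intervalMs); assumes timeMs >= 0.
  def batchIndex(timeMs: Long, intervalMs: Long): Long = timeMs / intervalMs

  // Split events into batches ordered by batch index.
  // Unlike Spark, intervals with no events produce no batch here.
  def toBatches(events: Seq[Event], intervalMs: Long): Seq[(Long, Seq[String])] =
    events
      .groupBy(e => batchIndex(e.timeMs, intervalMs))
      .toSeq
      .sortBy(_._1)
      .map { case (idx, es) => idx -> es.map(_.line) }

  // Apply the same operation (a count) to every batch, like a DStream operation.
  def countPerBatch(events: Seq[Event], intervalMs: Long): Seq[(Long, Int)] =
    toBatches(events, intervalMs).map { case (idx, lines) => idx -> lines.size }
}
```

With a 2-second interval, as in the `Seconds(2)` StreamingContext used in the lab, events at 100 ms and 1900 ms share batch 0 and an event at 2500 ms goes to batch 1.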
Process DStream
• Process using transformations (e.g. map, reduceByKey, count)
• Transformations create new RDDs
Figure: each transformation on a DStream is applied to every RDD in it (RDD @ time 1, RDD @ time 2, RDD @ time 3), producing a new DStream of RDDs.
Time Series
Figure: sensors emit time-stamped data, which is processed and stored in HBase; the data is then read from HBase for real-time monitoring.
Lab “flow”
Convert a Line of CSV Data to a Sensor Object
// One sensor reading: resource id, date, time and six measurements
case class Sensor(resid: String, date: String, time: String,
  hz: Double, disp: Double, flo: Double, sedPPM: Double,
  psi: Double, chlPPM: Double)
// Parse one comma-separated line (9 fields, in the order above) into a Sensor
def parseSensor(str: String): Sensor = {
  val p = str.split(",")
  Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble,
    p(6).toDouble, p(7).toDouble, p(8).toDouble)
}
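Because `parseSensor` throws on a malformed line (a missing field or a non-numeric value), a streaming job may prefer a variant that skips bad input. The sketch below is a hypothetical helper, not part of the lab code: it returns `Option[Sensor]`, and the sample line used to exercise it is invented for illustration.

```scala
import scala.util.Try

// Hypothetical defensive variant of parseSensor (not from the lab code).
object SensorParsing {
  case class Sensor(resid: String, date: String, time: String,
                    hz: Double, disp: Double, flo: Double, sedPPM: Double,
                    psi: Double, chlPPM: Double)

  // None for lines without exactly 9 fields or with unparsable numbers.
  def parseSensorOpt(str: String): Option[Sensor] = {
    val p = str.split(",").map(_.trim)
    if (p.length != 9) None
    else Try(Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble,
                    p(6).toDouble, p(7).toDouble, p(8).toDouble)).toOption
  }
}
```

In the DStream it could be used as `linesDStream.flatMap(line => SensorParsing.parseSensorOpt(line))`, relying on Scala's implicit conversion of `Option` to an iterable; treat this as a suggestion to verify against your Spark version.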
Create a DStream
import org.apache.spark.streaming.{Seconds, StreamingContext}
val ssc = new StreamingContext(sparkConf, Seconds(2))
val linesDStream = ssc.textFileStream("/mapr/stream")
Figure: linesDStream is a DStream, a sequence of RDDs representing a stream of data; each batch (time 0-1, time 1-2, time 2-3, ...) is stored in memory as an RDD.
Process DStream
val linesDStream = ssc.textFileStream("directory path")
val sensorDStream = linesDStream.map(parseSensor)
Figure: map is applied to every batch, creating new RDDs: each linesDStream RDD (batch time 0-1, time 1-2, time 2-3, ...) becomes a sensorDStream RDD.
Save to HBase
sensorDStream.foreachRDD(rdd => rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig))
Figure: for every batch, the sensorDStream RDD is mapped to HBase Put objects, which are written to HBase. Saving is an output operation: it persists data to external storage.
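The slide calls `Sensor.convertToPut` without showing it. The sketch below models what such a function has to produce, a row key plus a set of cells, without depending on the HBase libraries. The row key format (`resid_date time`) and the single column family `data` are assumptions, not taken from the slide; a real implementation would typically wrap the same information in an `org.apache.hadoop.hbase.client.Put` paired with an `ImmutableBytesWritable` key, the pair type that `saveAsHadoopDataset` with HBase's `TableOutputFormat` writes.

```scala
// Toy stand-in for Sensor.convertToPut: cells as a plain Map instead of a Put.
// Assumptions: composite row key "resid_date time", one column family "data".
object SensorToCells {
  case class Sensor(resid: String, date: String, time: String,
                    hz: Double, disp: Double, flo: Double, sedPPM: Double,
                    psi: Double, chlPPM: Double)

  val DataFamily = "data" // hypothetical column family name

  def toCells(sensor: Sensor): (String, Map[(String, String), Double]) = {
    // Composite row key: sensor id first, then the reading's date and time.
    val rowKey = s"${sensor.resid}_${sensor.date} ${sensor.time}"
    // One cell per measurement, addressed by (column family, qualifier).
    val cells = Map(
      (DataFamily, "hz") -> sensor.hz,
      (DataFamily, "disp") -> sensor.disp,
      (DataFamily, "flo") -> sensor.flo,
      (DataFamily, "sedPPM") -> sensor.sedPPM,
      (DataFamily, "psi") -> sensor.psi,
      (DataFamily, "chlPPM") -> sensor.chlPPM
    )
    (rowKey, cells)
  }
}
```

Putting the sensor id first keeps each sensor's readings together in row-key order. Dates written like 3/10/14 do not sort chronologically as strings, though, so a zero-padded or epoch-based timestamp would be safer in the key.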
Learn More
• Free Spark Online Training: http://learn.mapr.com
Build a Time Series Application with Apache Spark and Apache HBase