SlideShare a Scribd company logo
Stephan Ewen
@stephanewen
Streaming Analytics
with Apache Flink
Apache Flink Stack
2
DataStream API
Stream Processing
DataSet API
Batch Processing
Runtime
Distributed Streaming Data Flow
LibrariesApache Beam
Streaming and batch as first class citizens.
Today
3
Streaming and batch as first class citizens.
DataStream API
Stream Processing
DataSet API
Batch Processing
Runtime
Distributed Streaming Data Flow
LibrariesApache Beam
4
Streaming technology is enabling the obvious:
continuous processing on data that is
continuously produced
Continuous Processing with Batch
 Continuous
ingestion
 Periodic (e.g.,
hourly) files
 Periodic batch
jobs
5
λ Architecture
 "Batch layer": what
we had before
 "Stream layer":
approximate early
results
6
A Stream Processing Pipeline
7
collect log analyze serve & store
Programs and Dataflows
8
Source
Transformation
Transformation
Sink
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09(…))
val events: DataStream[Event] = lines.map((line) => parse(line))
val stats: DataStream[Statistic] = stream
.keyBy("sensor")
.timeWindow(Time.seconds(5))
.sum(new MyAggregationFunction())
stats.addSink(new RollingSink(path))
Source
[1]
map()
[1]
keyBy()/
window()/
apply()
[1]
Sink
[1]
Source
[2]
map()
[2]
keyBy()/
window()/
apply()
[2]
Streaming
Dataflow
Why does Flink stream flink?
9
Low latency
High Throughput
Well-behaved
flow control
(back pressure)
Make more sense of data
Works on real-time
and historic data
True
Streaming
Event Time
APIs
Libraries
Stateful
Streaming
Globally consistent
savepoints
Exactly-once semantics
for fault tolerance
Windows &
user-defined state
Flexible windows
(time, count, session, roll-your own)
Complex Event Processing
Streaming Analytics by Example
10
Time-Windowed Aggregations
11
case class Event(sensor: String, measure: Double)
val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream: DataStream[Event] = env.addSource(…)
stream
.keyBy("sensor")
.timeWindow(Time.seconds(5))
.sum("measure")
Time-Windowed Aggregations
12
case class Event(sensor: String, measure: Double)
val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream: DataStream[Event] = env.addSource(…)
stream
.keyBy("sensor")
.timeWindow(Time.seconds(60), Time.seconds(5))
.sum("measure")
Session-Windowed Aggregations
13
case class Event(sensor: String, measure: Double)
val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream: DataStream[Event] = env.addSource(…)
stream
.keyBy("sensor")
.window(EventTimeSessionWindows.withGap(Time.seconds(60)))
.max("measure")
Pattern Detection
14
case class Event(producer: String, evtType: Int, msg: String)
case class Alert(msg: String)
val stream: DataStream[Event] = env.addSource(…)
stream
.keyBy("producer")
.flatMap(new RichFlatMapFuncion[Event, Alert]() {
lazy val state: ValueState[Int] = getRuntimeContext.getState(…)
def flatMap(event: Event, out: Collector[Alert]) = {
val newState = state.value() match {
case 0 if (event.evtType == 0) => 1
case 1 if (event.evtType == 1) => 0
case x => out.collect(Alert(event.msg, x)); 0
}
state.update(newState)
}
})
Pattern Detection
15
case class Event(producer: String, evtType: Int, msg: String)
case class Alert(msg: String)
val stream: DataStream[Event] = env.addSource(…)
stream
.keyBy("producer")
.flatMap(new RichFlatMapFuncion[Event, Alert]() {
lazy val state: ValueState[Int] = getRuntimeContext.getState(…)
def flatMap(event: Event, out: Collector[Alert]) = {
val newState = state.value() match {
case 0 if (event.evtType == 0) => 1
case 1 if (event.evtType == 1) => 0
case x => out.collect(Alert(event.msg, x)); 0
}
state.update(newState)
}
})
Embedded key/value
state store
Many more
 Joining streams (e.g. combine readings from sensor)
 Detecting Patterns (CEP)
 Applying (changing) rules or models to events
 Training and applying online machine learning
models
 …
16
(It's) About Time
17
18
The biggest change in moving from
batch to streaming is
handling time explicitly
Example: Windowing by Time
19
case class Event(id: String, measure: Double, timestamp: Long)
val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream: DataStream[Event] = env.addSource(…)
stream
.keyBy("id")
.timeWindow(Time.seconds(15), Time.seconds(5))
.sum("measure")
Different Notions of Time
20
Event Producer Message Queue
Flink
Data Source
Flink
Window Operator
partition 1
partition 2
Event
Time
Ingestion
Time
Window
Processing
Time
Time and the Dataflow Model
 Event Time semantics in Flink follow the
Dataflow model (Apache Beam (incub.))
 See previous talk by Frances Perry
& Tyler Akidau
 For the sake of time (no pun intended) I, only briefly
recapitulate on the basic concept
21
1977 1980 1983 1999 2002 2005 2015
Processing Time
Episode
IV
Episode
V
Episode
VI
Episode
I
Episode
II
Episode
III
Episode
VII
Event Time
Event Time vs. Processing Time
22
Processing Time
23
case class Event(id: String, measure: Double, timestamp: Long)
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(ProcessingTime)
val stream: DataStream[Event] = env.addSource(…)
stream
.keyBy("id")
.timeWindow(Time.seconds(15), Time.seconds(5))
.sum("measure")
Ingestion Time
24
case class Event(id: String, measure: Double, timestamp: Long)
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(IngestionTime)
val stream: DataStream[Event] = env.addSource(…)
stream
.keyBy("id")
.timeWindow(Time.seconds(15), Time.seconds(5))
.sum("measure")
Event Time
25
case class Event(id: String, measure: Double, timestamp: Long)
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(EventTime)
val stream: DataStream[Event] = env.addSource(…)
stream
.keyBy("id")
.timeWindow(Time.seconds(15), Time.seconds(5))
.sum("measure")
Event Time
26
case class Event(id: String, measure: Double, timestamp: Long)
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(EventTime)
val stream: DataStream[Event] = env.addSource(…)
val tsStream = stream.assignTimestampsAndWatermarks(
new MyTimestampsAndWatermarkGenerator())
tsStream
.keyBy("id")
.timeWindow(Time.seconds(15), Time.seconds(5))
.sum("measure")
Watermarks
27
7
W(11)W(17)
11159121417122220 171921
Watermark
Event
Event timestamp
Stream (in order)
7
W(11)W(20)
Watermark
991011141517
Event
Event timestamp
1820 192123
Stream (out of order)
Watermarks in Parallel
28
Source
(1)
Source
(2)
map
(1)
map
(2)
window
(1)
window
(2)
29
29
17
14
14
29
14
14
W(33)
W(17)
W(17)
A|30B|31
C|30
D|15
E|30
F|15G|18H|20
K|35
Watermark
Event Time
at the operator
Event
[id|timestamp]
Event Time
at input streams
Watermark
Generation
M|39N|39Q|44
L|22O|23R|37
Per Kafka Partition Watermarks
29
Source
(1)
Source
(2)
map
(1)
map
(2)
window
(1)
window
(2)
33
17
29
29
17
14
14
29
14
14
W(33)
W(17)
W(17)
A|30B|73
C|33
D|18
E|31
F|15G|91H|94
K|77
Watermark
Generation
L|35N|39
O|97 M|89
I|21Q|23
T|99 S|97
Matters of State
(Fault Tolerance, Reinstatements, etc)
30
Back to the Aggregation Example
31
case class Event(id: String, measure: Double, timestamp: Long)
val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream: DataStream[Event] = env.addSource(
new FlinkKafkaConsumer09(topic, schema, properties))
stream
.keyBy("id")
.timeWindow(Time.seconds(15), Time.seconds(5))
.sum("measure")
Stateful
Fault Tolerance
 Prevent data loss (reprocess lost in-flight events)
 Recover state consistency (exactly-once semantics)
• Pending windows & user-defined (key/value) state
 Checkpoint based fault tolerance
• Periodically create checkpoints
• Recovery: resume from last completed checkpoint
• Async. Barrier Snapshots (ABS) Algorithm
32
Checkpoints
33
data stream
event
newer records older records
State of the dataflow
at point Y
State of the dataflow
at point X
Checkpoint Barriers
 Markers, injected into the streams
34
Checkpoint Procedure
35
Checkpoint Procedure
36
Savepoints
 A "Checkpoint" is a globally consistent point-in-time snapshot
of the streaming program (point in stream, state)
 A "Savepoint" is a user-triggered retained checkpoint
 Streaming programs can start from a savepoint
37
Savepoint B Savepoint A
(Re)processing data (in batch)
 Re-processing data (what-if exploration, to correct bugs, etc.)
 Usually by running a batch job with a set of old files
 Tools that map files to times
38
2016-3-1
12:00 am
2016-3-1
1:00 am
2016-3-1
2:00 am
2016-3-11
11:00pm
2016-3-12
12:00am
2016-3-12
1:00am…
Collection of files, by ingestion time
2016-3-11
10:00pm
To the batch
processor
Unclear Batch Boundaries
39
2016-3-1
12:00 am
2016-3-1
1:00 am
2016-3-1
2:00 am
2016-3-11
11:00pm
2016-3-12
12:00am
2016-3-12
1:00am…
2016-3-11
10:00pm
To the batch
processor
?
?
What about sessions across batches?
(Re)processing data (streaming)
 Draw savepoints at times that you will want to start new jobs
from (daily, hourly, …)
 Reprocess by starting a new job from a savepoint
• Defines start position in stream (for example Kafka offsets)
• Initializes pending state (like partial sessions)
40
Savepoint
Run new streaming
program from savepoint
Continuous Data Sources
41
2016-3-1
12:00 am
2016-3-1
1:00 am
2016-3-1
2:00 am
2016-3-11
11:00pm
2016-3-12
12:00am
2016-3-12
1:00am
2016-3-11
10:00pm …
partition
partition
Savepoint
Savepoint
Stream of Kafka Partitions
Stream view over sequence of files
Kafka offsets +
Operator state
File mod timestamp +
File position +
Operator state
WIP (target: Flink 1.1)
Complex Event Processing Primer
Demo Time
42
Demo Scenario
Pattern validation & violation detection:
 Events should follow a certain pattern,
or an alert should be raised
 Think cybersecurity, process monitoring, etc
43
An Outlook on Things to Come
44
Flink in the wild
45
30 billion events daily 2 billion events in
10 1Gb machines
data integration & distribution
platform
See talks by at
Roadmap
 Dynamic Scaling, Resource Elasticity
 Stream SQL
 CEP enhancements
 Incremental & asynchronous state snapshotting
 Mesos support
 More connectors, end-to-end exactly once
 API enhancements (e.g., joins, slowly changing inputs)
 Security (data encryption, Kerberos with Kafka)
46
47
Apache Flink Meetup - Thursday, April, 28th
Flink Forward 2016, Berlin
Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016
www.flink-forward.org
We are hiring!
data-artisans.com/careers
50
I stream, do you?
Ad

Recommended

Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Databricks
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
Databricks
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
Jean-Paul Azar
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
confluent
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
StreamNative
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
confluent
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
Databricks
 
Introduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Kai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
Databricks
 
Apache Airflow
Apache Airflow
Sumit Maheshwari
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Databricks
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
confluent
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
Data Con LA
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Databricks
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
NAVER D2
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
Stephan Ewen
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Ververica
 

More Related Content

What's hot (20)

Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
Databricks
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
Jean-Paul Azar
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
confluent
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
StreamNative
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
confluent
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
Databricks
 
Introduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Kai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
Databricks
 
Apache Airflow
Apache Airflow
Sumit Maheshwari
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Databricks
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
confluent
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
Data Con LA
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Databricks
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
NAVER D2
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
Databricks
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
Jean-Paul Azar
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
confluent
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
StreamNative
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
confluent
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
Databricks
 
Introduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Kai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
Databricks
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Databricks
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
confluent
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
Data Con LA
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Databricks
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
NAVER D2
 

Similar to Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen (20)

Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
Stephan Ewen
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Ververica
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
Ververica
 
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Evention
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Apache Flink Taiwan User Group
 
Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.
Konstantinos Kloudas
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
DataWorks Summit/Hadoop Summit
 
Zurich Flink Meetup
Zurich Flink Meetup
Konstantinos Kloudas
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
Robert Metzger
 
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
ucelebi
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache Flink
Robert Metzger
 
Apache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya Meetup
Robert Metzger
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger
 
Stream Processing with Apache Flink
Stream Processing with Apache Flink
C4Media
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
DataWorks Summit/Hadoop Summit
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
Kostas Tzoumas
 
Flink Streaming @BudapestData
Flink Streaming @BudapestData
Gyula Fóra
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
Apache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink Meetup
Stephan Ewen
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
Kostas Tzoumas
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
Stephan Ewen
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Ververica
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
Ververica
 
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Evention
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Apache Flink Taiwan User Group
 
Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.
Konstantinos Kloudas
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
Robert Metzger
 
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
Unified Stream & Batch Processing with Apache Flink (Hadoop Summit Dublin 2016)
ucelebi
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache Flink
Robert Metzger
 
Apache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya Meetup
Robert Metzger
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger
 
Stream Processing with Apache Flink
Stream Processing with Apache Flink
C4Media
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
Kostas Tzoumas
 
Flink Streaming @BudapestData
Flink Streaming @BudapestData
Gyula Fóra
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
Apache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink Meetup
Stephan Ewen
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
Kostas Tzoumas
 
Ad

More from confluent (20)

Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
confluent
 
Webinar Think Right - Shift Left - 19-03-2025.pptx
Webinar Think Right - Shift Left - 19-03-2025.pptx
confluent
 
Migration, backup and restore made easy using Kannika
Migration, backup and restore made easy using Kannika
confluent
 
Five Things You Need to Know About Data Streaming in 2025
Five Things You Need to Know About Data Streaming in 2025
confluent
 
Data in Motion Tour Seoul 2024 - Keynote
Data in Motion Tour Seoul 2024 - Keynote
confluent
 
Data in Motion Tour Seoul 2024 - Roadmap Demo
Data in Motion Tour Seoul 2024 - Roadmap Demo
confluent
 
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
confluent
 
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
confluent
 
Data in Motion Tour 2024 Riyadh, Saudi Arabia
Data in Motion Tour 2024 Riyadh, Saudi Arabia
confluent
 
Build a Real-Time Decision Support Application for Financial Market Traders w...
Build a Real-Time Decision Support Application for Financial Market Traders w...
confluent
 
Strumenti e Strategie di Stream Governance con Confluent Platform
Strumenti e Strategie di Stream Governance con Confluent Platform
confluent
 
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
confluent
 
Building Real-Time Gen AI Applications with SingleStore and Confluent
Building Real-Time Gen AI Applications with SingleStore and Confluent
confluent
 
Unlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by Confluent
confluent
 
Il Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazione
confluent
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud Connectors
confluent
 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
confluent
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
confluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
confluent
 
Webinar Think Right - Shift Left - 19-03-2025.pptx
Webinar Think Right - Shift Left - 19-03-2025.pptx
confluent
 
Migration, backup and restore made easy using Kannika
Migration, backup and restore made easy using Kannika
confluent
 
Five Things You Need to Know About Data Streaming in 2025
Five Things You Need to Know About Data Streaming in 2025
confluent
 
Data in Motion Tour Seoul 2024 - Keynote
Data in Motion Tour Seoul 2024 - Keynote
confluent
 
Data in Motion Tour Seoul 2024 - Roadmap Demo
Data in Motion Tour Seoul 2024 - Roadmap Demo
confluent
 
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
confluent
 
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
confluent
 
Data in Motion Tour 2024 Riyadh, Saudi Arabia
Data in Motion Tour 2024 Riyadh, Saudi Arabia
confluent
 
Build a Real-Time Decision Support Application for Financial Market Traders w...
Build a Real-Time Decision Support Application for Financial Market Traders w...
confluent
 
Strumenti e Strategie di Stream Governance con Confluent Platform
Strumenti e Strategie di Stream Governance con Confluent Platform
confluent
 
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
confluent
 
Building Real-Time Gen AI Applications with SingleStore and Confluent
Building Real-Time Gen AI Applications with SingleStore and Confluent
confluent
 
Unlocking value with event-driven architecture by Confluent
Unlocking value with event-driven architecture by Confluent
confluent
 
Il Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazione
confluent
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud Connectors
confluent
 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
confluent
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
confluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 
Ad

Recently uploaded (20)

Complete guidance book of Asp.Net Web API
Complete guidance book of Asp.Net Web API
Shabista Imam
 
Deep Learning for Natural Language Processing_FDP on 16 June 2025 MITS.pptx
Deep Learning for Natural Language Processing_FDP on 16 June 2025 MITS.pptx
resming1
 
AI_Presentation (1). Artificial intelligence
AI_Presentation (1). Artificial intelligence
RoselynKaur8thD34
 
Introduction to Natural Language Processing - Stages in NLP Pipeline, Challen...
Introduction to Natural Language Processing - Stages in NLP Pipeline, Challen...
resming1
 
How to Un-Obsolete Your Legacy Keypad Design
How to Un-Obsolete Your Legacy Keypad Design
Epec Engineered Technologies
 
Data Structures Module 3 Binary Trees Binary Search Trees Tree Traversals AVL...
Data Structures Module 3 Binary Trees Binary Search Trees Tree Traversals AVL...
resming1
 
Rapid Prototyping for XR: Lecture 1 Introduction to Prototyping
Rapid Prototyping for XR: Lecture 1 Introduction to Prototyping
Mark Billinghurst
 
Generative AI & Scientific Research : Catalyst for Innovation, Ethics & Impact
Generative AI & Scientific Research : Catalyst for Innovation, Ethics & Impact
AlqualsaDIResearchGr
 
Microwatt: Open Tiny Core, Big Possibilities
Microwatt: Open Tiny Core, Big Possibilities
IBM
 
دراسة حاله لقرية تقع في جنوب غرب السودان
دراسة حاله لقرية تقع في جنوب غرب السودان
محمد قصص فتوتة
 
Unit III_One Dimensional Consolidation theory
Unit III_One Dimensional Consolidation theory
saravananr808639
 
CST413 KTU S7 CSE Machine Learning Clustering K Means Hierarchical Agglomerat...
CST413 KTU S7 CSE Machine Learning Clustering K Means Hierarchical Agglomerat...
resming1
 
تقرير عن التحليل الديناميكي لتدفق الهواء حول جناح.pdf
تقرير عن التحليل الديناميكي لتدفق الهواء حول جناح.pdf
محمد قصص فتوتة
 
Industry 4.o the fourth revolutionWeek-2.pptx
Industry 4.o the fourth revolutionWeek-2.pptx
KNaveenKumarECE
 
Solar thermal – Flat plate and concentrating collectors .pptx
Solar thermal – Flat plate and concentrating collectors .pptx
jdaniabraham1
 
System design handwritten notes guidance
System design handwritten notes guidance
Shabista Imam
 
Introduction to sensing and Week-1.pptx
Introduction to sensing and Week-1.pptx
KNaveenKumarECE
 
Structural Wonderers_new and ancient.pptx
Structural Wonderers_new and ancient.pptx
nikopapa113
 
Rapid Prototyping for XR: Lecture 5 - Cross Platform Development
Rapid Prototyping for XR: Lecture 5 - Cross Platform Development
Mark Billinghurst
 
May 2025: Top 10 Read Articles in Data Mining & Knowledge Management Process
May 2025: Top 10 Read Articles in Data Mining & Knowledge Management Process
IJDKP
 
Complete guidance book of Asp.Net Web API
Complete guidance book of Asp.Net Web API
Shabista Imam
 
Deep Learning for Natural Language Processing_FDP on 16 June 2025 MITS.pptx
Deep Learning for Natural Language Processing_FDP on 16 June 2025 MITS.pptx
resming1
 
AI_Presentation (1). Artificial intelligence
AI_Presentation (1). Artificial intelligence
RoselynKaur8thD34
 
Introduction to Natural Language Processing - Stages in NLP Pipeline, Challen...
Introduction to Natural Language Processing - Stages in NLP Pipeline, Challen...
resming1
 
Data Structures Module 3 Binary Trees Binary Search Trees Tree Traversals AVL...
Data Structures Module 3 Binary Trees Binary Search Trees Tree Traversals AVL...
resming1
 
Rapid Prototyping for XR: Lecture 1 Introduction to Prototyping
Rapid Prototyping for XR: Lecture 1 Introduction to Prototyping
Mark Billinghurst
 
Generative AI & Scientific Research : Catalyst for Innovation, Ethics & Impact
Generative AI & Scientific Research : Catalyst for Innovation, Ethics & Impact
AlqualsaDIResearchGr
 
Microwatt: Open Tiny Core, Big Possibilities
Microwatt: Open Tiny Core, Big Possibilities
IBM
 
دراسة حاله لقرية تقع في جنوب غرب السودان
دراسة حاله لقرية تقع في جنوب غرب السودان
محمد قصص فتوتة
 
Unit III_One Dimensional Consolidation theory
Unit III_One Dimensional Consolidation theory
saravananr808639
 
CST413 KTU S7 CSE Machine Learning Clustering K Means Hierarchical Agglomerat...
CST413 KTU S7 CSE Machine Learning Clustering K Means Hierarchical Agglomerat...
resming1
 
تقرير عن التحليل الديناميكي لتدفق الهواء حول جناح.pdf
تقرير عن التحليل الديناميكي لتدفق الهواء حول جناح.pdf
محمد قصص فتوتة
 
Industry 4.o the fourth revolutionWeek-2.pptx
Industry 4.o the fourth revolutionWeek-2.pptx
KNaveenKumarECE
 
Solar thermal – Flat plate and concentrating collectors .pptx
Solar thermal – Flat plate and concentrating collectors .pptx
jdaniabraham1
 
System design handwritten notes guidance
System design handwritten notes guidance
Shabista Imam
 
Introduction to sensing and Week-1.pptx
Introduction to sensing and Week-1.pptx
KNaveenKumarECE
 
Structural Wonderers_new and ancient.pptx
Structural Wonderers_new and ancient.pptx
nikopapa113
 
Rapid Prototyping for XR: Lecture 5 - Cross Platform Development
Rapid Prototyping for XR: Lecture 5 - Cross Platform Development
Mark Billinghurst
 
May 2025: Top 10 Read Articles in Data Mining & Knowledge Management Process
May 2025: Top 10 Read Articles in Data Mining & Knowledge Management Process
IJDKP
 

Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen

  • 2. Apache Flink Stack 2 DataStream API Stream Processing DataSet API Batch Processing Runtime Distributed Streaming Data Flow LibrariesApache Beam Streaming and batch as first class citizens.
  • 3. Today 3 Streaming and batch as first class citizens. DataStream API Stream Processing DataSet API Batch Processing Runtime Distributed Streaming Data Flow LibrariesApache Beam
  • 4. 4 Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
  • 5. Continuous Processing with Batch  Continuous ingestion  Periodic (e.g., hourly) files  Periodic batch jobs 5
  • 6. λ Architecture  "Batch layer": what we had before  "Stream layer": approximate early results 6
  • 7. A Stream Processing Pipeline 7 collect log analyze serve & store
  • 8. Programs and Dataflows 8 Source Transformation Transformation Sink val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09(…)) val events: DataStream[Event] = lines.map((line) => parse(line)) val stats: DataStream[Statistic] = stream .keyBy("sensor") .timeWindow(Time.seconds(5)) .sum(new MyAggregationFunction()) stats.addSink(new RollingSink(path)) Source [1] map() [1] keyBy()/ window()/ apply() [1] Sink [1] Source [2] map() [2] keyBy()/ window()/ apply() [2] Streaming Dataflow
  • 9. Why does Flink stream flink? 9 Low latency High Throughput Well-behaved flow control (back pressure) Make more sense of data Works on real-time and historic data True Streaming Event Time APIs Libraries Stateful Streaming Globally consistent savepoints Exactly-once semantics for fault tolerance Windows & user-defined state Flexible windows (time, count, session, roll-your own) Complex Event Processing
  • 11. Time-Windowed Aggregations 11 case class Event(sensor: String, measure: Double) val env = StreamExecutionEnvironment.getExecutionEnvironment val stream: DataStream[Event] = env.addSource(…) stream .keyBy("sensor") .timeWindow(Time.seconds(5)) .sum("measure")
  • 12. Time-Windowed Aggregations 12 case class Event(sensor: String, measure: Double) val env = StreamExecutionEnvironment.getExecutionEnvironment val stream: DataStream[Event] = env.addSource(…) stream .keyBy("sensor") .timeWindow(Time.seconds(60), Time.seconds(5)) .sum("measure")
  • 13. Session-Windowed Aggregations 13 case class Event(sensor: String, measure: Double) val env = StreamExecutionEnvironment.getExecutionEnvironment val stream: DataStream[Event] = env.addSource(…) stream .keyBy("sensor") .window(EventTimeSessionWindows.withGap(Time.seconds(60))) .max("measure")
  • 14. Pattern Detection 14 case class Event(producer: String, evtType: Int, msg: String) case class Alert(msg: String) val stream: DataStream[Event] = env.addSource(…) stream .keyBy("producer") .flatMap(new RichFlatMapFuncion[Event, Alert]() { lazy val state: ValueState[Int] = getRuntimeContext.getState(…) def flatMap(event: Event, out: Collector[Alert]) = { val newState = state.value() match { case 0 if (event.evtType == 0) => 1 case 1 if (event.evtType == 1) => 0 case x => out.collect(Alert(event.msg, x)); 0 } state.update(newState) } })
  • 15. Pattern Detection 15 case class Event(producer: String, evtType: Int, msg: String) case class Alert(msg: String) val stream: DataStream[Event] = env.addSource(…) stream .keyBy("producer") .flatMap(new RichFlatMapFuncion[Event, Alert]() { lazy val state: ValueState[Int] = getRuntimeContext.getState(…) def flatMap(event: Event, out: Collector[Alert]) = { val newState = state.value() match { case 0 if (event.evtType == 0) => 1 case 1 if (event.evtType == 1) => 0 case x => out.collect(Alert(event.msg, x)); 0 } state.update(newState) } }) Embedded key/value state store
  • 16. Many more  Joining streams (e.g. combine readings from sensor)  Detecting Patterns (CEP)  Applying (changing) rules or models to events  Training and applying online machine learning models  … 16
  • 18. 18 The biggest change in moving from batch to streaming is handling time explicitly
  • 19. Example: Windowing by Time 19 case class Event(id: String, measure: Double, timestamp: Long) val env = StreamExecutionEnvironment.getExecutionEnvironment val stream: DataStream[Event] = env.addSource(…) stream .keyBy("id") .timeWindow(Time.seconds(15), Time.seconds(5)) .sum("measure")
  • 20. Different Notions of Time 20 Event Producer Message Queue Flink Data Source Flink Window Operator partition 1 partition 2 Event Time Ingestion Time Window Processing Time
  • 21. Time and the Dataflow Model  Event Time semantics in Flink follow the Dataflow model (Apache Beam (incub.))  See previous talk by Frances Perry & Tyler Akidau  For the sake of time (no pun intended) I, only briefly recapitulate on the basic concept 21
  • 22. 1977 1980 1983 1999 2002 2005 2015 Processing Time Episode IV Episode V Episode VI Episode I Episode II Episode III Episode VII Event Time Event Time vs. Processing Time 22
  • 23. Processing Time 23 case class Event(id: String, measure: Double, timestamp: Long) val env = StreamExecutionEnvironment.getExecutionEnvironment env.setStreamTimeCharacteristic(ProcessingTime) val stream: DataStream[Event] = env.addSource(…) stream .keyBy("id") .timeWindow(Time.seconds(15), Time.seconds(5)) .sum("measure")
  • 24. Ingestion Time 24 case class Event(id: String, measure: Double, timestamp: Long) val env = StreamExecutionEnvironment.getExecutionEnvironment env.setStreamTimeCharacteristic(IngestionTime) val stream: DataStream[Event] = env.addSource(…) stream .keyBy("id") .timeWindow(Time.seconds(15), Time.seconds(5)) .sum("measure")
  • 25. Event Time 25 case class Event(id: String, measure: Double, timestamp: Long) val env = StreamExecutionEnvironment.getExecutionEnvironment env.setStreamTimeCharacteristic(EventTime) val stream: DataStream[Event] = env.addSource(…) stream .keyBy("id") .timeWindow(Time.seconds(15), Time.seconds(5)) .sum("measure")
  • 26. Event Time 26 case class Event(id: String, measure: Double, timestamp: Long) val env = StreamExecutionEnvironment.getExecutionEnvironment env.setStreamTimeCharacteristic(EventTime) val stream: DataStream[Event] = env.addSource(…) val tsStream = stream.assignTimestampsAndWatermarks( new MyTimestampsAndWatermarkGenerator()) tsStream .keyBy("id") .timeWindow(Time.seconds(15), Time.seconds(5)) .sum("measure")
  • 27. Watermarks 27 7 W(11)W(17) 11159121417122220 171921 Watermark Event Event timestamp Stream (in order) 7 W(11)W(20) Watermark 991011141517 Event Event timestamp 1820 192123 Stream (out of order)
  • 29. Per Kafka Partition Watermarks 29 Source (1) Source (2) map (1) map (2) window (1) window (2) 33 17 29 29 17 14 14 29 14 14 W(33) W(17) W(17) A|30B|73 C|33 D|18 E|31 F|15G|91H|94 K|77 Watermark Generation L|35N|39 O|97 M|89 I|21Q|23 T|99 S|97
  • 30. Matters of State (Fault Tolerance, Reinstatements, etc) 30
  • 31. Back to the Aggregation Example 31 case class Event(id: String, measure: Double, timestamp: Long) val env = StreamExecutionEnvironment.getExecutionEnvironment val stream: DataStream[Event] = env.addSource( new FlinkKafkaConsumer09(topic, schema, properties)) stream .keyBy("id") .timeWindow(Time.seconds(15), Time.seconds(5)) .sum("measure") Stateful
  • 32. Fault Tolerance  Prevent data loss (reprocess lost in-flight events)  Recover state consistency (exactly-once semantics) • Pending windows & user-defined (key/value) state  Checkpoint based fault tolerance • Periodically create checkpoints • Recovery: resume from last completed checkpoint • Async. Barrier Snapshots (ABS) Algorithm 32
  • 33. Checkpoints 33 data stream event newer records older records State of the dataflow at point Y State of the dataflow at point X
  • 34. Checkpoint Barriers  Markers, injected into the streams 34
  • 37. Savepoints  A "Checkpoint" is a globally consistent point-in-time snapshot of the streaming program (point in stream, state)  A "Savepoint" is a user-triggered retained checkpoint  Streaming programs can start from a savepoint 37 Savepoint B Savepoint A
  • 38. (Re)processing data (in batch)  Re-processing data (what-if exploration, to correct bugs, etc.)  Usually by running a batch job with a set of old files  Tools that map files to times 38 2016-3-1 12:00 am 2016-3-1 1:00 am 2016-3-1 2:00 am 2016-3-11 11:00pm 2016-3-12 12:00am 2016-3-12 1:00am… Collection of files, by ingestion time 2016-3-11 10:00pm To the batch processor
  • 39. Unclear Batch Boundaries 39 2016-3-1 12:00 am 2016-3-1 1:00 am 2016-3-1 2:00 am 2016-3-11 11:00pm 2016-3-12 12:00am 2016-3-12 1:00am… 2016-3-11 10:00pm To the batch processor ? ? What about sessions across batches?
  • 40. (Re)processing data (streaming)  Draw savepoints at times that you will want to start new jobs from (daily, hourly, …)  Reprocess by starting a new job from a savepoint • Defines start position in stream (for example Kafka offsets) • Initializes pending state (like partial sessions) 40 Savepoint Run new streaming program from savepoint
  • 41. Continuous Data Sources 41 2016-3-1 12:00 am 2016-3-1 1:00 am 2016-3-1 2:00 am 2016-3-11 11:00pm 2016-3-12 12:00am 2016-3-12 1:00am 2016-3-11 10:00pm … partition partition Savepoint Savepoint Stream of Kafka Partitions Stream view over sequence of files Kafka offsets + Operator state File mod timestamp + File position + Operator state WIP (target: Flink 1.1)
  • 42. Complex Event Processing Primer Demo Time 42
  • 43. Demo Scenario Pattern validation & violation detection:  Events should follow a certain pattern, or an alert should be raised  Think cybersecurity, process monitoring, etc 43
  • 44. An Outlook on Things to Come 44
  • 45. Flink in the wild 45 30 billion events daily 2 billion events in 10 1Gb machines data integration & distribution platform See talks by at
  • 46. Roadmap  Dynamic Scaling, Resource Elasticity  Stream SQL  CEP enhancements  Incremental & asynchronous state snapshotting  Mesos support  More connectors, end-to-end exactly once  API enhancements (e.g., joins, slowly changing inputs)  Security (data encryption, Kerberos with Kafka) 46
  • 47. 47 Apache Flink Meetup - Thursday, April, 28th
  • 48. Flink Forward 2016, Berlin Submission deadline: June 30, 2016 Early bird deadline: July 15, 2016 www.flink-forward.org