Stateful distributed stream
processing
Gyula Fóra
gyfora@apache.org
@GyulaFora
This talk
§ Stateful processing by example
§ Definition and challenges
§ State in current open-source systems
§ State in Apache Flink
§ Closing
Apache Flink Meetup @ MapR, 2015-08-27
Stateful processing by example
§ Window aggregations
• Total number of customers
in the last 10 minutes
• State: Current aggregate
§ Machine learning
• Fitting trends to the evolving
stream
• State: Model
Stateful processing by example
§ Pattern recognition
• Detect suspicious financial
activity
• State: Matched prefix
§ Stream-stream joins
• Match ad views and
impressions
• State: Elements in the window
Stateful operators
§ All these examples use a common processing
pattern
§ Stateful operator (in essence):
$f\colon (\mathit{in}, \mathit{state}) \longrightarrow (\mathit{out}, \mathit{state}')$
§ State hangs around and can be read and
modified as the stream evolves
§ Goal: Get as close as possible to this general form while
maintaining scalability and fault-tolerance
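The operator form on this slide can be sketched in plain Java, independent of any streaming system. This is only an illustration: the names `StatefulOperatorSketch`, `Result`, `run` and `runningSum` are assumptions made for the example and do not belong to any of the systems discussed in the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch: a stateful operator as a function (in, state) -> (out, state').
final class StatefulOperatorSketch {

    // Pair of the emitted output and the updated state.
    static final class Result<O, S> {
        final O out;
        final S state;
        Result(O out, S state) { this.out = out; this.state = state; }
    }

    // Applies f to every input in order, threading the state through the stream.
    static <I, O, S> List<O> run(List<I> inputs, S initial,
                                 BiFunction<I, S, Result<O, S>> f) {
        List<O> outputs = new ArrayList<>();
        S state = initial;
        for (I in : inputs) {
            Result<O, S> r = f.apply(in, state);
            outputs.add(r.out);
            state = r.state; // the state "hangs around" for the next element
        }
        return outputs;
    }

    // Example operator: emits the running sum; the state is the sum so far.
    static List<Integer> runningSum(List<Integer> inputs) {
        return StatefulOperatorSketch.<Integer, Integer, Integer>run(
                inputs, 0,
                (in, sum) -> new Result<Integer, Integer>(sum + in, sum + in));
    }
}
```

For the inputs 1, 2, 3 the sketch emits 1, 3, 6. Everything here runs in one thread; making the same pattern scale out and survive failures is the hard part.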
State-of-the-art systems
§ Most systems allow developers to
implement stateful programs
§ Trick is to limit the scope of 𝒇 (state access)
while maintaining expressivity
§ Issues to tackle:
• Expressivity
• Exactly-once semantics
• Scalability to large inputs
• Scalability to large states
Storm (Trident)
§ States available only in the Trident API
§ Dedicated operators for state updates and
queries
§ State access methods
• stateQuery(…)
• partitionPersist(…)
• persistentAggregate(…)
§ It’s very difficult to
implement transactional
states
Exactly-once guarantee
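To show why transactional states take effort, here is a minimal sketch of the idea behind them: every value is stored together with the id of the last batch that updated it, so a replayed batch can be recognised and skipped. The class `TransactionalCountStore` and its methods are hypothetical and are not Trident's API. The sketch also assumes that batches are applied in strict order and that a replayed batch carries the same contents as the original.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a transactional counter store (not Trident's API).
// Each count is stored with the id (txid) of the last batch that updated it.
final class TransactionalCountStore {

    private static final class Versioned {
        final long txid;
        final long count;
        Versioned(long txid, long count) { this.txid = txid; this.count = count; }
    }

    private final Map<String, Versioned> store = new HashMap<>();

    // Applies one batch's partial count for a key.
    // Assumption: batches arrive in txid order and a replay has identical content.
    void applyBatch(long txid, String key, long delta) {
        Versioned current = store.get(key);
        if (current != null && current.txid == txid) {
            return; // same batch replayed after a failure: already applied
        }
        long base = (current == null) ? 0L : current.count;
        store.put(key, new Versioned(txid, base + delta));
    }

    long get(String key) {
        Versioned v = store.get(key);
        return (v == null) ? 0L : v.count;
    }
}
```

Applying batch 1 twice and then batch 2 counts each batch once. The user has to keep this bookkeeping correct for every state type, which is where the difficulty comes from.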
Storm Word Count
Spark Streaming
§ Stateless runtime by design
• No continuous operators
• UDFs are assumed to be stateless
§ State can be generated as a stream of
RDDs: updateStateByKey(…)
$f\colon (\mathrm{Seq}[\mathit{in}_k], \mathit{state}_k) \longrightarrow \mathit{state}'_k$
§ 𝒇 is scoped to a specific key
§ Exactly-once semantics
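The key-scoped update above can be mimicked in plain Java to make the micro-batch model concrete. This is a sketch under assumptions, not Spark code: `MicroBatchStateSketch`, `updateStateByKey` (reused here only as a name) and `countWords` are illustrative. Each batch produces a new state map and leaves the previous one untouched, and the update function is called for every key that has old state or new input.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical sketch of a per-key state update over micro-batches (not Spark's API).
final class MicroBatchStateSketch {

    // Builds the next state from the previous state and one micro-batch.
    static <K, V, S> Map<K, S> updateStateByKey(
            Map<K, S> previous,
            List<Map.Entry<K, V>> batch,
            BiFunction<List<V>, Optional<S>, Optional<S>> f) {
        // Group this batch's values by key: Seq[in_k].
        Map<K, List<V>> grouped = new HashMap<>();
        for (Map.Entry<K, V> e : batch) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Visit every key that has old state or new input.
        Set<K> keys = new HashSet<>(previous.keySet());
        keys.addAll(grouped.keySet());
        Map<K, S> next = new HashMap<>();
        for (K key : keys) {
            List<V> values = grouped.getOrDefault(key, Collections.emptyList());
            f.apply(values, Optional.ofNullable(previous.get(key)))
             .ifPresent(s -> next.put(key, s));
        }
        return next; // 'previous' is never modified
    }

    // Word count expressed with the update function above.
    static Map<String, Integer> countWords(Map<String, Integer> previous, List<String> words) {
        List<Map.Entry<String, Integer>> batch = new ArrayList<>();
        for (String w : words) {
            batch.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return updateStateByKey(previous, batch, (values, state) -> {
            int sum = 0;
            for (int v : values) sum += v;
            return Optional.of(state.orElse(0) + sum);
        });
    }
}
```

Because the whole state is regenerated per batch, the cost of an update grows with the number of keys rather than with the size of the batch.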
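Spark Streaming Word Count
// Per-key update: add this batch's counts to the previous total
val updateFunc = (values: Seq[Int], state: Option[Int]) => {
  val currentCount = values.sum
  val previousCount = state.getOrElse(0)
  Some(currentCount + previousCount)
}
// Adapter to the iterator-based overload; not shown on the slide,
// written here as in Spark's StatefulNetworkWordCount example
val newUpdateFunc = (iterator: Iterator[(String, Seq[Int], Option[Int])]) =>
  iterator.flatMap(t => updateFunc(t._2, t._3).map(s => (t._1, s)))
val stateDstream = wordDstream.updateStateByKey[Int](
  newUpdateFunc,
  new HashPartitioner(ssc.sparkContext.defaultParallelism),
  true, // rememberPartitioner
  initialRDD)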
Samza
§ Stateful dataflow operators
(Any task can hold state)
§ State changes are stored
as a changelog in Kafka
§ Custom storage engines can
be plugged in to the log
§ 𝒇 is scoped to a specific task
§ At-least-once processing
semantics
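A minimal sketch of log-based state, assuming an in-memory list as a stand-in for a durable, replayable log: every write goes to the local store and to the changelog, and a restarted task rebuilds its store by replaying the log. `ChangelogStoreSketch` and `Change` are hypothetical names, not Samza classes, and the sketch does not model replay of input messages.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a changelog-backed key-value store (not Samza's API).
final class ChangelogStoreSketch {

    // One logged state change.
    static final class Change {
        final String key;
        final int value;
        Change(String key, int value) { this.key = key; this.value = value; }
    }

    private final Map<String, Integer> local = new HashMap<>();
    private final List<Change> changelog; // stand-in for a durable log

    // Creating a store on an existing log restores the state by replaying it.
    ChangelogStoreSketch(List<Change> changelog) {
        this.changelog = changelog;
        for (Change c : changelog) {
            local.put(c.key, c.value); // later entries overwrite earlier ones
        }
    }

    Integer get(String key) {
        return local.get(key);
    }

    void put(String key, int value) {
        local.put(key, value);
        changelog.add(new Change(key, value)); // log the change for recovery
    }
}
```

A second store created on the same log sees the latest value of every key. The log only captures state changes; it says nothing about which inputs were already processed.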
Samza Word Count
public class WordCounter implements StreamTask, InitableTask {
    // Some omitted details…
    private KeyValueStore<String, Integer> store;

    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        // Get the current count
        String word = (String) envelope.getKey();
        Integer count = store.get(word);
        if (count == null) count = 0;
        // Increment, store and send
        count += 1;
        store.put(word, count);
        collector.send(
            new OutgoingMessageEnvelope(OUTPUT_STREAM, word, count));
    }
}
What can we say so far?
§ Trident
+ Consistent state accessible from outside
– Only works well with idempotent states
– States are not part of the operators
§ Spark
+ Integrates well with the system guarantees
– Limited expressivity
– Immutability increases update complexity
§ Samza
+ Efficient log-based state updates
+ States are well integrated with the operators
– Lack of exactly-once semantics
– State access is not fully transparent
Flink
§ Take what’s good, make it work + add
some more
§ Clean and powerful abstractions
• Local (Task) state
• Partitioned (Key) state
§ Proper API integration
• Java: OperatorState interface
• Scala: mapWithState, flatMapWithState…
§ Exactly-once semantics by checkpointing
Flink Word Count
words.keyBy(x => x).mapWithState {
  (word, count: Option[Int]) => {
    val newCount = count.getOrElse(0) + 1
    val output = (word, newCount)
    (output, Some(newCount))
  }
}
Local State
§ Task scoped state access
§ Can be used to implement
custom access patterns
§ Typical usage:
• Source operators (offset)
• Machine learning models
• Use cyclic flows to simulate
global state access
Local State Example (Java)
public class MySource extends RichParallelSourceFunction {
    // Omitted details
    private OperatorState<Long> offset;

    @Override
    public void run(SourceContext ctx) {
        Object checkpointLock = ctx.getCheckpointLock();
        isRunning = true;
        while (isRunning) {
            // Update state and emit under the checkpoint lock
            synchronized (checkpointLock) {
                offset.update(offset.value() + 1);
                // ctx.collect(next);
            }
        }
    }
}
Partitioned State
§ Key scoped state access
§ Highly scalable
§ Allows for incremental
backup/restore
§ Typical usage:
• Any per-key operation
• Grouped aggregations
• Window buffers
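Key-scoped access is what makes this kind of state easy to scale: each key belongs to exactly one partition, so partitions never need to coordinate. A minimal sketch, assuming simple hash-modulo routing (an assumption for illustration, not a description of Flink's internals); `PartitionedStateSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of key-partitioned state: one independent map per partition.
final class PartitionedStateSketch {

    private final List<Map<String, Long>> partitions = new ArrayList<>();

    PartitionedStateSketch(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Assumption: hash-modulo routing; the same key always maps to the same partition.
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Per-key operation: only the owning partition's state is touched.
    long increment(String key) {
        Map<String, Long> state = partitions.get(partitionOf(key));
        return state.merge(key, 1L, Long::sum);
    }

    int keysInPartition(int partition) {
        return partitions.get(partition).size();
    }
}
```

Since every partition holds a disjoint slice of the keys, each slice can also be backed up and restored on its own.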
Partitioned State Example (Scala)
// Compute the current average of each city's temperature
temps.keyBy("city").mapWithState {
  (in: Temp, state: Option[(Double, Long)]) => {
    // State per city: (sum of temperatures, number of readings)
    val current = state.getOrElse((0.0, 0L))
    val updated = (current._1 + in.temp, current._2 + 1)
    val avg = Temp(in.city, updated._1 / updated._2)
    (avg, Some(updated))
  }
}

case class Temp(city: String, temp: Double)
Exactly-once semantics
§ Based on consistent global snapshots
§ Algorithm designed for stateful dataflows
Detailed mechanism
Exactly-once semantics
§ Low runtime overhead
§ Checkpointing logic is separated from
application logic
Blogpost on streaming fault-tolerance
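The core idea of snapshot-based exactly-once state can be shown in a single-process sketch: the input position and the operator state are captured together, so restoring both and replaying from the saved position neither loses nor double-counts an element. This is a simplification under assumptions, not Flink's distributed snapshot algorithm; `CheckpointSketch`, `Snapshot` and the method names are hypothetical, and the input is assumed to be replayable.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process sketch of checkpoint and restore for a word counter.
final class CheckpointSketch {

    // A consistent snapshot: source offset and operator state at the same point.
    static final class Snapshot {
        final int offset;
        final Map<String, Integer> counts;
        Snapshot(int offset, Map<String, Integer> counts) {
            this.offset = offset;
            this.counts = new HashMap<>(counts); // copy, so later updates do not leak in
        }
    }

    private int offset = 0;
    private Map<String, Integer> counts = new HashMap<>();

    // Processes up to n further elements of a replayable input.
    void process(List<String> input, int n) {
        int end = Math.min(input.size(), offset + n);
        while (offset < end) {
            counts.merge(input.get(offset), 1, Integer::sum);
            offset++;
        }
    }

    Snapshot checkpoint() {
        return new Snapshot(offset, counts);
    }

    // Recovery: roll back both the state and the read position.
    void restore(Snapshot snapshot) {
        offset = snapshot.offset;
        counts = new HashMap<>(snapshot.counts);
    }

    Map<String, Integer> counts() {
        return counts;
    }
}
```

Work done after the last checkpoint is discarded on restore and then redone from the saved offset, so the final counts match a failure-free run. The application logic in `process` contains no checkpointing code.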
Summary
§ State is essential to many applications
§ Fault-tolerant streaming state is a hard
problem
§ There is a trade-off between expressivity and
scalability/fault-tolerance
§ Flink tries to hit the sweet spot by…
• Providing very flexible abstractions
• Keeping good scalability and exactly-once
semantics
Thank you!





