Till Rohrmann
trohrmann@apache.org
@stsffap
Unifying Stream SQL and CEP for Declarative Stream Processing with Apache Flink
2
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Streams are Everywhere
▪ Most data is continuously produced as a stream
▪ Processing data as it arrives is becoming very popular
▪ Many diverse applications and use cases
3
Batch Analytics
4
▪ The batch approach to data analytics
Streaming Analytics
▪ Online aggregation of streams
  • No delay: continuous results
▪ Stream analytics subsumes batch analytics
  • Batch is a finite stream
▪ Demanding requirements on the stream processor
  • High throughput
  • Exactly-once semantics & event-time support
  • Advanced window support
5
Complex Event Processing
▪ Analyzing a stream of events and drawing conclusions
  • Detect patterns and assemble new events
▪ Applications
  • Network intrusion detection
  • Process monitoring
  • Algorithmic trading
▪ Demanding requirements on the stream processor
  • Low latency!
  • Exactly-once semantics & event-time support
6
Apache Flink®
▪ Platform for scalable stream processing
▪ Meets the requirements of CEP and stream analytics
  • Low latency and high throughput
  • Exactly-once semantics
  • Event-time support
  • Advanced windowing
▪ Core DataStream API available for Java & Scala
7
Tracking an Order Process
Use Case
8
Order Process
9
Order Events
▪ The process is reflected in a stream of order events
▪ Order(orderId, tStamp, "received")
▪ Shipment(orderId, tStamp, "shipped")
▪ Delivery(orderId, tStamp, "delivered")
▪ orderId: identifies the order
▪ tStamp: time at which the event happened
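The three event types can be modelled in Scala roughly as follows. This is a minimal sketch, assuming a shared `Event` supertype, `Long` order ids and `tStamp` as epoch milliseconds; these types are not taken from the talk. The CEP code later in the deck refers to an `Event` type that exposes `orderId`, `tStamp` and `status`, which this model provides.

```scala
// Hypothetical event model (assumed types: Long ids, tStamp in epoch millis).
sealed trait Event {
  def orderId: Long  // identifies the order
  def tStamp: Long   // time at which the event happened
  def status: String // "received", "shipped" or "delivered"
}

case class Order(orderId: Long, tStamp: Long, status: String = "received") extends Event
case class Shipment(orderId: Long, tStamp: Long, status: String = "shipped") extends Event
case class Delivery(orderId: Long, tStamp: Long, status: String = "delivered") extends Event

val minute = 60 * 1000L

// One order passing through all three stages (illustrative timestamps)
val lifecycle: Seq[Event] = Seq(
  Order(42L, 0L),
  Shipment(42L, 30 * minute),
  Delivery(42L, 26 * 60 * minute)
)
```

Each stage of the order process becomes one event in the stream, all sharing the same `orderId`.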
10
Aggregating Massive Streams
Stream Analytics
11
Stream Analytics
▪ Traditional batch analytics
  • Repeated queries on finite and changing data sets
  • Queries join and aggregate large data sets
▪ Stream analytics
  • A "standing" query produces continuous results from an infinite input stream
  • The query computes aggregates on high-volume streams
▪ How to compute aggregates on infinite streams?
12
Compute Aggregates on Streams
▪ Split the infinite stream into finite "windows"
  • Usually split by time
▪ Tumbling windows
  • Fixed size & consecutive (non-overlapping)
▪ Sliding windows
  • Fixed size & may overlap
▪ Event time is mandatory for correct & consistent results!
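To make the difference between the two window types concrete, here is a plain-Scala sketch of how a single event timestamp is assigned to windows. It is not Flink's window assigner API; it assumes millisecond timestamps, windows aligned to epoch 0, and half-open `[start, end)` intervals.

```scala
// Plain-Scala sketch of window assignment (not Flink's WindowAssigner API).
// A window is described by its start (inclusive) and end (exclusive) in millis.
case class Window(start: Long, end: Long)

// Tumbling: every timestamp falls into exactly one fixed-size window.
def tumblingWindow(ts: Long, size: Long): Window = {
  val start = ts - Math.floorMod(ts, size)
  Window(start, start + size)
}

// Sliding: a window of length `size` starts every `slide` millis, so when
// `slide` divides `size`, each timestamp falls into size / slide windows.
def slidingWindows(ts: Long, size: Long, slide: Long): Seq[Window] = {
  val lastStart = ts - Math.floorMod(ts, slide) // latest window start <= ts
  // walk back while the window [s, s + size) still contains ts
  (lastStart until (ts - size) by -slide).map(s => Window(s, s + size))
}
```

With a one-hour size and a 30-minute slide, an event at 01:30 lands in the tumbling window [01:00, 02:00) but in two sliding windows, [01:30, 02:30) and [01:00, 02:00). Because assignment depends only on the event's own timestamp, replaying the same events yields the same windows, which is why event time matters for consistent results.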
13
Example: Count Orders by Hour
14
Example: Count Orders by Hour
15
SELECT
  TUMBLE_START(tStamp, INTERVAL '1' HOUR) AS `hour`,
  COUNT(*) AS cnt
FROM events
WHERE
  status = 'received'
GROUP BY
  TUMBLE(tStamp, INTERVAL '1' HOUR)
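The result of this query can be illustrated without Flink by applying the same filter, grouping and count to a finite list of events. This plain-Scala sketch assumes a flat `Event` case class and epoch-millisecond timestamps. Flink instead updates the counts incrementally and emits each hour's count once event time has passed the end of that hour.

```scala
// Plain-Scala sketch of the hourly count over an already-collected list of
// events. Event fields and types are assumptions (tStamp in epoch millis).
case class Event(orderId: Long, tStamp: Long, status: String)

val HourMs = 60 * 60 * 1000L

// Returns (window start, count) for each hour containing at least one
// "received" event, mirroring TUMBLE_START(...) and COUNT(*).
def countOrdersByHour(events: Seq[Event]): Seq[(Long, Long)] =
  events
    .filter(_.status == "received")                          // WHERE status = 'received'
    .groupBy(e => e.tStamp - Math.floorMod(e.tStamp, HourMs)) // GROUP BY TUMBLE(tStamp, 1 HOUR)
    .map { case (hourStart, es) => (hourStart, es.size.toLong) }
    .toSeq
    .sortBy(_._1)
```

Shipment and delivery events are dropped by the filter, so only incoming orders are counted per hour.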
Stream SQL Architecture
▪ Flink features SQL on static and streaming tables
▪ Parsing and optimization by Apache Calcite
▪ SQL queries are translated into native Flink programs
16
Pattern Matching on Streams
Complex Event Processing
17
Real-time Warnings
18
CEP to the Rescue
▪ Define processing and delivery intervals (SLAs)
▪ ProcessSucc(orderId, tStamp, duration)
▪ ProcessWarn(orderId, tStamp)
▪ DeliverySucc(orderId, tStamp, duration)
▪ DeliveryWarn(orderId, tStamp)
▪ orderId: identifies the order
▪ tStamp: time when the event happened
▪ duration: duration of the processing/delivery
19
CEP Example
20
Processing: Order → Shipment
21
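import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// An Order ("received") followed, not necessarily directly,
// by a "shipped" event of the same order within one hour
val processingPattern = Pattern
  .begin[Event]("received").subtype(classOf[Order])
  .followedBy("shipped").where(_.status == "shipped")
  .within(Time.hours(1))

// Evaluate the pattern separately for every order
val processingPatternStream = CEP.pattern(
  input.keyBy("orderId"),
  processingPattern)

// Left: timed-out partial matches (SLA violated), Right: complete matches
val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
  processingPatternStream.select {
    (pP, timestamp) => // Timeout handler
      ProcessWarn(pP("received").orderId, timestamp)
  } {
    fP => // Select function
      ProcessSucc(
        fP("received").orderId, fP("shipped").tStamp,
        fP("shipped").tStamp - fP("received").tStamp)
  }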
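The behaviour of this pattern can be simulated on a finite, in-memory list of events, which may help when reasoning about the timeout branch. This is a plain-Scala sketch rather than the Flink CEP API; the flat `Event` case class, the inclusive one-hour bound, and reporting a timeout at the received time plus one hour are assumptions of the sketch, not Flink's exact semantics.

```scala
// Plain-Scala simulation of the processing SLA check (not the Flink CEP API).
// Types, field names and the timeout timestamp convention are assumptions.
case class Event(orderId: Long, tStamp: Long, status: String)
case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)
case class ProcessWarn(orderId: Long, tStamp: Long)

val OneHourMs = 60 * 60 * 1000L

def checkProcessing(events: Seq[Event], limit: Long = OneHourMs): Seq[Either[ProcessWarn, ProcessSucc]] =
  // like keyBy("orderId"): each order is evaluated on its own
  events.groupBy(_.orderId).toSeq.sortBy(_._1).flatMap { case (orderId, es) =>
    val sorted = es.sortBy(_.tStamp) // process each order's events in event-time order
    sorted.filter(_.status == "received").map { received =>
      // first "shipped" event at or after the order, within the limit
      sorted.find(s => s.status == "shipped" &&
        s.tStamp >= received.tStamp && s.tStamp - received.tStamp <= limit) match {
        case Some(shipped) =>
          Right(ProcessSucc(orderId, shipped.tStamp, shipped.tStamp - received.tStamp))
        case None =>
          // timed out: report when the one-hour window expired
          Left(ProcessWarn(orderId, received.tStamp + limit))
      }
    }
  }
```

An order shipped after 20 minutes yields a `Right(ProcessSucc(...))` with a 20-minute duration, while an order shipped after two hours yields a `Left(ProcessWarn(...))`, matching the two handlers passed to `select`.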
… and both at the same time!
Integrated Stream Analytics with CEP
22
Count Delayed Shipments
23
Compute Avg Processing Time
24
CEP + Stream SQL
25
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.Table
import org.apache.flink.table.api.scala._

// complex event processing result
val delResult: DataStream[Either[DeliveryWarn, DeliverySucc]] = …
// keep only the warnings (Left side) of the CEP result
val delWarn: DataStream[DeliveryWarn] = delResult.flatMap(_.left.toOption)
val deliveryWarningTable: Table = delWarn.toTable(tableEnv)
tableEnv.registerTable("deliveryWarnings", deliveryWarningTable)

// calculate the delayed deliveries per day
val delayedDeliveriesPerDay = tableEnv.sql(
  """SELECT
    |  TUMBLE_START(tStamp, INTERVAL '1' DAY) AS `day`,
    |  COUNT(*) AS cnt
    |FROM deliveryWarnings
    |GROUP BY TUMBLE(tStamp, INTERVAL '1' DAY)""".stripMargin)
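The two steps above, keeping only the warnings and counting them per day, can be sketched on a finite collection in plain Scala. The result types mirror the slide's `DeliveryWarn` and `DeliverySucc`, but their field types and epoch-millisecond timestamps are assumptions; Flink performs both steps continuously on the stream.

```scala
// Plain-Scala sketch: split an Either-typed CEP result and count the
// warnings per one-day tumbling window. Field types are assumptions.
case class DeliveryWarn(orderId: Long, tStamp: Long)
case class DeliverySucc(orderId: Long, tStamp: Long, duration: Long)

val DayMs = 24 * 60 * 60 * 1000L

def delayedDeliveriesPerDay(results: Seq[Either[DeliveryWarn, DeliverySucc]]): Seq[(Long, Long)] = {
  // same idiom as delResult.flatMap(_.left.toOption): Right values are dropped
  val warnings: Seq[DeliveryWarn] = results.flatMap(_.left.toOption)
  warnings
    .groupBy(w => w.tStamp - Math.floorMod(w.tStamp, DayMs)) // TUMBLE(tStamp, INTERVAL '1' DAY)
    .map { case (dayStart, ws) => (dayStart, ws.size.toLong) } // COUNT(*)
    .toSeq
    .sortBy(_._1)
}
```

The `flatMap(_.left.toOption)` step turns each `Left` into a one-element option and each `Right` into `None`, so the successful deliveries simply disappear before the aggregation.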
CEP-enriched Stream SQL
26
SELECT
  TUMBLE_START(tStamp, INTERVAL '1' DAY) AS `day`,
  AVG(duration) AS avgDuration
FROM (
  -- CEP pattern
  SELECT duration, tStamp
  FROM inputs MATCH_RECOGNIZE (
    PARTITION BY orderId ORDER BY tStamp
    MEASURES END.tStamp - START.tStamp AS duration, END.tStamp AS tStamp
    PATTERN (START OTHER* END)
    WITHIN INTERVAL '1' HOUR
    DEFINE
      START AS START.status = 'received',
      END AS END.status = 'shipped'
  )
)
GROUP BY
  TUMBLE(tStamp, INTERVAL '1' DAY)
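What this combined query computes can be mimicked in plain Scala over a finite list of events: per `orderId` (PARTITION BY), in `tStamp` order (ORDER BY), match a "received" row, any other rows, then a "shipped" row within one hour, emit the duration, and average the durations per day. This is a sketch of the intended semantics only; the flat `Event` type, epoch-millisecond timestamps, matching only the first shipment, and the inclusive one-hour bound are assumptions.

```scala
// Plain-Scala sketch of the MATCH_RECOGNIZE query's semantics (not Flink SQL).
case class Event(orderId: Long, tStamp: Long, status: String)

val HourMs = 60 * 60 * 1000L
val DayMs = 24 * HourMs

// MEASURES: (duration, tStamp of the END row) for each completed match
def processingMatches(events: Seq[Event]): Seq[(Long, Long)] =
  events.groupBy(_.orderId).values.toSeq.flatMap { es =>
    val sorted = es.sortBy(_.tStamp)
    sorted.filter(_.status == "received").flatMap { start =>
      // OTHER* lets any rows occur between START and END
      sorted.find(e => e.status == "shipped" && e.tStamp >= start.tStamp &&
        e.tStamp - start.tStamp <= HourMs)
        .map(end => (end.tStamp - start.tStamp, end.tStamp))
    }
  }

// Outer query: AVG(duration) per TUMBLE(tStamp, INTERVAL '1' DAY)
def avgProcessingPerDay(events: Seq[Event]): Seq[(Long, Double)] =
  processingMatches(events)
    .groupBy { case (_, ts) => ts - Math.floorMod(ts, DayMs) }
    .map { case (day, ms) => (day, ms.map(_._1).sum.toDouble / ms.size) }
    .toSeq
    .sortBy(_._1)
```

Orders whose shipment misses the one-hour bound produce no match, so they do not contribute to the average processing time.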
Conclusion
▪ Apache Flink handles both CEP and analytical workloads
▪ Apache Flink offers intuitive APIs
▪ CEP and Stream SQL integration enables a new class of applications ☺
27
2
Thank you!
@stsffap
@ApacheFlink
@dataArtisans
29
Stream Processing and Apache Flink®'s approach to it
@StephanEwen
Apache Flink PMC
CTO @ data Artisans
FLINK FORWARD IS COMING BACK TO BERLIN
SEPTEMBER 11-13, 2017
BERLIN.FLINK-FORWARD.ORG
We are hiring!
data-artisans.com/careers