FEBRUARY 9, 2017, WARSAW
Stream Analytics with SQL on Apache Flink®
Fabian Hueske | Apache Flink PMC member | Co-founder dataArtisans
Streams are Everywhere
Data Analytics on Streaming Data
• Periodic batch processing
• Lots of duct tape and baling wire
• It’s up to you to make everything work… reliably!
• High latency
• Continuous stream processing
• Framework takes care of failures
• Low latency
Stream Processing in Apache Flink
• Platform for scalable stream processing
• Fast
• Low latency and high throughput
• Accurate
• Stateful stream processing in event time
• Reliable
• Exactly-once state guarantees
• Highly available cluster setup
Streaming Applications Powered by Flink
• 30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily
• Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
• Largest job has > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second
Stream Processing is not for Everybody, … yet
• APIs of open source stream processors target developers
• Implementing streaming applications requires knowledge & skill
• Stream processing concepts (time, state, windows, triggers, ...)
• Programming experience (Java / Scala APIs)
• Stream processing technology spreads rapidly
• There is a talent gap
What about SQL?
• SQL is the most widely used language for data analytics
• Many good reasons to use SQL
• Declarative specification
• Optimization
• Efficient execution
• “Everybody” knows SQL
• SQL would make stream processing much more accessible, but…
No Open Source Stream Processor Offers Decent SQL Support
• SQL was not designed with streaming data in mind
• Relations are sets. Streams are infinite sequences.
• Records arrive over time.
• Syntax
• Time-based operations are cumbersome to specify (aggregates, joins)
• Semantics
• A SQL query should compute the same result on a batch table and a stream
Flink’s SQL Support & Table API
• Standard SQL and LINQ-style Table API
• Unified APIs for batch & streaming data
• Common translation layers
• Optimization based on Apache Calcite
• Type system & code-generation
• Table sources & sinks
• Streaming SQL & Table API is work in progress
What are the Use Cases for Stream SQL?
• Continuous ETL & Data Import
• Live Dashboards & Reports
• Ad-hoc Analytics & Exploration
Dynamic Tables
• Core concept is a “Dynamic Table”
• Dynamic tables change over time
• Dynamic tables are treated like static batch tables
• Dynamic tables are queried with standard SQL
• A query returns another dynamic table
• Stream ←→ Dynamic Table conversions without information loss
• “Stream / Table Duality”
Stream → Dynamic Table
• Append
• Replace by Key
Stream (oldest record first): (1, A), (2, B), (4, A), (5, C), (7, B), (8, A), (9, B), …
Table in Append mode:
time k
1 A
2 B
4 A
5 C
7 B
8 A
9 B
… …
Table in Replace by Key mode:
time k
8 A
9 B
5 C
… …
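The two conversion modes can be sketched in a few lines of Python. This is only an illustration of the semantics shown in the figure, not Flink code; the names `STREAM`, `append_table`, and `replace_by_key_table` are made up for this sketch.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (time, k), as in the slide's example stream

# The example stream from the slide, oldest record first.
STREAM: List[Record] = [
    (1, "A"), (2, "B"), (4, "A"), (5, "C"), (7, "B"), (8, "A"), (9, "B"),
]


def append_table(stream: List[Record]) -> List[Record]:
    """Append mode: every stream record becomes a new table row."""
    table: List[Record] = []
    for record in stream:
        table.append(record)
    return table


def replace_by_key_table(stream: List[Record]) -> Dict[str, int]:
    """Replace-by-key mode: a record overwrites the row with the same key k."""
    table: Dict[str, int] = {}
    for time, k in stream:
        table[k] = time  # only the latest time per key survives
    return table
```

In append mode the table grows with every record, while in replace-by-key mode its size is bounded by the number of distinct keys.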
Querying a Dynamic Table
• Dynamic tables change over time
• A[t]: Table A at time t
• Dynamic tables are queried with regular SQL
• Result of a query changes as input table changes
• q(A[t]): Evaluate query q on table A at time t
• As time t progresses, the query result is continuously updated
• similar to maintaining a materialized view
• t is current event time
Querying a Dynamic Table
q:
SELECT
  k,
  COUNT(k) AS cnt
FROM A
GROUP BY k
Table A (A[8] holds the rows up to time 8, A[9] adds row 9 B, A[12] adds row 12 C)
time k
1 A
2 B
4 A
5 C
7 B
8 A
9 B
12 C
q(A[8])
k cnt
A 3
B 2
C 1
q(A[9])
k cnt
A 3
B 3
C 1
q(A[12])
k cnt
A 3
B 3
C 2
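The meaning of q(A[t]) can be checked with a small sketch that runs the slide's query on snapshots of table A. It uses Python's built-in `sqlite3` module as a stand-in batch engine, which is an assumption of this example and not how Flink evaluates the query. The helper name `q_at` is hypothetical, and `ORDER BY k` is added only to make the output order deterministic.

```python
import sqlite3
from typing import List, Tuple

# Rows of table A from the slide: (time, k).
ROWS: List[Tuple[int, str]] = [
    (1, "A"), (2, "B"), (4, "A"), (5, "C"),
    (7, "B"), (8, "A"), (9, "B"), (12, "C"),
]


def q_at(t: int) -> List[Tuple[str, int]]:
    """Evaluate q on the snapshot A[t], i.e. on all rows with time <= t."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE A ("time" INTEGER, k TEXT)')
    snapshot = [row for row in ROWS if row[0] <= t]
    conn.executemany("INSERT INTO A VALUES (?, ?)", snapshot)
    result = conn.execute(
        "SELECT k, COUNT(k) AS cnt FROM A GROUP BY k ORDER BY k"
    ).fetchall()
    conn.close()
    return result
```

Calling `q_at(8)`, `q_at(9)`, and `q_at(12)` reproduces the three result tables above. The sketch recomputes every result from scratch; a stream processor would instead update the previous result incrementally, similar to maintaining a materialized view.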
Querying a Dynamic Table
q:
SELECT
  k,
  COUNT(k) AS cnt,
  TUMBLE_END(time, INTERVAL '5' SECONDS) AS endT
FROM A
GROUP BY
  k,
  TUMBLE(time, INTERVAL '5' SECONDS)
Table A (snapshots A[5], A[10], A[15])
time k
1 A
2 B
4 A
5 C
7 B
8 A
9 B
11 A
12 C
14 C
15 A
q(A)
k cnt endT
Rows added by q(A[5]):
A 2 5
B 1 5
C 1 5
Rows added by q(A[10]):
A 1 10
B 2 10
Rows added by q(A[15]):
A 2 15
C 2 15
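The windowed result can be reproduced with a short Python sketch. One assumption is worth stating: the figure puts the record with time 5 into the window ending at 5, so the sketch uses right-closed windows (end - 5, end]. Flink's own tumbling windows are left-closed and right-open, so a real job would assign boundary records differently. The names `window_end` and `tumbling_counts` are made up for this illustration.

```python
from collections import Counter
from typing import List, Tuple

# Rows of table A from the slide: (time, k).
ROWS: List[Tuple[int, str]] = [
    (1, "A"), (2, "B"), (4, "A"), (5, "C"), (7, "B"), (8, "A"),
    (9, "B"), (11, "A"), (12, "C"), (14, "C"), (15, "A"),
]


def window_end(time: int, size: int) -> int:
    """End of the right-closed window (end - size, end] that contains time."""
    return -(-time // size) * size  # ceiling division, then scale back


def tumbling_counts(rows: List[Tuple[int, str]], size: int = 5) -> List[Tuple[str, int, int]]:
    """Count rows per key and window; returns (k, cnt, endT) ordered by endT, k."""
    counts = Counter((window_end(time, size), k) for time, k in rows)
    ordered = sorted(counts.items(), key=lambda item: item[0])
    return [(k, cnt, end) for (end, k), cnt in ordered]
```

Because every record belongs to exactly one window, a finished window's result is not updated by later in-order records, which is why the result table only grows by appending rows.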
Can We Run Any Query on Dynamic Tables?
• No
• There are state and computation constraints
• State may not grow infinitely as more data arrives
• Clean-up timeout must be defined
• Input updates may only trigger partial re-computation of the result
• Queries with possibly unbounded state or computation are rejected
• Optimizer performs validation
Bounding the State of a Query
• State grows infinitely with domain of grouping attribute
SELECT k, COUNT(k) AS cnt
FROM A
GROUP BY k
STOP! UNBOUNDED STATE!
• Bound query input by time
• Query aggregates data of the last 24 hours. Older data is discarded.
SELECT k, COUNT(k) AS cnt
FROM A
WHERE last(time, INTERVAL '1' DAY)
GROUP BY k
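A minimal Python sketch of the time-bounded variant follows. It assumes that records arrive in timestamp order and that the `last(time, interval)` predicate keeps exactly the rows with time greater than now minus the interval; both are assumptions of this example, and the class name `LastIntervalCounter` is hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Dict, Tuple


class LastIntervalCounter:
    """Counts rows per key over the most recent `interval` time units."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.rows: Deque[Tuple[int, str]] = deque()  # (time, k) in arrival order
        self.counts: Counter = Counter()

    def add(self, time: int, k: str) -> None:
        """Add a row, then discard everything older than the interval."""
        self.rows.append((time, k))
        self.counts[k] += 1
        self._evict(now=time)

    def _evict(self, now: int) -> None:
        # Assumes in-order arrival, so the oldest row is always at the left.
        while self.rows and self.rows[0][0] <= now - self.interval:
            _, old_k = self.rows.popleft()
            self.counts[old_k] -= 1
            if self.counts[old_k] == 0:
                del self.counts[old_k]  # state for idle keys is released

    def result(self) -> Dict[str, int]:
        return dict(self.counts)
```

With an interval of 24 and rows at times 0, 10, 20, 30, the row at time 0 is dropped once time 30 arrives. Keys that receive no rows for a full interval disappear from the state, so the state no longer grows with the domain of the grouping attribute.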
Updating Results and Late Arriving Data
• Sometimes emitted results need to be updated
• Results which are continuously updated
• Results for which relevant records arrived late
• Results that might be updated must be kept as state
• Clean-up timeout
• When a table is converted into a stream, updates must be propagated
• Update mode
• Add/Retract mode
Dynamic Table → Stream: Update Mode
Table A
time k
1 A
2 B
4 A
5 C
7 B
8 A
… …
SELECT
  k,
  COUNT(k) AS cnt
FROM A
GROUP BY k
Update by Key
Emitted stream (oldest message first): (A, 1), (B, 1), (A, 2), (C, 1), (B, 2), (A, 3)
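Update mode can be sketched as follows: each input row changes the count of one key, and the new (k, cnt) pair is emitted as a message that replaces the previous message with the same key. This is an illustration of the semantics only; the function name `update_mode` is made up for the example.

```python
from typing import Dict, List, Tuple

# Rows of table A from the slide: (time, k).
ROWS: List[Tuple[int, str]] = [
    (1, "A"), (2, "B"), (4, "A"), (5, "C"), (7, "B"), (8, "A"),
]


def update_mode(rows: List[Tuple[int, str]]) -> List[Tuple[str, int]]:
    """Emit one (k, cnt) upsert message per input row, keyed by k."""
    counts: Dict[str, int] = {}
    out: List[Tuple[str, int]] = []
    for _time, k in rows:
        counts[k] = counts.get(k, 0) + 1
        out.append((k, counts[k]))  # the receiver overwrites its entry for k
    return out
```

A receiver has to know the key (here k) to apply these messages, since a message carries only the new value and not the one it replaces.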
Dynamic Table → Stream: Add/Retract Mode
Table A
time k
1 A
2 B
4 A
5 C
7 B
8 A
… …
SELECT
  k,
  COUNT(k) AS cnt
FROM A
GROUP BY k
Add (+) / Retract (-)
Emitted stream (oldest message first): + (A, 1), + (B, 1), - (A, 1), + (A, 2), + (C, 1), - (B, 1), + (B, 2), - (A, 2), + (A, 3)
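Add/retract mode can be sketched in the same style: before a changed result row is emitted with "+", the previously emitted row for that key is withdrawn with "-". Again this only illustrates the semantics; the function name `add_retract_mode` is hypothetical.

```python
from typing import Dict, List, Tuple

# Rows of table A from the slide: (time, k).
ROWS: List[Tuple[int, str]] = [
    (1, "A"), (2, "B"), (4, "A"), (5, "C"), (7, "B"), (8, "A"),
]


def add_retract_mode(rows: List[Tuple[int, str]]) -> List[Tuple[str, str, int]]:
    """Emit ('-', k, old_cnt) then ('+', k, new_cnt) for every changed key."""
    counts: Dict[str, int] = {}
    out: List[Tuple[str, str, int]] = []
    for _time, k in rows:
        old = counts.get(k, 0)
        if old > 0:
            out.append(("-", k, old))  # retract the previously emitted row
        counts[k] = old + 1
        out.append(("+", k, old + 1))  # add the updated row
    return out
```

Compared with update mode, this produces up to two messages per change, but the receiver does not need to know a key to apply them.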
Current State of SQL and Table API
• Huge interest and many contributors
• Current development efforts
• Adding more window operators
• Introducing dynamic tables
• And there is a lot more to do
• New operators and features for streaming and batch
• Performance improvements
• Tooling and integration
• Try it out, give feedback, and start contributing!
Ready for More Stream Processing with Flink?
A preview will be available via O’Reilly Early Release in the coming weeks
Stream Analytics with SQL on Apache Flink
Fabian Hueske | @fhueske
