Timo Walther
Apache Flink PMC
@twalthr
With slides from Fabian Hueske
Flink Meetup @ Amsterdam, March 2nd, 2017
Table & SQL API
unified APIs for batch and stream processing
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Motivation
DataStream API is not for Everyone
§ Writing DataStream programs is not easy
• Stream processing technology spreads rapidly
§ Requires Knowledge & Skill
• Stream processing concepts (time, state, windows, ...)
• Programming experience (Java / Scala)
§ Program logic goes into UDFs
• great for expressiveness
• bad for optimization - need for manual tuning
Why not a Relational API?
§ Relational APIs are declarative
• User says what is needed
• System decides how to compute it
§ Users do not specify implementation
§ Queries are efficiently executed
§ “Everybody” knows SQL!
Goals
§ Flink is a platform for distributed stream and batch data processing
§ Relational APIs as a unifying layer
• Queries on batch tables terminate and produce a finite result
• Queries on streaming tables run continuously and produce a result stream
§ Same syntax & semantics for both kinds of queries
Table API & SQL
§ Flink features two relational APIs
• Table API: LINQ-style API for Java & Scala (since Flink 0.9.0)
• SQL: Standard SQL (since Flink 1.1.0)
§ Equivalent feature set (at the moment)
• Table API and SQL can be mixed
§ Both are tightly integrated with Flink’s core APIs
• DataStream
• DataSet
Table API Example
val sensorData: DataStream[(String, Long, Double)] = ???
// convert DataStream into Table
val sensorTable: Table = sensorData
  .toTable(tableEnv, 'location, 'time, 'tempF)
// define query on Table
val avgTempCTable: Table = sensorTable
  .window(Tumble over 1.day on 'rowtime as 'w)
  .groupBy('location, 'w)
  .select('w.start as 'day, 'location,
    (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location like "room%")
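To make the arithmetic of this query concrete, here is a minimal sketch in plain Scala collections, with no Flink dependency, of what it computes on a finite sample. The names `AvgTempSketch`, `dayStart`, and `avgTempC` are hypothetical, and the sketch assumes the window is driven by the non-negative epoch-millisecond timestamp carried in each record, whereas the slide's query uses the system attribute 'rowtime.

```scala
object AvgTempSketch {
  // (location, epoch millis, temperature in °F): same shape as sensorData above
  type Reading = (String, Long, Double)

  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Start of the 1-day tumbling window that contains the (non-negative) timestamp
  def dayStart(ts: Long): Long = ts - (ts % MillisPerDay)

  // Per (window start, location): average °F converted to °C,
  // using the same 0.556 approximation of 5/9 as the slide
  def avgTempC(readings: Seq[Reading]): Map[(Long, String), Double] =
    readings
      .filter { case (location, _, _) => location.startsWith("room") } // like "room%"
      .groupBy { case (location, ts, _) => (dayStart(ts), location) }
      .map { case (key, group) =>
        val avgF = group.map(_._3).sum / group.size
        (key, (avgF - 32) * 0.556)
      }
}
```

On a batch input this yields one finite result; on a stream, the same grouping would emit one row per location each time a day window closes.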
SQL Example
val sensorData: DataStream[(String, Long, Double)] = ???
// register DataStream
tableEnv.registerDataStream(
  "sensorData", sensorData, 'location, 'time, 'tempF)
// query registered Table
val avgTempCTable: Table = tableEnv
  .sql("""
    SELECT FLOOR(rowtime() TO DAY) AS day, location,
      AVG((tempF - 32) * 0.556) AS avgTempC
    FROM sensorData
    WHERE location LIKE 'room%'
    GROUP BY location, FLOOR(rowtime() TO DAY) """)
Architecture
2 APIs [SQL, Table API] * 2 backends [DataStream, DataSet] = 4 different translation paths?
§ Table API and SQL queries are translated into a common logical plan representation.
§ Logical plans are translated and optimized depending on the execution backend.
§ Plans are transformed into DataSet or DataStream programs.
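The idea of a shared logical plan can be sketched in a few lines of plain Scala. Everything here is hypothetical and greatly simplified: `LogicalPlan`, `TableExpr`, `SqlQuery`, and `translate` are illustrative names, not Flink classes, the SQL front end starts from an already parsed query, and the optimization step is left out.

```scala
object TranslationSketch {
  // Common logical plan: a tiny stand-in for real plan nodes
  sealed trait LogicalPlan
  final case class Scan(table: String) extends LogicalPlan
  final case class Filter(input: LogicalPlan, predicate: String) extends LogicalPlan
  final case class Project(input: LogicalPlan, fields: List[String]) extends LogicalPlan

  // Front end 1: a Table-API-like builder
  final case class TableExpr(plan: LogicalPlan) {
    def where(predicate: String): TableExpr = TableExpr(Filter(plan, predicate))
    def select(fields: String*): TableExpr = TableExpr(Project(plan, fields.toList))
  }
  def scan(table: String): TableExpr = TableExpr(Scan(table))

  // Front end 2: a SQL-like query that has already been parsed
  final case class SqlQuery(select: List[String], from: String, where: Option[String])
  def fromSql(q: SqlQuery): LogicalPlan = {
    val source: LogicalPlan = Scan(q.from)
    val filtered = q.where.fold(source)(p => Filter(source, p))
    Project(filtered, q.select)
  }

  // Back ends: one translation per backend, shared by both front ends
  sealed trait Backend
  case object DataSetBackend extends Backend
  case object DataStreamBackend extends Backend

  def translate(plan: LogicalPlan, backend: Backend): String = {
    val prefix = backend match {
      case DataSetBackend    => "DataSet"
      case DataStreamBackend => "DataStream"
    }
    def go(p: LogicalPlan): String = p match {
      case Scan(t)          => s"$prefix.scan($t)"
      case Filter(in, pred) => s"${go(in)}.filter($pred)"
      case Project(in, fs)  => s"${go(in)}.map(${fs.mkString(",")})"
    }
    go(plan)
  }
}
```

With this shape, two front ends and two back ends need 2 + 2 translations rather than 2 * 2, because both APIs meet in the same plan.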
Translation to Logical Plan
sensorTable
  .window(Tumble over 1.day on 'rowtime as 'w)
  .groupBy('location, 'w)
  .select('w.start as 'day, 'location,
    (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location like "room%")
Translation to Optimized Plan
Translation to Flink Program
Current State (in master)
§ Batch SQL & Table API support
• Selection, Projection, Sort, Inner & Outer Joins, Set operations
• Windows for Slide, Tumble, Session
§ Streaming Table API support
• Selection, Projection, Union
• Windows for Slide, Tumble, Session
§ Streaming SQL
• Selection, Projection, Union, Tumble, but …
Use Cases for Streaming SQL
§ Continuous ETL & Data Import
§ Live Dashboards & Reports
§ Ad-hoc Analytics & Exploration
Outlook: Dynamic Tables
Dynamic Tables
§ Dynamic tables change over time
§ Dynamic tables are treated like static batch tables
• Dynamic tables are queried with standard SQL
• A query returns another dynamic table
§ Stream ←→ Dynamic Table conversions without information loss
• “Stream / Table Duality”
Stream to Dynamic Tables
§ Append:
§ Replace by key:
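The two conversion modes can be illustrated with a minimal sketch in plain Scala. It assumes a stream of (key, value) records; `StreamToTableSketch`, `appendMode`, and `replaceByKey` are hypothetical names chosen for illustration, and the original slide showed these modes as figures.

```scala
object StreamToTableSketch {
  // A stream record: (key, value)
  type Record = (String, Int)

  // Append mode: every record becomes a new row, so the table only grows
  def appendMode(stream: Seq[Record]): Vector[Record] =
    stream.foldLeft(Vector.empty[Record])((table, rec) => table :+ rec)

  // Replace-by-key mode: a record overwrites the row that has the same key
  def replaceByKey(stream: Seq[Record]): Map[String, Int] =
    stream.foldLeft(Map.empty[String, Int]) { case (table, (key, value)) =>
      table.updated(key, value)
    }
}
```

Each fold step corresponds to the table's state after one more stream record has arrived.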
Querying Dynamic Tables
§ Dynamic tables change over time
• A[t]: Table A at time t
§ Dynamic tables are queried with regular SQL
• Result of a query changes as input table changes
• q(A[t]): Evaluate query q on table A at time t
§ Query result is continuously updated as t progresses
• Similar to maintaining a materialized view
• t is current event time
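The notation q(A[t]) can be read as the following sketch, written in plain Scala under the assumption of an append-only table of (event time, key) rows and a per-key count as the query. The names `tableAt`, `q`, and `results` are hypothetical. The sketch recomputes each result from scratch for clarity; a real system would maintain the result incrementally, as with a materialized view.

```scala
object ContinuousQuerySketch {
  // (event time, key)
  type Row = (Long, String)

  // A[t]: all rows of the append-only table A with event time <= t
  def tableAt(a: Seq[Row], t: Long): Seq[Row] = a.filter(_._1 <= t)

  // q: count rows per key, i.e. SELECT key, COUNT(*) ... GROUP BY key
  def q(table: Seq[Row]): Map[String, Int] =
    table.groupBy(_._2).map { case (key, rows) => (key, rows.size) }

  // q(A[t]) evaluated at each given point in time
  def results(a: Seq[Row], times: Seq[Long]): Seq[(Long, Map[String, Int])] =
    times.map(t => (t, q(tableAt(a, t))))
}
```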
§ Can we run any query on Dynamic Tables? No!
§ State must not grow without bound as more data arrives
• Set clean-up timeout or key constraints.
§ New input may trigger only a partial re-computation
§ Queries with possibly unbounded state or computation are rejected
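One way to picture a clean-up timeout is the sketch below, in plain Scala. It is an assumption about how such a timeout could behave, not Flink's mechanism: `StateCleanupSketch`, `Entry`, and `process` are hypothetical names, the state is a per-key running count, and a key is evicted once it has been idle for longer than `timeout`.

```scala
object StateCleanupSketch {
  // Per-key state: running count plus the time of the last update
  final case class Entry(count: Long, lastSeen: Long)

  // Handle one event for `key` at time `now`:
  // first drop keys that have been idle longer than `timeout`, then update the count
  def process(state: Map[String, Entry], key: String, now: Long, timeout: Long): Map[String, Entry] = {
    val live = state.filter { case (_, entry) => now - entry.lastSeen <= timeout }
    val count = live.get(key).map(_.count).getOrElse(0L) + 1
    live.updated(key, Entry(count, now))
  }
}
```

The trade-off is visible in the code: state stays bounded by the number of recently active keys, but a key that returns after eviction starts counting from zero again.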
Dynamic Tables to Stream
§ Update:
§ Add/Retract:
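The difference between the two output encodings can be sketched in plain Scala. The original slides showed them as figures, so the message types below (`Upsert`, `Add`, `Retract`) and the running per-key count used as the changing table are assumptions made for illustration.

```scala
object TableToStreamSketch {
  sealed trait Change
  final case class Upsert(key: String, value: Int) extends Change  // update mode
  final case class Add(key: String, value: Int) extends Change     // add/retract mode
  final case class Retract(key: String, value: Int) extends Change // add/retract mode

  // Maintain a count per key and encode every change of the result table
  private def emit(keys: Seq[String])(
      encode: (String, Option[Int], Int) => Seq[Change]): Vector[Change] =
    keys.foldLeft((Map.empty[String, Int], Vector.empty[Change])) {
      case ((counts, emitted), key) =>
        val old = counts.get(key)
        val updated = old.getOrElse(0) + 1
        (counts.updated(key, updated), emitted ++ encode(key, old, updated))
    }._2

  // Update mode: one message per change, the new value replaces the old one by key
  def updateMode(keys: Seq[String]): Vector[Change] =
    emit(keys)((key, _, updated) => Seq(Upsert(key, updated)))

  // Add/retract mode: withdraw the old row (if any), then add the new one
  def addRetractMode(keys: Seq[String]): Vector[Change] =
    emit(keys) { (key, old, updated) =>
      old match {
        case Some(previous) => Seq(Retract(key, previous), Add(key, updated))
        case None           => Seq(Add(key, updated))
      }
    }
}
```

Update mode needs a key so the receiver knows which row to overwrite; add/retract mode works without one, at the cost of two messages per changed row.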
Result computation & refinement
Contributions welcome!
§ Huge interest and many contributors
• Adding more window operators
• Introducing dynamic tables
§ And there is a lot more to do
• New operators and features for streaming and batch
• Performance improvements
• Tooling and integration
§ Try it out, give feedback, and start contributing!
One day of hands-on Flink training
One day of conference
Tickets are on sale
Please visit our website:
http://sf.flink-forward.org
Follow us on Twitter:
@FlinkForward
We are hiring!
data-artisans.com/careers
Thank you!
@twalthr
@ApacheFlink
@dataArtisans
