Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
Martijn Visser
Senior Product Manager and Apache Flink PMC member
Real-time services rely on stream processing
[Diagram labels: Real-time Data (A Sale, A Shipment, A Trade, A Customer Experience); Real-time Stream Processing; Rich Front-End Customer Experiences; Real-Time Backend Operations]
Developers choose Flink because of its performance and rich feature set
• Scalability and Performance: Flink is capable of supporting stream processing workloads at tremendous scale
• Fault Tolerance: Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability
• Language Flexibility: Flink supports Java, Python, & SQL with 150+ built-in functions, enabling devs to work in their language of choice
• Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology
Flink is a top 5 Apache project and boasts a robust developer community
The Future of Unified Stream and Batch Processing
Four Focus Areas
• Mixed Unification: Use the Unified API and Mix and Switch automagically between Batch and Stream Execution modes, for example when needing to reprocess or backfill data.
• Unified SQL Platform: Add support for other common SQL elements like DELETE, UPDATE, Stored Procedures, Time Travel and unstructured data types.
• Streaming Warehouses: Integrate Streaming and Batch processing with real-time analytics and up-to-date storage, blending traditional data warehouse benefits with instant insights.
• Engine Evolution: Cloud-native, Disaggregated State Backends, New APIs, SQL Gateway, JDBC Driver and much more
Mixed Unification
• Flink supports Batch Execution mode and Streaming Execution mode
• What if you want to do backfill or reprocessing?
  MySQL CDC → phase 1 reads from the bounded snapshot, phase 2 from the unbounded binlog.
  S3 + Kafka (HybridSource) → read historical data from your lake before switching to real-time
A couple of proposals:
• FLIP-327: Support switching from batch to stream mode to improve throughput when processing backlog data
• FLIP-309: Larger checkpointing interval when processing backlog
• FLIP-326: Enhance Watermark to Support Processing-Time Temporal Join
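The bounded-then-unbounded pattern above can be sketched in a few lines. This is plain Python rather than Flink's HybridSource API, and the record contents and the names `hybrid_source`, `lake_files` and `kafka_topic` are hypothetical; it only illustrates the idea of draining historical data before switching to the live feed.

```python
from itertools import chain
from typing import Iterable, Iterator, Tuple


def hybrid_source(historical: Iterable[dict], live: Iterable[dict]) -> Iterator[Tuple[str, dict]]:
    """Yield every historical record first, then continue with live records.

    Each record is tagged with its phase so downstream code can tell
    backlog processing apart from real-time processing.
    """
    bounded = (("backlog", record) for record in historical)
    unbounded = (("live", record) for record in live)
    return chain(bounded, unbounded)


# Hypothetical data: two records from a data lake, two from a message queue.
lake_files = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
kafka_topic = iter([{"id": 3, "amount": 30}, {"id": 4, "amount": 40}])

events = list(hybrid_source(lake_files, kafka_topic))
phases = [phase for phase, _ in events]
```

Tagging the phase is one possible way to let an engine pick cheaper settings while it is still working through the backlog, which is the motivation behind the proposals listed above.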
Unified SQL Platform: New DML Syntax
DELETE FROM user WHERE id = -1;
DELETE FROM user WHERE id > (SELECT count(*) FROM employee);
UPDATE user SET name = 'u1' WHERE id > 10;
UPDATE user SET name = 'u1' WHERE id > (SELECT count(*) FROM employee);
TRUNCATE TABLE user;
CALL `my_cat`.`my_db`.add_user('Martijn', 'Product Manager');
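To see what the row-level statements do, here is a small sketch that runs the same shape of DELETE and UPDATE against an in-memory SQLite database from Python's standard library. SQLite is used only because it is easy to run; it is not Flink, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY);
    INSERT INTO user VALUES (-1, 'invalid'), (1, 'a'), (2, 'b'), (3, 'c'), (11, 'd');
    INSERT INTO employee VALUES (100), (101);
""")

# Row-level DELETE with a literal predicate.
conn.execute("DELETE FROM user WHERE id = -1")

# UPDATE with a scalar subquery: count(*) over employee is 2,
# so only the rows with id 3 and id 11 are renamed.
conn.execute("UPDATE user SET name = 'u1' WHERE id > (SELECT count(*) FROM employee)")

rows = conn.execute("SELECT id, name FROM user ORDER BY id").fetchall()
```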
Unified SQL Platform: Time Travel
-- Read the table as of a past point in time
SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2023-03-19 00:00:00';
-- Read the table as of now
SELECT * FROM t;
SELECT * FROM t FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP;
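One way to picture time travel is a table that keeps every committed snapshot and answers reads with the latest snapshot at or before the requested time. The sketch below assumes exactly that model; `VersionedTable` and its methods are hypothetical names, not a Flink or catalog API.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedTable:
    """Keeps every committed snapshot so reads can ask for an older version."""

    def __init__(self):
        self._commit_times = []  # commit timestamps, in increasing order
        self._snapshots = []     # full table contents at each commit

    def commit(self, when: datetime, rows: list) -> None:
        # Assumes commits arrive in increasing timestamp order.
        self._commit_times.append(when)
        self._snapshots.append(list(rows))

    def as_of(self, when: datetime) -> list:
        # Find the latest snapshot committed at or before `when`.
        index = bisect_right(self._commit_times, when) - 1
        if index < 0:
            raise LookupError("no snapshot exists at or before the requested time")
        return self._snapshots[index]


t = VersionedTable()
t.commit(datetime(2023, 3, 18), [("a", 1)])
t.commit(datetime(2023, 3, 20), [("a", 1), ("b", 2)])

old = t.as_of(datetime(2023, 3, 19))     # sees only the first commit
latest = t.as_of(datetime(2023, 3, 21))  # sees both commits
```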
Streaming Warehouses
• Unified changelog & table representation, originated as FLIP-188: Introduce Built-in Dynamic Table Storage. Now known as Apache Paimon (Incubating)
• Improve OLAP support, such as quicker short-lived jobs, to support OLAP queries with low latency and concurrent execution.
• CBO (cost-based optimization) with statistics
• Make full use of the layout and indexes of the streaming lakehouse to reduce data reading and processing for streaming queries.
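The "unified changelog & table representation" in the first bullet rests on the idea that a table is what you get when you replay its changelog. A minimal sketch, using the four change kinds Flink abbreviates as +I, -U, +U and -D; the function name and the sample orders are assumptions for illustration.

```python
def apply_changelog(changelog):
    """Materialize a keyed table from a stream of (op, key, value) changes."""
    table = {}
    for op, key, value in changelog:
        if op in ("+I", "+U"):
            # Insert, or the "after" image of an update.
            table[key] = value
        elif op in ("-U", "-D"):
            # The "before" image of an update, or a delete.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown change type: {op}")
    return table


changelog = [
    ("+I", "order-1", 10),
    ("+I", "order-2", 5),
    ("-U", "order-1", 10),
    ("+U", "order-1", 12),
    ("-D", "order-2", 5),
]
table = apply_changelog(changelog)
```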
Engine Evolution: Cloud-native, Disaggregated State
• FLIP-423: Disaggregated State Storage and Management
• FLIP-424: Asynchronous State APIs
• FLIP-425: Asynchronous Execution Model
• FLIP-426: Grouping Remote State Access
• FLIP-427: ForSt - Disaggregated State Store
• FLIP-428: Fault Tolerance/Rescale Integration for Disaggregated State
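Why asynchronous access and grouping matter once state lives remotely can be shown with a toy model. The sketch below is not the API proposed in these FLIPs; `RemoteStateStore`, `multi_get` and `process_batch` are hypothetical names, and the sleep stands in for network latency. It only reads state, to keep the example short.

```python
import asyncio


class RemoteStateStore:
    """Stand-in for a remote store; counts round trips to show the effect of grouping."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    async def multi_get(self, keys):
        self.round_trips += 1
        await asyncio.sleep(0.01)  # simulated network latency
        return {key: self._data.get(key) for key in keys}


async def process_batch(store, records):
    # Group all state reads for the batch into a single remote request.
    keys = sorted({record["key"] for record in records})
    state = await store.multi_get(keys)
    # Continue processing once the state has arrived.
    return [
        (record["key"], (state[record["key"]] or 0) + record["delta"])
        for record in records
    ]


store = RemoteStateStore({"a": 1, "b": 10})
records = [
    {"key": "a", "delta": 1},
    {"key": "b", "delta": 2},
    {"key": "a", "delta": 5},
]
results = asyncio.run(process_batch(store, records))
```

Three records cost one round trip here instead of three, which is the intuition behind grouping remote state access.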
Engine Evolution: DataStream API V2
FLIP-408: Introduce DataStream API V2
Problems with the current DataStream API:
1. It exposes internal concepts and implementation details to users.
2. It is a complex API whose primitives correspond to concepts at many different levels.
3. It started for Streaming; Batch was added later.
Engine Evolution: Dynamic Tables
• Flink SQL and Table API have always had the concept of Dynamic Tables
• FLIP-435: Introduce a New Dynamic Table for Simplifying Data Pipelines proposes a new entity
CREATE DYNAMIC TABLE dwd_orders (
  PRIMARY KEY(ds, id) NOT ENFORCED)
DISTRIBUTED BY (ds)
FRESHNESS = INTERVAL '3' MINUTE
AS
SELECT * FROM orders AS o
  LEFT JOIN order_pay AS pay
  ON o.id = pay.order_id AND o.ds = pay.ds
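The FRESHNESS clause can be read as a bound on how far the derived table may lag behind its sources. The sketch below assumes that reading and checks it with plain date arithmetic; `is_stale` and the timestamps are hypothetical, and how an engine actually schedules refreshes is not covered here.

```python
from datetime import datetime, timedelta


def is_stale(last_refresh: datetime, now: datetime, freshness: timedelta) -> bool:
    """True when the derived table lags its sources by more than the freshness target."""
    return now - last_refresh > freshness


freshness = timedelta(minutes=3)  # mirrors FRESHNESS = INTERVAL '3' MINUTE
last_refresh = datetime(2024, 3, 19, 12, 0, 0)

stale_after_2_min = is_stale(last_refresh, datetime(2024, 3, 19, 12, 2, 0), freshness)
stale_after_4_min = is_stale(last_refresh, datetime(2024, 3, 19, 12, 4, 0), freshness)
```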
Engine Evolution: Polymorphic Table Functions
CREATE OR REPLACE PACKAGE BODY dynamic_cols_pkg IS
  FUNCTION get_dynamic_cols(column_list column_list_t)
  RETURN TABLE PIPELINED IS
  BEGIN
    -- Example logic to select different columns based on input
    -- In practice, use dynamic SQL to build and execute the query
    IF column_list.EXISTS(1) THEN
      IF column_list(1) = 'name' THEN
        PIPE ROW ('John Doe');
      ELSIF column_list(1) = 'age' THEN
        PIPE ROW (30);
      END IF;
    END IF;
    RETURN;
  END get_dynamic_cols;
END dynamic_cols_pkg;

SELECT * FROM TABLE(dynamic_cols_pkg.get_dynamic_cols(column_list_t('name')));
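The same idea as the PL/SQL-style example above, a table function whose output columns are decided by its arguments, can be sketched in plain Python. The input rows are made up, and the function only mirrors the name `get_dynamic_cols` from the slide; it is not a Flink API.

```python
from typing import Iterator, List

# Hypothetical input rows, standing in for a table.
PEOPLE = [
    {"name": "John Doe", "age": 30},
    {"name": "Jane Roe", "age": 41},
]


def get_dynamic_cols(rows: List[dict], column_list: List[str]) -> Iterator[dict]:
    """Yield each row projected onto `column_list`.

    The shape of the result is decided by the arguments at call time,
    which is the defining trait of a polymorphic table function.
    """
    for row in rows:
        missing = [column for column in column_list if column not in row]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        yield {column: row[column] for column in column_list}


names_only = list(get_dynamic_cols(PEOPLE, ["name"]))
ages_and_names = list(get_dynamic_cols(PEOPLE, ["age", "name"]))
```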
Flink 2.0
https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
Flink 2.0 is primarily a clean-up
Removed in 2.0
• DataSet API
• Deprecated methods/fields/classes in DataStream API and Table API
• Scala APIs
• Deprecated Source / Sink / TableSource / TableSink interfaces
• Legacy SQL Function and Operator stack
• Old configuration layer
• Java 8 and 11 support
Refactored in 2.0
• Refactor the REST API
• Refactor the Metrics System
• No default-to-Kryo serialization
• Default to Java 17 (or 21)
New in 2.0 (or sooner)
• DataStream API V2
• Dynamic Tables
• Disaggregated State Backend/Management APIs
Flink 2.0 Expected Timeline
• Flink 1.19: released March 2024
• Flink 1.20: four/five months after Flink 1.19 (Jul/Aug 2024)
• Flink 2.0: four/five months after Flink 1.20 (Dec/Jan)
Thank You