Presented By:
Kundan Kumar
Software Consultant
Stateful Stream
Processing with
Apache Flink
Lack of etiquette and manners is a huge turn off.
KnolX Etiquettes
Punctuality
Respect KnolX session timings. Please do not join a session
more than 5 minutes after its start time.
Feedback
Make sure to submit constructive feedback for all sessions, as it is
very helpful for the presenter.
Mute
Be on mute until you have
questions or concerns.
Agenda
01 What is stateful stream processing
02 Flink's take on stateful stream processing
03 Demo
What is Stateful Stream Processing?
Streaming and Stream Processing:
Stream processing is the processing of data in motion, or in other words, computing on data directly
as it is produced or received.
The systems that receive and send the data streams and execute the application or analytics logic are
called stream processors.
Stateful Stream Processing:
Stateful stream processing is a subset of stream processing in which the computation maintains
contextual state. This state is used to store information derived from previously seen events.
Stateful stream processing means that state is shared between events (stream entities), so
past events can influence the way current events are processed.
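To make the distinction concrete, here is a minimal sketch in plain Python (not the Flink API). The function names and the sample events are made up for illustration. The first function is stateless, the second keeps a running total per key, so earlier events change what later events produce.

```python
from collections import defaultdict

def stateless_double(events):
    """Stateless: each output depends only on the current event."""
    return [value * 2 for _, value in events]

def stateful_running_total(events):
    """Stateful: each output depends on the current event and on stored state."""
    totals = defaultdict(int)  # the "state": one running total per key
    out = []
    for key, value in events:
        totals[key] += value   # past events for this key influence the result
        out.append((key, totals[key]))
    return out

events = [("a", 1), ("b", 5), ("a", 2), ("a", 3)]
print(stateless_double(events))        # [2, 10, 4, 6]
print(stateful_running_total(events))  # [('a', 1), ('b', 5), ('a', 3), ('a', 6)]
```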
Flink's take on stateful stream processing
Flink in a nutshell:
● Apache Flink is a Big Data framework and distributed processing engine for stateful
computations over unbounded and bounded data streams.
➢ A Flink application may consume real-time data from streaming sources such as
message queues or distributed logs, like Apache Kafka or Kinesis.
➢ Flink can also consume bounded, historic data from a variety of data sources.
➢ The streams of results being produced by a Flink application can be sent to a wide
variety of systems that can be connected as sinks.
➢ Key properties: fast, in-memory, scalable, large state, fault-tolerant, event-time processing, exactly-once.
Source → Transformations → Sink
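The source, transformations and sink shape can be sketched with plain Python generators. This is only an illustration of the dataflow idea, not Flink code. The input records and the filter-and-square logic are assumptions chosen for the example.

```python
def source():
    """Source: emits raw records (a fixed list standing in for a stream)."""
    yield from ["3", "x", "4", "5"]

def transformations(records):
    """Transformations: drop non-numeric records, then square the rest."""
    for record in records:
        if record.isdigit():
            yield int(record) ** 2

def sink(records, target):
    """Sink: writes results to an external system (here, a plain list)."""
    for record in records:
        target.append(record)

output = []
sink(transformations(source()), output)
print(output)  # [9, 16, 25]
```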
➢ Programs in Flink are inherently parallel and distributed.
➢ During execution, a stream has one or more stream partitions, and each
operator has one or more operator subtasks.
➢ Flink facilitates stateful operations.
➢ The handling of the current event can depend on the accumulated effect of all the events that
came before it.
➢ The set of parallel instances of a stateful operator is effectively a sharded key-value
store. Each parallel instance is responsible for handling events for a specific group of
keys, and the state for those keys is kept locally.
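The "sharded key-value store" idea can be sketched as follows. This is a simplified illustration in plain Python, not Flink's implementation. The constants, the CRC32 hash and the class name are assumptions. Flink's documentation describes key groups as the unit by which keyed state is redistributed, and the arithmetic below only mimics that idea.

```python
import zlib

MAX_PARALLELISM = 128  # number of key groups (assumed value)
PARALLELISM = 4        # number of parallel instances (assumed value)

def key_group(key):
    # Deterministic hash; a stand-in for whatever hash a real engine uses.
    return zlib.crc32(key.encode("utf-8")) % MAX_PARALLELISM

def instance_for(key):
    # Contiguous ranges of key groups are owned by one parallel instance.
    return key_group(key) * PARALLELISM // MAX_PARALLELISM

class CountingInstance:
    """One parallel instance: holds state only for the keys routed to it."""
    def __init__(self):
        self.state = {}

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1
        return self.state[key]

instances = [CountingInstance() for _ in range(PARALLELISM)]
for key in ["user-1", "user-2", "user-1", "user-3", "user-1"]:
    instances[instance_for(key)].process(key)

owner = instances[instance_for("user-1")]
print(owner.state["user-1"])  # 3: every "user-1" event reached the same instance
```

Because the routing is a pure function of the key, all events for one key meet the same local state, and no instance needs to consult another.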
Stateful stream processing with Apache Flink
States in Flink
➢ Operator State: state maintained per parallel operator instance rather than per key. It is
used mainly in source and sink implementations.
➢ Keyed State: state maintained per key on a keyed stream. Each key has its own state, which
behaves like an embedded key-value store.
➢ Broadcast State: a special type of operator state in which the records of one stream are
broadcast to all downstream tasks that need access to those records.
➢ Queryable State: a feature that allows client APIs to query job state from outside Flink.
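The difference between keyed state and broadcast state can be sketched like this. It is a plain-Python illustration under stated assumptions, not the Flink API. The `Task` class, the `route` function and the "limit" rule are hypothetical. A rule is broadcast to every task, while per-key counts live only on the task that owns the key.

```python
class Task:
    """One parallel task holding both kinds of state."""
    def __init__(self):
        self.broadcast_state = {}  # identical on every task
        self.keyed_state = {}      # only the keys routed to this task

    def on_broadcast(self, name, value):
        self.broadcast_state[name] = value

    def on_event(self, key):
        count = self.keyed_state.get(key, 0) + 1
        self.keyed_state[key] = count
        limit = self.broadcast_state.get("limit")
        # Alert when this key has been seen more often than the broadcast limit.
        return limit is not None and count > limit

tasks = [Task(), Task()]

def route(key):
    # Hypothetical deterministic routing: byte sum modulo the number of tasks.
    return sum(key.encode("utf-8")) % len(tasks)

# The rule stream is delivered to every task.
for task in tasks:
    task.on_broadcast("limit", 2)

# The event stream is partitioned by key.
alerts = [(key, tasks[route(key)].on_event(key))
          for key in ["alice", "bob", "alice", "alice"]]
print(alerts[-1])  # ('alice', True): the third "alice" event exceeds the limit
```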
Stateful streaming application in Flink
State Backends
1. Memory state backend:
➢ This is the default backend used by Flink when nothing is configured (as of Flink 1.12,
the release these slides reference).
➢ Keeps the data on the heap of each task manager.
➢ This backend should never be used in production jobs.
➢ It stores checkpoints (backups of the state) in the job manager's memory, which puts
unnecessary pressure on the job manager's operational stability.
2. File System Backend
➢ This backend is similar to the Memory state backend, except that it stores the backup on a
filesystem rather than in the job manager's memory.
➢ The filesystem can be task manager's local filesystem or a durable store such as
HDFS/S3.
3. RocksDB backend
➢ This backend uses RocksDB (developed at Facebook) to store the data.
➢ RocksDB maintains an in-memory table (the memtable) along with Bloom filters, so reading
recent data is extremely fast.
➢ Each task manager maintains its own RocksDB files, and this state is checkpointed to a
durable store such as HDFS/S3.
➢ This is the only backend that supports incremental checkpointing, i.e. backing up only the
data modified since the last checkpoint rather than the complete state.
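The idea behind incremental checkpointing can be sketched with dirty-key tracking. This is a deliberate simplification and an assumption of this example: the RocksDB backend works at the level of its on-disk files rather than individual keys, and deletions are ignored here. The class and function names are hypothetical.

```python
class StateStore:
    """Toy key-value state that remembers which keys changed."""
    def __init__(self):
        self.data = {}
        self.dirty = set()  # keys modified since the last checkpoint

    def put(self, key, value):
        self.data[key] = value
        self.dirty.add(key)

    def full_checkpoint(self):
        """Back up the complete state."""
        self.dirty.clear()
        return dict(self.data)

    def incremental_checkpoint(self):
        """Back up only the entries modified since the last checkpoint."""
        delta = {key: self.data[key] for key in self.dirty}
        self.dirty.clear()
        return delta

def restore(full, deltas):
    """Rebuild state from one full backup plus the deltas, oldest first."""
    state = dict(full)
    for delta in deltas:
        state.update(delta)
    return state

store = StateStore()
store.put("a", 1)
store.put("b", 2)
base = store.full_checkpoint()
store.put("b", 20)
delta1 = store.incremental_checkpoint()
store.put("c", 3)
delta2 = store.incremental_checkpoint()

print(delta1)                                         # {'b': 20}
print(restore(base, [delta1, delta2]) == store.data)  # True
```

The trade-off is visible even in this toy: each incremental checkpoint is small, but recovery has to combine several of them.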
Checkpointing
Checkpoint: a specific marked point in each input stream from which the stream can be
replayed. Flink implements it by periodically persisting the state of all stateful operators
to a reliable storage system.
Stream Barriers: lightweight stream markers with unique IDs. Flink injects them into the
input streams, and they flow in line with the records.
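Why a checkpoint pairs a stream position with operator state can be seen in a small recovery sketch. This is plain Python under simplifying assumptions (a single operator, an in-memory list as the replayable input, a hypothetical `run` function), not Flink code. After a simulated failure, the state is restored and the input is rewound to the checkpointed offset, so every record is reflected in the state exactly once.

```python
import copy

def run(stream, checkpoint_every, fail_at=None):
    """Sum a stream, checkpointing periodically; optionally fail once and recover."""
    state = {"total": 0}
    checkpoint = {"offset": 0, "state": copy.deepcopy(state)}
    offset = 0
    failed = False
    while offset < len(stream):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Recovery: restore the saved state and replay from the saved offset.
            state = copy.deepcopy(checkpoint["state"])
            offset = checkpoint["offset"]
            continue
        state["total"] += stream[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            # Persist the input position together with the operator state.
            checkpoint = {"offset": offset, "state": copy.deepcopy(state)}
    return state["total"]

stream = [1, 2, 3, 4, 5]
print(run(stream, checkpoint_every=2))             # 15
print(run(stream, checkpoint_every=2, fail_at=3))  # 15, despite the failure
```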
Checkpointing mechanism
Aligned Checkpointing-
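Barrier alignment for an operator with several input channels can be sketched as below. This is an illustrative simplification in plain Python, not Flink's implementation. It assumes only one checkpoint is in flight at a time, and the `aligned_operator` function and the arrival list are made up for the example. Once a barrier arrives on one channel, later records from that channel are held back until the barrier has arrived on every channel. Only then is the state snapshotted.

```python
BARRIER = "BARRIER"

def aligned_operator(arrivals, num_channels):
    """Sum records from several channels, snapshotting when barriers align."""
    total = 0
    blocked = set()  # channels whose barrier has already arrived
    buffered = []    # records held back during alignment
    snapshots = []
    for channel, item in arrivals:
        if item == BARRIER:
            blocked.add(channel)
            if len(blocked) == num_channels:
                # All barriers are here: the state covers exactly the
                # pre-barrier records of every channel.
                snapshots.append(total)
                blocked.clear()
                # Resume: process what was held back during alignment.
                for value in buffered:
                    total += value
                buffered.clear()
        elif channel in blocked:
            buffered.append(item)
        else:
            total += item
    return total, snapshots

arrivals = [(0, 1), (1, 10), (0, BARRIER), (0, 2),
            (1, 20), (1, BARRIER), (1, 30), (0, 3)]
print(aligned_operator(arrivals, num_channels=2))  # (66, [31])
```

The snapshot value 31 is 1 + 10 + 20: the record 2 arrived on channel 0 after its barrier, so it was buffered and is not part of the checkpoint.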
Unaligned Checkpointing-
Demo
Q/A
References
1. https://flink.apache.org
2. https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/stateful-stream-processing.html#unaligned-checkpointing
3. Book: Learning Apache Flink by Tanmay Deshpande
Thank You !