Autoscaling with
Apache Flink
Robert Metzger
Staff Engineer @ decodable, Committer and PMC Chair @ Flink
Why Autoscaling?
[Figure: wasted resources]
Source: https://flink.apache.org/2021/05/06/reactive-mode.html
Reasons for changing loads
- Seasonality:
- day / night
- weekend / weekday
- Product popularity: new feature launches, ad campaigns
- Upstream system outages: load spikes during recovery
Solutions in Flink to Rescale
- Flink 1.2 (2017): Rescalable State
- Flink can restore from a savepoint with a different parallelism, so no data is lost and
all computations stay correct
- When used for scaling: requires custom tooling for orchestration and
bookkeeping
- Flink 1.13 (2021): Reactive Mode (beta)
- Flink automatically adjusts when TaskManagers are added or removed
- Requires outside entity to decide on # TaskManagers
- Since Flink 1.15 (2022): Reactive Mode is out of beta
Further reading: https://flink.apache.org/features/2017/07/04/flink-rescalable-state.html
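As background for the "Rescalable State" bullet: the post linked above describes keyed state as being split into a fixed number of key groups, which are then assigned to parallel subtasks in contiguous ranges. The sketch below illustrates that idea in simplified form. The function names are hypothetical, and `zlib.crc32` is only a stand-in for Flink's own hashing.

```python
import zlib


def key_group_for(key: str, max_parallelism: int) -> int:
    # Stand-in hash (assumption): Flink uses its own hash function here.
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def subtask_for_key_group(key_group: int, parallelism: int, max_parallelism: int) -> int:
    # Contiguous ranges of key groups map to one subtask.
    return key_group * parallelism // max_parallelism


def assignment(parallelism: int, max_parallelism: int) -> dict:
    # subtask index -> list of key groups it owns
    owned = {}
    for key_group in range(max_parallelism):
        subtask = subtask_for_key_group(key_group, parallelism, max_parallelism)
        owned.setdefault(subtask, []).append(key_group)
    return owned


if __name__ == "__main__":
    print(assignment(parallelism=2, max_parallelism=8))
    print(assignment(parallelism=4, max_parallelism=8))
```

A key always lands in the same key group, so changing the parallelism only changes which subtask owns each group. That is why a savepoint taken at one parallelism can be restored at another.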
How to use Reactive Mode?
- Reactive Mode works with all standalone deployments
- E.g. Kubernetes, Docker or via the provided deployment scripts
- Set the configuration:
scheduler-mode=reactive
- Start the JobManager, and add as many TaskManagers as you need
- (optionally) Use a service to determine the number of TaskManagers
- Kubernetes Horizontal Pod Autoscaler
- AWS AutoScaling Groups
- Google Cloud Managed Instance Groups
Reactive Mode: How does it work?
Flink automatically adjusts when TaskManagers are added or removed
Example: Load is increasing
[Diagram: one JobManager and two TaskManagers; job parallelism = 2]
Reactive Mode: How does it work?
Flink automatically adjusts when TaskManagers are added or removed
Example: Load is increasing → add more TaskManagers
[Diagram: one JobManager and four TaskManagers, two of them marked NEW; job parallelism = 4]
Reactive Mode: How does it work?
- The JobManager adjusts the job parallelism depending on the number of
available TaskManagers
- When the number of TaskManagers changes, the Flink job restarts and restores from
the latest checkpoint
- Possible metrics for the scaling decision: CPU load / Kafka lag (recommended) / Throughput / latency
- Scaling model similar to Kafka Streams
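The relationship between TaskManagers and job parallelism can be sketched as a small model. This is a simplification, not Flink's scheduler code: it assumes every TaskManager offers the same number of slots, that one slot runs one parallel instance of the whole pipeline, and that the job's max parallelism is an upper bound. All names are hypothetical.

```python
def reactive_parallelism(task_managers: int,
                         slots_per_task_manager: int = 1,
                         max_parallelism: int = 128) -> int:
    # Use every available slot, but never exceed the job's max parallelism.
    available_slots = task_managers * slots_per_task_manager
    return min(available_slots, max_parallelism)


if __name__ == "__main__":
    # The slide example: two TaskManagers, then two more are added.
    for tms in (2, 4):
        print(f"{tms} TaskManagers -> parallelism {reactive_parallelism(tms)}")
```

Each change in the result corresponds to one restart from the latest checkpoint, which is why frequent changes in the number of TaskManagers are costly.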
Reactive Mode example: Kubernetes HPA
- Kubernetes has a built-in
component called
HorizontalPodAutoscaler
- Automatically adjusts the
scale of a deployment based
on a metric
[Diagram: a HorizontalPodAutoscaler (min=1, max=15, cpu=80%) targets the Flink TaskManager Deployment and dynamically adjusts its number of TaskManager pods (three shown); the Flink JobManager runs as a Kubernetes Job with one JobManager pod]
Source: https://flink.apache.org/2021/05/06/reactive-mode.html
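To make the autoscaler's decision concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes HorizontalPodAutoscaler, desired = ceil(current × currentMetric / targetMetric), using the slide's values (min=1, max=15, cpu=80%). The function name is hypothetical, the 10% tolerance is the documented default, and stabilization windows and pod readiness handling are left out.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_percent: float,
                     target_cpu_percent: float = 80,
                     min_replicas: int = 1,
                     max_replicas: int = 15,
                     tolerance: float = 0.10) -> int:
    ratio = current_cpu_percent / target_cpu_percent
    # Within the tolerance band the autoscaler leaves the deployment alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, 100))   # above target: scale up
    print(desired_replicas(4, 40))    # below target: scale down
    print(desired_replicas(10, 160))  # capped at max_replicas
```

In Reactive Mode the result is the new number of TaskManager pods; Flink then adjusts the job parallelism on its own.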
Reactive Mode and Flink Deployments
→ Reactive Mode only works with “standalone mode”
Passive Deployment
Flink resources managed externally (“Standalone
mode”)
→ “a bunch of JVMs”
Deployed on bare metal, Docker, Kubernetes
Pros / Cons:
+ DIY scenarios
+ Fast deployments
- Restart
→ Reactive Scaling (outside entity decides)
Active Deployment
Flink actively manages resources
→ Flink talks to a resource manager
Implementations: Native Kubernetes, YARN
Pros / cons:
+ Automatically restarts failed resources
+ Allocates only required resources
- Requires a lot of K8s permissions
→ Autoscaling (Flink decides)
Autoscaling with Flink? Enter Adaptive
Scheduler
- Benefits
- Flink can make better scaling decisions
- Example: rescale only right after a checkpoint completed → avoid
reprocessing
- Fewer components required (“batteries included”)
- How?
- Reactive Mode is based on a new (Flink 1.13) internal workload scheduler,
called Adaptive Scheduler.
- Currently configured to behave “reactively”, can also be changed to
automatic
Internals: Adaptive Scheduler
[Diagram: the Adaptive Scheduler sends its requirements ("I need 15 slots") to the SlotManager inside the ResourceManager (active K8s / YARN), which answers "I have 8 slots"]
Source / Further reading: https://cwiki.apache.org/confluence/display/FLINK/FLIP-160%3A+Adaptive+Scheduler
https://cwiki.apache.org/confluence/display/FLINK/FLIP-138%3A+Declarative+Resource+management
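The "I need 15 slots / I have 8 slots" exchange can be illustrated with a small model: the scheduler declares what it would like and then runs with whatever it gets, instead of waiting for the full request. This is a simplified sketch under the assumption that all operators share slots, so the job needs as many slots as its widest operator. The operator names and functions are hypothetical.

```python
from typing import Dict, Optional


def slots_needed(operator_parallelism: Dict[str, int]) -> int:
    # With slot sharing, the widest operator determines the slot demand.
    return max(operator_parallelism.values())


def fit_to_available(operator_parallelism: Dict[str, int],
                     available_slots: int) -> Optional[Dict[str, int]]:
    # Without any slot the job cannot run; the scheduler keeps waiting.
    if available_slots < 1:
        return None
    # Cap every operator at the number of slots that are actually there.
    return {name: min(parallelism, available_slots)
            for name, parallelism in operator_parallelism.items()}


if __name__ == "__main__":
    job = {"source": 4, "map": 15, "sink": 2}
    print("I need", slots_needed(job), "slots")
    print("I have 8 slots ->", fit_to_available(job, 8))
```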
Adaptive Scheduler for Autoscaling (future)
[Diagram: a Pluggable Autoscaler drives the requirements of the Adaptive Scheduler ("I need x slots"), which are sent to the SlotManager inside the ResourceManager (active K8s / YARN); it answers "I have 8 slots"]
Source / Further reading: https://cwiki.apache.org/confluence/display/FLINK/FLIP-160%3A+Adaptive+Scheduler
https://cwiki.apache.org/confluence/display/FLINK/FLIP-138%3A+Declarative+Resource+management
Ideas for autoscaler implementations
- REST Interface
- Set desired parallelism via REST call to JobManager
- Either for the entire job (letting the JM decide on per-operator parallelism) or per operator
- User Code + provided autoscaling strategies
- User provides Flink with custom scaling logic that has access to metrics
- Problem: we want to avoid user-code on the JobManager
- JobGraph configuration
- Users configure min, target, max parallelism per operator
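As an illustration of the "min, target, max parallelism per operator" idea combined with a metric-driven strategy, the sketch below shows what a pluggable scaling policy could look like. Nothing here is an existing Flink API: the class, the function, and the Kafka lag thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelismBounds:
    minimum: int
    target: int
    maximum: int


def next_parallelism(current: int,
                     bounds: ParallelismBounds,
                     lag_records: int,
                     lag_high: int = 100_000,
                     lag_low: int = 1_000) -> int:
    if lag_records > lag_high:
        # Falling behind: double the parallelism.
        proposed = current * 2
    elif lag_records < lag_low:
        # Caught up: move back toward the configured target.
        proposed = max(bounds.target, current // 2)
    else:
        proposed = current
    # The user-configured bounds always win.
    return max(bounds.minimum, min(bounds.maximum, proposed))


if __name__ == "__main__":
    bounds = ParallelismBounds(minimum=2, target=4, maximum=16)
    print(next_parallelism(4, bounds, lag_records=500_000))
    print(next_parallelism(8, bounds, lag_records=10))
```

Keeping the policy a pure function of metrics and bounds sidesteps the concern from the slide: such logic could run outside the JobManager and hand over only the resulting number, for example through the REST idea listed first.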
Closing remarks
- Autoscaling with Flink is possible today, it’s called
“Reactive Mode” :-)
- Getting started guide:
https://flink.apache.org/2021/05/06/reactive-mode.html
- Limitations of Adaptive Scheduler / Reactive Mode
- Only works with Application Mode
- Task local recovery not yet supported
- Lack of good UI support (history of rescale events)
Questions?
rmetzger@decodable.co / rmetzger@apache.org
@rmetzger_
2022
Build real-time data apps &
services. Fast.
decodable.co

Autoscaling Flink with Reactive Mode

Editor's Notes

  • #3: Space between actual load and # of workers == wasted resources. You want your resource allocation to be close to actual load.
  • #5: Rescalable state: stop with savepoint, restore. Good when scaling manually and very rarely. Reactive Mode == Kafka Streams deployment model.
  • #6: Rescalable state: stop with savepoint, restore. Good when scaling manually and very rarely. Reactive Mode == Kafka Streams deployment model.
  • #7: How does Reactive Mode work?
  • #8: “Just add more hardware”
  • #9: Rescaling is the same operation as failure recovery: restore from the latest checkpoint. Can be expensive with large state … only rescale rarely!
  • #10: Example implementation in Kubernetes, the most popular deployment option of Flink at the moment.
  • #11: Relationship of scaling and deployment modes. Passive deployment: manually launch the Flink components (K8s HA also works here!). Active deployment: Flink takes care of the launch itself (mostly).
  • #13: Blue line / states: interesting path. Source code:
      hide empty description
      skinparam monochrome false
      skinparam defaultFontSize 15
      [*] -> Created
      Created --> Waiting : Start scheduling
      state "Waiting for resources" as Waiting #lightblue
      state Executing #lightblue
      state Restarting #lightblue
      Waiting --> Waiting : Resources are not stable yet
      Waiting -[#blue,bold]-> Executing : Resources are stable
      Waiting --> Finished : Cancel, suspend or not \nenough resources
      Executing --> Canceling : Cancel
      Executing --> Failing : Unrecoverable fault
      Executing --> Finished : Suspend terminal state
      Executing -[#blue,bold]-> Restarting : Recoverable fault
      Restarting --> Finished : Suspend
      Restarting --> Canceling : Cancel
      Restarting -[#blue,bold]-> Waiting : Cancelation complete
      Canceling --> Finished : Cancelation complete
      Failing --> Finished : Failing complete
      Finished -> [*]
    https://www.planttext.com/?text=RPB1RiCW38RlF8NLOxM-m0wxLEi3h9fsw7PmYTim4OZ0JEtRpoHbB2YdHFYp_zy_zAOZe67aEtGKTJ0Z6--KEcs_OFS2-q38rAd75tPoze66ZRl2CnmP0qFKFNN9of6AB1Hi2d7n0G95duAck06CfLSLOZdlhR20WS1vcSrujWHtuaNBwurqMcsQ6nRmmJWJnQAmUtIQx1F454To7OY_h4BEfsiFd-xFx6ITYeggUddWF6LMd_yRu83cKNwNaTh_K9ZMk62otBBLtR6w-lPdIGvpii0K1kFGmfHkqoxRvqieKRHQ_yhhOYsnibj3rEkQwvWV36W_Z9R4NXsmcdr3bwGQjXnNhjI4awVv2m00
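The state diagram whose PlantUML source appears in note #13 can also be read as a transition table. The sketch below transcribes it into Python for experimentation. The state names come from the diagram; the event names are hypothetical labels derived from the edge captions, and this is not Flink's implementation.

```python
# (state, event) -> next state, transcribed from the diagram source.
TRANSITIONS = {
    ("Created", "start_scheduling"): "Waiting",
    ("Waiting", "resources_not_stable"): "Waiting",
    ("Waiting", "resources_stable"): "Executing",
    ("Waiting", "cancel_suspend_or_not_enough_resources"): "Finished",
    ("Executing", "cancel"): "Canceling",
    ("Executing", "unrecoverable_fault"): "Failing",
    ("Executing", "suspend_or_terminal_state"): "Finished",
    ("Executing", "recoverable_fault"): "Restarting",
    ("Restarting", "suspend"): "Finished",
    ("Restarting", "cancel"): "Canceling",
    ("Restarting", "cancelation_complete"): "Waiting",
    ("Canceling", "cancelation_complete"): "Finished",
    ("Failing", "failing_complete"): "Finished",
}


def run(events, state="Created"):
    # Apply events in order; reject anything the diagram does not allow.
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {event}")
        state = TRANSITIONS[key]
    return state


if __name__ == "__main__":
    # The blue path: run, hit a recoverable fault, wait for resources, run again.
    blue_path = ["start_scheduling", "resources_stable", "recoverable_fault",
                 "cancelation_complete", "resources_stable"]
    print(run(blue_path))
```

Going by note #9, a rescale appears to follow the same blue path as a recoverable fault, which matches the job restarting from the latest checkpoint whenever the number of TaskManagers changes.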