The Apache Flink® Conference
San Francisco 2022
August 3
Using the New Apache Flink
Kubernetes Operator in a
Production Deployment
Jim Busche, IBM
Ted Chang, IBM
Agenda
1. Introduction
○ Problems we are solving.
2. Overview of Kubernetes operators and their benefits.
○ Five levels of the operator maturity model.
○ Introduce the newly released Apache Flink Kubernetes Operator.
3. Flink Kubernetes Operator Container Image modifications
○ UBI images
○ IBM Java
4. Enhancements we're making in:
○ Versioning/Upgradeability/Stability
○ Security
5. Demo: OLM-managed custom Flink Operator in action
6. Q&A
Problems We Are Solving
❖ How to support products that require
different Flink versions?
❖ How to upgrade a Flink Operator
running in production?
❖ How to make Flink more secure?
❖ Reduce duplicate effort among
What are Kubernetes operators?
The Operator pattern aims to
❖ Capture the key aim of a human
operator who is managing a
service or set of services.
➢ Operands: The Managed
Services. Typically backing
services.
❖ Have deep knowledge of how the
service ought to behave and how to
deploy it.
❖ React if there are problems.
“The Twelve-Factor App: IV. Backing services” https://12factor.net/backing-services
How are operators helpful?
An operator makes a service self-managing
❖ Embeds domain-specific knowledge about the service and its lifecycle
❖ Installs the service, upgrades it, keeps it running, tracks metrics, etc.
Hybrid cloud: The service is hosted in the customer’s cluster
❖ The customer is responsible for managing the service
❖ Customer may lack staff, skills, interest
❖ A service that manages itself is much more appealing
Operator Maturity Model
Level 1: Basic Install
Level 2: Seamless Upgrades
Level 3: Full Lifecycle
Level 4: Deep Insights
Level 5: Auto Pilot
https://docs.openshift.com/container-platform/4.10/operators/understanding/olm-what-operators-are.html#olm-maturity-model_olm-what-operators-are
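The five levels build on each other, so an operator is usually rated at the highest level it reaches without skipping a lower one. A minimal sketch of that idea follows; the capability tags (`install`, `upgrade`, and so on) are hypothetical labels chosen for illustration, not part of any official scoring tool.

```python
from enum import IntEnum
from typing import Iterable


class OperatorLevel(IntEnum):
    """The five capability levels named on the slide."""
    BASIC_INSTALL = 1
    SEAMLESS_UPGRADES = 2
    FULL_LIFECYCLE = 3
    DEEP_INSIGHTS = 4
    AUTO_PILOT = 5


# Hypothetical capability tags, one per level (illustrative only).
REQUIRED_CAPABILITY = {
    OperatorLevel.BASIC_INSTALL: "install",
    OperatorLevel.SEAMLESS_UPGRADES: "upgrade",
    OperatorLevel.FULL_LIFECYCLE: "backup_restore",
    OperatorLevel.DEEP_INSIGHTS: "metrics",
    OperatorLevel.AUTO_PILOT: "autoscale",
}


def maturity_level(capabilities: Iterable[str]) -> int:
    """Return the highest level reached without skipping a lower one (0 if none)."""
    have = set(capabilities)
    reached = 0
    for level in OperatorLevel:  # iterates in definition order, 1 through 5
        if REQUIRED_CAPABILITY[level] not in have:
            break
        reached = int(level)
    return reached


print(maturity_level(["install", "upgrade", "backup_restore", "metrics"]))  # 4
print(maturity_level(["install", "metrics"]))  # 1
```

The second call returns 1 because metrics alone do not count when upgrades and lifecycle management are missing.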
Apache Flink Kubernetes Operator and CR
The Flink Kubernetes Operator extends the Kubernetes API with the ability to manage and operate Flink Deployments. The operator
features the following amongst others:
● Deploy and monitor Flink Application and Session deployments
● Upgrade, suspend and delete deployments
● Full logging and metrics integration
● Flexible deployments and native integration with Kubernetes tooling
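To make the custom resource concrete, here is a minimal sketch that builds a FlinkDeployment manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The field layout follows the operator's published basic example as we understand it; the resource name, image tag, service account, memory sizes, and jar path are assumptions to adapt to your cluster and operator version.

```python
import json


def flink_deployment(name: str, image: str, jar_uri: str, parallelism: int = 2) -> dict:
    """Build a FlinkDeployment custom resource for an application-mode job."""
    return {
        "apiVersion": "flink.apache.org/v1beta1",
        "kind": "FlinkDeployment",
        "metadata": {"name": name},
        "spec": {
            "image": image,
            "flinkVersion": "v1_15",  # assumed Flink version
            "flinkConfiguration": {"taskmanager.numberOfTaskSlots": "2"},
            "serviceAccount": "flink",  # assumed service account name
            "jobManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "taskManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "job": {
                "jarURI": jar_uri,
                "parallelism": parallelism,
                "upgradeMode": "stateless",
            },
        },
    }


manifest = flink_deployment(
    name="basic-example",  # illustrative name
    image="flink:1.15",
    jar_uri="local:///opt/flink/examples/streaming/StateMachineExample.jar",
)
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it asks the operator to create and manage one Flink application with that name.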
Benefit of Using the Flink Kubernetes Operator
The operator handles the following
tasks declaratively using the
CR
❖ Deploying jobs
❖ Configuration
❖ Upgrade
❖ Backup
❖ Restore
❖ High Availability
❖ Metrics Monitoring
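"Declaratively" here means you change the desired state in the CR and the operator reconciles the running job to match. The sketch below shows two such changes as JSON merge patches and applies them locally to a sample spec to show the result. The `merge_patch` helper is our own illustration of merge-patch semantics, the image tags are hypothetical, and the `savepoint` upgrade mode assumes savepoint storage is already configured for the job.

```python
import json


def merge_patch(target: dict, patch: dict) -> dict:
    """Apply a JSON merge patch (RFC 7386 semantics) and return a new dict."""
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # a null value removes the key
        elif isinstance(value, dict):
            current = result.get(key)
            base = current if isinstance(current, dict) else {}
            result[key] = merge_patch(base, value)
        else:
            result[key] = value
    return result


# Sample desired state (illustrative values only).
current = {
    "spec": {
        "image": "flink:1.15",
        "job": {"jarURI": "local:///opt/flink/usrlib/job.jar", "state": "running"},
    }
}

# Suspend the job: the operator stops it but keeps the resource.
suspend = {"spec": {"job": {"state": "suspended"}}}

# Upgrade the image and resume, asking for a savepoint-based upgrade.
upgrade = {
    "spec": {
        "image": "flink:1.15.1",  # hypothetical new tag
        "job": {"upgradeMode": "savepoint", "state": "running"},
    }
}

suspended = merge_patch(current, suspend)
upgraded = merge_patch(suspended, upgrade)
print(json.dumps(suspend))
print(json.dumps(upgraded, indent=2))
```

On a cluster, the same patch body could be sent with `kubectl patch flinkdeployment <name> --type merge -p '<json>'`, or the YAML file could simply be edited and re-applied.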
Flink in our products
Watson AIOps
Flink is used by the log anomaly
detection data prep pipeline and for
the event lifecycle.
https://www.ibm.com/cloud/blog/watson-aiops-bringing-ai-to-it-operations-management
Business Automation Insights
Data ingestion and processing
rely on the Apache Flink data
processing framework
Software Compliance in Production
Before we can bring Flink into our products, certain requirements
must be met. For example:
✓ Platform: Red Hat OpenShift
✓ Container Image: UBI
✓ Security
✓ Package Management: Operator
Lifecycle Manager (OLM)
Red Hat OpenShift
❖ Enterprise-grade Kubernetes features,
including 24/7 support
❖ Built-in Security Context Constraints (SCC)
provide default execution policies,
raising the security level of the entire Kubernetes
cluster.
❖ Role-based access control (RBAC) in OpenShift
is a non-optional feature, enabling role-based
permissions as required.
❖ Containers are required to run as non-root
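One way to catch a root-running container before it reaches the cluster is a static check on the pod spec. The following is only a lint-style sketch under our own assumptions: it looks at the standard `securityContext.runAsNonRoot` field at pod and container level, and it does not replace the enforcement that OpenShift's SCCs perform at admission time. The sample pod specs are made up for the example.

```python
def runs_as_non_root(pod_spec: dict) -> bool:
    """Return True only if every container is explicitly marked runAsNonRoot.

    A container-level securityContext overrides the pod-level one.
    """
    pod_level = pod_spec.get("securityContext", {}).get("runAsNonRoot")
    for container in pod_spec.get("containers", []):
        container_level = container.get("securityContext", {}).get("runAsNonRoot")
        effective = container_level if container_level is not None else pod_level
        if effective is not True:
            return False
    return True


# Illustrative pod specs (names and images are hypothetical).
compliant = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [{"name": "operator", "image": "example.com/flink-operator:dev"}],
}
non_compliant = {
    "containers": [{"name": "operator", "image": "example.com/flink-operator:dev"}],
}

print(runs_as_non_root(compliant))      # True
print(runs_as_non_root(non_compliant))  # False
```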
Dockerfile modifications
❖ If building your own image, make sure you apply
the latest OS updates to get the latest security patches.
For example:
➢ https://github.com/apache/flink-kubernetes-operator/blob/main/Dockerfile#L62
➢ Change ARG SKIP_OS_UPDATE=true to
ARG SKIP_OS_UPDATE=false
❖ You can use or create your own Dockerfile and base
image. For example, we used the Red Hat UBI
image with IBM Java 11.
❖ What is the benefit of swapping in the UBI base?
It suits production environments that require an
enterprise version of Linux.
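If the Dockerfile change needs to be repeatable, for example in a build pipeline, the ARG default can be flipped with a small script. This is a sketch under stated assumptions: the `set_arg_default` helper is our own, and the sample Dockerfile text is a stand-in, not the contents of the upstream file.

```python
import re


def set_arg_default(dockerfile_text: str, name: str, value: str) -> str:
    """Rewrite the default of `ARG <name>=...` lines; fail if no such line exists."""
    pattern = re.compile(rf"^(ARG\s+{re.escape(name)})=.*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}={value}", dockerfile_text)
    if count == 0:
        raise ValueError(f"no 'ARG {name}=' line found")
    return new_text


# Stand-in Dockerfile text (hypothetical, for illustration only).
sample = "FROM example-base:latest\nARG SKIP_OS_UPDATE=true\nRUN echo build\n"

patched = set_arg_default(sample, "SKIP_OS_UPDATE", "false")
print(patched)
```

Since the value is a build ARG, passing `--build-arg SKIP_OS_UPDATE=false` to `docker build` should have the same effect without editing the file.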
Security Pipeline
❖ Tekton-based security scan pipeline running the following:
➢ Twistlock vulnerability and compliance
■ Scans containers for vulnerabilities.
➢ WhiteSource/Mend
■ Scans source code for known vulnerabilities in open-source
software (OSS) dependencies and for compliance.
■ Recommends fixes.
➢ Scorecard
scorecard --local=./flink-kubernetes-operator
RESULTS
-------
Aggregate score: 5.0 / 10
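A pipeline step can turn the Scorecard report into a pass/fail gate by reading the aggregate score line. The sketch below parses the output format shown on the slide; the helper names and the 7.0 threshold are assumptions for illustration, not part of the authors' pipeline.

```python
import re


def aggregate_score(report: str) -> float:
    """Extract the number from an 'Aggregate score: X / 10' line."""
    match = re.search(r"Aggregate score:\s*([0-9]+(?:\.[0-9]+)?)\s*/\s*10", report)
    if match is None:
        raise ValueError("no aggregate score found in report")
    return float(match.group(1))


def passes_gate(report: str, minimum: float) -> bool:
    """Return True when the aggregate score meets the chosen minimum."""
    return aggregate_score(report) >= minimum


# The report text as shown on the slide.
report = "RESULTS\n-------\nAggregate score: 5.0 / 10\n"

print(aggregate_score(report))   # 5.0
print(passes_gate(report, 7.0))  # False (7.0 is a hypothetical threshold)
```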
Operator Lifecycle Manager
Open source toolkit to manage
Operators in a Kubernetes Cluster
❖ Over-the-Air Updates and Catalogs
❖ Dependency Model
❖ Discoverability
❖ Cluster Stability
❖ Declarative UI controls
Demo of the Apache Flink Kubernetes Operator in action
See the prepared recording
❖ Demo steps:
➢ Apply the CatalogSource (catsrc)
➢ Apply the Subscription (recommended: installPlanApproval: Manual)
➢ Approve the install plan
➢ Check the ClusterServiceVersion (csv) and the Flink Kubernetes Operator
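The OLM steps listed above can be sketched as three manifests: a CatalogSource, a Subscription with manual approval, and a patch that approves the resulting install plan. The API group and field names are the standard OLM ones as we understand them; the catalog image, namespaces, package name, and channel are placeholders, not the values used in the recorded demo.

```python
import json


def catalog_source(name: str, namespace: str, index_image: str) -> dict:
    """CatalogSource pointing OLM at a catalog (index) image."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "CatalogSource",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"sourceType": "grpc", "image": index_image},
    }


def subscription(name: str, namespace: str, package: str, channel: str,
                 source: str, source_namespace: str) -> dict:
    """Subscription with manual approval, so upgrades wait for a human."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": package,
            "channel": channel,
            "source": source,
            "sourceNamespace": source_namespace,
            "installPlanApproval": "Manual",
        },
    }


# Merge patch that approves a pending InstallPlan.
approve_patch = {"spec": {"approved": True}}

# All concrete values below are placeholders.
catsrc = catalog_source("flink-catalog", "openshift-marketplace",
                        "example.com/flink-operator-catalog:latest")
sub = subscription("flink-operator", "flink", "flink-kubernetes-operator",
                   "alpha", "flink-catalog", "openshift-marketplace")

for doc in (catsrc, sub, approve_patch):
    print(json.dumps(doc, indent=2))
```

With `installPlanApproval: Manual`, OLM creates the install plan but waits; applying the approval patch with `kubectl patch installplan <name> --type merge -p '{"spec":{"approved":true}}'` lets the install proceed, after which `kubectl get csv` should list the operator.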
Summary and Q&A
- Q&A
- Thank you for attending!

Editor's Notes

  • #8 (Operator Maturity Model): This is from the Operator Framework website, which is also where the Operator SDK comes from. These are the operator capability levels, and there are five of them. The higher an operator's level, the more sophisticated it is and the more it can do for you. At Level 1 the operator can only install the service. Even that is valuable: the operations team, the SREs, and the cluster admins do not have to learn how to install your service, only how to install the operator, which is easy because all operators install the same way. Level 2 is seamless upgrades: the operator should be able to upgrade a stateful workload. The operator is not doing anything magic here; the workload itself has to be upgradable to begin with. If upgrading the workload from version 1 to version 2 is possible but very manual, you can add that logic to the operator, which makes it a Level 2 operator. Level 3 manages the full lifecycle. Because these are stateful workloads, a Level 3 operator should be able to back up and restore the service, and when there is a failure it should recover from the crash and resume where it left off. At Level 4 the operator gathers metrics on the service and tells you how it is doing, optionally in a dashboard. Level 5 is auto scaling: the operator grows and shrinks the service according to the workload. There are three main ways to implement an operator: Helm, Ansible, or Go. Something like 70% of operators use Go.
  • #9 (Apache Flink Kubernetes Operator and CR): With the basics of operators in mind, the new, official Flink Kubernetes Operator is easier to explain. It is the operator that deploys and manages Flink applications in a Kubernetes or OpenShift cluster. It is implemented with the Java Operator SDK because Flink itself is written in Java and the Flink community has many experienced Java developers. There are third-party Flink operators written in Go, but some are either not maintained or not well integrated for managing Flink applications, again because Flink is written in Java. Once the operator is installed in your Kubernetes cluster, deploying a Flink application becomes easy. First, define a FlinkDeployment custom resource (CR): a YAML file that gives the name and the spec of the Flink job you want the operator to install and manage. Then apply the YAML file with the kubectl command to create an instance of the Flink application. The operator creates one Flink application instance for each uniquely named YAML file and manages those instances for you.
  • #10 (Benefit of Using the Flink Kubernetes Operator): https://kubernetes.io/docs/concepts/extend-kubernetes/operator/ The operator basically makes an SRE's life easier. Using the CR, someone can configure and deploy a Flink job and have the Flink operator manage its lifecycle, including upgrade, backup, restore, and metrics collection. That arguably makes the Flink operator at least a Level 4 operator.
  • #14 (Dockerfile modifications): https://developers.redhat.com/products/rhel/ubi