End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage

[Figure: Kubeflow components: Jupyter Notebooks, Workflow Building, Pipelines, Tools, Serving, Metadata, Kale, Fairing, TFX, KF Pipelines, HP Tuning, Tensorboard, KFServing, Seldon Core, TFServing, Training Operators (PyTorch, XGBoost, TensorFlow, MPI, MXNet), Prometheus]
Kubeflow: End to End ML Platform
Animesh Singh

© 2019 IBM Corporation
Your Speaker Today: CODAIT
Animesh Singh
STSM and Chief Architect - Data and AI Open Source Platform
o  CTO, IBM RedHat Data and AI Open Source Alignment
o  IBM Kubeflow Engagement Lead, Kubeflow Committer
o  Chair, Linux Foundation AI - Trusted AI
o  Chair, CD Foundation MLOps Sig
o  Ambassador, CNCF
o  Member of IBM Academy of Technology (IBM AoT)
Kubeflow: github.com/kubeflow

[Figure: ML lifecycle stages: Prepared Data, Prepared and Analyzed Data, Untrained Model, Trained Model, Deployed Model]
Kubeflow: Current IBM Contributors
Christian Kadner Weiqiang Zhuang Tommy Li Andrew Butler
Jin Chi He Feng Li Ke Zhu Kevin Yu

IBM is the 2nd Largest Contributor

IBMers contributing across projects in Kubeflow

Kubeflow Services
[Figure: High Level Services built on Low Level APIs / Services, running on the Kubernetes API Server and the Istio Mesh and Gateway (e.g. kubectl apply -f tfjob). Components shown: Katib, Pipelines, Notebooks, KFServing, TFX, Kubebench, TFJob, PyTorchJob, MPIJob, Spark Job, Study Job, Jupyter CR, Seldon CR, Pipelines CR, Argo. Legend: Developed By Kubeflow / Developed Outside Kubeflow.]
Adapted from Kubeflow Contributor Summit 2019 talk: Kubeflow and ML Landscape (not all components are shown)

ML Lifecycle: Build: Development, Training and HPO

Develop (Kubeflow Jupyter Notebooks)
–  Data Scientist
–  Self-service Jupyter Notebooks provide faster model experimentation
–  Simplified configuration of CPU/GPU, RAM, Persistent Volumes
–  Faster model creation with training operators, TFX, magics, workflow automation (Kale, Fairing)
–  Simplify access to external data sources (using stored secrets)
–  Easier protection, faster restoration & sharing of “complete” notebooks
–  IT Operator
–  Profile Controller, Istio, Dex enable secure RBAC to notebooks, data & resources
–  Smaller base container images for notebooks, fewer crashes, faster to recover

Develop (Kubeflow Jupyter Notebooks)

Distributed Training Operators

Distributed TensorFlow Operator
•  A distributed TensorFlow job is a collection of the following processes:
o  Chief – The chief is responsible for orchestrating training and performing tasks like checkpointing the model
o  PS – The parameter servers provide a distributed data store for the model parameters
o  Worker – The workers do the actual work of training the model. In some cases, worker 0 might also act as the chief
o  Evaluator – The evaluators can be used to compute evaluation metrics as the model is trained
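As an illustration, these roles map onto the replica types of a TFJob custom resource, which is what the operator watches. The sketch below builds such a manifest as a Python dict and prints it as JSON (which `kubectl apply -f` also accepts). It assumes the `kubeflow.org/v1` TFJob API; the job name, image and replica counts are hypothetical placeholders.

```python
import json


def tfjob_manifest(name, image, workers=2, ps=1, evaluator=True):
    """Build a TFJob manifest (kubeflow.org/v1) with Chief, PS, Worker and optional Evaluator replicas."""

    def replica(count):
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    # The TF operator expects the training container to be named "tensorflow".
                    "containers": [{"name": "tensorflow", "image": image}]
                }
            },
        }

    specs = {"Chief": replica(1), "PS": replica(ps), "Worker": replica(workers)}
    if evaluator:
        specs["Evaluator"] = replica(1)
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {"tfReplicaSpecs": specs},
    }


if __name__ == "__main__":
    # Hypothetical name and image; save the output as tfjob.json, then
    # run `kubectl apply -f tfjob.json` on a cluster with the TF operator installed.
    print(json.dumps(tfjob_manifest("mnist-dist", "example.com/mnist-train:latest"), indent=2))
```

Each replica type becomes its own set of pods; the operator is responsible for telling every process which role and index it has.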

Distributed MPI Operator - AllReduce
•  AllReduce is an operation that reduces many arrays spread across multiple processes into a single array, which is then returned to all the processes
•  This ensures consistency between distributed processes while allowing each of them to take on a different part of the workload
•  The operation used to reduce the arrays into a single array (e.g. sum or max) can vary, and that is what gives rise to the different AllReduce options
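AllReduce can be implemented in several ways; one widely used scheme is ring AllReduce (used, for example, by Horovod, which MPI Operator jobs commonly run). The sketch below simulates it inside a single Python process, with a pluggable reduction op; it illustrates the algorithm only and is not how MPI or Horovod actually move data between hosts.

```python
import operator


def ring_allreduce(arrays, op=operator.add):
    """Simulate ring AllReduce: each inner list is one process's local array.

    Returns one fully reduced copy per process. `op` must be commutative and
    associative (e.g. operator.add, max).
    """
    if not arrays:
        return []
    n, length = len(arrays), len(arrays[0])
    if any(len(a) != length for a in arrays):
        raise ValueError("all processes must hold arrays of the same length")
    bounds = [length * k // n for k in range(n + 1)]  # chunk k = bounds[k]:bounds[k+1]
    data = [list(a) for a in arrays]  # copy so inputs are left untouched

    def chunk(k):
        return range(bounds[k], bounds[k + 1])

    # Phase 1, reduce-scatter: after n-1 steps, process i holds the fully
    # reduced chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot every message first so all sends in a step happen "simultaneously".
        msgs = [(i, (i - step) % n) for i in range(n)]
        msgs = [(i, k, [data[i][j] for j in chunk(k)]) for i, k in msgs]
        for src, k, values in msgs:
            dst = (src + 1) % n
            for j, v in zip(chunk(k), values):
                data[dst][j] = op(data[dst][j], v)

    # Phase 2, all-gather: pass the fully reduced chunks around the ring.
    for step in range(n - 1):
        msgs = [(i, (i + 1 - step) % n) for i in range(n)]
        msgs = [(i, k, [data[i][j] for j in chunk(k)]) for i, k in msgs]
        for src, k, values in msgs:
            dst = (src + 1) % n
            for j, v in zip(chunk(k), values):
                data[dst][j] = v
    return data


if __name__ == "__main__":
    procs = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    print(ring_allreduce(procs))          # every process ends with [111, 222, 333, 444]
    print(ring_allreduce(procs, op=max))  # every process ends with [100, 200, 300, 400]
```

The appeal of the ring layout is that each process only ever talks to its neighbour and sends roughly 2(n-1)/n of its array in total, independent of the number of processes.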

Hyperparameter Optimization and Neural Architecture Search - Katib
•  Katib: a Kubernetes-native system for automated hyperparameter tuning and neural architecture search for machine learning models.
•  GitHub repository: https://github.com/kubeflow/katib

•  Hyperparameter Tuning
o  Random Search
o  Tree of Parzen Estimators (TPE)
o  Grid Search
o  Hyperband
o  Bayesian Optimization
o  CMA Evolution Strategy
•  Neural Architecture Search
o  Efficient Neural Architecture Search (ENAS)
o  Differentiable Architecture Search (DARTS)
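To make the simplest of these algorithms concrete, here is a minimal local sketch of random search. The search-space entries loosely mirror the shape of a Katib Experiment's parameters section (`parameterType`, `feasibleSpace` with string `min`/`max` or a `list`); the parameter names and the objective are hypothetical stand-ins for a real training trial. In Katib itself, each trial runs as a Kubernetes workload and a suggestion service proposes the parameters.

```python
import math
import random

# Hypothetical search space, shaped like a Katib Experiment's "parameters" list.
SEARCH_SPACE = [
    {"name": "lr", "parameterType": "double", "feasibleSpace": {"min": "0.001", "max": "0.1"}},
    {"name": "num_layers", "parameterType": "int", "feasibleSpace": {"min": "2", "max": "5"}},
    {"name": "optimizer", "parameterType": "categorical", "feasibleSpace": {"list": ["sgd", "adam"]}},
]


def sample(space, rng):
    """Draw one trial's hyperparameters uniformly at random from the search space."""
    trial = {}
    for p in space:
        fs = p["feasibleSpace"]
        if p["parameterType"] == "double":
            trial[p["name"]] = rng.uniform(float(fs["min"]), float(fs["max"]))
        elif p["parameterType"] == "int":
            trial[p["name"]] = rng.randint(int(fs["min"]), int(fs["max"]))  # inclusive bounds
        else:
            trial[p["name"]] = rng.choice(fs["list"])
    return trial


def toy_objective(t):
    """Hypothetical stand-in for a training run that reports validation accuracy."""
    acc = 0.9 - abs(math.log10(t["lr"]) + 2) * 0.1 - abs(t["num_layers"] - 3) * 0.02
    return acc + (0.03 if t["optimizer"] == "adam" else 0.0)


def random_search(space, objective, max_trials=20, seed=0):
    """Run max_trials independent random trials; return (best_params, best_value), maximizing."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_trials):
        trial = sample(space, rng)
        value = objective(trial)
        if best is None or value > best[1]:
            best = (trial, value)
    return best


if __name__ == "__main__":
    params, value = random_search(SEARCH_SPACE, toy_objective)
    print(params, round(value, 4))
```

Random search treats every trial independently, which is why it parallelizes trivially; the model-based algorithms in the list (TPE, Bayesian optimization) instead use earlier results to choose the next trial.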

Katib

ML Lifecycle: Production Model Serving
❑  Rollouts: Is this rollout safe? How do I roll back? Can I test a change without swapping traffic?
❑  Protocol Standards: How do I make a prediction? gRPC? HTTP? Kafka?
❑  Cost: Is the model over- or under-scaled? Are resources being used efficiently?
❑  Monitoring: Are the endpoints healthy? What is the performance profile and request trace?
❑  Frameworks: How do I serve on TensorFlow? XGBoost? scikit-learn? PyTorch? Custom code?
❑  Features: How do I explain the predictions? What about detecting outliers and skew? Bias detection? Adversarial detection?
❑  How do I wire up custom pre- and post-processing?
❑  How do I handle batch predictions?
❑  How do I leverage a standardized data plane protocol so that I can move my model across ML serving platforms?

Experts fragmented across industry
●  Seldon Core was pioneering graph inferencing.
●  IBM and Bloomberg were exploring serverless ML lambdas. IBM gave a talk on ML serving with Knative at the last KubeCon in Seattle.
●  Google had built a common TensorFlow HTTP API for models.
●  Microsoft was Kubernetizing its Azure ML stack.

Putting the pieces together
●  Kubeflow created the conditions for collaboration.
●  A promise of open code and open community.
●  Shared responsibilities and expertise across multiple companies.
●  Diverse requirements from different customer segments.

Model Serving - KFServing
●  Founded by Google, Seldon, IBM, Bloomberg and Microsoft
●  Part of the Kubeflow project
●  Focus on the 80% use cases: single model rollout and update
●  KFServing 1.0 goals:
○  Serverless ML Inference
○  Canary rollouts
○  Model Explanations
○  Optional Pre/Post processing

KFServing: Default and Canary Configurations
Manages the hosting aspects of your models
•  InferenceService - manages the lifecycle of models
•  Configuration - manages the history of model deployments. There are two configurations: default and canary.
•  Revision - a snapshot of your model version
•  Route - endpoint and network traffic management
[Figure: the KFService Route sends 90% of traffic to Revision M of the default Configuration (Revisions 1...M) and 10% to Revision N of the canary Configuration (Revisions 1...N)]
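The 90/10 split in the figure is a per-request percentage (in the v1alpha2 API it is set with the `canaryTrafficPercent` field). The sketch below only illustrates those semantics with a seeded random router; in KFServing the split is enforced by the Knative/Istio routing layer, not by application code, and the function names here are hypothetical.

```python
import random
from collections import Counter


def pick_configuration(rng, canary_percent):
    """Route one request: 'canary' with probability canary_percent/100, else 'default'."""
    return "canary" if rng.random() * 100 < canary_percent else "default"


def simulate(requests=10_000, canary_percent=10, seed=42):
    """Count how many of `requests` land on each configuration."""
    rng = random.Random(seed)
    return Counter(pick_configuration(rng, canary_percent) for _ in range(requests))


if __name__ == "__main__":
    print(simulate())  # roughly 9,000 default and 1,000 canary
```

Promoting the canary amounts to raising the percentage step by step (and rolling back to lowering it to 0), without redeploying the default model.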

Supported Frameworks, Components and Storage Subsystems
Model Servers
- TensorFlow
- Nvidia TRTIS
- PyTorch
- XGBoost
- SKLearn
- ONNX

Components
- Predictor, Explainer, Transformer (pre-processor, post-processor)

Storage
- AWS/S3
- GCS
- Azure Blob
- PVC
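In KFServing the Transformer runs as a separate component in front of the Predictor, which is what allows, for example, a CPU pre-processor in front of a GPU predictor. The plain-Python sketch below only shows the pre-process, predict, post-process contract; the class, the pixel scaling and the toy model are hypothetical and this is not the KFServing SDK's API.

```python
class Transformer:
    """Hypothetical stand-in for a transformer wrapping a predictor with pre/post steps."""

    def __init__(self, predictor, labels):
        self.predictor = predictor  # callable: list of feature vectors -> list of score lists
        self.labels = labels

    def preprocess(self, request):
        # e.g. scale raw pixel values (0-255) to [0, 1]
        return [[x / 255.0 for x in inst] for inst in request["instances"]]

    def postprocess(self, scores):
        # map each score vector to its highest-scoring label
        return {"predictions": [self.labels[max(range(len(s)), key=s.__getitem__)] for s in scores]}

    def predict(self, request):
        return self.postprocess(self.predictor(self.preprocess(request)))


def toy_predictor(batch):
    """Hypothetical model: scores = [1 - mean, mean] for each instance."""
    return [[1 - sum(v) / len(v), sum(v) / len(v)] for v in batch]


if __name__ == "__main__":
    t = Transformer(toy_predictor, labels=["dark", "bright"])
    print(t.predict({"instances": [[10, 20, 30], [250, 240, 255]]}))
    # {'predictions': ['dark', 'bright']}
```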

GPU Autoscaling - Knative solution
[Figure: API requests arrive at the Ingress. When scale == 0, or when handling burst capacity, they are routed through the Activator, which buffers requests; when scale > 0, they go directly to the Queue Proxy in front of the Model server in each of 0...N replicas. Metrics flow to the Autoscaler, which scales the replicas.]
●  Scale based on the number of in-flight requests against the expected concurrency
●  Simple solution for heterogeneous ML inference autoscaling
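The scaling rule in the first bullet can be sketched as below. This is a deliberate simplification assuming a single observed concurrency value: Knative's autoscaler also averages over stable and panic windows and applies a target-utilization factor, none of which is modeled here. The parameter names are hypothetical; `min_scale=0` is what permits scale to zero, and while at zero the Activator buffers requests until a replica is ready.

```python
import math


def desired_replicas(in_flight, target_concurrency, min_scale=0, max_scale=10):
    """Simplified concurrency-based scaling decision.

    in_flight: observed concurrent requests across the service.
    target_concurrency: requests each model-server replica is expected to handle at once.
    """
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    wanted = math.ceil(in_flight / target_concurrency)
    return max(min_scale, min(max_scale, wanted))


if __name__ == "__main__":
    # e.g. a GPU model server tuned to handle 4 concurrent requests per replica
    for load in (0, 1, 4, 5, 17, 100):
        print(load, "->", desired_replicas(load, target_concurrency=4))
```

Because the rule only needs a per-replica concurrency target, the same mechanism works for CPU and GPU model servers alike; only the target differs.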

But the Data Scientist Sees...
●  A pointer to a Serialized Model File
●  9 lines of YAML
●  A live model at an HTTP endpoint

●  Scale to Zero
●  GPU Autoscaling
●  Safe Rollouts
●  Optimized Serving Containers
●  Network Policy and Auth
●  HTTP APIs (gRPC soon)
●  Tracing
●  Metrics

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers"

Production users include:
Bloomberg
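Once the InferenceService above is ready, it is called over KFServing's V1 HTTP protocol: a POST to `/v1/models/<name>:predict` with a JSON body of `{"instances": [...]}`. The sketch builds (but does not send) such a request with the standard library. The ingress URL, the Host header and the instance payload shape are assumptions for illustration; check your cluster's ingress gateway and the model's expected input format.

```python
import json
import urllib.request


def build_predict_request(ingress, model_name, instances, host=None):
    """Build (but do not send) a KFServing V1 protocol prediction request.

    ingress: base URL of the cluster's ingress gateway (hypothetical, e.g. http://localhost:8080).
    host: optional Host header the gateway uses to route to the InferenceService.
    """
    url = f"{ingress.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if host:
        headers["Host"] = host
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_predict_request(
        "http://localhost:8080",
        "flowers-sample",
        [{"image_bytes": {"b64": "<base64-encoded JPEG>"}, "key": "1"}],
        host="flowers-sample.default.example.com",
    )
    print(req.get_method(), req.full_url)
    # Against a live endpoint: urllib.request.urlopen(req).read()
```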

KFServing: Default, Canary and Autoscaler

KFServing – Existing Features
❑  Crowd-sourced capabilities – contributions by AWS, Bloomberg, Google, Seldon, IBM, NVIDIA and others.
❑  Support for multiple pre-integrated runtimes: TFServing, NVIDIA Triton (GPU optimization), ONNX Runtime, SKLearn, PyTorch, XGBoost and custom models.
❑  Serverless ML Inference and Autoscaling: scale to zero (with no incoming traffic) and request-queue-based autoscaling
❑  Canary and Pinned rollouts: control traffic percentage and direction, pinned rollouts
❑  Pluggable pre-processor/post-processor via Transformer: plug in pre-processing/post-processing implementations, and control routing and placement (e.g. pre-processor on CPU, predictor on GPU)
❑  Pluggable analysis algorithms: Explainability, Drift Detection, Anomaly Detection, Adversarial Detection (contributed by Seldon), enabled by Payload Logging (built using the CloudEvents standardized eventing protocol)
❑  Batch Predictions: batch prediction support for ML frameworks (TensorFlow, PyTorch, ...)
❑  Integration with the existing monitoring stack around the Knative/Istio ecosystem: Kiali (service placements, traffic and graphs), Jaeger (request tracing), Grafana/Prometheus plug-ins for Knative
❑  Multiple clients: kubectl, Python SDK, Kubeflow Pipelines SDK
❑  Standardized Data Plane V2 protocol for prediction, explainability, etc.: already implemented by NVIDIA Triton
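The V2 data plane protocol describes inputs as named tensors with an explicit `shape`, a `datatype` (e.g. `FP32`) and a flattened, row-major `data` array, POSTed to `/v2/models/<model>/infer`. The sketch below builds such a JSON request body from a nested Python list; the input name and values are hypothetical, and it covers only the JSON form (no binary tensor extension).

```python
import json


def _shape_and_flatten(value):
    """Return (shape, flat_list) for a rectangular nested list, in row-major order."""
    if not isinstance(value, list):
        return [], [value]
    if not value:
        return [0], []
    parts = [_shape_and_flatten(v) for v in value]
    inner_shape = parts[0][0]
    if any(shape != inner_shape for shape, _ in parts):
        raise ValueError("ragged input: all rows must have the same shape")
    flat = [x for _, sub in parts for x in sub]
    return [len(value)] + inner_shape, flat


def v2_infer_body(input_name, tensor, datatype="FP32"):
    """Build a V2 inference request body with a single input tensor."""
    shape, data = _shape_and_flatten(tensor)
    return {"inputs": [{"name": input_name, "shape": shape, "datatype": datatype, "data": data}]}


if __name__ == "__main__":
    body = v2_infer_body("input-0", [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]])
    # This body would be POSTed to <ingress>/v2/models/<model-name>/infer
    print(json.dumps(body))
```

Making shape and datatype explicit is what lets the same request work against any compliant server (Triton today, other model servers as they adopt V2).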

KFServing – Upcoming Features
❑  MMS: Multi-Model Serving, for serving multiple models per custom KFService instance
❑  More Data Plane v2 API compliant servers: SKLearn, XGBoost, PyTorch…
❑  Multi-Model Graphs and Pipelines: support chaining multiple models together in a pipeline
❑  PyTorch support via AWS TorchServe
❑  gRPC support for all model servers
❑  Support for multi-armed bandits
❑  Integration with IBM AIX360 for explainability, AIF360 for bias detection and ART for adversarial detection

ML Lifecycle: Orchestrate Build, Train, Validate and Deploy

Kubeflow Pipelines
•  Containerized implementations of ML tasks
o  Pre-built components: just provide params or code snippets (e.g. training code)
o  Create your own components from code or libraries
o  Use any runtime, framework or data types
o  Attach k8s objects - volumes, secrets
•  Specification of the sequence of steps
o  Specified via Python DSL
o  Inferred from data dependencies on input/output
•  Input parameters
o  A “Run” is a pipeline invoked with specific parameters
o  Can be cloned with different parameters
•  Schedules
o  Invoke a single run or create a recurring scheduled pipeline
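"Inferred from data dependencies" means the pipeline graph does not have to be spelled out: when one step's output is passed as another step's input, that creates an edge. The sketch below reproduces the idea with the standard library for steps shaped like the taxi example that follows; the step and artifact names are hypothetical, and KFP derives these edges itself inside the SDK.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical step declarations: which named artifacts each step consumes and produces.
STEPS = {
    "tfdv": {"inputs": [], "outputs": ["schema"]},
    "preprocess": {"inputs": ["schema"], "outputs": ["transformed"]},
    "training": {"inputs": ["transformed", "schema"], "outputs": ["model"]},
    "tfma": {"inputs": ["model", "schema"], "outputs": ["metrics"]},
    "deploy": {"inputs": ["model"], "outputs": ["endpoint"]},
}


def infer_dag(steps):
    """Map each step to the set of steps whose outputs it consumes."""
    producer = {out: name for name, s in steps.items() for out in s["outputs"]}
    return {name: {producer[i] for i in s["inputs"]} for name, s in steps.items()}


def execution_order(steps):
    """Return one valid run order; steps with no path between them (tfma, deploy) could run in parallel."""
    return list(TopologicalSorter(infer_dag(steps)).static_order())


if __name__ == "__main__":
    print(infer_dag(STEPS))
    print(execution_order(STEPS))
```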

Define Pipeline with Python SDK
from kfp import dsl

# TfdvOp, PreprocessOp, DnnTrainerOp, TfmaOp and TfServingDeployerOp are component
# factories (each returning a dsl.ContainerOp) assumed to be defined elsewhere.

@dsl.pipeline(name='Taxi Cab Classification Pipeline Example')
def taxi_cab_classification(
        output_dir,
        project,
        train_data='gs://bucket/train.csv',
        evaluation_data='gs://bucket/eval.csv',
        target='tips',
        learning_rate=0.1,
        hidden_layer_size='100,50',
        steps=3000):

    tfdv = TfdvOp(train_data, evaluation_data, project, output_dir)
    schema = tfdv.outputs['schema']  # passing an output creates the data dependency
    preprocess = PreprocessOp(train_data, evaluation_data, schema, project, output_dir)
    training = DnnTrainerOp(preprocess.output, schema, learning_rate, hidden_layer_size,
                            steps, target, output_dir)
    tfma = TfmaOp(training.output, evaluation_data, schema, project, output_dir)
    deploy = TfServingDeployerOp(training.output)

Compile and Submit Pipeline Run
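import kfp
from kfp import compiler
compiler.Compiler().compile(taxi_cab_classification, 'tfx.tar.gz')
client = kfp.Client()  # connects to the KFP API server
experiment = client.create_experiment('tfx')
run = client.run_pipeline(experiment.id, 'tfx_run', 'tfx.tar.gz', params={'output_dir': 'gs://dpa22', 'project': 'my-project-33'})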

Visualize the state of various components

Pipelines versioning
Pipelines lets you group and manage multiple versions of a pipeline.

Artifact Tracking
Artifacts for a run of the “TFX Taxi Trip” example pipeline. For each artifact, you can view details and get the artifact URL (in this case, for the model).

Lineage Tracking
For a given run, the Pipelines Lineage Explorer lets you view the history
and versions of your models, data, and more.

Kubeflow Pipeline Architecture

Kubeflow Pipelines can train, deploy and serve
Open Source Dojo

Kubernetes Ready ML and AI Platform
Operator Hub - operatorhub.io

Watson Productization of Kubeflow Pipelines

Watson AI Pipelines
•  Demonstrate that Watson can be used for the end-to-end AI lifecycle: data prep, model training, model risk validation, model deployment, monitoring and updating models
•  Demonstrate that the full lifecycle can be operated programmatically, with Tekton as the backend instead of Argo

Pipeline: Train the model and monitor with OpenScale

Tekton
❑  A PipelineResource defines an object that is an input (such as a git repository) or an output (such as a docker image) of the pipeline.
❑  A PipelineRun defines an execution of a pipeline. It references the Pipeline to run and the PipelineResources to use as inputs and outputs.
❑  A Pipeline defines the set of Tasks that compose a pipeline.
❑  A Task defines a set of build Steps such as compiling code, running tests, and building and deploying images.
[Figure: each Task runs as a Pod; each Step within a Task runs as a Container in that Pod]
TEKTON
❑  The Tekton Pipelines project provides Kubernetes-style resources for declaring CI/CD-style pipelines.
❑  Tekton introduces several new CRDs including Task, Pipeline, TaskRun, and PipelineRun.
❑  A PipelineRun represents a single running instance of a Pipeline and is responsible for creating a Pod for each of its Tasks and as many containers within each Pod as it has Steps.
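The Task/Step to Pod/Container mapping can be made concrete with two small Tasks and a Pipeline written as Python dicts in the `tekton.dev/v1beta1` shape. The resource names, images and scripts are hypothetical, and the helper only counts the step containers a PipelineRun would create; it ignores the init and helper containers Tekton also injects.

```python
# Hypothetical Tekton Tasks and Pipeline (tekton.dev/v1beta1 shape) as Python dicts.
BUILD_TASK = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "Task",
    "metadata": {"name": "build"},
    "spec": {"steps": [
        {"name": "compile", "image": "golang:1.14", "script": "go build ./..."},
        {"name": "test", "image": "golang:1.14", "script": "go test ./..."},
    ]},
}
DEPLOY_TASK = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "Task",
    "metadata": {"name": "deploy"},
    "spec": {"steps": [
        {"name": "apply", "image": "bitnami/kubectl", "script": "kubectl apply -f k8s/"},
    ]},
}
PIPELINE = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "Pipeline",
    "metadata": {"name": "build-and-deploy"},
    "spec": {"tasks": [
        {"name": "build", "taskRef": {"name": "build"}},
        {"name": "deploy", "taskRef": {"name": "deploy"}, "runAfter": ["build"]},
    ]},
}


def pods_for_run(pipeline, tasks):
    """For one PipelineRun: one Pod per pipeline task, one step container per Step of its Task."""
    by_name = {t["metadata"]["name"]: t for t in tasks}
    return {
        pt["name"]: [s["name"] for s in by_name[pt["taskRef"]["name"]]["spec"]["steps"]]
        for pt in pipeline["spec"]["tasks"]
    }


if __name__ == "__main__":
    for pod, containers in pods_for_run(PIPELINE, [BUILD_TASK, DEPLOY_TASK]).items():
        print(f"pod for task {pod!r}: step containers {containers}")
```

Since KFP's compiler emits exactly this kind of Task and Pipeline structure, a KFP pipeline step ends up as a Tekton Task running in its own Pod.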

KFP – Tekton Phase One
Pluggable Components
[Figure: the KFP SDK compiles pipelines into Argo pipeline YAML or Tekton pipeline YAML; the KFP API Server (with an object store, a relational DB and the KFP UI) manages components and pipelines, which run as Tasks, Steps and Pods on Argo or Tekton. Pluggable components: Watson Studio, WML, OpenScale, Spark, Kubeflow Training, Seldon, AIF360, ART, Katib, KFServing]

KFP – Tekton Phase Two
[Figure: the KFP SDK compiles pipelines into Argo pipeline YAML or Tekton pipeline YAML; the KFP API Server (with an object store, a relational DB and the KFP UI) runs them as Tasks, Steps and Pods on Argo or Tekton. Components: Watson Studio, WML, OpenScale, Spark, Kubeflow Training]

KFP – Tekton Challenges
Multiple Moving parts, with different stakeholders

Tekton Community: Argo (version 2.6) was much more mature than Tekton (v0.11, alpha) when the work started around 5 months ago
• Multiple features and capabilities were lacking in Tekton when we kicked off
• The team fell back on a spreadsheet to track and map KFP DSL features, and the areas where Tekton needed to add features and functions. Overall, 50 DSL capabilities were identified and mapping them to corresponding Tekton features began.
• Multiple features, such as support for creating/patching/updating/deleting Kubernetes resources, image pull secrets, loops, conditionals and system params, did not exist or existed only partially
• Tekton started moving from alpha to beta as the work progressed, and a few features were left behind in alpha
• Multiple issues were opened on Tekton, which required ramping up a team of Tekton contributors to help drive them. We formed a virtual team of IBM Open Tech developers (Andrea Frittoli, Priti Desai), the IBM Systems team (Vincent Pli), the DevOps team (Simon Kaegi) and RedHat (Vincent Demeester etc.) to drive Tekton requirements

Kubeflow Pipeline and TFX Community: An open source team needed to be formed and trained for this specific mission. Additionally, Google needed to be brought onto the same page and convinced of the validity of the integration.
• Multiple design reviews were held with Google, and a direction was jointly agreed once they were convinced of why we were doing it and why it makes sense.
• Making the case to accelerate the IR (Intermediate Representation) strategy with TFX, so as to drive this the right way
• Heavy dependency on Argo in the Kubeflow Pipelines code, including the API backend and UI, which were all written with an Argo dependency
• Internal IBM team divided to attack different areas: Compiler (Christian Kadner), API (Tommy Li), UI (Andrew), Feng Li (IBM Systems, China)
• The Kubeflow Pipelines backend could not accept multiple CRDs, which is the default model Tekton follows, so everything needed to be bundled into one Pipeline spec
• Type check, workflow utils and parameter replacement are heavily tied to the Argo API. In addition, the persistent agent watches resources using the Argo API type.
• The MLOps SIG in the CD Foundation was leveraged to bring the Kubeflow Pipelines and Tekton teams together

KFP – Tekton: Delivered
[Figure: the KFP SDK compiles pipelines into Tekton pipeline YAML; the KFP API Server (with an object store, a relational DB and the KFP UI) runs them as Tasks, Steps and Pods on Tekton. Components: Watson Studio, WML, OpenScale, Spark, Kubeflow Training]

Same KFP Experience: DAG, backed by Tekton YAML

Same KFP Experience: Logs, Lineage Tracking and Artifact Tracking

End-to-end Kubeflow Components: With KFP-Tekton

Kubeflow Adoption: External and Internal

Telstra: Collaborating with IBM to build an Open Source based OneAnalytics Platform leveraging Kubeflow
Telstra AI Lab - (TAIL) - Configuration
•  Kubernetes – 1.15
•  Spectrum Scale CSI Driver
•  MetalLB for Load Balancing
•  Istio 1.3.1 for ingress
•  Kubeflow – 1.0.1
•  Jupyter Notebook images are IBM’s multiarchitecture powerai images (https://hub.docker.com/r/ibmcom/powerai/tags)
THINK 2020 Session: End-to-End Data Science and Machine Learning for Telcos: Telstra's Use Case
https://www.ibm.com/events/think/watch/replay/126561688

Telstra AI Lab - (TAIL) – Future state
•  RedHat OpenShift – 4.3
•  GPU Operator
•  Kubeflow Operator
•  Extending the compute
•  Integrate feature stores and streaming technologies
•  Integrate with CI/CD tools (Tekton Pipelines)

Yara – Working with IBM to build a Data Science Platform for Digital Farming ML use cases based on Kubeflow
THINK 2020 Session: Enable Smart Farming using Kubeflow
https://www.ibm.com/events/think/watch/replay/126494864

Watson STT: Kubeflow Pipelines running Operations

Watson SpeechToText training Kubeflow pipeline

OpenDataHub

Upstream, Midstream and Downstream
'Upstream' is about extracting oil and natural gas from the ground; 'midstream' is about safely moving them thousands of miles; and 'downstream' is converting these resources into the fuels and finished products we all depend on.


Data Platform
OpenShift Ready

Red Hat OpenShift Container Platform
OPEN DATA HUB REFERENCE ARCHITECTURE
[Figure: capability areas: Storage, Metadata Management, Data Analysis, AI and ML, Security and Governance, Monitoring and Orchestration, Data in Motion, Data Lake, In Memory, Relational Databases, Streaming Data, Object Storage Data, Log Data, Big Data Processing, Streaming, Data Exploration, Interactive Notebooks, Model Lifecycle, ML Applications, Business Applications, Metastore]

Red Hat OpenShift Container Platform
OPEN DATA HUB REFERENCE IMPLEMENTATION
Layers: Storage, Metadata Management, Data Analysis, AI and ML, Data in Motion
•  Security and Governance: OpenShift OAuth, OpenShift Single SignOn (Keycloak), RedHat Ceph Object Gateway, RedHat 3scale
•  Monitoring and Orchestration: Prometheus, Grafana, Kubeflow Pipelines, Jenkins CI/CD
•  Data Lake: RedHat Ceph Storage
•  In Memory: RedHat Data Grid (Infinispan)
•  Relational Databases: PostgreSQL, MySQL
•  Streaming Data: RedHat AMQ Streams, Kafka Connect
•  Object Storage Data: RedHat Ceph S3 API
•  Log Data: FluentD, Logstash
•  Big Data Processing: Spark, SparkSQL, Thrift
•  Streaming: Kafka Streams, Elastic Search
•  Data Exploration: Hue, Kibana
•  Interactive Notebooks: JupyterHub, Hue
•  Model Lifecycle: Kubeflow, Seldon, MLFlow
•  ML Applications: OpenDataHub AI Library
•  Business Applications: Superset
•  Metastore: Hive

OpenDataHub and Kubeflow: Relationship

Initial Goals: OpenDataHub and Kubeflow
•  Kubeflow has great traction; make it available for OpenShift users. Done in https://github.com/opendatahub-io/manifests
•  Offer ODH users components installed by KF
•  And offer components from ODH (Kafka, Apache Superset, Hive…) to the KF community
•  Decide if we can leverage the KF project and community as upstream for ODH
•  Think Kubernetes -> OpenShift
•  Frees up ODH maintainers' time to make sure KF keeps running well on OpenShift

Kubeflow Operator – Contributed by IBM to the Kubeflow community to help enable OpenDataHub
•  https://operatorhub.io/operator/kubeflow
•  Deploy, manage and monitor Kubeflow
•  On various environments
o  IBM Cloud
o  GCP
o  AWS
o  Azure
o  OpenShift
o  Other K8S

Outcome: Kubeflow as an Upstream for OpenDataHub
●  A version of the Operator based on the Kubeflow architecture has been released: https://developers.redhat.com/blog/2020/05/07/open-data-hub-0-6-brings-component-updates-and-kubeflow-architecture/?sc_cid=7013a000002DTqEAAW
●  Most of the components have been converted: https://github.com/opendatahub-io/odh-manifests
●  Still a separate deployment; needs to deploy both ODH and Kubeflow in one go.
Future
•  KF 1.0 on OpenShift
•  Disconnected deployment
•  Open Data Hub CI/CD
•  Kubeflow on OpenShift CI
•  UBI based ODH & KF
•  Multitenancy model
•  Mixing KF & ODH

Spark with Open Data Hub
•  Open Data Hub will also deploy the Spark Operator to manage Spark as an application.
•  Two versions of Spark – Spark in dedicated mode and Spark on K8s
•  Currently moving towards the Spark on K8s Operator from Google for serverless Spark. The IBM Hummingbird team is investigating this.

Airflow integration with Open Data Hub
•  Open Data Hub will also deploy the Airflow Operator to manage Airflow as an application.
•  It uses the Airflow Operator originally developed in the GoogleCloudPlatform repository and later donated to Apache.
•  The Operator creates a controller-manager pod as part of the Open Data Hub deployment.
•  Users can then install the Airflow components they need from the available options (e.g. CeleryExecutor or KubernetesExecutor, a Postgres or MySQL deployment, etc.)

Apache Hive with OpenDataHub
•  Hive was one of the first abstraction engines to be built on top of MapReduce.
•  It started at Facebook to enable data analysts to analyse data in Hadoop using familiar SQL syntax, without having to learn how to write MapReduce.
•  Hive is an essential tool in the Hadoop ecosystem that provides an SQL dialect for querying data stored in HDFS, in other file systems that integrate with Hadoop such as MapR-FS and Amazon’s S3, and in databases like HBase (the Hadoop database) and Cassandra.
•  Hive is a Hadoop-based system for querying and analysing large volumes of structured data stored on HDFS.
•  Hive is a query engine built to work on top of Hadoop that can compile queries into MapReduce jobs and run them on the cluster.
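To illustrate what compiling a query into MapReduce means, the sketch below shows how a simple `SELECT page, COUNT(*) FROM visits GROUP BY page` decomposes into map, shuffle and reduce phases, simulated in plain Python over two input "splits". The table and column names are hypothetical, and real Hive plans involve more stages (newer Hive versions can also run on Tez or Spark instead of MapReduce).

```python
from collections import defaultdict
from itertools import chain

# Hypothetical rows of a "visits" table: (page, country)
ROWS = [("home", "AU"), ("shop", "NO"), ("home", "NO"), ("home", "AU"), ("shop", "AU")]


def map_phase(rows):
    """Mapper: emit (key, 1) for each row, keyed by the GROUP BY column (page)."""
    for page, _country in rows:
        yield page, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: aggregate each key's values (COUNT(*) becomes a sum of ones)."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    splits = [ROWS[:3], ROWS[3:]]  # mappers run independently on each input split
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    print(reduce_phase(shuffle(mapped)))  # {'home': 3, 'shop': 2}
```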

Open Source End To End Data and AI Platform
[Figure: Upstream: Kubeflow (Kubernetes Ready) → Midstream: OpenDataHub (OpenShift Ready)]
RedHat MarketPlace https://marketplace.redhat.com/en-us

Coming Next: Kubeflow Dojo
https://github.com/kubeflow
https://github.com/opendatahub-io
https://github.com/IBM/KubeflowDojo

Kubeflow Dojo: Prerequisites
•  Knowledge of Kubernetes: watch the dojo for the Kubernetes project (IBM internal link or external link)
•  Access to a Kubernetes cluster, either minikube or remote hosted
•  Source code control and development with git and GitHub: watch the presentation (IBM internal link or external link) for git, and the external link for pull requests
•  Familiarity with the Go language: watch the introduction dojo (IBM internal link or external link)
•  (Optional) Knowledge of Istio and Knative
•  If you have more time,
o  Read the Kubeflow documentation to learn more about the Kubeflow project
o  Browse through the Kubeflow community GitHub

Kubeflow Dojo: Tips for success
•  Access to a Kubernetes cluster
•  Minimal spec: 8 vCPUs, 16 GB RAM and at least 50 GB of disk for the Docker registry
•  On IBM Kubernetes Service, provision the cluster with machine type b2c.4x16 and 2 worker nodes
•  Follow the Kubeflow documentation to prepare your cluster
•  On an IKS cluster, follow this link to install the IBM Cloud CLI and Helm, then set up IBM Cloud Block Storage as the default storage class

Kubeflow Dojo: Live
Dates: 15th and 16th July

Kubeflow Dojo: Virtual
github.com/ibm/KubeflowDojo
Reach Out!

Animesh Singh
singhan@us.ibm.com
twitter.com/AnimeshSingh
github.com/AnimeshSingh

https://ec.yourlearning.ibm.com/w3/event/10082348
