From Big Data to AI by Way of Silicon Valley
Luciano Resende
IBM – CODAIT – Silicon Valley, California
© 2019 IBM Corporation
About me - Luciano Resende
Open Source AI Platform Architect – IBM – CODAIT
• Senior Technical Staff Member at IBM, contributing to open source for over 10 years
• Currently contributing to the Jupyter Notebook ecosystem, Apache Bahir, Apache Toree, and Apache Spark, among other projects related to AI/ML platforms
lresende@us.ibm.com
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
IBM Open Source Participation
Learn
• Open Source @ IBM program touches 78,000 IBMers annually
Consume
• Virtually all IBM products contain some open source
• 40,363 packages per year
Contribute
• >62K OS Certs per year
• ~10K IBM commits per month
Connect
• >1,000 active IBM contributors working in key OS projects
IBM Open Source Participation
IBM-generated open source innovation
• 137 Code Open (dWO) projects with 1,000+ GitHub projects
• 4 graduates to full open governance in the last year: Node-RED, OpenWhisk, SystemML, Blockchain Fabric
• developer.ibm.com/code/open/code/
Community
• IBM focused on 18 strategic communities
• Drive open governance in “Centers of Gravity”
• IBM leaders drive key technologies and assure freedom of action
The IBM OS Way is now open sourced
• Training, Recognition, Tooling
• Organization, Consuming, Contributing
IBM’s history of strong AI leadership
1968: “2001: A Space Odyssey”
• IBM was a technical advisor
• HAL is “the latest in machine intelligence”
1997: Deep Blue
• Deep Blue became the first machine to beat a world chess champion in tournament play
2011: Jeopardy!
• Watson beat two top Jeopardy! champions
2018: Project Debater
2018: Open Tech, AI & emerging standards
• New IBM centers of gravity for AI
• OS projects increasing exponentially
• Emerging global standards in AI
Center for Open Source Data and AI Technologies (CODAIT)
codait.org
codait (French) = coder/coded
https://m.interglot.com/fr/en/codait
CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise.
Relaunch of the Spark Technology Center (STC) to reflect its expanded mission.
Center for Open Source Data
and AI Technologies
Using AI
- Model Asset Exchange
- Data Asset Exchange
AI Frameworks
Building AI
AI Platforms
Trusted AI
- Fairness
- Robustness
- Transparency and Accountability
- Explainability
AI Examples Today
Home Automation & Security
- Multiple connected or standalone devices
- Controlled by voice
- Amazon Echo (Alexa)
- Google Home
- Apple HomePod (Siri)
Autonomous Driving
In 2016, Google's self-driving car system was officially recognized as a driver in the US, paving the way for the legalization of autonomous vehicles.
https://www.dezeen.com/2016/02/12/google-self-driving-car-artficial-intelligence-system-recognised-as-driver-usa/
DoorDash is currently testing self-driving robots for food delivery.
https://medium.com/@DoorDash/welcoming-our-newest-robots-to-the-doordash-fleet-with-marble-e752a85d6602
Amazon Go
No lines, no checkout, just grab and go
But how simple is it to apply AI to your application?
A simple Deep Learning Model
May 17, 2018 / © 2018 IBM Corporation
[Figure: a driver program runs a neural network graph with its weights (not to scale): Input (3) → Dense (3×8) → Dense (8×6) → Dense (6×4) → Dense (4×2) → Output (2), producing the label “cat”.]
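The layer shapes on this slide (3 → 8 → 6 → 4 → 2) can be traced with a small forward pass. What follows is a minimal sketch in plain Python, not the model from the talk: the random weights, the ReLU activations on hidden layers, the softmax output, and the label names are all assumptions made for illustration.

```python
import math
import random

# Layer shapes from the slide: 3 -> 8 -> 6 -> 4 -> 2.
LAYER_SHAPES = [(3, 8), (8, 6), (6, 4), (4, 2)]
LABELS = ["cat", "dog"]  # hypothetical class names; the slide only shows "cat"


def make_weights(shapes, seed=0):
    """Create one (weights, biases) pair per dense layer, filled with random values."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in shapes:
        w = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers


def dense(x, w, b):
    """Compute x @ w + b for a single input vector x."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, layers):
    """Driver program: run the graph layer by layer, return (label, probabilities)."""
    h = x
    for idx, (w, b) in enumerate(layers):
        h = dense(h, w, b)
        if idx < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers (assumed)
    probs = softmax(h)
    return LABELS[probs.index(max(probs))], probs


layers = make_weights(LAYER_SHAPES)
label, probs = predict([0.2, 0.5, 0.1], layers)
print(label, probs)
```

The three pieces named on the slide map onto the sketch as follows: `LAYER_SHAPES` plays the role of the graph, `make_weights` supplies the weights, and `predict` is the driver program.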
Example: Get an Image Classifier
Step 1: Find a suitable neural network graph.
– Need to read some papers
Step 2: Find code to generate the neural network graph.
[Screenshot: TensorFlow code to build the ResNet50 neural network graph]
Step 3: Find some pre-trained weights for your graph.
[Screenshot: Caffe2 ResNet50 model weights]
Step 4: Find example code that performs model inference.
[Screenshot: TensorFlow code for training and batch inference on ResNet50]
Step 5: Write your own code to perform model inference on one image at a time.
Step 6: Package your inference code, graph creation code, and pre-trained weights together.
Step 7: Deploy your package.
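Step 5 often amounts to wrapping batch-oriented example code so that it accepts a single image. The sketch below illustrates that idea only; `fake_batch_predict` is a hypothetical stand-in for real batch inference code (it is not ResNet50), and the tiny list-of-lists image format is an assumption.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]  # a tiny grayscale image as rows of pixels


def fake_batch_predict(batch: List[Image]) -> List[List[float]]:
    """Placeholder for real batch inference: scores each image by mean brightness."""
    results = []
    for image in batch:
        pixels = [p for row in image for p in row]
        mean = sum(pixels) / len(pixels)
        results.append([mean, 1.0 - mean])  # two made-up class scores
    return results


def predict_one(
    image: Image,
    batch_predict: Callable[[List[Image]], List[List[float]]],
) -> List[float]:
    """Run batch inference code on one image by wrapping it in a batch of one."""
    return batch_predict([image])[0]


scores = predict_one([[0.0, 1.0], [1.0, 1.0]], fake_batch_predict)
print(scores)
```

Keeping the batch function as a parameter means the same wrapper can be reused when the placeholder is swapped for real model code.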
Model Marketplaces
Collections of well-understood deep learning models.
Provide a central place to find known-good implementations of these models.
IBM Model Asset eXchange (MAX)
MAX is a one-stop, open source ecosystem for data scientists and AI developers to share and consume models that use machine learning engines such as TensorFlow, PyTorch and Caffe2.
It also provides a standard approach to classify, annotate, and deploy these models for prediction and inferencing.
https://developer.ibm.com/code/exchanges/models/
Leveraging MAX
I am an application engineer and want to augment my application/solution with AI.
• Use MAX pre-trained, ready-to-use models.
• Deploy co-located with your application as a Docker container or in a Kubernetes environment.
• Integrate with the simple-to-use inference REST API.
• Use the demo applications as examples of how to use the APIs.
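Calling a model's inference REST API from an application could look roughly like the sketch below. It assumes a MAX text model container is reachable at `http://localhost:5000`, that it exposes a `/model/predict` endpoint, and that it accepts a JSON body with a `text` field; these details vary by model and are assumptions here, so check the API documentation of the model you deploy.

```python
import json
import urllib.request

# Assumed base URL of a MAX model container running next to the application.
MAX_URL = "http://localhost:5000"


def build_predict_request(text: str, base_url: str = MAX_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a text model's predict endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")  # field name is an assumption
    return urllib.request.Request(
        url=base_url + "/model/predict",  # assumed endpoint path
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, base_url: str = MAX_URL) -> dict:
    """Send the request and decode the JSON response (needs a running model container)."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_predict_request("Luciano works at IBM in California.")
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the example inspectable without a running container; `predict` is the call an application would actually make.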
Learning from MAX
I am a data scientist and want to learn from MAX serving and deployment patterns.
• All MAX code is available at github.com/IBM/MAX*.
• Understand and reuse MAX’s inference code in your own projects, as allowed by the open source license.
• Understand and reuse MAX’s packaging and deployment patterns, which are based on containers and easily deployable in Kubernetes, and apply them to your models.
MAX Summary
Free, open-source models.
Wide variety of domains.
Multiple deep learning frameworks.
Vetted and tested code and IP.
Build and deploy a container-based web service in 30 seconds.
Start training on Watson Studio in minutes.
The IBM Data Asset eXchange
Also known as DAX.
A place to find curated free and open datasets under open data licenses.
Part of developer.ibm.com.
The MAX Named Entity Tagger
A model that identifies mentions of named entities, such as persons and organizations, in English-language text.
Trained by Nick Pentreath on the CODAIT team.
Most difficult part: finding usable training data.
Groningen Meaning Bank
A project at the University of Groningen to create an open data set for training linguistic models like named entity taggers.
Public domain data with public domain annotations, assembled by a 10-person team with help from online volunteers.
We needed to make further modifications to pass IBM’s own controls.
Contracts Proposition Bank
A collection of annotated
sentences drawn from IBM’s
public contracts, annotated
with
Created by IBM Research.
Used by IBM researchers to train better semantic role labeling (SRL) parsers for the legal documents domain.
Available on DAX.
IBM’s Open Data
IBM Research has produced dozens, perhaps hundreds, of open data sets.
The data is not kept in one place.
IBM is working to improve this.
– Initiatives within IBM Research
– DAX
– The Community Data License Agreement
The Community Data License Agreement
http://cdla.io
Linux Foundation initiative to create a new legal framework that meets the needs of AI data sets.
IBM is a major supporter.
Two licenses written specifically for AI data:
• CDLA-Sharing: “copyleft” license analogous to the GPL
• CDLA-Permissive: similar to the BSD license
Both licenses distinguish clearly between use (analysis, modeling) and modification of the data set.
IBM Data Asset eXchange (DAX)
• Curated free and open datasets under open data licenses
• Standardized dataset formats and metadata
• Ready for use in enterprise AI applications
• Complement to the Model Asset eXchange (MAX)
Data Asset eXchange
ibm.biz/data-asset-exchange
Model Asset eXchange
ibm.biz/model-exchange
Is AI Fair?
And Transparent?
Unwanted bias and algorithmic fairness
Machine learning, by its very nature, is always a form of statistical discrimination.
Discrimination becomes objectionable when it places certain privileged groups at systematic advantage and certain unprivileged groups at systematic disadvantage.
Illegal in certain contexts.
AI Fairness 360 (AIF360)
The AIF360 toolkit is an open-source library to help detect and remove bias in machine learning models.
The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
Toolbox:
• Fairness metrics (30+)
• Fairness metric explanations
• Bias mitigation algorithms (10)
https://github.com/IBM/AIF360
https://developer.ibm.com/patterns/ensuring-fairness-when-processing-loan-applications/
AIF360 Demo: http://aif360.mybluemix.net
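To make the idea of a fairness metric concrete, here is a minimal sketch of two common group-fairness measures, statistical parity difference and disparate impact, written in plain Python. It does not use the AIF360 API; the function names and the toy loan-approval data are made up for illustration.

```python
def favorable_rate(outcomes, groups, group):
    """Fraction of favorable outcomes (1) among members of the given group."""
    selected = [o for o, g in zip(outcomes, groups) if g == group]
    return sum(selected) / len(selected)


def statistical_parity_difference(outcomes, groups, unprivileged, privileged):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    return (favorable_rate(outcomes, groups, unprivileged)
            - favorable_rate(outcomes, groups, privileged))


def disparate_impact(outcomes, groups, unprivileged, privileged):
    """Ratio of the unprivileged favorable rate to the privileged favorable rate."""
    return (favorable_rate(outcomes, groups, unprivileged)
            / favorable_rate(outcomes, groups, privileged))


# Toy data (hypothetical): 1 = loan approved, 0 = denied.
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]  # "a" privileged, "b" unprivileged

spd = statistical_parity_difference(outcomes, groups, unprivileged="b", privileged="a")
di = disparate_impact(outcomes, groups, unprivileged="b", privileged="a")
print(spd, di)
```

A statistical parity difference of 0 and a disparate impact of 1 would indicate equal favorable rates for both groups; in the toy data the unprivileged group is approved less often.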
Adversarial Attacks
Defending Machine
Learning Systems
Adversarial Attacks
Sources: Explaining and Harnessing Adversarial Examples; Robust Physical-World Attacks on Deep Learning Visual Classification
Adversarial Attacks - Hiding from Surveillance
https://www.technologyreview.com/f/613409/how-to-hide-from-the-ai-surveillance-state-with-a-color-printout/
IBM Adversarial Robustness Toolbox (ART)
ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack and defense methods for machine learning models. The Adversarial Robustness Toolbox provides implementations of many state-of-the-art methods for attacking and defending classifiers.
Toolbox:
• Evasion attacks (11)
• Defenses (9)
• Detection methods for adversarial samples and poisoning attacks
• Robustness metrics
https://github.com/IBM/adversarial-robustness-toolbox
https://developer.ibm.com/patterns/integrate-adversarial-attacks-model-training-pipeline/
ART Demo: https://art-demo.mybluemix.net/
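An evasion attack can be illustrated with the fast gradient sign method from "Explaining and Harnessing Adversarial Examples", one of the sources cited on the earlier slides. The sketch below applies it to a hand-written logistic regression classifier in plain Python; it does not use the ART API, and the weights, input, and step size are made-up values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """Fast gradient sign method: step each feature by eps in the direction
    that increases the cross-entropy loss for the true label y."""
    p = predict_proba(x, w, b)
    grad = [(p - y) * wi for wi in w]  # gradient of the loss with respect to x
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.5, 0.2, 0.1], 1

x_adv = fgsm(x, y, w, b, eps=0.4)
print(predict_proba(x, w, b), predict_proba(x_adv, w, b))
```

With these values the clean input is classified as class 1 while the perturbed input falls below the 0.5 threshold, even though no feature moved by more than `eps`.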
Building your models
interactively with
Jupyter Stack
Jupyter Notebooks
Notebooks are interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media.
JupyterLab
JupyterLab is the next-generation UI for the Jupyter ecosystem.
Brings all the previous improvements into a single unified platform, plus more!
Provides a modular, extensible architecture.
Retains backward compatibility with the old notebook we know and love.
Jupyter Notebook
Simple, but Powerful
As simple as opening a web page, with the capabilities of a powerful, multilingual development environment.
Interactive widgets
Code can produce rich outputs such as images, videos, markdown, LaTeX and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time.
Language of choice
Jupyter Notebooks have support for over 50 programming languages, including those popular in Data Science, Data Engineering, and AI, such as Python, R, Julia and Scala.
Big Data Integration
Leverage Big Data platforms such as Apache Spark from Python, R and Scala. Explore the same data with pandas, scikit-learn, ggplot2, dplyr, etc.
Share Notebooks
Notebooks can be shared with others using e-mail, Dropbox, Google Drive, GitHub, etc.
Enterprise Requirements
Multiuser, self-service, secure
Scale to support analytics workloads
- Processing large amounts of data in a distributed fashion
Support for heterogeneous AI workloads
- Resource-intensive workloads
- Heterogeneous frameworks (isolation required)
- Sharing of hardware resources (GPUs/TPUs)
Vanilla Jupyter Notebook
Single user sharing the same privileges
- Users can see and control each other's processes using Jupyter administrative utilities
Not scalable
- Jupyter kernels run as local processes, so resources are limited by what is available on the single node that runs all kernels and associated Spark drivers
[Chart: Maximum number of simultaneous kernels (4GB heap) vs. cluster size (32GB nodes): 4 nodes: 8; 8 nodes: 8; 12 nodes: 8; 16 nodes: 8]
JupyterHub
JupyterHub brings the power of notebooks to groups of users.
It gives users access to computational environments and resources, in a self-service fashion, without burdening the users with installation and maintenance tasks.
Jupyter Enterprise Gateway
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases.
Jupyter Enterprise Gateway at IBM Code
https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter/enterprise_gateway
Jupyter Enterprise Gateway documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
[Figure: supported kernels and supported platforms, including Spectrum Conductor]
Jupyter Enterprise Gateway Features
Optimized Resource Allocation
– Utilize resources on all cluster nodes by running kernels as Spark applications in YARN Cluster Mode.
– Pluggable architecture to enable support for additional Resource Managers.
Enhanced Security
– End-to-end secure communications.
Multiuser support with user impersonation
– Enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos).
– Individual HDFS home folder for each notebook user.
– Use the same user ID for notebook and batch jobs.
[Chart: Maximum number of simultaneous kernels (4GB heap) vs. cluster size (32GB nodes): 4 nodes: 16; 8 nodes: 32; 12 nodes: 48; 16 nodes: 64]
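The figures in the two kernel-capacity charts can be reproduced with simple arithmetic. The sketch below assumes that the vanilla limit of 8 comes from dividing one 32GB node by a 4GB heap per kernel, and it reads the Enterprise Gateway rate of 4 kernels per node directly off the chart (16 kernels on 4 nodes); the slides do not state how either figure was derived, so both are assumptions.

```python
NODE_MEMORY_GB = 32
KERNEL_HEAP_GB = 4
KERNELS_PER_NODE_WITH_GATEWAY = 4  # inferred from the chart: 16 kernels on 4 nodes


def max_kernels_vanilla(nodes: int) -> int:
    """All kernels run on the single notebook node, so cluster size does not matter."""
    return NODE_MEMORY_GB // KERNEL_HEAP_GB


def max_kernels_gateway(nodes: int) -> int:
    """Kernels are distributed across the cluster, so capacity grows with node count."""
    return nodes * KERNELS_PER_NODE_WITH_GATEWAY


for nodes in (4, 8, 12, 16):
    print(nodes, max_kernels_vanilla(nodes), max_kernels_gateway(nodes))
```

The point of the comparison is the shape of the two curves: flat for a single-node notebook server, linear in the number of nodes once kernels are launched across the cluster.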
Jupyter Enterprise Gateway 2.x
AI Workloads with Containers
– Current version: 2.1.0
• Innovations around container environments
• Support for vanilla kernels, Spark on K8s, Docker Swarm
– Distributed kernels as individual containers in either Docker Swarm or Kubernetes environments
• Provided kernel images for:
– Python (IPython), Python w/ Spark, Python w/ TensorFlow, Python w/ TensorFlow and GPUs, Scala (Toree) w/ Spark, R (IRkernel), R w/ Spark
– JupyterHub integration
– Dynamic configuration (reloadable)
– Deployment with Helm
– Jinja templates for kernel configuration
Jupyter Enterprise Gateway - Kubernetes
Jupyter Enterprise Gateway & JupyterHub
Leveraging
AI Platforms
for model training
Enterprise Machine Learning
Training and deploying models requires a lot of DevOps:
• Model Serving
• Monitoring
• Resource Management
• Configuration
• Hyperparameter Optimization
• Reproducibility
AI Platforms
AI platforms aim to enable data scientists to train their AI models (e.g. deep neural networks) in a consistent way, independent of the framework in use or the resources required for the job.
They leverage the Kubernetes platform's ability to easily manage containerized applications, with the benefits of elasticity and quality of service, as well as sharing of restricted accelerated hardware.
Kubeflow
End-to-end ML platform on Kubernetes.
Originated at Google.
Key Projects
– Model training and hyperparameter optimization
– Model serving
– Model management
– Pipelines
• Combine components into complex workflows
– Metadata
• Collect data from multiple components
Overall community, and IBM’s presence in Kubeflow
• Commits in Kubeflow compared with other companies: IBM is the 2nd or 3rd largest contributor in the past 12 months
• IBM maintainers (approvers/reviewers) in Katib (HPO + Training), Kubeflow Serving, Manifests, Pipelines, etc.
https://www.stackalytics.com/unaffiliated?project_type=kubeflow-group
IBMers contributing to Kubeflow:
• 590+ commits
• 924K lines of code
https://www.stackalytics.com/unaffiliated?project_type=kubeflow-group&company=ibm
Open Source Resources
Model Asset Exchange
https://developer.ibm.com/code/exchanges/models/
Data Asset Exchange
https://developer.ibm.com/exchanges/data/
AI Fairness 360
https://github.com/IBM/AIF360
Adversarial Robustness Toolbox
https://github.com/IBM/adversarial-robustness-toolbox
Jupyter Enterprise Gateway
https://github.com/jupyter/enterprise_gateway
Kubeflow
https://github.com/kubeflow
Thank you!
@lresende1975

From Data to AI - Silicon Valley Open Source projects come to you - Madrid meetup

  • 1. De Big Data a AI pasando por Silicon Valley Luciano Resende IBM – CODAIT – Silicon Valley, California 1© 2019 IBM Corporation
  • 2. About me - Luciano Resende 2 Open Source AI Platform Architect – IBM – CODAIT • Senior Technical Staff Member at IBM, contributing to open source for over 10 years • Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache Toree, Apache Spark among other projects related to AI/ML platforms [email protected] https://www.linkedin.com/in/lresende @lresende1975 https://github.com/lresende © 2018 IBM Corporation© 2019 IBM Corporation
  • 3. 3 Learn Open Source @ IBM Program touches 78,000 IBMers annually Consume Virtually all IBM products contain some open source • 40,363 pkgs Per Year Contribute • >62K OS Certs per year • ~10K IBM commits per month Connect > 1000 active IBM Contributors Working in key OS projects IBM Open Source Participation © 2019 IBM Corporation
  • 4. 4 IBM Open Source Participation IBM generated open source innovation • 137 Code Open (dWO) projects w/1000+ Github projects • 4 graduates: Node-Red, OpenWhisk, SystemML, Blockchain fabric to full open governance in the last year • developer.ibm.com/code/open/code/ Community • IBM focused on 18 strategic communities • Drive open governance in “Centers of Gravity” • IBM Leaders drive key technologies and assure freedom of action The IBM OS Way is now open sourced • Training, Recognition, Tooling • Organization, Consuming, Contributing © 2019 IBM Corporation
  • 5. 5 IBM’s history of strong AI leadership 1997: Deep Blue • Deep Blue became the first machine to beat a world chess champion in tournament play 2011: Jeopardy! • Watson beat two top Jeopardy! champions 1968, 2001: A Space Odyssey • IBM was a technical advisor • HAL is “the latest in machine intelligence” 2018: Open Tech, AI & emerging standards • New IBM centers of gravity for AI • OS projects increasing exponentially • Emerging global standards in AI © 2019 IBM Corporation 2018: Project Debater
  • 6. Center for Open Source Data and AI Technologies CODAIT codait.org 2018 / © 2018 IBM Corporation codait (French) = coder/coded https://m.interglot.com/fr/en/codait CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise Relaunch of the Spark Technology Center (STC) to reflect expanded mission 6© 2019 IBM Corporation
  • 7. Center for Open Source Data and AI Technologies IBM Developer / © 2019 IBM Corporation 7 Using AI - Model Asset Exchange - Data Asset Exchange AI Frameworks Building AI AI Platforms Trusted AI - Fairness - Robustness - Transparency and Accountability - Explainability
  • 8. AI Examples Today 8© 2019 IBM Corporation
  • 9. Home Automation & Security - Multiple connected or standalone devices - Controlled by Voice - Amazon Echo (Alexa) - Google Home - Apple HomePod (Siri) 9 © 2019 IBM Corporation
  • 10. Autonomous Driving In 2016, Google's self-driving car system has been officially recognized as a driver in the US, paving the way for the legalization of autonomous vehicles. Doordash is currently testing self-driving robots for food delivery. 10 https://www.dezeen.com/2016/02/12/google-self-driving-car-artficial-intelligence-system-recognised-as-driver-usa/ https://medium.com/@DoorDash/welcoming-our-newest-robots-to-the-doordash-fleet-with-marble-e752a85d6602 © 2019 IBM Corporation
  • 11. AMAZON Go AMAZON GO – No lines, no checkout, just grab and go 11 © 2019 IBM Corporation
  • 12. But how simple is to apply AI to your Application? 12© 2019 IBM Corporation
  • 13. “cat” A simple Deep Learning Model 13May 17, 2018 / © 2018 IBM Corporation Dense (3×8) Dense (8×6) Input (3) Output (2) Dense (6×4) Dense (4×2) Neural Network Graph Weights (not to scale) Driver Program © 2019 IBM Corporation
  • 14. Example: Get an Image Classifier. Step 1: Find a suitable neural network graph. – Need to read some papers
  • 15. Example: Get an Image Classifier. Step 2: Find code to generate the neural network graph. [Screenshot: TensorFlow code to build ResNet50 neural network graph]
  • 16. Example: Get an Image Classifier. Step 3: Find some pre-trained weights for your graph. [Screenshot: Caffe2 ResNet50 model weights]
  • 17. Example: Get an Image Classifier. Step 4: Find example code that performs model inference. [Screenshot: TensorFlow code for training and batch inference on ResNet50]
  • 18. Example: Get an Image Classifier. Step 5: Write your own code to perform model inference on one image at a time. Step 6: Package your inference code, graph creation code, and pre-trained weights together. Step 7: Deploy your package.
  • 19. Model Marketplaces. Collections of well-understood deep learning models. Provide a central place to find known-good implementations of these models.
  • 20. IBM Model Asset eXchange (MAX). MAX is a one-stop-shop open source ecosystem for data scientists and AI developers to share and consume models that use machine learning engines such as TensorFlow, PyTorch and Caffe2. It also provides a standard approach to classify, annotate, and deploy these models for prediction and inferencing. https://developer.ibm.com/code/exchanges/models/
  • 25. Leveraging MAX. I am an application engineer and want to augment my application/solution with AI. • Use MAX pre-trained, ready-to-use models. • Deploy them collocated with your application as a Docker container or in a Kubernetes environment. • Integrate the simple-to-use inference REST API. • Use the demo applications as examples of how to use the APIs.
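Integrating an inference REST API from an application could look roughly like the sketch below. Everything specific here is an assumption for illustration: the host, port, path and JSON payload shape are placeholders, so check the API documentation of the model you deploy for the real values.

```python
import json
import urllib.request

# Assumptions: host, port and path are placeholders for wherever the model
# container is reachable; they are not taken from the slides.
BASE_URL = "http://localhost:5000"
PREDICT_PATH = "/model/predict"


def build_predict_request(texts, base_url=BASE_URL, path=PREDICT_PATH):
    """Build (but do not send) a JSON POST request for an inference endpoint."""
    body = json.dumps({"text": texts}).encode("utf-8")  # payload shape is assumed
    return urllib.request.Request(
        url=base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(texts, timeout=10):
    """Send the request and decode the JSON reply; needs a running container."""
    request = build_predict_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_predict_request(["Luciano Resende works at IBM in California."])
print(request.get_method(), request.full_url)
```

Keeping request construction separate from the network call makes the integration testable without a deployed model; `predict` is only useful once a container is actually listening.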
  • 26. Learning from MAX. I am a data scientist and want to learn from MAX serving and deployment patterns. • All MAX code is available in github.com/IBM/MAX*. • Understand and reuse MAX’s inference code in your own projects, as allowed by the open source license. • Understand and reuse MAX’s packaging and deployment patterns, which are based on containers and easily deployable in Kubernetes, and apply them to your models.
  • 27. MAX Summary. Free, open-source models. Wide variety of domains. Multiple deep learning frameworks. Vetted and tested code and IP. Build and deploy a container based web service in 30 seconds. Start training on Watson Studio in minutes.
  • 28. The IBM Data Asset eXchange. Also known as DAX. A place to find curated free and open datasets under open data licenses. Part of developer.ibm.com.
  • 29. The MAX Named Entity Tagger. A model that identifies mentions of named entities, such as persons and organizations, in English-language text. Trained by Nick Pentreath on the CODAIT team. Most difficult part: finding usable training data.
  • 30. Groningen Meaning Bank. A project at the University of Groningen to create an open data set for training linguistic models like named entity taggers. Public domain data with public domain annotations, assembled by a 10-person team with help from online volunteers. We needed to make further modifications to pass IBM’s own controls.
  • 31. Contracts Proposition Bank. A collection of annotated sentences drawn from IBM’s public contracts. Created by IBM Research. Used by IBM researchers to train better SRL parsers for the legal documents domain. Available on DAX.
  • 32. IBM’s Open Data. IBM Research has produced dozens, perhaps hundreds, of open data sets. The data is not kept in one place. IBM is working to improve this. – Initiatives within IBM Research – DAX – The Community Data License Agreement
  • 33. The Community Data License Agreement (http://cdla.io). Linux Foundation initiative to create a new legal framework that meets the needs of AI data sets. IBM is a major supporter.
  • 34. The Community Data License Agreement (http://cdla.io). Two licenses written specifically for AI data • CDLA-Sharing: “Copyleft” license analogous to the GPL • CDLA-Permissive: Similar to the BSD license. Both licenses distinguish clearly between use (analysis, modeling) and modification of the data set.
  • 35. IBM Data Asset eXchange (DAX) • Curated free and open datasets under open data licenses • Standardized dataset formats and metadata • Ready for use in enterprise AI applications • Complement to the Model Asset eXchange (MAX). Data Asset eXchange: ibm.biz/data-asset-exchange Model Asset eXchange: ibm.biz/model-exchange
  • 36. Is AI Fair? And Transparent?
  • 37. Unwanted bias and algorithmic fairness. Machine learning, by its very nature, is always a form of statistical discrimination. Discrimination becomes objectionable when it places certain privileged groups at systematic advantage and certain unprivileged groups at systematic disadvantage. Illegal in certain contexts.
  • 38. AI Fairness 360 (AIF360). The AIF360 toolkit is an open-source library to help detect and remove bias in machine learning models. The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models. Toolbox: Fairness metrics (30+), Fairness metric explanations, Bias mitigation algorithms (10). https://github.com/IBM/AIF360 https://developer.ibm.com/patterns/ensuring-fairness-when-processing-loan-applications/
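To make "fairness metric" concrete, here is a minimal sketch of two commonly used group fairness measures, statistical parity difference and disparate impact, written in plain Python. It deliberately does not use the AIF360 API; the loan data, group names and function names are invented for illustration.

```python
def favorable_rate(outcomes, groups, group):
    """Share of favorable outcomes (1 = e.g. loan approved) within one group."""
    selected = [y for y, g in zip(outcomes, groups) if g == group]
    return sum(selected) / len(selected)


def statistical_parity_difference(outcomes, groups, unprivileged, privileged):
    """Rate(unprivileged) - Rate(privileged); 0 means parity, negative means
    the unprivileged group receives the favorable outcome less often."""
    return (favorable_rate(outcomes, groups, unprivileged)
            - favorable_rate(outcomes, groups, privileged))


def disparate_impact(outcomes, groups, unprivileged, privileged):
    """Rate(unprivileged) / Rate(privileged); 1 means parity."""
    return (favorable_rate(outcomes, groups, unprivileged)
            / favorable_rate(outcomes, groups, privileged))


# Toy, made-up loan decisions: group "a" is treated as privileged here.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]

spd = statistical_parity_difference(outcomes, groups, "b", "a")
di = disparate_impact(outcomes, groups, "b", "a")
print(f"statistical parity difference: {spd:.2f}, disparate impact: {di:.2f}")
```

In this toy data the privileged group is approved three times out of four and the unprivileged group once out of four, which is the kind of systematic disadvantage the previous slide describes.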
  • 40. Adversarial Attacks: Defending Machine Learning Systems
  • 41. Adversarial Attacks. Sources: “Explaining and Harnessing Adversarial Examples”; “Robust Physical-World Attacks on Deep Learning Visual Classification”
  • 42. Adversarial Attacks. Sources: “Explaining and Harnessing Adversarial Examples”; “Robust Physical-World Attacks on Deep Learning Visual Classification”
  • 43. Adversarial Attacks - Hiding from Surveillance. https://www.technologyreview.com/f/613409/how-to-hide-from-the-ai-surveillance-state-with-a-color-printout/
  • 44. IBM Adversarial Robustness Toolbox (ART). ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack and defense methods for machine learning models. The Adversarial Robustness Toolbox provides an implementation for many state-of-the-art methods for attacking and defending classifiers. Toolbox: Evasion attacks (11), Defenses (9), Detection methods for adversarial samples & poisoning attacks, Robustness metrics. https://github.com/IBM/adversarial-robustness-toolbox https://developer.ibm.com/patterns/integrate-adversarial-attacks-model-training-pipeline/
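As a rough illustration of what an evasion attack does, the sketch below applies the fast gradient sign method (the technique introduced in "Explaining and Harnessing Adversarial Examples") to a tiny logistic-regression classifier. It is plain Python and does not use the ART API; the weights, input and step size are invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """Fast gradient sign method: x_adv = x + eps * sign(dLoss/dx).

    For cross-entropy loss on a logistic model, dLoss/dx = (p - y) * w.
    """
    p = predict_proba(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Made-up model and input; the true label of x is 1.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.5, -0.2, 0.3], 1

x_adv = fgsm(x, y, w, b, eps=0.5)
print("clean:", round(predict_proba(x, w, b), 3),
      "adversarial:", round(predict_proba(x_adv, w, b), 3))
```

Each input feature moves by at most `eps`, yet the predicted class changes, which is why defenses and robustness metrics are part of the toolbox.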
  • 46. Building your models interactively with Jupyter Stack
  • 47. Jupyter Notebooks. Notebooks are interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media.
  • 48. JupyterLab. JupyterLab is the next-generation UI for the Jupyter ecosystem. It brings all the previous improvements into a single unified platform, plus more. Provides a modular, extensible architecture. Retains backward compatibility with the old notebook we know and love.
  • 49. Jupyter Notebook. Simple, but Powerful: As simple as opening a web page, with the capabilities of a powerful, multilingual development environment. Interactive widgets: Code can produce rich outputs such as images, videos, Markdown, LaTeX and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time. Language of choice: Jupyter Notebooks support over 50 programming languages, including those popular in Data Science, Data Engineering, and AI such as Python, R, Julia and Scala. Big Data Integration: Leverage Big Data platforms such as Apache Spark from Python, R and Scala. Explore the same data with pandas, scikit-learn, ggplot2, dplyr, etc. Share Notebooks: Notebooks can be shared with others using e-mail, Dropbox, Google Drive, GitHub, etc.
  • 50. Enterprise Requirements. Multiuser, Self Service, Secure. Scale to support Analytics Workloads - Processing large amounts of data in a distributed fashion. Support for Heterogeneous AI Workloads - Resource-intensive workloads - Heterogeneous frameworks (isolation required) - Sharing of hardware resources (GPUs/TPUs)
  • 51. Vanilla Jupyter Notebook. Single user sharing the same privileges - Users can see and control each other's processes using Jupyter administrative utilities. Not scalable - Jupyter kernels run as local processes, so resources are limited to what is available on the single node that runs all kernels and associated Spark drivers. [Chart: Maximum number of simultaneous kernels (4GB heap) vs. cluster size (32GB nodes): 8 kernels at 4, 8, 12 and 16 nodes.]
  • 52. JupyterHub. JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources, in a self-service fashion, without burdening the users with installation and maintenance tasks.
  • 53. Jupyter Enterprise Gateway. A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases. Jupyter Enterprise Gateway at IBM Code: https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/ Source code at GitHub: https://github.com/jupyter/enterprise_gateway Documentation: http://jupyter-enterprise-gateway.readthedocs.io/en/latest/ [Logos: supported kernels; supported platforms, including Spectrum Conductor]
  • 54. Jupyter Enterprise Gateway Features. Optimized Resource Allocation – Utilize resources on all cluster nodes by running kernels as Spark applications in YARN Cluster Mode. – Pluggable architecture to enable support for additional Resource Managers. Enhanced Security – End-to-end secure communications. Multiuser support with user impersonation – Enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos). – Individual HDFS home folder for each notebook user. – Use the same user ID for notebook and batch jobs. [Chart: Maximum number of simultaneous kernels (4GB heap) vs. cluster size (32GB nodes): 16 kernels at 4 nodes, 32 at 8 nodes, 48 at 12 nodes, 64 at 16 nodes.]
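The two capacity charts (vanilla notebook versus Enterprise Gateway) can be reproduced with simple arithmetic. This is a sketch under assumptions: the function names are made up, and the figure of 4 kernels per node is only read off the Enterprise Gateway chart (16 kernels on 4 nodes); the slides do not say why it is 4 rather than 32GB / 4GB = 8.

```python
NODE_MEMORY_GB = 32   # from the chart axis label "32GB Nodes"
KERNEL_HEAP_GB = 4    # from the chart axis label "4GB Heap"
CLUSTER_SIZES = [4, 8, 12, 16]


def max_kernels_single_node(node_memory_gb, kernel_heap_gb):
    """All kernels run as local processes on one node, so cluster size is irrelevant."""
    return node_memory_gb // kernel_heap_gb


def max_kernels_distributed(nodes, kernels_per_node):
    """Kernels are spread over the cluster, so capacity grows with node count."""
    return nodes * kernels_per_node


# Assumption: 4 kernels per node, inferred from the chart values only.
KERNELS_PER_NODE = 4

vanilla = [max_kernels_single_node(NODE_MEMORY_GB, KERNEL_HEAP_GB) for _ in CLUSTER_SIZES]
gateway = [max_kernels_distributed(n, KERNELS_PER_NODE) for n in CLUSTER_SIZES]

for n, v, g in zip(CLUSTER_SIZES, vanilla, gateway):
    print(f"{n:>2} nodes: vanilla={v:>2} kernels, enterprise gateway={g:>2} kernels")
```

The flat line for the vanilla notebook and the linear growth for the gateway are the whole argument of the two charts: adding nodes only helps when kernels can be placed on them.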
  • 55. Jupyter Enterprise Gateway 2.x: AI Workloads with Containers – Current version: 2.1.0 • Innovations around container environments • Support for vanilla kernels, Spark on K8s, Docker Swarm – Distributed kernels as individual containers in either Docker Swarm or Kubernetes environments • Provided kernel images for: – Python (IPython), Python w/ Spark, Python w/ TensorFlow, Python w/ TensorFlow and GPUs, Scala (Toree) w/ Spark, R (IRKernel), R w/ Spark – JupyterHub integration – Dynamically configurable (reloadable configuration) – Deployment with Helm – Jinja templates for kernel configuration
  • 57. Jupyter Enterprise Gateway - Kubernetes Jupyter Enterprise Gateway & JupyterHub
  • 58. Leveraging AI Platforms for model training
  • 60. Training/Deploying Models requires a lot of DevOps: Model Serving, Monitoring, Resource Management, Configuration, Hyperparameter Optimization, Reproducibility
  • 61. AI Platforms. AI platforms aim to enable data scientists to train their AI models (e.g. deep neural networks) in a consistent way, independent of the framework in use or the resources required for the job. They leverage the Kubernetes platform's ability to ease the management of containerized applications, with the benefits of elasticity and quality of service as well as sharing of restricted accelerated hardware.
  • 62. Kubeflow. End-to-end ML platform on Kubernetes. Originated at Google. Key Projects – Model Training and Hyperparameter optimization – Model Serving – Model Management – Pipelines: • Combine components into complex workflows – Metadata: • Collect data from multiple components
  • 63. Overall community, and IBM’s presence in Kubeflow • Commits in Kubeflow compared with other companies • IBM is the 2nd or 3rd largest contributor in the past 12 months • IBM maintainers (approvers/reviewers) in Katib (HPO+Training), Kubeflow Serving, Manifests, Pipelines, etc. https://www.stackalytics.com/unaffiliated?project_type=kubeflow-group
  • 64. IBMers contributing to Kubeflow: • 590+ commits • 924K lines of code. https://www.stackalytics.com/unaffiliated?project_type=kubeflow-group&company=ibm
  • 65. Open Source Resources. Model Asset Exchange: https://developer.ibm.com/code/exchanges/models/ Data Asset Exchange: https://developer.ibm.com/exchanges/data/ AI Fairness 360: https://github.com/IBM/AIF360 Adversarial Robustness Toolbox: https://github.com/IBM/adversarial-robustness-toolbox Jupyter Enterprise Gateway: https://github.com/jupyter/enterprise_gateway Kubeflow: https://github.com/kubeflow Thank you! @lresende1975