IBM Spark Technology Center
Apache SystemML
Declarative Machine Learning
Luciano Resende
IBM | Spark Technology Center
Big Data Developers Meetup – Spain/Madrid – Nov 2017
lresende@apache.org
http://lresende.blogspot.com/
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
Luciano Resende
Data Science Platform Architect – IBM – Spark Technology Center
Apache Member and also a SystemML committer and PMC member
Open Source Community Leadership
Founding Partner
188+ Project Committers
77+ Projects
Key open source steering committee memberships
OSS Advisory Board
IBM Spark Technology Center
Founded in 2015.
Location:
Physical: 505 Howard St., San Francisco CA
Web: http://spark.tc Twitter: @apachespark_tc
Mission:
Contribute intellectual and technical capital to the Apache Spark community.
Make the core technology enterprise- and cloud-ready.
Build data science skills to drive intelligence into business applications (http://bigdatauniversity.com)
Key statistics:
About 50 developers, co-located with 25 IBM designers.
Major contributions to Apache Spark: http://jiras.spark.tc
Apache SystemML began as an Apache Incubator project and is now a top-level Apache project.
Founding member of UC Berkeley AMPLab and RISE Lab
Member of R Consortium and Scala Center
STC impact on community
Contributions:
46,385 Spark LOC
863 Spark JIRAs
457 SystemML JIRAs
67 speakers at events
Focus on meaningful code contributions across all major Spark projects
863 code contributions (JIRAs) and counting; check out http://jiras.spark.tc
Over 422 commits in Spark 2.0, and continuing major contributions in 2.x
Contributions by the Spark Technology Center across almost all components of Spark: Spark Core, SparkR, SQL, MLlib, Streaming, PySpark, build and infrastructure, etc.
Machine Learning
Spark MLLib
R4ML
Online Retraining
Apache Arrow
SystemML
Deep Learning
Consumability
Reference architectures
Spark Notebook stack
Spark Resource optimization
Spark Web UI
Apache Bahir
RedRock
Immersive Insights
SQL
TPC-DS and Performance
Query Pushdown/Federation
Project Focus Areas
Origins of the SystemML Project
2007-2008: Multiple projects at IBM Research – Almaden involving machine learning on Hadoop.
2009: We create a dedicated team for scalable ML.
2009-2010: Through engagements with customers, we observe how data scientists create machine learning algorithms.
State-of-the-Art: Small Data
[Diagram labels: Data Scientist, R or Python, Personal Computer, Data, Results]
State-of-the-Art: Big Data
[Diagram labels: Data Scientist, R or Python, Systems Programmer, Scala, Results]
😞 Days or weeks per iteration
😞 Errors while translating algorithms
State-of-the-Art: Big Data
[Diagram labels: Data Scientist, R or Python, SystemML, Results]
😃 Fast iteration
😃 Same answer
Linear Algebra is the Language of Machine Learning.
Linear algebra is powerful, precise, and high-level.
Express complex transformations over large arrays of data…
…using a small number of instructions.
…in a clear and unambiguous way.
SystemML Provides Highly Optimized Distributed Linear Algebra
Running Example: Alternating Least Squares
Problem: Movie Recommendations
[Diagram: a sparse matrix with Users and Movies axes; entry (i, j) is nonzero when user i liked movie j.]
Multiply the Users factor and the Movies factor to produce a less-sparse matrix.
New nonzero values become movie suggestions.
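The idea on this slide can be sketched with a toy example. The block below is only an illustration: the ratings matrix and both factors are made-up values chosen by hand (ALS would learn the factors), and the helper names `matmul` and `suggestions` as well as the 0.5 threshold are assumptions introduced here, not part of SystemML.

```python
# Toy illustration: a sparse "liked" matrix is approximated by the product of a
# Users factor and a Movies factor. Entries that are zero in the original but
# score high in the product become suggestions. All numbers are made up.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# 3 users x 4 movies; 1.0 means "user i liked movie j", 0.0 means unknown.
X = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]

# Hypothetical rank-2 factors, hand-picked for this example.
users_factor = [[1.0, 0.0],
                [0.9, 0.0],
                [0.0, 1.0]]
movies_factor = [[1.0, 0.9, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0]]

approx = matmul(users_factor, movies_factor)

def suggestions(original, product, threshold=0.5):
    """Return (user, movie) pairs that are zero in the original but score high."""
    return [(i, j)
            for i, row in enumerate(product)
            for j, score in enumerate(row)
            if original[i][j] == 0.0 and score > threshold]

suggested = suggestions(X, approx)
print(suggested)  # prints [(1, 1)]
```

User 1 shares a taste profile with user 0, so movie 1 (liked by user 0 but unseen by user 1) surfaces as the only suggestion.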
Alternating Least Squares (in R)
U = rand(nrow(X), r, min = -1.0, max = 1.0);
V = rand(r, ncol(X), min = -1.0, max = 1.0);
while(i < mi) {
  i = i + 1; ii = 1;
  if (is_U)
    G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
  else
    G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
  norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
  R = -G; S = R;
  while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
    if (is_U) {
      HS = (W * (S %*% V)) %*% t(V) + lambda * S;
      alpha = norm_R2 / sum (S * HS);
      U = U + alpha * S;
    } else {
      HS = t(U) %*% (W * (U %*% S)) + lambda * S;
      alpha = norm_R2 / sum (S * HS);
      V = V + alpha * S;
    }
    R = R - alpha * HS;
    old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
    S = R + (norm_R2 / old_norm_R2) * S;
    ii = ii + 1;
  }
  is_U = ! is_U;
}
Alternating Least Squares (in R)
1. Start with random factors.
2. Hold the Movies factor constant and find the best value for the Users factor. (Value that most closely approximates the original matrix)
3. Hold the Users factor constant and find the best value for the Movies factor.
4. Repeat steps 2-3 until convergence.
Every line has a clear purpose!
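The four steps can also be sketched in plain Python. This is a simplified illustration under stated assumptions, not the script from the slide: it restricts the factors to rank 1 so that each least-squares subproblem has a closed-form solution, whereas the slide's script solves the subproblems with conjugate gradient. The function name `als_rank1`, the fixed iteration count, and the toy data are all introduced here for illustration.

```python
import random

def als_rank1(X, W, lam=0.01, iterations=50, seed=0):
    """Rank-1 weighted, regularized ALS: approximate X[i][j] by u[i] * v[j]
    on the cells where W[i][j] is nonzero."""
    rng = random.Random(seed)
    n_users, n_movies = len(X), len(X[0])
    # Step 1: start with random factors.
    u = [rng.uniform(-1.0, 1.0) for _ in range(n_users)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_movies)]
    # Step 4: repeat steps 2-3 (a fixed count stands in for a convergence test).
    for _ in range(iterations):
        # Step 2: hold the Movies factor v constant; each u[i] is the
        # minimizer of a one-dimensional regularized least-squares problem.
        for i in range(n_users):
            num = sum(W[i][j] * X[i][j] * v[j] for j in range(n_movies))
            den = sum(W[i][j] * v[j] ** 2 for j in range(n_movies)) + lam
            u[i] = num / den
        # Step 3: hold the Users factor u constant and solve for each v[j].
        for j in range(n_movies):
            num = sum(W[i][j] * X[i][j] * u[i] for i in range(n_users))
            den = sum(W[i][j] * u[i] ** 2 for i in range(n_users)) + lam
            v[j] = num / den
    return u, v

# Toy data: an exactly rank-1 matrix with every cell observed.
X = [[a * b for b in (1.0, 0.5, 2.0)] for a in (1.0, 2.0, 3.0)]
W = [[1.0] * 3 for _ in range(3)]
u, v = als_rank1(X, W, lam=1e-6)
max_err = max(abs(u[i] * v[j] - X[i][j]) for i in range(3) for j in range(3))
print(max_err < 1e-3)
```

With a small regularization weight, the product of the learned factors reproduces the toy matrix closely; higher ranks need a linear solver or an iterative method such as the conjugate gradient loop in the script.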
Alternating Least Squares (spark.ml)
25 lines’ worth of algorithm…
…mixed with 800 lines of performance code
Alternating Least Squares (in R)
SystemML can compile and run this algorithm at scale
No additional performance code needed!
(in SystemML’s subset of R)
How fast does it run?
Running time comparisons between machine learning algorithms are problematic
Different, equally-valid answers
Different convergence rates on different data
But we’ll do one anyway
Performance Comparison: ALS
[Bar chart: running time in seconds (axis 0 to 20,000) for R, MLLib, and SystemML on 1.2GB (sparse binary), 12GB, and 120GB inputs; two bars are labeled >24h and two are labeled OOM.]
Details: Synthetic data, 0.01 sparsity, 10^5 products × {10^5, 10^6, 10^7} users. Data generated by multiplying two rank-50 matrices of normally-distributed data, sampling from the resulting product, then adding Gaussian noise. Cluster of 6 servers with 12 cores and 96GB of memory per server. Number of iterations tuned so that all algorithms produce comparable result quality.
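The data-generation recipe in the details can be sketched at a much smaller scale. The slide does not give the sampling scheme or the noise level, so the independent per-cell sampling, the `noise_std` parameter, and the function name `make_synthetic` below are assumptions made for illustration only.

```python
import random

def make_synthetic(n_rows, n_cols, rank, sparsity, noise_std, seed=0):
    """Build a sparse matrix (dict of (row, col) -> value) by multiplying two
    random normal factors, sampling cells, and adding Gaussian noise."""
    rng = random.Random(seed)
    # Two low-rank factors of normally distributed data.
    left = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    right = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(rank)]
    cells = {}
    for i in range(n_rows):
        for j in range(n_cols):
            # Assumed sampling scheme: keep each cell independently with
            # probability equal to the target sparsity.
            if rng.random() < sparsity:
                value = sum(left[i][k] * right[k][j] for k in range(rank))
                cells[(i, j)] = value + rng.gauss(0.0, noise_std)
    return cells

# Small stand-in for the slide's setup (which uses rank 50 and far larger sizes).
cells = make_synthetic(n_rows=200, n_cols=100, rank=5, sparsity=0.01, noise_std=0.1)
print(len(cells), "nonzero cells out of", 200 * 100)
```

Only the sampled cells are ever materialized, which keeps memory proportional to the number of nonzeros rather than to the full matrix size.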
Takeaway Points
SystemML runs the R script in parallel
Same answer as original R script
Performance is comparable to a low-level RDD-based implementation
Also, for Python lovers, an equivalent Python-like DML syntax (PyDML) exists!
How does SystemML achieve this result?
The SystemML Optimizer and Runtime for Spark
Automates critical performance decisions
Distributed or local computation?
How to partition the data?
To persist or not to persist?
Distributed vs local: Hybrid runtime
Multithreaded computation in Spark Driver
Distributed computation in Spark Executors
Optimizer makes a cost-based choice
High-Level Operations (HOPs): General representation of statements in the data analysis language
Low-Level Operations (LOPs): General representation of operations in the runtime framework
[Diagram labels: High-level language front-ends, Cost-Based Optimizer, Multiple execution environments]
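To make the "distributed or local" decision concrete, here is a deliberately simplified sketch of a memory-based choice. It is not SystemML's actual cost model: the per-cell byte sizes, the function names `estimate_bytes` and `choose_backend`, and the single memory budget are all assumptions introduced for illustration.

```python
def estimate_bytes(rows, cols, sparsity):
    """Rough in-memory size of a matrix.
    Assumed sizes: 8 bytes per dense cell; 12 bytes per sparse nonzero
    (8-byte value plus 4-byte column index)."""
    dense = rows * cols * 8
    sparse = int(rows * cols * sparsity) * 12
    return min(dense, sparse)

def choose_backend(operands, output, driver_budget_bytes):
    """Pick 'local' when all operands and the output fit within the driver's
    memory budget, otherwise 'distributed'.
    Each matrix is described as a (rows, cols, sparsity) tuple."""
    total = sum(estimate_bytes(*m) for m in operands) + estimate_bytes(*output)
    return "local" if total <= driver_budget_bytes else "distributed"

# Two small dense operands and their product fit easily in a 1 GB budget.
small = choose_backend(
    operands=[(1000, 1000, 1.0), (1000, 1000, 1.0)],
    output=(1000, 1000, 1.0),
    driver_budget_bytes=10**9,
)

# A very large sparse operand does not fit in a 10 GB budget.
large = choose_backend(
    operands=[(10**7, 10**5, 0.01)],
    output=(10**7, 50, 1.0),
    driver_budget_bytes=10 * 10**9,
)

print(small, large)  # prints: local distributed
```

The point of the sketch is that the decision depends only on matrix dimensions and sparsity, which a compiler can often know or estimate before running the operation.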
But wait, there’s more!
Many other rewrites
Cost-based selection of operators
Dynamic recompilation for accurate stats
Parallel FOR (ParFor) optimizer
Direct operations on RDD partitions
YARN and MapReduce support
New in Next Release: Compressed Linear Algebra
Summary
Cost-based compilation of machine learning algorithms generates execution plans
for single-node in-memory, cluster, and hybrid execution
for varying data characteristics:
varying number of observations (1,000s to 10s of billions), number of variables (10s to 10s of millions), dense and sparse data
for varying cluster characteristics (memory configurations, degree of parallelism)
Out-of-the-box, scalable machine learning algorithms
e.g. descriptive statistics, regression, clustering, and classification
"Roll-your-own" algorithms
Enable programmer productivity (no worry about scalability, numeric stability, and optimizations)
Fast turn-around for new algorithms
Higher-level language shields algorithm development investment from platform progression
YARN for resource negotiation and elasticity
Spark for in-memory, iterative processing
Benefits of the SystemML Approach
Simplifies algorithm development.
Makes experimentation easier.
Your code gets faster as the system improves.
Algorithms
Category: Description
Descriptive Statistics: Univariate; Bivariate; Stratified Bivariate
Classification: Logistic Regression (multinomial); Multi-Class SVM; Naïve Bayes (multinomial); Decision Trees; Random Forest
Clustering: k-Means
Regression:
  Linear Regression: system of equations; CG (conjugate gradient)
  Generalized Linear Models (GLM):
    Distributions: Gaussian, Poisson, Gamma, Inverse Gaussian, Binomial, Bernoulli
    Links for all distributions: identity, log, sq. root, inverse, 1/μ²
    Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit
  Stepwise: Linear; GLM
Dimension Reduction: PCA
Matrix Factorization: ALS (direct solve; CG (conjugate gradient descent))
Survival Models: Kaplan Meier Estimate; Cox Proportional Hazard Regression
Predict: Algorithm-specific scoring
Transformation (native): Recoding, dummy coding, binning, scaling, missing value imputation
PMML models: lm, kmeans, svm, glm, mlogit
What’s new in Apache SystemML
Expressing Algorithms with SystemML
Gaussian Nonnegative Matrix Factorization in DML (SystemML’s R-like syntax)
while (i < max_iteration) {
  H <- H * ((t(W) %*% V) / (((t(W) %*% W) %*% H) + Eps))
  W <- W * ((V %*% t(H)) / ((W %*% (H %*% t(H))) + Eps))
  i <- i + 1
}
Gaussian Nonnegative Matrix Factorization in PyDML (SystemML’s Python-like syntax)
while (i < max_iteration):
    H = H * (dot(W.transpose(), V) / (dot(dot(W.transpose(), W), H) + Eps))
    W = W * (dot(V, H.transpose()) / (dot(W, dot(H, H.transpose())) + Eps))
    i = i + 1
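For readers without a SystemML installation, the same multiplicative updates can be tried in plain Python. This is a sketch only: it uses lists of rows instead of distributed matrices, and the helper names (`matmul`, `transpose`, `gnmf`, `frobenius_error`), the random initialization, and the toy matrix are assumptions introduced here.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def scale(m, numer, denom, eps):
    """Elementwise m * numer / (denom + eps), as in the update rules."""
    return [[x * n / (d + eps) for x, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(m, numer, denom)]

def gnmf(V, rank, max_iteration=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V into nonnegative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(max_iteration):
        Wt = transpose(W)
        # H <- H * (t(W) V) / (t(W) W H + eps)
        H = scale(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        # W <- W * (V t(H)) / (W H t(H) + eps)
        W = scale(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra))

V = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 1.0, 1.0],
     [0.0, 1.0, 2.0]]
err_before = frobenius_error(V, *gnmf(V, rank=2, max_iteration=0))
W, H = gnmf(V, rank=2)
err_after = frobenius_error(V, W, H)
print(err_after < err_before)
```

Because every update multiplies by a nonnegative ratio, W and H stay nonnegative as long as they start that way.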
SystemML users write machine learning algorithms in a domain-specific language.
SystemML has APIs for embedding these algorithms in Python, Scala, or Java Spark applications.
The R4ML project provides similar functionality for SparkR.
Scikit-Learn Compatibility: The MLLearn API
Python API designed to be compatible with scikit-learn and Spark ML Pipelines
Algorithms that are currently part of the mllearn API:
• LogisticRegression, LinearRegression, SVM, NaiveBayes, and Caffe2DML (discussed later)
Hyperparameter naming/initialization similar to scikit-learn (penalty, fit_intercept, normalize, …) to reduce the learning curve
Supports loading and saving the model
Linear Regression Example
From http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
Python script using sklearn
Changes required to run on SystemML
Integration with Apache Spark’s ML Pipelines
Changes required to run on SystemML
From https://spark.apache.org/docs/latest/ml-pipeline.html
caffe2dml (experimental)
caffe2dml is a tool that converts the specification for a Caffe deep learning model into a SystemML script to perform training or scoring at scale.
The generated scripts produce TensorBoard-compatible log output.
[Diagram labels: Caffe Network File, Caffe Solver File, Caffe2DML, Generated DML Script, Apache SystemML, Log]
Example: Training Lenet with Caffe2DML
SystemML Deep Learning `nn` Library
• Deep learning library written in DML.
• Multiple layers:
  • Core: Affine, 2D Conv, 2D Transpose Conv, 2D Max Pooling, 1D/2D Batch Norm, RNN, LSTM
  • Nonlinearity/Transfer: ReLU, Sigmoid, Tanh, Softmax
  • Regularization: Dropout, L1, L2
  • Loss: Log-loss, Cross-entropy, L1, L2
• Multiple optimizers:
  • SGD, SGD w/ momentum, SGD w/ Nesterov momentum, Adagrad, RMSprop, Adam
• Layers have a simple `forward` & `backward` API.
• Optimizers have a simple `update` API.
https://github.com/apache/systemml/tree/master/scripts/nn
(LeNet-like convnet)
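The `forward` / `backward` / `update` pattern can be illustrated with a tiny plain-Python sketch. The function names and signatures below are illustrative assumptions that mirror the pattern described on the slide; they are not the library's DML functions. The example trains a single affine layer with an L2 loss and plain SGD on made-up data.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def affine_forward(X, W, b):
    """out = X W + b, with b a single row broadcast over all examples."""
    return [[v + b[0][j] for j, v in enumerate(row)] for row in matmul(X, W)]

def affine_backward(dout, X, W, b):
    """Gradients of the loss with respect to the layer input and parameters."""
    dX = matmul(dout, transpose(W))
    dW = matmul(transpose(X), dout)
    db = [[sum(col) for col in zip(*dout)]]
    return dX, dW, db

def l2_loss_backward(pred, y):
    """Gradient of 0.5 * mean squared error with respect to the predictions."""
    n = len(pred)
    return [[(p - t) / n for p, t in zip(rp, ry)] for rp, ry in zip(pred, y)]

def sgd_update(param, grad, lr):
    """Plain SGD step: param - lr * grad."""
    return [[p - lr * g for p, g in zip(rp, rg)] for rp, rg in zip(param, grad)]

def train(X, y, lr=0.1, epochs=500):
    W = [[0.0] for _ in range(len(X[0]))]
    b = [[0.0]]
    for _ in range(epochs):
        pred = affine_forward(X, W, b)                # forward pass
        dpred = l2_loss_backward(pred, y)             # loss gradient
        _, dW, db = affine_backward(dpred, X, W, b)   # backward pass
        W = sgd_update(W, dW, lr)                     # optimizer update
        b = sgd_update(b, db, lr)
    return W, b

# Made-up data following y = 2x + 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [[1.0], [3.0], [5.0], [7.0]]
W, b = train(X, y)
print(round(W[0][0], 3), round(b[0][0], 3))
```

Keeping each layer to a forward and a backward function, and each optimizer to an update function, lets a training loop be assembled from interchangeable pieces.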
GPU Support in SystemML
SystemML’s optimizer can target multiple runtime back ends:
Single-node SMP
Multi-node Spark
Hybrid: Large SMP plus a pool of Spark workers
We are adding new GPU-accelerated runtimes to SystemML:
Single-node single GPU
Single-node multi-GPU
Distributed multi-GPU on Spark
GPU-accelerate an algorithm without changing its code
GPU Support in SystemML: Current Status
(In Progress) Single Node, Single GPU Support
• Deep Neural Network Operators
conv2d, conv2d_backward_data, conv2d_backward_filter, bias_add, bias_multiply, max_pooling, max_pooling_backward, relu_max_pooling, relu_max_pooling_backward
• Unary Aggregates
{All/Row/Col}-Sum, Mean, Variance, Min, Max & All-Product
• Matrix Multiplication
Various shapes & sparsities
• Transpose
• Matrix-Matrix and Matrix-Scalar Element-Wise
+, -, *, /, ^
• Trigonometric & Mathematical Operations (on entire Matrices)
sin, cos, tan, asin, acos, atan, log, sqrt, abs, floor, round, ceil, solve
• Some Fused/Special Case Operators
Ax+y, X*t(X), Max(X, 0.0)
• (In Progress) Automatically determine whether to use the GPU or not
(In Progress) - Single Node, Multiple GPU Support
(Planned) - Multiple Node, Multiple GPU Support
Summary: Cool New Stuff in Apache SystemML
Top-level Apache project
API improvements
Deep learning
Code generation
Compressed linear algebra
Apache SystemML 1.0
RC1 scheduled for December 2017
Apache SystemML Tutorial
Tutorial hosted at IBM developerWorks Code Patterns
https://developer.ibm.com/code/patterns/perform-a-machine-learning-exercise/
Tutorial source code available on GitHub
https://github.com/IBM/SystemML_Usage?cm_sp=Developer-_-perform-a-machine-learning-exercise-_-Get-the-Code
Try this on DSX/IBM Cloud
https://ibm.biz/BdjJJG
Apache SystemML References
For More Information…
Try Apache SystemML!
http://systemml.apache.org
Read our VLDB 2016 paper on compressed linear algebra (Best Paper award!):
Ahmed Elgohary et al., “Compressed Linear Algebra for Large-Scale Machine Learning,” VLDB 2016
Read our CIDR 2017 paper on codegen:
Tarek Elgamal et al., “SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning,” CIDR 2017
Get the slides for our Strata 2016 talk on deep learning with SystemML:
Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML
SystemML
http://systemml.apache.org
SystemML source code (GitHub)
https://github.com/apache/systemml
DML (R) Language Reference
https://apache.github.io/systemml/dml-language-reference.html
Algorithms Reference
http://systemml.apache.org/algorithms
Runtime Reference
https://apache.github.io/systemml/#running-systemml

What's new in Apache SystemML - Declarative Machine Learning

  • 1. IBM SparkTechnology Center Apache SystemML Declarative Machine Learning Luciano Resende IBM | Spark Technology Center BigDataDevelopersMeetup–Spain/Madrid–Nov2017
  • 3. Open Source Community Leadership Spark Technology Center Founding Partner 188+ Project Committers 77+ Projects Key Open source steering committee memberships OSS Advisory Board Open Source
  • 4. 4 IBM Spark Technology Center Founded in 2015. Location: Physical: 505 Howard St., San Francisco CA Web: http://spark.tc Twitter: @apachespark_tc Mission: Contribute intellectual and technical capital to the Apache Spark community. Make the core technology enterprise- and cloud-ready. Build data science skills to drive intelligence into business applications — http://bigdatauniversity.com Key statistics: About 50 developers, co-located with 25 IBM designers. Major contributions to Apache Spark http://jiras.spark.tc Apache SystemML is now an Apache Incubator project. Founding member of UC Berkeley AMPLab and RISE Lab Member of R Consortium and Scala Center Spark Technology Center
  • 5. Contributions 46,385 Spark LOC 863 Spark JIRAs 457 SystemML JIRAs 67 Speakers at Events Spark Technology Center Focus on meaningful code contributions across all major Spark projects 863 code contributions (JIRAs) and counting – Check out http://jiras.spark.tc Over 422 commits in Spark 2.0 , and continuing major contributions in 2.x Contributions by the Spark Technology Center across almost all components of Spark — Spark Core, SparkR, SQL, MLlib, Streaming, PySpark, build and infrastructure, etc STC impact on community
  • 6. Spark Technology Center Machine Learning Spark MLLib R4ML Online Retraining Apache Arrow SystemML Deep Learning Consumability Reference architectures Spark Notebook stack Spark Resource optimization Spark Web UI Apache Bahir RedRock Immersive Insights SQL TPC-DS and Performance Query Pushdown/Federation Project Focus Areas 6
  • 8. Origins of the SystemML Project 2007-2008: Multiple projects at IBM Research – Almaden involving machine learning on Hadoop. 2009: We create a dedicated team for scalable ML. 2009-2010: Through engagements with customers, we observe how data scientists create machine learning algorithms.
  • 9. State-of-the-Art: Small Data R or Python Data Scientist Personal Computer Data Results
  • 10. State-of-the-Art: Big Data R or Python Data Scientist Results Systems Programmer Scala
  • 11. State-of-the-Art: Big Data R or Python Data Scientist Results Systems Programmer Scala 😞 Days or weeks per iteration 😞 Errors while translating algorithms
  • 12. State-of-the-Art: Big Data R or Python Data Scientist Results SystemML
  • 13. State-of-the-Art: Big Data R or Python Data Scientist Results SystemML 😃 Fast iteration 😃 Same answer
  • 14. 14 Linear Algebra is the Language of Machine Learning. Linear algebra is powerful, precise, and high-level. Express complex transformations over large arrays of data… …using a small number of instructions. …in a clear and unambiguous way SystemML Provides Highly Optimized Distributed Linear Algebra
  • 15. Running Example: Alternating Least Squares Problem: Movie Recommendations Movies Users i j User i liked movie j. Movies Factor UsersFactor Multiply these two factors to produce a less-sparse matrix. × New nonzero values become movies suggestions.
  • 16. Alternating Least Squares (in R) U = rand(nrow(X), r, min = -1.0, max = 1.0); V = rand(r, ncol(X), min = -1.0, max = 1.0); while(i < mi) { i = i + 1; ii = 1; if (is_U) G = (W * (U %*% V - X)) %*% t(V) + lambda * U; else G = t(U) %*% (W * (U %*% V - X)) + lambda * V; norm_G2 = sum(G ^ 2); norm_R2 = norm_G2; R = -G; S = R; while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) { if (is_U) { HS = (W * (S %*% V)) %*% t(V) + lambda * S; alpha = norm_R2 / sum (S * HS); U = U + alpha * S; } else { HS = t(U) %*% (W * (U %*% S)) + lambda * S; alpha = norm_R2 / sum (S * HS); V = V + alpha * S; } R = R - alpha * HS; old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2); S = R + (norm_R2 / old_norm_R2) * S; ii = ii + 1; } is_U = ! is_U; }
  • 17. Alternating Least Squares (in R) 1. Start with random factors. 2. Hold the Movies factor constant and find the best value for the Users factor. (Value that most closely approximates the original matrix) 3. Hold the Users factor constant and find the best value for the Movies factor. 4. Repeat steps 2-3 until convergence. U = rand(nrow(X), r, min = -1.0, max = 1.0); V = rand(r, ncol(X), min = -1.0, max = 1.0); while(i < mi) { i = i + 1; ii = 1; if (is_U) G = (W * (U %*% V - X)) %*% t(V) + lambda * U; else G = t(U) %*% (W * (U %*% V - X)) + lambda * V; norm_G2 = sum(G ^ 2); norm_R2 = norm_G2; R = -G; S = R; while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) { if (is_U) { HS = (W * (S %*% V)) %*% t(V) + lambda * S; alpha = norm_R2 / sum (S * HS); U = U + alpha * S; } else { HS = t(U) %*% (W * (U %*% S)) + lambda * S; alpha = norm_R2 / sum (S * HS); V = V + alpha * S; } R = R - alpha * HS; old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2); S = R + (norm_R2 / old_norm_R2) * S; ii = ii + 1; } is_U = ! is_U; } 1 2 2 3 3 4 4 4 Every line has a clear purpose!
  • 22. 22 Alternating Least Squares (spark.ml) 25 lines’ worth of algorithm… …mixed with 800 lines of performance code
  • 23. Alternating Least Squares (in R) U = rand(nrow(X), r, min = -1.0, max = 1.0); V = rand(r, ncol(X), min = -1.0, max = 1.0); while(i < mi) { i = i + 1; ii = 1; if (is_U) G = (W * (U %*% V - X)) %*% t(V) + lambda * U; else G = t(U) %*% (W * (U %*% V - X)) + lambda * V; norm_G2 = sum(G ^ 2); norm_R2 = norm_G2; R = -G; S = R; while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) { if (is_U) { HS = (W * (S %*% V)) %*% t(V) + lambda * S; alpha = norm_R2 / sum (S * HS); U = U + alpha * S; } else { HS = t(U) %*% (W * (U %*% S)) + lambda * S; alpha = norm_R2 / sum (S * HS); V = V + alpha * S; } R = R - alpha * HS; old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2); S = R + (norm_R2 / old_norm_R2) * S; ii = ii + 1; } is_U = ! is_U; }
  • 24. Alternating Least Squares (in R) SystemML can compile and run this algorithm at scale No additional performance code needed! U = rand(nrow(X), r, min = -1.0, max = 1.0); V = rand(r, ncol(X), min = -1.0, max = 1.0); while(i < mi) { i = i + 1; ii = 1; if (is_U) G = (W * (U %*% V - X)) %*% t(V) + lambda * U; else G = t(U) %*% (W * (U %*% V - X)) + lambda * V; norm_G2 = sum(G ^ 2); norm_R2 = norm_G2; R = -G; S = R; while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) { if (is_U) { HS = (W * (S %*% V)) %*% t(V) + lambda * S; alpha = norm_R2 / sum (S * HS); U = U + alpha * S; } else { HS = t(U) %*% (W * (U %*% S)) + lambda * S; alpha = norm_R2 / sum (S * HS); V = V + alpha * S; } R = R - alpha * HS; old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2); S = R + (norm_R2 / old_norm_R2) * S; ii = ii + 1; } is_U = ! is_U; } (in SystemML’s subset of R)
  • 25. How fast does it run? Running time comparisons between machine learning algorithms are problematic: different, equally valid answers; different convergence rates on different data. But we'll do one anyway.
  • 26. Performance Comparison: ALS
      [Bar chart: running time (sec, axis from 0 to 20,000) for R, MLLib, and SystemML on 1.2GB (sparse binary), 12GB, and 120GB inputs; some runs are marked ">24h" or "OOM".]
      Details: Synthetic data, 0.01 sparsity, 10^5 products × {10^5, 10^6, 10^7} users. Data generated by multiplying two rank-50 matrices of normally-distributed data, sampling from the resulting product, then adding Gaussian noise. Cluster of 6 servers with 12 cores and 96GB of memory per server. Number of iterations tuned so that all algorithms produce comparable result quality.
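The data-generation recipe in the details above can be sketched in a few lines of Python. This is only an illustration of the described procedure at toy scale: the function name `synthetic_ratings`, the noise standard deviation, and the dictionary-of-cells output format are assumptions, since the slide does not specify them.

```python
import random

def synthetic_ratings(n_users, n_products, rank=50, sparsity=0.01,
                      noise_sd=0.1, seed=0):
    """Return {(user, product): value} for a sparse, noisy low-rank matrix.

    noise_sd is a hypothetical choice; the slide only says "Gaussian noise".
    """
    rng = random.Random(seed)
    # Two factor matrices of normally distributed data.
    A = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n_users)]
    B = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n_products)]
    cells = {}
    for i in range(n_users):
        for j in range(n_products):
            # Sample each cell of the product with probability `sparsity`.
            if rng.random() < sparsity:
                value = sum(a * b for a, b in zip(A[i], B[j]))
                # Add Gaussian noise to the sampled entry.
                cells[(i, j)] = value + rng.gauss(0.0, noise_sd)
    return cells
```

At the scale used in the benchmark (up to 10^7 users), the same procedure would need a distributed or blocked implementation; the loop above is only meant to make the recipe concrete.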
  • 27. Takeaway Points
      • SystemML runs the R script in parallel
      • Same answer as the original R script
      • Performance is comparable to a low-level RDD-based implementation
      • Also, for Python lovers, an equivalent Python DML exists!
      • How does SystemML achieve this result?
  • 28. The SystemML Optimizer and Runtime for Spark
      • Automates critical performance decisions:
          • Distributed or local computation?
          • How to partition the data?
          • To persist or not to persist?
      • Distributed vs. local: hybrid runtime
          • Multithreaded computation in the Spark driver
          • Distributed computation in Spark executors
          • Optimizer makes a cost-based choice
      • High-Level Operations (HOPs): general representation of statements in the data analysis language
      • Low-Level Operations (LOPs): general representation of operations in the runtime framework
      • [Diagram labels: high-level language front-ends, cost-based optimizer, multiple execution environments]
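To make the idea of a cost-based local-versus-distributed choice concrete, here is a deliberately simplified Python sketch. It is not SystemML's cost model: the function names, the 8-byte and 12-byte per-cell size estimates, and the single memory-budget rule are all assumptions chosen to illustrate the kind of decision the optimizer automates.

```python
def estimate_bytes(rows, cols, sparsity=1.0):
    """Rough in-memory size of a matrix (hypothetical estimate).

    Dense: 8 bytes per cell. Sparse: about 12 bytes per non-zero
    (8-byte value plus 4-byte column index). Take the smaller one.
    """
    dense = rows * cols * 8
    sparse = rows * cols * sparsity * 12
    return min(dense, sparse)

def choose_backend(operands, output, driver_budget_bytes):
    """Pick "local" if inputs and output fit in the driver budget, else "distributed".

    operands: list of (rows, cols, sparsity) tuples; output: one such tuple.
    """
    needed = sum(estimate_bytes(*m) for m in operands) + estimate_bytes(*output)
    return "local" if needed <= driver_budget_bytes else "distributed"
```

For example, `choose_backend([(1000, 1000, 1.0)], (1000, 1000, 1.0), 2 * 1024**3)` keeps a small dense operation in the driver, while an operand with 10^7 × 10^5 cells at 0.01 sparsity exceeds a 20 GB budget and is sent to the executors.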
  • 29. But wait, there's more!
      • Many other rewrites
      • Cost-based selection of operators
      • Dynamic recompilation for accurate stats
      • Parallel FOR (ParFor) optimizer
      • Direct operations on RDD partitions
      • YARN and MapReduce support
      • New in next release: compressed linear algebra
  • 30. Summary
      • Cost-based compilation of machine learning algorithms generates execution plans
          • for single-node in-memory, cluster, and hybrid execution
          • for varying data characteristics: varying number of observations (1,000s to 10s of billions), number of variables (10s to 10s of millions), dense and sparse data
          • for varying cluster characteristics (memory configurations, degree of parallelism)
      • Out-of-the-box, scalable machine learning algorithms, e.g. descriptive statistics, regression, clustering, and classification
      • "Roll-your-own" algorithms
          • Enable programmer productivity (no worry about scalability, numeric stability, and optimizations)
          • Fast turn-around for new algorithms
      • Higher-level language shields algorithm development investment from platform progression
          • YARN for resource negotiation and elasticity
          • Spark for in-memory, iterative processing
  • 31. Benefits of the SystemML Approach: Simplifies algorithm development. Makes experimentation easier. Your code gets faster as the system improves.
  • 32. Algorithms (Category: Description)
      • Descriptive Statistics: Univariate; Bivariate; Stratified Bivariate
      • Classification: Logistic Regression (multinomial); Multi-Class SVM; Naïve Bayes (multinomial); Decision Trees; Random Forest
      • Clustering: k-Means
      • Regression:
          • Linear Regression: system of equations; CG (conjugate gradient)
          • Generalized Linear Models (GLM): distributions: Gaussian, Poisson, Gamma, Inverse Gaussian, Binomial, Bernoulli; links for all distributions: identity, log, sq. root, inverse, 1/μ²; links for Binomial / Bernoulli: logit, probit, cloglog, cauchit
          • Stepwise: Linear; GLM
      • Dimension Reduction: PCA
      • Matrix Factorization: ALS: direct solve; CG (conjugate gradient descent)
      • Survival Models: Kaplan Meier Estimate; Cox Proportional Hazard Regression
      • Predict: Algorithm-specific scoring
      • Transformation (native): Recoding, dummy coding, binning, scaling, missing value imputation
      • PMML models: lm, kmeans, svm, glm, mlogit
  • 33. What's new in Apache SystemML
  • 34. Expressing Algorithms with SystemML
      Gaussian Nonnegative Matrix Factorization in DML (SystemML's R-like syntax):

      while (i < max_iteration) {
        H <- H * ((t(W) %*% V) / (((t(W) %*% W) %*% H)+Eps))
        W <- W * ((V %*% t(H)) / ((W %*% (H %*% t(H)))+Eps))
        i <- i + 1
      }

      Gaussian Nonnegative Matrix Factorization in PyDML (SystemML's Python-like syntax):

      while (i < max_iteration):
          H = H * (dot(W.transpose(), V) / (dot(dot(W.transpose(), W), H) + Eps))
          W = W * (dot(V, H.transpose()) / (dot(W, dot(H, H.transpose())) + Eps))
          i = i + 1

      SystemML users write machine learning algorithms in a domain-specific language. SystemML has APIs for embedding these algorithms in Python, Scala, or Java Spark applications. The R4ML project provides similar functionality for SparkR.
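The multiplicative updates shown in the DML and PyDML snippets can also be tried in plain Python. The sketch below uses only the standard library; the function names (`gnmf`, `frob_error`), the random initialisation range, and the default `eps` are assumptions for illustration and are not part of SystemML.

```python
import random

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def update(M, num, den, eps):
    """Element-wise M * num / (den + eps)."""
    return [[m * a / (b + eps) for m, a, b in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - W %*% H."""
    WH = matmul(W, H)
    return sum((v - p) ** 2 for vr, pr in zip(V, WH) for v, p in zip(vr, pr))

def gnmf(V, rank, max_iteration=200, eps=1e-9, seed=0):
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start (an assumption; the slide omits initialisation).
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(max_iteration):
        Wt = transpose(W)
        # H <- H * (t(W) V) / (t(W) W H + eps)
        H = update(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        # W <- W * (V t(H)) / (W H t(H) + eps)
        W = update(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H
```

Because every update multiplies by a ratio of nonnegative quantities, `W` and `H` stay nonnegative as long as `V` and the starting factors are nonnegative.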
  • 35. Scikit-Learn Compatibility: The MLLearn API
      • Python API designed to be compatible with scikit-learn and Spark ML Pipelines
      • Algorithms that are currently part of the mllearn API: LogisticRegression, LinearRegression, SVM, NaiveBayes, and Caffe2DML (discussed later)
      • Hyperparameter naming/initialization similar to scikit-learn (penalty, fit_intercept, normalize, …) to reduce the learning curve
      • Supports loading and saving the model
  • 36. Linear Regression Example: Python script using sklearn, and the changes required to run on SystemML. From http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
  • 37. Integration with Apache Spark's ML Pipelines: changes required to run on SystemML. From https://spark.apache.org/docs/latest/ml-pipeline.html
  • 38. caffe2dml (experimental): caffe2dml is a tool that converts the specification for a Caffe deep learning model into a SystemML script to perform training or scoring at scale. The generated scripts produce TensorBoard-compatible log output. [Diagram labels: Caffe Network File, Caffe Solver File, Caffe2DML, Generated DML Script, Log, Apache SystemML]
  • 39. Example: Training Lenet with Caffe2DML
  • 40. SystemML Deep Learning `nn` Library
      • Deep learning library written in DML.
      • Multiple layers:
          • Core: Affine, 2D Conv, 2D Transpose Conv, 2D Max Pooling, 1D/2D Batch Norm, RNN, LSTM
          • Nonlinearity/Transfer: ReLU, Sigmoid, Tanh, Softmax
          • Regularization: Dropout, L1, L2
          • Loss: Log-loss, Cross-entropy, L1, L2
      • Multiple optimizers: SGD, SGD w/ momentum, SGD w/ Nesterov momentum, Adagrad, RMSprop, Adam
      • Layers have a simple `forward` & `backward` API.
      • Optimizers have a simple `update` API.
      https://github.com/apache/systemml/tree/master/scripts/nn (LeNet-like convnet)
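The `forward` / `backward` / `update` pattern described above can be illustrated with a small Python sketch of an affine layer, an L2 loss, and plain SGD. This mirrors the shape of the API only: the Python function names, signatures, and the `train` helper are assumptions for illustration, and the real library is written in DML.

```python
def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def affine_forward(X, W, b):
    """out = X %*% W + b, with b stored as a 1 x k row matrix."""
    return [[v + bj for v, bj in zip(row, b[0])] for row in matmul(X, W)]

def affine_backward(dout, X, W):
    """Gradients of the loss with respect to X, W, and b."""
    dX = matmul(dout, transpose(W))
    dW = matmul(transpose(X), dout)
    db = [[sum(col) for col in zip(*dout)]]
    return dX, dW, db

def l2_forward(pred, y):
    """Half the mean (over rows) of the squared error."""
    n = len(y)
    return 0.5 * sum((p - t) ** 2 for pr, yr in zip(pred, y) for p, t in zip(pr, yr)) / n

def l2_backward(pred, y):
    n = len(y)
    return [[(p - t) / n for p, t in zip(pr, yr)] for pr, yr in zip(pred, y)]

def sgd_update(param, grad, lr):
    """param <- param - lr * grad."""
    return [[p - lr * g for p, g in zip(pr, gr)] for pr, gr in zip(param, grad)]

def train(X, y, steps=500, lr=0.1):
    """Fit a single affine layer with full-batch SGD; returns W, b, and the loss history."""
    W = [[0.0] for _ in range(len(X[0]))]
    b = [[0.0]]
    losses = []
    for _ in range(steps):
        pred = affine_forward(X, W, b)           # forward pass
        losses.append(l2_forward(pred, y))
        dout = l2_backward(pred, y)              # backward pass through the loss
        _, dW, db = affine_backward(dout, X, W)  # backward pass through the layer
        W = sgd_update(W, dW, lr)                # optimizer update
        b = sgd_update(b, db, lr)
    return W, b, losses
```

Keeping each layer to a pair of pure functions and each optimizer to a single `update` function makes it straightforward to compose networks by chaining calls, which is the design choice the slide highlights.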
  • 41. GPU Support in SystemML
  • 42. GPU Support in SystemML
      • SystemML's optimizer can target multiple runtime back ends:
          • Single-node SMP
          • Multi-node Spark
          • Hybrid: large SMP plus a pool of Spark workers
      • We are adding new GPU-accelerated runtimes to SystemML:
          • Single-node single GPU
          • Single-node multi-GPU
          • Distributed multi-GPU on Spark
      • GPU-accelerate an algorithm without changing its code
  • 43. GPU Support in SystemML: Current Status
      • (In Progress) Single Node, Single GPU Support
          • Deep Neural Network Operators: conv2d, conv2d_backward_data, conv2d_backward_filter, bias_add, bias_multiply, max_pooling, max_pooling_backward, relu_max_pooling, relu_max_pooling_backward
          • Unary Aggregates: {All/Row/Col}-Sum, Mean, Variance, Min, Max & All-Product
          • Matrix Multiplication: various shapes & sparsities
          • Transpose
          • Matrix-Matrix and Matrix-Scalar Element-Wise: +, -, *, /, ^
          • Trigonometric & Mathematical Operations (on entire matrices): sin, cos, tan, asin, acos, atan, log, sqrt, abs, floor, round, ceil, solve
          • Some Fused/Special Case Operators: Ax+y, X*t(X), Max(X, 0.0)
          • (In Progress) Automatically determine whether to use the GPU or not
      • (In Progress) Single Node, Multiple GPU Support
      • (Planned) Multiple Node, Multiple GPU Support
  • 44. Summary: Cool New Stuff in Apache SystemML
      • Top-level Apache project
      • API improvements
      • Deep learning
      • Code generation
      • Compressed linear algebra
  • 45. SystemML 1.0: Apache SystemML 1.0 RC1 scheduled for December 2017
  • 47. SystemML Tutorial
      • Tutorial hosted at IBM developerWorks Code Patterns: https://developer.ibm.com/code/patterns/perform-a-machine-learning-exercise/
      • Tutorial source code available on GitHub: https://github.com/IBM/SystemML_Usage?cm_sp=Developer-_-perform-a-machine-learning-exercise-_-Get-the-Code
      • Try this on DSX/IBM Cloud: https://ibm.biz/BdjJJG
  • 48. Apache SystemML References
  • 49. For More Information…
      • Try Apache SystemML! http://systemml.apache.org
      • Read our VLDB 2016 paper on compressed linear algebra (Best Paper award!): Ahmed Elgohary et al., "Compressed Linear Algebra for Large-Scale Machine Learning," VLDB 2016
      • Read our CIDR 2017 paper on codegen: Tarek Elgamal et al., "SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning," CIDR 2017
      • Get the slides for our Strata 2016 talk on deep learning with SystemML: "Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML"
  • 50. References
      • SystemML: http://systemml.apache.org
      • SystemML source code (GitHub): https://github.com/apache/systemml
      • DML (R) Language Reference: https://apache.github.io/systemml/dml-language-reference.html
      • Algorithms Reference: http://systemml.apache.org/algorithms
      • Runtime Reference: https://apache.github.io/systemml/#running-systemml