What's new in Apache SystemML - Declarative Machine Learning

IBM Spark Technology Center
Apache SystemML
Declarative Machine Learning
Luciano Resende
IBM | Spark Technology Center
Big Data Developers Meetup – Spain/Madrid – Nov 2017

lresende@apache.org
http://lresende.blogspot.com/
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
Luciano Resende
Data Science Platform Architect – IBM – Spark Technology Center
Apache Member and also a SystemML committer and PMC member

Open Source Community Leadership
Founding Partner
188+ Project Committers
77+ Projects
Key open source steering committee memberships
OSS Advisory Board

IBM Spark Technology Center
Founded in 2015.
Location:
Physical: 505 Howard St., San Francisco CA
Web: http://spark.tc Twitter: @apachespark_tc
Mission:
Contribute intellectual and technical capital to the Apache Spark community.
Make the core technology enterprise- and cloud-ready.
Build data science skills to drive intelligence into business applications (http://bigdatauniversity.com).
Key statistics:
About 50 developers, co-located with 25 IBM designers.
Major contributions to Apache Spark http://jiras.spark.tc
Apache SystemML, formerly an Apache Incubator project, is now a top-level Apache project.
Founding member of UC Berkeley AMPLab and RISE Lab
Member of R Consortium and Scala Center

STC impact on community
Contributions: 46,385 Spark LOC; 863 Spark JIRAs; 457 SystemML JIRAs; 67 speakers at events.
Focus on meaningful code contributions across all major Spark projects.
863 code contributions (JIRAs) and counting. Check out http://jiras.spark.tc
Over 422 commits in Spark 2.0, and continuing major contributions in 2.x.
Contributions by the Spark Technology Center across almost all components of Spark: Spark Core, SparkR, SQL, MLlib, Streaming, PySpark, build and infrastructure, etc.

Project Focus Areas
Machine Learning
• Spark MLLib
• R4ML
• Online Retraining
• Apache Arrow
• SystemML
• Deep Learning
Consumability
• Reference architectures
• Spark Notebook stack
• Spark Resource optimization
• Spark Web UI
• Apache Bahir
• RedRock
• Immersive Insights
SQL
• TPC-DS and Performance
• Query Pushdown/Federation

Origins of the SystemML Project
2007-2008: Multiple projects at IBM Research – Almaden involving machine learning on Hadoop.
2009: We create a dedicated team for scalable ML.
2009-2010: Through engagements with customers, we observe how data scientists create machine learning algorithms.

State-of-the-Art: Small Data
[Diagram: a data scientist writes R or Python on a personal computer, which turns data into results.]

State-of-the-Art: Big Data
[Diagram: a data scientist writes the algorithm in R or Python; a systems programmer reimplements it in Scala to produce results.]

[Diagram as on the previous slide (data scientist in R or Python, systems programmer in Scala), annotated:]
😞 Days or weeks per iteration
😞 Errors while translating algorithms

[Diagram: a data scientist writes the algorithm in R or Python; SystemML runs it to produce results.]

[Diagram as on the previous slide (data scientist in R or Python, SystemML producing results), annotated:]
😃 Fast iteration
😃 Same answer

Linear Algebra is the Language of Machine Learning.
Linear algebra is powerful, precise, and high-level.
Express complex transformations over large arrays of data…
…using a small number of instructions.
…in a clear and unambiguous way.
SystemML Provides Highly Optimized Distributed Linear Algebra

Running Example: Alternating Least Squares
Problem: Movie Recommendations
[Diagram: a sparse matrix of users and movies, where a nonzero entry at (i, j) means user i liked movie j, is approximated by the product of a Users factor and a Movies factor.]
Multiply these two factors to produce a less-sparse matrix.
New nonzero values become movie suggestions.
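To make the idea concrete, here is a toy sketch in plain Python. The ratings matrix and both factors are made-up values chosen by hand for illustration (ALS would learn the factors from the data), and the helper name `matmul` is ours, not part of SystemML.

```python
# Toy illustration with hypothetical data: 3 users x 4 movies, rank-2 factors.
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# X[i][j] == 1 means "user i liked movie j"; 0 means "unknown".
X = [[1, 0, 0, 1],
     [1, 1, 0, 0],
     [0, 0, 1, 0]]

# Hand-picked factors for illustration only; ALS would learn these from X.
users_factor = [[1.0, 0.0],
                [1.0, 0.0],
                [0.0, 1.0]]
movies_factor = [[1.0, 0.5, 0.0, 0.5],
                 [0.0, 0.0, 1.0, 0.0]]

# The product is denser than X.
scores = matmul(users_factor, movies_factor)

# Cells that were empty in X but are nonzero in the product are suggestions.
suggestions = [(i, j) for i in range(len(X)) for j in range(len(X[0]))
               if X[i][j] == 0 and scores[i][j] > 0]
print(suggestions)
```

Here user 0 is offered movie 1 and user 1 is offered movie 3, because the two users share a latent profile and each liked a movie the other has not rated.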

Alternating Least Squares (in R)
U = rand(nrow(X), r, min = -1.0, max = 1.0);
V = rand(r, ncol(X), min = -1.0, max = 1.0);
while(i < mi) {
  i = i + 1; ii = 1;
  # Gradient with respect to the factor currently being updated
  if (is_U)
    G = (W * (U %*% V - X)) %*% t(V) + lambda * U;
  else
    G = t(U) %*% (W * (U %*% V - X)) + lambda * V;
  norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
  R = -G; S = R;
  # Inner conjugate-gradient loop
  while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) {
    if (is_U) {
      HS = (W * (S %*% V)) %*% t(V) + lambda * S;
      alpha = norm_R2 / sum (S * HS);
      U = U + alpha * S;
    } else {
      HS = t(U) %*% (W * (U %*% S)) + lambda * S;
      alpha = norm_R2 / sum (S * HS);
      V = V + alpha * S;
    }
    R = R - alpha * HS;
    old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
    S = R + (norm_R2 / old_norm_R2) * S;
    ii = ii + 1;
  }
  is_U = ! is_U;
}

1. Start with random factors.
2. Hold the Movies factor constant and
find the best value for the Users factor.
(Value that most closely approximates the original matrix)
3. Hold the Users factor constant and find
the best value for the Movies factor.
4. Repeat steps 2-3 until convergence.
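U = rand(nrow(X), r, min = -1.0, max = 1.0);               # step 1
V = rand(r, ncol(X), min = -1.0, max = 1.0);               # step 1
while(i < mi) {                                            # step 4
  i = i + 1; ii = 1;
  if (is_U)
    G = (W * (U %*% V - X)) %*% t(V) + lambda * U;         # step 2
  else
    G = t(U) %*% (W * (U %*% V - X)) + lambda * V;         # step 3
  norm_G2 = sum(G ^ 2); norm_R2 = norm_G2;
  R = -G; S = R;
  while(norm_R2 > 10E-9 * norm_G2 & ii <= mii) {           # step 4
    if (is_U) {
      HS = (W * (S %*% V)) %*% t(V) + lambda * S;          # step 2
      alpha = norm_R2 / sum (S * HS);
      U = U + alpha * S;
    } else {
      HS = t(U) %*% (W * (U %*% S)) + lambda * S;          # step 3
      alpha = norm_R2 / sum (S * HS);
      V = V + alpha * S;
    }
    R = R - alpha * HS;
    old_norm_R2 = norm_R2; norm_R2 = sum(R ^ 2);
    S = R + (norm_R2 / old_norm_R2) * S;
    ii = ii + 1;
  }
  is_U = ! is_U;                                           # step 4
}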
Every line has a clear purpose!
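The four steps can also be sketched in plain Python. This is a simplified illustration, not the script above: it assumes rank-1 factors so that each "find the best value" step has a closed form per row or column, whereas the DML script uses an inner conjugate-gradient loop for general rank. The function names, the toy ratings, and the regularization value are our own assumptions.

```python
import random

def objective(X, W, u, v, lam):
    """Weighted squared error on observed cells plus L2 regularization."""
    err = sum(W[i][j] * (u[i] * v[j] - X[i][j]) ** 2
              for i in range(len(u)) for j in range(len(v)))
    return err + lam * (sum(a * a for a in u) + sum(b * b for b in v))

def als_rank1(X, W, lam=0.1, iters=10, seed=0):
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    # Step 1: start with random factors.
    u = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    history = [objective(X, W, u, v, lam)]
    for _ in range(iters):  # Step 4: repeat steps 2-3.
        # Step 2: hold v constant and solve for the best u (closed form).
        u = [sum(W[i][j] * X[i][j] * v[j] for j in range(n)) /
             (sum(W[i][j] * v[j] ** 2 for j in range(n)) + lam)
             for i in range(m)]
        # Step 3: hold u constant and solve for the best v.
        v = [sum(W[i][j] * X[i][j] * u[i] for i in range(m)) /
             (sum(W[i][j] * u[i] ** 2 for i in range(m)) + lam)
             for j in range(n)]
        history.append(objective(X, W, u, v, lam))
    return u, v, history

# Hypothetical ratings; W marks which cells of X were actually observed.
X = [[5.0, 4.0, 0.0],
     [4.0, 0.0, 3.0],
     [0.0, 2.0, 1.0]]
W = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
u, v, history = als_rank1(X, W)
print(history[0], history[-1])
```

Because each half-step exactly minimizes the objective over one factor while the other is held fixed, the recorded objective never increases from one iteration to the next.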

Alternating Least Squares (spark.ml)

25 lines’ worth of algorithm…
…mixed with 800 lines of performance code

SystemML can compile and run this algorithm at scale
No additional performance code needed!
[Code: the Alternating Least Squares script shown earlier, unchanged, written in SystemML’s subset of R.]

How fast does it run?
Running time comparisons between machine learning algorithms are problematic
Different, equally-valid answers
Different convergence rates on different data
But we’ll do one anyway

Performance Comparison: ALS
[Bar chart: running time in seconds (axis 0 to 20,000) for R, MLLib, and SystemML on 1.2GB (sparse binary), 12GB, and 120GB inputs. Two bars are labeled ">24h" and two are labeled "OOM".]
Details: Synthetic data, 0.01 sparsity, 10^5 products × {10^5, 10^6, 10^7} users. Data generated by multiplying two rank-50 matrices of normally-distributed data, sampling from the resulting product, then adding Gaussian noise. Cluster of 6 servers with 12 cores and 96GB of memory per server. Number of iterations tuned so that all algorithms produce comparable result quality.
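The data-generation recipe described above can be sketched at toy scale in plain Python. This is only an illustration of the recipe, not the benchmark's actual generator: the function name, the tiny matrix sizes, and the noise standard deviation are assumptions, since the slide does not state them.

```python
import random

def synthetic_ratings(n_products, n_users, rank, sparsity, noise_sd, seed=0):
    """Return {(product, user): value} for a sparse, noisy low-rank matrix."""
    rng = random.Random(seed)
    # Two factors filled with normally-distributed values.
    P = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n_products)]
    U = [[rng.gauss(0.0, 1.0) for _ in range(n_users)] for _ in range(rank)]
    cells = {}
    for i in range(n_products):
        for j in range(n_users):
            # Sample a fraction `sparsity` of the cells of the product P x U.
            if rng.random() < sparsity:
                value = sum(P[i][k] * U[k][j] for k in range(rank))
                # Add Gaussian noise to each sampled value.
                cells[(i, j)] = value + rng.gauss(0.0, noise_sd)
    return cells

# Toy scale: 200 x 200 at rank 5 (the benchmark used rank 50 and far larger sizes).
cells = synthetic_ratings(200, 200, rank=5, sparsity=0.01, noise_sd=0.1)
print(len(cells), "nonzero cells out of", 200 * 200)
```

The product is computed only for sampled cells, so the full dense matrix is never materialized; at benchmark scale that distinction matters.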

Takeaway Points
SystemML runs the R script in parallel
Same answer as original R script
Performance is comparable to a low-level RDD-based implementation
Also, for Python lovers, an equivalent Python-like DML (PyDML) exists!
How does SystemML achieve this result?

The SystemML Optimizer and Runtime for Spark
Automates critical performance decisions
• Distributed or local computation?
• How to partition the data?
• To persist or not to persist?
Distributed vs local: Hybrid runtime
• Multithreaded computation in Spark Driver
• Distributed computation in Spark Executors
• Optimizer makes a cost-based choice
[Diagram: high-level language front-ends feed a cost-based optimizer, which targets multiple execution environments.]
High-Level Operations (HOPs): general representation of statements in the data analysis language
Low-Level Operations (LOPs): general representation of operations in the runtime framework
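A cost-based choice between local and distributed execution can be pictured with the following Python sketch. It is not SystemML's actual cost model: the memory estimates, the 12-bytes-per-sparse-cell figure, the function names, and the driver budget are all simplifying assumptions made for illustration.

```python
def estimate_bytes(rows, cols, sparsity=1.0):
    """Rough in-memory size of a matrix of doubles (assumed cost model)."""
    dense = rows * cols * 8
    # Assumption: a sparse cell costs about 12 bytes (8-byte value + 4-byte index).
    sparse = int(rows * cols * sparsity * 12)
    return min(dense, sparse)

def choose_backend(rows, cols, sparsity, driver_budget_bytes):
    """Run locally when the operand fits in the driver's memory budget."""
    size = estimate_bytes(rows, cols, sparsity)
    if size <= driver_budget_bytes:
        return "local (multithreaded, driver)"
    return "distributed (Spark executors)"

budget = 1 << 30  # hypothetical 1 GiB driver budget
small = choose_backend(1000, 1000, 1.0, budget)       # about 8 MB dense
large = choose_backend(10**7, 10**5, 0.01, budget)    # about 120 GB sparse
print(small)
print(large)
```

The point of the sketch is that the same script can land on either backend depending on data size and sparsity, without the script's author making that decision.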

But wait, there’s more!
Many other rewrites
Cost-based selection of operators
Dynamic recompilation for accurate stats
Parallel FOR (ParFor) optimizer
Direct operations on RDD partitions
YARN and MapReduce support
New in Next Release: Compressed Linear Algebra

Summary
Cost-based compilation of machine learning algorithms generates execution plans
for single-node in-memory, cluster, and hybrid execution
for varying data characteristics:
varying number of observations (1,000s to 10s of billions), number of variables (10s to 10s of millions), dense and sparse data
for varying cluster characteristics (memory configurations, degree of parallelism)
Out-of-the-box, scalable machine learning algorithms
e.g. descriptive statistics, regression, clustering, and classification
"Roll-your-own" algorithms
Enable programmer productivity (no worry about scalability, numeric stability, and optimizations)
Fast turn-around for new algorithms
Higher-level language shields algorithm development investment from platform progression
YARN for resource negotiation and elasticity
Spark for in-memory, iterative processing

Benefits of the SystemML Approach
Simplifies algorithm development.
Makes experimentation easier.
Your code gets faster as the system improves.

Algorithms
Category: Description
Descriptive Statistics: Univariate; Bivariate; Stratified Bivariate
Classification: Logistic Regression (multinomial); Multi-Class SVM; Naïve Bayes (multinomial); Decision Trees; Random Forest
Clustering: k-Means
Regression:
• Linear Regression: system of equations; CG (conjugate gradient)
• Generalized Linear Models (GLM): Distributions: Gaussian, Poisson, Gamma, Inverse Gaussian, Binomial, Bernoulli. Links for all distributions: identity, log, sq. root, inverse, 1/μ^2. Links for Binomial / Bernoulli: logit, probit, cloglog, cauchit
• Stepwise: Linear; GLM
Dimension Reduction: PCA
Matrix Factorization: ALS (direct solve; CG (conjugate gradient descent))
Survival Models: Kaplan Meier Estimate; Cox Proportional Hazard Regression
Predict: Algorithm-specific scoring
Transformation (native): Recoding, dummy coding, binning, scaling, missing value imputation
PMML models: lm, kmeans, svm, glm, mlogit

What’s new in Apache SystemML

Expressing Algorithms with SystemML
Gaussian Nonnegative Matrix Factorization
in DML (SystemML’s R-like syntax)
while (i < max_iteration) {
  H <- H * ((t(W) %*% V) / (((t(W) %*% W) %*% H) + Eps))
  W <- W * ((V %*% t(H)) / ((W %*% (H %*% t(H))) + Eps))
  i <- i + 1
}
Gaussian Nonnegative Matrix Factorization
in PyDML (SystemML’s Python-like syntax)
while (i < max_iteration):
    H = H * (dot(W.transpose(), V) / (dot(dot(W.transpose(), W), H) + Eps))
    W = W * (dot(V, H.transpose()) / (dot(W, dot(H, H.transpose())) + Eps))
    i = i + 1
SystemML users write machine learning algorithms in a domain specific language.
SystemML has APIs for embedding these algorithms in Python, Scala, or Java Spark applications
The R4ML project provides similar functionality for SparkR.
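For readers who want to check what the GNMF loop computes without a SystemML installation, here is a small single-machine sketch in plain Python. It mirrors the two multiplicative updates from the DML and PyDML snippets; the helper names, the toy matrix, the rank, and the random initialization are our own assumptions, and nothing here uses SystemML's APIs.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def multiplicative_update(M, num, den, eps):
    """Element-wise M * (num / (den + eps)), as in the loop body above."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]

def squared_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - p) ** 2 for rv, rp in zip(V, WH) for v, p in zip(rv, rp))

def gnmf(V, rank, max_iteration=50, eps=1e-9, seed=0):
    rng = random.Random(seed)
    W = [[rng.random() for _ in range(rank)] for _ in range(len(V))]
    H = [[rng.random() for _ in range(len(V[0]))] for _ in range(rank)]
    error_before = squared_error(V, W, H)
    for _ in range(max_iteration):
        Wt = transpose(W)
        H = multiplicative_update(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        W = multiplicative_update(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H, error_before, squared_error(V, W, H)

# Hypothetical nonnegative input matrix.
V = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
W, H, error_before, error_after = gnmf(V, rank=2)
print(error_before, error_after)
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative throughout, which is the property that gives the method its name.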

Scikit-Learn Compatibility: The MLLearn API
Python API designed to be compatible with scikit-learn and Spark ML Pipelines
Algorithms that are currently part of mllearn API:
• LogisticRegression, LinearRegression, SVM, NaiveBayes and Caffe2DML (discussed later)
Hyperparameter naming/initialization similar to scikit-learn (penalty, fit_intercept, normalize, …) to reduce learning curve
Supports loading and saving the model

Linear Regression Example
From http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
Python script using sklearn
Changes required to run on SystemML

Integration with Apache Spark’s ML Pipelines
Changes required to run on SystemML
From https://spark.apache.org/docs/latest/ml-pipeline.html

caffe2dml (experimental)
caffe2dml is a tool that converts the specification for a Caffe deep learning model into a SystemML script to perform training or scoring at scale.
The generated scripts produce TensorBoard-compatible log output.
[Diagram: Caffe2DML reads a Caffe network file and a Caffe solver file and emits a generated DML script, which Apache SystemML runs to produce a log.]

Example: Training LeNet with Caffe2DML

SystemML Deep Learning `nn` Library
• Deep learning library written in DML.
• Multiple layers:
  • Core: Affine, 2D Conv, 2D Transpose Conv, 2D Max Pooling, 1D/2D Batch Norm, RNN, LSTM
  • Nonlinearity/Transfer: ReLU, Sigmoid, Tanh, Softmax
  • Regularization: Dropout, L1, L2
  • Loss: Log-loss, Cross-entropy, L1, L2
• Multiple optimizers:
  • SGD, SGD w/ momentum, SGD w/ Nesterov momentum, Adagrad, RMSprop, Adam
• Layers have a simple `forward` & `backward` API.
• Optimizers have a simple `update` API.
https://github.com/apache/systemml/tree/master/scripts/nn
(LeNet-like convnet)
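The `forward` / `backward` / `update` shape of the library can be pictured with a small Python sketch. It is an analogy written in plain Python rather than DML: the function names and the toy values are ours, and only the general calling pattern (a layer maps inputs to outputs and upstream gradients to input gradients; an optimizer maps a parameter and its gradient to a new parameter) is taken from the slide.

```python
def relu_forward(X):
    """Forward pass of a ReLU layer: element-wise max(0, x)."""
    return [[max(0.0, x) for x in row] for row in X]

def relu_backward(dout, X):
    """Backward pass: pass the upstream gradient only where the input was positive."""
    return [[d if x > 0 else 0.0 for d, x in zip(drow, xrow)]
            for drow, xrow in zip(dout, X)]

def sgd_update(X, dX, lr):
    """Plain SGD step: X - lr * dX."""
    return [[x - lr * d for x, d in zip(xrow, drow)]
            for xrow, drow in zip(X, dX)]

# Hypothetical values for illustration.
X = [[-1.0, 2.0],
     [3.0, -4.0]]
out = relu_forward(X)
dout = [[1.0, 1.0],
        [1.0, 1.0]]          # pretend upstream gradient
dX = relu_backward(dout, X)
X_new = sgd_update(X, dX, lr=0.5)
print(out, dX, X_new)
```

Keeping every layer to the same two entry points is what lets a network be assembled by chaining forward calls and then walking the same chain in reverse for the backward pass.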


GPU Support in SystemML
SystemML’s optimizer can target multiple runtime back ends:
• Single-node SMP
• Multi-node Spark
• Hybrid: Large SMP plus a pool of Spark workers
We are adding new GPU-accelerated runtimes to SystemML
• Single-node single GPU
• Single-node multi-GPU
• Distributed multi-GPU on Spark
GPU-accelerate an algorithm without changing its code

GPU Support in SystemML: Current Status
(In Progress) Single Node, Single GPU Support
• Deep Neural Network Operators
conv2d, conv2d_backward_data, conv2d_backward_filter, bias_add, bias_multiply,
max_pooling, max_pooling_backward, relu_max_pooling,
relu_max_pooling_backward
• Unary Aggregates
{All/Row/Col}-Sum, Mean, Variance, Min, Max & All-Product
• Matrix Multiplication
Various shapes & sparsities
• Transpose
• Matrix-Matrix and Matrix-Scalar Element-Wise
+, -, *, /, ^
• Trigonometric & Mathematical Operations (on entire Matrices)
sin, cos, tan, asin, acos, atan, log, sqrt, abs, floor, round, ceil, solve
• Some Fused/Special Case Operators
Ax+y, X*t(X), Max(X, 0.0)
• (In Progress) Automatically determine whether to use the GPU or not
(In Progress) - Single Node, Multiple GPU Support
(Planned) - Multiple Node, Multiple GPU Support

Summary: Cool New Stuff in Apache SystemML
Top-level Apache project
API improvements
Deep learning
Code generation
Compressed linear algebra

SystemML 1.0
Apache SystemML 1.0 RC1 scheduled for December 2017

Apache SystemML Tutorial

SystemML Tutorial
Tutorial hosted at IBM developerWorks Code Patterns
https://developer.ibm.com/code/patterns/perform-a-machine-learning-exercise/
Tutorial source code available on GitHub
https://github.com/IBM/SystemML_Usage?cm_sp=Developer-_-perform-a-machine-learning-exercise-_-Get-the-Code
Try this on DSX/IBM Cloud
https://ibm.biz/BdjJJG

Apache SystemML References

For More Information…
Try Apache SystemML!
http://systemml.apache.org
Read our VLDB 2016 paper on compressed linear algebra (Best Paper award!):
Ahmed Elgohary et al., “Compressed Linear Algebra for Large-Scale Machine Learning,” VLDB 2016
Read our CIDR 2017 paper on codegen:
Tarek Elgamal et al., “SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning,” CIDR 2017
Get the slides for our Strata 2016 talk on deep learning with SystemML:
Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML

References
SystemML
http://systemml.apache.org
SystemML source code (GitHub)
https://github.com/apache/systemml
DML (R) Language Reference
https://apache.github.io/systemml/dml-language-reference.html
Algorithms Reference
http://systemml.apache.org/algorithms
Runtime Reference
https://apache.github.io/systemml/#running-systemml
Image source: http://az616578.vo.msecnd.net/files/2016/03/21/6359412499310138501557867529_thank-you-1400x800-c-default.gif
