Can one algorithm rule them all?
How to automate statistical computations
Alp Kucukelbir
COLUMBIA UNIVERSITY
Can one algorithm rule them all?
Not yet. (But some tools can help!)
Rajesh Ranganath Dustin Tran
Andrew Gelman David Blei
Machine Learning
data → machine learning → hidden patterns
We want to discover and explore hidden patterns
to study hard-to-see connections,
to predict future outcomes,
to explore causal relationships.
How taxis navigate the city of Porto [1.7m trips] (K et al., 2016).
How do we use machine learning?
statistical model + data → machine learning expert → hidden patterns (many months later)
Statistical Model
Make assumptions about data.
Capture uncertainties using probability.
Machine Learning Expert
aka a PhD student.
Machine learning should be
1. Easy to use
2. Scalable
3. Flexible
statistical model + data → automatic tool → hidden patterns (instant) → revise
“[Statistical] models are developed iteratively: we build a
model, use it to analyze data, assess how it succeeds and
fails, revise it, and repeat.” (Box, 1960; Blei, 2014)
What does this automatic tool need to do?
statistical model + data → inference (maths) → inference (algorithm) → hidden patterns
Bayesian Model
[Figure: graphical model with latent variables θ and observed data X]
likelihood p(X | θ)
prior p(θ)
model p(X, θ) = p(X | θ) p(θ)
The model describes a data generating process.
The latent variables θ capture hidden patterns.
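To make the decomposition concrete, here is a minimal C++ sketch of the log joint, log p(X, θ) = log p(X | θ) + log p(θ), for a hypothetical toy model that is not part of the talk: a standard normal prior θ ~ N(0, 1) and observations x_i | θ ~ N(θ, 1). The names `normal_logpdf` and `log_joint` are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Log density of N(mean, sd^2) evaluated at x.
double normal_logpdf(double x, double mean, double sd) {
    assert(sd > 0.0);
    const double pi = 3.14159265358979323846;
    const double z = (x - mean) / sd;
    return -0.5 * std::log(2.0 * pi) - std::log(sd) - 0.5 * z * z;
}

// Hypothetical toy model: theta ~ N(0, 1), x_i | theta ~ N(theta, 1).
// log p(X, theta) = log p(theta) + sum_i log p(x_i | theta).
double log_joint(const std::vector<double>& x, double theta) {
    double lp = normal_logpdf(theta, 0.0, 1.0);  // log prior
    for (double xi : x) {
        lp += normal_logpdf(xi, theta, 1.0);     // one log-likelihood term per data point
    }
    return lp;
}
```

Sampling and variational methods typically touch the model only through evaluations of a function like `log_joint` (plus its gradient, for gradient-based methods), which is what makes it possible to automate them.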
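Bayesian Inference
posterior
$$p(\theta \mid X) = \frac{p(X, \theta)}{\int p(X, \theta)\, d\theta}$$
The posterior describes hidden patterns given data X.
It is typically intractable: the denominator, the marginal likelihood p(X), rarely has a closed form, and approximating it directly becomes prohibitively expensive as the dimension of θ grows.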
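With a single latent variable, the denominator can still be approximated by brute force. The minimal C++ sketch below evaluates log ∫ p(X, θ) dθ on a grid, in log space for numerical stability, for a hypothetical toy model (θ ~ N(0, 1), x_i | θ ~ N(θ, 1)); the function names, grid bounds and grid size are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical toy model: theta ~ N(0, 1), x_i | theta ~ N(theta, 1).
double log_joint(const std::vector<double>& x, double theta) {
    const double half_log_2pi = 0.5 * std::log(2.0 * 3.14159265358979323846);
    double lp = -half_log_2pi - 0.5 * theta * theta;
    for (double xi : x) {
        lp += -half_log_2pi - 0.5 * (xi - theta) * (xi - theta);
    }
    return lp;
}

// Approximates log p(X) = log of the integral of p(X, theta) over theta with a
// Riemann sum on G evenly spaced points in [lo, hi]. The log-sum-exp trick
// (subtracting the largest term) avoids underflow of exp(log_joint).
double log_evidence_grid(const std::vector<double>& x, double lo, double hi, int G) {
    assert(G >= 2 && hi > lo);
    const double h = (hi - lo) / (G - 1);
    std::vector<double> lps(G);
    for (int g = 0; g < G; ++g) {
        lps[g] = log_joint(x, lo + g * h);
    }
    const double m = *std::max_element(lps.begin(), lps.end());
    double sum = 0.0;
    for (double lp : lps) {
        sum += std::exp(lp - m);
    }
    return m + std::log(sum * h);
}
```

For one observation x = 1 this toy model has the closed form p(x) = N(x; 0, 2), which the grid reproduces closely. The catch is dimension: G grid points per latent variable means G^d evaluations for d latent variables, so 100 points for each of 20 variables would already need 100^20 evaluations.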
Approximating the Posterior
Sampling: draw samples using MCMC.
Variational: approximate the posterior with a simple family of distributions q(θ; φ), tuned through its parameters φ.
The computations depend heavily on the model!
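As one concrete instance of "draw samples using MCMC", here is a minimal random-walk Metropolis sampler in C++. It assumes a hypothetical one-dimensional toy model (θ ~ N(0, 1), x_i | θ ~ N(θ, 1)), a fixed proposal scale `step`, and no burn-in or convergence diagnostics; it only needs log p(X, θ) up to an additive constant. Stan's default sampler is a Hamiltonian Monte Carlo variant, not this algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Hypothetical toy model: theta ~ N(0, 1), x_i | theta ~ N(theta, 1).
// Constants are dropped: Metropolis only needs log p(X, theta) up to a constant.
double log_joint_unnormalized(const std::vector<double>& x, double theta) {
    double lp = -0.5 * theta * theta;
    for (double xi : x) lp -= 0.5 * (xi - theta) * (xi - theta);
    return lp;
}

// Random-walk Metropolis: propose theta' = theta + step * eps, eps ~ N(0, 1),
// and accept with probability min(1, p(X, theta') / p(X, theta)).
std::vector<double> metropolis(const std::vector<double>& x, int num_samples,
                               double step, unsigned seed) {
    assert(num_samples > 0 && step > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    double theta = 0.0;
    double lp = log_joint_unnormalized(x, theta);
    std::vector<double> samples;
    samples.reserve(num_samples);
    for (int s = 0; s < num_samples; ++s) {
        const double proposal = theta + step * normal(rng);
        const double lp_prop = log_joint_unnormalized(x, proposal);
        // Accept in log space: log u < log p(X, theta') - log p(X, theta).
        if (std::log(unif(rng)) < lp_prop - lp) {
            theta = proposal;
            lp = lp_prop;
        }
        samples.push_back(theta);  // a rejected proposal repeats the current state
    }
    return samples;
}
```

For a single observation x = 1, the exact posterior in this toy model is N(0.5, 0.5), so the average of a long run of draws should sit close to 0.5.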
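Common Statistical Computations
Expectations
$$\mathbb{E}_{q(\theta;\phi)}[\log p(X,\theta)] = \int \log p(X,\theta)\, q(\theta;\phi)\, d\theta$$
Gradients (of expectations)
$$\nabla_\phi\, \mathbb{E}_{q(\theta;\phi)}[\log p(X,\theta)]$$
Maximization (by following gradients)
$$\max_\phi\, \mathbb{E}_{q(\theta;\phi)}[\log p(X,\theta)]$$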
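Automating Expectations
Monte Carlo sampling
[Figure: f(θ) on the interval [a, a + 1], evaluated at sampled points θ^(s)]
$$\int_a^{a+1} f(\theta)\, d\theta \approx \frac{1}{S}\sum_{s=1}^{S} f(\theta^{(s)}), \quad \text{where } \theta^{(s)} \sim \mathrm{Uniform}(a, a+1)$$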
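The estimator on this slide translates almost line for line into code. A minimal C++ sketch, assuming a fixed random seed for reproducibility; the function name is illustrative.

```cpp
#include <cassert>
#include <random>

// Monte Carlo estimate of the integral of f over [a, a + 1]:
// average f(theta_s) over S draws theta_s ~ Uniform(a, a + 1).
// The interval has length 1, so the average needs no rescaling.
template <typename F>
double mc_integral_unit_interval(F f, double a, int S, unsigned seed) {
    assert(S > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(a, a + 1.0);
    double sum = 0.0;
    for (int s = 0; s < S; ++s) {
        sum += f(unif(rng));
    }
    return sum / S;
}
```

For example, with f(θ) = θ² and a = 0 the exact integral is 1/3; the Monte Carlo error shrinks like 1/√S regardless of f, as long as f(θ) has finite variance.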
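Automating Expectations
Monte Carlo sampling
$$\mathbb{E}_{q(\theta;\phi)}[\log p(X,\theta)] = \int \log p(X,\theta)\, q(\theta;\phi)\, d\theta \approx \frac{1}{S}\sum_{s=1}^{S} \log p(X, \theta^{(s)}), \quad \text{where } \theta^{(s)} \sim q(\theta;\phi)$$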
Monte Carlo Statistical Methods, Robert and Casella, 1999
Monte Carlo and Quasi-Monte Carlo Sampling, Lemieux, 2009
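The same recipe estimates the expectation in the variational objective. This minimal C++ sketch assumes a hypothetical toy model (θ ~ N(0, 1), x_i | θ ~ N(θ, 1)) and a Gaussian family q(θ; φ) = N(μ, σ²) with φ = (μ, σ). Because the toy model is Gaussian the exact expectation is also available, and `exact_expected_log_joint` is included only as a check; for most models only the Monte Carlo estimate is available.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

const double kHalfLog2Pi = 0.5 * std::log(2.0 * 3.14159265358979323846);

// Hypothetical toy model: theta ~ N(0, 1), x_i | theta ~ N(theta, 1).
double log_joint(const std::vector<double>& x, double theta) {
    double lp = -kHalfLog2Pi - 0.5 * theta * theta;
    for (double xi : x) lp += -kHalfLog2Pi - 0.5 * (xi - theta) * (xi - theta);
    return lp;
}

// Monte Carlo estimate of E_{q(theta; phi)}[log p(X, theta)] with
// q(theta; phi) = N(mu, sigma^2), i.e. phi = (mu, sigma).
double mc_expected_log_joint(const std::vector<double>& x, double mu, double sigma,
                             int S, unsigned seed) {
    assert(S > 0 && sigma > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> q(mu, sigma);
    double sum = 0.0;
    for (int s = 0; s < S; ++s) sum += log_joint(x, q(rng));
    return sum / S;
}

// Closed form, available only because the toy model is Gaussian:
// E_q[theta^2] = mu^2 + sigma^2 and E_q[(x - theta)^2] = (x - mu)^2 + sigma^2.
double exact_expected_log_joint(const std::vector<double>& x, double mu, double sigma) {
    const double s2 = sigma * sigma;
    double val = -kHalfLog2Pi - 0.5 * (mu * mu + s2);
    for (double xi : x) val += -kHalfLog2Pi - 0.5 * ((xi - mu) * (xi - mu) + s2);
    return val;
}
```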
Automating Expectations
Probability Distributions
Stan, GSL (C++)
NumPy, SciPy, Edward (Python)
built-in (R)
Distributions.jl (Julia)
Automating Gradients
Symbolic or Automatic Differentiation
Let $f(x_1, x_2) = \log x_1 + x_1 x_2 - \sin x_2$. Compute $\partial f(2, 5) / \partial x_1$.
Table 2 (Baydin et al., 2015): Forward mode AD example, with $y = f(x_1, x_2) = \ln(x_1) + x_1 x_2 - \sin(x_2)$ at $(x_1, x_2) = (2, 5)$ and setting $\dot{x}_1 = 1$ to compute $\partial y / \partial x_1$. The original forward run on the left is augmented by the forward AD operations on the right, where each line supplements the original on its left.
$$
\begin{array}{ll}
\textbf{Forward evaluation trace} & \textbf{Forward derivative trace} \\
v_{-1} = x_1 = 2 & \dot{v}_{-1} = \dot{x}_1 = 1 \\
v_0 = x_2 = 5 & \dot{v}_0 = \dot{x}_2 = 0 \\
v_1 = \ln v_{-1} = \ln 2 & \dot{v}_1 = \dot{v}_{-1} / v_{-1} = 1/2 \\
v_2 = v_{-1} \times v_0 = 2 \times 5 & \dot{v}_2 = \dot{v}_{-1} \times v_0 + \dot{v}_0 \times v_{-1} = 1 \times 5 + 0 \times 2 \\
v_3 = \sin v_0 = \sin 5 & \dot{v}_3 = \dot{v}_0 \times \cos v_0 = 0 \times \cos 5 \\
v_4 = v_1 + v_2 = 0.693 + 10 & \dot{v}_4 = \dot{v}_1 + \dot{v}_2 = 0.5 + 5 \\
v_5 = v_4 - v_3 = 10.693 + 0.959 & \dot{v}_5 = \dot{v}_4 - \dot{v}_3 = 5.5 - 0 \\
y = v_5 = 11.652 & \dot{y} = \dot{v}_5 = 5.5
\end{array}
$$
Forward mode associates with each intermediate variable $v_i$ a derivative $\dot{v}_i = \partial v_i / \partial x_1$. Applying the chain rule to each elementary operation in the forward evaluation trace, we generate the corresponding derivative trace, given on the right-hand side of Table 2. Evaluating variables $v_i$ one by one together with their corresponding $\dot{v}_i$ values gives us the required derivative in the final variable $\dot{v}_5 = \partial y / \partial x_1$.
Automatic differentiation in machine learning: a survey, Baydin et al., 2015
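The derivative trace in Table 2 is exactly what operator overloading on dual numbers computes. The following is a minimal C++ sketch of forward-mode AD, not the implementation of any particular library; the `fwd` namespace, the `Dual` type and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

namespace fwd {

// A dual number carries a value and a tangent (its derivative w.r.t. the seeded input).
struct Dual {
    double val;
    double dot;
};

// Each elementary operation applies the chain rule to the tangent,
// matching one line of the forward derivative trace.
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.dot - b.dot}; }
Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.dot * b.val + a.val * b.dot}; }
Dual log(Dual a) { assert(a.val > 0.0); return {std::log(a.val), a.dot / a.val}; }
Dual sin(Dual a) { return {std::sin(a.val), a.dot * std::cos(a.val)}; }

// f(x1, x2) = log x1 + x1 x2 - sin x2, written once and evaluated on duals.
Dual f(Dual x1, Dual x2) { return log(x1) + x1 * x2 - sin(x2); }

// Seeding tangents (1, 0) gives df/dx1; seeding (0, 1) gives df/dx2.
double partial_x1(double x1, double x2) { return f({x1, 1.0}, {x2, 0.0}).dot; }
double partial_x2(double x1, double x2) { return f({x1, 0.0}, {x2, 1.0}).dot; }

}  // namespace fwd
```

Seeding the tangent of x₁ with 1 reproduces ẏ = 5.5 from the table; a second pass seeded on x₂ gives ∂y/∂x₂ = 2 − cos 5 ≈ 1.716. Forward mode needs one pass per input, which is why libraries such as Stan use reverse mode for a scalar objective with many parameters.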
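#include <iostream>
#include <stan/math.hpp>

int main() {
  using stan::math::var;
  // Independent variables for reverse-mode automatic differentiation.
  var x1 = 2.0, x2 = 5.0;
  // Evaluating f records each operation on Stan's autodiff stack.
  var f = stan::math::log(x1) + x1 * x2 - stan::math::sin(x2);
  std::cout << "f(x1, x2) = " << f.val() << std::endl;
  // Reverse sweep: propagate adjoints from f back to x1 and x2.
  f.grad();
  std::cout << "df/dx1 = " << x1.adj() << std::endl   // 1/x1 + x2 = 5.5
            << "df/dx2 = " << x2.adj() << std::endl;  // x1 - cos(x2)
  // Release the memory used by the autodiff stack.
  stan::math::recover_memory();
  return 0;
}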
The Stan math library, Carpenter et al., 2015
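What `f.grad()` and `x1.adj()` do can be sketched with a tiny tape-based reverse-mode implementation. This is a simplified illustration under our own assumptions (a single global tape, only +, −, ×, log and sin), not the Stan math library's actual design; every name in the `rev` namespace is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace rev {

// One tape entry: forward value, adjoint, and up to two parents together
// with the local partial derivative d(this node)/d(parent).
struct Node {
    double val;
    double adj;
    std::size_t parent[2];
    double partial[2];
    int num_parents;
};

// Global tape of nodes in evaluation order (a simplification).
std::vector<Node> tape;

// Handle to a node on the tape.
struct Var {
    std::size_t id;
    double val() const { return tape[id].val; }
    double adj() const { return tape[id].adj; }
};

Var make_node(double val, int num_parents, std::size_t p0 = 0, double d0 = 0.0,
              std::size_t p1 = 0, double d1 = 0.0) {
    tape.push_back(Node{val, 0.0, {p0, p1}, {d0, d1}, num_parents});
    return Var{tape.size() - 1};
}

// Independent (input) variable.
Var variable(double v) { return make_node(v, 0); }

// Forward pass: each operation stores its value and its local partials.
Var operator+(Var a, Var b) { return make_node(a.val() + b.val(), 2, a.id, 1.0, b.id, 1.0); }
Var operator-(Var a, Var b) { return make_node(a.val() - b.val(), 2, a.id, 1.0, b.id, -1.0); }
Var operator*(Var a, Var b) { return make_node(a.val() * b.val(), 2, a.id, b.val(), b.id, a.val()); }
Var log(Var a) { return make_node(std::log(a.val()), 1, a.id, 1.0 / a.val()); }
Var sin(Var a) { return make_node(std::sin(a.val()), 1, a.id, std::cos(a.val())); }

// Reverse sweep: seed the output's adjoint with 1, then visit nodes from
// last to first, adding (adjoint * local partial) into each parent.
void grad(Var out) {
    assert(out.id < tape.size());
    for (Node& n : tape) n.adj = 0.0;
    tape[out.id].adj = 1.0;
    for (std::size_t i = out.id + 1; i-- > 0;) {
        const Node& n = tape[i];
        for (int k = 0; k < n.num_parents; ++k) {
            tape[n.parent[k]].adj += n.partial[k] * n.adj;
        }
    }
}

}  // namespace rev
```

Usage mirrors the Stan snippet: create `x1 = rev::variable(2.0)` and `x2 = rev::variable(5.0)`, build `y = rev::log(x1) + x1 * x2 - rev::sin(x2)`, call `rev::grad(y)`, then read `x1.adj()` (5.5) and `x2.adj()` (2 − cos 5 ≈ 1.716). One reverse sweep yields the partials for all inputs at once.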
Automating Gradients
Automatic Differentiation
Stan, Adept, CppAD (C++)
autograd, TensorFlow (Python)
radx (R)
http://www.juliadiff.org/ (Julia)
Symbolic Differentiation
SymbolicC++ (C++)
SymPy, Theano (Python)
Deriv, Ryacas (R)
http://www.juliadiff.org/ (Julia)
Stochastic Optimization
Follow noisy unbiased gradients.
[Figure (Murphy, 2012, Figure 8.8, Section 8.5 "Online learning and stochastic optimization"): Illustration of the LMS algorithm. Left: the LMS trajectory (black line) in the (w0, w1) plane moving toward the least squares solution θ̂ = (1.45, 0.92) (red cross). Right: RSS vs. iteration, which does not decrease monotonically.]
Scale up by subsampling the data at each step.
Machine Learning: a Probabilistic Perspective, Murphy, 2012
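The "subsample, then step" idea can be sketched in a few lines of C++ for the least-squares (LMS) setting of the figure. This is a minimal sketch under our own assumptions: one data point per step, a hand-picked step-size schedule ρ_k = step0 / (1 + k / tau) (which satisfies the Robbins–Monro conditions), and hypothetical parameter names; it does not reproduce the figure's data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Stochastic gradient descent (LMS) for least squares y ≈ w0 + w1 * x.
// Each step subsamples one data point uniformly at random, so the update
// follows a noisy but unbiased estimate of the average-loss gradient.
std::array<double, 2> sgd_least_squares(const std::vector<double>& x,
                                        const std::vector<double>& y,
                                        int iters, double step0, double tau,
                                        unsigned seed) {
    assert(!x.empty() && x.size() == y.size() && tau > 0.0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, x.size() - 1);
    std::array<double, 2> w{0.0, 0.0};
    for (int k = 0; k < iters; ++k) {
        const std::size_t i = pick(rng);
        const double residual = w[0] + w[1] * x[i] - y[i];
        const double rho = step0 / (1.0 + k / tau);  // decaying step size
        // Gradient of 0.5 * residual^2 w.r.t. (w0, w1) is residual * (1, x_i).
        w[0] -= rho * residual;
        w[1] -= rho * residual * x[i];
    }
    return w;
}
```

On noise-free synthetic data, for example y = 2 + 3x at 100 evenly spaced points in [0, 1], the iterates approach (w0, w1) = (2, 3). With noisy data they fluctuate around the least squares solution, and the decaying step size damps those fluctuations.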
Stochastic Optimization
Generic Implementations
Vowpal Wabbit, sgd (C++)
Theano, TensorFlow (Python)
sgd (R)
SGDOptim.jl (Julia)
ADVI (Automatic Differentiation Variational Inference)
An easy-to-use, scalable, flexible algorithm
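The three computations (Monte Carlo expectations, gradients, stochastic optimization) fit together in ADVI. The following is a heavily simplified C++ sketch, not Stan's implementation. It assumes a hypothetical one-dimensional model (θ ~ N(0, 1), x_i | θ ~ N(θ, 1)), a Gaussian q(θ; φ) = N(μ, exp(ω)²), the reparameterization θ = μ + exp(ω)ε with ε ~ N(0, 1), a hand-written derivative of log p(X, θ) standing in for automatic differentiation, and a plain decaying step size in place of ADVI's adaptive step-size sequence. Because θ is already unconstrained, the transformation ADVI applies to constrained latent variables is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct GaussianQ {
    double mu;     // variational mean
    double sigma;  // variational standard deviation, exp(omega)
};

// d/dtheta log p(X, theta) for the hypothetical model
// theta ~ N(0, 1), x_i | theta ~ N(theta, 1). Written by hand here;
// ADVI would obtain it by automatic differentiation.
double grad_log_joint(const std::vector<double>& x, double theta) {
    double g = -theta;                    // from the prior
    for (double xi : x) g += xi - theta;  // from each likelihood term
    return g;
}

// Stochastic gradient ascent on the ELBO with reparameterization gradients:
//   grad_mu    = E[ g(theta) ]
//   grad_omega = E[ g(theta) * eps * sigma ] + 1   (the +1 comes from the entropy)
// where theta = mu + sigma * eps, eps ~ N(0, 1), estimated with mc_samples draws.
GaussianQ advi_sketch(const std::vector<double>& x, int iters, int mc_samples,
                      double eta, unsigned seed) {
    assert(iters > 0 && mc_samples > 0 && eta > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> std_normal(0.0, 1.0);
    double mu = 0.0, omega = 0.0;  // start at q = N(0, 1)
    for (int k = 0; k < iters; ++k) {
        const double sigma = std::exp(omega);
        double g_mu = 0.0, g_omega = 0.0;
        for (int s = 0; s < mc_samples; ++s) {
            const double eps = std_normal(rng);
            const double g = grad_log_joint(x, mu + sigma * eps);
            g_mu += g;
            g_omega += g * eps * sigma;
        }
        g_mu /= mc_samples;
        g_omega = g_omega / mc_samples + 1.0;
        const double rho = eta / (1.0 + k / 1000.0);  // decaying step size
        mu += rho * g_mu;
        omega += rho * g_omega;
    }
    return {mu, std::exp(omega)};
}
```

In this conjugate toy model the exact posterior is N(Σx_i / (n + 1), 1 / (n + 1)), so for x = (1, 2, 0) the fitted q should land near μ = 0.75, σ = 0.5. For non-conjugate models the Gaussian family is only an approximation.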
mc-stan.org
Stan is a probabilistic programming system.
1. Write the model in a simple language.
2. Provide data.
3. Run.
RStan, PyStan, Stan.jl, ...
Exploring Taxi Rides
Data: 1.7 million taxi rides
Write down a pPCA model. (∼minutes)
Use ADVI to infer subspace. (∼hours)
Project data into pPCA subspace. (∼minutes)
Write down a mixture model. (∼minutes)
Use ADVI to find patterns. (∼minutes)
Write down a supervised pPCA model. (∼minutes)
Repeat. (∼hours)
What would have taken us weeks → a single day.
Monte Carlo Statistical Methods, Robert and Casella, 1999
Monte Carlo and Quasi-Monte Carlo Sampling, Lemieux, 2009
Automatic differentiation in machine learning: a survey, Baydin et al., 2015
The Stan math library, Carpenter et al., 2015
Machine Learning: a Probabilistic Perspective, Murphy, 2012
Automatic differentiation variational inference, K et al., 2016
proditus.com mc-stan.org Thank you!
EXTRA SLIDES
One Algorithm to Rule Them All: How to Automate Statistical Computation
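Kullback–Leibler Divergence
$$
\begin{aligned}
\mathrm{KL}(q(\theta) \,\|\, p(\theta \mid X)) &= \int_\theta q(\theta) \log \frac{q(\theta)}{p(\theta \mid X)}\, d\theta \\
&= \mathbb{E}_{q(\theta)}\left[\log \frac{q(\theta)}{p(\theta \mid X)}\right] \\
&= \mathbb{E}_{q(\theta)}\left[\log q(\theta) - \log p(\theta \mid X)\right]
\end{aligned}
$$
Related Objective Function
$$
\begin{aligned}
\mathcal{L}(\phi) &= \log p(X) - \mathrm{KL}(q(\theta) \,\|\, p(\theta \mid X)) \\
&= \log p(X) - \mathbb{E}_{q(\theta)}\left[\log q(\theta) - \log p(\theta \mid X)\right] \\
&= \log p(X) + \mathbb{E}_{q(\theta)}\left[\log p(\theta \mid X)\right] - \mathbb{E}_{q(\theta)}\left[\log q(\theta)\right] \\
&= \mathbb{E}_{q(\theta)}\left[\log p(\theta, X)\right] - \mathbb{E}_{q(\theta)}\left[\log q(\theta)\right] \\
&= \underbrace{\mathbb{E}_{q(\theta;\phi)}\left[\log p(X,\theta)\right]}_{\text{negative cross-entropy}} \; \underbrace{-\, \mathbb{E}_{q(\theta;\phi)}\left[\log q(\theta;\phi)\right]}_{\text{entropy}}
\end{aligned}
$$
The step from the third to the fourth line uses log p(X) + log p(θ | X) = log p(θ, X); log p(X) can move inside the expectation because it does not depend on θ.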
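The identity ℒ(φ) = log p(X) − KL(q(θ) ‖ p(θ | X)) can be checked numerically whenever every term has a closed form. This minimal C++ sketch assumes a hypothetical conjugate model (θ ~ N(0, 1), x_i | θ ~ N(θ, 1)) with q(θ; φ) = N(μ, σ²); the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

const double kLog2Pi = std::log(2.0 * 3.14159265358979323846);

// ELBO = E_q[log p(X, theta)] + entropy of q, for q = N(mu, sigma^2) and
// the hypothetical model theta ~ N(0, 1), x_i | theta ~ N(theta, 1).
double elbo(const std::vector<double>& x, double mu, double sigma) {
    assert(sigma > 0.0);
    const double s2 = sigma * sigma;
    double expected_log_joint = -0.5 * kLog2Pi - 0.5 * (mu * mu + s2);
    for (double xi : x) {
        expected_log_joint += -0.5 * kLog2Pi - 0.5 * ((xi - mu) * (xi - mu) + s2);
    }
    const double entropy = 0.5 * (kLog2Pi + 1.0) + std::log(sigma);  // Gaussian entropy
    return expected_log_joint + entropy;
}

// Exact log evidence: marginally X ~ N(0, I + 1 1^T), whose determinant is
// n + 1 and whose inverse is I - 1 1^T / (n + 1).
double log_evidence(const std::vector<double>& x) {
    const double n = static_cast<double>(x.size());
    double sum = 0.0, sum_sq = 0.0;
    for (double xi : x) { sum += xi; sum_sq += xi * xi; }
    return -0.5 * n * kLog2Pi - 0.5 * std::log(n + 1.0)
           - 0.5 * (sum_sq - sum * sum / (n + 1.0));
}

// KL(q || posterior), where the exact posterior is N(sum x / (n + 1), 1 / (n + 1)).
double kl_to_posterior(const std::vector<double>& x, double mu, double sigma) {
    assert(sigma > 0.0);
    const double n = static_cast<double>(x.size());
    double sum = 0.0;
    for (double xi : x) sum += xi;
    const double m = sum / (n + 1.0);
    const double s2 = 1.0 / (n + 1.0);
    return 0.5 * std::log(s2) - std::log(sigma)
           + (sigma * sigma + (mu - m) * (mu - m)) / (2.0 * s2) - 0.5;
}
```

When μ and σ equal the exact posterior mean and standard deviation the KL term is zero and the ELBO equals log p(X); for any other (μ, σ) it is strictly smaller, which is why maximizing ℒ(φ) is a usable stand-in for minimizing the KL divergence.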
