Say What You Mean
Braxton McKee, CEO & Founder
Scaling up machine learning algorithms
directly from source code
Q: Why should I have to rewrite my
program as my dataset gets larger?
Example: Nearest Neighbors

def sq_distance(p1, p2):
    return sum((p1[i] - p2[i])**2 for i in range(len(p1)))

def index_of_nearest(p, points):
    return min((sq_distance(p, points[i]), i)
               for i in range(len(points)))[1]

def nearest_center(points, centers):
    return [index_of_nearest(p, centers) for p in points]
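A tiny run with made-up data shows what it computes:

points = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
centers = [[0.5, 0.5], [9.0, 9.0]]
print(nearest_center(points, centers))  # [0, 0, 1]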
Unfortunately, this is not fast.
A: You shouldn’t have to!
Q: Why should I have to rewrite my
program as my dataset gets larger?
Pyfora
Automatically scalable Python
for large-scale machine learning and data science
100% Open Source
http://github.com/ufora/ufora
http://docs.pyfora.com/
Goals of Pyfora
• Provide identical semantics to regular Python
• Easily use hundreds of CPUs / GPUs and TBs of
RAM
• Scale by analyzing source code, not by calling
libraries
No more complex frameworks or APIs.
Approaches to Scaling
APIs and Frameworks
• Library of functions for
specific patterns of
parallelism
• Programmer (re)writes
program to fit the pattern.
Approaches to Scaling
APIs and Frameworks
• Library of functions for
specific patterns of
parallelism
• Programmer (re)writes
program to fit the pattern.
Programming Language
• Semantics of calculation entirely defined by source code
• Compiler and Runtime are responsible for efficient execution.
Approaches to Scaling
APIs and Frameworks
• MPI
• Hadoop
• Spark
Programming Languages
• CUDA
• Cilk
• SQL
• Python with Pyfora
API vs. Language

API
Pros:
• More control over performance
• Easy to integrate lots of different systems.
Cons:
• More code
• Program meaning obscured by implementation details
• Hard to debug when something goes wrong

Language
Pros:
• Simpler code
• Much more expressive
• Programs are easier to understand.
• Cleaner failure modes
• Much deeper optimizations are possible.
Cons:
• Very hard to implement
With a strong implementation,
“language approach” should win
• Any pattern that can be implemented in an API can be
recognized in a language.
• Language-based systems have the entire source code, so they
have more to work with than API based systems.
• Can measure behavior at runtime and use this to optimize.
Example: Nearest Neighbors
def sq_distance(p1, p2):
    return sum((p1[i] - p2[i])**2 for i in range(len(p1)))

def index_of_nearest(p, points):
    return min((sq_distance(p, points[i]), i)
               for i in range(len(points)))[1]

def nearest_center(points, centers):
    return [index_of_nearest(p, centers) for p in points]
How can we make this fast?
• JIT compile to make single-threaded code fast
• Parallelize to use multiple CPUs
• Distribute data to use multiple machines
Why is this tricky?
Optimal behavior depends on the sizes and shapes of data.
(diagram: small Centers and Points sets)
If both sets are small, don’t bother to distribute.
Why is this tricky?
(diagram: tall, thin Points set; small Centers set)
If “points” is tall and thin, it’s natural to split it across many machines and replicate “centers”.
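By way of illustration, a hand-rolled sketch of that strategy (using multiprocessing and the nearest_center function above; not Pyfora code):

from multiprocessing import Pool

def nearest_center_chunked(points, centers, n_workers=4):
    # Each chunk of "points" goes to a worker along with a full copy of "centers".
    chunk = (len(points) + n_workers - 1) // n_workers
    chunks = [points[i:i + chunk] for i in range(0, len(points), chunk)]
    with Pool(n_workers) as pool:
        parts = pool.starmap(nearest_center, [(c, centers) for c in chunks])
    return [idx for part in parts for idx in part]

This already looks nothing like the three-line original, and it only covers this one case.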
Why is this tricky?
(diagram: wide Points and Centers sets)
If “points” and “centers” are really wide (say, they’re
images), it would be better to split them horizontally,
compute distances between all pairs in slices, and merge
them.
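Again purely for illustration, a slice-and-merge version of the distance computation (assuming we only split along the feature dimension; not Pyfora code):

def sq_distance_by_slices(p1, p2, slice_width):
    # Compute partial squared distances per feature slice and merge by summing;
    # in a distributed setting, each slice could live on a different machine.
    partials = []
    for lo in range(0, len(p1), slice_width):
        hi = min(lo + slice_width, len(p1))
        partials.append(sum((p1[i] - p2[i])**2 for i in range(lo, hi)))
    return sum(partials)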
Why is this tricky?
You will end up writing totally different code for
each of these different situations.
The source code contains the necessary
structure.
The key is to defer decisions to runtime, when the
system can actually see how big the datasets are.
Getting it right is valuable
• Much less work for the programmer
• Code is actually readable
• Code becomes more reusable.
• Use the language the way it was intended:
For instance, in Python, the “row” objects can be anything that looks like
a list.
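For example, sq_distance only indexes its arguments and calls len(), so a “row” can be a list, a tuple, or an array (a tiny illustration):

from array import array

print(sq_distance((0, 0), array('d', [3.0, 4.0])))  # 25.0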
What are some other common
implementation problems we can
solve this way?
Problem: Wrong-sized chunking
• API-based frameworks require you to explicitly partition your
data into chunks.
• If you are running a complex task, the runtime may be really
long for a small subset of chunks. You’ll end up waiting a long
time for that last mapper.
• If your tasks allocate memory, you can run out of RAM and
crash.
Solution: Dynamically rebalance
(diagram: adaptive parallelism; work splits across CORE #1, CORE #2, CORE #3, and CORE #4)
Solution: Dynamically rebalance
• This requires you to be able to interrupt running tasks as
they’re executing.
• Adding support for this to an API makes it much more
complicated to use.
• This is much easier to do with compiler support.
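A minimal sketch of the idea in plain Python (purely illustrative, not Pyfora’s runtime; threads stand in for cores here): a running task periodically checks whether the shared queue has run dry and, if so, splits off the second half of its remaining work for an idle worker.

import queue
import threading

tasks = queue.Queue()
results = []
lock = threading.Lock()

def sum_squares(lo, hi):
    total = 0
    while lo < hi:
        # Interruptible point: if no pending work is queued and this task is
        # still large, hand its second half back for an idle worker to take.
        if tasks.empty() and hi - lo > 10000:
            mid = (lo + hi) // 2
            tasks.put((mid, hi))
            hi = mid
        total += lo * lo
        lo += 1
    with lock:
        results.append(total)

def worker():
    while True:
        try:
            lo, hi = tasks.get(timeout=0.1)
        except queue.Empty:
            return
        sum_squares(lo, hi)

tasks.put((0, 1000000))
workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sum(results))  # same answer as sum(i * i for i in range(1000000))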
Problem: Nested parallelism
Example:
• You have an iterative model
• There is lots of parallelism in each iteration
• But you also want to search over many hyperparameters
With API-based approaches, you have to manage this yourself,
either by constructing a graph of subtasks, or figuring out how to
flatten your workload into something that can be map-reduced.
Solution: infer parallelism from source

def fit_model(learning_rate, model, params):
    while not model.finished(params):
        params = model.update_params(learning_rate, params)
    return params

fits = [[fit_model(rate, model, params) for rate in learning_rates]
        for model in models]

(Both the inner and outer list comprehensions are sources of parallelism.)
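For contrast, a rough sketch of the hand-flattened version an API-based approach pushes you toward (using concurrent.futures as a stand-in for whatever framework is in play):

from concurrent.futures import ProcessPoolExecutor

def fit_all(models, learning_rates, params):
    # Flatten the nested loops into one pool of (model, rate) tasks,
    # then reassemble the nested result structure by hand.
    with ProcessPoolExecutor() as pool:
        futures = [[pool.submit(fit_model, rate, model, params)
                    for rate in learning_rates]
                   for model in models]
        return [[f.result() for f in row] for row in futures]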
Problem: Common data is too big
Example:
• You have a bunch of datasets (say, for a bunch of products, the
customers who bought that product)
• You want to compute something on all pairs of sets (say, some
average on common customers for both)
• The whole set-of-sets is too big for memory
[[some_function(s1,s2) for s1 in sets] for s2 in sets]
Problem: Common data is too big
This creates problems because:
• If you just do map-reduce on the outer loop, you still need to get to the
data for all the other sets.
• If you try to actually produce all pairs of sets, you’ll end up with
something many many times larger than the original dataset.
[[some_function(s1,s2) for s1 in sets] for s2 in sets]
Solution: infer cache locality
• Think of each call to “f” as a separate task.
• Break tasks into smaller tasks until each one’s active working
set is a reasonable size.
• Schedule tasks that use the same data on the same machine to
minimize data movement.
[[some_function(s1,s2) for s1 in sets] for s2 in sets]
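A toy sketch of that grouping (illustrative only; Pyfora infers it at runtime rather than asking you to write it): carve the pair grid into square tiles so each tile touches only a handful of sets.

def tiles(n_sets, tile_size):
    # Yield the (i, j) index pairs belonging to each square tile of the grid.
    for i0 in range(0, n_sets, tile_size):
        for j0 in range(0, n_sets, tile_size):
            yield [(i, j)
                   for i in range(i0, min(i0 + tile_size, n_sets))
                   for j in range(j0, min(j0 + tile_size, n_sets))]

def run_tile(pairs, sets, f):
    # A tile of tile_size**2 pairs needs at most 2 * tile_size distinct sets
    # in memory, so it can run on whichever machine already holds them.
    return {(i, j): f(sets[i], sets[j]) for i, j in pairs}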
Solution: infer cache locality
(diagram: the grid of all-pairs tasks f(s0, s0) through f(s8, s5); tasks in the same row or column share one of their input sets)
So how does Pyfora work?
• Operate on a subset of Python that restricts mutability.
• Built a JIT compiler that can “pop” code back into the interpreter
• Can move sets of stackframes from one machine to another
• Can rewrite selected stackframes to use futures if there is parallelism to
exploit.
• Carefully track what data a thread is using.
• Dynamically schedule threads and data on machines to
optimize for cache locality.
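As a mental model for the futures rewrite (a hand-written illustration, not Pyfora’s generated code), a comprehension like the one in nearest_center becomes roughly:

from concurrent.futures import ThreadPoolExecutor

def nearest_center_as_futures(points, centers):
    # Each loop iteration becomes an independent future; results are gathered in order.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(index_of_nearest, p, centers) for p in points]
        return [f.result() for f in futures]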
import pyfora

executor = pyfora.connect("http://...")
data = executor.importS3Dataset("myBucket", "myData.csv")

def calibrate(dataframe, params):
    # some complex model with loops and parallelism
    ...

with executor.remotely:
    dataframe = parse_csv(data)
    models = [calibrate(dataframe, p) for p in params]

print(models.toLocal().result())
What are we working on?
• More libraries!
• Better predictions on how long functions will take and what data
they consume. This helps to make better scheduling decisions.
• Compiler optimizations (immutable Python is a rich source of
these)
• Automatic compilation and scheduling of data and compute on
GPU
Thanks!
• Check out the repo: github.com/ufora/ufora
• Follow me on Twitter and Medium: @braxtonmckee
• Subscribe to “This Week in Data” (see top of ufora.com)
• Email me: braxton@ufora.com