Bolt:
Building a distributed ndarray
Jason Wittenbach
Janelia Research Campus (HHMI)
Freeman Lab
t, (x, y, z)
time ~ 10^4
space ~ 10^7
~ 10^11 elements
~ 1 TB
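The figures above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the 8-byte element size (float64) is an assumption that the slide does not state.

```python
# Rough size estimate for the dataset described on the slide.
time_points = 10**4       # ~10^4 time points
locations = 10**7         # ~10^7 spatial locations
bytes_per_element = 8     # ASSUMPTION: float64 storage, not stated in the slides

elements = time_points * locations
size_tb = elements * bytes_per_element / 10**12   # decimal terabytes

print(f"{elements:.0e} elements, roughly {size_tb:.1f} TB")
```

Under that assumption the total comes out just under a terabyte, consistent with the "~ 1 TB" on the slide.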
(x, y, t)
(x, y, z, t)
(x, y, z, c, t)
(n, k)
[Figure: axes x, y, and time]
neuroscience
astronomy
geospatial
climate science
• a distributed ndarray
• built on PySpark
• conforms to NumPy API
data.mean(axis=0)
data[2, 4:10]
data.T
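The three expressions above are plain NumPy syntax. As a local illustration of what each one returns, here is a small sketch using an ordinary in-memory `numpy.ndarray`; the array shape is an arbitrary choice, and no Bolt or Spark calls are shown because the slides do not spell out how a Bolt array is constructed.

```python
import numpy as np

# Small stand-in array; the shape (5, 12, 3) is arbitrary and only for illustration.
data = np.arange(5 * 12 * 3).reshape(5, 12, 3)

mean_over_first = data.mean(axis=0)   # average over axis 0 -> shape (12, 3)
row_slice = data[2, 4:10]             # index axis 0, slice axis 1 -> shape (6, 3)
transposed = data.T                   # reverse the axis order -> shape (3, 12, 5)

print(mean_over_first.shape, row_slice.shape, transposed.shape)
```

A distributed array that conforms to the NumPy API would be expected to accept the same expressions and return results with the same shapes.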
(t, x, y, z) → (x, y, z, t)
(v, w, x, y, z): keys (v, w), values (x, y, z)
(v, w, x, y, z): keys (v, w, x), values (y, z)
(v, w, x, y, z): keys (v), values (w, x, y, z)
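One way to picture the splits above is as a collection of (key, value) records, where the leading axes index the records and the trailing axes live inside each value block. The sketch below emulates this locally with NumPy; the helper `to_records` and its `split` parameter are hypothetical names for illustration, not Bolt's API.

```python
import numpy as np
from itertools import product

def to_records(arr, split):
    """Hypothetical helper: the first `split` axes become keys, the rest stay in the values."""
    key_shape = arr.shape[:split]
    return [(key, arr[key]) for key in product(*(range(n) for n in key_shape))]

# Axes (v, w, x, y, z); the sizes are arbitrary.
arr = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

for split in (1, 2, 3):
    records = to_records(arr, split)
    key, value = records[0]
    print(f"split={split}: {len(records)} records, key={key}, value shape={value.shape}")
```

Moving the split point to the right gives more, smaller records; moving it to the left gives fewer, larger ones.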
indexing
transpose
map
slicing
apply-along-axis
reshape
reduce
filter
chunking
padding
indexing
filter
map
keys (u, v), values (x, y)
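As a rough local picture of how indexing, filter, and map could act on such records, the sketch below uses a plain Python list of (key, value) pairs in place of a Spark RDD. How these operations map onto Spark primitives is an assumption here; the slides only name the operations.

```python
import numpy as np
from itertools import product

# Axes (u, v, x, y) with keys (u, v) and values (x, y); the sizes are arbitrary.
arr = np.arange(3 * 4 * 5 * 6).reshape(3, 4, 5, 6)
records = [((u, v), arr[u, v]) for u, v in product(range(3), range(4))]

# Indexing on a key axis: keep only records whose key matches (a filter over records).
u_is_1 = [(key, val) for key, val in records if key[0] == 1]

# Map: apply a function to every value block independently.
doubled = [(key, val * 2) for key, val in records]

# Indexing on a value axis: slice inside each block (a map over records).
first_x = [(key, val[0]) for key, val in records]

print(len(u_is_1), doubled[0][1].shape, first_x[0][1].shape)
```

In each case a record is handled on its own, so none of these operations needs data to move between records.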
apply-along-axis
reduceByKey
keys (u, v), values (x, y)
map
transpose
keys (u, v), values (x, y)
shuffle
(t | x, y, z) → (x, y, z | t)
(t | x, y, z): keys (t), values (x, y, z)
(t | x, y, z): values chunked, keys (t, chunk)
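The swap from keys (t) to keys (x, y, z) can be emulated locally to see why chunking helps. The sketch below is an assumption-laden illustration, not Bolt's implementation: the chunk size, the choice to chunk along x only, and the use of a dictionary in place of a Spark shuffle are all hypothetical.

```python
import numpy as np
from collections import defaultdict

T, X, Y, Z = 4, 6, 2, 3                       # arbitrary sizes
arr = np.arange(T * X * Y * Z).reshape(T, X, Y, Z)

# Start: one record per time point, keys (t), values (x, y, z).
records = [((t,), arr[t]) for t in range(T)]

# 1. Chunking: cut each value block along x, so keys become (t, chunk).
chunk_size = 2                                # ASSUMPTION: hypothetical chunk size
chunked = []
for (t,), block in records:
    for c, start in enumerate(range(0, X, chunk_size)):
        chunked.append(((t, c), block[start:start + chunk_size]))

# 2. Shuffle: group by chunk so every time point of a chunk lands together.
groups = defaultdict(dict)
for (t, c), sub in chunked:
    groups[c][t] = sub

# 3. Transpose within each group: emit keys (x, y, z) with values (t).
swapped = {}
for c, by_t in groups.items():
    stacked = np.stack([by_t[t] for t in range(T)], axis=-1)   # shape (cx, y, z, t)
    for idx in np.ndindex(*stacked.shape[:-1]):
        x = c * chunk_size + idx[0]
        swapped[(x, idx[1], idx[2])] = stacked[idx]

print(len(chunked), "chunked records shuffled;", len(swapped), "output records")
```

The shuffle moves T × (X / chunk_size) blocks rather than one tiny record per array element, which is the trade-off the chunk size controls.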
chunking + transpose = shuffle
optimization ???
thanks
Freeman Lab: Jeremy Freeman, Nicholas Sofroniew, Andrew Osheroff
Janelia Scientific Computing: Ken Carlile, Robert Lines

join us!
GitHub: bolt-project, thunder-project
Twitter: @jsonWittenbach
