Copyright © ArangoDB Inc., 2019
- Confidential
ArangoML Pipeline Cloud
From Data to Managed Metadata
TL;DR
1. Different ways data is important for your machine learning pipeline
2. ArangoML Pipeline Cloud: The managed solution for your ML Metadata
Jörg Schad, PhD
Head of Engineering and ML
@ArangoDB
● Suki.ai
● Mesosphere
● Architect @SAP Hana
● PhD Distributed DB Systems
● Twitter: @joerg_schad
Chris Woodward
Developer Relations Engineer
@ArangoDB
● Training
● Development
● Community
● Twitter: @cw00dw0rd
● Slack: Chris.ArangoDB
Get Data
Write intelligent machine learning code
Train Model
Run Model
Repeat
What Data Scientists should be doing…
Sculley, D., Holt, G., Golovin, D. et al. Hidden Technical Debt in Machine Learning Systems
What Data Scientists are doing…
Machine Learning Pipeline
https://www.tensorflow.org/tfx/guide
● Native Multi Model Database
○ Stores K/V, Documents & Graphs
● Distributed
○ Graphs can span multiple nodes
● AQL - SQL-like multi-model query language
● ACID Transactions, including Multi-Collection Transactions
Databases I
Databases II
Feature Engineering
Why Graph?
Natural Language Processing
https://ieeexplore.ieee.org/abstract/document/4700287
Databases III
Challenges
https://blog.acolyer.org/2019/09/23/the-secret-sharer/
Challenges
● Understand complete provenance of Model
a. Understand Provenance
b. Complete version history
c. Audit
● Find all Models in production derived from dataset x
● Compare the performance of different models
● Identify reusable steps
● Is my serving data distribution the same as my training data distribution?
● ...
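One way to approach the data-distribution question in the list above is to compare the empirical distribution of a feature at training time with the one seen at serving time. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) using only the standard library. The sample values and the 0.5 alert threshold are made up for illustration; the slides do not prescribe a particular test.

```python
from bisect import bisect_right


def ks_statistic(train, serving):
    """Largest absolute gap between the empirical CDFs of two samples."""
    if not train or not serving:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(train), sorted(serving)
    # The gap can only change at observed values, so checking those suffices.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical values of one numeric feature.
train_ages = [21, 25, 30, 35, 40, 45]
serving_ages = [41, 47, 52, 58, 63, 70]

drift = ks_statistic(train_ages, serving_ages)
SHIFT_THRESHOLD = 0.5  # illustrative cut-off, not a recommendation
shift_detected = drift > SHIFT_THRESHOLD
print(f"KS statistic: {drift:.2f}, shift detected: {shift_detected}")
```

Storing such a statistic next to the model and dataset metadata is what makes the comparison repeatable later on.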
From Data to Metadata….
Common Metadata
Metadata?
https://www.kubeflow.org/docs/components/misc/metadata/
ML Project
Dataset
Feature
Transform
Experiment
Train Performance
Test Performance
Model Function
Model Serving Performance
Notebook
ArangoML Pipeline
“A common extensible metadata layer for ML pipelines which
allows Data Scientists and DataOps to manage all information
related to their ML pipelines in one place.”
https://www.arangodb.com/2019/09/arangoml-pipeline-common-metadata-layer-machine-learning-pipelines/
Multi-Model Metadata
Multi-Model Metadata
FOR f IN featuresets
  FILTER f.name == 'my_feature'
  FOR entity IN 1..3 ANY f featureset_dataset
    RETURN entity
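To make the traversal concrete, here is a minimal Python sketch of what `FOR entity IN 1..3 ANY f featureset_dataset` computes: starting from the matching feature set, it follows the edges of one edge collection in either direction and collects the vertices found at depth 1 to 3. The in-memory edge list and the document IDs (`featuresets/my_feature`, `datasets/raw_events`, and so on) are hypothetical stand-ins; a real deployment would run the AQL query against ArangoDB. Unlike AQL's default traversal, which can return a vertex once per path, this sketch reports each vertex only once.

```python
from collections import deque

# Hypothetical contents of the featureset_dataset edge collection.
FEATURESET_DATASET = [
    {"_from": "featuresets/my_feature", "_to": "datasets/raw_events"},
    {"_from": "featuresets/other_feature", "_to": "datasets/raw_events"},
    {"_from": "featuresets/unrelated", "_to": "datasets/archive"},
]


def traverse_any(start, edges, min_depth=1, max_depth=3):
    """Breadth-first traversal that ignores edge direction (like ANY)."""
    neighbours = {}
    for edge in edges:
        neighbours.setdefault(edge["_from"], []).append(edge["_to"])
        neighbours.setdefault(edge["_to"], []).append(edge["_from"])

    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        vertex, depth = queue.popleft()
        if depth >= min_depth:
            found.append(vertex)
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in neighbours.get(vertex, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return found


related = traverse_any("featuresets/my_feature", FEATURESET_DATASET)
print(related)
```

With this sample data the traversal reaches the shared dataset and the second feature set built on it, while the unconnected feature set and its dataset stay out of the result.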
● Data Scientist
○ Find relevant entities for a given model
○ Explore performance differences
○ Search/reuse existing entities
○ ...
● DataOps
○ Audit
○ Trace data lineage (e.g., GDPR)
○ Reproducible model building
○ Detect data shift
○ ...
● Administrator
○ Resource accounting
○ Permission tracking
○ ...
ArangoML “Schema”
ML Project
Dataset
Feature
Transform
Experiment
Train Performance
Test Performance
Model Function
Model Serving Performance
Notebook
https://github.com/arangoml/arangopipe
● Python package
● HTTP API
● TFX Integration [coming shortly]
Discover
https://github.com/arangoml/arangopipe
Graphs (again)
TFX MLMD
https://www.tensorflow.org/tfx/guide/mlmd
Kubeflow Metadata
https://www.kubeflow.org/docs/components/misc/metadata/
How to get started?
docker run -p 6529:8529 -p 8888:8888 -p 3000:3000 -it arangopipe/ap_tensor_flow
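The `-p host:container` flags in the command above decide where the container's services show up on your machine. As a small sketch, the following parses those flags from the command string and lists the resulting local endpoints. Port 8529 is ArangoDB's default port and 8888 is Jupyter's default; which service listens on port 3000 is not stated on the slide, so it is labeled accordingly. The helper name `published_ports` is made up for this example.

```python
import shlex

COMMAND = (
    "docker run -p 6529:8529 -p 8888:8888 -p 3000:3000 "
    "-it arangopipe/ap_tensor_flow"
)

# Well-known default ports; anything else is left unlabeled.
KNOWN_DEFAULTS = {8529: "ArangoDB", 8888: "Jupyter"}


def published_ports(command):
    """Return {host_port: container_port} for each '-p host:container' flag.

    Only the simple 'host:container' form used on the slide is handled.
    """
    tokens = shlex.split(command)
    mapping = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-p":
            host, container = value.split(":")
            mapping[int(host)] = int(container)
    return mapping


ports = published_ports(COMMAND)
for host, container in ports.items():
    service = KNOWN_DEFAULTS.get(container, "service not stated on the slide")
    print(f"http://localhost:{host} -> container port {container} ({service})")
```

Note that the database is published on host port 6529 rather than 8529, so a locally installed ArangoDB on its default port would not clash with the container.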
But what about production?
ArangoML Pipeline Cloud
1. Fully managed Cloud Solution
2. SLAs
a. Temporary cloud instance with no setup
b. Production instance
https://colab.research.google.com/github/arangoml/arangopipe/blob/master/arangopipe_managed_service.ipynb
Demo Time!
Thanks for listening!
https://www.arangodb.com/
• https://github.com/arangoml/arangopipe
• Blogpost
• Getting Started Notebook
Test-drive ArangoDB and ArangoML using Oasis
14 days for free


ArangoML Pipeline Cloud - Managed Machine Learning Metadata