USING DOCKER FOR DATA SCIENCE
RECAP
WHY DOCKER 
Portable environment 
Isolated between projects 
Stateless 
Fast local file access 
Heterogeneous
GET DOCKER 
https://docs.docker.com/installation/
boot2docker .dmg or .exe 
apt-get install docker.io ...
RUN SCIPYSERVER 
$ docker run -d -e "PASSWORD=YourPassword?" ipython/scipyserver 
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    ipython/scipyserver
https://localhost:443
https://{boot2docker ip}:443
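A small sketch for working out which URL to open, assuming the container publishes port 443 as above. The helper name notebook_url is hypothetical; `boot2docker ip` is only needed when Docker runs inside the boot2docker VM.

```shell
# Hypothetical helper: print the notebook URL for a host and published port.
# Defaults to localhost and port 443 when no arguments are given.
notebook_url() {
    host="${1:-localhost}"
    port="${2:-443}"
    printf 'https://%s:%s\n' "$host" "$port"
}

# On Linux the Docker host is the local machine:
#   notebook_url localhost 443
# With boot2docker, ask the VM for its address first:
#   notebook_url "$(boot2docker ip)" 443
```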
CREATE DATA-ONLY CONTAINERS 
$ docker run \
    -d \
    -v ~/notebooks:/notebooks \
    --name notebooks_container \
    ubuntu \
    echo notebooks
$ docker run -d -v ~/data:/data --name data_container ubuntu echo
MOUNT DATA-ONLY CONTAINERS 
$ docker stop dev_notebook 
$ docker rm dev_notebook 
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    ipython/scipyserver
CREATE A DOCKERFILE 
FROM ipython/scipyserver 
MAINTAINER Calvin Giles <calvin.giles@gmail.com> 
COPY requirements.txt /requirements.txt 
RUN pip2 install -r /requirements.txt 
RUN pip3 install -r /requirements.txt 
$ docker build \
    -t calvingiles/ds-notebook \
    .
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    calvingiles/ds-notebook
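The Dockerfile above expects a requirements.txt in the build directory. One possible way to create it is sketched below; the helper name write_requirements and the two package names are placeholders, not the contents of the author's file. Since the Dockerfile installs with both pip2 and pip3, anything listed should support both interpreters.

```shell
# Hypothetical helper: write an illustrative requirements.txt into a directory.
# The package names are placeholders chosen for the example only.
write_requirements() {
    target_dir="${1:-.}"
    cat > "$target_dir/requirements.txt" <<'EOF'
pandas
requests
EOF
}

# Typical use, from the directory that holds the Dockerfile:
#   write_requirements .
#   docker build -t calvingiles/ds-notebook .
```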
THIS TIME 
Creating and connecting to local database containers 
Tweaking the boot2docker VM memory from 2GB to 8GB (or more...)
Automated builds with GitHub linking
Forget everything and use fig
CREATE LOCAL DATABASE CONTAINERS 
$ docker run -d -v /var/lib/postgresql/data --name=pg_data ubuntu 
$ docker run -d --volumes-from pg_data --name=dev_postgres postgres
$ docker run -d --name=dev_mongo mongo 
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --link dev_postgres:dev_postgres \
    --link dev_mongo:dev_mongo \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    calvingiles/ds-notebook
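Inside the notebook container, the legacy --link flag exposes each linked container through environment variables named after the alias and exposed port, and through a hosts entry for the alias. A hedged sketch of turning those into a PostgreSQL URL follows; the function name postgres_url is hypothetical, and the user and database name postgres are assumed defaults of the official image.

```shell
# Build a PostgreSQL URL from the variables that
# `--link dev_postgres:dev_postgres` injects into the notebook container.
# Falls back to the link alias and the standard port when they are unset.
postgres_url() {
    pg_host="${DEV_POSTGRES_PORT_5432_TCP_ADDR:-dev_postgres}"
    pg_port="${DEV_POSTGRES_PORT_5432_TCP_PORT:-5432}"
    # User and database "postgres" are assumptions; adjust as needed.
    printf 'postgresql://postgres@%s:%s/postgres\n' "$pg_host" "$pg_port"
}

# Run inside the notebook container, for example from a notebook cell
# prefixed with "!" or from `docker exec`:
#   postgres_url
```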
TWEAK THE MEMORY IN YOUR VM ABOVE 2GB
Either: 
$ boot2docker delete 
$ boot2docker init -m 5555 
... lots of output ... 
$ boot2docker info 
{ ... "Memory":5555 ...} 
Or (doesn't lose data that lives only in the VM):
$ boot2docker stop
$ VBoxManage modifyvm boot2docker-vm --memory 5555
$ boot2docker start
$ boot2docker info 
{ ... "Memory":5555 ...}
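To check the result from a script rather than by eye, the Memory field can be pulled out of the info output. This is a rough sketch that assumes the output contains a "Memory":<number> pair as shown above; vm_memory_mb is a hypothetical helper name and the value is taken to be in MB.

```shell
# Hypothetical helper: read `boot2docker info` style JSON on stdin and
# print the number that follows the "Memory" key.
vm_memory_mb() {
    sed -n 's/.*"Memory":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Typical use:
#   boot2docker info | vm_memory_mb
```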
AUTOMATED BUILDS WITH GITHUB LINKING 
Commit Dockerfile, requirements.txt etc. to a GitHub repo
Add an "Automated Build" on Docker Hub
Select the repo and accept defaults
Check "Build Details" and wait for your repo build to finish
$ docker run <dockername>/<reponame>
FORGET EVERYTHING AND USE FIG 
http://www.fig.sh/install.html
$ curl -L https://github.com/docker/fig/releases/download/1.0.1/fig-`uname -s`-`uname -m` > ~/bin/fig
$ chmod +x ~/bin/fig
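The commands above assume ~/bin exists and is on PATH. A minimal check, with dir_on_path as a hypothetical helper name:

```shell
# Return success when the given directory is one of the entries in PATH.
dir_on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use before downloading fig:
#   mkdir -p "$HOME/bin"
#   dir_on_path "$HOME/bin" || echo 'add ~/bin to PATH in your shell profile'
```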
FIG.YML -- DATA 
notebooks:
  command: echo created
  image: busybox
  volumes:
    - "~/Google Drive/notebooks:/notebooks/analysis"
data:
  command: echo created
  image: busybox
  volumes:
    - "~/Google Drive/data:/data/analysis"
...
FIG.YML -- POSTGRES 
... 
devpostgresdata:
  command: echo created
  image: busybox
  volumes:
    - /var/lib/postgresql/data
devpostgres:
  environment:
    - POSTGRES_PASSWORD
  image: postgres
  links:
  ports:
    - "5432:5432"
  volumes_from:
    - devpostgresdata
...
FIG.YML -- NOTEBOOK SERVER 
... 
ds_server:
  environment:
    - PASSWORD
  image: calvingiles/data-science-environment
  links:
    - devpostgres:postgres
  ports:
    - "443:8888"
  volumes_from:
    - notebooks
    - data
FIG UP 
In the same directory as fig.yml: 
$ fig rm 
$ PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d
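fig.yml lists PASSWORD and POSTGRES_PASSWORD without values, so fig takes them from the calling shell. A small guard, sketched here with the hypothetical helper require_env, stops the stack from starting with an empty password:

```shell
# Hypothetical helper: fail if any of the named variables is unset or empty.
require_env() {
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            return 1
        fi
    done
}

# Typical use, in the same directory as fig.yml:
#   export PASSWORD=MyPass POSTGRES_PASSWORD=PGPass
#   require_env PASSWORD POSTGRES_PASSWORD && fig up -d
```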
HERE'S ONE I MADE EARLIER 
$ curl -L http://goo.gl/rW47v3 > fig.yml
$ PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d
NEXT TIME 
Linking to private git repositories 
Lessons learnt from using fig 
Resizing boot2docker volume (to fix "no space left on device") 
Fixing "Error response from daemon: client and server don't have same version"
TLS and CA certs to fix "Your connection is not private"
Whatever other pain I have had to deal with before then
Whatever pain you feel -- let me know @calvingiles
MORE? 
Docker: 
http://docs.docker.com/userguide/
http://docs.docker.com/reference/commandline/cli/
Fig:
http://www.fig.sh/
ipython docker images:
https://registry.hub.docker.com/repos/ipython/
my docker image:
https://github.com/calvingiles/data-science-environment
https://registry.hub.docker.com/u/calvingiles/data-science-environment/
fig.yml gist:
http://goo.gl/rW47v3
ABOUT ME 
Calvin Giles 
Data Scientist at Adthena 
PyData Meetup Organiser 
untangleconsulting.io 
calvin.giles@gmail.com 
@calvingiles on twitter, github, docker hub (and many more)

Using docker for data science - part 2
