USING DOCKER FOR DATA SCIENCE, PART 2
RECAP
WHY DOCKER 
Portable environment 
Isolated between projects 
Stateless 
Fast local file access 
Heterogeneous
GET DOCKER 
https://docs.docker.com/installation/
OS X / Windows: boot2docker .dmg or .exe installer
Ubuntu: apt-get install docker.io ...
RUN SCIPYSERVER 
$ docker run -d -e "PASSWORD=YourPassword?" ipython/scipyserver 
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    ipython/scipyserver
https://localhost:443 (Docker running natively on Linux)
https://{boot2docker ip}:443 (boot2docker)
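Which of those two URLs applies depends on where the Docker daemon runs. A minimal sketch of a helper that picks the host for you, assuming that `boot2docker ip` prints the VM's address on stdout and that a machine without boot2docker runs Docker natively; `notebook_url` is a hypothetical name, meant to be sourced into a POSIX shell:

```shell
# Hypothetical helper: print the URL of the notebook server started with
# "-p 443:8888". Assumes `boot2docker ip` prints the VM address on stdout;
# if boot2docker is not installed, Docker is assumed to run on localhost.
notebook_url() {
    port="${1:-443}"   # host side of the -p HOST:CONTAINER mapping
    host=""
    if command -v boot2docker >/dev/null 2>&1; then
        host="$(boot2docker ip 2>/dev/null)"
    fi
    echo "https://${host:-localhost}:${port}"
}
```

On a Linux host without boot2docker, `notebook_url` prints `https://localhost:443`.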
CREATE DATA-ONLY CONTAINERS 
$ docker run \
    -d \
    -v ~/notebooks:/notebooks \
    --name notebooks_container \
    ubuntu \
    echo notebooks
$ docker run -d -v ~/data:/data --name data_container ubuntu echo
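Both commands follow the same pattern, so they can be wrapped in one function. A minimal sketch, assuming you are happy to use the small `busybox` image (as the fig.yml later in this deck does) instead of `ubuntu`; `make_data_container` is a hypothetical name:

```shell
# Hypothetical helper: create a data-only container that bind-mounts a host
# directory. Its command exits immediately; that is expected, because other
# containers only need it to exist in order to use --volumes-from.
make_data_container() {
    name="$1"            # e.g. notebooks_container
    host_dir="$2"        # e.g. ~/notebooks
    container_dir="$3"   # e.g. /notebooks
    mkdir -p "$host_dir"   # make sure the host side of the mount exists
    docker run -d -v "$host_dir:$container_dir" --name "$name" busybox true
}
```

For example, `make_data_container notebooks_container ~/notebooks /notebooks` followed by `make_data_container data_container ~/data /data` recreates both containers. They show up as exited in `docker ps -a`, which is fine for data-only containers.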
MOUNT DATA-ONLY CONTAINERS 
$ docker stop dev_notebook 
$ docker rm dev_notebook 
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    ipython/scipyserver
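The stop/rm/run cycle is repeated every time the image or the mounts change, so it is worth scripting. A sketch, assuming the container names used on these slides and reading the password from a `PASSWORD` environment variable instead of typing it into the command; `recreate_notebook` is a hypothetical name:

```shell
# Hypothetical helper: replace dev_notebook with a fresh container that
# mounts both data-only containers. The image defaults to ipython/scipyserver.
recreate_notebook() {
    image="${1:-ipython/scipyserver}"
    if [ -z "${PASSWORD:-}" ]; then
        echo "set PASSWORD before calling recreate_notebook" >&2
        return 1
    fi
    # Ignore errors here: the container may not exist yet.
    docker stop dev_notebook >/dev/null 2>&1
    docker rm dev_notebook >/dev/null 2>&1
    docker run -d \
        -e "PASSWORD=$PASSWORD" \
        --name dev_notebook \
        -p 443:8888 \
        --volumes-from data_container \
        --volumes-from notebooks_container \
        "$image"
}
```

Calling `recreate_notebook calvingiles/ds-notebook` runs the same `docker run` as the CREATE A DOCKERFILE slide.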
CREATE A DOCKERFILE 
FROM ipython/scipyserver 
MAINTAINER Calvin Giles <calvin.giles@gmail.com> 
COPY requirements.txt /requirements.txt 
RUN pip2 install -r /requirements.txt 
RUN pip3 install -r /requirements.txt 
$ docker build \
    -t calvingiles/ds-notebook \
    .
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    calvingiles/ds-notebook
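To start a build context from scratch, the Dockerfile and a requirements.txt can be written in one step. This is only a sketch: the two packages are hypothetical placeholders, the MAINTAINER line is left out, and an existing requirements.txt is kept rather than overwritten; `scaffold_build_context` is a hypothetical name:

```shell
# Hypothetical helper: write the Dockerfile from this slide (minus MAINTAINER)
# and a starter requirements.txt into a build directory.
scaffold_build_context() {
    dir="${1:-.}"
    mkdir -p "$dir"
    # Placeholder packages; replace them with what your notebooks import.
    if [ ! -f "$dir/requirements.txt" ]; then
        printf '%s\n' sqlalchemy pymongo > "$dir/requirements.txt"
    fi
    cat > "$dir/Dockerfile" <<'EOF'
FROM ipython/scipyserver
COPY requirements.txt /requirements.txt
RUN pip2 install -r /requirements.txt
RUN pip3 install -r /requirements.txt
EOF
}
```

Usage: `scaffold_build_context ds-notebook` and then `docker build -t calvingiles/ds-notebook ds-notebook`.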
THIS TIME 
Creating and connecting to local database containers 
Tweaking the boot2docker VM memory from 2 GB to 8 GB (or more)
Automated builds with GitHub linking
Forget everything and use fig
CREATE LOCAL DATABASE CONTAINERS 
$ docker run -d -v /var/lib/postgresql/data --name=pg_data ubuntu
$ docker run -d --volumes-from pg_data --name=dev_postgres postgres
$ docker run -d --name=dev_mongo mongo 
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --link dev_postgres:dev_postgres \
    --link dev_mongo:dev_mongo \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    calvingiles/ds-notebook
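Inside dev_notebook, Docker's (legacy) links make the database reachable under its alias: they add an /etc/hosts entry for `dev_postgres` and set variables such as `DEV_POSTGRES_PORT_5432_TCP_ADDR` and `DEV_POSTGRES_PORT_5432_TCP_PORT`. A minimal sketch that turns those into a connection URL, assuming the image's default `postgres` user and database and leaving any password out; `pg_url_from_link` is a hypothetical name:

```shell
# Hypothetical helper: run inside the notebook container to build a
# PostgreSQL URL from the variables set by --link dev_postgres:dev_postgres.
# Falls back to the alias hostname and the default port if they are absent.
pg_url_from_link() {
    host="${DEV_POSTGRES_PORT_5432_TCP_ADDR:-dev_postgres}"
    port="${DEV_POSTGRES_PORT_5432_TCP_PORT:-5432}"
    user="${PGUSER:-postgres}"      # standard libpq variable, if set
    db="${PGDATABASE:-postgres}"    # standard libpq variable, if set
    echo "postgresql://${user}@${host}:${port}/${db}"
}
```

A URL of this form is accepted by `psql` and by SQLAlchemy's `create_engine`. The Mongo link follows the same naming scheme (`DEV_MONGO_PORT_27017_TCP_ADDR`, and so on).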
TWEAK THE MEMORY OF YOUR VM ABOVE 2GB
Either: 
$ boot2docker delete 
$ boot2docker init -m 5555 
... lots of output ... 
$ boot2docker info 
{ ... "Memory":5555 ...} 
Or (does not lose data persisted inside the VM rather than on the host; the VM must be stopped before VBoxManage can change it):
$ boot2docker stop
$ VBoxManage modifyvm boot2docker-vm --memory 5555
$ boot2docker start
$ boot2docker info
{ ... "Memory":5555 ...}
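Both routes end by checking the `"Memory"` field of `boot2docker info`. That check can be scripted; a sketch, assuming the field appears as `"Memory":<megabytes>` on a single line of the output, as in the excerpt above; `vm_memory_mb` and `check_vm_memory` are hypothetical names:

```shell
# Hypothetical helpers: read the VM memory (MB) from `boot2docker info`
# and compare it with a minimum.
vm_memory_mb() {
    boot2docker info | sed -n 's/.*"Memory": *\([0-9][0-9]*\).*/\1/p'
}

check_vm_memory() {
    want="${1:-5555}"    # minimum in MB, matching the examples above
    have="$(vm_memory_mb)"
    if [ -z "$have" ]; then
        echo "could not read Memory from boot2docker info" >&2
        return 2
    fi
    if [ "$have" -lt "$want" ]; then
        echo "VM has ${have}MB, want at least ${want}MB" >&2
        return 1
    fi
    echo "${have}MB"
}
```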
AUTOMATED BUILDS WITH GITHUB LINKING 
Commit Dockerfile, requirements.txt, etc. to a GitHub repo
Add an "Automated Build" on Docker Hub
Select the repo and accept the defaults
Watch the "Build Details" page for your repo until the build finishes
$ docker run <dockername>/<reponame>
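An automated build only sees what has been committed, so a file that the Dockerfile copies but that was never pushed makes the build fail on Docker Hub. A small pre-push check, sketched under the assumption that each COPY/ADD line names a single local source as its first argument (remote URLs are skipped); `check_build_context` is a hypothetical name:

```shell
# Hypothetical pre-push check: confirm the Dockerfile exists and that the
# first source of every COPY/ADD instruction is present in the build directory.
check_build_context() {
    dir="${1:-.}"
    if [ ! -f "$dir/Dockerfile" ]; then
        echo "missing $dir/Dockerfile" >&2
        return 1
    fi
    rc=0
    for src in $(awk 'toupper($1) == "COPY" || toupper($1) == "ADD" { print $2 }' "$dir/Dockerfile"); do
        case "$src" in
            *://*) continue ;;   # remote ADD sources are fetched at build time
        esac
        if [ ! -e "$dir/$src" ]; then
            echo "Dockerfile copies missing file: $src" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Run it before pushing, for example `check_build_context . && git push`. It only checks the working tree, not whether the files are actually committed.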
FORGET EVERYTHING AND USE FIG 
http://www.fig.sh/install.html
$ curl -L https://github.com/docker/fig/releases/download/1.0.1/fig-`uname -s`-`uname -m` > ~/bin/fig
$ chmod +x ~/bin/fig
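The same download can be wrapped so that ~/bin is created first and the shell warns if it is not on PATH. A sketch: `curl -f` is used so that an HTTP error is not saved as the binary, the version defaults to the 1.0.1 used in the command above, and `fig_download_url`/`install_fig` are hypothetical names. Fig has since been superseded by Docker Compose, so this old release URL may no longer resolve:

```shell
# Hypothetical helpers around the fig install commands.
FIG_VERSION="${FIG_VERSION:-1.0.1}"

fig_download_url() {
    echo "https://github.com/docker/fig/releases/download/${FIG_VERSION}/fig-$(uname -s)-$(uname -m)"
}

install_fig() {
    mkdir -p "$HOME/bin"
    # -f: fail on HTTP errors instead of saving an error page as the binary
    curl -fL -o "$HOME/bin/fig" "$(fig_download_url)" || return 1
    chmod +x "$HOME/bin/fig"
    case ":$PATH:" in
        *":$HOME/bin:"*) ;;
        *) echo "add \$HOME/bin to PATH to run fig" >&2 ;;
    esac
}
```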
FIG.YML -- DATA 
notebooks:
  command: echo created
  image: busybox
  volumes:
    - "~/Google Drive/notebooks:/notebooks/analysis"
data:
  command: echo created
  image: busybox
  volumes:
    - "~/Google Drive/data:/data/analysis"
...
FIG.YML -- POSTGRES 
...
devpostgresdata:
  command: echo created
  image: busybox
  volumes:
    - /var/lib/postgresql/data
devpostgres:
  environment:
    - POSTGRES_PASSWORD
  image: postgres
  ports:
    - "5432:5432"
  volumes_from:
    - devpostgresdata
...
FIG.YML -- NOTEBOOK SERVER 
...
ds_server:
  environment:
    - PASSWORD
  image: calvingiles/data-science-environment
  links:
    - devpostgres:postgres
  ports:
    - "443:8888"
  volumes_from:
    - notebooks
    - data
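Put together, the three fragments make one fig.yml. A sketch that writes the file, assuming the `...` markers on the slides only separate the fragments (nothing is filled in for them) and dropping the empty `links:` key under devpostgres; `write_fig_yml` is a hypothetical name:

```shell
# Hypothetical helper: write the fig.yml fragments from these slides as one file.
write_fig_yml() {
    cat > "${1:-fig.yml}" <<'EOF'
notebooks:
  command: echo created
  image: busybox
  volumes:
    - "~/Google Drive/notebooks:/notebooks/analysis"
data:
  command: echo created
  image: busybox
  volumes:
    - "~/Google Drive/data:/data/analysis"
devpostgresdata:
  command: echo created
  image: busybox
  volumes:
    - /var/lib/postgresql/data
devpostgres:
  environment:
    - POSTGRES_PASSWORD
  image: postgres
  ports:
    - "5432:5432"
  volumes_from:
    - devpostgresdata
ds_server:
  environment:
    - PASSWORD
  image: calvingiles/data-science-environment
  links:
    - devpostgres:postgres
  ports:
    - "443:8888"
  volumes_from:
    - notebooks
    - data
EOF
}
```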
FIG UP 
In the same directory as fig.yml: 
$ fig rm 
$ PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d
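Fig takes the values for `PASSWORD` and `POSTGRES_PASSWORD` from its own environment, so a missing variable is easy to overlook. A sketch of a guard around the two commands above; it assumes your fig version's `rm` accepts `--force` to skip the confirmation prompt (check `fig rm --help`), and `fig_up_fresh` is a hypothetical name:

```shell
# Hypothetical wrapper: refuse to start unless both secrets are set and a
# fig.yml is present, then recreate the containers in the background.
fig_up_fresh() {
    if [ -z "${PASSWORD:-}" ] || [ -z "${POSTGRES_PASSWORD:-}" ]; then
        echo "set PASSWORD and POSTGRES_PASSWORD first" >&2
        return 1
    fi
    if [ ! -f fig.yml ]; then
        echo "no fig.yml in $(pwd)" >&2
        return 1
    fi
    export PASSWORD POSTGRES_PASSWORD   # fig reads them from its environment
    fig rm --force && fig up -d
}
```

Usage, from the directory containing fig.yml: `export PASSWORD=MyPass POSTGRES_PASSWORD=PGPass` and then `fig_up_fresh`.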
HERE'S ONE I MADE EARLIER 
$ curl -L http://goo.gl/rW47v3 > fig.yml
$ PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d
NEXT TIME 
Linking to private git repositories 
Lessons learnt from using fig 
Resizing boot2docker volume (to fix "no space left on device") 
Fixing "Error response from daemon: client and server don't have same version"
TLS and CA certs to fix "Your connection is not private"
Whatever other pain I have had to deal with before then
Whatever pain you feel -- let me know @calvingiles
MORE? 
Docker: 
http://docs.docker.com/userguide/
http://docs.docker.com/reference/commandline/cli/
Fig:
http://www.fig.sh/
ipython docker images:
https://registry.hub.docker.com/repos/ipython/
my docker image:
https://github.com/calvingiles/data-science-environment
https://registry.hub.docker.com/u/calvingiles/data-science-environment/
fig.yml gist:
http://goo.gl/rW47v3
ABOUT ME 
Calvin Giles 
Data Scientist at Adthena 
PyData Meetup Organiser 
untangleconsulting.io 
calvin.giles@gmail.com 
@calvingiles on twitter, github, docker hub (and many more)
Using docker for data science - part 2