Deploying Data Science with
Docker and AWS
Audience: Cambridge AWS Meetup Group
Presenter: Matt McDonnell, Data Scientist at Metail
Date: 9th June 2016
Context
Lots of event stream data
Many AWS components
Outputs:
- Business Intelligence
- Bespoke Analysis
- Productionised Science
What?
Goal: Moving laptop analyses onto a server
Turn:
<types>run_analysis.sh<presses enter>
… analysis script retrieves data from DB, Looker, web, etc. …
… runs analysis …
… outputs results as csv, png, etc. to local hard disk …
<gets back command prompt>
Into:
Automated process running on a server
Why?
• Production scheduled task e.g. Firm Wide Metrics daily processing
• Make use of more powerful Amazon Web Services (AWS) cloud resources
for large scale analysis
• Ease of deployment for Data Science analysts
• Build consistent development environment
How?
• Containerize applications and runtime using Docker to produce images
• Store images on AWS Elastic Container Registry (ECR)
• Run images either locally or on Amazon Elastic Container Service (ECS)
• Use AWS Lambda functions to trigger scheduled tasks (or react to events)
What is Docker?
“Docker containers wrap up a piece of software in a complete
filesystem that contains everything it needs to run: code, runtime,
system tools, system libraries – anything you can install on a server. This
guarantees that it will always run the same, regardless of the
environment it is running in.” -- https://www.docker.com/what-docker
Public code: store Dockerfile on GitHub, use Travis to automatically
build image on DockerHub
Private code: private Dockerfile, build locally, push image to AWS Elastic
Container Registry
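The private-code path (build locally, push to ECR) can be sketched as below. This is a minimal illustration rather than the presenter's own tooling: the account ID, region and repository name are placeholder values, the helper names are hypothetical, and authenticating Docker against the registry is deliberately left out.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Return the image URI for a repository hosted on ECR."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def build_and_push_commands(local_name, image_uri, context_dir="."):
    """Return docker CLI commands (as argument lists) to build, tag and push."""
    return [
        ["docker", "build", "-t", local_name, context_dir],
        ["docker", "tag", local_name, image_uri],
        ["docker", "push", image_uri],
    ]


# Placeholder account ID and region, for illustration only.
uri = ecr_image_uri("123456789012", "eu-west-1", "pyanalysis")
for command in build_and_push_commands("pyanalysis", uri):
    print(" ".join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` once Docker has been logged in to the registry.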
Example application: retrieve market data
PyAnalysis
Application code built on PCR image
https://github.com/mattmcd/PyAnalysis
PCR: Python Component Runtime
Base Docker image
https://github.com/mattmcd/PCR
Where? Amazon Web Services Cloud
• Elastic Container Service (ECS)
  • Defines the task that runs the container
  • Runs tasks on a cluster of EC2 nodes
• EC2 instance set up to act as node
  • Needs to be an AWS ECS optimized AMI
    https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
  • Needs an IAM Role that has:
    • AmazonEC2ContainerServiceforEC2Role policy attached
    • Policies to allow access to any AWS resources needed e.g. S3
• Lambda function to trigger ECS task
  • cron equivalent by using CloudWatch scheduled events
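The ECS task that runs the container can be described as a small task definition. The sketch below builds the arguments for the ECS `RegisterTaskDefinition` call and, optionally, submits them with boto3. The family name, memory size and helper names are assumptions made for illustration; only the `mattmcd/pyanalysis` image name comes from the talk.

```python
def task_definition(family, image, memory_mib=512, environment=None):
    """Build keyword arguments for the ECS RegisterTaskDefinition call."""
    env = [
        {"name": key, "value": value}
        for key, value in sorted((environment or {}).items())
    ]
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "memory": memory_mib,  # hard memory limit in MiB (assumed value)
                "essential": True,
                "environment": env,
            }
        ],
    }


def register(definition, ecs_client=None):
    """Register the task definition; boto3 is imported only when needed."""
    if ecs_client is None:
        import boto3  # assumes boto3 is installed and AWS credentials are configured
        ecs_client = boto3.client("ecs")
    return ecs_client.register_task_definition(**definition)


definition = task_definition("pyanalysis", "mattmcd/pyanalysis")
```

Keeping the dictionary construction separate from the API call means the definition can be inspected or unit-tested without AWS access.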
EC2 Instance Security Group
The EC2 instance used by ECS can be locked down: there is no need to SSH into it, so no inbound ports are needed
EC2 Instance AMI
Use the latest available Amazon ECS Optimized AMI, which has Docker and the ECS Container Agent already installed
EC2 Instance Details
Enable Auto-assign Public IP so the instance and ECS can communicate, and assign a custom IAM Role as a hook for access permissions
EC2 Instance IAM Role
Attach the AmazonEC2ContainerServiceforEC2Role policy, plus any extra access policies needed by containers on the instance
ECS Task
ECS task retrieves image and runs it
Lambda function
Use the lambda-canary blueprint as a basis for cron job equivalents
Lambda function
cron job equivalent via CloudWatch scheduled event
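A CloudWatch scheduled event is driven by a schedule expression, either `cron(...)` with six fields (minutes, hours, day of month, month, day of week, year, evaluated in UTC) or `rate(...)`. The helpers below are a hypothetical sketch for building such expressions and creating the rule with boto3. Attaching the Lambda function as the rule's target is not shown.

```python
def daily_cron(hour_utc, minute=0):
    """Return a cron expression that fires once a day at the given UTC time."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Day-of-week is '?' because day-of-month is already given as '*'.
    return f"cron({minute} {hour_utc} * * ? *)"


def rate(value, unit):
    """Return a rate expression such as 'rate(1 day)' or 'rate(5 minutes)'."""
    if unit not in ("minute", "hour", "day") or value < 1:
        raise ValueError("unit must be minute, hour or day; value must be >= 1")
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


def put_schedule(rule_name, expression, events_client=None):
    """Create or update the scheduled rule; boto3 is imported only when needed."""
    if events_client is None:
        import boto3  # assumes boto3 is installed and AWS credentials are configured
        events_client = boto3.client("events")
    return events_client.put_rule(Name=rule_name, ScheduleExpression=expression)
```

For a daily job at 06:00 UTC, `daily_cron(6)` gives the expression to pass to `put_schedule`; the rule name is whatever the deployment chooses.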
Lambda function
Simple Lambda function to run task on ECS
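A Lambda function of this kind might look like the sketch below. It is not the code shown on the slide: the environment variable names `ECS_CLUSTER` and `ECS_TASK` and their defaults are assumptions, and the ECS client is injected so the logic can be exercised without AWS access.

```python
import os


def make_handler(ecs_client, cluster, task_definition):
    """Return a Lambda-style handler that starts one ECS task per invocation."""
    def handler(event, context):
        response = ecs_client.run_task(
            cluster=cluster,
            taskDefinition=task_definition,
            count=1,
        )
        return {
            "started": [task["taskArn"] for task in response.get("tasks", [])],
            "failures": response.get("failures", []),
        }
    return handler


def lambda_handler(event, context):
    """Entry point for Lambda; configuration is read from assumed env variables."""
    import boto3  # provided by the Lambda Python runtime
    handler = make_handler(
        boto3.client("ecs"),
        os.environ.get("ECS_CLUSTER", "default"),
        os.environ.get("ECS_TASK", "pyanalysis"),
    )
    return handler(event, context)
```

Returning the `failures` list alongside the started task ARNs makes it visible in the function's output when the cluster had no capacity to place the task.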
Lambda function IAM role
AWS will create a default IAM Role for the Lambda function; the ecs:RunTask permission must be added to it so the function can run the container
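The extra permission can be expressed as a small IAM policy document. The sketch below generates one. The function name is hypothetical, and the `"*"` default resource is an assumption chosen for brevity that would normally be narrowed to the ARN of the specific task definition.

```python
import json


def run_task_policy(task_definition_arn="*"):
    """Return an IAM policy document that allows ecs:RunTask."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ecs:RunTask"],
                "Resource": [task_definition_arn],
            }
        ],
    }


# Print the JSON to paste into an inline policy on the Lambda function's role.
print(json.dumps(run_task_policy(), indent=2))
```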
Demo / Q&A
Blog posts
• ‘Scheduled Downloads using AWS EC2 and Docker — Medium’ http://bit.ly/1TO9a1h (me)
• ‘Better Together: Amazon ECS and AWS Lambda’ http://amzn.to/1UkitEF (not me)
Code samples
• https://github.com/mattmcd/PyAnalysis
• https://github.com/mattmcd/PCR
Docker images
• mattmcd/pyanalysis
• mattmcd/pcr
Me
• Twitter @mattmcd
• Email matt@metail.com or matt@matt-mcdonnell.com

