DevOps and HPC:
Saudi Aramco HPC use case
Walid A. Shaari 20th April 2016
Ahmed Bu-khamsin
2
References in this document to any specific commercial products, process, or
service by trade name, trademark, manufacturer, or otherwise, do not
necessarily constitute or imply its endorsement, recommendation, or favoring
by Saudi Aramco or Saudi Aramco HPC group. The ideas and findings of authors
expressed in any slides or other material should not be construed as an official
Saudi Aramco or HPC team position and shall not be used for advertising or
product endorsement purposes. Information contained in this document is
published in the interest of scientific and technical information exchange.
DISCLAIMER OF ENDORSEMENT
27/10/2014
3
DevOps
Cultural movement or practice that
emphasizes the collaboration and
communication of both Application
Developers and Operations
professionals.
Development
Business
Operations
adaptive
automated
agile
4
Business Drivers
o Optimization
Effective data center(s) resources utilization:
• Utilization of systems, storage, network, or services.
• Better use of employees' time and skills.
o Growth ( N x R x P )
Increasing infrastructure scale
• N: number of managed nodes/clusters/environments
• R: number of applications (business roles)
• P: number of technical services (technology profiles)
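A minimal sketch of the N x R x P growth factor, to show how the management burden scales. The counts below are made-up illustrative values, not Saudi Aramco figures, and the function name is hypothetical.

```python
def management_combinations(nodes: int, roles: int, profiles: int) -> int:
    """Upper bound on distinct (node, role, profile) combinations to manage."""
    return nodes * roles * profiles


# Hypothetical counts, chosen only for illustration.
small = management_combinations(nodes=10, roles=3, profiles=5)
large = management_combinations(nodes=20, roles=3, profiles=5)

# Doubling any single factor doubles the number of combinations.
print(small, large)
```

Because the factors multiply, growth in any one of them (more clusters, more applications, more technical services) scales the whole product.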
5
Popular DevOps Tools
Docker
Mesos
Git
Puppet
6
Data Center blueprints
7
Script
Packages
Files
Services
Mounts
Security
Cluster Deployment
8
[Diagram: nine clusters, each with its own Script, Packages, Files, Services, Mounts, and Security configuration]
• Different Hardware
• Different Sizes
• Different Users
• Different Operating Systems
9
[Diagram: nine clusters, each with its own Script, Packages, Files, Services, Mounts, and Security configuration]
Common Tasks:
Apply security patches
Add new storage
Upgrade the OS
Install new packages
Common Issues:
Scalability issue
Lack of history
No team collaboration
No drift control
Long development and
test cycle
10
• Do it the DevOps way
- Infrastructure as code
• Definition of Infrastructure as code:
"Enable the reconstruction of the business from nothing but a source code
repository, an application data backup, and bare metal resources"
Solution
11
• Domain Specific Language:
- To describe the infrastructure desired state
• Data Store:
- To store the configuration specifications and other data
• Control System:
- To deploy the code and apply the required configuration changes
• Version Control System:
- To keep history
- To enforce workflow and peer review
- To enable team collaboration
Configuration Management Tools
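The desired-state idea behind these components can be sketched as follows. This is a toy model in Python, not Puppet's DSL or API: the `Node` class, the `converge` function, and the package and service names are all hypothetical. It only illustrates that a control system compares the described state with the actual state and applies the difference.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a managed host (hypothetical, for illustration)."""
    packages: set = field(default_factory=set)
    services: set = field(default_factory=set)


def converge(node: Node, desired_packages: set, desired_services: set) -> list:
    """Bring the node to the desired state and return the changes applied."""
    changes = []
    for pkg in sorted(desired_packages - node.packages):
        node.packages.add(pkg)
        changes.append(f"install {pkg}")
    for svc in sorted(desired_services - node.services):
        node.services.add(svc)
        changes.append(f"start {svc}")
    return changes


node = Node(packages={"openssh"})
first = converge(node, {"openssh", "ntp"}, {"ntpd"})
second = converge(node, {"openssh", "ntp"}, {"ntpd"})
print(first)
print(second)
```

The second run reports no changes because the node already matches the description. That idempotence is what allows the same code to be run repeatedly.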
12
Puppet
• Open-source IT automation framework
• Framework to simplify and automate system configuration and provisioning
• Replaces SSH for-loops and ad hoc scripts
• Hundreds of configuration modules available for download
• Supports many Linux distributions, Windows, storage and network devices
13
• Hardware Delivery
• Power Up and Network Connectivity
• OS Installation
• Aramco Customization
• Benchmarking
• Application Testing
• Production
HP CMU . IBM xCAT . Dell Bright
Where Puppet Fits
Cluster Deployment Project Plan
14
Benefits
• Speeds up cluster deployment from days to hours
- Shorter development cycle
- Same code is used for deployment and compliance
- Code Reuse
15
Benefits
Contribution During Puppet Deployment Project
Contribution During First Deployment Project
Contribution During Second Deployment Project
November 13 2014 - April 22 2015
Commit statistics for production
697 commits during 160 days
Average 4.4 commits per day
Contributed by 9 authors
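The commit figures above can be checked with a few lines of arithmetic, using only the dates and the commit count stated on the slide:

```python
from datetime import date

start = date(2014, 11, 13)
end = date(2015, 4, 22)
commits = 697

# Elapsed days between the two dates given on the slide.
days = (end - start).days
average = round(commits / days, 1)
print(days, average)
```

The date range spans 160 days, and 697 / 160 rounds to the 4.4 commits per day reported.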
16
Benefits
• Automatic and continuous deployment
- Classify the cluster to the right type and Puppet does the rest
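A rough sketch of what "classify the cluster" could look like: a cluster type selects a business role, and the role expands into technology profiles. The slides do not show the actual classification data, so every cluster type, role, and profile name below is hypothetical.

```python
# Hypothetical mapping from cluster type to business role (illustrative only).
ROLE_BY_CLUSTER_TYPE = {
    "seismic": "compute",
    "visualization": "login",
}

# Hypothetical mapping from role to the technology profiles it needs.
PROFILES_BY_ROLE = {
    "compute": ["base", "parallel_fs", "scheduler_client"],
    "login": ["base", "ssh_gateway"],
}


def classify(cluster_type: str) -> list:
    """Return the profiles a cluster of the given type should receive."""
    role = ROLE_BY_CLUSTER_TYPE[cluster_type]
    return PROFILES_BY_ROLE[role]


print(classify("seismic"))
```

Once a cluster is assigned a type, the configuration it receives follows from the lookup, so adding a cluster becomes a one-line classification change.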
17
Benefits
• Advanced reporting capabilities
• Self healing and drift control
• Baseline configuration compliance
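Drift control and self healing can be illustrated with a small comparison against a baseline. This is a simplified sketch, not how Puppet implements it: the function names and the settings used are assumptions made for the example.

```python
def detect_drift(baseline: dict, actual: dict) -> dict:
    """Return {setting: (expected, found)} for every setting that differs."""
    drift = {}
    for key, expected in baseline.items():
        found = actual.get(key)
        if found != expected:
            drift[key] = (expected, found)
    return drift


def heal(baseline: dict, actual: dict) -> dict:
    """Reset drifted settings to the baseline and return what was corrected."""
    drift = detect_drift(baseline, actual)
    for key, (expected, _found) in drift.items():
        actual[key] = expected
    return drift


# Hypothetical settings, for illustration only.
baseline = {"PermitRootLogin": "no", "ntp_server": "ntp.example.com"}
actual = {"PermitRootLogin": "yes", "ntp_server": "ntp.example.com"}

report = heal(baseline, actual)
print(report)
print(detect_drift(baseline, actual))
```

The returned report doubles as compliance evidence: it records which settings had drifted and what they were corrected to.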
18
Benefits
• Version control and development workflow
• Team Collaboration
Production
Bug-fix
New feature
Merge Request
Merge Request
19
Git Branches and Commits
20
How Pervasive is Configuration Management?
ASM
21
Traditional HPC Cluster Management tools
https://www.flickr.com/photos/vrogy/514733529
22
Provisioning
Workload
Scheduler
& Metrics
System
(user land, kernel modules, devices)
Bare metal
Bootstrapping
Configuration
Orchestration
consistency
Provisioning activity
Puppet,
Ansible,
Chef
Grid Engine
SLURM
TORQUE/MOAB
Mesos/Swarm/Nomad
Puppet,
Chef
Ansible
Foreman
Razor
Digital Rebar
Ironic
Virtual image
Container
HPC OPS
Web/Cloud OPS
HPC workload runs on
the cloud
25%
24
Which workloads and frameworks are running on
OpenStack?
Source : https://www.openstack.org/assets/survey/Public-User-Survey-Report.pdf
25
HPC in non-bare-metal environments: Experimental? Is it mature?
Vendor trends
26
Next Generation Provisioning
Puppet
Razor Ironic
• No vendor lock-in: open source availability
• Environment agnostic
• Bare metal, virtual image, and containers
• Use open standards
• IPMI 2.0, iPXE, DHCP, REST, HTTPS
• Handles end to end application provisioning
• Better integration with other tools
• configuration management, CMDB, Monitoring
• Programmable
• On-demand provisioning
• Policy/Model based
27
Data Center current state
Scheduler Scheduler Scheduler
Jobs
Jobs
Jobs
Jobs
Jobs
Jobs
Jobs
Jobs
Jobs
Cluster Management A
Cluster Management B
Cluster Management C
0%
50%
100%
28
Data Center
Breaking the Silos
Scheduler Scheduler Scheduler
MetaScheduler
Jobs
Jobs
Jobs
Jobs
Jobs
Jobs
Jobs
Jobs
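One way to picture a meta-scheduler sitting above the per-cluster schedulers is a dispatcher that sends each job to whichever cluster has the most free capacity. The slides do not say how the MetaScheduler decides, so the greedy policy, the function name, and the slot counts below are all assumptions for illustration.

```python
def dispatch(jobs: list, free_slots: dict) -> dict:
    """Greedily place each job on the cluster with the most free slots."""
    placement = {}
    remaining = dict(free_slots)  # copy so the caller's view is untouched
    for job in jobs:
        cluster = max(remaining, key=remaining.get)
        if remaining[cluster] == 0:
            placement[job] = None  # no capacity anywhere; the job waits
            continue
        remaining[cluster] -= 1
        placement[job] = cluster
    return placement


# Hypothetical free capacity per cluster scheduler.
free = {"A": 2, "B": 1, "C": 0}
result = dispatch(["j1", "j2", "j3", "j4"], free)
print(result)
```

Without the shared view, jobs submitted to cluster C would queue while A and B sit idle, which is the silo problem the slide describes.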
29
Data Center
Efficient Secure Allocation of Resources
VC3
BigData
VC1
Infra
VC2
HPC
Scheduler Scheduler Scheduler
DataCenterScheduler
Jobs
Jobs
Jobs
Jobs
Jobs
Jobs
Jobs
Jobs
2nd Generation Cluster Management
30
Containers
A container encapsulates an application completely, with all of its
software dependencies, into a standardized unit of software that is portable
across different platforms*
https://www.docker.com/what-docker
31
Containers Potential Benefits to HPC
o High performing
o Lightweight
o Portable, could solve software packaging, configuration, and delivery
o Host Kernel and system drivers visibility
o Composable
o Targets better scalable monitoring, logging, and security
o Private in-house repositories
o Workforce Separation of concerns (e.g. Operations, Development, Security, Users)
o Builds on mature agile application lifecycle management
o Empowers application support, and developers
o Holistic, yet modular ecosystem
o Schedulers and cluster managers
(Traditional, e.g. LSF, UGE, Moab, and Slurm)
(Modern: Mesos, Kubernetes, Nextflow)
32
Docker Performance
http://www.theregister.co.uk/2014/08/18/docker_kicks_kvms_butt_in_ibm_tests
33
NVIDIA Example use case
https://github.com/NVIDIA/nvidia-docker
34
Host possible workload
Tiny Core Linux (VM)
Docker Engine
Bin/libs
Enterprise Linux Distribution
Service
RHEL7
HPCtask
HPCtask
HPCtask
HPCtask
Alpine
MicroService
MicroService
MicroService
MicroService
Ubuntu
Bigdata
Alpine
Redis
Kibana
Logstash
Elasticsearch
35
HPC Host Reality
RHEL7
HPCTask
HPCTask
HPCTask
HPCTask
Bin/Libs
HPC service
Docker Engine
Docker capable OS
Bin/Libs
HPC service
Bin/Libs
HPC service
Docker Engine
Docker capable OS
Docker Engine
Docker capable OS
Bin/Libs
HPC Job 3
Docker Engine
Docker capable OS
Docker Engine
Docker capable OS
Bin/Libs
HPC Job 3
Bin/Libs
HPC Job 3
Container Cluster Management/orchestration
36
Possible HPC Challenges
o Change of processes and mindset
o Linux kernel requirements
o Maturity of the cluster management and scheduling solution
o Keeping up with the container ecosystem
o Extremely fast moving target
o Several architectural and fundamental decisions to make
o Memory deduplication
o Necessity of automated tool-chains
“development, integration, and delivery workflows”
o Security
Trusted container libraries
37
Thank you
38
Extra Slides
39
• http://www.meetup.com/Docker-Riyadh/
• http://www.meetup.com/Docker-Dhahran/
Saudi Docker meetups
40
Mesos
§ Mature, Open Source Apache Project
§ Cluster Resource Manager
§ Scalable to 10,000s of nodes
§ Fault tolerant, no single point of failure
§ Multi-tenancy with strong resource isolation
§ Improved resource utilization
41
Mesos workload schedulers “Frameworks”
42
43
File system Layers
44
Devil in the details
ssh
mpi
Scheduler
Init
musl glibc
Docker Engine
Docker capable OS
Bin/Libs
HPC service

What HPC can learn from DevOps?
