APRIL, 2023
DataOps
The Future of Data Management - Embracing Agility,
Collaboration, and Automation
Agenda
Introductions
DevOps to DataOps
CI/CD for Data Products
Orchestration, Testing and Monitoring
Questions
About Us
Jeewan Singh, Senior Principal, Data Analytics
Tomy Rhymond, Principal - Cloud Lead, Technology Enablement
So… what is DevOps, really?
What
DevOps is a cultural movement to:
• Improve collaboration
• Automate operations (aka the “plumbing”)
• Increase the rate of deployment
• Improve quality and security
How
• Source Control
• CI/CD
• Infrastructure Automation (IaC)
• Automated Test and Validation
• Design for Scalability
• Use the Cloud
Why
Spend more time on valuable work
… and have more fun!
Continuous Deployment of Databases: Part 1
Data and analytics professionals face unique challenges for automation
State
• Application code is stateless
• Database contains valuable business data
• Change structure and data without loss
• Hand crafting release scripts is error-prone
Down Time
• Application servers are easy to swap in/out
• Database servers are very difficult to swap in/out (even in cluster)
• Can sometimes swap databases or tables in/out
Rolling back
• Applications easy to roll back from source control
• Databases must be explicitly backed up and restored
• Very time-consuming
• Database unavailable during restore
Testing
• Application code is easy to test with unit tests
• Unit testing for databases is challenging
• Unit testing requires test data generation and management which gets complicated quickly
Other
• Configuration changes deployed via CI/CD
• Most often only DBAs touch the database (control)
• Prod databases don’t match source control (drift)
• Database change management is difficult
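One way to make the "drift" problem above concrete is to compare a live database against a database built purely from source control. The sketch below is only an illustration under stated assumptions: it uses SQLite from the Python standard library as a stand-in for a real warehouse, and the `MIGRATIONS` list, table names, and function names are all hypothetical.

```python
import hashlib
import sqlite3

# Hypothetical source-controlled schema changes, applied in order.
MIGRATIONS = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customers ADD COLUMN email TEXT",
]


def schema_fingerprint(conn):
    """Hash every table/column definition so two databases can be compared."""
    parts = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, col, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            parts.append(f"{table}.{col}:{col_type}:{notnull}:{pk}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


def build_from_source():
    """Create a throwaway database from the source-controlled migrations only."""
    conn = sqlite3.connect(":memory:")
    for statement in MIGRATIONS:
        conn.execute(statement)
    return conn


def has_drifted(live):
    """True when the live schema no longer matches what source control describes."""
    return schema_fingerprint(live) != schema_fingerprint(build_from_source())
```

A check like this could run on a schedule or as a CI step; a manual change made directly on the live database (for example an extra column) would then show up as `has_drifted(live)` returning `True`.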
These Roadblocks add friction, prevent automation, and
slow adoption of DataOps best practices
Fragile Column Mappings
Embedded Credentials
Hard-coded connections
Black-Box SaaS
GUI-Only Tools
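As an illustration of removing two of these roadblocks (embedded credentials and hard-coded connections), the sketch below reads connection settings from the environment instead of from the pipeline code. It is a minimal sketch, not a prescribed approach: the `DW_*` variable names and the `WarehouseConfig` class are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WarehouseConfig:
    host: str
    database: str
    user: str
    password: str


def load_config(env=os.environ):
    """Read connection settings from the environment rather than the code."""
    # The DW_* names are assumptions made for this example.
    required = ["DW_HOST", "DW_DATABASE", "DW_USER", "DW_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail fast and name the missing settings, without printing any values.
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return WarehouseConfig(
        host=env["DW_HOST"],
        database=env["DW_DATABASE"],
        user=env["DW_USER"],
        password=env["DW_PASSWORD"],
    )
```

The same code can then run unchanged in a sandbox, in CI, and in production, with each environment supplying its own values.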
5 Critical Mindset Changes
Traditional Mindset
• Business Requirements are Static
“Our job is to meet the agreed business requirements.”
• Single-Developer, Individual Ownership
“Someone will email me if it breaks.”
• UAT Testing Approach
“We will run some tests before we launch.”
• Everything Manual
“No time to build the automation yet.”
• Demos at End of Project
“Creating demos takes time.”
DevOps Mindset
• Business Requirements are Fluid
“We aren’t doing it right if we assume requirements are static.”
• Multiple Developers, Team Ownership
“Someone else may have to fix this if it breaks.”
• Continuous Testing Approach
“We wrote the tests before we started developing.”
• Mostly Automated
“No time to waste on manual stuff.”
• Demos Daily or Weekly
“Continual feedback is critical to success.”
DataOps is a collaborative and automated approach to
managing the entire lifecycle of data, from its creation to
its deletion, in a way that ensures that data is trustworthy,
accurate, and readily available to the right people at the
right time.
People, Process, Technology
DataOps Collaboration
• Product Owner/Architect
• Operations/Administration
• Chief Data Officer
• Data Analysts
• Data Scientist
• Data Engineer
DataOps is an approach to data analytics and data-driven decision
making that follows the agile methodology of continuous
improvement.
Source Data → Data Ingestion → Data Engineering → Data Analytics → Business Users
DataOps: CI/CD, Orchestration, Testing, Monitoring
DataOps practices are an investment whose dividends increase with time and experience
• Increased speed of delivery from improved processes
• End-to-end efficient data from automated pipelines with feedback loops
• Improved productivity and collaboration from empowered developers
• Better business outcomes from happier customers
• Secure and compliant data from automated data quality checks, masking, tokenization, and more
• Reduced mean time to resolution (MTTR) from a shift-left quality approach
• Increased data reliability and resiliency
• Developer empowerment with a DevOps culture that promotes collaboration, ownership, and accountability
DataOps Principles
Analytics is code.
Differences can be spotted easily and are all committed to the code repo.
Orchestrate.
When everything is automated, we never have to choose between delivering new features and performing manual maintenance.
Make it reproducible.
The code runs the same way every time. There is no state to manage, and there are no “two ways” to run it that might produce different results.
Disposable environments.
There’s no such thing as data loss. At any time, the production environment can be recycled, and a new environment can be spun up automatically.
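The "make it reproducible" and "disposable environments" principles can be sketched as a rebuild step that gives the same result no matter how often it runs, in an environment that can be recreated from scratch at any time. This is a minimal sketch under stated assumptions: SQLite stands in for the warehouse, and the `raw_orders` and `daily_sales` tables and their rows are hypothetical.

```python
import sqlite3

# Hypothetical raw input; in practice this would come from the source systems.
RAW_ORDERS = [
    (1, "2023-04-01", 20.0),
    (2, "2023-04-01", 5.5),
    (3, "2023-04-02", 10.0),
]


def build_environment():
    """Spin up a fresh, disposable environment containing only raw data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (id INTEGER, order_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    return conn


def rebuild_daily_sales(conn):
    """Derive daily_sales entirely from raw_orders; safe to run any number of times."""
    conn.execute("DROP TABLE IF EXISTS daily_sales")
    conn.execute(
        "CREATE TABLE daily_sales AS "
        "SELECT order_date, SUM(amount) AS total FROM raw_orders GROUP BY order_date"
    )


def daily_sales(conn):
    return conn.execute(
        "SELECT order_date, total FROM daily_sales ORDER BY order_date"
    ).fetchall()
```

Because the derived table is always rebuilt from raw data rather than patched in place, rerunning the step or recreating the whole environment gives the same rows.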
DataOps Maturity Model
CI/CD for Data Products
Taken from Stefana Muller in Dev Leaders Compare Continuous Delivery vs. Continuous Deployment vs. Continuous Integration
What do we mean when we say “CI/CD”?
CI/CD Definitions
Continuous Integration (CI)
is a software engineering practice in which
developers integrate code into a shared
repository several times a day in order to
obtain rapid feedback of the feasibility of that
code. CI enables automated build and
testing so that teams can rapidly work on a
single project together.
Continuous Deployment (also CD)
is the process by which qualified changes in
software code or architecture are deployed
to production as soon as they are ready and
without human intervention.
Continuous Delivery (CD)
is a software engineering practice in which
teams develop, build, test, and release
software in short cycles. It depends on
automation at every stage so that cycles can
be both quick and reliable.
Developing with CI/CD
[Diagram: commits on the dev branch, a pull request into the main branch, pass/fail (✔/❌) checks, “Rebuild a ‘Beta’ Copy of DW” (refreshed daily/hourly), and “Auto-Publish to Production DW”.]
1. Continuous Integration (CI) Testing:
Automatic or with every commit!
2. Continuous Delivery (CD):
New changes automatically delivered in beta!
3. Continuous Deployment (also CD):
New features and fixes delivered to customers automatically!
CI/CD Getting Started Checklist
• 1) Store all your files in source control.
• 2) Create a full deployment script.
• 3) Create a text file pointing to your deployment script.
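As a rough illustration of step 2 of the checklist, a "full deployment script" can be a small program that applies every source-controlled change exactly once and is safe to rerun. The sketch below assumes SQLite as a stand-in database; the `MIGRATIONS` entries and the `schema_migrations` bookkeeping table are hypothetical names chosen for this example, not something prescribed by the slides.

```python
import sqlite3

# Hypothetical ordered changes; in practice these would be .sql files in source control.
MIGRATIONS = {
    "001_create_sales": "CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)",
    "002_add_region": "ALTER TABLE sales ADD COLUMN region TEXT",
}


def deploy(conn):
    """Apply migrations that have not run yet and return their names in order."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    newly_applied = []
    for name in sorted(MIGRATIONS):
        if name in applied:
            continue  # already deployed on an earlier run
        conn.execute(MIGRATIONS[name])
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        conn.commit()
        newly_applied.append(name)
    return newly_applied


if __name__ == "__main__":
    # A CI/CD job would call this against the target database on every merge.
    print(deploy(sqlite3.connect(":memory:")))
```

The "text file pointing to your deployment script" in step 3 would then be whatever pipeline definition file the chosen CI/CD tool uses to run this script.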
Orchestration, Testing and Monitoring
DataOps Compared to DevOps
DevOps: Develop → Build → Test → Deploy → Run (CI, CD)
DataOps: Sandbox → Develop → Orchestrate → Test → Deploy → Orchestrate → Monitor (CI, CD)
© 4/13/23 Slalom. All Rights Reserved. Proprietary and Confidential.
Modern Cloud Data Reference Architecture
[Architecture diagram. Labels recovered from the slide, grouped by layer:]
Cross-cutting
• Data Pipeline Orchestration and Monitoring
• Security: Authorization & Authentication
• Continuous Integration, Continuous Deployment (CI/CD)
Data Source Layer
• Categories: External, Unstructured Data, Loyalty, E-Commerce, POS Technology, RX Technology, Patient Support Program, Wholesale Distribution, Data Warehouses
• Source systems (as extracted): Vistex JDA MBA Anzio; SoloChain MSA; Maple CMSV2; PharmaClick POS; Reflex POS; Tulip MagicBox; Guardian Rewards; Uniprix Rewards; Proxim Rewards; Newsletter LMS NPS / Survey; IQVIA Nielsen Health Canada; Program Participation; First Data Bank; IQ DataSmart UniBi; Website / Facebook; Email (Dialogue); Mobile Apps; UniSante; ProxiSante; PTS (db); Proxim POS Cyberlog ICN; SIR; DLD; Kroll; Reflex RX; Fillware; Compliance Cube; AssysteRx; PharmaClick RX; Applied Robotics; Ubik; GCP E-commerce; RelayHealth Hub; SAP; BeWell; Diem
Data Ingestion
• Batch Ingestion: Cloud based ETL, Event driven f(x), REST APIs
• Streaming Ingestion: Real-time ingestion, IoT Devices
Data Lake
• Raw Zone
• Processed Zone
• Curated Zone
Machine Learning (Predictions & Recommendations)
• Feature Generation
• Model Development
• Model Deployment
• Model Monitoring
Central Data Storage
• Data Warehouse
• Patient Data Hub
• Transformation & Business Rules
• Facts, Dimensions, Aggregates, Views
• Merge & Match, Deduplication, Enrichment
• Data Tokenization & Masking
Data Governance and Access
• Data Access Layer, Governance Layer, Management Layer
• Centralized Policies
• Data Quality Monitoring
• Data Lineage & Metadata
• Data Catalog
• Consistent Controls
• Security Policy Enforcement
Consumption Layer
• Operational Reports: Warehouse & Specialty, Store Sales & Growth, Kiosk Reports
• External Data Portal: Nielsen Data, External Kiosk, SharePoint
• Sandbox Environment: Ad-hoc data analysis, Raw data analysis, Merging / curating data sets
• Analytical Dashboard: Manufacturer Insights, Patient Insights, Pharmacy Insights
• API Apps: LifeLabs Apps, Loyalty Program Apps, etc.
Users and Access
• End-User, Manufacturer, Management Team, Internal Analytics Teams, External Users
• General Pharmacy Operations Team, Specialty Pharmacy Operations Team
• Patient / Customer, Data Governance SMEs
• VPN
Orchestrate, Test and Monitor
Orchestrate
• Both infrastructure as code and data pipeline code with a single pipeline
• Composer (GCP), Airflow, Azure Data Factory (Azure), dbt, DataOps.live, Informatica, Matillion, Stitch, AWS Data Pipeline
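What an orchestrator does at its core can be sketched with the Python standard library: declare which task depends on which, then run the tasks in dependency order. This is only a toy illustration and does not use the API of Airflow or any other tool listed above; the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after everything it depends on; return the order and results."""
    # dependencies maps a task name to the set of task names it must wait for.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        results[name] = tasks[name]()
    return order, results


# Hypothetical pipeline: each step is a plain function standing in for real work.
TASKS = {
    "extract": lambda: "extracted",
    "load": lambda: "loaded",
    "transform": lambda: "transformed",
    "test": lambda: "tested",
    "publish": lambda: "published",
}
DEPENDENCIES = {
    "load": {"extract"},
    "transform": {"load"},
    "test": {"transform"},
    "publish": {"test"},
}
```

Calling `run_pipeline(TASKS, DEPENDENCIES)` runs the five steps as a single chain. Real orchestrators add scheduling, retries, and logging on top of this ordering idea.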
Monitor
• Cloud Resources
  • GCP Monitoring, CloudWatch, Azure Monitor, Datadog
• Data pipelines
  • Respective tools, native cloud monitoring dashboards
• Data Quality
  • ETL tools, manual tools on top of data platforms
Test
• At the end of the pipeline run
• dbt, DataOps.live, Google Dataform, Boomi, Informatica, Matillion, Great Expectations, tSQLt
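The kind of test that runs at the end of a pipeline run can be sketched in plain Python. The example below deliberately avoids the API of any tool listed above and only illustrates the idea of not-null, uniqueness, and range checks; the column names (`order_id`, `customer_id`, `amount`) and the thresholds are hypothetical.

```python
from collections import Counter


def check_not_null(rows, column):
    return [
        f"{column} is null in row {i}"
        for i, row in enumerate(rows)
        if row.get(column) is None
    ]


def check_unique(rows, column):
    counts = Counter(row.get(column) for row in rows)
    return [
        f"{column}={value!r} appears {n} times"
        for value, n in counts.items()
        if n > 1
    ]


def check_range(rows, column, low, high):
    return [
        f"{column}={row[column]!r} outside [{low}, {high}] in row {i}"
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


def run_checks(rows):
    """Return a list of failure messages; an empty list means the data passed."""
    failures = []
    failures += check_not_null(rows, "customer_id")
    failures += check_unique(rows, "order_id")
    failures += check_range(rows, "amount", 0, 10_000)
    return failures
```

A pipeline would typically stop before publishing, or raise an alert, whenever `run_checks` returns anything other than an empty list.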
From ETL to ELTP
ETL: Extract → Transform → Load
ELTP: Extract → Load → Transform → Publish
Benefits of ELT over ETL:
• non-destructive updates
• improved stability and recoverability
The “Publish” step signals that data is available and ready for downstream subscribers. It may involve shipping a copy of the data into the data lake, replicating to multiple Redshift clusters, populating BI models, or similar actions.
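The four ELTP steps can be sketched end to end with SQLite standing in for the warehouse. This is a minimal sketch under stated assumptions: the `raw_events` table, the versioned `events_v…` tables, the `events` view, and the sample rows are all hypothetical, and "publish" is modelled simply as repointing a view at the newest transformed table.

```python
import sqlite3


def extract():
    # Stand-in for reading from a source system; rows are made up for the example.
    return [("a", "10"), ("b", "oops"), ("c", "30")]


def load(conn, rows):
    """Load source rows as-is, so the raw data is never altered (non-destructive)."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (event_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?)", rows)


def transform(conn, version):
    """Build a cleaned table from raw data, keeping only all-digit amounts."""
    table = f"events_v{version}"
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(
        f"CREATE TABLE {table} AS "
        "SELECT event_id, CAST(amount AS INTEGER) AS amount FROM raw_events "
        "WHERE amount != '' AND amount NOT GLOB '*[^0-9]*'"
    )
    return table


def publish(conn, table):
    """Signal readiness by pointing the consumer-facing view at the new table."""
    conn.execute("DROP VIEW IF EXISTS events")
    conn.execute(f"CREATE VIEW events AS SELECT * FROM {table}")
```

Because the raw rows stay untouched, a bad transform can be fixed and rerun without re-extracting, which is one way to read the stability and recoverability benefit listed above.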
At the core of DataOps is your organization’s information
architecture
• How well do you know your data?
• Do you trust your data?
• Are you able to quickly detect errors?
• Can you make changes incrementally without “breaking” your entire data pipeline?
The critical areas below can transform your data pipeline:
• Data Curation services
• Metadata Management
• Data Governance
• Master Data Management
• Self-Service interaction
Thank You.
Questions?
