Data Mesh in Azure using Cloud Scale Analytics
Nathan Bijnens
Manager, Belux CSU Data Team
What we’ve heard
Ideally, organizations want to have:
• Less time spent preparing data
• Robust data governance
• A platform that delivers actionable insights to the business
• Ability to increase the value of hidden data
• Improved operational efficiency
• Reduced cost of data engineering
Barriers to achieving business outcomes:
• Need for frictionless data governance
• Difficult to balance access and data protection
• Data and analytics operationalization
• Enable lines of business
• Poor data quality
• Disparate systems and data silos
• Too slow moving from data to decision
• Unified ecosystem
• Project prioritization
Every application that creates data needs, and will have, a database
Application A Application B
Every application that creates data needs, and will have, a database, at least in the context of data
management. Even stateless applications that create data have “databases”; in these scenarios the
database typically sits in RAM or in a temp file.
Consequently, when we have two applications, we assume that each application has its own “database”.
When these two applications interoperate, we expect data to be transferred from one application to the
other.
We can’t escape from data integration
[Diagram: data integration sits between Application A and Application B]
Data transformation is always required because an application’s database schema is designed to meet that
application’s specific requirements. Since requirements differ from application to application, the schemas
differ as well, so data integration is always required when moving data around.
A crucial aspect of data transfer is that data integration is always right around the corner. Whether you do
ETL or ELT, virtual or physical, batch or real-time, there’s no escape from the data integration dilemma.
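To make the point concrete, here is a minimal Python sketch of the integration step between two applications. The record shapes (`CustomerA`, `CustomerB`), their field names, and the `a_to_b` mapping are hypothetical and invented for illustration; they do not come from any real system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CustomerA:
    """Hypothetical schema of Application A: one name field, ISO date string."""
    customer_id: int
    full_name: str
    signup_date: str  # e.g. "2021-03-15"


@dataclass
class CustomerB:
    """Hypothetical schema of Application B: split names, native date, string key."""
    key: str
    first_name: str
    last_name: str
    signed_up: date


def a_to_b(record: CustomerA) -> CustomerB:
    """Integration step: reshape A's record so that it fits B's schema."""
    first, _, last = record.full_name.partition(" ")
    return CustomerB(
        key=f"A-{record.customer_id}",
        first_name=first,
        last_name=last,
        signed_up=date.fromisoformat(record.signup_date),
    )


converted = a_to_b(CustomerA(42, "Ada Lovelace", "2021-03-15"))
```

Neither schema is wrong; each fits its own application. The mapping function is the unavoidable integration work, whether it runs in an ETL job, a view, or a streaming processor.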
Business Drivers
Many enterprises are saddled with outdated data architectures that do not scale to the needs of large
multidisciplinary organizations.
• Lack of data ownership
• Lack of data quality
• Difficult to see interdependencies
• Model conflicts across business concerns
• Tremendous effort for integration and coordination leads to bypasses
• Business and IT work in silos
• Disconnect between the data producers and data consumers
• Central team becomes the bottleneck
• Difficult to apply policy and governance
• Hard to see technical dependencies
• Small changes become risky due to unexpected consequences
• Technical ownership rather than data ownership
Problems with Existing Architectures
There’s a deep assumption that centralization is the solution to data management. This includes
centralizing all data and management activities into one central team, building one data platform,
using one ETL framework, using one canonical model, etc.
Centralized Architecture
[Diagram: data providers (transactional sources) feed a central engineering team, which serves data consumers (analytical consumers)]
• Single team with centralized knowledge and book of work
• Centralized pipelines for all extraction / ingestion activities
• Centralized transformations to create harmonized data
• Central platform serves as a large integration database: all execution and analysis is done on the same platform
Transformational Trends in the Data Landscape
• Increase of computing power: Massive increase of computing power, driven by hardware innovation (SSD storage, in-memory storage, GPU advances), lets us move data to compute faster.
• Eco-system connectivity: Cloud and APIs make it easier to integrate. Software and Platform as a Service (SaaS, PaaS) offerings push connectivity and API usage even further.
• Explosion of tools: New (open source) concepts are introduced, such as NoSQL database types, blockchain, new database designs, distributed models (Hadoop), new analytical methods, etc.
• Exponential growth of data: Data grows exponentially, especially external data sources like open and social data. Internal, external, structured, and unstructured data are all used to deliver additional insights.
• Increased regulatory attention: Stronger regulatory requirements, such as GDPR and BCBS 239, are coming into effect worldwide. Data quality and lineage become more important every day.
• Increase of read/write ratio: The read/write ratio has changed due to more intensive data consumption: data is read more often, there is increased real-time consumption, and more searches are performed.
Data as a Product
Data is no longer a side-effect, it’s a product.
• Who are my "customers"?
• What do my "customers" need?
• Are they happy with the data? Are they using it?
• How do I let my "customers" know my data exists?
• What is in it for the "customer"?
Data Product Owner
[Diagram: a domain team made up of a Data Product Owner, Data Engineer, Software Developer, and Infra Engineer]
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
Zhamak Dehghani
Data Product Properties
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
Zhamak Dehghani
Discoverable
• Overview of product in central data catalog
• Provide easy discoverability
Addressable
• Help users access the product programmatically
Trustworthy
• Data Product Owners provide monitored SLOs
• Data is cleansed and up to standard
Self-describing
• Minimal friction for data engineers and scientists to use the data
Interoperable
• Open standards for harmonization
• Field type formatting
Secure
• Access control policies
• Use SSO and RBAC
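One way to picture these properties is as fields of a catalog entry. The sketch below is a toy illustration in Python, not a real catalog API: `DataProduct`, `Catalog`, all field names, and the example address are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical catalog entry; each field maps to one property above."""
    name: str                     # discoverable: listed in the catalog by name
    address: str                  # addressable: a stable URI for programmatic access
    owner: str                    # trustworthy: an accountable Data Product Owner
    freshness_slo_hours: int      # trustworthy: a monitored service level objective
    schema: dict                  # self-describing: column name -> type
    data_format: str = "parquet"  # interoperable: an open standard format
    allowed_roles: set = field(default_factory=set)  # secure: RBAC roles


class Catalog:
    """Toy in-memory stand-in for a central data catalog."""

    def __init__(self) -> None:
        self._products = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def search(self, keyword: str) -> list:
        """Return names of products whose name or columns mention the keyword."""
        kw = keyword.lower()
        return sorted(
            p.name for p in self._products.values()
            if kw in p.name.lower() or any(kw in col.lower() for col in p.schema)
        )

    def resolve(self, name: str, role: str) -> str:
        """Return the product address if the caller's role is allowed."""
        product = self._products[name]
        if role not in product.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        return product.address


catalog = Catalog()
catalog.register(DataProduct(
    name="sales_orders",
    address="abfss://sales@example.dfs.core.windows.net/orders",  # placeholder URI
    owner="sales-domain-team",
    freshness_slo_hours=24,
    schema={"order_id": "string", "customer_id": "string", "amount": "decimal"},
    allowed_roles={"analyst"},
))
```

A consumer would call `catalog.search("customer")` to find the product and `catalog.resolve("sales_orders", "analyst")` to obtain its address; a role that is not listed gets a `PermissionError`.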
Data Mesh
Data Mesh is a new decentralized socio-technical approach to managing data, designed to work
with organizational complexity and continuous growth. It enables large organizations to get value
from their data, at scale, through reusability, analytics and ML. It builds on the Domain-Driven
Design methodology.
[Diagram: Data Mesh builds on Domain-Driven Design; domain zones hold data products that are consumed by other domains]
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
Zhamak Dehghani
Centralized Implementation is not working!
[Diagram: GSR, Finance, HR, Travel, Sales, and Clinical Ops all depend on one centralized platform]
LOBs are the SMEs, and the shared service team cannot keep up with the projects
Dataset sprawl
Competing needs within the organization
• IT needs to standardize
• LOBs need to implement analytics
Primitive data strategy
Introduction to Data Domains
[Diagram: the Marketing, Customer Services, and Order Management domains, with operational systems, integration services, and data products such as Search Keywords, Promotions, Top Selling Products, Orders, and Customer Profiles]
• A domain is a collection of people, typically organized around a common business purpose.
• Domains create and serve data products to other domains and end users, independently from other domains.
• Domains ensure data is accessible, usable, available, and meets the defined quality criteria.
• Domains evolve data products based on user feedback and retire them when they become irrelevant.
Domain Zones
[Diagram: Microsoft Enterprise Data Mesh. A management zone sits alongside data domains (labelled Engineering, Finance, HR, Innovation, Program 1, and Operations), each holding data products]
Domain Zone
Environment for each LOB
LOBs: Implement Data Services
• e.g., Exploration Service, Data Order System
LOBs: Build and Share Data Products
• e.g., Sales Forecast, Clean Room Performance
Automated using templates
• security, integration, monitoring, etc.
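Template-based automation can be sketched as follows. This is a toy Python illustration under stated assumptions: `BASE_TEMPLATE`, its keys and values, the naming scheme, and `render_domain_zone` are all hypothetical and do not represent the actual templates or any Azure API.

```python
from copy import deepcopy
from typing import Optional

# Hypothetical baseline that every domain zone inherits; keys and values are illustrative.
BASE_TEMPLATE = {
    "security": {"private_endpoints": True, "rbac_groups": []},
    "integration": {"shared_runtime": True},
    "monitoring": {"log_retention_days": 90},
}


def render_domain_zone(lob: str, overrides: Optional[dict] = None) -> dict:
    """Build a domain zone spec for one LOB from the shared template."""
    spec = deepcopy(BASE_TEMPLATE)  # never mutate the shared baseline
    spec["name"] = f"dz-{lob.lower()}"
    spec["security"]["rbac_groups"] = [f"{lob.lower()}-data-engineers"]
    # Apply LOB-specific settings section by section on top of the defaults.
    for section, values in (overrides or {}).items():
        spec.setdefault(section, {}).update(values)
    return spec


finance = render_domain_zone("Finance", {"monitoring": {"log_retention_days": 365}})
hr = render_domain_zone("HR")
```

Every LOB starts from the same security, integration, and monitoring defaults and only states where it differs, which is what keeps the zones consistent.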
Domain Architecture
Enterprise requirements:
• Security & Privacy
• Governance & Compliance
• Availability & Recovery
• Performance & Scalability
• Skills & Training
• Licensing & Usage
• Observation & Monitoring
Shift towards Domain Ownership
A new type of ecosystem architecture that shifts left towards a modern distributed architecture. It
enables domain-specific data and data products, empowering each domain to handle its own data
pipelines.
[Diagram: source-oriented domains, in which data providers publish data products, feed consumption-oriented domains, in which consumer-specific transformations serve data consumers. Governance and domain-agnostic platform infrastructure support both.]
Domain Zones: Data Products
Domain Zone: HR
• Recruitment
• Time Tracking
• Employee Value and Performance
• Training and Development
• Engagement and Retention
[Other labels on the slide: Engineering Operations, New Project: Digital Twin, Clean Room, Personnel]
Data Domain Considerations from the Field
• Map your data domains organically, during the onboarding of data providers and consumers.
• Reference your business capabilities (e.g., strategy and processes) while mapping your data domains.
• Isolate your data domains and enable communication through data products like APIs or events.
• Create and document a shared, ubiquitous language that different domains can use to communicate.
• Determine boundaries for both business and technical granularity.
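The advice to isolate domains and let them communicate through events can be sketched in a few lines of Python. The in-memory `EventBus` stands in for a real message broker, and the `OrderPlaced` contract, its fields, and the `MarketingDomain` class are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event contract, expressed in the shared language of the domains."""
    order_id: str
    customer_id: str
    amount: float


class EventBus:
    """Toy in-memory bus; a real setup would use a message broker instead."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: object) -> None:
        for handler in self._subscribers[type(event)]:
            handler(event)


class MarketingDomain:
    """Consumes order events without reaching into the order domain's storage."""

    def __init__(self, bus: EventBus) -> None:
        self.revenue_by_customer = defaultdict(float)
        bus.subscribe(OrderPlaced, self.on_order_placed)

    def on_order_placed(self, event: OrderPlaced) -> None:
        self.revenue_by_customer[event.customer_id] += event.amount


bus = EventBus()
marketing = MarketingDomain(bus)
# The order management domain only publishes; it knows nothing about its subscribers.
bus.publish(OrderPlaced("o-1", "c-7", 30.0))
bus.publish(OrderPlaced("o-2", "c-7", 12.5))
```

The only thing the two domains share is the event contract, which is where the documented ubiquitous language pays off.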
Enterprise Scale for Analytics
Cloud-scale Analytics Framework
Enterprise Scale: Azure Landing Zones
The main purpose of a “Landing Zone”
is to ensure that when a workload
lands on Azure, the required
“plumbing” is already in place,
providing greater agility and
compliance with enterprise security
and governance requirements.
Data Management Landing Zone
• Business Glossary
• Data Discovery
• SLAs
• Business Rules
• Reference Data Management
• Master Record Management
• Data Policy
• Access Governance
• Loss Prevention
• Privacy Operations
• Risk Assessment
• Repository for Data Models
• Integration
• API Documentation
• Automation for provisioning landing zones, data integrations, and products
• Pre-configured network and monitoring setup
• Standard images for deploying analytics and AI services
• Azure Subscription
• Azure Policy
Data Landing Zone
Core services:
• Networking: preconfigured network and monitoring setup
• Data Lake Services: data lake configured with layers and connectivity
• Ingest and Processing: Spark and scheduling engines
• Upload: blobs where third parties can upload their data
• Metadata Services: scanners for data governance and metadata required by the landing zone
• Shared Products: analytics engines for exploratory analytics
Data Integration
Data integration teams are responsible for ingesting data into a read data source. No transformation
should be applied to the data apart from data quality checks and data type verification.
Examples of data integrations:
• Pull SAP data into the landing zone
• Streaming interface to pull data from heat sensors
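The rule that ingestion checks data but does not reshape it can be sketched as below. This is a minimal Python illustration under assumptions: the heat-sensor column names in `EXPECTED_SCHEMA` and the `validate` and `ingest` helpers are hypothetical.

```python
from datetime import datetime

# Hypothetical expected schema for a heat-sensor feed: column -> parser used as type check.
EXPECTED_SCHEMA = {
    "sensor_id": str,
    "temperature_c": float,
    "measured_at": datetime.fromisoformat,
}


def validate(row: dict) -> list:
    """Return a list of problems; an empty list means the row may be ingested."""
    problems = []
    for column, parse in EXPECTED_SCHEMA.items():
        value = row.get(column)
        if value is None or value == "":
            problems.append(f"{column}: missing")
            continue
        try:
            parse(value)  # type verification only; the parsed value is discarded
        except (TypeError, ValueError):
            problems.append(f"{column}: failed type check")
    return problems


def ingest(rows: list) -> tuple:
    """Split rows into accepted (stored unchanged) and rejected (with reasons)."""
    accepted, rejected = [], []
    for row in rows:
        problems = validate(row)
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append(row)  # no transformation is applied
    return accepted, rejected


rows = [
    {"sensor_id": "s-1", "temperature_c": "21.5", "measured_at": "2022-05-01T10:00:00"},
    {"sensor_id": "s-2", "temperature_c": "hot", "measured_at": "2022-05-01T10:00:00"},
    {"sensor_id": "s-3", "temperature_c": "19.0"},
]
accepted, rejected = ingest(rows)
```

Accepted rows land exactly as they arrived. Reshaping them for a specific purpose is left to the data products built on top.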
Data Products
Data products fulfil a specific business need using data. They manage, organize, and make sense of
data across domains and present the insights gained.
A data product is the result of one or many data integrations and/or other data products.
Examples of data products:
• Financial reporting pulling Customers and Sales together
• Streaming machine data from a read data source
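As an illustration of a data product built from two data integrations, the Python sketch below combines customer and sales rows into a small financial report. The rows, column names, and the `revenue_by_country` function are made up for this example.

```python
from collections import defaultdict

# Hypothetical rows as they might sit in the read data source after ingestion.
customers = [
    {"customer_id": "c-1", "name": "Contoso", "country": "BE"},
    {"customer_id": "c-2", "name": "Fabrikam", "country": "LU"},
]
sales = [
    {"customer_id": "c-1", "amount": 100.0},
    {"customer_id": "c-1", "amount": 50.0},
    {"customer_id": "c-2", "amount": 75.0},
]


def revenue_by_country(customers: list, sales: list) -> dict:
    """Join sales to customers and total the amounts per customer country."""
    country_of = {c["customer_id"]: c["country"] for c in customers}
    totals = defaultdict(float)
    for sale in sales:
        totals[country_of[sale["customer_id"]]] += sale["amount"]
    return dict(totals)


report = revenue_by_country(customers, sales)
```

Unlike the integrations it reads from, the data product does transform data: it joins and aggregates to answer one business question.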
Infrastructure as Code
Example Reference Architecture for Data Mesh in a Small Company
[Diagram, components as labelled on the slide]
• Zones: Data Landing Zone, Data Management Landing Zone
• Services: Azure Event Hubs, Azure Data Lake Storage Gen2 (Data Lake Services), Azure Data Factory, Synapse Analytics, Azure Databricks (shared service), Azure Purview
• Teams: Data Onboarding Team, Data Engineering Team, Data Governance Team, and several Data Product Teams
• Flow: data integrations store read-optimized domain data, which is transformed into read-optimized data products
• Consumers: real-time applications and operational systems; self-service BI and semantic models; analytical applications; data-driven applications
Optimize Existing Implementation Patterns
Take a new approach to data management that supports and evolves with your strategy.
The data management and analytics scenario supports a range of patterns to
build on your current data infrastructure, to help you modernize and scale from where you are.
Patterns: Data Warehouse, Data Lake, Data Lakehouse, Data Mesh, Data Fabric
Integrating your DWH in a Data Mesh
• From be-all and end-all to yet another Data Product in your mesh
• Ownership based on your preference
  • DWH is a data product on its own: managed by one data product team
  • DWH serves as a "wrapper" for multiple data products: managed by multiple teams
• DWH consumes data from multiple Data Products
• Multiple Data Products consume data from DWH
Agile Data Management
Prepare your company to:
• Enforce data governance and security.
• Serve data as a product rather than a byproduct.
• Provide an ecosystem of data products.
• Create data domains to serve lines of business.
• Empower teams to drive analytics solutions that deliver value to the business.
• Modernize your teams and operations.
Multi Organization Data Mesh
[Diagram: several organizations, such as Contoso, each with its own management zone and data domains (Finance, HR) that hold data products]
Interested in learning more?
Reach out to Nathan.Bijnens@microsoft.com
Links
DDD
• Best Practice - An Introduction To Domain-Driven Design | Microsoft Docs
• Introduction into Domain-Driven Design (DDD) (jannikwempe.com)
• IBM Automation Event-Driven Reference Architecture – Domain Driven Design (ibm-cloud-architecture.github.io)
Data Mesh
• How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
• Data Mesh in Practice: How Europe's Leading Online Platform for Fashion Goes Beyond the Data Lake - Databricks

Data Mesh in Azure using Cloud Scale Analytics (WAF)

  • 1. Nathan Bijnens Manager, Belux CSU Data Team Data Mesh in Azure using Cloud Scale Analytics
  • 2. What we’ve heard To spend less time preparing data Robust data governance Platform to actionable Insights to the business Ability to increase the value of hidden data Improve Operational Efficiency Ideally, organizations want to have….. Reduce cost of data engineering Need for Frictionless Data Governance Difficult to balance access and data protection Data and Analytics Operationalization Enable Lines of Businesses Poor data quality Disparate systems and data silos Too slow moving from data to decision Barriers to achieve business outcomes Unified ecosystem Project prioritization
  • 3. Every application that creates data, needs and will have a database Application A Application B Consequently, when we have two applications, we hypothesize that each application has its own ‘database’. When there is interoperability between these two applications, we expect data to be transferred from one application to the other. Every application, at least in the context of data management, that creates data, needs and will have a database. Even stateless applications that create data have “databases”. In these scenarios the database typically sits in the RAM or in a temp file.
  • 4. We can’t escape from data integration Application A Application B The ‘always’ required data transformation lies in the fact that an application database schema is designed to meet the application’s specific requirements. Since the requirements differ from application to application, the schemas are expected to be different and data integration is always required when moving data around. A crucial aspect when it comes to data transfer is that data integration is always right around the corner. Whether you do ETL or ELT, virtual or physical, batch or real-time, there’s no escape from the data integration* dilemma. Data integration
  • 5. Business Drivers •Lack of data ownership Lack of data quality Difficult to see interdependencies Model conflicts across business concerns Tremendous effort for integration and coordination leads to bypasses Business and IT work in silos Disconnect between the data producers and data consumers Central team becomes the bottleneck Difficult to apply policy and governance Hard to see technical dependencies Small changes become risky due to unexpected consequences Technical ownership rather than data ownership Many Enterprises are saddled with outdated Data Architectures that do not scale to the needs of large multi- disciplinary organizations.
  • 6. Problems with Existing Architectures There’s a deep assumption that centralization is the solution to data management. This includes centralizing all data and management activities into one central team, building one data platform, using one ETL framework, using one canonical model, etc. Transactional Sources Analytical Consumers Centralized Architecture • Single team with centralized knowledge and book of work • Centralized pipelines for all extraction / ingestion activities • Centralized transformations to create harmonized data • Central platform serves as large integration database: all execution and analysis is done on the same platform Data providers Data consumers Central engineering team Transactional Sources Transactional Sources Analytical Consumers Analytical Consumers
  • 7. Transformational Trends in the Data Landscape Massive increase of computing power, driven by hardware innovation (SSD storage, in- memory storage, GPU advances) lets us move data to compute faster. Cloud and APIs make it easier to integrate. Software & Platform as a Service (SaaS, PaaS) offerings push the connectivity and API usage even further. Explosion of tools New (open source) concepts are introduced, such as NoSQL database types, block chain, new database designs, distributed models (Hadoop), new analytical methods, etc. Exponential growth of data, especially external data sources like open and social data. Internal, external, structured, and unstructured data are all used to deliver additional insights. Eco-system connectivity Exponential growth of data Increase of computing power Stronger regulatory requirements, such as GDPR and BCBS 239, are coming into effect worldwide. Data quality and lineage become more important every day. Increased regulatory attention The read/write ratio has changed due to more intensive data consumption: data is read more often, there is increased real-time consumption and more searches are performed. Increase of read/write ratio
  • 8. Data as a Product
  • 9. Data as a Product Data is no longer a side-effect, it’s a product. Who are my "customers"? What do my "customers" need? Are they happy with the data? Are they using it? How do I let my "customers" know my data exists? What is in it for the "customer"?
  • 10. Data Product Owner Domain Data Product Owner Data Engineer Software Developer Infra Engineer How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com) Zhamak Dehghani
  • 11. Data Product Properties How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com) Zhamak Dehghani • Overview of product in central data catalog • Provide easy discoverability Discoverable • Help users access the product programmatically Addressable • Data Product Owners provide monitored SLOs • Data is cleansed and up to standard Trustworthy • Minimal friction for data engineers and scientists to use the data Self-describing • Open standards for harmonization • Field type formatting Interoperable • Access control policies • Use SSO and RBAC Secure
  • 13. Data Mesh Data Mesh is a new decentralized socio-technical approach to managing data, designed to work with organizational complexity and continuous growth. It enables large organizations to get value from their data, at scale, through reusability, analytics and ML. It is building on the Domain Driven Design methodology. Data Mesh Domain Driven Design Domain Zones Data Products Consumed by other Domains How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com) Zhamak Dehghani
  • 14. Centralized Implementation is not working! GSR Finance HR Travel Sales Clinical Ops Centralized Platform LOBs are the SMEs and Shared Service team is not able to cope up with the projects Datasets sprawls Competing needs within the organization • IT needs to standardize • LOBs need to implement analytics Primitive Data Strategy
  • 15. Introduction to Data Domains Search Keywords Promotions Top Selling Products Orders Customer Profiles Data Products Integration Services Operational Systems Marketing Domain Customer Services Domain Order Management Domain • A domain is a collection of people, typically organized around a common business purpose. • Create and serve data products to other domains and end users, independently from other domains. • Ensure data is accessible, usable, available, and meets the quality criteria defined. • Evolve data products based on user feedback and retire data products when they become irrelevant.
  • 16. Domain Zones Engineering Finance HR Innovation Program 1 Operations Management zone Data products Data Domains Microsoft Enterprise Data Mesh
  • 17. Domain Zone Domain Zone Environment for each LOB LOBs: Implement Data Services • ex: Exploration Service, Data Order System LOBs: Build and Share Data Products • ex: Sales Forecast, Clean Room Performance Automated using templates • security, integration, monitoring, etc
  • 18. E N T E R P R I S E R E Q U I R E M E N T S Security & Privacy Governance & Compliance Availability & Recovery Performance & Scalability Skills & Training Licensing & Usage Observation & Monitoring Domain Architecture
  • 19. Shift towards Domain Ownership A new type of eco-system architecture, which shifts to the left towards a modern distributed architecture that enables domain-specific data and data products, empowering each domain to handle their own data pipelines. Supporting governance and domain-agnostic platform infrastructure Data Providers Data Product Data Providers Data Product Data Providers Data Product Source-oriented Domains Consumer- specific Transformation Data Consumer Consumer- specific Transformation Data Consumer Consumer- specific Transformation Data Consumer Consumption-oriented Domains
  • 20. Domain Zones Data Products Domain Zone HR Recruitment Time Tracking Employee Value And Performance Training and Development Engagement and Retention Engineering Operations New Project : Digital Twin Clean Room Personnel
  • 21. • Map your data domains organically, during the onboarding of data providers and consumers. • Reference your business capabilities (e.g., strategy and processes) while mapping your data domains. • Isolate your data domains and enable communication through data products like APIs or events. • Create and document a shared, ubiquitous language that different domains can use to communicate. • Determine boundaries for both business and technical granularity. Data Domain Considerations from the Field
  • 23. Enterprise Scale: Azure Landing Zones The main purpose of a “Landing Zone” is to ensure that when a workload lands on Azure, the required “plumbing” is already in place, providing greater agility and compliance with enterprise security and governance requirements.
  • 24. Data Management Landing Zone [Diagram labels: Azure Subscription; Azure Policy] Capabilities: Business Glossary; Data Discovery; SLAs; Business Rules; Ref. Data Mgmt.; Master Record Mgmt.; Data Policy; Access Governance; Loss Prevention; Privacy Operations; Risk Assessment; Repository for Data Models; Integration API Documentation. Also included: automation for provisioning landing zones, data integrations, and products; pre-configured network and monitoring setup; standard images for deploying analytics and AI services.
  • 25. Data Landing Zone. Core services: Networking (preconfigured network and monitoring setup); Data Lake Services (data lake configured with layers and connectivity); Ingest and Processing (Spark and scheduling engines); Upload (blobs where third parties can upload their data); Metadata Services (scanners for the data governance and metadata required by the landing zone); Shared Products (analytics engines for exploratory analytics). Data Integration: data integration teams are responsible for ingesting data into a read data source. The data should have no transformation applied apart from data quality checks and data type verification. Examples: pull SAP data into the landing zone; a streaming interface to pull data from heat sensors. Data Products: data products fulfil a specific business need using data. They manage, organize, and make sense of data across domains and present the insights gained. A data product is the result of one or many data integrations and/or other data products. Examples: financial reporting that pulls Customers and Sales together; streaming machine data from the read data source. [Diagram label: Infrastructure as Code]
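The rule that a data integration applies nothing beyond data quality checks and data type verification can be sketched as an ingestion step that only accepts or rejects records and never rewrites them. This is a minimal sketch: the schema, the column names, and the `validate_record` and `ingest` helpers are assumptions made for illustration, not taken from the deck.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical schema for an incoming feed: column name -> expected type.
EXPECTED_TYPES = {"customer_id": str, "amount": float}


def validate_record(record: Record) -> List[str]:
    """Return the list of problems found; an empty list means the record passes."""
    problems = []
    for column, expected in EXPECTED_TYPES.items():
        if column not in record or record[column] is None:
            problems.append(f"{column}: missing")  # data quality check
        elif not isinstance(record[column], expected):
            problems.append(f"{column}: expected {expected.__name__}")  # type check
    return problems


def ingest(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into accepted and rejected without modifying any of them."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


raw = [
    {"customer_id": "C1", "amount": 10.5},
    {"customer_id": "C2", "amount": "n/a"},
    {"amount": 3.0},
]
accepted, rejected = ingest(raw)
```

Accepted records land unchanged; casting, renaming, or joining is left to the data products that read from the landing zone.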
  • 26. Example Reference Architecture for Data Mesh in a Small Company [Diagram labels: zones (Data Management Landing Zone, Data Landing Zone); services (Azure Event Hubs, Azure Data Factory, Azure Data Lake Storage Gen2, Data Lake Services, Synapse Analytics, Azure Databricks, Azure Purview, Shared Service); teams (Data Onboarding Team, Data Engineering Team, Data Governance Team, Data Product Teams); stages (Data Integration; storing read-optimized domain data; transforming into read-optimized data products; Data Product); consumers (real-time applications and operational systems; self-service BI and semantic models; analytical applications; data-driven applications)]
  • 27. Optimize Existing Implementation Patterns Take a new approach to data management that supports and evolves with your strategy. The data management and analytics scenario supports a range of patterns to build on your current data infrastructure, to help you modernize and scale from where you are. Patterns: Data Warehouse; Data Lake; Data Lakehouse; Data Mesh; Data Fabric.
  • 28. Integrating your DWH in a Data Mesh • From be-all and end-all to yet another data product in your mesh. • Ownership based on your preference: either the DWH is a data product on its own, managed by one data product team, or the DWH serves as a "wrapper" for multiple data products, managed by multiple teams. • The DWH consumes data from multiple data products. • Multiple data products consume data from the DWH.
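Treating the DWH as one more data product means it appears in the mesh's dependency graph like any other node, with upstream and downstream products. The sketch below derives a refresh order from such a graph with the standard-library `graphlib` module (Python 3.9+). The product names are hypothetical, and the sketch assumes the dependencies form a directed acyclic graph; `TopologicalSorter` raises `CycleError` otherwise.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each key lists the upstream products it reads from.
upstream = {
    # The DWH consumes data from multiple upstream sources.
    "dwh": {"sales_integration", "customers_integration"},
    # Multiple data products consume data from the DWH.
    "financial_reporting": {"dwh"},
    "sales_forecast": {"dwh", "promotions"},
}

# static_order() yields every node after all of its upstream dependencies.
refresh_order = list(TopologicalSorter(upstream).static_order())
position = {name: index for index, name in enumerate(refresh_order)}
```

Several valid orders exist for this graph; the only guarantee is that each product comes after everything it reads from.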
  • 29. Agile Data Management Prepare your company to: • Enforce data governance and security. • Serve data as a product rather than a byproduct. • Provide an ecosystem of data products. • Create data domains to serve lines of business. • Empower teams to drive analytics solutions that deliver value to the business. • Modernize your teams and operations.
  • 31. Multi Organization Data Mesh [Diagram: the same layout repeated for three Contoso organizations. Labels per organization: Management zone; Data Domains (Finance, HR); Data products]
  • 33. Links. DDD: Best Practice - An Introduction To Domain-Driven Design | Microsoft Docs; Introduction into Domain-Driven Design (DDD) (jannikwempe.com); IBM Automation Event-Driven Reference Architecture – Domain Driven Design (ibm-cloud-architecture.github.io). Data Mesh: How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com); Data Mesh in Practice: How Europe's Leading Online Platform for Fashion Goes Beyond the Data Lake - Databricks.