Rakesh Suresh
Jainik Vora
Transforming Data Processing with
Kubernetes: Journey Towards a
Self-Serve Data Mesh
Nov 6, 2023
© 2023 Intuit Inc. All rights reserved. 2
Speaker Introduction
Jainik Vora
Sr. Staff Software Engineer
Rakesh Suresh
Sr. Staff Software Engineer
jainiksvora
rakeshsuresh
Agenda
Intuit
About Intuit and its mission
Data Mesh
What is Data Mesh and the problems it addresses
Data Lake & Data Mesh
How Intuit implements data mesh with a real example
Intuit’s Data Mesh Concepts
Foundational concepts defined for Data Mesh
Self-Serve Data Processing on Kubernetes
Architecture of Batch and Stream Processing Platform
©2022 Intuit Inc. All rights reserved. 4
100% of services on modern SaaS
65B machine learning predictions per day
24k financial institutions [+50 crypto]
$560B money moved
3.6B requests during peak season (no customer failures)
Data Integration
Fintech Infrastructure
Identity
AI Infrastructure
Modern Dev Experience
AI-driven expert platform
Intuit is leading the way in building an AI-native development platform using cloud-native open source technology. We’re committed to building tools that scale and giving back to the open source community.
We believe in open source and open collaboration
bit.ly/intuit-oss
Created, open-sourced, used, and maintained by Intuit
Recipient of the End User Award in 2019 & 2022
End user of Cloud Native and mobile open source tech
Data Mesh
What is Data Mesh?
A data mesh is a decentralized data architecture that organizes data by business domain.
Instead of being a by-product of a process, data becomes the product, and data producers act as data product owners.
Why Data Mesh?
Improve the value of data
Smart product experiences using data
Power AI
Power generative AI applications like Intuit Assist
Reduce the time to discover & access data
Serve a variety of data personas
Data Mesh Principles
◆ Domain-Driven Ownership
◆ Data Product
◆ Data Access
◆ Self-Serve Infrastructure
Data Lake & Data Mesh
Small Business Owner Has Unpaid Invoices
The small business owner logs in to QuickBooks and realizes there are unpaid invoices from customers.
Options provided by the system:
◆ (A) The system reminds the owner about unpaid invoices
◆ (B) The system also offers an add-on feature that automatically sends invoice reminders to customers
Small Biz & QuickBooks
Invoice: Business & Technical Requirements
❖ Notification: track and remind the business owner and their customers
❖ Get unpaid invoices by business
❖ Get unpaid invoices for each customer, grouped by business
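The two "get unpaid invoices" requirements amount to a filter followed by a group-by. A minimal Python sketch, assuming a hypothetical `Invoice` record with `business_id`, `customer_id`, `amount` and `paid` fields (the actual Invoice schema is not shown in the slides):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    # Hypothetical fields; the real schema is defined by the Invoice data contract.
    invoice_id: str
    business_id: str
    customer_id: str
    amount: float
    paid: bool


def unpaid_by_business(invoices):
    """Map business_id -> list of that business's unpaid invoices."""
    result = defaultdict(list)
    for inv in invoices:
        if not inv.paid:
            result[inv.business_id].append(inv)
    return dict(result)


def unpaid_by_customer_per_business(invoices):
    """Map business_id -> {customer_id -> list of unpaid invoices}."""
    result = defaultdict(lambda: defaultdict(list))
    for inv in invoices:
        if not inv.paid:
            result[inv.business_id][inv.customer_id].append(inv)
    # Convert the nested defaultdicts to plain dicts before returning.
    return {biz: dict(customers) for biz, customers in result.items()}


if __name__ == "__main__":
    invoices = [
        Invoice("i1", "b1", "c1", 100.0, paid=False),
        Invoice("i2", "b1", "c1", 50.0, paid=True),
        Invoice("i3", "b1", "c2", 75.0, paid=False),
        Invoice("i4", "b2", "c3", 20.0, paid=False),
    ]
    by_business = unpaid_by_business(invoices)
    print({b: [i.invoice_id for i in v] for b, v in by_business.items()})
    # prints {'b1': ['i1', 'i3'], 'b2': ['i4']}
```

The in-memory version only illustrates the logic; at the scale described later in the deck, the same filter and group-by would more likely run as a batch or streaming job.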
Traditional Data Lake Architecture
Traditional Data Lake Architecture for Invoice
Invoice is our Data Domain
Let's ask some questions about the Invoice data:
◆ How do I find Invoice data for my use case?
◆ Who is the domain expert for Invoice data?
◆ What is the schema of the Invoice data?
◆ Where is Invoice data located for consumption?
◆ How can I get access to Invoice data, and who can approve it?
◆ Is there data derived from Invoice? How do I derive data from Invoice?
Data Mesh Concepts for Intuit
Organization & Discovery of Data
How do I find Invoice data for my use case?
Data Map
Organizes data by domain, sub-domain and bounded context
Data Product
The foundational unit of the Data Map; every data product is organized within it
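To make the Data Map hierarchy concrete, here is a small Python sketch that registers a data product under a domain, sub-domain and bounded context, then finds it by keyword. The class names, the `steward` field and the example path `money/receivables/invoicing` are hypothetical; they are not Intuit's actual Data Map API or taxonomy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    name: str
    steward: str  # hypothetical: the Data Steward who owns the product


@dataclass
class DataMap:
    # domain -> sub-domain -> bounded context -> {product name: DataProduct}
    tree: dict = field(default_factory=dict)

    def register(self, domain, sub_domain, context, product):
        contexts = self.tree.setdefault(domain, {}).setdefault(sub_domain, {})
        contexts.setdefault(context, {})[product.name] = product

    def find(self, keyword):
        """Return (path, product) pairs whose full path contains `keyword`."""
        keyword = keyword.lower()
        hits = []
        for domain, subs in self.tree.items():
            for sub_domain, contexts in subs.items():
                for context, products in contexts.items():
                    for name, product in products.items():
                        path = f"{domain}/{sub_domain}/{context}/{name}"
                        if keyword in path.lower():
                            hits.append((path, product))
        return hits


data_map = DataMap()
data_map.register("money", "receivables", "invoicing",
                  DataProduct("invoice", steward="invoice-team"))
data_map.register("customer", "profile", "contacts",
                  DataProduct("customer", steward="customer-team"))
for path, product in data_map.find("invoice"):
    print(path, product.steward)  # prints: money/receivables/invoicing/invoice invoice-team
```

Because each product sits at a fixed domain path, a keyword search doubles as a browse of the hierarchy, which is the discovery question the slide poses.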
Ownership of Data
Who is the domain expert for Invoice data?
Data Product
Consolidates the essential information that data consumers need
Data Steward
Defines the data product and is responsible for its contract
Data Contract
What is the schema of the Invoice data?
Semantic Model
Consolidates the essential modeling and schema information that helps data consumers understand the data
SLA
Defines the commitments of the data product's contract, such as data quality and data freshness
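A data contract pairs a schema with SLA commitments. The sketch below is a simplification under stated assumptions: it reduces the semantic model to a field-to-type mapping and the SLA to two hypothetical thresholds (maximum staleness and maximum null ratio), then reports violations for a batch of records. None of these names come from Intuit's platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    schema: dict              # field name -> expected Python type (simplified semantic model)
    max_staleness: timedelta  # hypothetical freshness SLA
    max_null_ratio: float     # hypothetical data-quality SLA


def check_contract(contract, records, last_updated, now=None):
    """Return a list of human-readable contract violations (empty if none)."""
    now = now or datetime.now(timezone.utc)
    violations = []
    nulls = total = 0
    # Schema check: each field must be null or have the expected type.
    for i, rec in enumerate(records):
        for name, typ in contract.schema.items():
            total += 1
            value = rec.get(name)
            if value is None:
                nulls += 1
            elif not isinstance(value, typ):
                violations.append(f"record {i}: {name} is not {typ.__name__}")
    # Quality SLA: share of null values across all checked fields.
    if total and nulls / total > contract.max_null_ratio:
        violations.append(f"null ratio {nulls / total:.2f} exceeds {contract.max_null_ratio}")
    # Freshness SLA: time since the data was last updated.
    if now - last_updated > contract.max_staleness:
        violations.append("data is staler than the freshness SLA")
    return violations
```

A consumer could run such a check before reading, and a producer before publishing; where the check actually runs is a design choice the slides leave open.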
Data Ports
Where is Invoice data located for consumption?
Data Assets
Provide the location and medium through which data can be consumed
Tag
Additional context for better discovery
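As an illustration of data ports, the Python sketch below models each data asset as a location plus a consumption medium, with tags as extra context, and filters assets by medium and required tags. The medium values, locations and tags are hypothetical examples, not Intuit's actual port types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    # A data port: where (location) and how (medium) the data can be consumed.
    product: str
    medium: str      # illustrative values such as "table" or "stream"
    location: str    # hypothetical table or topic name
    tags: frozenset = frozenset()


def find_assets(assets, medium=None, required_tags=()):
    """Return assets matching the medium (if given) and carrying all required tags."""
    required = set(required_tags)
    return [
        a for a in assets
        if (medium is None or a.medium == medium) and required <= a.tags
    ]


assets = [
    DataAsset("invoice", "table", "lake.invoice", frozenset({"batch", "pii"})),
    DataAsset("invoice", "stream", "invoice-events", frozenset({"realtime"})),
]
print([a.location for a in find_assets(assets, required_tags=["realtime"])])
# prints ['invoice-events']
```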
Access & Governance
How can I get access to Invoice data, and who can approve it?
Access Control List
Tracks explicit read and write access
Access is approved by the Data Steward
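The request-and-approve flow on this slide can be sketched as a small access control list in Python. This is a hypothetical model, assuming one steward per data product and only read and write grants; it is not the interface of Intuit's governance service.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class AccessControlList:
    steward: str                                 # the Data Steward who approves access
    grants: dict = field(default_factory=dict)   # principal -> set of Access
    pending: list = field(default_factory=list)  # (principal, Access) awaiting approval

    def request(self, principal, access):
        self.pending.append((principal, access))

    def approve(self, approver, principal, access):
        # Only the steward may approve; list.remove raises ValueError if never requested.
        if approver != self.steward:
            raise PermissionError(f"{approver} is not the Data Steward")
        self.pending.remove((principal, access))
        self.grants.setdefault(principal, set()).add(access)

    def can(self, principal, access):
        # Explicit grants only: anything not granted is denied.
        return access in self.grants.get(principal, set())


acl = AccessControlList(steward="invoice-steward")
acl.request("ml-team", Access.READ)
acl.approve("invoice-steward", "ml-team", Access.READ)
print(acl.can("ml-team", Access.READ), acl.can("ml-team", Access.WRITE))  # prints True False
```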
Data Concepts: Data Processing & Lineage
Is there data derived from Invoice? How do I derive data from Invoice?
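"Is there data derived from Invoice?" is a reachability question on a lineage graph. A minimal Python sketch, assuming each processing job reports the datasets it reads and writes; the job and dataset names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Edges point from an input dataset to the datasets derived from it."""

    def __init__(self):
        self.downstream = defaultdict(set)   # dataset -> datasets derived from it
        self.produced_by = defaultdict(set)  # dataset -> jobs that write it

    def add_job(self, job, inputs, outputs):
        # A processing job links every input it reads to every output it writes.
        for out in outputs:
            self.produced_by[out].add(job)
            for src in inputs:
                self.downstream[src].add(out)

    def derived_from(self, dataset):
        """All datasets transitively derived from `dataset` (breadth-first search)."""
        seen, queue = set(), deque([dataset])
        while queue:
            for nxt in self.downstream.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


lineage = LineageGraph()
lineage.add_job("find-unpaid", inputs=["invoice"], outputs=["unpaid_invoice"])
lineage.add_job("build-reminders", inputs=["unpaid_invoice", "customer"],
                outputs=["invoice_reminder"])
print(sorted(lineage.derived_from("invoice")))  # prints ['invoice_reminder', 'unpaid_invoice']
```

Recording lineage at deploy time, rather than reconstructing it later, is what lets this question be answered by a graph walk.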
Self-Serve Data Processing on Kubernetes
Scope of Data Processing at Intuit
Scale
◆ 2,000+ users
◆ 100,000+ pipelines (batch and streaming)
Variety of Users
◆ Data Engineers
◆ Data Scientists
◆ Machine Learning Engineers
◆ Data Analysts
Variety of Use Cases
Type
◆ Batch
◆ Streaming
Categories
◆ Model Training & Feature Generation
◆ Derivation & Enrichment
◆ Data Movement
Self-Serve Data Processing as a Paved Path
Operate & Monitor
◆ Log forwarding & metrics reporting
◆ Alerts & notifications
◆ DR failover & failback
Provision & Deploy
◆ Infrastructure provisioning
◆ Deployment of processing artifacts
◆ Data Map registration
◆ Lineage
Author & Define
◆ Authoring tools geared toward each user persona and level of expertise
◆ Access to inputs and outputs
◆ Scheduling & orchestration
Batch Processing Platform Architecture
Stream Processing Platform Architecture
Kubernetes Powers Data Processing
Intuit Kubernetes Service
◆ Core infrastructure layer
◆ Runs
– Control plane APIs
– Processing jobs
Argo Workflows & Events
◆ Scheduling & orchestration
◆ Deployment workflow
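Argo Workflows expresses scheduling through a `CronWorkflow` resource and orchestration through DAG templates. As an illustration only, the Python function below builds such a manifest as a dictionary and prints it as JSON. The workflow name, image, schedule and commands are hypothetical, the field names should be checked against the Argo Workflows version you run, and this is not Intuit's deployment workflow.

```python
import json


def cron_workflow(name, schedule, image, steps):
    """Build an Argo CronWorkflow manifest whose DAG runs `steps` in order.

    `steps` is a list of (step_name, command_list); each step depends on the
    previous one. Step names must be unique and must not be "main".
    """
    # One container template per step.
    templates = [
        {"name": step, "container": {"image": image, "command": cmd}}
        for step, cmd in steps
    ]
    # DAG tasks chained linearly through `dependencies`.
    tasks = []
    for i, (step, _) in enumerate(steps):
        task = {"name": step, "template": step}
        if i > 0:
            task["dependencies"] = [steps[i - 1][0]]
        tasks.append(task)
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "concurrencyPolicy": "Forbid",  # skip a run while the previous one is still running
            "workflowSpec": {
                "entrypoint": "main",
                "templates": [{"name": "main", "dag": {"tasks": tasks}}] + templates,
            },
        },
    }


if __name__ == "__main__":
    manifest = cron_workflow(
        "unpaid-invoice-reminders",
        "0 6 * * *",
        "example.registry/invoice-jobs:latest",
        [("extract", ["python", "extract.py"]), ("notify", ["python", "notify.py"])],
    )
    print(json.dumps(manifest, indent=2))
```

On a cluster with the Argo Workflows CRDs installed, the printed JSON could be piped to `kubectl apply -f -`. A linear chain keeps the example short; a real DAG would list several dependencies per task where steps can run in parallel.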
Learn more
Data Mesh
◆ Data Mesh Principles & Logical Architecture by Zhamak Dehghani
◆ Intuit’s Data Mesh Strategy
◆ Intuit’s Data Mesh Concepts
Data Processing
◆ How Intuit Built Stream Processing Platform with Flink
◆ Large Scale Batch Processing with Argo Workflow and Events
Q&A
