BIG DATA MODELING
Hans Hultgren
RMDC Fall 2016
Welcome
AGENDA
1) Big Data
2) Data Modeling
3) Big Data Modeling
Session Objectives
• Big Data Fundamentals
– Components of Big Data
– Structure & Schemas
– Tools & Architecture
• Data Modeling
– Integration & History
– Data Warehousing & BI
– Conceptual to Physical
• Big Data Modeling
– Focus on Meaning
• Ensemble Modeling
– The Blended Architecture
BIG DATA
Big Data
“Huge” Data Volumes
n-Structured & Very Complex
Streaming & Shape-Shifting
[Figure: Typical Data compared with Big Data; regions labeled A, B, C]
Big Data
• Volume
Huge Volumes of Data
• Velocity
Drinking from a Fire Hose
• Variety
n-Structured Data
• Veracity
Quality, Accuracy, Reliability, Trustworthiness
• Value
Business Value and Value Potential
Big Data Architecture
• To deal with the features of Big Data, supporting architectural components are based on:
– Data distribution, and
– Late Binding of Schemas
KVP
Modeling and Understanding
• Schema on Write
• Schema on Read
• Dismantled Schema on Write
• Schema on Focus
• Schema on Leverage
[Figure: LOAD, MODEL, APPLY, EXPLORE]
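The difference between Schema on Write and Schema on Read can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CUSTOMER_SCHEMA` fields, the function names, and the sample records are hypothetical and do not come from the slides. With schema on write, the structure is enforced before storing; with schema on read, the raw text is loaded untouched and a schema is bound late, when the data is used.

```python
import json

# Hypothetical schema: field name -> required Python type.
CUSTOMER_SCHEMA = {"customer_id": str, "name": str, "age": int}


def write_with_schema(store, record):
    """Schema on write: validate before the record is stored."""
    for field, expected_type in CUSTOMER_SCHEMA.items():
        if not isinstance(record.get(field), expected_type):
            raise ValueError(f"field {field!r} missing or not {expected_type.__name__}")
    store.append(record)


def write_raw(store, raw_line):
    """Schema on read, load step: store the raw text untouched."""
    store.append(raw_line)


def read_with_schema(store, fields):
    """Schema on read, read step: bind a schema only when the data is used."""
    for raw_line in store:
        doc = json.loads(raw_line)
        # Missing fields become None instead of failing the load.
        yield {field: doc.get(field) for field in fields}


typed_store, raw_store = [], []
write_with_schema(typed_store, {"customer_id": "C1", "name": "Ada", "age": 36})
write_raw(raw_store, '{"customer_id": "C2", "name": "Bo", "twitter": "@bo"}')
late_bound = list(read_with_schema(raw_store, ["customer_id", "twitter", "age"]))
```

In this sketch the second record would be rejected by `write_with_schema` (it has no `age`), yet it loads without complaint on the schema-on-read path, and the unexpected `twitter` field is still available to any reader that asks for it.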
Modeling and Understanding
• Big Data Possibilities
[Figure: LOAD, MODEL, APPLY, EXPLORE]
Inconvenient Truth about BIG DATA
http://community.embarcadero.com/blogs/entry/the-hidden-elephant-in-big-data-modeling
DATA MODELING
Data Modeling
Man's Search for Meaning…
• Conceptual Modeling
• Logical Modeling
• Information Modeling
• Physical Data Modeling
Ensemble Modeling™
All the parts of a thing taken together, so that each part is considered only in relation to the whole.
• The constellation of component parts acts as a whole.
• With Ensemble Modeling, the Core Business Concepts that we define and model are represented as a whole – an ensemble – including all of the component parts. An Ensemble is typically based on all things defining a Core Business Concept that can be uniquely and specifically said for one instance of that Concept.
Forms of Modeling & Ensemble
[Figure: forms of modeling. Labels: 3NF; Dimensional; Ensemble (Anchor, Focal Point, Data Vault, DV2.0, 2G, Hyper Agility, Temporal, 6NF, Matter, etc.); ERP, Acctg, Sales; EDW; Data Marts]
The Data Vault Ensemble
• The Data Vault Ensemble conforms to a single key – embodied in the Hub construct.
• The component parts for the Data Vault Ensemble include:
– Hub: The Natural Business Key
– Link: The Natural Business Relationships
– Satellite: All Context, Descriptive Data and History
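The three component parts can be sketched as plain Python data structures. This is a simplified illustration, not a physical Data Vault design: the class fields, the `record` and `current` methods, and the sample keys are assumptions made for the example, and satellites here attach only to a hub.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Hub:
    """One instance of a Core Business Concept, identified by its natural business key."""
    concept: str
    business_key: str


@dataclass(frozen=True)
class Link:
    """A natural business relationship between two or more hubs."""
    name: str
    hubs: tuple


@dataclass
class Satellite:
    """Context and descriptive data for one hub, kept as dated versions (history)."""
    parent: Hub
    name: str
    versions: list = field(default_factory=list)

    def record(self, load_date, attributes):
        # Append rather than overwrite so earlier states are preserved.
        self.versions.append((load_date, dict(attributes)))

    def current(self):
        # Latest version by load date; None if nothing was recorded.
        return max(self.versions, key=lambda v: v[0])[1] if self.versions else None


customer = Hub("Customer", "CUST-1001")
store = Hub("Store", "STORE-07")
shops_at = Link("CustomerStore", (customer, store))

details = Satellite(customer, "CustomerDetails")
details.record(datetime(2016, 1, 1), {"name": "Ada", "city": "Denver"})
details.record(datetime(2016, 6, 1), {"name": "Ada", "city": "Boulder"})
```

The hub carries nothing but the key, the link carries nothing but references to hubs, and everything that describes or changes lives in the satellite, where both versions of the customer's city remain available.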
Ensemble means thinking differently
[Figure: Customer shown as a single entity and as a Customer ensemble]
• The minimal construct then for an “entity” such as “Customer” is now (in data vault) a Hub with a set of Satellites
Applying the data vault modeling pattern
Data Vault Ensemble Modeling Process
1) Identify and Model the Core Business Concepts
• Business interviews are at the heart of this step: What do you do? What are the main things you work with?
• Find the best/target Natural Business Key
Data Vault Ensemble Modeling Process
2) Identify and Model the Natural Business Relationships
• Specific Unique Relationships
• Be mindful of the Unit of Work and Grain
Data Vault Ensemble Modeling Process
3) Analyze and Design the Context Satellites
• Consider Rate of Change, Type of Data, and also the Sources
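One way to picture step 3 is to group a concept's attributes into satellites by source and rate of change. The sketch below is only an assumption about how such a grouping could be expressed: the attribute list, the source names (CRM, POS, ERP), the "slow"/"fast" labels, and the satellite naming pattern are all hypothetical, and type of data is left out for brevity.

```python
from collections import defaultdict

# Hypothetical attribute profile for the Customer concept:
# (attribute, source system, rate of change). All values are illustrative.
CUSTOMER_ATTRIBUTES = [
    ("name", "CRM", "slow"),
    ("birth_date", "CRM", "slow"),
    ("loyalty_points", "POS", "fast"),
    ("last_visit", "POS", "fast"),
    ("credit_limit", "ERP", "slow"),
]


def design_satellites(concept, attributes):
    """Group attributes into one satellite per (source, rate of change) pair."""
    satellites = defaultdict(list)
    for attribute, source, rate in attributes:
        satellites[f"Sat_{concept}_{source}_{rate}"].append(attribute)
    return dict(satellites)


customer_satellites = design_satellites("Customer", CUSTOMER_ATTRIBUTES)
```

Keeping fast-changing attributes apart from slow-changing ones means a new loyalty point balance does not force a new version of the customer's name and birth date.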
BIG DATA MODELING
Ensemble Logical Form (ELF)
Logical business model
• Leveraged for all logical model needs including the data warehouse, big data lake, master data management (MDM) and operational integration initiatives
• Closely aligned to DV physical model
[Figure: ensemble model with Customer, Region, Store, Sale, Vendor, Product, Sale LI, Employee, shown in three forms]
Ensemble Logical Form
[Figure: ensemble model with Customer, Region, Store, Sale, Vendor, Product, Sale LI, Employee]
ELF Modeling maintained in:
* Metadata
* Logical Data Model
* Data Modeling Tools
* Virtual Schemas
* Other Tools or Artifacts
Map to Context Data stored in:
* JSON Docs
* XML (w/ XSD or Not)
* Blobs (Free Form Text)
* Big Data Platforms
* Hadoop
* In the Cloud
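A minimal sketch of this mapping, assuming the ELF model is maintained as simple metadata and the context arrives as JSON documents. The `ELF_MODEL` dictionary, the `key_field` entries, and the sample document are hypothetical and only illustrate the idea of attaching unparsed context to a modeled concept.

```python
import json

# Hypothetical ELF metadata: each concept names the JSON field that holds its business key.
ELF_MODEL = {
    "Customer": {"key_field": "customer_id"},
    "Store": {"key_field": "store_id"},
}


def map_to_ensemble(concept, raw_json):
    """Attach a raw JSON document to the ensemble it describes, leaving the context unparsed."""
    doc = json.loads(raw_json)
    key_field = ELF_MODEL[concept]["key_field"]
    return {
        "concept": concept,
        "business_key": doc[key_field],
        "context": raw_json,  # stored as-is; structure is interpreted later
    }


raw = '{"customer_id": "CUST-1001", "prefs": {"newsletter": true}, "notes": "VIP"}'
mapped = map_to_ensemble("Customer", raw)
```

Only the business key is pulled out at load time. The rest of the document stays in its original form, so the logical model gives it meaning without forcing it into columns.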
Three Paths for Modeling
Structured / Known
• CBC (Core Business Concepts)
• NBR (Natural Business Relationships)
• Attribution
• Columns
Results in a backbone model with attributes in defined columns
N-Structured / NVP
• CBC
• NBR
• Attribution
Results in a backbone model with known/expected attribute names/tags
N-Structured / KVP
• CBC
• NBR
Results in a backbone model with capacity to capture unknown attribution, either named/tagged or not
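The three paths can be contrasted with small Python structures. This is a sketch under assumptions: NVP and KVP are read here as name-value pairs and key-value pairs, and the sample keys, attribute names, and helper functions are invented for illustration.

```python
# Path 1, Structured / Known: attributes live in defined columns.
structured_row = {"customer_key": "CUST-1001", "name": "Ada", "city": "Denver"}

# Path 2, N-Structured / NVP: attribute names are known or expected,
# but each value is stored as a (key, name, value) row.
EXPECTED_NAMES = {"name", "city", "twitter"}
nvp_rows = [
    ("CUST-1001", "name", "Ada"),
    ("CUST-1001", "twitter", "@ada"),
]

# Path 3, N-Structured / KVP: the backbone key points at payloads whose
# attribution is unknown in advance and may or may not be tagged.
kvp_store = {
    "CUST-1001": ['{"shoe_size": 38}', "called about a late delivery"],
}


def unexpected_names(rows, expected):
    """Return NVP attribute names that the model did not anticipate."""
    return sorted({name for _, name, _ in rows} - expected)


def payload_count(store, business_key):
    """Count raw payloads captured for one business key."""
    return len(store.get(business_key, []))
```

In all three cases the backbone (the business key for the concept) is the same; what differs is how much of the attribution is fixed before the data arrives.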
APPLYING THE ENSEMBLE
Integration across Platforms
Expanded Applications
[Figure: ensemble model with Customer, Region, Store, Sale, Vendor, Product, Sale LI, Employee]
Summary
Ensemble in the Big Data World
• Conceptual Modeling (+)
• Logical Modeling (+)
• Information Modeling (+)
• Physical Data Modeling (-)
• Integration Platform (+ + +)
Links and Information
CDVDM Training & Certification
www.GeneseeAcademy.com
gohansgo
Hans@GeneseeAcademy.com
HansHultgren.WordPress.com
HansHultgren
Online, On-Demand Video Lessons
DataVaultAcademy.com
DataVaultAcademy
Book and e-Book: Modeling the Agile Data Warehouse with Data Vault
