Analyzing Semi-Structured
Data at Volume in the Cloud
Kevin Bair
Solution Architect
Kevin.Bair@snowflake.net
Topics this presentation will cover
1.  Structured vs. Semi-Structured
2.  ETL / Data Pipeline Architecture
3.  Analytics on Semi-Structured Data
Clickstream Demo
4.  Analyzing Structured with Semi-Structured Data
Twitter feed Demo
5.  Time permitting: Cloud Big Data / Data Warehousing
Today’s data: big, complex, moving to cloud
•  Surge in cloud spending and supporting technology (IDC)
•  Of workloads will be processed in cloud data centers (Cisco)
•  Data in the cloud today is expected to grow in the next two years (Gigaom)
Structured data and Semi-Structured data
Structured data
•  Transactional data
•  Relational
•  Fixed schema
•  OLTP / OLAP
Semi-Structured data
•  Machine-generated
•  Non-relational
•  Varying schema
•  Most common in cloud environments
What does Semi-Structured mean?
•  Data that may be of any type
•  Data that is variable in length (arrays)
•  Structure that can change rapidly and unpredictably
•  Usually self-describing
•  Examples: XML, Avro, JSON
XML Example
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>Light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>
  <food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
    <calories>900</calories>
  </food>
</breakfast_menu>
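To illustrate what "self-describing" means for the XML above, here is a minimal Python sketch that reads a shortened copy of the menu with the standard library's `xml.etree.ElementTree`. The shortened document and the variable names are assumptions made for illustration; they are not part of the original demo.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the breakfast menu (two <food> entries, descriptions omitted).
MENU_XML = """<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <calories>900</calories>
  </food>
</breakfast_menu>"""

# Parse from bytes so the encoding declaration in the prolog is honored.
root = ET.fromstring(MENU_XML.encode("utf-8"))

# Each <food> element names its own fields through its child tags,
# so rows can be built without a schema declared in advance.
rows = [{child.tag: child.text for child in food} for food in root.findall("food")]

for row in rows:
    print(row["name"], row["price"], int(row["calories"]))
```

The field names come from the tags in the data, not from a table definition, which is the property the slide calls self-describing.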
JSON Example
{
  "custkey": "450002",
  "useragent": {
    "devicetype": "pc",
    "experience": "browser",
    "platform": "windows"
  },
  "pagetype": "home",
  "productline": "television",
  "customerprofile": {
    "age": 20,
    "gender": "male",
    "customerinterests": [
      "movies",
      "fashion",
      "music"
    ]
  }
}
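The nesting and the array in this record can be explored programmatically. The sketch below, which assumes only Python's standard `json` module, walks the document and lists every leaf path with its type; the helper name `walk` is hypothetical.

```python
import json

EVENT_JSON = """
{
  "custkey": "450002",
  "useragent": {"devicetype": "pc", "experience": "browser", "platform": "windows"},
  "pagetype": "home",
  "productline": "television",
  "customerprofile": {
    "age": 20,
    "gender": "male",
    "customerinterests": ["movies", "fashion", "music"]
  }
}
"""

event = json.loads(EVENT_JSON)

def walk(value, path=""):
    """Yield (path, type name) for every leaf, reading the structure from the data itself."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from walk(child, f"{path}[{index}]")
    else:
        yield path, type(value).__name__

leaves = dict(walk(event))
for path, type_name in leaves.items():
    print(path, type_name)
```

A record with a longer `customerinterests` array or an extra nested object would simply produce more paths, with no schema change needed up front.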
  
Avro Example
(Figure: an Avro file pairs a schema, written as JSON, with data, stored in binary form.)
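Avro keeps the schema as JSON and the data as binary. The split can be sketched without an Avro library by hand-encoding one record according to Avro's binary encoding rules: `int` and `long` values are zigzag-encoded variable-length integers, strings are a length prefix followed by UTF-8 bytes, and a record is its fields written in schema order. The record name and fields are hypothetical, and the sketch leaves out the container file format; real code should use an Avro library.

```python
import json

# Hypothetical schema for illustration; Avro schemas are written in JSON.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "CustomerProfile",
  "fields": [
    {"name": "age", "type": "int"},
    {"name": "gender", "type": "string"}
  ]
}
""")

def encode_long(n: int) -> bytes:
    """Zigzag-encode a signed 64-bit integer, then emit it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """Length (as an Avro long) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

ENCODERS = {"int": encode_long, "string": encode_string}

def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate field encodings in schema order; field names are not written."""
    return b"".join(ENCODERS[f["type"]](record[f["name"]]) for f in schema["fields"])

payload = encode_record(SCHEMA, {"age": 20, "gender": "male"})
print(payload.hex())
```

The encoded record is six bytes long and contains no field names, so a reader needs the schema to interpret it.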
Why is this so hard for a traditional Relational DBMS?
•  The schema must be pre-defined
•  Documents end up stored in a Character Large Object (CLOB) data type
•  The structure is constantly changing
•  Inefficient to query
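The CLOB problem can be sketched with Python's built-in `sqlite3` module, using a `TEXT` column as a stand-in for a CLOB. The table name, sample events, and filter are invented for illustration. The database cannot see inside the stored text, so every row has to be fetched and parsed to answer a question about a nested attribute.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# The whole document is stored as opaque text, standing in for a CLOB column.
conn.execute("CREATE TABLE clicks (id INTEGER PRIMARY KEY, doc TEXT)")

# Invented sample events; note that the second one has no customerprofile at all.
events = [
    {"custkey": "450002", "productline": "television", "customerprofile": {"age": 20}},
    {"custkey": "450003", "productline": "none"},
    {"custkey": "450004", "productline": "audio", "customerprofile": {"age": 41}},
]
conn.executemany("INSERT INTO clicks (doc) VALUES (?)", [(json.dumps(e),) for e in events])

# A filter on a nested attribute cannot use column statistics or an index:
# the application parses every document itself.
parsed = 0
matches = []
for (doc,) in conn.execute("SELECT doc FROM clicks"):
    parsed += 1
    event = json.loads(doc)
    age = event.get("customerprofile", {}).get("age")
    if age is not None and age > 30:
        matches.append(event["custkey"])

print(parsed, matches)
```

All three rows are parsed to find one match, and the cost grows linearly with table size.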
Current architectures can’t keep up
Data Warehousing
•  Complex: manage hardware, data distribution, indexes, …
•  Limited elasticity: forklift upgrades, data redistribution, downtime
•  Costly: overprovisioning, significant care & feeding
Hadoop
•  Complex: specialized skills, new tools
•  Limited elasticity: data redistribution, resource contention
•  Not a data warehouse: batch-oriented, limited optimization, incomplete security
Data Pipeline / Data Lake Architecture – “ETL”
1. Source: Website Logs, Operational Systems, External Providers, Stream Data
2. Stage: S3 (10 TB)
3. Data Lake: Hadoop (30 TB)
4. Stage: S3 (5 TB, summary)
5. EDW: MPP (10 TB disk)
ETL vs ELT for Big Data
•  Think more strategically about file formats, size, storage methods, standards
•  Processing Power – Tools vs “Services”
•  Pipeline – Where should the analysis occur?
•  Platform
•  Unlimited Processing Power
•  Contention for resources
•  Support SQL for both Schema-on-write and Schema-on-read with full “indexing” for Structured / Semi-Structured
•  Compress, Clone metadata, don’t replicate…
Data Pipeline / Snowflake Architecture – “ELT”
1. Source: Website Logs, Operational Systems, External Providers, Stream Data
2. Stage: S3 (10 TB)
3. EDW: Snowflake (2 TB disk)
Demo Scenarios
•  Clickstream Analysis (load JSON, multi-table insert)
•  Which product category is most clicked on?
•  Which product line does the customer self-identify as having the most interest in?
•  Twitter Feed (join structured and semi-structured)
•  From our Twitter campaign, is there a correlation between Twitter volume and sales?
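The Twitter question comes down to joining a semi-structured feed with a structured sales table on a shared date key and then measuring correlation. The Python sketch below shows that shape using invented sample data and a simplified, hypothetical `day` field in place of a real tweet timestamp. It is not the demo's actual query.

```python
import json
from collections import Counter
from math import sqrt

# Invented sample data for illustration only.
TWEETS_JSON = """[
  {"day": "2015-06-01", "text": "tweet 1"},
  {"day": "2015-06-02", "text": "tweet 2"},
  {"day": "2015-06-02", "text": "tweet 3"},
  {"day": "2015-06-03", "text": "tweet 4"},
  {"day": "2015-06-03", "text": "tweet 5"},
  {"day": "2015-06-03", "text": "tweet 6"}
]"""

# Structured side: date -> sales amount, as it might come from a relational table.
SALES = {"2015-06-01": 10.0, "2015-06-02": 20.0, "2015-06-03": 25.0}

tweets = json.loads(TWEETS_JSON)
tweets_per_day = Counter(t["day"] for t in tweets)

# Join on the date key: keep only days present in both sources.
days = sorted(set(tweets_per_day) & set(SALES))
volume = [tweets_per_day[d] for d in days]
sales = [SALES[d] for d in days]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r = pearson(volume, sales)
print(days, volume, sales, round(r, 3))
```

A coefficient near 1 on real data would suggest that tweet volume and sales move together. It would not show that the campaign caused the sales.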
Clickstream Example
{
  "custkey": "450002",
  "useragent": {
    "devicetype": "pc",
    "experience": "browser",
    "platform": "windows"
  },
  "pagetype": "home",
  "productline": "none",
  "customerprofile": {
    "age": 20,
    "gender": "male",
    "customerinterests": [
      "movies",
      "fashion",
      "music"
    ]
  }
}
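The two clickstream questions can be approximated over records of this shape with a few lines of Python. The events below are invented, and treating `productline` as the click target and `customerinterests` as the self-identified interest is an assumption about how the demo maps the questions onto the data.

```python
import json
from collections import Counter

# Invented events in the shape of the clickstream record, one JSON document per line.
RAW = """\
{"custkey": "450002", "pagetype": "home", "productline": "none", "customerprofile": {"customerinterests": ["movies", "fashion", "music"]}}
{"custkey": "450002", "pagetype": "product", "productline": "television", "customerprofile": {"customerinterests": ["movies", "fashion", "music"]}}
{"custkey": "450007", "pagetype": "product", "productline": "television", "customerprofile": {"customerinterests": ["music"]}}
{"custkey": "450009", "pagetype": "product", "productline": "audio"}
"""
events = [json.loads(line) for line in RAW.splitlines() if line.strip()]

# Question 1: which product line is clicked most? "none" marks pages without one.
clicks = Counter(e["productline"] for e in events if e["productline"] != "none")

# Question 2: which interest is reported most often? Count each customer once,
# and tolerate events that carry no customerprofile.
interests_by_customer = {}
for e in events:
    profile = e.get("customerprofile", {})
    interests = profile.get("customerinterests", [])
    interests_by_customer.setdefault(e["custkey"], set()).update(interests)
interest_counts = Counter(i for seen in interests_by_customer.values() for i in seen)

top_line = clicks.most_common(1)[0]
top_interest = interest_counts.most_common(1)[0]
print(top_line, top_interest)
```

The `.get(...)` defaults matter here: the last event has no profile, which is the kind of schema variation a fixed relational layout handles poorly.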
  
Relational Processing of
Semi-Structured Data
1. Variant data type compresses storage of semi-structured data
2. Data is analyzed during load to discern repetitive attributes within the hierarchy
3. Repetitive attributes are columnar compressed and statistics are collected for relational query optimization
4. SQL extensions enable relational queries against both semi-structured and structured data
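Steps 2 and 3 can be pictured with a small Python sketch: scan the documents at load time, find attribute paths that occur in most of them, pull those into columns, and keep min/max statistics for numeric columns. This only illustrates the general idea and makes no claim about Snowflake's internal implementation; the function names, the 0.8 threshold, and the sample documents are hypothetical.

```python
from collections import Counter

def leaf_paths(value, prefix=""):
    """Yield (path, value) for every leaf reachable through nested objects."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, f"{prefix}.{key}" if prefix else key)
    else:
        yield prefix, value  # arrays are kept whole in this sketch

def shred(docs, threshold=0.8):
    """Extract frequently occurring paths into columns and collect min/max statistics."""
    flat = [dict(leaf_paths(d)) for d in docs]
    frequency = Counter(path for row in flat for path in row)
    frequent = sorted(p for p, n in frequency.items() if n / len(docs) >= threshold)
    columns = {p: [row.get(p) for row in flat] for p in frequent}
    stats = {}
    for path, values in columns.items():
        numbers = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numbers:
            stats[path] = (min(numbers), max(numbers))
    return columns, stats

# Invented sample documents; only the last one carries the rare "promo" attribute.
docs = [
    {"custkey": "450002", "pagetype": "home", "customerprofile": {"age": 20}},
    {"custkey": "450003", "pagetype": "product", "customerprofile": {"age": 35}},
    {"custkey": "450004", "pagetype": "home", "customerprofile": {"age": 41}, "promo": "spring"},
]

columns, stats = shred(docs)

# With min/max available, a predicate such as age > 50 can rule out this
# whole batch without reading any individual document.
can_skip_batch = stats["customerprofile.age"][1] <= 50
print(sorted(columns), stats, can_skip_batch)
```

The rare `promo` attribute stays out of the column set, while the common attributes become columns that a relational optimizer can reason about.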
FLATTEN() in Snowflake SQL
(Removing one level of nesting)
FLATTEN() converts a repeated field into a set of rows:
SELECT s.fullrow:fullName, t.value:name, t.value:age
FROM json_data_table AS s, TABLE(FLATTEN(input => s.fullrow, path => 'children')) t
WHERE s.fullrow:fullName = 'Mike Jones'
AND t.value:age::integer > 6;
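To make the row expansion concrete, here is a plain-Python sketch of what the query does: one output row per element of the `children` array, joined back to its parent record, then filtered. The sample rows are invented, and the `flatten` helper models only the `index` and `value` outputs of Snowflake's FLATTEN.

```python
import json

# Invented rows shaped like the fullrow column that the query assumes.
json_data_table = [
    json.loads('{"fullName": "Mike Jones", "children": '
               '[{"name": "Ann", "age": 9}, {"name": "Bo", "age": 4}]}'),
    json.loads('{"fullName": "Ana Ruiz", "children": [{"name": "Cy", "age": 12}]}'),
]

def flatten(document, path):
    """Yield one row per element of the array stored under `path` (one level of nesting)."""
    for index, value in enumerate(document.get(path, [])):
        yield {"index": index, "value": value}

# Equivalent of the SELECT: each parent row is paired with each of its children.
result = [
    (s["fullName"], t["value"]["name"], t["value"]["age"])
    for s in json_data_table
    for t in flatten(s, "children")
    if s["fullName"] == "Mike Jones" and int(t["value"]["age"]) > 6
]
print(result)
```

A parent with no `children` array contributes no rows, which mirrors the inner-join behavior of the comma join in the query.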
What makes Snowflake unique for
handling Semi-Structured Data?
•  Compression
•  Encryption / Role Based Authentication
•  Shredding
•  History/Results
•  Clone
•  Time Travel
•  Flatten
•  Regexp
•  No Contention
•  No Tuning
•  Infinitely scalable
•  SQL based with extremely high performance
One Platform for all Business Data
Snowflake
✓ One System
✓ One Common Skillset
✓ Faster/Less Costly Data Conversion
✓ For both Structured and Semi-Structured Business Data
Other Systems
✓ Multiple Systems
✓ Specialized Skillset
✓ Slower/More Costly Data Conversion
(Diagram labels: Data Sink, Structured Storage, HDFS, Map-Reduce Jobs, Relational Databases)
Structured data
Apple   101.12  250  FIH-2316
Pear    56.22   202  IHO-6912
Orange  98.21   600  WHQ-6090
Semi-structured data
{ "firstName": "John",
  "lastName": "Smith",
  "height_cm": 167.64,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
THANK YOU!

  • 19. z Map-Reduce Jobs One Platform for all Business Data Data Sink Structured Storage HDFS Relational Databases Snowflake ü One System ü One Common Skillset ü Faster/Less Costly Data Conversion ü For both Structured and Semi- Structured Business Data Apple 101.12 250 FIH-2316 Pear 56.22 202 IHO-6912 Orange 98.21 600 WHQ-6090 Structured data { "firstName": "John", "lastName": "Smith", "height_cm": 167.64, "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021-3100" }, Semi-structured data Other Systems ü Multiple Systems ü Specialized Skillset ü Slower/More Costly Data Conversion