DEBUNKING THE MYTHS 
Speaker 10 of 17 
Martin Willcox 
@Willcoxmnk 
What is a Data Lake, Anyway?
Followed by Anthony Miller
One of the Big Data labels that we risk over-loading to complete abstraction is the idea of a “Data Lake”…
2 © 2014 Teradata 
“…store all data present and future and create a centralised data archive location.”
“A large object-based repository that holds data in its native format”
“Sometimes called the bit bucket or the landing zone”
“All Water and Little Substance”
“As more and more applications are created that derive value from… new types of data… the Data Lake forms”
“Data lakes can help resolve the nagging problem of accessibility and data integration”
…and some of the discussions sound eerily familiar 
Data accessibility and integration? Isn’t that what the Data Warehouse is for?
So is the Data Lake a new architectural construct? 
Or are we just re-platforming Data Marts? 
Simple, single-subject-area Dimensional Data Marts – with all of the dimensions pre-joined to the fact table? One per workload / application?
Is this really the future of Enterprise Analytics? Or siloed, departmental Decision Support Systems circa 1995, warmed over?
Take the merits of the different technologies out of the equation – and this is what some of us are thinking…
…but there are no free lunches in Information Management – merely more and different options
Explicit or implicit, there is always, always, always (at least one) schema
Agile application development versus agile data acquisition
None of the information management strategies / technologies is magic – “pay me now, or pay me later”
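The point that a schema always exists, whether explicit or implicit, can be illustrated with a minimal Python sketch. It is not taken from the talk: the sample records, the field names and the helpers `read_with_schema` and `load` are all hypothetical. The raw lines are landed with no checks ("pay me later"), so the reader has to supply the schema and absorb the cost of inconsistent or missing fields.

```python
import json

# Hypothetical raw events, landed as-is with no schema enforced on write.
RAW_EVENTS = [
    '{"customer_id": 17, "amount": "12.50", "ts": "2014-03-01"}',
    '{"customerId": 18, "amount": 7, "ts": "2014-03-02"}',
    '{"customer_id": 19, "ts": "2014-03-03"}',
]


def read_with_schema(line):
    """Apply the reader's implicit schema: customer id (int) and amount (float)."""
    record = json.loads(line)
    # The reader must know every spelling the sources used for the same field.
    customer = record.get("customer_id", record.get("customerId"))
    amount = record.get("amount")
    if customer is None or amount is None:
        return None  # the record does not fit the schema the reader assumed
    return {"customer_id": int(customer), "amount": float(amount)}


def load(lines):
    """Split raw lines into usable rows and rejects; all checks happen at read time."""
    rows, rejects = [], []
    for line in lines:
        parsed = read_with_schema(line)
        if parsed is None:
            rejects.append(line)
        else:
            rows.append(parsed)
    return rows, rejects


rows, rejects = load(RAW_EVENTS)
print(len(rows), "usable rows;", len(rejects), "rejected")
```

Enforcing the same rules at write time ("pay me now") would move the work in `read_with_schema` into the acquisition step; the work itself does not go away.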
Big Data Are Plural 
For the foreseeable future, we will need multiple Information Management strategies – and multiple Information Management technologies
DATA WAREHOUSE
DISCOVERY PLATFORM
DATA PLATFORM
Integration becomes a critical concern
Gartner: Logical Data Warehouse
Forrester: Enterprise Data Hub
Teradata: Unified Data Architecture
A definition of the Data Lake (Data Reservoir) 
A centralised, consolidated, persistent store of raw, un-modelled and un-transformed data from multiple sources / silos (without an explicit, pre-defined schema, without externally defined metadata – and without guarantees about the quality, provenance and security of the data)
Agile data acquisition – a haystack to go looking for needles…
…with a natural storage model for complex, multi-structured data…
…support for efficient non-relational computation…
…and provision for cost-effective storage of large and noisy data-sets.
Now that is new, interesting and (potentially) very, very useful…
Data. Science
Left to its own devices, does nature tend to give us a single, beautiful lake? Or a messy patchwork of lakes, plural?
STOP PRESS: Laws of Physics* Unchanged!
(* More specifically, the 2nd Law of Thermodynamics)
None of the new information management strategies and technologies is by itself a cure for information entropy – data silos form naturally, just like lakes form naturally
Summary and conclusions

