DATA VIRTUALIZATION
APAC WEBINAR SERIES
Sessions Covering Key Data
Integration Challenges Solved
with Data Virtualization
How Data Virtualization adds value to your data
science stack
Chris Day
Director, APAC Sales Engineering, Denodo
Sushant Kumar
Product Marketing Manager, Denodo
Agenda
1. The data science stack
2. The data science workflow
3. Logical data lake architecture
4. Data virtualization features for data scientists
5. Demo
6. Q&A
7. Next Steps
How Data Virtualization adds value
to your data science stack
Product Marketing Manager, Denodo
Sushant Kumar
The Tools of Data Science
When thinking about data science, most
minds immediately go to languages like
Python and R, or tools like Spark and
TensorFlow
There are myriad projects that currently
serve the needs of data scientists
The Data Scientist Workflow
A typical workflow for a data scientist is:
1. Gather the requirements for the business problem
2. Identify useful data
▪ Ingest data
3. Cleanse data into a useful format
4. Analyze data
5. Prepare input for your algorithms
6. Execute data science algorithms (ML, AI, etc.)
▪ Iterate steps 2 to 6 until valuable insights are produced
7. Visualize and share
Source:
http://sudeep.co/data-science/Understanding-the-Data-Science-Lifecycle/
Where does your time go?
A large amount of time and effort goes into tasks not intrinsically related to data science:
• Finding where the right data may be
• Getting access to the data
• Bureaucracy
• Understanding access methods and technology (NoSQL, REST APIs, etc.)
• Transforming data into a format easy to work with
• Combining data originally available in different sources and formats
• Profiling and cleansing data to eliminate incomplete or inconsistent data points
Reference Architecture
[Diagram. Components shown: ETL, Data Warehouse, Kafka, Spark Streaming, SparkML, SQL interface, Distributed Storage (HDFS, S3), Physical Data Lake, Files, Logical Data Lake]
Data Scientist Flow
1. Identify useful data
2. Modify data into a useful format
3. Analyze data
4. Prepare for ML algorithm
5. Execute data science algorithms (ML, AI, etc.)
Identify useful data
If the company has a virtual layer with good coverage of its data
sources, this task is greatly simplified
• A data virtualization tool like Denodo can offer unified access to
all data available in the company
• It abstracts the technologies underneath, offering a standard
SQL interface to query and manipulate data
To further simplify the challenge, Denodo offers a Data
Catalog to search, find and explore your data assets
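
The idea of one SQL interface over several sources can be sketched in a few lines. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for the virtual layer, the two attached in-memory databases (`crm`, `billing`) stand in for separate source systems, and all table and column names are hypothetical. It does not use Denodo's own drivers or API.

```python
import sqlite3

# Assumption: sqlite3 plays the role of the virtual layer. Each ATTACH creates
# an independent in-memory database that stands in for a separate source.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

# Hypothetical source tables.
conn.execute("CREATE TABLE crm.customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoice (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO billing.invoice VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])


def total_billed_per_customer(connection):
    """Join data from both 'sources' with one standard SQL statement."""
    query = """
        SELECT c.name, SUM(i.amount) AS total
        FROM crm.customer AS c
        JOIN billing.invoice AS i ON i.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return connection.execute(query).fetchall()


rows = total_billed_per_customer(conn)
for name, total in rows:
    print(name, total)
```

The point of the sketch is that the caller writes a single query and never needs to know which source holds which table.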
Search & Explore: Metadata
Search the catalog and refine your results using descriptions, tags and business
categories
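
As a rough illustration of searching by description and then refining by tag or business category, the following toy sketch filters an in-memory list of catalog entries. The `CatalogEntry` class, the `search_catalog` function and the sample entries are all hypothetical and are not part of the Denodo Data Catalog.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record: a view name plus descriptive metadata."""
    name: str
    description: str
    tags: frozenset = field(default_factory=frozenset)
    category: str = ""


def search_catalog(entries, keyword="", tag=None, category=None):
    """Return names of entries matching the keyword, then refine by tag/category."""
    keyword = keyword.lower()
    results = []
    for entry in entries:
        text = (entry.name + " " + entry.description).lower()
        if keyword and keyword not in text:
            continue  # keyword must appear in the name or description
        if tag is not None and tag not in entry.tags:
            continue  # refine by tag
        if category is not None and entry.category != category:
            continue  # refine by business category
        results.append(entry.name)
    return results


catalog = [
    CatalogEntry("customer_360", "Unified customer profile",
                 frozenset({"customer", "pii"}), "Sales"),
    CatalogEntry("invoice_summary", "Monthly invoice totals per customer",
                 frozenset({"finance"}), "Finance"),
    CatalogEntry("web_sessions", "Clickstream sessions",
                 frozenset({"web"}), "Marketing"),
]

matches = search_catalog(catalog, keyword="customer")
refined = search_catalog(catalog, keyword="customer", category="Finance")
print(matches)
print(refined)
```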
Search & Explore: Content
Integration with Lucene and ElasticSearch for indexing and performing
keyword-based searches on the content
Document your models
Rich HTML descriptions, editable directly from the catalog
Extended metadata support to enrich the catalog with custom fields and details
Ingestion and Data Manipulation tasks
• Typically, data scientists get data from a variety of places through
various formats and protocols, from relational databases to
REST web services or NoSQL engines
• Data is often exported into CSV files or loaded into Spark
• Later, that data is manipulated in scripts (e.g. Python with
pandas)
• However, data virtualization offers the unique opportunity of
using standard SQL (joins, aggregations, transformations, etc.)
to access, manipulate and analyze any data
• Cleansing and transformation steps can be easily accomplished in
SQL
• Its modeling capabilities enable the definition of views that embed
this logic to foster reusability
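
A minimal sketch of cleansing in SQL and packaging it as a reusable view, again using sqlite3 as a stand-in engine. The `raw_orders` table, the `clean_orders` view and the sample rows are assumptions made for illustration; in a virtualization tool the view would be defined in that tool's own SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw data: inconsistent names, amounts stored as text.
    CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount TEXT);
    INSERT INTO raw_orders VALUES
        (1, '  Acme ', '100.50'),
        (2, 'acme',    '49.50'),
        (3, 'Globex',  NULL),
        (4, 'GLOBEX',  '20');

    -- The view embeds the cleansing logic once, so every consumer reuses it.
    CREATE VIEW clean_orders AS
    SELECT order_id,
           UPPER(TRIM(customer)) AS customer,  -- normalize names
           CAST(amount AS REAL)  AS amount     -- fix the data type
    FROM raw_orders
    WHERE amount IS NOT NULL;                  -- drop incomplete rows
""")

# Consumers query the view with ordinary SQL and never repeat the cleansing.
totals = conn.execute("""
    SELECT customer, SUM(amount)
    FROM clean_orders
    GROUP BY customer
    ORDER BY customer
""").fetchall()
print(totals)
```

Because the logic lives in the view rather than in each notebook or script, a fix to the cleansing rules reaches every query that reads `clean_orders`.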
Denodo Administration Tool
Notebooks: Apache Zeppelin
Denodo and Spark: data science with large volumes
Spark as a source: Spark, as well as many other Hadoop systems (Hive, Presto, Impala,
HBase, etc.), can be used by Denodo as a data source to read data
• Denodo will push down the execution to those systems, translating SQL into their
corresponding dialects
Spark as the processing engine: In cases where Denodo needs to post-process data,
for example in multi-source queries, Denodo is able to lift and shift to automatically
use Spark’s engine for execution
Spark as the data target: Denodo can automatically save the data from any execution
in a target Spark cluster when your processing needs (e.g. SparkML) require local data
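
Why pushing execution down to the sources matters can be shown with a toy comparison. This is a sketch under assumptions, not Denodo's optimizer: two in-memory sqlite3 databases stand in for remote sources, and the `sales` table, its rows and both functions are hypothetical. One function pulls every row and aggregates locally; the other sends the aggregation to each source and combines only the partial results.

```python
import sqlite3


def make_source(rows):
    """Create a stand-in 'remote source' holding a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


source_a = make_source([("APAC", 10.0), ("APAC", 5.0), ("EMEA", 7.0)])
source_b = make_source([("APAC", 1.0), ("EMEA", 3.0)])


def totals_without_pushdown(sources):
    """Fetch every raw row from each source and aggregate in the middle layer."""
    totals, rows_transferred = {}, 0
    for src in sources:
        for region, amount in src.execute("SELECT region, amount FROM sales"):
            rows_transferred += 1
            totals[region] = totals.get(region, 0.0) + amount
    return totals, rows_transferred


def totals_with_pushdown(sources):
    """Let each source compute partial sums; combine only the partial results."""
    totals, rows_transferred = {}, 0
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    for src in sources:
        for region, partial in src.execute(query):
            rows_transferred += 1
            totals[region] = totals.get(region, 0.0) + partial
    return totals, rows_transferred


naive_totals, naive_rows = totals_without_pushdown([source_a, source_b])
pushed_totals, pushed_rows = totals_with_pushdown([source_a, source_b])
print(naive_totals, naive_rows)
print(pushed_totals, pushed_rows)
```

Both strategies return the same totals, but the pushed-down version moves one row per region per source instead of every raw row, and that gap grows with data volume.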
Product Demonstration
Director, APAC Sales Engineering, Denodo
Chris Day
Key Takeaways
✓ Denodo can play a key role in the data science ecosystem
to reduce data exploration and analysis timeframes
✓ Extends and integrates with the capabilities of notebooks,
Python, R, etc. to improve the toolset of the data scientist
✓ Provides a modern “SQL-on-Anything” engine
✓ Can leverage Big Data technologies like Spark (as a data
source, an ingestion tool and for external processing)
to efficiently work with large data volumes
✓ Helps productionalize data science
Q&A
Next Steps
Access Denodo Platform in the Cloud!
Take a Test Drive today!
https://bit.ly/2AouQLQ
GET STARTED TODAY
Q&A
Next Session: Jun 25
What is the future of data strategy?
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
