Integration of SAP HANA
with Hadoop
Author Biography
Ramkumar Rajendran
Ramkumar Rajendran is a Consultant at a leading firm with four years of experience. He specializes in tools such as SAP HANA, SAP BI, SAP BO (Xcelsius, Webi and IDT), Tableau, Lumira and Hadoop Hive. He has worked on sentiment analysis of Twitter data, has been involved in the integration of HANA and Hadoop, and has worked on multiple implementation projects across various industry sectors.
Table of Contents
1 About this document
2 Introduction
   SAP HANA
   Hadoop
3 Combined Potential of HANA and Hadoop
4 Scenarios of Hadoop and HANA integration
   Federated Data Query through Smart Data Access (SDA)
   Business Objects Data Services
   SQOOP
   JAVA Program
5 Summary
6 References
About this document
This document discusses the combined potential of the in-memory database SAP HANA and the big data solution Hadoop, the various methods of integrating the two technologies, and the scenarios in which each method is applicable.
SAP HANA specializes in real-time in-memory processing, while Hadoop is suited to massively parallel processing. Integrating the two technologies combines the advantages of both.
Hadoop handles both structured and unstructured data from sources such as social media and machine logs, which can be used alongside the transactional data in HANA to enable more mature business analysis.
This document has been prepared based on SAP HANA SP6 and Hadoop CDH 4.5.
Introduction
SAP HANA
SAP HANA is an innovative in-memory database and data management platform,
specifically developed to take full advantage of the capabilities provided by modern
hardware to increase application performance. By keeping all relevant data in main
memory, data processing operations are significantly accelerated.
Design for scalability is a core SAP HANA principle. SAP HANA can be distributed across multiple hosts to achieve scalability in terms of both data volume and user concurrency. Unlike clusters, distributed HANA systems also distribute the data efficiently, achieving high scaling without I/O locks.
The key performance indicators of SAP HANA appeal to many customers, and thousands of deployments are in progress. SAP HANA has become the fastest-growing product in SAP's 40+ year history.
Hadoop
Hadoop is an open source software project that enables the distributed processing of large
data sets across clusters of commodity servers. It is designed to scale up from a single
server to thousands of machines, with a very high degree of fault tolerance. Rather than
relying on high-end hardware, the resiliency of these clusters comes from the software’s
ability to detect and handle failures at the application layer.
Hadoop is known for its massively parallel processing capabilities on large datasets. It is also scalable, cost-effective owing to its use of commodity processors, flexible and fault tolerant.
Combined Potential of HANA and Hadoop
Hadoop can store huge amounts of data. It is well suited to storing unstructured data, is good at manipulating very large files, and is tolerant of hardware and software failures. The main challenge with Hadoop, however, is getting information out of this huge volume of data in real time.
HANA is well suited to processing data in real time, thanks to its in-memory technology.
By integrating Hadoop's massively parallel processing with HANA's in-memory computing capabilities, the resulting solution would be capable of the following:
• Accommodating both structured and unstructured data.
• Providing cost-efficient data storage and processing for large volumes of data.
• Performing complex information processing.
• Enabling heavily recursive algorithms, machine learning and queries that cannot be easily expressed in SQL.
• Archiving low-value data that stays available, though with slower access.
• Mining raw data that is either schema-less or whose schema changes over time.
Scenarios of Hadoop and HANA integration
The four integration options are summarized below. In every case Hadoop acts as the source, SAP HANA as the analytical layer, and the reporting tools consume data from HANA:
• Smart Data Access (SDA): federated data query through SDA; no data loading into HANA.
• Business Objects Data Services (BODS): data loading from Hadoop to HANA; PULL mechanism.
• SQOOP: data loading from Hadoop to HANA; PUSH mechanism.
• Java program: data loading from Hadoop to HANA; PUSH or PULL mechanism.
Federated Data Query through Smart Data Access (SDA)
SAP HANA smart data access enables remote Hadoop data to be accessed as if it were held in local tables in SAP HANA, without loading the data into SAP HANA.
Not only does this capability provide operational and cost benefits, but most importantly it
supports the development and deployment of the next generation of analytical applications
which require the ability to access, synthesize and integrate data from multiple systems in
real-time regardless of where the data is located or what systems are generating it.
Specifically, in SAP HANA we can create virtual tables which point to remote tables in Hadoop. Customers can then write SQL queries in SAP HANA that operate on these virtual tables. The SAP HANA query processor optimizes these queries, executes the relevant part of the query in the target database, returns the results to SAP HANA, and completes the operation.
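As an illustration, the sketch below shows how a Hive table might be exposed and queried through SDA from a Java client over JDBC. The remote source name, DSN, schema, table, host and credentials are placeholder assumptions, and the exact SDA syntax can vary between HANA revisions, so treat this as a minimal sketch rather than a definitive recipe.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SdaVirtualTableSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder HANA connection details; the HANA JDBC driver (ngdbc.jar) must be on the classpath.
        try (Connection hana = DriverManager.getConnection(
                "jdbc:sap://hanahost:30015/", "SYSTEM", "secret");
             Statement stmt = hana.createStatement()) {

            // One-time setup: register Hive as a remote source via the Hive ODBC adapter.
            stmt.execute("CREATE REMOTE SOURCE HIVE_SRC ADAPTER \"hiveodbc\" "
                    + "CONFIGURATION 'DSN=HIVE_DSN' "
                    + "WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hive;password=hive'");

            // Expose a Hive table as a virtual table in HANA; no data is copied.
            stmt.execute("CREATE VIRTUAL TABLE MYSCHEMA.VT_WEBLOGS "
                    + "AT \"HIVE_SRC\".\"HIVE\".\"default\".\"weblogs\"");

            // Federated query: the aggregation is pushed down to Hadoop where possible,
            // so only the aggregated result set travels back to HANA.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT page, COUNT(*) AS hits FROM MYSCHEMA.VT_WEBLOGS GROUP BY page")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                }
            }
        }
    }
}

Once such a virtual table exists, any reporting tool connected to HANA can query it like a normal table, with the heavy processing delegated to Hive.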
Recommended Scenarios
Using SDA to access Hadoop from HANA means that a federated query is fired on Hadoop whenever a report is executed. This technique is recommended when the reporting query generates a large result set on the Hadoop side: Smart Data Access aggregates the dataset on Hadoop using Hadoop's own system resources, so only the end results are transferred from Hadoop to HANA.
Advantages of this technique
• Real-time access to Hadoop data without actually having to load it into HANA.
• Helps in scenarios where the data residing in Hadoop is updated very frequently, so that data loading would make little sense.
• Queries can be optimized by pushing processing down to Hadoop, which then returns only aggregated data.
Disadvantages of this technique
• Federated queries slow down when heavy processing needs to be done on the data at the Hadoop end.
• Data transformation is not possible while using Smart Data Access.
• With this technique the reporting query is also fired on Hadoop, which makes it critical for Hadoop to be up at all times; with multiple Hadoop systems the risk increases.
• Data can only be extracted from Hive.
• Data access can happen only from Hadoop to HANA.
Business Objects Data Services
SAP Data Services delivers a single enterprise-class solution for data integration, data quality, data profiling and text data processing. This technique uses a PULL mechanism to move data from Hadoop to HANA, so the entire process is controlled from BODS.
This wide range of features helps to:
• Integrate, transform, improve and deliver trusted data from Hadoop to HANA.
• Provide development user interfaces, a metadata repository, a data connectivity layer, a run-time environment and a management console, enabling IT organizations to lower total cost of ownership and accelerate time to value.
• Enable IT organizations to maximize operational efficiency with a single solution for improving data quality and gaining access to heterogeneous sources and applications.
Recommended Scenarios
Integrating HANA with Hadoop using BODS involves loading data on a scheduled basis. It can be used in scenarios that have no requirement for real-time reporting but involve complex calculations on large datasets. This technique proves very effective in scenarios involving multiple Hadoop systems with a variety of unstructured data to be processed on a large scale.
Advantages of this technique
• Unstructured data can be loaded from Hadoop to HANA, with all transformations applied during loading.
• It is better suited to loading large datasets.
• BODS can be used to implement complex transformations while loading data from Hadoop to HANA.
• HANA performance can be improved by moving complex calculations to BODS.
• Its error-handling capabilities help with support and maintenance.
• Data encryption functions for sensitive data are one of the niche benefits of loading through BODS.
• Centralized monitoring favors better IT support.
• Delta loads are also supported.
• Data transfer can happen in both directions.
Disadvantages of this technique
• Data present in Hadoop is not available in real time, since BODS loads data from Hadoop to HANA as a batch job.
SQOOP
SQOOP is a tool designed for efficiently transferring bulk data between Hadoop and structured data stores such as Oracle, Microsoft SQL Server and SAP HANA. SQOOP can be used to import data from external structured data stores into the Hadoop Distributed File System or related systems like Hive and HBase. Conversely, SQOOP can be used to extract data from Hadoop and export it to external structured data stores such as relational databases and enterprise data warehouses.
SQOOP provides a pluggable connector mechanism for optimal connectivity to external
systems. The SQOOP extension API provides a convenient framework for building new
connectors. New connectors can be dropped into SQOOP installations to provide
connectivity to various systems. SQOOP itself comes bundled with various connectors that
can be used for popular database and data warehousing systems.
With SQOOP, data transfer is automated through batch jobs, and native tools are used for high-performance transfer. SQOOP uses data store metadata to infer structure definitions, and it leverages Hadoop's MapReduce framework to transfer data in parallel, which pays off for very large volumes of data. It also provides an extension mechanism to incorporate high-performance connectors for external systems.
For exporting data to external targets, SQOOP supports staging tables, which considerably improve the efficiency of data transfer and also insulate the target from data corruption in the event of failures.
This technique uses a PUSH mechanism to load data from Hadoop to HANA, so the entire process is controlled by SQOOP on the Hadoop side.
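As a hedged sketch of what such a batch job might look like, the example below drives a SQOOP export from HDFS into a HANA table using Sqoop 1.x's programmatic entry point (org.apache.sqoop.Sqoop.runTool); the same arguments can equally be passed to the "sqoop export" command line. The connection URL, credentials, table names, HDFS path and mapper count are placeholder assumptions, and the SAP HANA JDBC driver (ngdbc.jar) is assumed to be on the classpath.

import org.apache.sqoop.Sqoop;

public class HdfsToHanaExport {
    public static void main(String[] args) {
        // Placeholder connection details, table names and paths; adjust for the actual landscape.
        String[] sqoopArgs = {
            "export",
            "--connect", "jdbc:sap://hanahost:30015/",
            "--driver", "com.sap.db.jdbc.Driver",            // SAP HANA JDBC driver
            "--username", "SYSTEM",
            "--password", "secret",
            "--table", "SALES",                              // target table in HANA
            "--export-dir", "/user/hive/warehouse/sales",    // source files in HDFS
            "--input-fields-terminated-by", "\001",          // Hive's default field delimiter
            "--staging-table", "SALES_STG",                  // staging table insulates against partial failures
            "--clear-staging-table",
            "--num-mappers", "4"                             // degree of MapReduce parallelism
        };
        // Sqoop 1.x exposes the command-line tool programmatically; the export runs as
        // MapReduce tasks that write to the HANA target in parallel.
        int exitCode = Sqoop.runTool(sqoopArgs);
        System.exit(exitCode);
    }
}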
Recommended Scenarios
SQOOP is a component in the Hadoop ecosystem which helps transfer data from HDFS to external databases and vice versa. This technique of integrating SAP HANA with Hadoop involves periodically loading data directly from the underlying Hadoop files into HANA tables. SQOOP does not support any transformation while transferring data, so this technique suits scenarios that require no real-time reporting and whose source data is already well formatted and needs no cleansing. It is also best suited for bulk data transfers, since SQOOP uses Hadoop's underlying MapReduce framework to transfer data in parallel.
Advantages of this technique
• It is better suited to loading bulk datasets.
• Data transfers can happen in both directions.
• It is open source and hence cost-effective.
Disadvantages of this technique
• Data present in Hadoop is not available in real time, since SQOOP loads data from Hadoop to HANA as a batch job.
• No cleansing or formatting of the data can be done with SQOOP.
JAVA Program
A Java program can be used to load data from Hadoop to HANA through JDBC connectivity. This technique of HANA-Hadoop integration offers a very high level of customization in terms of cleansing, transformation, refining, filtering, etc. Both PUSH and PULL mechanisms can be implemented to transfer data from Hadoop to HANA, depending on where the program is installed and scheduled.
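A minimal sketch of such a loader is shown below, assuming the Hive JDBC (HiveServer2) and SAP HANA JDBC drivers are on the classpath; the hosts, ports, credentials, table and column names are placeholders chosen for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveToHanaLoader {
    public static void main(String[] args) throws Exception {
        // Placeholder hosts, ports, credentials and table names; adjust for the actual landscape.
        try (Connection hive = DriverManager.getConnection(
                 "jdbc:hive2://hadoophost:10000/default", "hive", "");
             Connection hana = DriverManager.getConnection(
                 "jdbc:sap://hanahost:30015/", "SYSTEM", "secret")) {

            hana.setAutoCommit(false);

            try (Statement read = hive.createStatement();
                 // Filtering can already be pushed into the HiveQL query...
                 ResultSet rs = read.executeQuery(
                     "SELECT user_id, page, hits FROM weblogs WHERE hits > 0");
                 // ...and further cleansing applied in Java before writing to the HANA target.
                 PreparedStatement write = hana.prepareStatement(
                     "INSERT INTO MYSCHEMA.WEBLOG_SUMMARY (USER_ID, PAGE, HITS) VALUES (?, ?, ?)")) {

                int batched = 0;
                while (rs.next()) {
                    String userId = rs.getString(1);
                    write.setString(1, userId == null ? null : userId.trim()); // simple cleansing step
                    write.setString(2, rs.getString(2));
                    write.setLong(3, rs.getLong(3));
                    write.addBatch();
                    if (++batched % 1000 == 0) {      // flush in batches to keep memory use bounded
                        write.executeBatch();
                        hana.commit();
                    }
                }
                write.executeBatch();
                hana.commit();
            }
        }
    }
}

Cleansing, filtering and transformation logic can sit either in the HiveQL SELECT or in the Java loop before the rows are written to HANA, which is where this approach earns its flexibility.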
Recommended Scenarios
Data transfer from Hadoop to HANA using a Java program is recommended in scenarios that involve relatively small data transfers. This technique gives developers a very high level of control, so they can come up with a highly customized solution.
Advantages of this technique
• It offers the greatest degree of customization.
• Java is open source, so it is a cost-effective solution.
• The Java program can be executed from the command line and does not require any additional setup to host.
Disadvantages of this technique
• It requires a high level of programming skill.
• Error tracking and debugging become difficult.
Summary
The integration of HANA with Hadoop enables customers to move data between Hive, Hadoop's Distributed File System and SAP HANA. Hadoop is good at processing bulk data at a much lower cost. Hence, if a particular chunk of data is not very valuable to users and is not accessed often, storing it in HANA would be cost-prohibitive.
By combining SAP HANA and Hadoop, customers get the power of instant access with SAP HANA and near-infinite scale with Hadoop. This gives SAP users a broad range of options for storing and analyzing new types of data, and the ability to create applications that uncover new business opportunities from vast amounts of data, which was not previously possible.
References
http://blog.cloudera.com/blog/
https://www.brighttalk.com/webcast/9727/86361
http://scn.sap.com/community/developer-center/hana/blog/2014/01/27/exporting-and-importing-
data-to-hana-with-hadoop-sqoop
http://www.saphana.com/docs/DOC-2934