Introduction to SQOOP
Agenda
• What is Sqoop
• Why Sqoop?
• How Sqoop Works
• Sqoop Architecture
• Sqoop Import
• Sqoop Export
What is Sqoop
• Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
• Sqoop imports data from external structured datastores into HDFS or related systems such as Hive and HBase.
• Sqoop can also export data from Hadoop to external structured datastores such as relational databases and enterprise data warehouses.
Why Sqoop?
• As more organizations deploy Hadoop to analyse vast streams of information, they may find they need to transfer large amounts of data between Hadoop and their existing databases, data warehouses and other data sources.
• Loading bulk data into Hadoop from production systems, or accessing it from MapReduce applications running on a large cluster, is challenging, since transferring data using scripts is inefficient and time-consuming.
• Sqoop allows data imports from external datastores and enterprise data warehouses into Hadoop.
• It parallelizes data transfer for fast performance and optimal system utilization.
• It copies data quickly from external systems to Hadoop.
• It makes data analysis more efficient.
How Sqoop Works
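As a rough illustration of the parallel transfer mentioned earlier: Sqoop runs an import as several map tasks, each reading its own slice of the table, with the slices derived from the minimum and maximum values of a split column. The sketch below is not Sqoop's code; it only shows the idea of cutting a numeric key range into contiguous slices. The function name split_ranges and the sample range (keys 1 to 1000, 4 tasks) are assumptions made for this example.

```shell
#!/bin/sh
# Illustrative only: divide the inclusive key range [min, max] into at most
# n contiguous slices, one per task. Any remainder is spread over the first slices.
split_ranges() (
  min=$1; max=$2; n=$3
  total=$(( max - min + 1 ))   # number of keys in the range
  base=$(( total / n ))        # minimum slice size
  extra=$(( total % n ))       # first "extra" slices get one more key
  lo=$min
  i=0
  while [ "$i" -lt "$n" ]; do
    size=$base
    if [ "$i" -lt "$extra" ]; then size=$(( size + 1 )); fi
    if [ "$size" -gt 0 ]; then
      echo "$lo $(( lo + size - 1 ))"   # lower and upper bound of this slice
      lo=$(( lo + size ))
    fi
    i=$(( i + 1 ))
  done
)

# Hypothetical table with keys 1..1000, read by 4 tasks.
split_ranges 1 1000 4
# prints:
# 1 250
# 251 500
# 501 750
# 751 1000
```

Each printed pair is the kind of bound a single task could apply in its own query, so the tasks read disjoint parts of the table at the same time.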
Sqoop Architecture
Sqoop Import
• sqoop import --connect jdbc:postgresql://hdp-master/sqoop_db --username sqoop_user --password postgres --table cities
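The import command passes the password in clear text on the command line. A hedged variant, assuming a reasonably recent Sqoop 1.4.x release: --password-file reads the password from a file instead, and --split-by together with --num-mappers controls how the import is split across parallel tasks. The password file path, the id column and the mapper count are assumptions, not values from the slides, and the SQOOP variable is a hypothetical dry-run switch added for this sketch.

```shell
#!/bin/sh
# Sketch of the cities import without a clear-text password.
# Assumptions (not from the slides): the password file path, the "id" split
# column, and 4 mappers. SQOOP defaults to the real sqoop binary.
import_cities() {
  ${SQOOP:-sqoop} import \
    --connect jdbc:postgresql://hdp-master/sqoop_db \
    --username sqoop_user \
    --password-file /user/sqoop_user/sqoop.password \
    --table cities \
    --split-by id \
    --num-mappers 4
}

# Dry run: print the command instead of executing it.
SQOOP="echo sqoop"
import_cities
```

Leaving SQOOP unset runs the real import. Interactive sessions can use -P instead of --password-file to be prompted for the password.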
Sqoop Export
• sqoop export --connect jdbc:postgresql://hdp-master/sqoop_db --username sqoop_user --password postgres --table cities --export-dir cities
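An export also runs as several independent tasks, so a failure partway through can leave the target table partially loaded. One option Sqoop provides for this is a staging table: rows are written there first and moved to the target table only when every task has succeeded. The sketch below assumes a table named cities_stage (hypothetical) already exists with the same structure as cities. The password file path and the SQOOP dry-run variable are likewise assumptions for this example.

```shell
#!/bin/sh
# Sketch of the cities export through a staging table.
# Assumptions (not from the slides): cities_stage exists and mirrors cities;
# the password file path is made up. SQOOP defaults to the real sqoop binary.
export_cities() {
  ${SQOOP:-sqoop} export \
    --connect jdbc:postgresql://hdp-master/sqoop_db \
    --username sqoop_user \
    --password-file /user/sqoop_user/sqoop.password \
    --table cities \
    --staging-table cities_stage \
    --clear-staging-table \
    --export-dir cities
}

# Dry run: print the command instead of executing it.
SQOOP="echo sqoop"
export_cities
```

The --clear-staging-table flag empties the staging table before the export starts, so rows left over from an earlier failed run are not copied into cities.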

