Welcome to Sqoop
Sqoop
Sqoop - Introduction
Open-source tool for efficiently transferring bulk data between
Hadoop and structured datastores such as MySQL, Oracle and
HBase
Sqoop - Tools
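[Diagram: the Sqoop tool sits between an RDBMS (Oracle, MySQL, PostgreSQL) and Hadoop (HDFS, Hive, HBase). Import moves data from the RDBMS into Hadoop; Export moves data from Hadoop back to the RDBMS.]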
Sqoop - Connectors
Available connectors:
Built-in connectors include MySQL, PostgreSQL, Oracle, SQL Server and DB2.
Generic JDBC connector - works with any database that supports JDBC.
Third-party connectors are also available, e.g. Netezza and Teradata.
Sqoop - Help
Go to shell:
>sqoop help
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
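Each tool in the listing above is invoked as `sqoop <tool> [arguments]`. As a rough sketch in Python, the helper below assembles such a command line for two of the listed tools, `list-tables` and `eval`. The function name `sqoop_command` is hypothetical, the connection details are borrowed from the import examples in this deck, and the SQL query is only an illustration. Nothing is executed; the commands are only printed.

```python
import shlex

# Tool names taken from the `sqoop help` listing.
KNOWN_TOOLS = {
    "codegen", "create-hive-table", "eval", "export", "help", "import",
    "import-all-tables", "job", "list-databases", "list-tables",
    "merge", "metastore", "version",
}


def sqoop_command(tool, connect=None, username=None, extra=()):
    """Build an argument list of the form: sqoop <tool> [arguments].

    Hypothetical helper for illustration; it only assembles the command.
    """
    if tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown Sqoop tool: {tool}")
    argv = ["sqoop", tool]
    if connect:
        argv += ["--connect", connect]
    if username:
        # -P makes Sqoop prompt for the password on the console.
        argv += ["--username", username, "-P"]
    argv += list(extra)
    return argv


if __name__ == "__main__":
    url = "jdbc:mysql://172.31.13.154/sqoopex"  # connection string used in this deck
    list_tables = sqoop_command("list-tables", connect=url, username="sqoopuser")
    # --query carries the SQL statement that the eval tool should run.
    peek = sqoop_command("eval", connect=url, username="sqoopuser",
                         extra=["--query", "SELECT * FROM widgets LIMIT 5"])
    for argv in (list_tables, peek):
        print(shlex.join(argv))
```

Printing the command before running it is a cheap way to check quoting, especially for the SQL passed to `eval`.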
Sqoop Import - MySQL to HDFS
sqoop import --connect jdbc:mysql://ip-172-31-13-154/sqoopex --table widgets \
    -m 2 --username sqoopuser -P --split-by id
Check the content of the imported file:
hadoop fs -cat widgets/part-m-00000
Also notice that widgets.java was created.
Sqoop - MySQL Connection
MySQL
Mapper 1 on machine A
Mapper 2 on machine B
Mapper 3 on machine C
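The picture above has several mappers reading from the same MySQL table at once. A rough way to think about it: Sqoop looks up the minimum and maximum of the split column (for example the `id` column named by `--split-by`) and hands each mapper a contiguous slice of that range. The Python sketch below illustrates the idea only; it is a simplification and not Sqoop's actual splitter, and the id bounds 1 and 12 are assumed values.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive range [lo, hi] into contiguous slices, one per mapper."""
    if num_mappers < 1:
        raise ValueError("need at least one mapper")
    if hi < lo:
        raise ValueError("hi must be >= lo")
    total = hi - lo + 1
    # Never create more slices than there are distinct values.
    num_mappers = min(num_mappers, total)
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = lo
    for i in range(num_mappers):
        # The first `extra` slices take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, ranges):
    """Turn each slice into the WHERE condition one mapper would query with."""
    return [f"{column} >= {a} AND {column} <= {b}" for a, b in ranges]


if __name__ == "__main__":
    # Assumed bounds: the smallest id is 1 and the largest is 12.
    slices = split_ranges(1, 12, 3)
    for i, clause in enumerate(where_clauses("id", slices), start=1):
        print(f"Mapper {i}: SELECT ... FROM widgets WHERE {clause}")
```

If the split column is unevenly distributed, some mappers get far more rows than others, which is why an evenly spread column is a good choice for `--split-by`.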
Sqoop Import - MySQL to Hive
sqoop import --connect jdbc:mysql://172.31.13.154/sqoopex --table widgets \
    -m 2 --hive-import --username sqoopuser -P \
    --hive-database sqoop_testing
Sqoop Import - MySQL to HBase
sqoop import --connect jdbc:mysql://172.31.13.154/sqoopex --table widgets \
    --hbase-table 'widgets' --column-family cf2 --username sqoopuser -P \
    --hbase-create-table --columns id,widget_name,price \
    --hbase-row-key 'widget_name' -m 1
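To make the HBase options more concrete, the Python sketch below shows roughly how one relational row could end up in HBase under `--hbase-row-key 'widget_name'` and `--column-family cf2`. It assumes that the row-key column becomes the HBase row key, that every other column becomes a cell in the column family with the column name as qualifier, that values are stored as UTF-8 strings, and that NULL values produce no cell. The function name and the sample row are made up for illustration.

```python
def row_to_hbase_cells(row, row_key_column, column_family):
    """Map one relational row (a dict) to (row_key, {"family:qualifier": bytes})."""
    if row_key_column not in row:
        raise KeyError(f"row has no column named {row_key_column!r}")
    row_key = str(row[row_key_column]).encode("utf-8")
    cells = {}
    for column, value in row.items():
        if column == row_key_column:
            continue  # assumption: the key column is used only as the row key
        if value is None:
            continue  # assumption: NULL values produce no cell
        cells[f"{column_family}:{column}"] = str(value).encode("utf-8")
    return row_key, cells


if __name__ == "__main__":
    # Hypothetical row from the widgets table.
    sample = {"id": 1, "widget_name": "sprocket", "price": 0.25}
    key, cells = row_to_hbase_cells(sample, "widget_name", "cf2")
    print(key)
    for name, value in sorted(cells.items()):
        print(name, value)
```

Because the row key must be unique, two widgets with the same `widget_name` would overwrite each other in this layout; a unique column such as `id` avoids that.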
Sqoop Export - Hive to MySQL
# Copy sales.log locally:
hadoop fs -copyToLocal /data/hive/sales.log
# Create Hive table:
CREATE TABLE sales_test(widget_id INT, qty INT,
street STRING, city STRING, state STRING,
zip INT, sale_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
# Load data:
LOAD DATA LOCAL INPATH 'sales.log' INTO TABLE sales_test;
# Select rows to see data:
SELECT * FROM sales_test;
Sqoop Export - Hive to MySQL
# Create MySQL table:
CREATE TABLE sales_test(widget_id INT, qty INT,
street varchar(100), city varchar(100), state varchar(100),
zip INT, sale_date varchar(100));
# Sqoop export:
sqoop export --connect jdbc:mysql://172.31.13.154/sqoopex -m 1 \
    --table sales_test --export-dir /apps/hive/warehouse/sales_test \
    --input-fields-terminated-by ',' --username sqoopuser -P
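The export reads the comma-delimited files under the export directory, splits each line into fields and inserts them into the target table. The Python sketch below mimics that step for a single record, as an illustration only and not Sqoop's own parser. The column names and types follow the `sales_test` table above; the sample record and the function names are made up.

```python
# Column names and Python casts mirroring the sales_test table definition.
SALES_TEST_COLUMNS = [
    ("widget_id", int), ("qty", int), ("street", str), ("city", str),
    ("state", str), ("zip", int), ("sale_date", str),
]


def parse_record(line, delimiter=","):
    """Split one delimited line and cast each field to its column type."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != len(SALES_TEST_COLUMNS):
        raise ValueError(
            f"expected {len(SALES_TEST_COLUMNS)} fields, got {len(fields)}"
        )
    return tuple(cast(value) for (_, cast), value in zip(SALES_TEST_COLUMNS, fields))


def insert_statement(table="sales_test"):
    """Build a parameterized INSERT with one placeholder per column."""
    names = ", ".join(name for name, _ in SALES_TEST_COLUMNS)
    marks = ", ".join("?" for _ in SALES_TEST_COLUMNS)
    return f"INSERT INTO {table} ({names}) VALUES ({marks})"


if __name__ == "__main__":
    # Made-up record in the same comma-delimited layout as sales.log.
    sample = "1,3,9 Some Street,Springfield,CA,90210,2010-08-09\n"
    print(insert_statement())
    print(parse_record(sample))
```

A plain split like this breaks if a field itself contains a comma (for example a street address), so the delimiter given to `--input-fields-terminated-by` must match the data exactly and must not occur inside field values.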
Sqoop - Summary
● Introduction
● Import
● Export
Sqoop - Introduction Contd.
Relational Databases
HBase
HDFS
Hive
Thank you!

Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
