Import and Export Data using SQOOP
Last Updated: 10 Sep, 2020
SQOOP is used to transfer data between relational databases such as MySQL and Oracle and the Hadoop Distributed File System (HDFS). When data moves from a relational database into HDFS, we say we are importing data; conversely, when data moves from HDFS into a relational database, we say we are exporting data.
Note: For an import or export to work correctly, the order of columns in the MySQL and Hive tables must be the same.
Importing data from MySQL to HDFS
To store the data in HDFS, we use Apache Hive, which provides an SQL-like interface between the user and the Hadoop Distributed File System (HDFS). We perform the following steps:
Step 1: Log in to MySQL
mysql -u root -pcloudera
Step 2: Create a database and a table, and insert data.
create database geeksforgeeeks;
create table geeksforgeeeks.geeksforgeeks(author_name varchar(65), total_no_of_articles int, phone_no int, address varchar(65));
insert into geeksforgeeeks.geeksforgeeks values("Rohan",10,123456789,"Lucknow");
Step 3: Create a database and a table in Hive into which the data will be imported.
create database geeks_hive;
use geeks_hive;
create table geeks_hive_table(name string, total_articles int, phone_no int, address string) row format delimited fields terminated by ',';
Step 4: Run the import command below on Hadoop.
sqoop import --connect \
jdbc:mysql://127.0.0.1:3306/database_name_in_mysql \
--username root --password cloudera \
--table table_name_in_mysql \
--hive-import --hive-table database_name_in_hive.table_name_in_hive \
-m 1
In the above command, note the following:
- 127.0.0.1 is the localhost IP address.
- 3306 is the default port number for MySQL.
- -m sets the number of mappers (parallel map tasks) Sqoop uses.
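For the example tables created above, the filled-in command might look like this (a sketch assuming the database, table, and credential values used earlier in this walkthrough):
sqoop import --connect \
jdbc:mysql://127.0.0.1:3306/geeksforgeeeks \
--username root --password cloudera \
--table geeksforgeeks \
--hive-import --hive-table geeks_hive.geeks_hive_table \
-m 1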
Step 5: Check in Hive whether the data was imported successfully.
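A quick way to verify is to query the Hive table directly (a minimal check, using the table created in Step 3):
select * from geeks_hive.geeks_hive_table;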
If the row inserted in MySQL appears in the Hive table, the data was imported successfully.
Exporting data from HDFS to MySQL
To export data into MySQL from HDFS, perform the following steps:
Step 1: Create a database and a table in Hive.
create database hive_export;
use hive_export;
create table hive_table_export(name string, company string, phone int, age int) row format delimited fields terminated by ',';
Step 2: Insert data into the Hive table.
insert into hive_table_export values("Ritik","Amazon",234567891,35);
Step 3: Create a database and a table in MySQL into which the data will be exported.
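The exact DDL is not shown in this walkthrough's screenshots; a minimal sketch, assuming the database mysql_export and table mysql_table_export used below, with column types and order mirroring the Hive table:
create database mysql_export;
create table mysql_export.mysql_table_export(name varchar(65), company varchar(65), phone int, age int);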
Step 4: Run the following command on Hadoop.
sqoop export --connect \
jdbc:mysql://127.0.0.1:3306/database_name_in_mysql \
--table table_name_in_mysql \
--username root --password cloudera \
--export-dir /user/hive/warehouse/hive_database_name.db/table_name_in_hive \
-m 1 \
--driver com.mysql.jdbc.Driver \
--input-fields-terminated-by ','
In the above command, note the following:
- 127.0.0.1 is the localhost IP address.
- 3306 is the default port number for MySQL.
- When exporting, the full HDFS path to the Hive table's warehouse directory must be specified in --export-dir.
- -m sets the number of mappers (parallel map tasks) Sqoop uses.
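For the example tables in this walkthrough, the filled-in command might look like this (a sketch assuming the default Hive warehouse location /user/hive/warehouse):
sqoop export --connect \
jdbc:mysql://127.0.0.1:3306/mysql_export \
--table mysql_table_export \
--username root --password cloudera \
--export-dir /user/hive/warehouse/hive_export.db/hive_table_export \
-m 1 \
--driver com.mysql.jdbc.Driver \
--input-fields-terminated-by ','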
Step 5: Check in MySQL whether the data was exported successfully.
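A simple check is to query the MySQL table (using the example names from above):
select * from mysql_export.mysql_table_export;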
If the row inserted into the Hive table appears here, the data was exported into MySQL successfully.