HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop Tutorial | Simplilearn
What’s in it for you?
1. What is HBase?
2. HBase Use Case
3. Applications of HBase
4. HBase vs RDBMS
5. HBase Storage
6. HBase Architectural Components
7. Demo on HBase
Introduction to HBase
In the past, data volumes were small and the data was mostly structured. Such data could easily be stored in a relational database (RDBMS).
Then the Internet evolved, and huge volumes of structured and semi-structured data were generated. Storing and processing this data in an RDBMS became a major problem.
Apache HBase was the solution to this problem.
HBase History
1. Nov 2006: Google released the paper on Bigtable
2. Feb 2007: The HBase prototype was created as a Hadoop contribution
3. Oct 2007: The first usable HBase was released along with Hadoop 0.15.0
4. Jan 2008: HBase became a subproject of Hadoop
5. Oct 2008 – Sep 2009: HBase 0.18.1, 0.19.0, and 0.20.0 were released
6. May 2010: HBase became an Apache top-level project
What is HBase?
HBase is a column-oriented database management system, derived from Google’s NoSQL database Bigtable, that runs on top of HDFS.
1. Open-source project that is horizontally scalable
2. NoSQL database written in Java that supports fast lookups by row key
3. Well suited for sparse data sets (can contain missing or NA values)
Companies using HBase
HBase Use Case
A telecommunication company that provides mobile voice and multimedia services across China generated billions of Call Detail Records (CDRs).
Problem: Traditional database systems were unable to scale to these volumes of data and provide a cost-effective solution. Storing and analyzing billions of call records in real time was a major problem.
Solution: HBase stores billions of rows of detailed call records and supports fast processing of them. HBase itself has no native SQL interface; SQL queries over the records require a layer on top of HBase, such as Apache Phoenix or Apache Hive.
Applications of HBase
Medical
- HBase is used for storing genome sequences
- Storing the disease history of people or of an area
E-Commerce
- HBase is used for storing logs of customer search history
- Supports analytics and targeted advertising for better business insights
Sports
- HBase stores match details and the history of each match
- This data is used for better predictions
HBase vs RDBMS
HBase | RDBMS
Does not have a fixed schema (schema-less); defines only column families | Has a fixed schema that describes the structure of the tables
Works well with structured and semi-structured data | Works well with structured data
Can hold denormalized data (can contain missing or NA values) | Typically stores normalized data
Built for wide tables that can be scaled horizontally | Built for thin tables that are hard to scale
Features of HBase
- Scalable: Data can be spread across many nodes because it is stored in HDFS
- Automatic failure support: A write-ahead log (WAL) across the cluster allows data to be recovered automatically after a failure
- Consistent reads and writes: HBase provides consistent reads and writes of data
- Java API for client access: Provides an easy-to-use Java API for clients
- Block cache and Bloom filters: Supports a block cache and Bloom filters for high-volume query optimization
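To illustrate the Bloom filter idea mentioned above, here is a minimal Python sketch. It is not HBase's implementation; the class name, bit-array size, and hash scheme are assumptions chosen for readability. The point it shows is that a filter built from the row keys in a file can answer "definitely not here" cheaply, so a read can skip that file.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one SHA-256 hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(key))


# One filter per (hypothetical) store file, built from the row keys it holds.
stored_keys = ["row-001", "row-002", "row-003"]
file_filter = BloomFilter()
for row_key in stored_keys:
    file_filter.add(row_key)

present = file_filter.might_contain("row-002")  # always True for added keys
```

A lookup for a key that was never added usually returns False, which is what lets a reader avoid opening the file at all.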
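HBase Storage
HBase column-oriented storage
A table is organized as follows:
- Row key (row ID): identifies each row (Row 1, Row 2, Row 3)
- Column family: groups related columns (Column Family 1, Column Family 2, Column Family 3)
- Column qualifiers: the columns within a column family (Col 1, Col 2, Col 3)
- Cells: the values stored at the intersection of a row and a column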
HBase column-oriented storage: example
Row key | Personal data (column family) | Professional data (column family)
empid   | name    city     age          | designation         salary
1       | Angela  Chicago  31           | Data Analyst        $70,000
2       | Dwayne  Boston   35           | Web Developer       $65,000
3       | David   Seattle  29           | Big Data Architect  $55,000
Here name, city, age, designation, and salary are column qualifiers, and each value is a cell.
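The logical layout of the example table can be sketched in Python as nested dictionaries: row key, then column family, then column qualifier, then cell value. This is only an illustration of the data model, not how HBase stores data on disk. The family names `personal` and `professional`, the extra row `4`, and the `email` qualifier are assumptions added for the example.

```python
# Column families are fixed when the table is defined; qualifiers are not.
COLUMN_FAMILIES = {"personal", "professional"}

# Logical view: row key -> column family -> column qualifier -> cell value.
table = {
    "1": {
        "personal": {"name": "Angela", "city": "Chicago", "age": "31"},
        "professional": {"designation": "Data Analyst", "salary": "$70,000"},
    },
    "2": {
        "personal": {"name": "Dwayne", "city": "Boston", "age": "35"},
        "professional": {"designation": "Web Developer", "salary": "$65,000"},
    },
    "3": {
        "personal": {"name": "David", "city": "Seattle", "age": "29"},
        "professional": {"designation": "Big Data Architect", "salary": "$55,000"},
    },
}


def put_cell(table, row_key, family, qualifier, value):
    """Write one cell. New qualifiers are allowed; unknown families are not."""
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def get_cell(table, row_key, family, qualifier):
    """Return the cell value, or None when the cell was never written."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


# A sparse row: only one cell is written, the rest simply do not exist.
put_cell(table, "4", "personal", "name", "Maria")
# A new qualifier on an existing row needs no schema change.
put_cell(table, "1", "personal", "email", "angela@example.com")
```

Missing cells take no space in this model, which is why the slides describe HBase as well suited for sparse data.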
HBase Architecture
HBase Architectural Components
- HMaster: assigns regions and handles load balancing
- Region Server: serves data for reads and writes. Each region server has an HLog and hosts regions; a region contains stores, and each store has a MemStore and StoreFiles (HFiles)
- ZooKeeper: used for monitoring
- HDFS: the storage layer underneath the region servers
HBase Architectural Components - Regions
- HBase tables are divided horizontally by row key range into “Regions”
- A region contains all rows in the table between the region’s start key and end key
- Regions are assigned to the nodes in the cluster, called “Region Servers”
- These servers serve data for reads and writes
In the diagram, a client issues a get, and Regions 1 to 4 are spread across Region Server 1 and Region Server 2.
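The row-key-range idea above can be sketched with a sorted list of region start keys and a binary search. The start keys and the region-to-server assignment below are hypothetical values made up for the example; only the names Region 1 to 4 and Region Server 1 and 2 come from the slide.

```python
from bisect import bisect_right

# Hypothetical regions, sorted by start key. "" means "from the first row".
# Each region covers [its start key, the next region's start key).
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_NAMES = ["Region 1", "Region 2", "Region 3", "Region 4"]

# Hypothetical assignment of regions to region servers.
REGION_SERVERS = {
    "Region 1": "Region Server 1",
    "Region 2": "Region Server 1",
    "Region 3": "Region Server 2",
    "Region 4": "Region Server 2",
}


def find_region(row_key):
    """Return the region whose key range contains row_key."""
    # The last region whose start key is <= row_key owns the row.
    index = bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_NAMES[index]


def find_region_server(row_key):
    """Return the region server that serves reads and writes for row_key."""
    return REGION_SERVERS[find_region(row_key)]
```

Because rows are kept sorted by row key, locating the region for any key is a single search over the region boundaries.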
HBase Architectural Components - HMaster
- Region assignment and Data Definition Language (DDL) operations, such as creating, deleting, and updating tables, are handled by HMaster
- HMaster assigns and re-assigns regions to region servers for recovery or load balancing, and monitors all region servers
HBase runs in a distributed environment where HMaster alone is not sufficient to manage everything. Hence, ZooKeeper was introduced.
HBase Architectural Components - ZooKeeper
- ZooKeeper is a distributed coordination service that maintains server state in the cluster
- ZooKeeper tracks which servers are alive and available, and provides server failure notification
- The active HMaster sends a heartbeat signal to ZooKeeper indicating that it is active
- Region servers send their status to ZooKeeper indicating that they are ready for read and write operations
- The inactive HMaster acts as a backup. If the active HMaster fails, the inactive HMaster takes over
How do the components work together?
- The active HMaster and the region servers each connect to ZooKeeper with a session
- ZooKeeper handles active HMaster selection, so that only one master is active at a time, and tracks region server sessions
- ZooKeeper maintains an ephemeral node for each active session, kept alive by heartbeats, to indicate that the servers are up and running
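The session and ephemeral-node behaviour described above can be sketched as a small in-memory simulation. This is not the ZooKeeper API; `ToyCoordinator`, the server names, and the timeout value are all hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class ToyCoordinator:
    """In-memory stand-in for ZooKeeper sessions and ephemeral nodes."""

    def __init__(self, session_timeout=3.0):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}  # server name -> time of last heartbeat
        self.active_master = None

    def heartbeat(self, server, now):
        # A heartbeat creates or refreshes the server's ephemeral node.
        self.last_heartbeat[server] = now

    def expire_sessions(self, now):
        # Remove ephemeral nodes whose session has timed out.
        expired = [s for s, t in self.last_heartbeat.items()
                   if now - t > self.session_timeout]
        for server in expired:
            del self.last_heartbeat[server]
        if self.active_master in expired:
            self.active_master = None
        return expired

    def elect_master(self, candidates):
        # Keep the current master if it is alive; otherwise promote
        # the first candidate that still has a live session.
        if self.active_master is None:
            for candidate in candidates:
                if candidate in self.last_heartbeat:
                    self.active_master = candidate
                    break
        return self.active_master

    def live_servers(self):
        return sorted(self.last_heartbeat)


zk = ToyCoordinator(session_timeout=3.0)
masters = ["hmaster-a", "hmaster-b"]
for server in masters + ["rs-1", "rs-2"]:
    zk.heartbeat(server, now=0.0)
first_master = zk.elect_master(masters)

# hmaster-a stops sending heartbeats; everyone else keeps going.
for server in ["hmaster-b", "rs-1", "rs-2"]:
    zk.heartbeat(server, now=4.0)
expired = zk.expire_sessions(now=4.0)
second_master = zk.elect_master(masters)
```

When the active master's session expires, its ephemeral node disappears and the backup master is promoted, which mirrors the failover role the slides give to the inactive HMaster.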
HBase Read or Write
There is a special HBase catalog table called the META table, which holds the location of the regions in the cluster. The location of the META table is stored in ZooKeeper.
Here is what happens the first time a client reads or writes data to HBase:
1. The client asks ZooKeeper for the region server that hosts the META table
2. The client queries the META table on that server to get the region server corresponding to the row key it wants to access
3. The client caches this information along with the META table location
4. The client gets the row from, or puts the row to, the corresponding region server
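The lookup-then-cache flow in the steps above can be sketched as follows. Everything here is a simplified, hypothetical model: `ToyClient` is not the HBase client API, ZooKeeper and the region servers are plain dictionaries, and the META table maps exact row keys to servers rather than key ranges.

```python
class ToyClient:
    """Simulates the first-read/write lookup path with a client-side cache."""

    def __init__(self, zookeeper, meta_tables, region_servers):
        self.zookeeper = zookeeper            # {"meta_location": server name}
        self.meta_tables = meta_tables        # server name -> {row key: server name}
        self.region_servers = region_servers  # server name -> {row key: row}
        self.meta_location = None             # cached META table location
        self.meta_cache = {}                  # cached row key -> server name
        self.zookeeper_lookups = 0
        self.meta_lookups = 0

    def _locate(self, row_key):
        if row_key in self.meta_cache:
            return self.meta_cache[row_key]   # cache hit: no remote lookups
        if self.meta_location is None:
            # Step 1: ask ZooKeeper which server hosts the META table.
            self.zookeeper_lookups += 1
            self.meta_location = self.zookeeper["meta_location"]
        # Step 2: ask the META table which server holds this row key.
        self.meta_lookups += 1
        server = self.meta_tables[self.meta_location][row_key]
        # Step 3: cache the answer for later requests.
        self.meta_cache[row_key] = server
        return server

    def get(self, row_key):
        # Step 4 (read): fetch the row from the region server that owns it.
        return self.region_servers[self._locate(row_key)].get(row_key)

    def put(self, row_key, row):
        # Step 4 (write): send the row to the region server that owns it.
        self.region_servers[self._locate(row_key)][row_key] = row


zookeeper = {"meta_location": "rs-1"}
meta_tables = {"rs-1": {"row-a": "rs-1", "row-z": "rs-2"}}
region_servers = {"rs-1": {"row-a": {"name": "Angela"}}, "rs-2": {}}

client = ToyClient(zookeeper, meta_tables, region_servers)
client.put("row-z", {"name": "David"})
first_read = client.get("row-a")
second_read = client.get("row-a")  # served using the cached location
```

After the first request for a key, the client goes straight to the right region server, so ZooKeeper and the META table are not on the path of every read or write.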
HBase Meta Table
- The META table is a special HBase catalog table that maintains a list of all the regions in the system and the region servers that host them
- The META table is used to find the region for a given table key
- Each row key has the form (table, key, region), and the value is the region server
HBase Write Mechanism
1. When a client issues a put request, the data is first written to the write-ahead log (WAL). The WAL is a file that stores new data that has not yet been written to permanent storage. It is used for recovery in case of failure.
2. Once the data is written to the WAL, it is copied to the MemStore. The MemStore is the write cache that stores new data that has not yet been written to disk. There is one MemStore per column family per region.
3. Once the data is placed in the MemStore, the client receives an acknowledgment (ACK).
4. When the MemStore reaches its threshold, it flushes the data into an HFile. HFiles store the rows of data as sorted KeyValues on disk.
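The four write steps above can be sketched as a small simulation. `ToyRegionStore` is a hypothetical class, not HBase code. As simplifying assumptions, the flush threshold counts entries rather than bytes, the "HFiles" are in-memory lists, and the flush runs inline inside `put` instead of as a separate later step.

```python
class ToyRegionStore:
    """Simulates WAL -> MemStore -> ACK -> flush to a sorted HFile."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.wal = []       # append-only log of (row_key, value)
        self.memstore = {}  # in-memory write cache
        self.hfiles = []    # each flush yields a list sorted by row key

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # step 1: record in the WAL
        self.memstore[row_key] = value     # step 2: copy to the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                   # step 4: threshold reached
        return "ACK"                       # step 3: acknowledge the client

    def flush(self):
        # Write the MemStore contents out as one immutable, sorted file.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row_key):
        # Newest data first: MemStore, then HFiles from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


store = ToyRegionStore(flush_threshold=3)
writes = [("row-3", "c"), ("row-1", "a"), ("row-2", "b"), ("row-4", "d")]
acks = [store.put(key, value) for key, value in writes]
```

The third put triggers a flush, so the first three rows end up in one HFile sorted by row key, while the fourth stays in the MemStore until the next flush.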
Demo on HBase
Key Takeaways
  • 2. What’s in it for you? What is HBase? 1
  • 3. What’s in it for you? What is HBase? 1 HBase Use Case 2
  • 4. What’s in it for you? What is HBase? Applications of HBase 1 HBase Use Case 2 3
  • 5. What’s in it for you? What is HBase? Applications of HBase HBase vs RDBMS 1 HBase Use Case 2 3 4
  • 6. What’s in it for you? What is HBase? Applications of HBase HBase Storage HBase vs RDBMS 1 HBase Use Case 2 3 45
  • 7. What’s in it for you? What is HBase? Applications of HBase HBase Storage HBase Architectural Components HBase vs RDBMS 1 HBase Use Case 2 3 45 6
  • 8. What’s in it for you? What is HBase? Applications of HBase HBase Storage HBase Architectural Components Demo on HBase HBase vs RDBMS 1 HBase Use Case 2 3 45 6 7
  • 10. Introduction to HBase This data could be easily stored in a Relational Database (RDMS) Structured data Back in the days, data used to be less and was mostly structured
  • 11. Introduction to HBase Then, Internet evolved and huge volumes of structured and semi- structured data got generated Storing and processing this data on RDBMS became a major problem Semi-structured data
  • 12. Introduction to HBase Apache HBASE was the solution for this Semi-structured data SolutionThen, Internet evolved and huge volumes of structured and semi- structured data got generated
  • 14. HBase History 1 Google released the paper on BigTable Nov 2006
  • 15. HBase History 1 2 Google released the paper on BigTable HBase prototype was created as a Hadoop contribution Nov 2006 Feb 2007
  • 16. HBase History 1 2 3 Google released the paper on BigTable HBase prototype was created as a Hadoop contribution First usable HBase along with Hadoop 0.15.0 was released Nov 2006 Feb 2007 Oct 2007
  • 17. HBase History 1 2 3 4 Google released the paper on BigTable HBase prototype was created as a Hadoop contribution First usable HBase along with Hadoop 0.15.0 was released HBase became the subproject of Hadoop Nov 2006 Feb 2007 Oct 2007 Jan 2008
  • 18. HBase History 1 2 3 4 5 Google released the paper on BigTable HBase prototype was created as a Hadoop contribution First usable HBase along with Hadoop 0.15.0 was released HBase became the subproject of Hadoop HBase 0.81.1, 0.19.0 and 0.20.0 was released between Oct 2008 – Sep 2009 Nov 2006 Feb 2007 Oct 2007 Jan 2008 Oct 2008 – Sep 2009
  • 19. HBase History 1 2 3 4 5 6 Google released the paper on BigTable HBase prototype was created as a Hadoop contribution First usable HBase along with Hadoop 0.15.0 was released HBase became the subproject of Hadoop HBase 0.81.1, 0.19.0 and 0.20.0 was released between Oct 2008 – Sep 2009 HBase became Apache top-level project Nov 2006 Feb 2007 Oct 2007 Jan 2008 Oct 2008 – Sep 2009 May 2010
  • 21. What is HBase? HBase is a column oriented database management system derived from Google’s NoSQL database BigTable that runs on top of HDFS
  • 22. What is HBase? HBase is a column oriented database management system derived from Google’s NoSQL database BigTable that runs on top of HDFS Open source project that is horizontally scalable1
  • 23. What is HBase? Open source project that is horizontally scalable1 2 HBase is a column oriented database management system derived from Google’s NoSQL database BigTable that runs on top of HDFS NoSQL database written in JAVA which performs faster querying
  • 24. What is HBase? Open source project that is horizontally scalable NoSQL database written in JAVA which performs faster querying Well suited for sparse data sets (can contain missing or NA values) 1 2 3 HBase is a column oriented database management system derived from Google’s NoSQL database BigTable that runs on top of HDFS
  • 31. HBase Use Case
      A telecommunication company that provides mobile voice and multimedia services across China generated billions of Call Detail Records (CDRs).
      Traditional database systems were unable to scale to these volumes of data or provide a cost-effective solution.
  • 32. HBase Use Case
      Storing billions of call records and analyzing them in real time was a major problem.
  • 33. HBase Use Case
      Solution: HBase stores billions of rows of detailed call records.
  • 34. HBase Use Case
      HBase performs fast processing of these records. SQL queries run through a layer on top of HBase (for example Apache Phoenix or Hive), since HBase itself has no SQL interface.
  • 38. Applications of HBase
      Medical: HBase is used for storing genome sequences and the disease history of people or of an area.
      E-Commerce: HBase is used for storing logs of customer search history, which feed analytics and targeted advertising for better business insights.
      Sports: HBase stores match details and the history of each match, and this data is used for better prediction.
  • 43. HBase vs RDBMS
      HBase: Does not have a fixed schema (schema-less); only column families are defined.
      RDBMS: Has a fixed schema that describes the structure of the tables.
      HBase: Works well with structured and semi-structured data.
      RDBMS: Works well with structured data.
      HBase: Can hold de-normalized data (rows can contain missing or NA values).
      RDBMS: Typically stores normalized data.
      HBase: Built for wide tables that can be scaled horizontally.
      RDBMS: Built for thin tables that are hard to scale.
  • 49. Features of HBase
      Scalable: Data is stored in HDFS, so it can be spread across many nodes.
      Automatic failure support: A Write Ahead Log (WAL) provides automatic recovery when a node fails.
      Consistent read and write: HBase provides consistent reads and writes of data.
      Java API for client access: HBase provides an easy-to-use Java API for clients.
      Block cache and bloom filters: HBase supports block cache and bloom filters to optimize high-volume queries.
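To make the bloom filter feature concrete, here is a minimal Python sketch of the general technique. It is not HBase's implementation: the class name, the bit-array size, the number of hashes and the `files_to_read` helper are all assumptions chosen for illustration. The idea is that each store file keeps a small filter of the row keys it contains, so a read can skip files that definitely do not hold the key.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def files_to_read(row_key, files):
    """Return the names of files whose filter cannot rule out row_key.

    files is a list of (file_name, BloomFilter) pairs (hypothetical layout).
    """
    return [name for name, bloom in files if bloom.might_contain(row_key)]
```

A file is only skipped when its filter answers "definitely absent", so the filter can save disk reads but never hides data that exists.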
  • 51. HBase column oriented storage
      Diagram: a table with a Rowid column (Row 1, Row 2, Row 3) and three column families (Column Family 1, Column Family 2, Column Family 3), each containing the columns Col 1, Col 2 and Col 3.
      Labels on the diagram: Row Key, Column Family, Column Qualifiers, Cells.
  • 52. HBase column oriented storage
      Labels on the diagram: Row Key, Column Family, Column Qualifiers, Cells. The label "empid" also appears on the slide.

      Rowid | Personal data        | Professional data
            | name    city     age | designation         salary
      1     | Angela  Chicago  31  | Data Analyst        $70,000
      2     | Dwayne  Boston   35  | Web Developer       $65,000
      3     | David   Seattle  29  | Big Data Architect  $55,000
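The table on this slide can be modeled as nested maps: row key, then column family, then column qualifier, then cell value. The Python sketch below is only a toy model of that layout, not the HBase client API. The family names `personal` and `professional` and the helper functions are assumptions made for illustration.

```python
# Toy model: row key -> column family -> column qualifier -> cell value.
table = {
    "1": {
        "personal": {"name": "Angela", "city": "Chicago", "age": "31"},
        "professional": {"designation": "Data Analyst", "salary": "$70,000"},
    },
    "2": {
        "personal": {"name": "Dwayne", "city": "Boston", "age": "35"},
        "professional": {"designation": "Web Developer", "salary": "$65,000"},
    },
    "3": {
        "personal": {"name": "David", "city": "Seattle", "age": "29"},
        "professional": {"designation": "Big Data Architect", "salary": "$55,000"},
    },
}


def put_cell(table, row_key, family, qualifier, value):
    """Write one cell, creating the row and family maps if needed."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def get_cell(table, row_key, family, qualifier):
    """Return the cell value, or None if that cell was never written."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


# A sparse row: only one cell is stored, nothing is reserved for the rest.
put_cell(table, "4", "personal", "name", "Maria")
```

Because a missing cell simply has no entry, a sparse row costs nothing for the columns it does not use, which is the property the "sparse data sets" point refers to.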
  • 54. HBase Architectural Components
      HMaster assigns regions and handles load balancing.
      Region servers serve data for reads and writes.
      ZooKeeper is used for monitoring.
      Diagram: HMaster, ZooKeeper and three Region Servers running on top of HDFS. Each Region Server has an HLog and Regions; each Region contains a Store made up of a MemStore and StoreFiles (HFiles).
  • 55. HBase Architectural Components - Regions
      HBase tables are divided horizontally by row key range into “Regions”.
      A region contains all rows in the table between the region’s start key and end key.
      Regions are assigned to the nodes in the cluster, called “Region Servers”.
      These servers serve data for reads and writes.
      Diagram: a Client issues a get; Regions 1 to 4, each bounded by a startKey and an endKey, are spread across Region Server 1 and Region Server 2.
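As a rough illustration of how a row key maps to a region, the sketch below keeps regions sorted by start key and uses a binary search to find the one whose range covers the key. The split points ("g", "n", "t"), the server names and the `find_region` helper are invented for this example. The start key is treated as inclusive and the end key as exclusive.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    name: str
    start_key: str           # inclusive; "" means "from the first row"
    end_key: Optional[str]   # exclusive; None means "to the last row"
    server: str


# Hypothetical layout: four contiguous regions on two region servers.
REGIONS = [
    Region("Region 1", "", "g", "Region Server 1"),
    Region("Region 2", "g", "n", "Region Server 1"),
    Region("Region 3", "n", "t", "Region Server 2"),
    Region("Region 4", "t", None, "Region Server 2"),
]


def find_region(regions, row_key):
    """Return the region whose [start_key, end_key) range contains row_key.

    Assumes regions are sorted by start_key and cover the whole key space.
    """
    starts = [r.start_key for r in regions]
    idx = bisect.bisect_right(starts, row_key) - 1
    return regions[idx]
```

With this layout, `find_region(REGIONS, "apple")` lands in Region 1, while a key equal to a split point such as `"g"` belongs to the region that starts there (Region 2).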
  • 58. HBase Architectural Components - HMaster
      Region assignment and Data Definition Language operations (create, delete, update table) are handled by HMaster.
      HMaster assigns and re-assigns regions for recovery or load balancing, and monitors all region servers.
      HBase is a distributed environment in which HMaster alone is not sufficient to manage everything. Hence, ZooKeeper was introduced.
      Diagram: a Client sends create, delete and update table requests to HMaster; HMaster monitors Region Server 1 and Region Server 2 and assigns Regions 1 to 4 to them.
  • 59. HBase Architectural Components - ZooKeeper
      ZooKeeper is a distributed coordination service that maintains server state in the cluster.
      ZooKeeper tracks which servers are alive and available, and provides server failure notification.
      The active HMaster sends a heartbeat signal to ZooKeeper indicating that it is active.
      Diagram: an Active HMaster, an Inactive HMaster, ZooKeeper, and Region Server 1 and Region Server 2 hosting Regions 1 to 4.
  • 60. HBase Architectural Components - ZooKeeper
      Region servers send their status to ZooKeeper, indicating that they are ready for read and write operations.
  • 61. HBase Architectural Components - ZooKeeper
      The inactive HMaster acts as a backup. If the active HMaster fails, the inactive one takes over.
  • 64. How do the components work together?
      The active HMaster and the Region Servers each connect to ZooKeeper with a session.
      ZooKeeper handles active HMaster selection and Region Server sessions. Only one master is active at a time.
      ZooKeeper maintains ephemeral nodes for active sessions, kept alive by heartbeats, to indicate that the region servers are up and running.
      Diagram: HMaster and Region Servers 1 and 2 send heartbeats to ZooKeeper, which holds an ephemeral node for each.
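The session and ephemeral-node behavior described above can be sketched with a small in-memory stand-in. This is not the ZooKeeper API: `ToyCoordinator`, its method names, the node paths and the integer clock are all assumptions for illustration. The sketch shows two things: an ephemeral node disappears when its owner's heartbeats stop, and a backup master can then claim the master node.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (not ZooKeeper itself).

    Each session must keep sending heartbeats. Ephemeral nodes belong to a
    session and are removed when that session expires.
    """

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}  # session id -> time of last heartbeat
        self.ephemeral = {}       # node path -> owning session id

    def heartbeat(self, session, now):
        self.last_heartbeat[session] = now

    def create_ephemeral(self, path, session):
        """Create path if it is free; return True if session owns it."""
        if path in self.ephemeral:
            return self.ephemeral[path] == session
        self.ephemeral[path] = session
        return True

    def expire_sessions(self, now):
        """Drop sessions that missed the timeout, and their ephemeral nodes."""
        dead = {
            s for s, t in self.last_heartbeat.items()
            if now - t > self.session_timeout
        }
        for s in dead:
            del self.last_heartbeat[s]
        self.ephemeral = {
            p: s for p, s in self.ephemeral.items() if s not in dead
        }
        return dead
```

In a usage scenario, two masters both try to create the same path (say `/master`). The first one succeeds and is active, and the second keeps retrying and succeeds only after the first one's session has expired.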
  • 65. HBase Read or Write
  • 67. HBase Read or Write
      There is a special HBase catalog table called the META table, which holds the location of the regions in the cluster. The location of the META table is stored in ZooKeeper.
      Here is what happens the first time a client reads or writes data to HBase:
      The client asks ZooKeeper which Region Server hosts the META table, and ZooKeeper returns the META table location.
      Diagram: Client, ZooKeeper, and two Region Servers, each on a DataNode.
  • 68. HBase Read or Write
      The client queries the META table to get the region server that corresponds to the row key it wants to access.
      The client caches this information, along with the META table location, in its Meta Cache.
  • 69. HBase Read or Write
      The client then gets the row from, or puts the row to, the corresponding Region Server.
  • 71. HBase Meta Table
      The META table is a special HBase catalog table that maintains a list of all the regions in the HBase storage system.
      The META table is used to find the region for a given table key.
      Each entry maps a row key (table, key, region) to a value (region server).
      Diagram: the Meta Table pointing to Regions 1 and 2 on one Region Server and Regions 3 and 4 on another.
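The lookup sequence from the preceding slides (ask ZooKeeper for the META location, ask META for the region server, cache both) can be sketched as follows. This is a simplified model and not the HBase client: `ToyCluster`, `ToyClient`, the server names and the counters are hypothetical, and the META table is reduced to a list of `(start_key, end_key, region_server)` entries for a single table.

```python
class ToyCluster:
    """Holds a simplified META table and counts how often it is consulted."""

    def __init__(self, meta_rows, meta_server):
        # meta_rows: list of (start_key, end_key, region_server);
        # end_key None means "no upper bound".
        self.meta_rows = meta_rows
        self.meta_server = meta_server
        self.zookeeper_lookups = 0
        self.meta_lookups = 0

    def meta_location_from_zookeeper(self):
        self.zookeeper_lookups += 1
        return self.meta_server

    def region_from_meta(self, row_key):
        self.meta_lookups += 1
        for start, end, server in self.meta_rows:
            if start <= row_key and (end is None or row_key < end):
                return (start, end, server)
        raise KeyError(row_key)


class ToyClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.meta_location = None  # cached after the first ZooKeeper call
        self.region_cache = []     # cached (start_key, end_key, server)

    def locate(self, row_key):
        """Return the region server for row_key, using the cache if possible."""
        for start, end, server in self.region_cache:
            if start <= row_key and (end is None or row_key < end):
                return server
        if self.meta_location is None:
            self.meta_location = self.cluster.meta_location_from_zookeeper()
        entry = self.cluster.region_from_meta(row_key)
        self.region_cache.append(entry)
        return entry[2]
```

Only the first request for a key range pays for the ZooKeeper and META round trips. Later requests that fall in an already cached region go straight to the region server.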
  • 72. HBase Write Mechanism
  • 73. HBase Write Mechanism
      Step 1: When the client issues a put request, the data is first written to the Write Ahead Log (WAL).
      The WAL is a file that stores new data that has not yet been put on permanent storage. It is used for recovery in case of failure.
      Diagram: a Client writing to a Region Server that holds a WAL and a Region with MemStores and HFiles, on top of an HDFS DataNode.
  • 74. HBase Write Mechanism
      Step 2: Once the data is written to the WAL, it is copied to the MemStore.
      The MemStore is the write cache that stores new data that has not yet been written to disk. There is one MemStore per column family per region.
  • 75. HBase Write Mechanism
      Step 3: Once the data is placed in the MemStore, the client receives the acknowledgment (ACK).
  • 76. HBase Write Mechanism
      Step 4: When the MemStore reaches its threshold, it flushes (commits) the data into an HFile.
      HFiles store the rows of data as sorted KeyValues on disk.
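The four write steps above can be sketched as a small in-memory model. It is a teaching aid under stated assumptions, not how HBase is implemented: `ToyRegionStore` and its methods are hypothetical, the flush threshold counts entries rather than bytes, the flush runs inline instead of in the background, and "HFiles" are just sorted Python lists.

```python
class ToyRegionStore:
    """Toy write path: WAL append, MemStore update, ACK, flush to a sorted file."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # entries, not bytes
        self.wal = []          # append-only log of (row_key, value)
        self.memstore = {}     # in-memory write cache
        self.hfiles = []       # each flush adds one sorted list of pairs
        self.flushed_upto = 0  # WAL entries before this index are in HFiles

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # Step 1: write to the WAL
        self.memstore[row_key] = value     # Step 2: copy to the MemStore
        # Step 4 is done inline here for simplicity; on the slides it
        # happens after the ACK, once the threshold is reached.
        if len(self.memstore) >= self.flush_threshold:
            self.flush()
        return "ACK"                       # Step 3: acknowledge the client

    def flush(self):
        """Write the MemStore contents out as one sorted, immutable file."""
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.flushed_upto = len(self.wal)

    def recover(self):
        """Rebuild the MemStore after a crash by replaying unflushed WAL entries."""
        self.memstore = {}
        for row_key, value in self.wal[self.flushed_upto:]:
            self.memstore[row_key] = value

    def get(self, row_key):
        """Check the MemStore first, then the files from newest to oldest."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None
```

The model shows why the WAL comes first: anything acknowledged but not yet flushed exists only in memory, so the log is what lets `recover` rebuild the MemStore after a failure.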

Editor's Notes