Referent
Einrichtung Titel des Vortrages 1
WP-Benchmarking Top NoSQL Databases
Apache Cassandra, Apache HBase and MongoDB
Presented by Athiq Ahamed and Supriya
Introduction
• Enormous volumes of data (Big Data)
• Scalability issues in RDBMS
• Rise of NoSQL databases
• Amazon Dynamo
• Google Bigtable
• CAP theorem
• BASE systems
CAP Theorem
• Consistency
• Availability
• Partition tolerance
The CAP theorem states that a distributed system can guarantee at most two of these three properties at the same time.
RDBMS vs NoSQL
RDBMS | NoSQL
Supports a powerful query language | Supports a very simple query language
Has a fixed schema | No fixed schema
Follows ACID (Atomicity, Consistency, Isolation and Durability) | Often only eventually consistent
Supports transactions | Often does not support transactions
Content: tutorialspoint.com
BASE
• Basically available: the system guarantees availability, in the sense of the CAP theorem
• Soft state: the state of the system may change over time, even without new input, because of the eventual consistency model
• Eventual consistency: the system becomes consistent over time
Content: www.edureka.in
Selection of NoSQL: Making the right choice!
• Fast performance is key.
• Proof-of-concept (POC) processes should include the right benchmarks:
  • Configurations
  • Parameters
  • Workloads
Benchmark configuration
• Yahoo! Cloud Serving Benchmark (YCSB)
• Top three NoSQL databases: Apache Cassandra, Apache HBase and MongoDB
• Amazon Web Services EC2 instances hosted the tests
• Each test was performed three times, on three different days
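A YCSB run is driven by a workload properties file, so the configuration above could be captured along these lines. This is only a sketch: the property names are standard YCSB CoreWorkload settings, but every value (record count, operation count, the 50/50 read/update mix) is an assumption for illustration and not the setting used in these tests.

```python
def render_workload(props):
    """Render a dict as the key=value lines of a YCSB workload properties file."""
    mix = sum(props.get(name, 0.0) for name in (
        "readproportion", "updateproportion", "scanproportion", "insertproportion"))
    if abs(mix - 1.0) > 1e-9:
        raise ValueError(f"operation proportions sum to {mix}, expected 1.0")
    return "".join(f"{key}={value}\n" for key, value in props.items())


# Hypothetical read/write mix; all numbers are illustrative assumptions.
READ_WRITE_MIX = {
    # Older YCSB releases name this class com.yahoo.ycsb.workloads.CoreWorkload.
    "workload": "site.ycsb.workloads.CoreWorkload",
    "recordcount": 1000000,      # rows inserted during the load phase
    "operationcount": 1000000,   # operations issued during the run phase
    "readproportion": 0.5,
    "updateproportion": 0.5,
    "scanproportion": 0.0,
    "insertproportion": 0.0,
    "requestdistribution": "zipfian",
}

workload_text = render_workload(READ_WRITE_MIX)
```

Writing `workload_text` to a file gives something that can be passed to the YCSB client for each of the three repeated runs, which keeps the workload identical across days.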
Benchmark configuration (continued)
• The tests ran on large instances (15 GB RAM and 4 CPU cores)
• Instances used a customized Ubuntu image with Oracle Java 1.6 installed as a base
• A custom script drove the benchmark processes
Understanding NoSQL Databases
• Each NoSQL system performs differently.
• The differences come from each system's components and internal workings.
• Apache Cassandra: columnar database model
• Apache HBase: columnar database model
• MongoDB: document storage database model
Apache Cassandra
• Cassandra is scalable, fault-tolerant, and tunably consistent. All nodes are equal.
• Its distribution design is based on Amazon's Dynamo and its data model on Google's Bigtable.
• Key components: node, cluster, commit log, memtable, SSTable and Bloom filter
Content: http://www.tutorialspoint.com/cassandra/cassandra_architecture.htm
Apache Cassandra
• Ring structure, peer-to-peer architecture
• All nodes are equal
• This improves overall database availability
• Scaling up and scaling down are easier
• Cassandra is a key-value, column-oriented database
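The ring idea can be sketched with consistent hashing: each node owns a token, a row key is hashed onto the ring, and the row lives on the next node clockwise plus its successors. This is a toy model under stated assumptions: the class `TokenRing` and its method names are hypothetical, node tokens are derived by hashing node names (a real cluster assigns tokens), and MD5 is used only because the editor's notes mention an MD5-hashed primary key.

```python
import bisect
import hashlib


class TokenRing:
    """Toy consistent-hashing ring; not Cassandra's actual implementation."""

    def __init__(self, nodes):
        # Sort nodes by token so the ring can be searched with bisect.
        self._ring = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(key):
        # Map any string onto the ring as a large integer.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def replicas(self, row_key, replication_factor=1):
        """Return the nodes holding row_key, walking clockwise from its token."""
        start = bisect.bisect_left(self._tokens, self._token(row_key))
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c"])
owners = ring.replicas("user:42", replication_factor=2)
```

Because every node can compute the same mapping, any node can accept a request and forward it, which is what makes the peers equal.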
Apache Cassandra
Content:http://demoiselle.sourceforge.net/component/demoiselle-
cassandra/1.0.0/images/datamodel1.png
Apache Cassandra
• Cassandra has an internal keyspace called system, which stores metadata about the cluster.
• Metadata:
  • The node's token
  • The cluster name
  • Keyspace and schema definitions (dynamic loading)
  • Whether or not the node is bootstrapped
Content: https://www.edureka.co/blog/category/apache-cassandra/
Apache Cassandra
• Commit log: crash-recovery mechanism. Every write operation is written to the commit log.
• Memtable: a memory-resident data structure that buffers writes.
• SSTable: a disk file to which data is flushed from the memtable
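The interplay of the three components on this slide can be sketched as a toy write path. Everything here is an illustrative assumption: `ToyNode`, its methods and the tiny flush threshold are hypothetical, and lists in memory stand in for files on disk.

```python
class ToyNode:
    """Toy commit log / memtable / SSTable write path."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []    # stands in for the append-only log on disk
        self.memtable = {}      # in-memory buffer of recent writes
        self.sstables = []      # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, for crash recovery
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # 3. write the memtable out as a sorted, immutable run
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest run wins
            for stored_key, value in table:
                if stored_key == key:
                    return value
        return None


node = ToyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # reaches the limit and triggers a flush
node.write("a", 3)   # newer value stays in the memtable
```

Writes never touch existing runs, which is why the write path stays sequential and fast; reads pay for that by checking the memtable and then the runs.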
Apache Cassandra
• Bloom filters are used as a performance booster
• A Bloom filter is a fast, probabilistic structure for testing whether an element is a member of a set. It can return false positives but never false negatives.
• Bloom filters serve as a special kind of cache: lookups are quick because the filters reside in memory
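A minimal Bloom filter shows why the lookup is cheap: membership is a handful of hash computations and bit checks in memory. This is a sketch, not Cassandra's implementation; the class name, the sizes and the use of seeded SHA-256 as the hash family are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[position] for position in self._positions(item))


row_filter = BloomFilter()
row_filter.add("row-1")
```

A store would consult `might_contain` before opening a disk file and skip the file whenever the answer is `False`.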
Apache Cassandra
• Gossip protocol: communication between nodes, coordination and failure detection
• Anti-entropy protocol: replica synchronization mechanism that ensures data on different nodes is up to date (uses Merkle trees)
• Snitches provide host proximity information
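The Merkle tree comparison behind anti-entropy can be sketched as follows. Each replica hashes its data per token range, builds a root hash over those leaves, and only ranges whose hashes disagree need repair. The function names and the data layout (a list of dicts, one per range) are hypothetical, and for brevity the sketch compares leaves directly once the roots differ, where a real implementation would descend the tree.

```python
import hashlib


def _digest(data):
    return hashlib.sha256(data).digest()


def leaf_hash(rows):
    """Hash one range of rows (a dict with string keys) in a stable order."""
    payload = "".join(f"{key}={rows[key]};" for key in sorted(rows))
    return _digest(payload.encode("utf-8"))


def merkle_root(leaves):
    """Combine leaf hashes pairwise until a single root hash remains."""
    if not leaves:
        return _digest(b"")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [_digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def ranges_to_repair(replica_a, replica_b):
    """Return the indexes of ranges whose contents differ between replicas."""
    leaves_a = [leaf_hash(rows) for rows in replica_a]
    leaves_b = [leaf_hash(rows) for rows in replica_b]
    if merkle_root(leaves_a) == merkle_root(leaves_b):
        return []  # equal roots: nothing to exchange
    return [i for i, (a, b) in enumerate(zip(leaves_a, leaves_b)) if a != b]


replica_a = [{"k1": "v1"}, {"k2": "v2"}, {"k3": "v3"}]
replica_b = [{"k1": "v1"}, {"k2": "stale"}, {"k3": "v3"}]
stale_ranges = ranges_to_repair(replica_a, replica_b)
```

The point of the tree is that two in-sync replicas exchange one root hash instead of their full data.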
Apache Cassandra- Read/Write operation
Apache HBase
• A sparse, distributed, consistent, multidimensional sorted map
• HBase is a key/value store
• Each value is addressed by row key, column family, column and timestamp
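The "multidimensional sorted map" on this slide can be pictured as nested maps keyed by row key, then column, then timestamp. The sketch below is a plain in-memory model, not the HBase client API; `ToyHTable` and its method names are hypothetical, and columns are written as "family:qualifier" strings.

```python
class ToyHTable:
    """Toy model of the HBase data model: row -> column -> timestamp -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value, timestamp):
        versions = self._rows.setdefault(row_key, {}).setdefault(column, {})
        versions[timestamp] = value

    def get(self, row_key, column):
        """Return the newest version of a cell, or None if it is absent."""
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield rows in sorted key order for start_row <= key < stop_row."""
        for row_key in sorted(self._rows):
            if start_row <= row_key < stop_row:
                yield row_key, self._rows[row_key]


table = ToyHTable()
table.put("user#2", "info:name", "Bob", timestamp=1)
table.put("user#1", "info:name", "Al", timestamp=1)
table.put("user#1", "info:name", "Alice", timestamp=2)
scanned_keys = [row_key for row_key, _ in table.scan("user#1", "user#3")]
```

The map is sparse because absent cells are simply missing keys, and sorted row keys are what make range scans efficient.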
Apache HBase
Content:http://zhangjunhd.github.io/assets/2013-02-25-apache-hbase/rowkey-
Apache HBase Components
• Region: contiguous rows form a region
• Region server (RS): serves one or more regions
• Master server: daemon responsible for managing the HBase cluster
• HDFS: distributed, open-source file system that stores HBase's data
• ZooKeeper: distributed, open-source coordination service for the master and region servers
Content: https://www.mapr.com/blog/in-depth-look-hbase-architecture
Apache Hbase Architecture
First Read/Write to HBase
• The client asks ZooKeeper which region server (RS) hosts the meta table
• The client queries the meta table to find the RS that holds the requested row key
• The client reads the row from that region server
• The client caches this information, along with the location of the meta table server
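The lookup steps above can be sketched with stand-in objects. Nothing here is the HBase client API: `ToyCluster`, `ToyClient`, the server names and the two hard-coded regions are hypothetical, chosen only to show that the second request for a row in a known region skips the meta lookup.

```python
class ToyCluster:
    """Stand-in for ZooKeeper, the meta table and the region servers."""

    def __init__(self):
        self.meta_server = "rs-meta"
        # Each region covers start <= row key < end; None means unbounded.
        self.regions = [("", "m", "rs-1"), ("m", None, "rs-2")]
        self.meta_lookups = 0

    def zookeeper_meta_location(self):
        return self.meta_server

    def meta_lookup(self, row_key):
        self.meta_lookups += 1
        for start, end, server in self.regions:
            if start <= row_key and (end is None or row_key < end):
                return start, end, server
        raise KeyError(row_key)


class ToyClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.meta_location = None
        self.region_cache = []  # cached (start, end, server) entries

    def locate(self, row_key):
        """Return the region server for row_key, consulting the cache first."""
        if self.meta_location is None:
            # Step 1: ask ZooKeeper where the meta table lives, then cache it.
            self.meta_location = self.cluster.zookeeper_meta_location()
        for start, end, server in self.region_cache:
            if start <= row_key and (end is None or row_key < end):
                return server
        # Step 2: ask the meta table, then cache the region location.
        region = self.cluster.meta_lookup(row_key)
        self.region_cache.append(region)
        return region[2]


cluster = ToyCluster()
client = ToyClient(cluster)
first = client.locate("apple")
second = client.locate("banana")   # same region, served from the cache
third = client.locate("zebra")     # different region, needs a meta lookup
```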
HBase RS Components
• WAL: the write-ahead log is a file on the distributed file system. It stores new data that has not yet been persisted to permanent storage
• Block cache: the read cache. It stores frequently read data in memory
• MemStore: the write cache. It stores new data that has not yet been written to disk
• HFiles store the rows as sorted key-values on disk
HBase Write steps (1)
• The client's write is first appended to the WAL file stored on disk
• The WAL is used to recover not-yet-persisted data if a server crashes
• Once the data is written to the WAL, it is placed in the MemStore
HDFS Write steps (2)
• All writes and reads go to and from the primary node
• HDFS replicates the WAL and HFile blocks. Replication happens automatically
• When data is written to HDFS, one copy is written locally, then replicated to a secondary node and later to a tertiary node
Cassandra and HBase
• Cassandra use case: availability and partition-tolerance requirements. Consistency is tunable and can be set high through the consistency-level option
• HBase use case: consistency and scalability. However, with a small number of nodes/threads, high availability is also achieved
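Tunable consistency comes down to simple arithmetic: with a replication factor N, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below works that rule through; the function names are hypothetical, and the mapping of ONE, QUORUM and ALL to replica counts follows the usual meaning of those Cassandra consistency levels.

```python
def quorum(replication_factor):
    """Smallest majority of replicas, as used by the QUORUM level."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    return write_replicas + read_replicas > replication_factor


n = 3
quorum_both = reads_see_latest_write(n, quorum(n), quorum(n))  # QUORUM / QUORUM
one_both = reads_see_latest_write(n, 1, 1)                     # ONE / ONE
all_then_one = reads_see_latest_write(n, n, 1)                 # ALL / ONE
```

With N = 3, QUORUM on both sides gives 2 + 2 > 3, so reads see the latest write, while ONE on both sides gives 1 + 1 = 2, which favors availability and latency over consistency.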
MongoDB
• Document-oriented database
• High performance and automatic scaling
• Strong consistency and partition tolerance
• Replication and failover for high availability
• Low latency
• Flexible indexing
MongoDB Concepts
• The document is the basic unit of data in MongoDB (comparable to a row)
• A collection is similar to a table
• A single instance can host multiple independent databases
• Every document has a special key, "_id"
• Powerful JavaScript shell for administration
• The config database contains the cluster metadata
MongoDB Simple Architecture
Read/Write MongoDB
• A mongos instance receives queries from applications
• It uses metadata from the config server to locate the data
• mongos directs write operations to a particular shard
• mongos uses the cluster metadata from the config database
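The routing role of mongos can be sketched as a lookup from shard-key value to shard, using chunk boundaries of the kind the config database holds. This is a toy model and not the MongoDB driver or server API: `ToyMongos`, the numeric shard key `user_id` and the boundaries 100 and 200 are all hypothetical.

```python
import bisect


class ToyMongos:
    """Toy router: picks a shard from range-based chunk metadata."""

    def __init__(self, chunk_bounds, shards):
        # chunk_bounds are the sorted split points between chunks, so
        # N split points describe N + 1 chunks, one per listed shard.
        if len(shards) != len(chunk_bounds) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.chunk_bounds = chunk_bounds
        self.shards = shards
        self.storage = {shard: [] for shard in shards}

    def shard_for(self, shard_key_value):
        # Chunks include their lower bound and exclude their upper bound.
        return self.shards[bisect.bisect_right(self.chunk_bounds, shard_key_value)]

    def insert(self, document, shard_key="user_id"):
        shard = self.shard_for(document[shard_key])
        self.storage[shard].append(document)
        return shard


router = ToyMongos(chunk_bounds=[100, 200], shards=["shard-a", "shard-b", "shard-c"])
target = router.insert({"_id": 1, "user_id": 150, "name": "example"})
```

The application only ever talks to the router, which is why adding shards does not change client code.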
Recap: Importance of Benchmark and Factors
• Scalability
• Availability
• Partition tolerance
• Consistency
Most important: performance
Yahoo! Cloud Serving Benchmark (YCSB)
Results: Load Process
Results: Read/Write Mix Workload
Results: Read/Scan Mix Workload
Results: Read Latency across all workloads
Results: Insert Latency across all workloads
Let's migrate from traditional databases!
Live Demo
Discussion
• Identify the data model for the application
• Know the corresponding data sets
• Determine whether the application requires replication
• Identify the performance requirements
• Prototype the application
• Test the performance of the prototype
Conclusion
• NoSQL databases have replaced traditional relational databases in many large-scale applications
• Performance is the key feature
• Benchmarks are important
• The performance of the top three NoSQL databases was tested
• Cassandra outperformed the other NoSQL databases tested
• Decide based on the application

Editor's Notes

  • #9: The script manages the start-up, configuration and termination of the EC2 instances, and runs the tests on the clients.
  • #10: Apache Cassandra: columnar database model (a combination of Amazon Dynamo and Bigtable). Apache HBase: columnar database model (a Bigtable-inspired system built on Hadoop).
  • #12: In Cassandra, rows are split across nodes. Each row has a row key (the primary key is hashed with MD5), and each column family holds columns (column names) with a value and a timestamp. In HBase, data is split column-wise; it has a row key for a range of rows, a column family, a column qualifier and a timestamp. Distribution is ordered rather than hash-based. Frequently accessed columns are grouped together under a common family.
  • #14: The system keyspace stores metadata for the local node. Users cannot modify or edit the system keyspace. The node's token is decided by the partitioner.
  • #16: Memory reads are faster than disk reads. So when we look at the test results, Cassandra outperforms the others, and Bloom filters could be one of the reasons, because of fast in-memory access and reads.
  • #17: Cassandra nodes exchange Merkle trees with their neighbours. A Merkle tree is a hash tree representing the data in a column family. The trees are compared and, if there is any difference, a repair is launched for the ranges that do not agree. Read repair happens internally in the background. A component called a snitch routes the client to the nearest node. (There is no separate config database to do the routing, as in MongoDB, and no ZooKeeper, as in HBase, either of which may take additional time to respond.) The snitch provides host proximity.
  • #27: Give the example of Facebook.