Referent
Einrichtung Titel des Vortrages 1
WP-Benchmarking Top NoSQL Databases
Apache Cassandra, Apache HBase and MongoDB
Presented By
Athiq Ahamed
Supriya
Introduction
 Enormous amounts of data (Big Data)
 Scalability issues in RDBMS
 Rise of NoSQL databases
 Amazon Dynamo
 Google Bigtable
 CAP Theorem
 BASE system
CAP Theorem
 Consistency
 Availability
 Partition tolerance
The CAP theorem states that a distributed system can guarantee at most two of these three properties at the same time.
RDBMS vs NoSQL
RDBMS | NoSQL
Supports a powerful query language | Supports a very simple query language
Has a fixed schema | No fixed schema
Follows ACID (Atomicity, Consistency, Isolation and Durability) | Is only eventually consistent
Supports transactions | Does not support transactions
Content: tutorialspoint.com
BASE
 Basically available: the system guarantees availability, in terms of the CAP theorem
 Soft state: the state of the system may change over time, because of the eventual consistency model
 Eventual consistency: the system will become consistent over time
Content: www.edureka.in
Selection of NoSQL
 Fast performance is key.
 Proof-of-concept (POC) processes should include the right benchmarks:
 Configurations
 Parameters
 Workloads
Making the right choice!
Benchmark configuration
 Yahoo Cloud Serving Benchmark (YCSB)
 Top 3 NoSQL databases: Apache Cassandra, Apache HBase and MongoDB
 Amazon Web Services EC2 instances hosted the tests
 Tests were performed 3 times, on 3 different days
Benchmark configuration
 The tests ran on large instances (15 GB RAM and 4 CPU cores)
 Instances used a customized Ubuntu image with Oracle Java 1.6 installed as a base
 A custom script was written to drive the benchmark processes
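To make the idea of a benchmark driver concrete, here is a minimal sketch in Python. It is neither YCSB nor the script used for these tests: it assumes a plain in-memory dictionary standing in for the database, and the names `run_workload`, `read_fraction` and `key_space` are hypothetical, chosen only for illustration.

```python
import random
import time


def run_workload(store, num_ops, read_fraction, key_space, seed=0):
    """Run a mixed read/write workload against a dict-like store.

    A real driver would wrap a database client here; the dict is an
    assumption made so that the sketch is self-contained.
    """
    rng = random.Random(seed)  # fixed seed keeps the operation mix repeatable
    latencies = {"read": [], "write": []}
    started = time.perf_counter()
    for _ in range(num_ops):
        key = "user%d" % rng.randrange(key_space)
        op = "read" if rng.random() < read_fraction else "write"
        t0 = time.perf_counter()
        if op == "read":
            store.get(key)
        else:
            store[key] = "payload"
        latencies[op].append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started

    def average(samples):
        return sum(samples) / len(samples) if samples else 0.0

    return {
        "reads": len(latencies["read"]),
        "writes": len(latencies["write"]),
        "ops_per_second": num_ops / elapsed if elapsed > 0 else float("inf"),
        "avg_read_latency_s": average(latencies["read"]),
        "avg_write_latency_s": average(latencies["write"]),
    }
```

Calling `run_workload({}, 1000, 0.5, 100)` runs a 50/50 read/write mix. Repeating such a run several times, as the tests here were repeated on 3 different days, helps to average out noise.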
Understanding NoSQL Databases
 Each NoSQL system performs differently.
 Their components and internal workings differ.
 Apache Cassandra: column-family (wide-column) database model
 Apache HBase: column-family (wide-column) database model
 MongoDB: document store database model
Apache Cassandra
 Cassandra is scalable and fault-tolerant, with tunable consistency. All nodes are equal.
 Its distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.
 Key components: Node, Cluster, Commit log, Mem-table, SSTable and Bloom filter
Content:http://www.tutorialspoint.com/cassandra/cassandra_architecture.htm
Apache Cassandra
 Ring structure, peer-to-peer architecture
 All nodes are equal
 This improves general database availability
 Scaling up and scaling down is easier
 Cassandra is a key-value, column-oriented database
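The ring structure can be illustrated with a small consistent-hashing sketch. It is a toy model, not Cassandra's partitioner: the class `Ring`, the function `token_for` and the node names are hypothetical, and MD5 is used only because the speaker notes mention an MD5 hash of the key.

```python
import bisect
import hashlib


def token_for(key):
    """Map a key (or node name) to a position on the ring via MD5."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: every node is an equal peer owning a token range."""

    def __init__(self, nodes):
        self._entries = sorted((token_for(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key):
        """First node whose token is >= the key's token, wrapping around."""
        i = bisect.bisect_left(self._tokens, token_for(key))
        return self._entries[i % len(self._entries)][1]

    def add_node(self, node):
        """Scale up: the new node takes over part of one neighbour's range."""
        bisect.insort(self._entries, (token_for(node), node))
        self._tokens = [t for t, _ in self._entries]
```

In this model, adding a node moves only the keys that fall into the new node's range; every other key keeps its owner, which is one reason scaling up and down is comparatively easy.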
Apache Cassandra
Content:http://demoiselle.sourceforge.net/component/demoiselle-cassandra/1.0.0/images/datamodel1.png
Apache Cassandra
 Cassandra has an internal keyspace called system, which stores metadata about the cluster.
 Metadata:
 The node’s token
 The cluster name
 Keyspace and schema definitions (dynamic loading)
 Whether or not the node is bootstrapped
Content:https://www.edureka.co/blog/category/apache-cassandra/
Apache Cassandra
 Commit log: crash-recovery mechanism. Every write operation is written to the commit log.
 Mem-table: a memory-resident data structure.
 SSTable: a disk file to which the data is flushed from the mem-table.
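How these three components cooperate on a write can be sketched as follows. This is a heavily simplified model, not Cassandra's storage engine: the class `ToyStore`, the `memtable_limit` threshold and the use of Python lists in place of disk files are all assumptions made for illustration.

```python
class ToyStore:
    """Commit log + mem-table + SSTables, heavily simplified."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of unflushed writes
        self.memtable = {}     # in-memory view of the most recent writes
        self.sstables = []     # immutable, sorted "disk files", newest last
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for recovery
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the table is full

    def flush(self):
        """Write the mem-table out as one sorted, immutable table."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        """Check memory first, then the tables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

The write itself only appends to a log and updates memory, which is why writes in this kind of design tend to be cheap; the sorted files are produced later, in bulk.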
Apache Cassandra
 Bloom filters are used as a performance booster
 A Bloom filter is a very fast, probabilistic structure for testing whether an element is a member of a set. It can report false positives, but never false negatives.
 Bloom filters serve as a special kind of cache: they reside in memory and allow quick lookups
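A Bloom filter is small enough to sketch in full. The version below is a generic textbook construction, not Cassandra's implementation; the sizes `num_bits` and `num_hashes` and the use of salted SHA-256 digests as hash functions are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Set-membership test that may say "maybe" but never wrongly says "no"."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive several bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            salted = ("%d:%s" % (i, item)).encode("utf-8")
            yield int(hashlib.sha256(salted).hexdigest(), 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))
```

A read path can consult such a filter before touching a disk file: if `might_contain` returns `False`, the file cannot hold the key and the disk read is skipped.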
Apache Cassandra
 Gossip protocol: communication between nodes, coordination and failure detection
 Anti-entropy protocol: replica synchronization mechanism ensuring that data on different nodes is up to date (uses Merkle trees)
 Snitches determine host proximity
Apache Cassandra: Read/Write operation
Apache HBase
 Sparse, distributed, multidimensional, sorted and consistent map
 HBase is a key/value store
 Each entry consists of a row key, column family, column and timestamp
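The "sorted, multidimensional map" description can be made concrete with a few lines of Python. This is a toy model rather than the HBase API: the class `ToyTable` and its method names are hypothetical, and a single in-process dictionary stands in for the distributed storage.

```python
class ToyTable:
    """Sparse, sorted map:
    (row key, column family, column qualifier, timestamp) -> value."""

    def __init__(self):
        self.cells = {}  # only cells that were written exist (sparse)

    def put(self, row, family, qualifier, timestamp, value):
        self.cells[(row, family, qualifier, timestamp)] = value

    def get(self, row, family, qualifier):
        """Return the value with the newest timestamp, or None."""
        versions = [(ts, v) for (r, f, q, ts), v in self.cells.items()
                    if (r, f, q) == (row, family, qualifier)]
        if not versions:
            return None
        return max(versions, key=lambda tv: tv[0])[1]

    def scan(self, start_row, stop_row):
        """Yield cells in sorted key order for start_row <= row < stop_row."""
        for key in sorted(self.cells):
            if start_row <= key[0] < stop_row:
                yield key, self.cells[key]
```

Because keys are kept in sorted order, a range scan over adjacent row keys is a natural operation in this model, in contrast to a hash-distributed layout.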
Apache HBase
Content:http://zhangjunhd.github.io/assets/2013-02-25-apache-hbase/rowkey-
Apache HBase Components
 Region: contiguous rows form a region
 Region server (RS): serves one or more regions
 Master server: daemon responsible for managing the HBase cluster
 HDFS: distributed, open-source file system that stores HBase’s data
 ZooKeeper: distributed, open-source coordination service for the master and region servers
Content: https://www.mapr.com/blog/in-depth-look-hbase-architecture
Apache HBase Architecture
First Read/Write to HBase
 The client asks ZooKeeper which region server (RS) hosts the meta table
 The client asks that server which RS holds the corresponding row key
 The client receives the row from the respective region server
 The client caches this information along with the location of the meta table server
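The lookup-and-cache sequence above can be sketched as follows. This is a simplified model and not the HBase client: the class `ToyClient`, the dictionary shapes used for ZooKeeper, the meta table and the region servers, and the two counters are all hypothetical.

```python
class ToyClient:
    """Locates rows via ZooKeeper -> meta table -> region server, with caching."""

    def __init__(self, zookeeper, meta_tables, region_servers):
        self.zookeeper = zookeeper            # {"meta_server": server name}
        self.meta_tables = meta_tables        # server -> [(start, end, rs name)]
        self.region_servers = region_servers  # rs name -> {row key: row}
        self.meta_server = None               # cached meta table location
        self.region_cache = []                # cached (start, end, rs name)
        self.zookeeper_lookups = 0
        self.meta_lookups = 0

    def _locate(self, row_key):
        for start, end, rs in self.region_cache:
            if start <= row_key < end:
                return rs                     # served from the client cache
        if self.meta_server is None:
            self.zookeeper_lookups += 1       # step 1: ask ZooKeeper once
            self.meta_server = self.zookeeper["meta_server"]
        self.meta_lookups += 1                # step 2: ask the meta table
        for region in self.meta_tables[self.meta_server]:
            start, end, rs = region
            if start <= row_key < end:
                self.region_cache.append(region)  # step 4: remember it
                return rs
        raise KeyError(row_key)

    def get(self, row_key):
        # Step 3: fetch the row from the region server that owns the key.
        return self.region_servers[self._locate(row_key)].get(row_key)
```

After the first request, later reads for keys in an already known region skip both ZooKeeper and the meta table and go straight to the region server.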
HBase RS Components
 WAL: the Write Ahead Log is a file on the distributed file system. It stores new data that has not yet been persisted
 Block Cache: the read cache. It stores frequently read data in memory
 MemStore: the write cache. It stores new data that has not yet been written to disk
 HFiles store the rows as sorted key-values on disk
HBase Write steps (1)
 The client's data is first written to the WAL file stored on disk
 The WAL is used to recover data that has not yet been persisted in case a server crashes
 Once the data is written to the WAL, it is placed in the MemStore
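Why the WAL comes first becomes clearer when a crash is simulated. The sketch below is a toy model, not HBase code: the class `ToyRegionServer`, its `recover` method and the Python list standing in for the WAL file are assumptions for illustration.

```python
class ToyRegionServer:
    """Write path with a write-ahead log and an in-memory write cache."""

    def __init__(self, wal=None):
        # The list stands in for the WAL file on the distributed file system.
        self.wal = wal if wal is not None else []
        self.memstore = {}  # in-memory write cache, lost on a crash

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # 1. append to the WAL first
        self.memstore[row_key] = value     # 2. then update the MemStore

    @classmethod
    def recover(cls, wal):
        """Rebuild the MemStore of a crashed server by replaying its WAL."""
        server = cls(wal=list(wal))
        for row_key, value in server.wal:
            server.memstore[row_key] = value  # later entries overwrite earlier
        return server
```

If the order were reversed, a crash between the two steps could lose an acknowledged write; with the log written first, replaying it restores everything that was in memory.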
HDFS Write steps (2)
 All writes and reads go to and from the primary node
 HDFS replicates the WAL and HFile blocks. Replication happens automatically
 When data is written to HDFS, one copy is written locally, then it is replicated to a secondary node and later to a tertiary node
Cassandra and HBase
 Cassandra use case: availability and partition-tolerance requirements. Consistency is tunable and can be raised by configuring a higher consistency level
 HBase use case: consistency and scalability. However, with a small number of nodes/threads, high availability is also achieved
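One common way to reason about tunable consistency in Dynamo-style systems is the quorum rule: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The helper names below are hypothetical; the sketch only encodes this arithmetic and does not call any Cassandra API.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n, w, r):
    """True if every read set of size r overlaps every write set of size w."""
    return r + w > n
```

With a replication factor of 3, writing and reading at quorum (2 and 2) overlaps, while writing and reading a single replica each (1 and 1) does not, which trades consistency for latency and availability.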
MongoDB
 Document-oriented database
 High performance and automatic scaling
 High consistency and partition tolerance
 Replication and failover for high availability
 Low latency
 Flexible indexing
MongoDB Concepts
 A document is the basic unit of data in MongoDB (comparable to a row)
 A collection is similar to a table
 A single instance can host multiple independent databases
 Every document has a special key, “_id”
 Powerful JavaScript shell for administration
 The config database (configdb) contains the cluster metadata
MongoDB Simple Architecture
Read/Write MongoDB
 A mongos instance receives queries from applications
 It uses metadata from the config server to locate the data
 mongos directs write operations to a particular shard
 mongos uses the cluster metadata from the config database
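The routing role of mongos can be illustrated with a small range-based router. This is a toy model and not MongoDB code: the class `ToyRouter`, the `chunk_map` structure standing in for the config metadata, and the choice of `_id` as the shard key are assumptions for illustration.

```python
class ToyRouter:
    """Plays the role of mongos: routes operations using chunk metadata."""

    def __init__(self, chunk_map, shards):
        # chunk_map: list of (min_key, max_key, shard name);
        # min_key is inclusive, max_key is exclusive.
        self.chunk_map = chunk_map
        self.shards = shards  # shard name -> {_id: document}

    def _shard_for(self, shard_key):
        for low, high, name in self.chunk_map:
            if low <= shard_key < high:
                return name
        raise KeyError("no chunk covers %r" % (shard_key,))

    def insert(self, document):
        """Send the write to the one shard that owns the document's key."""
        name = self._shard_for(document["_id"])
        self.shards[name][document["_id"]] = document
        return name

    def find_one(self, doc_id):
        """Route the read to the owning shard only."""
        return self.shards[self._shard_for(doc_id)].get(doc_id)
```

The application talks only to the router; which shard holds a document is decided entirely by the metadata, so shards can be added or rebalanced without changing application code.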
Recap: Importance of Benchmarks and Factors
 Scalability
 Availability
 Partition tolerance
 Consistency
Most important: performance
Yahoo Cloud Serving Benchmark (YCSB)
Results: Load Process
Results: Read/Write Mix Workload
Results: Read/Scan Mix Workload
Results: Read Latency across all workloads
Results: Insert Latency across all workloads
Live Demo
Let's migrate from a traditional database!
Discussion
 Identify the data model for the application
 The corresponding data sets have to be known
 Determine whether the application requires replication
 Identify the performance requirements
 Prototype the application
 Test the performance of the prototype
Conclusion
 NoSQL databases have replaced traditional relational databases in many use cases
 Performance is the key feature
 Benchmarks are important
 The performance of the top three NoSQL databases was tested
 Cassandra outperformed the other NoSQL databases in these tests
 Decide based on the application

Editor's Notes

  • #9: Managing the start-up, configuration and termination of EC2 instances; running the tests on the clients.
  • #10: Apache Cassandra: columnar database model (a combination of Amazon Dynamo and Bigtable). Apache HBase: columnar database model (a Bigtable-inspired system on Hadoop).
  • #12: In Cassandra, rows are split across nodes; each has a row key (the primary key is hashed with an MD5 hash), and a column family holds columns, each with a name, a value and a timestamp. In HBase, data is split column-wise; it has a row key for a range of rows, a column family, a column qualifier and a timestamp. HBase uses ordered distribution, not hash distribution. Frequently accessed columns are grouped together under a common family.
  • #14: The system keyspace stores metadata for the local node. The system keyspace cannot be modified or edited by users. The node’s token is decided by the partitioner.
  • #16: Memory reads are faster than disk reads. So when we look at the test results, Cassandra outperforms the others, and Bloom filters could be one of the reasons, because of fast in-memory access and reads.
  • #17: Cassandra nodes exchange Merkle trees with their neighbours. A Merkle tree is a hash representing the data in a column family. The trees are compared and, if there is any difference, a repair is launched for the ranges that don't agree. Read repair happens internally in the background. A snitch routes the client to the nearest node. (There is no separate config database as in MongoDB, or ZooKeeper as in HBase, for routing, which may take additional time to respond.) The snitch gives host proximity.
  • #27: Give the example of Facebook.