HBase
           Introduction to
           column oriented
           databases




        Luís Cipriani
        @lfcipriani (twitter, linkedin, github, ...)
        22nd GURU (2012-02-25) - Sao Paulo/Brazil

Friday, February 24, 2012
ME




intro




                                     “A BigTable HBase is a sparse,
                                         distributed, persistent
                                     multidimensional sorted map”




                                                     http://research.google.com/archive/bigtable.html
intro > data model
                           {                           <-- table
                                // ...
                                "aaaaa" : {            <--   row
                                   "A" : {             <--   column family
                                      "foo" : {        <--   column (qualifier)
                                        15: "y",       <--   timestamp, value
                                         4: "m"
                                      },
                                      "bar" : {...}
                                   },
                                   "B" : {
                                      "" : {...}
                                   }
                                },
                                "aaaab" : {
                                   "A" : {
                                      "foo" : {...},
                                      "bar" : {...},
                                      "joe" : {...}
                                   },
                                   "B" : {
                                      "" : {...}
                                   }
                                },
                                // ...
                           }
intro > data model




      (Table, RowKey, Family, Column, Timestamp) → Value
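
The coordinate mapping above can be sketched with plain nested dictionaries. This is only a toy in-memory model reusing the sample values from the data model slide; `get_cell` is a hypothetical helper, not part of any HBase client API. It assumes that a read without a timestamp returns the newest version.

```python
# Toy in-memory model of the nested map from the slides (not the HBase API).
# row key -> column family -> qualifier -> {timestamp: value}
table = {
    "aaaaa": {
        "A": {
            "foo": {15: "y", 4: "m"},
        },
    },
}


def get_cell(table, row_key, family, qualifier, timestamp=None):
    """Return the value stored at the given coordinates.

    With no timestamp, pick the newest version (highest timestamp).
    """
    versions = table[row_key][family][qualifier]
    if timestamp is None:
        timestamp = max(versions)
    return versions[timestamp]


newest = get_cell(table, "aaaaa", "A", "foo")      # version at timestamp 15
older = get_cell(table, "aaaaa", "A", "foo", 4)    # explicit older version
print(newest, older)
```

Each extra level of nesting is one more coordinate, which is why the model is called a multidimensional map.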




intro > hadoop stack




                          • hadoop HDFS (or not)
                          • hadoop MapReduce
                          • hadoop ZooKeeper
                          • hadoop HBase
                          • hadoop Hue, Whirr, etc...



architecture




key design > read/write model




               • random reads (get)
               • sequential reads (scan)
                 • partial key scans
               • writes (put = update)
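
The four operations above can be sketched over a list of row keys kept in sorted order. `ToySortedTable` and its method names are hypothetical and only mimic the access patterns; this is not the HBase client API, and unlike HBase this toy keeps a single version per row.

```python
import bisect


class ToySortedTable:
    """Rows kept sorted by key, as a stand-in for HBase's sorted row order."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> value

    def put(self, key, value):
        # Write: a put on an existing key overwrites it (put = update).
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        # Random read: direct lookup by full row key.
        return self._rows.get(key)

    def scan(self, start="", stop=None):
        # Sequential read: walk keys in sorted order,
        # from start (inclusive) to stop (exclusive).
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and (stop is None or self._keys[i] < stop):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1

    def prefix_scan(self, prefix):
        # Partial key scan: every row whose key starts with the prefix.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


t = ToySortedTable()
t.put("user1-20120224", "a")
t.put("user2-20120224", "b")
t.put("user1-20120225", "c")
t.put("user1-20120224", "a2")  # same key again: acts as an update
print(list(t.prefix_scan("user1-")))
```

Because rows are sorted by key, a partial key scan is just a seek to the prefix followed by a sequential read, which is why the layout of the row key matters so much.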



key design > storage model




                                     http://ofps.oreilly.com/titles/9781449396107/advanced.html
key design > strategies


                                 • tall-narrow vs flat-wide
                                 • partial key scans
                                 • pagination
                                 • time series
                                    • salting
                                    • field swap
                                    • randomization
                                 • secondary indexes
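
The three time series strategies listed above can be sketched as row key builders. The function names, the `-` separator, the bucket count and the `metric` field are all assumptions made for illustration; real schemas usually work on byte arrays rather than strings.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical number of salt buckets


def salted_key(timestamp, num_buckets=NUM_BUCKETS):
    # Salting: prefix the key with a bucket number so that consecutive
    # timestamps land in different key ranges.
    bucket = timestamp % num_buckets
    return f"{bucket:02d}-{timestamp}"


def field_swap_key(metric, timestamp):
    # Field swap: lead with another field so writes spread by that field,
    # while rows for one metric stay ordered by time.
    return f"{metric}-{timestamp}"


def randomized_key(timestamp):
    # Randomization: hash the key. Writes spread evenly, but rows are no
    # longer stored in time order, so time range scans are lost.
    return hashlib.md5(str(timestamp).encode()).hexdigest()


for ts in (1330041600, 1330041601, 1330041602):
    print(salted_key(ts), field_swap_key("cpu", ts), randomized_key(ts))
```

The trade-off runs from left to right: salting keeps time order within each bucket, field swap keeps it within each leading field value, and randomization gives it up in exchange for the most even write distribution.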

key design > example




development




            • installation modes
               • standalone, pseudo-distributed, distributed
            • JRuby console
            • Access
               • java/jruby API (more features)
               • entry points: REST, Thrift, Avro, Protocol Buffers
               • there are several other libraries


cons




            • complex config and maintenance
            • hot regions
            • no secondary index built-in
            • no transactions built-in
            • complex schema design




pros




               • distributed
               • scalable (auto-sharding)
               • built on Hadoop stack
               • handles Big Data
               • high performance for write and read
               • no SPOF
               • fault tolerant, no data loss
               • active community



Login Box Redesign                                                       Abril ID
   http://engineering.abril.com.br/
   http://abr.io/hbase-intro
   https://pinboard.in/u:lfcipriani/t:hbase/
   http://hbase.apache.org/




                                     ?   http://shop.oreilly.com/product/0636920014348.do





