#MongoDB




Introduction to MongoDB
& MongoDB + Hadoop
Steve Francia
Chief Evangelist, 10gen
What is MongoDB
MongoDB is a ___________
database
• Document
• Open source
• High performance
• Horizontally scalable
• Full featured
Document Database
• Not for .PDF & .DOC files
• A document is essentially an associative array
• Document == JSON object
• Document == PHP Array
• Document == Python Dict
• Document == Ruby Hash
• etc
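The "document == associative array" idea can be sketched in plain JavaScript, outside the mongo shell. The field names are borrowed from the book example later in the deck; the variable names are illustrative only.

```javascript
// A document is a nested associative array: here, a plain JavaScript object.
const book = {
  title: "fellowship of the ring, the",
  language: "english",
  genre: ["fantasy", "adventure"],   // one key may hold several values
  publication: { name: "george allen & unwin", location: "London" }
};

// Round-tripping through JSON shows the document is plain, self-describing data.
const copy = JSON.parse(JSON.stringify(book));

console.log(copy.genre.length);           // number of genre values
console.log(copy.publication.location);   // value of a nested field
```

The same shape maps directly onto a PHP array, a Python dict, or a Ruby hash.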
Open Source
• MongoDB is an open source project
• On GitHub
• Licensed under the AGPL
• Started & sponsored by 10gen
• Commercial licenses available
• Contributions welcome
High Performance
• Written in C++
• Extensive use of memory-mapped files
 i.e. read-through write-through memory caching.
• Runs nearly everywhere
• Data serialized as BSON (fast parsing)
• Full support for primary & secondary indexes
• Document model = less work
Horizontally Scalable
Full Featured
• Ad Hoc queries
• Real time aggregation
• Rich query capabilities
• Traditionally consistent
• Geospatial features
• Support for most programming languages
• Flexible schema
Database Landscape
http://www.mongodb.org/downloads
Mongo Shell
Document Database
Terminology
RDBMS             MongoDB
Table, View   ➜   Collection
Row           ➜   Document
Index         ➜   Index
Join          ➜   Embedded Document
Foreign Key   ➜   Reference
Partition     ➜   Shard
Typical (relational) ERD
MongoDB ERD
Working with MongoDB
Creating an author
> db.author.insert({
    first_name: 'j.r.r.',
    last_name: 'tolkien',
    bio: 'J.R.R. Tolkien (1892-1973), beloved throughout the ' +
         'world as the creator of The Hobbit and The Lord of the Rings, was a ' +
         'professor of Anglo-Saxon at Oxford, a fellow of Pembroke ' +
         'College, and a fellow of Merton College until his retirement in 1959. ' +
         'His chief interest was the linguistic aspects of the early English ' +
         'written tradition, but even as he studied these classics he was ' +
         'creating a set of his own.'
})
Querying for our author
> db.author.findOne( { last_name : 'tolkien' } )
{
    "_id" : ObjectId("507ffbb1d94ccab2da652597"),
    "first_name" : "j.r.r.",
    "last_name" : "tolkien",
    "bio" : "J.R.R. Tolkien (1892-1973), beloved throughout the world
as the creator of The Hobbit and The Lord of the Rings, was a
professor of Anglo-Saxon at Oxford, a fellow of Pembroke
College, and a fellow of Merton College until his retirement in 1959.
His chief interest was the linguistic aspects of the early English
written tradition, but even as he studied these classics he was
creating a set of his own."
}
Creating a Book
> db.books.insert({
           title: 'fellowship of the ring, the',
           author: ObjectId("507ffbb1d94ccab2da652597"),
           language: 'english',
           genre: ['fantasy', 'adventure'],
           publication: {
                      name: 'george allen & unwin',
                      location: 'London',
                      date: new Date('21 July 1954'),
           }
})

http://society6.com/PastaSoup/The-Fellowship-of-the-Ring-ZZc_Print/
Multiple values per key
> db.books.findOne({language: 'english'}, {genre: 1})
{
    "_id" : ObjectId("50804391d94ccab2da652598"),
    "genre" : [
        "fantasy",
        "adventure"
    ]
}
Querying for a key with multiple values
> db.books.findOne({genre: 'fantasy'}, {title: 1})
{
    "_id" : ObjectId("50804391d94ccab2da652598"),
    "title" : "fellowship of the ring, the"
}




Query a key the same way whether it holds a single value or multiple values.
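Why the same query works for both kinds of field can be sketched in plain JavaScript. The helper fieldMatches is hypothetical and simplified: it mimics only the "equals the value, or is an array containing the value" rule, not the full MongoDB matching semantics.

```javascript
// Hypothetical helper: an equality condition matches a scalar field that equals
// the value, or an array field that contains the value.
function fieldMatches(doc, field, value) {
  const stored = doc[field];
  return Array.isArray(stored) ? stored.includes(value) : stored === value;
}

const book = { language: "english", genre: ["fantasy", "adventure"] };

console.log(fieldMatches(book, "language", "english")); // single-valued key
console.log(fieldMatches(book, "genre", "fantasy"));    // multi-valued key
console.log(fieldMatches(book, "genre", "romance"));    // value not present
```

The caller never needs to know whether a key holds one value or many, which is the point the slide makes.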
Nested Values
> db.books.findOne({}, {publication: 1})
{
    "_id" : ObjectId("50804ec7d94ccab2da65259a"),
    "publication" : {
            "name" : "george allen & unwin",
            "location" : "London",
            "date" : ISODate("1954-07-21T04:00:00Z")
    }
}
Reach into nested values using dot notation
> db.books.findOne(
    {'publication.date' :
              { $lt : new Date('21 June 1960')}
    }
)
{
    "_id" : ObjectId("50804391d94ccab2da652598"),
    "title" : "fellowship of the ring, the",
    "author" : ObjectId("507ffbb1d94ccab2da652597"),
    "language" : "english",
    "genre" : [ "fantasy",     "adventure" ],
    "publication" : {
              "name" : "george allen & unwin",
              "location" : "London",
              "date" : ISODate("1954-07-21T04:00:00Z")
    }
}
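Dot notation can be pictured as walking the document one key at a time. The sketch below is plain JavaScript with a hypothetical getPath helper; it is not how the server evaluates queries, only an illustration of the path lookup and the $lt comparison.

```javascript
// Hypothetical helper: resolve a dotted path such as "publication.date"
// by descending one key per path segment.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const book = {
  title: "fellowship of the ring, the",
  publication: { location: "London", date: new Date("1954-07-21T04:00:00Z") }
};

const cutoff = new Date("1960-06-21T00:00:00Z");
const published = getPath(book, "publication.date");

console.log(published < cutoff);                 // the $lt test from the query
console.log(getPath(book, "publication.isbn"));  // a missing path yields undefined
```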
Update books
> db.books.update(
    {"_id" : ObjectId("50804391d94ccab2da652598")},
    { $set : {
        isbn: '0547928211',
        pages: 432
    } }
)

True agile development.
Simply change how you work with the data and the database follows.
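What $set does to a document can be sketched in plain JavaScript. applySet is a hypothetical helper, not a MongoDB API, and it handles top-level fields only.

```javascript
// Hypothetical stand-in for $set: add or overwrite the named fields and
// leave every other field of the document untouched.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

const before = { title: "fellowship of the ring, the", language: "english" };
const after = applySet(before, { isbn: "0547928211", pages: 432 });

console.log(Object.keys(after)); // existing keys first, then the new ones
```

No schema change is needed beforehand; the new keys simply appear on the document that was updated.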
The Updated Book record
> db.books.findOne()
{
    "_id" : ObjectId("50804ec7d94ccab2da65259a"),
    "author" : ObjectId("507ffbb1d94ccab2da652597"),
    "genre" : [ "fantasy", "adventure" ],
    "isbn" : "0395082544",
    "language" : "english",
    "pages" : 432,
    "publication" : {
              "name" : "george allen & unwin",
              "location" : "London",
              "date" : ISODate("1954-07-21T04:00:00Z")
    },
    "title" : "fellowship of the ring, the"
}
Creating indexes
> db.books.ensureIndex({title: 1})


> db.books.ensureIndex({genre : 1})


> db.books.ensureIndex({'publication.date': -1})
Finding author by book
> book = db.books.findOne(
            {"title" : "return of the king, the"})

> db.author.findOne({_id: book.author})
{
     "_id" : ObjectId("507ffbb1d94ccab2da652597"),
     "first_name" : "j.r.r.",
     "last_name" : "tolkien",
     "bio" : "J.R.R. Tolkien (1892-1973), beloved throughout the world as
the creator of The Hobbit and The Lord of the Rings, was a professor of
Anglo-Saxon at Oxford, a fellow of Pembroke College, and a fellow of
Merton College until his retirement in 1959. His chief interest was the
linguistic aspects of the early English written tradition, but even as he
studied these classics he was creating a set of his own."
}
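Following a reference takes two lookups rather than a join. A minimal in-memory sketch in plain JavaScript, with arrays standing in for the two collections and strings standing in for ObjectId values (both are assumptions made for illustration):

```javascript
// Arrays stand in for the author and books collections.
const authors = [
  { _id: "507ffbb1d94ccab2da652597", first_name: "j.r.r.", last_name: "tolkien" }
];
const books = [
  { title: "return of the king, the", author: "507ffbb1d94ccab2da652597" }
];

// Step 1: find the book. Step 2: use its stored reference to find the author.
const book = books.find(b => b.title === "return of the king, the");
const author = authors.find(a => a._id === book.author);

console.log(author.last_name);
```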
The Big Data Story
Is actually two stories
Doers & Tellers talking about different things
http://www.slideshare.net/siliconangle/trendconnect-big-data-report-september
Tellers
Doers
Doers talk a lot more about actual solutions
They know it's a two-sided story
Storage
Processing
Takeaways
• MongoDB and Hadoop
• MongoDB for storage & operations
• Hadoop for processing & analytics
MongoDB & Data Processing
Applications have complex needs
• MongoDB is an ideal operational database
• MongoDB is ideal for BIG data
• Not a data processing engine, but it provides processing functionality
Many options for Processing Data
• Process in MongoDB using Map Reduce
• Process in MongoDB using Aggregation Framework
• Process outside MongoDB (using Hadoop)
MongoDB Map Reduce
MongoDB Map Reduce
• MongoDB map reduce is quite capable... but with limits
  - JavaScript is not the best language for processing map reduce
  - JavaScript is limited in external data processing libraries
  - Adds load to the data store
MongoDB Aggregation
• Most uses of MongoDB Map Reduce were for aggregation
• Aggregation Framework optimized for aggregate queries
• Realtime aggregation similar to SQL GROUP BY
MongoDB & Hadoop
Introduction to MongoDB and Hadoop
DEMO
• Install Hadoop MongoDB Plugin
• Import tweets from Twitter
• Write mapper
• Write reducer
• Call myself a data scientist
Installing Mongo-hadoop
https://gist.github.com/1887726
hadoop_version='0.23'
hadoop_path="/usr/local/Cellar/hadoop/$hadoop_version.0/libexec/lib"


git clone git://github.com/mongodb/mongo-hadoop.git
cd mongo-hadoop
sed -i '' "s/default/$hadoop_version/g" build.sbt
cd streaming
./build.sh
Grokking Twitter
curl https://stream.twitter.com/1/statuses/sample.json \
    -u<login>:<password> \
    | mongoimport -d test -c live

... let it run for about 2 hours
DEMO 1
Map Hashtags in Java
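import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.bson.BSONObject;
import org.bson.types.BasicBSONList;

public class TwitterMapper
        extends Mapper<Object, BSONObject, Text, IntWritable> {

    // Emits (hashtag, 1) for every hashtag found in a tweet document.
    @Override
    public void map( final Object pKey,
                     final BSONObject pValue,
                     final Context pContext )
            throws IOException, InterruptedException{

        // Skip tweets that carry no entities or no hashtags.
        BSONObject entities = (BSONObject)pValue.get("entities");
        if(entities == null) return;
        BasicBSONList hashtags = (BasicBSONList)entities.get("hashtags");
        if(hashtags == null) return;

        for(Object o : hashtags){
            String tag = (String)((BSONObject)o).get("text");
            pContext.write( new Text( tag ), new IntWritable( 1 ) );
        }
    }
}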
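Reduce hashtags in Java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TwitterReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Sums the per-hashtag counts emitted by the mapper.
    @Override
    public void reduce( final Text pKey,
                        final Iterable<IntWritable> pValues,
                        final Context pContext )
            throws IOException, InterruptedException{

        int count = 0;
        for ( final IntWritable value : pValues ){
            count += value.get();
        }

        pContext.write( pKey, new IntWritable( count ) );
    }
}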
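The map, shuffle, and reduce phases can be traced on a tiny in-memory sample. This is a plain JavaScript sketch with made-up tweets; it mirrors the logic of the Java mapper and reducer but does not use Hadoop or the mongo-hadoop connector, and the function name mapTweet is illustrative.

```javascript
// Made-up tweets shaped like the documents the mapper reads.
const tweets = [
  { entities: { hashtags: [{ text: "np" }, { text: "RT" }] } },
  { entities: { hashtags: [{ text: "np" }] } },
  { entities: { hashtags: [] } },
  { text: "a tweet with no entities field" }
];

// Map phase: emit a (tag, 1) pair per hashtag, skipping tweets without any.
function mapTweet(tweet) {
  const hashtags = (tweet.entities && tweet.entities.hashtags) || [];
  return hashtags.map(h => [h.text, 1]);
}

// Shuffle phase: group the emitted values by key, as the framework does
// between the map and reduce steps.
const grouped = new Map();
for (const [tag, one] of tweets.flatMap(mapTweet)) {
  if (!grouped.has(tag)) grouped.set(tag, []);
  grouped.get(tag).push(one);
}

// Reduce phase: sum the values collected for each key.
const counts = {};
for (const [tag, values] of grouped) {
  counts[tag] = values.reduce((sum, v) => sum + v, 0);
}

console.log(counts);
```

Because summing is associative, the same reduce step can also run as a combiner on partial groups, which is how the job configuration in the deck uses TwitterReducer.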
All together
#!/bin/bash
export HADOOP_HOME="/Users/mike/hadoop/hadoop-1.0.4"
declare -a job_args
cd ..
job_args=("jar" "examples/twitter/target/twitter-example_*.jar")
job_args=(${job_args[@]} "com.mongodb.hadoop.examples.twitter.TwitterConfig")
job_args=(${job_args[@]} "-D" "mongo.job.verbose=true")
job_args=(${job_args[@]} "-D" "mongo.job.background=false")
job_args=(${job_args[@]} "-D" "mongo.input.key=")
job_args=(${job_args[@]} "-D" "mongo.input.uri=mongodb://localhost:27017/test.live")
job_args=(${job_args[@]} "-D" "mongo.output.uri=mongodb://localhost:27017/test.twit_hashtags")
job_args=(${job_args[@]} "-D" "mongo.input.query=")
job_args=(${job_args[@]} "-D" "mongo.job.mapper=com.mongodb.hadoop.examples.twitter.TwitterMapper")
job_args=(${job_args[@]} "-D" "mongo.job.reducer=com.mongodb.hadoop.examples.twitter.TwitterReducer")
job_args=(${job_args[@]} "-D" "mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat")
job_args=(${job_args[@]} "-D" "mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat")
job_args=(${job_args[@]} "-D" "mongo.job.output.key=org.apache.hadoop.io.Text")
job_args=(${job_args[@]} "-D" "mongo.job.output.value=org.apache.hadoop.io.IntWritable")
job_args=(${job_args[@]} "-D" "mongo.job.mapper.output.key=org.apache.hadoop.io.Text")
job_args=(${job_args[@]} "-D" "mongo.job.mapper.output.value=org.apache.hadoop.io.IntWritable")
job_args=(${job_args[@]} "-D" "mongo.job.combiner=com.mongodb.hadoop.examples.twitter.TwitterReducer")
job_args=(${job_args[@]} "-D" "mongo.job.partitioner=")
job_args=(${job_args[@]} "-D" "mongo.job.sort_comparator=")

#echo "${job_args[@]}"
$HADOOP_HOME/bin/hadoop "${job_args[@]}" "$1"
Popular Hash Tags
> db.twit_hashtags.find().sort( {'count' : -1 })
{ "_id" : "YouKnowYoureInLoveIf", "count" : 287 }
{ "_id" : "teamfollowback", "count" : 200 }
{ "_id" : "RT", "count" : 150 }
{ "_id" : "Arsenal", "count" : 148 }
{ "_id" : "milars", "count" : 145 }
{ "_id" : "sanremo", "count" : 145 }
{ "_id" : "LoseMyNumberIf", "count" : 139 }
{ "_id" : "RelationshipsShould", "count" : 137 }
{ "_id" : "oomf", "count" : 117 }
{ "_id" : "TeamFollowBack", "count" : 105 }
{ "_id" : "WhyDoPeopleThink", "count" : 102 }
{ "_id" : "np", "count" : 100 }
DEMO 2
Aggregation in Mongo 2.2
> db.live.aggregate(
    { $unwind : "$entities.hashtags" } ,
    { $match :
      { "entities.hashtags.text" :
         { $exists : true } } } ,
    { $group :
      { _id : "$entities.hashtags.text",
      count : { $sum : 1 } } } ,
    { $sort : { count : -1 } },
    { $limit : 10 }
)
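Each pipeline stage has a simple in-memory analogue. The sketch below is plain JavaScript over made-up tweets, not the aggregation framework itself; it follows the same stage order of unwind, match, group, sort, and limit, with the limit reduced to 2 to suit the small sample.

```javascript
// Made-up tweets; only the entities.hashtags field matters here.
const tweets = [
  { entities: { hashtags: [{ text: "np" }, { text: "RT" }] } },
  { entities: { hashtags: [{ text: "np" }, { text: "oomf" }] } },
  { entities: { hashtags: [{ text: "np" }, { text: "RT" }] } },
  { entities: { hashtags: [] } }
];

// $unwind: produce one record per hashtag instead of one per tweet.
const unwound = tweets.flatMap(t =>
  ((t.entities && t.entities.hashtags) || []).map(h => h.text)
);

// $match with $exists: keep only records that actually carry hashtag text.
const matched = unwound.filter(tag => tag !== undefined);

// $group with $sum: 1 counts the records per hashtag.
const totals = new Map();
for (const tag of matched) {
  totals.set(tag, (totals.get(tag) || 0) + 1);
}

// $sort by count descending, then $limit.
const top = [...totals]
  .map(([_id, count]) => ({ _id, count }))
  .sort((a, b) => b.count - a.count)
  .slice(0, 2);

console.log(top);
```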
Popular Hash Tags
> db.twit_hashtags.aggregate(a)
{
    "result" : [
      { "_id" : "YouKnowYoureInLoveIf", "count" : 287 },
      { "_id" : "teamfollowback", "count" : 200 },
      { "_id" : "RT", "count" : 150 },
      { "_id" : "Arsenal", "count" : 148 },
      { "_id" : "milars", "count" : 145 },
      { "_id" : "sanremo", "count" : 145 },
      { "_id" : "LoseMyNumberIf", "count" : 139 },
      { "_id" : "RelationshipsShould", "count" : 137 }
    ],
    "ok" : 1
}
#MongoDB




Questions?
Steve Francia
Chief Evangelist, 10gen
@spf13
Spf13.com

Editor's Notes

  • #7: AGPL – GNU Affero General Public License
  • #8: * Big endian and ARM not supported.
  • #11: Kristine to update this graphic at some point
  • #16: Kristine to update this graphic at some point
  • #17: Kristine to update this graphic at some point
  • #19: Powerful message here. Finally a database that enables rapid & agile development.
  • #21: Creating a book here. A few things to make note of.
  • #26: Powerful message here. Finally a database that enables rapid & agile development.