Data Processing and Aggregation
Senior Solutions Architect, MongoDB Inc
massimo@mongodb.com.
Massimo Brignoli
@massimobrignoli
Who am I?
•  Solutions Architect/Evangelist at MongoDB Inc.
•  24 years of experience in databases and software development
•  Former employee of MySQL and MariaDB
•  Previously: web, web, web
Big Data
Innovation
Understanding Big Data – It’s Not Very “Big”
from Big Data Executive Summary – 50+ top executives from Government and F500 firms
64% - Ingest diverse,
new data in real-time
15% - More than 100TB
of data
20% - Less than 100TB
(average of all? <20TB)
“I have not failed. I've just found 10,000 ways that won't work.”
― Thomas A. Edison
Many great innovations since 1970…
But would you use any of these
technologies to launch a
new business today?
Including the relational data model!
Which computers was the
relational model
designed for?
These were the computers!
And the storage?
And how was software developed?
[Figure: timeline of software development by decade, 1950 to 2000, in three columns: Process, Need, Language]
Need:
•  First attempts at "order" in development
•  Understandable, portable code that can sustain its own evolution
•  "Industrial" organization of software systems development
•  Impossibility of precisely defining the system to be built
•  Very rapid development and distribution, oriented to communication systems
Process:
•  Waterfall, V-model, ...
•  Incremental, Spiral, ...
•  Agile methodologies
Language:
•  Assembly languages
•  High-level languages
•  Structured languages
•  Object-oriented languages
•  Languages for dynamic development
RDBMS Makes Development Hard
Relational
Database
Object Relational
Mapping
Application
Code XML Config DB Schema
And Even Harder to Evolve…
New
Table
New
Table
New
Column
Name Pet Phone Email
New
Column
3 months later…
RDBMS
From Complexity to Simplicity
MongoDB
{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name: "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits : [
    { type : "Health",
      plan : "PPO Plus" },
    { type : "Dental",
      plan : "Standard" }
  ]
}
What Is a Record?
Key → Value
•  One-dimensional storage
•  Each value is a blob
•  Queries are by key only
•  No schema
•  A value cannot be updated in place, only overwritten
Key Blob
Relational
•  Two-dimensional storage (tuples)
•  Each field holds a single value
•  Queries can use any field
•  Highly structured schema (tables)
•  In-place updates
•  Normalization requires many tables and indexes,
and results in poor data locality
Primary
Key
Document
•  N-dimensional storage
•  Each field can hold zero, one,
many, or embedded values
•  Queries on all fields and at every level
•  Dynamic schema
•  Inline updates
•  Embedding data improves data locality,
requires fewer indexes, and performs better
_id
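To make "queries on all fields and at every level" concrete, here is a minimal Python sketch, not MongoDB's implementation, of looking up a dotted path inside a nested document. The helper name get_path is hypothetical; the sample document reuses fields from the employee record shown earlier.

```python
def get_path(doc, path):
    """Return every value reachable at a dotted path, descending into lists."""
    current = [doc]
    for key in path.split("."):
        found = []
        for item in current:
            # A list of sub-documents is searched element by element.
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    found.append(candidate[key])
        current = found
    return current


employee = {
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}

print(get_path(employee, "department"))
print(get_path(employee, "benefits.plan"))
```

A key/value store could only return the whole blob for a key; here a single path expression reaches values embedded two levels down.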
For over a decade
Big Data == Custom Software
In the past few years
Open source software has
emerged enabling the rest of
us to handle Big Data
How MongoDB Meets Our Requirements
•  MongoDB is an operational database
•  MongoDB provides high performance for storage and
retrieval at large scale
•  MongoDB has a robust query interface permitting
intelligent operations
•  MongoDB is not a data processing engine, but provides
processing functionality
http://www.flickr.com/photos/torek/4444673930/
MongoDB data processing options
Getting Example Data
The “hello world” of
MapReduce is counting words
in a paragraph of text.
Let’s try something a little more
interesting…
What is the most popular pub name?
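#!/usr/bin/env python

# Data Source
# http://www.overpass-api.de/api/xapi?*[amenity=pub][bbox=-10.5,49.78,1.78,59]

from imposm.parser import OSMParser
import pymongo

# Assumed target: the demo.pubs collection on a local mongod.
collection = pymongo.MongoClient().demo.pubs
# Coordinates of unnamed nodes, keyed by OSM id.
node_points = {}


def normalise_name(name):
    """Title-case the name, drop a leading "The " and write "And" as "&"."""
    name = name.title()
    # str.lstrip("The ") strips characters, not a prefix, so slice instead.
    if name.startswith("The "):
        name = name[len("The "):]
    return name.replace("And", "&")


class Handler(object):
    def nodes(self, nodes):
        if not nodes:
            return
        docs = []
        for node in nodes:
            osm_id, doc, (lon, lat) = node
            if "name" not in doc:
                node_points[osm_id] = (lon, lat)
                continue
            doc["name"] = normalise_name(doc["name"])
            doc["_id"] = osm_id
            # GeoJSON points are [longitude, latitude].
            doc["location"] = {"type": "Point", "coordinates": [lon, lat]}
            docs.append(doc)
        if docs:
            # insert_many replaces the legacy Collection.insert call.
            collection.insert_many(docs)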


Open Street Map Data
{
"_id" : 451152,
"amenity" : "pub",
"name" : "The Dignity",
"addr:housenumber" : "363",
"addr:street" : "Regents Park Road",
"addr:city" : "London",
"addr:postcode" : "N3 1DH",
"toilets" : "yes",
"toilets:access" : "customers",
"location" : {
"type" : "Point",
"coordinates" : [-0.1945732, 51.6008172]
}
}


Example Pub Data
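The location field above is a GeoJSON Point, which stores longitude first and latitude second. A minimal sketch of building such a value with a range check follows; the helper name geojson_point is hypothetical and not part of any driver.

```python
def geojson_point(lon, lat):
    """Build a GeoJSON Point; coordinates are [longitude, latitude]."""
    if not -180 <= lon <= 180:
        raise ValueError("longitude out of range: %r" % lon)
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range: %r" % lat)
    return {"type": "Point", "coordinates": [lon, lat]}


# Coordinates of the example pub document.
location = geojson_point(-0.1945732, 51.6008172)
print(location)
```

The range check catches some swapped pairs (a longitude above 90 passed as latitude), but not all of them, so it is a sanity check rather than a guarantee.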
MongoDB MapReduce
[Diagram: MongoDB → map → reduce → finalize]
Map Function
> var map = function() {
    emit(this.name, 1);
  }
Reduce Function
> var reduce = function (key, values) {
    var sum = 0;
    values.forEach( function (val) { sum += val; } );
    return sum;
  }
Results
> db.pubs.mapReduce(map, reduce, { out: "pub_names", 
query: { } } )
> db.pub_names.find().sort({value: -1}).limit(10)

{ "_id" : "The Red Lion", "value" : 407 }
{ "_id" : "The Royal Oak", "value" : 328 }
{ "_id" : "The Crown", "value" : 242 }
{ "_id" : "The White Hart", "value" : 214 }
{ "_id" : "The White Horse", "value" : 200 }
{ "_id" : "The New Inn", "value" : 187 }
{ "_id" : "The Plough", "value" : 185 }
{ "_id" : "The Rose & Crown", "value" : 164 }
{ "_id" : "The Wheatsheaf", "value" : 147 }
{ "_id" : "The Swan", "value" : 140 }
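The same map and reduce steps can be sketched in plain Python to show what happens between them. This is an in-memory illustration under simplifying assumptions, not how MongoDB executes the job; map_reduce, map_pub and reduce_counts are hypothetical names, and the three input documents are made up.

```python
from collections import defaultdict


def map_pub(doc):
    # Equivalent of emit(this.name, 1).
    yield doc["name"], 1


def reduce_counts(key, values):
    # Must also work on its own earlier output (re-reduce), so it only sums.
    return sum(values)


def map_reduce(docs, mapper, reducer):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


pubs = [
    {"name": "The Red Lion"},
    {"name": "The Crown"},
    {"name": "The Red Lion"},
]
counts = map_reduce(pubs, map_pub, reduce_counts)
top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(top)
```

Because the reducer only sums, feeding a partial sum back in with more values gives the same total, which is the property a reduce function needs when it may be called more than once per key.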
Data Processing and Aggregation with MongoDB
> db.pubs.mapReduce(map, reduce, { out: "pub_names", 
query: { 
location: { 
$within: { $centerSphere: [[-0.12, 51.516], 2 / 3959] }
}}
})

{
"result" : "pub_names",
"timeMillis" : 116,
"counts" : {
"input" : 643,
"emit" : 643,
"reduce" : 54,
"output" : 537
},
"ok" : 1,
}


Pub Names in the Center of London
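The radius passed to $centerSphere is in radians, which is why the query divides 2 miles by 3959, the approximate radius of the Earth in miles. The sketch below reproduces that conversion and checks a point against the circle with the haversine formula; the function names are hypothetical, and this is an illustration of the geometry rather than MongoDB's own geo code.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # value used in the query above


def miles_to_radians(miles):
    return miles / EARTH_RADIUS_MILES


def great_circle_radians(a, b):
    """Central angle between two [lon, lat] points (haversine formula)."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(h))


def within_center_sphere(point, center, radius):
    return great_circle_radians(point, center) <= radius


center = [-0.12, 51.516]
radius = miles_to_radians(2)
dignity = [-0.1945732, 51.6008172]  # the example pub document

print(within_center_sphere(center, center, radius))
print(within_center_sphere(dignity, center, radius))
```

The example pub lies several miles north of the chosen center, so it falls outside the 2-mile circle.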
> db.pub_names.find().sort({value: -1}).limit(10)

{ "_id" : "All Bar One", "value" : 11 }
{ "_id" : "The Slug & Lettuce", "value" : 7 }
{ "_id" : "The Coach & Horses", "value" : 6 }
{ "_id" : "The Green Man", "value" : 5 }
{ "_id" : "The Kings Arms", "value" : 5 }
{ "_id" : "The Red Lion", "value" : 5 }
{ "_id" : "Corney & Barrow", "value" : 4 }
{ "_id" : "O'Neills", "value" : 4 }
{ "_id" : "Pitcher & Piano", "value" : 4 }
{ "_id" : "The Crown", "value" : 4 }


Results
MongoDB MapReduce
•  Real-time
•  Output directly to document or collection
•  Runs inside MongoDB on local data
− Adds load to your DB
− In JavaScript, so debugging can be a challenge
− Translating in and out of C++
Aggregation Framework
Aggregation Framework
[Diagram: MongoDB → op1 → op2 → ... → opN]
Aggregation Framework in 60 Seconds
Aggregation Framework Operators
•  $project
•  $match
•  $limit
•  $skip
•  $sort
•  $unwind
•  $group
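A pipeline is an ordered list of stages, each consuming the output of the previous one. As a rough Python sketch of that idea (in-memory, and only loosely modelled on $sort, $skip and $limit), with run_pipeline, sort_by, skip and limit as hypothetical names and made-up input values:

```python
def sort_by(field, direction=1):
    # direction follows the shell convention: 1 ascending, -1 descending.
    return lambda docs: sorted(docs, key=lambda d: d[field],
                               reverse=(direction == -1))


def skip(n):
    return lambda docs: docs[n:]


def limit(n):
    return lambda docs: docs[:n]


def run_pipeline(docs, stages):
    # Each stage receives the full output of the stage before it.
    for stage in stages:
        docs = stage(list(docs))
    return docs


docs = [
    {"_id": "a", "value": 3},
    {"_id": "b", "value": 7},
    {"_id": "c", "value": 5},
    {"_id": "d", "value": 1},
]
result = run_pipeline(docs, [sort_by("value", -1), skip(1), limit(2)])
print([d["_id"] for d in result])
```

Stage order matters: moving limit(2) in front of sort_by would sort only the first two input documents.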
$match
•  Filter documents
•  Uses existing query syntax
•  If using $geoNear it has to be first in pipeline
•  $where is not supported
Matching Field Values
{
"_id" : 271421,
"amenity" : "pub",
"name" : "Sir Walter Tyrrell",
"location" : {
"type" : "Point",
"coordinates" : [
-1.6192422,
50.9131996
]
}
}

{
"_id" : 271466,
"amenity" : "pub",
"name" : "The Red Lion",
"location" : {
"type" : "Point",
"coordinates" : [
-1.5494749,
50.7837119
]
}
}
Matching Field Values
{ "$match": {
"name": "The Red Lion"
}}
{
"_id" : 271466,
"amenity" : "pub",
"name" : "The Red Lion",
"location" : {
"type" : "Point",
"coordinates" : [
-1.5494749,
50.7837119
]}
}
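In effect, the $match stage above keeps only the documents whose fields equal the values in the query. A minimal Python sketch of that behaviour, covering top-level equality only and none of the other query operators (the function name match is hypothetical):

```python
def match(docs, query):
    """Keep documents whose top-level fields equal every value in query."""
    return [d for d in docs
            if all(d.get(field) == value for field, value in query.items())]


pubs = [
    {"_id": 271421, "amenity": "pub", "name": "Sir Walter Tyrrell"},
    {"_id": 271466, "amenity": "pub", "name": "The Red Lion"},
]
matched = match(pubs, {"name": "The Red Lion"})
print(matched)
```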
$project
•  Reshape documents
•  Include, exclude or rename fields
•  Inject computed fields
•  Create sub-document fields
Including and Excluding Fields
{
  "_id" : 271466,
  "amenity" : "pub",
  "name" : "The Red Lion",
  "location" : {
    "type" : "Point",
    "coordinates" : [
      -1.5494749,
      50.7837119
    ]
  }
}
{ "$project": {
  "_id": 0,
  "amenity": 1,
  "name": 1
}}
{
  "amenity" : "pub",
  "name" : "The Red Lion"
}
Reformatting Documents
{
  "_id" : 271466,
  "amenity" : "pub",
  "name" : "The Red Lion",
  "location" : {
    "type" : "Point",
    "coordinates" : [
      -1.5494749,
      50.7837119
    ]
  }
}
{ "$project": {
  "_id": 0,
  "name": 1,
  "meta": {
    "type": "$amenity" }
}}
{
  "name" : "The Red Lion",
  "meta" : {
    "type" : "pub"
  }
}
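The two $project examples can be imitated in Python to show how 0, 1 and "$field" references are interpreted. This is a simplified sketch under stated assumptions: it handles only top-level inclusion, _id suppression and field references, and the names project and resolve are hypothetical.

```python
def resolve(spec, doc):
    """Evaluate a projection value: "$field" reads from doc, dicts nest."""
    if isinstance(spec, str) and spec.startswith("$"):
        return doc.get(spec[1:])
    if isinstance(spec, dict):
        return {key: resolve(value, doc) for key, value in spec.items()}
    return spec


def project(doc, spec):
    out = {}
    # _id is included unless it is explicitly set to 0.
    if spec.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]
    for field, rule in spec.items():
        if field == "_id" or rule == 0:
            continue
        if rule == 1:
            if field in doc:
                out[field] = doc[field]
        else:
            out[field] = resolve(rule, doc)
    return out


pub = {
    "_id": 271466,
    "amenity": "pub",
    "name": "The Red Lion",
    "location": {"type": "Point", "coordinates": [-1.5494749, 50.7837119]},
}
included = project(pub, {"_id": 0, "amenity": 1, "name": 1})
reshaped = project(pub, {"_id": 0, "name": 1, "meta": {"type": "$amenity"}})
print(included)
print(reshaped)
```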
$group
•  Group documents by an ID
•  The ID can be a field reference, an object, or a constant
•  Other output fields are computed with accumulators:
$max, $min, $avg, $sum, $addToSet, $push, $first, $last
•  Processes all data in memory
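Grouping can be pictured as bucketing documents by the ID expression and then running one accumulator per output field over each bucket. The Python sketch below imitates $sum: 1 and $addToSet in memory; the names group, count and add_to_set are hypothetical, and the "Leeds" document is invented for the example.

```python
def count(members):
    # Equivalent of {$sum: 1}: one per document in the group.
    return len(members)


def add_to_set(field):
    # Equivalent of {$addToSet: "$field"}; sorted for a stable result.
    return lambda members: sorted({m[field] for m in members if field in m})


def group(docs, key, accumulators):
    buckets = {}
    for doc in docs:
        buckets.setdefault(doc.get(key), []).append(doc)
    return [
        dict({"_id": group_id},
             **{name: acc(members) for name, acc in accumulators.items()})
        for group_id, members in buckets.items()
    ]


pubs = [
    {"name": "The Red Lion", "addr:city": "London"},
    {"name": "The Crown", "addr:city": "London"},
    {"name": "The Red Lion", "addr:city": "Leeds"},
]
grouped = group(pubs, "name", {"value": count, "cities": add_to_set("addr:city")})
print(grouped)
```

Every bucket is held in a dictionary until the input is exhausted, which mirrors the note above that grouping processes all data in memory.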
Back to the pub!
•  http://www.offwestend.com/index.php/theatres/pastshows/71
Popular Pub Names
> var popular_pub_names = [
    { $match: { location:
        { $within: { $centerSphere: [[-0.12, 51.516], 2 / 3959] } } }
    },
    { $group: { _id: "$name", value: { $sum: 1 } } },
    { $sort: { value: -1 } },
    { $limit: 10 }
  ]
> db.pubs.aggregate(popular_pub_names)
{
"result" : [
{ "_id" : "All Bar One", "value" : 11 }
{ "_id" : "The Slug & Lettuce", "value" : 7 }
{ "_id" : "The Coach & Horses", "value" : 6 }
{ "_id" : "The Green Man", "value" : 5 }
{ "_id" : "The Kings Arms", "value" : 5 }
{ "_id" : "The Red Lion", "value" : 5 }
{ "_id" : "Corney & Barrow", "value" : 4 }
{ "_id" : "O'Neills", "value" : 4 }
{ "_id" : "Pitcher & Piano", "value" : 4 }
{ "_id" : "The Crown", "value" : 4 }
],
"ok" : 1
}

Results
Aggregation Framework Benefits
•  Real-time
•  Simple yet powerful interface
•  Declared in JSON, executes in C++
•  Runs inside MongoDB on local data
− Adds load to your DB
− Limited Operators
− Data output is limited
Analyzing MongoDB Data in
External Systems
MongoDB with Hadoop
[Diagrams, three variants labelled: MongoDB; MongoDB, warehouse; MongoDB, ETL]
#!/usr/bin/env python
import sys

from pymongo_hadoop import BSONMapper


def mapper(documents):
    # get_bounds() and get_geo() are helpers not shown on the slide.
    bounds = get_bounds()  # ~2 mile polygon
    for doc in documents:
        geo = get_geo(doc["location"])  # Convert the geo type
        if not geo:
            continue
        if bounds.intersects(geo):
            yield {'_id': doc['name'], 'count': 1}


BSONMapper(mapper)
sys.stderr.write("Done Mapping.\n")

Map Pub Names in Python
#!/usr/bin/env python

from pymongo_hadoop import BSONReducer


def reducer(key, values):
    _count = 0
    for v in values:
        _count += v['count']
    return {'_id': key, 'value': _count}


BSONReducer(reducer)


Reduce Pub Names in Python
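Between the mapper and the reducer, the streaming job sorts the mapper output by key and hands each key's values to the reducer together. The sketch below imitates that step locally with itertools.groupby so the two functions can be tried without a cluster. It is an assumption-laden stand-in: run_locally is a hypothetical helper, the geographic filter is left out, and the input documents are made up.

```python
from itertools import groupby
from operator import itemgetter


def mapper(documents):
    # Same shape as the slide's mapper, without the bounds check.
    for doc in documents:
        yield {"_id": doc["name"], "count": 1}


def reducer(key, values):
    _count = 0
    for v in values:
        _count += v["count"]
    return {"_id": key, "value": _count}


def run_locally(documents):
    # Sorting by key stands in for the shuffle/sort phase of the job.
    mapped = sorted(mapper(documents), key=itemgetter("_id"))
    return [reducer(key, values)
            for key, values in groupby(mapped, key=itemgetter("_id"))]


sample = [
    {"name": "The Crown"},
    {"name": "All Bar One"},
    {"name": "The Crown"},
]
output = run_locally(sample)
print(output)
```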
hadoop jar target/mongo-hadoop-streaming-assembly-1.1.0-rc0.jar \
  -mapper examples/pub/map.py \
  -reducer examples/pub/reduce.py \
  -mongo mongodb://127.0.0.1/demo.pubs \
  -outputURI mongodb://127.0.0.1/demo.pub_names


Execute MapReduce
> db.pub_names.find().sort({value: -1}).limit(10)

{ "_id" : "All Bar One", "value" : 11 }
{ "_id" : "The Slug & Lettuce", "value" : 7 }
{ "_id" : "The Coach & Horses", "value" : 6 }
{ "_id" : "The Kings Arms", "value" : 5 }
{ "_id" : "Corney & Barrow", "value" : 4 }
{ "_id" : "O'Neills", "value" : 4 }
{ "_id" : "Pitcher & Piano", "value" : 4 }
{ "_id" : "The Crown", "value" : 4 }
{ "_id" : "The George", "value" : 4 }
{ "_id" : "The Green Man", "value" : 4 }


Popular Pub Names Nearby
MongoDB and Hadoop
•  Away from data store
•  Can leverage existing data processing infrastructure
•  Can horizontally scale your data processing
-  Offline batch processing
-  Requires synchronisation between store & processor
-  Infrastructure is much more complex
The Future of Big Data and
MongoDB
What is Big Data?
Big Data today will be normal
tomorrow
Exponential Data Growth
[Chart: billions of URLs indexed by Google, 2000 to 2012; y-axis from 0 to 10,000]
MongoDB enables you to
scale big
MongoDB is evolving
so you can process the big
Data Processing with MongoDB
•  Process in MongoDB using Map/Reduce
•  Process in MongoDB using Aggregation Framework
•  Process outside MongoDB using Hadoop and other
external tools
MongoDB Integration
•  Hadoop
https://github.com/mongodb/mongo-hadoop
•  Storm
https://github.com/christkv/mongo-storm
•  Disco
https://github.com/mongodb/mongo-disco
•  Spark
Coming soon!
Questions?
Thanks!
massimo@mongodb.com
Massimo Brignoli
@massimobrignoli
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 


Data Processing and Aggregation with MongoDB

  • 1. Data Processing and Aggregation. Massimo Brignoli (@massimobrignoli), Senior Solutions Architect, MongoDB Inc., massimo@mongodb.com
  • 2. Who am I? •  Solutions Architect/Evangelist at MongoDB Inc. •  24 years of experience in databases and software development •  Former employee of MySQL and MariaDB •  Previously: web, web, web
  • 5. Understanding Big Data – It’s Not Very “Big” from Big Data Executive Summary – 50+ top executives from Government and F500 firms 64% - Ingest diverse, new data in real-time 15% - More than 100TB of data 20% - Less than 100TB (average of all? <20TB)
  • 6. “I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison
  • 8. But would you use one of these technologies to launch a new business today?
  • 9. Including the relational data model!
  • 10. What kind of computers was the relational model designed for?
  • 11. These were the computers!
  • 13. And how was software developed? [Figure: timeline from 1950 to 2000 in three columns. Process: Waterfall, V-model, ...; incremental, spiral, ...; agile methodologies. Need: first attempts at "order" in development; comprehensibility and portability of code, to sustain its evolution; "industrial" organisation of software system development; impossibility of defining precisely the system to be developed; very rapid development and distribution oriented to communication systems. Language: assembly languages; high-level languages; structured languages; object-oriented languages; languages for dynamic development.]
  • 14. RDBMS Makes Development Hard. [Diagram: Relational Database, Object Relational Mapping, Application Code, XML Config, DB Schema]
  • 15. And Even Harder to Evolve… [Diagram: New Table, New Table, New Column, Name, Pet, Phone, Email, New Column, 3 months later…]
  • 16. From Complexity to Simplicity (RDBMS → MongoDB)
      {
        _id : ObjectId("4c4ba5e5e8aabf3"),
        employee_name: "Dunham, Justin",
        department : "Marketing",
        title : "Product Manager, Web",
        report_up: "Neray, Graham",
        pay_band: "C",
        benefits : [
          { type : "Health", plan : "PPO Plus" },
          { type : "Dental", plan : "Standard" }
        ]
      }
  • 17. What is a Record?
  • 18. Key → Value •  One-dimensional storage •  Each value is a blob •  Queries are by key only •  No schema •  A value cannot be updated, only overwritten [Diagram: Key, Blob]
  • 19. Relational •  Two-dimensional storage (tuples) •  Each field contains only one value •  Queries on any field •  Highly structured schema (tables) •  In-place updates •  Normalization requires many tables and indexes, and results in poor data locality [Diagram: Primary Key]
  • 20. Document •  N-dimensional storage •  Each field can contain zero, one, or many values, or embedded values •  Queries on all fields and levels •  Dynamic schema •  Inline updates •  Embedding data improves data locality, requires fewer indexes, and performs better [Diagram: _id]
  • 21. For over a decade Big Data == Custom Software
  • 22. In the past few years Open source software has emerged enabling the rest of us to handle Big Data
  • 23. How MongoDB Meets Our Requirements •  MongoDB is an operational database •  MongoDB provides high performance for storage and retrieval at large scale •  MongoDB has a robust query interface permitting intelligent operations •  MongoDB is not a data processing engine, but provides processing functionality
  • 26. The "hello world" of MapReduce is counting words in a paragraph of text. Let's try something a little more interesting…
  • 27. What is the most popular pub name?
  • 28. Open Street Map Data
      #!/usr/bin/env python
      # Data Source
      # http://www.overpass-api.de/api/xapi?*[amenity=pub][bbox=-10.5,49.78,1.78,59]
      import re
      import sys
      from imposm.parser import OSMParser
      import pymongo

      # node_points and collection are not defined on this slide.
      class Handler(object):
          def nodes(self, nodes):
              if not nodes:
                  return
              docs = []
              for node in nodes:
                  osm_id, doc, (lon, lat) = node
                  if "name" not in doc:
                      node_points[osm_id] = (lon, lat)
                      continue
                  # Note: lstrip("The ") strips any leading 'T', 'h', 'e' or
                  # space characters, not the literal prefix "The ".
                  doc["name"] = doc["name"].title().lstrip("The ").replace("And", "&")
                  doc["_id"] = osm_id
                  doc["location"] = {"type": "Point", "coordinates": [lon, lat]}
                  docs.append(doc)
              collection.insert(docs)
  • 29. Example Pub Data
      {
        "_id" : 451152,
        "amenity" : "pub",
        "name" : "The Dignity",
        "addr:housenumber" : "363",
        "addr:street" : "Regents Park Road",
        "addr:city" : "London",
        "addr:postcode" : "N3 1DH",
        "toilets" : "yes",
        "toilets:access" : "customers",
        "location" : { "type" : "Point", "coordinates" : [-0.1945732, 51.6008172] }
      }
  • 31. Map Function [Diagram: MongoDB map → reduce → finalize]
      > var map = function() {
          emit(this.name, 1);
        }
  • 32. Reduce Function [Diagram: MongoDB map → reduce → finalize]
      > var reduce = function (key, values) {
          var sum = 0;
          values.forEach( function (val) { sum += val; } );
          return sum;
        }
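To make the two phases concrete, here is a minimal pure-Python sketch of what the map and reduce functions above compute. It is not how MongoDB executes mapReduce internally; the sample `pubs` list and the helper names `map_pub`, `reduce_counts` and `map_reduce` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for the pubs collection.
pubs = [
    {"_id": 1, "name": "The Red Lion"},
    {"_id": 2, "name": "The Crown"},
    {"_id": 3, "name": "The Red Lion"},
]


def map_pub(doc):
    """Map phase: emit one (key, value) pair per document."""
    yield doc["name"], 1


def reduce_counts(key, values):
    """Reduce phase: collapse all values emitted for one key into a sum."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """Collect emitted values per key, then reduce each group."""
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            emitted[key].append(value)
    return {key: reducer(key, values) for key, values in emitted.items()}


pub_names = map_reduce(pubs, map_pub, reduce_counts)
print(pub_names)
```

One difference from the sketch: MongoDB does not call the reduce function for a key that has only a single emitted value, so a real run can report fewer reduce calls than output documents.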
  • 33. Results
      > db.pubs.mapReduce(map, reduce, { out: "pub_names", query: { } } )
      > db.pub_names.find().sort({value: -1}).limit(10)
      { "_id" : "The Red Lion", "value" : 407 }
      { "_id" : "The Royal Oak", "value" : 328 }
      { "_id" : "The Crown", "value" : 242 }
      { "_id" : "The White Hart", "value" : 214 }
      { "_id" : "The White Horse", "value" : 200 }
      { "_id" : "The New Inn", "value" : 187 }
      { "_id" : "The Plough", "value" : 185 }
      { "_id" : "The Rose & Crown", "value" : 164 }
      { "_id" : "The Wheatsheaf", "value" : 147 }
      { "_id" : "The Swan", "value" : 140 }
  • 35. Pub Names in the Center of London
      > db.pubs.mapReduce(map, reduce, {
          out: "pub_names",
          query: {
            location: {
              $within: { $centerSphere: [[-0.12, 51.516], 2 / 3959] }
            }
          }
        })
      {
        "result" : "pub_names",
        "timeMillis" : 116,
        "counts" : { "input" : 643, "emit" : 643, "reduce" : 54, "output" : 537 },
        "ok" : 1,
      }
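The `2 / 3959` in the query is a unit conversion: `$centerSphere` takes its radius in radians, and dividing a distance in miles by the Earth's radius in miles (about 3959) gives that angle. The sketch below reproduces the arithmetic and checks a point against the same spherical cap using the haversine formula. The function names are illustrative and this is not MongoDB's implementation.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # approximate mean Earth radius, as used on the slide


def miles_to_radians(miles):
    """Convert a surface distance in miles to the angular radius in radians."""
    return miles / EARTH_RADIUS_MILES


def angular_distance(lon1, lat1, lon2, lat2):
    """Great-circle angle in radians between two lon/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


def within_center_sphere(point, center, radius_radians):
    """True if point ([lon, lat]) lies within radius_radians of center."""
    return angular_distance(point[0], point[1], center[0], center[1]) <= radius_radians


radius = miles_to_radians(2)  # same value as 2 / 3959 in the query
center = [-0.12, 51.516]      # centre point used on the slide
the_dignity = [-0.1945732, 51.6008172]  # example pub from the earlier slide

print(within_center_sphere(the_dignity, center, radius))
```

The example pub in N3 lies several miles north of the centre point, so it falls outside the 2-mile radius.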
  • 36. Results
      > db.pub_names.find().sort({value: -1}).limit(10)
      { "_id" : "All Bar One", "value" : 11 }
      { "_id" : "The Slug & Lettuce", "value" : 7 }
      { "_id" : "The Coach & Horses", "value" : 6 }
      { "_id" : "The Green Man", "value" : 5 }
      { "_id" : "The Kings Arms", "value" : 5 }
      { "_id" : "The Red Lion", "value" : 5 }
      { "_id" : "Corney & Barrow", "value" : 4 }
      { "_id" : "O'Neills", "value" : 4 }
      { "_id" : "Pitcher & Piano", "value" : 4 }
      { "_id" : "The Crown", "value" : 4 }
  • 37. MongoDB MapReduce •  Real-time •  Output directly to document or collection •  Runs inside MongoDB on local data − Adds load to your DB − Written in JavaScript: debugging can be a challenge − Translating in and out of C++
  • 41. Aggregation Framework Operators •  $project •  $match •  $limit •  $skip •  $sort •  $unwind •  $group
  • 42. $match •  Filter documents •  Uses existing query syntax •  If using $geoNear it has to be first in pipeline •  $where is not supported
  • 43. Matching Field Values
      Input:
      { "_id" : 271421, "amenity" : "pub", "name" : "Sir Walter Tyrrell", "location" : { "type" : "Point", "coordinates" : [ -1.6192422, 50.9131996 ] } }
      { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } }
      Stage:
      { "$match": { "name": "The Red Lion" }}
      Output:
      { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } }
  • 44. $project •  Reshape documents •  Include, exclude or rename fields •  Inject computed fields •  Create sub-document fields
  • 45. Including and Excluding Fields
      Input:
      { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } }
      Stage:
      { "$project": { "_id": 0, "amenity": 1, "name": 1 }}
      Output:
      { "amenity" : "pub", "name" : "The Red Lion" }
  • 46. Reformatting Documents
      Input:
      { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } }
      Stage:
      { "$project": { "_id": 0, "name": 1, "meta": { "type": "$amenity" } }}
      Output:
      { "name" : "The Red Lion", "meta" : { "type" : "pub" } }
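The reshaping done by the two `$project` examples can be imitated in a few lines of Python. This is a toy emulation under narrow assumptions, not MongoDB's projection logic: it handles only inclusion (`1`), suppression of `_id` (`0`), top-level `"$field"` references, and one level of sub-document built from such references. The names `project` and `_lookup` are invented for this sketch.

```python
_MISSING = object()  # sentinel for fields absent from the input document


def _lookup(doc, ref):
    """Resolve a "$field" reference against top-level fields only."""
    return doc.get(ref[1:], _MISSING)


def project(doc, spec):
    """Toy emulation of the $project forms shown on the slides."""
    out = {}
    # _id is kept unless the spec explicitly sets it to 0.
    if spec.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]
    for field, rule in spec.items():
        if field == "_id":
            continue
        if rule == 1:  # include the field unchanged
            value = doc.get(field, _MISSING)
        elif isinstance(rule, str) and rule.startswith("$"):  # rename/copy
            value = _lookup(doc, rule)
        elif isinstance(rule, dict):  # build a sub-document from references
            pairs = ((key, _lookup(doc, ref)) for key, ref in rule.items())
            value = {key: val for key, val in pairs if val is not _MISSING}
        else:
            raise ValueError("unsupported projection rule: %r" % (rule,))
        if value is not _MISSING:
            out[field] = value
    return out


pub = {
    "_id": 271466,
    "amenity": "pub",
    "name": "The Red Lion",
    "location": {"type": "Point", "coordinates": [-1.5494749, 50.7837119]},
}

included = project(pub, {"_id": 0, "amenity": 1, "name": 1})
reshaped = project(pub, {"_id": 0, "name": 1, "meta": {"type": "$amenity"}})
print(included)
print(reshaped)
```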
  • 47. $group •  Group documents by an ID •  Field reference, object, constant •  Other output fields are computed: $max, $min, $avg, $sum, $addToSet, $push, $first, $last •  Processes all data in memory
  • 48. Back to the pub! •  http://www.offwestend.com/index.php/theatres/pastshows/71
  • 49. Popular Pub Names
      > var popular_pub_names = [
          { $match : { location: { $within: { $centerSphere: [[-0.12, 51.516], 2 / 3959] } } } },
          { $group : { _id: "$name", value: { $sum: 1 } } },
          { $sort : { value: -1 } },
          { $limit : 10 }
        ]
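From Python, the same pipeline can be built as plain data and passed to PyMongo's `Collection.aggregate`. The sketch below assumes a `demo.pubs` collection on a local server and uses `$geoWithin`, the current name for the deprecated `$within` operator. The function names are illustrative. Recent PyMongo versions return a cursor, so the results are wrapped in `list()` rather than read from a `result` field.

```python
EARTH_RADIUS_MILES = 3959.0  # approximate mean Earth radius, as on the slides


def popular_pub_names_pipeline(center, miles, limit=10):
    """Build the four-stage pipeline from the slide as plain Python data."""
    return [
        # Keep only pubs inside the spherical cap around `center`.
        {"$match": {"location": {"$geoWithin": {
            "$centerSphere": [center, miles / EARTH_RADIUS_MILES]}}}},
        # Count documents per pub name.
        {"$group": {"_id": "$name", "value": {"$sum": 1}}},
        # Most frequent names first.
        {"$sort": {"value": -1}},
        {"$limit": limit},
    ]


def popular_pub_names(collection, center=(-0.12, 51.516), miles=2):
    """Run the pipeline; `collection` is expected to be a pymongo Collection."""
    pipeline = popular_pub_names_pipeline(list(center), miles)
    return list(collection.aggregate(pipeline))


# Usage (assumes a local mongod holding the demo.pubs collection):
#   from pymongo import MongoClient
#   print(popular_pub_names(MongoClient()["demo"]["pubs"]))
```

Keeping the pipeline builder separate from the call makes the stages easy to inspect or unit-test without a running server.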
  • 50. Results
      > db.pubs.aggregate(popular_pub_names)
      {
        "result" : [
          { "_id" : "All Bar One", "value" : 11 },
          { "_id" : "The Slug & Lettuce", "value" : 7 },
          { "_id" : "The Coach & Horses", "value" : 6 },
          { "_id" : "The Green Man", "value" : 5 },
          { "_id" : "The Kings Arms", "value" : 5 },
          { "_id" : "The Red Lion", "value" : 5 },
          { "_id" : "Corney & Barrow", "value" : 4 },
          { "_id" : "O'Neills", "value" : 4 },
          { "_id" : "Pitcher & Piano", "value" : 4 },
          { "_id" : "The Crown", "value" : 4 }
        ],
        "ok" : 1
      }
  • 51. Aggregation Framework Benefits •  Real-time •  Simple yet powerful interface •  Declared in JSON, executes in C++ •  Runs inside MongoDB on local data − Adds load to your DB − Limited Operators − Data output is limited
  • 52. Analyzing MongoDB Data in External Systems
  • 56. Map Pub Names in Python
      #!/usr/bin/env python
      import sys
      from pymongo_hadoop import BSONMapper

      # get_bounds() and get_geo() are helpers not shown on the slide.
      def mapper(documents):
          bounds = get_bounds()  # ~2 mile polygon
          for doc in documents:
              geo = get_geo(doc["location"])  # Convert the geo type
              if not geo:
                  continue
              if bounds.intersects(geo):
                  yield {'_id': doc['name'], 'count': 1}

      BSONMapper(mapper)
      sys.stderr.write("Done Mapping.\n")
  • 57. Reduce Pub Names in Python
      #!/usr/bin/env python
      from pymongo_hadoop import BSONReducer

      def reducer(key, values):
          _count = 0
          for v in values:
              _count += v['count']
          return {'_id': key, 'value': _count}

      BSONReducer(reducer)
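Between the mapper and the reducer, Hadoop sorts the mapper output and groups it by key. The sketch below imitates that shuffle step locally so the reducer logic can be checked without a cluster. It does not use `pymongo_hadoop`; the `shuffle_and_reduce` helper and the sample `mapped` records are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def reducer(key, values):
    """Same counting logic as the reducer on the slide."""
    count = 0
    for v in values:
        count += v["count"]
    return {"_id": key, "value": count}


def shuffle_and_reduce(mapped, reduce_fn):
    """Sort mapper output by _id, group equal keys, and reduce each group."""
    ordered = sorted(mapped, key=itemgetter("_id"))
    return [
        reduce_fn(key, list(group))
        for key, group in groupby(ordered, key=itemgetter("_id"))
    ]


# Hypothetical mapper output: one record per matching pub.
mapped = [
    {"_id": "The Crown", "count": 1},
    {"_id": "All Bar One", "count": 1},
    {"_id": "The Crown", "count": 1},
]

results = shuffle_and_reduce(mapped, reducer)
print(results)
```

`groupby` only merges adjacent items, which is why the records are sorted by `_id` first.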
  • 58. Execute MapReduce
      hadoop jar target/mongo-hadoop-streaming-assembly-1.1.0-rc0.jar \
        -mapper examples/pub/map.py \
        -reducer examples/pub/reduce.py \
        -mongo mongodb://127.0.0.1/demo.pubs \
        -outputURI mongodb://127.0.0.1/demo.pub_names
  • 59. Popular Pub Names Nearby
      > db.pub_names.find().sort({value: -1}).limit(10)
      { "_id" : "All Bar One", "value" : 11 }
      { "_id" : "The Slug & Lettuce", "value" : 7 }
      { "_id" : "The Coach & Horses", "value" : 6 }
      { "_id" : "The Kings Arms", "value" : 5 }
      { "_id" : "Corney & Barrow", "value" : 4 }
      { "_id" : "O'Neills", "value" : 4 }
      { "_id" : "Pitcher & Piano", "value" : 4 }
      { "_id" : "The Crown", "value" : 4 }
      { "_id" : "The George", "value" : 4 }
      { "_id" : "The Green Man", "value" : 4 }
  • 60. MongoDB and Hadoop •  Away from data store •  Can leverage existing data processing infrastructure •  Can horizontally scale your data processing -  Offline batch processing -  Requires synchronisation between store & processor -  Infrastructure is much more complex
  • 61. The Future of Big Data and MongoDB
  • 62. What is Big Data? Big Data today will be normal tomorrow
  • 63. Exponential Data Growth [Chart: billions of URLs indexed by Google, 2000 to 2012; vertical axis 0 to 10000]
  • 64. MongoDB enables you to scale big
  • 65. MongoDB is evolving so you can process the big
  • 66. Data Processing with MongoDB •  Process in MongoDB using Map/Reduce •  Process in MongoDB using Aggregation Framework •  Process outside MongoDB using Hadoop and other external tools
  • 67. MongoDB Integration •  Hadoop https://github.com/mongodb/mongo-hadoop •  Storm https://github.com/christkv/mongo-storm •  Disco https://github.com/mongodb/mongo-disco •  Spark Coming soon!