MongoDB Distilled
Boris Trofimov
Team Lead@Sigma Ukraine
@b0ris_1
btrofimoff@gmail.com
Agenda
● Part 1. Why NoSQL
– SQL benefits and criticism
– NoSQL challenge
● Part 2. MongoDB
– Overview
– Console and query examples
– Java Integration
– Data consistency
– Scaling
– Tips
Part 1. Why NoSQL
Relational DBMS Benefits
SQL
● Simplicity
● Uniform representation
● Runtime schema modifications
SELECT DISTINCT p.LastName, p.FirstName
FROM Person.Person AS p
JOIN HumanResources.Employee AS e
ON e.BusinessEntityID = p.BusinessEntityID WHERE 5000.00 IN
(SELECT Bonus
FROM Sales.SalesPerson AS sp
WHERE e.BusinessEntityID = sp.BusinessEntityID);
Strong schema definition
Strong consistency
SQL features like Foreign and Primary Keys, Unique fields
ACID (atomicity, consistency, isolation, durability) transactions
Business transactions ~ system transactions
RDBMS Criticism
Big gap between domain and relational model
Performance Issues
JOIN minimization
Choosing the right transaction strategy
Query optimization
Consistency costs too much
Normalization impact
Performance issues
Schema migration issues
Consistency issues
Reinventing the wheel
Involving external tools like DBDeploy
Scaling options
Consistency issues
Poor scaling options
SQL Opposition
● Object Databases by OMG
● ORM
● ?
No SQL Yes
● No transactions in the usual sense
● Schemaless, no migration
● Closer to domain
● Focused on aggregates
● Truly scalable
NoSQL Umbrella
Key-Value Databases
Column-Family Databases
Document-oriented Databases
Graph-oriented Databases
Aggregate oriented Databases
● Document databases implement the idea of an aggregate-oriented database.
● An aggregate is the atomic unit of storage
● Aggregate-oriented databases are closer to the application domain.
● Operations on a single aggregate are atomic
● Aggregates can be replicated or sharded efficiently
● Major question: to embed or not to embed
Relations vs Aggregates
// in customers
{
"id":1,
"name":"Medvedev",
"billingAddress":[{"city":"Moscow"}]
}
// in orders
{
"id":99,
"customerId":1,
"orderItems":[
{
"productId":47,
"price": 444.45,
"productName": "iPhone 5"
}
],
"shippingAddress":[{"city":"Moscow"}],
"orderPayment":[
{
"ccinfo":"1000-1000-1000-1000",
"txnId":"abelif879rft",
"billingAddress": {"city": "Moscow"}
}
]
}
Relational Model Document Model
Part 2. MongoDB
MongoDB Basics
MongoDB is a document-oriented DBMS
MongoDB is a client-server DBMS
MongoDB = Collections + Indexes
JSON/JavaScript is the main language for access
Collections
Created implicitly (on first insert).
Two documents from the same collection might be completely different
Name
Documents
Indexes
Document
{
"fullName" : "Fedor Buhankin",
"course" : 5,
"university" : "ONPU",
"faculty" : "IKS",
"_id" : { "$oid" : "5071c043cc93742e0d0e9cc7" },
"homeAddress" : "Ukraine, Odessa 23/34",
"averageAssessment" : 5,
"subjects" : [
"math",
"literature",
"drawing",
"psychology"
]
}
Identifier (_id)
Body is JSON (internally BSON)
● Any part of the document can be indexed
● Max document size is 16 MB
● Major building blocks: scalar value, map and list
MongoDB Console
Query Examples
// in customers
{
"id":1,
"name":"Medvedev",
"billingAddress":[{"city":"Moscow"}]
}
// in orders
{
"id":99,
"customerId":1,
"orderItems":[
{
"productId":47,
"price": 444.45,
"productName": "iPhone 5"
}
],
"shippingAddress":[{"city":"Moscow"}],
"orderPayment":[
{
"ccinfo":"1000-1000-1000-1000",
"txnId":"abelif879rft",
"billingAddress": {"city": "Moscow"}
}
]
}
SELECT * FROM ORDERS;
db.orders.find()
Simple Select
SELECT * FROM ORDERS WHERE
customerId = 1;
db.orders.find( {"customerId":1} )
Simple Condition
SELECT *
FROM orders
WHERE customerId > 1
db.orders.find({ "customerId" : { $gt: 1 } } );
Simple Comparison
SELECT *
FROM orders
WHERE customerId = 1 AND
orderDate is not NULL
db.orders.find( { customerId:1, orderDate :
{ $exists : true } } );
AND Condition
SELECT *
FROM orders
WHERE customerId = 100 OR
orderDate is not NULL
db.orders.find( { $or:[ {customerId:100},
{orderDate : { $exists : true }} ] } );
OR Condition
SELECT orderId, orderDate
FROM orders
WHERE customerId = 1
db.orders.find({customerId:1},
{orderId:1,orderDate:1})
Select fields
SELECT *
FROM
Orders
WHERE
Orders.id IN (
SELECT id FROM orderItem
WHERE productName LIKE '%iPhone%'
)
db.orders.find(
{"orderItems.productName":/.*iPhone.*/}
)
Inner select
SELECT *
FROM orders
WHERE orderDate is NULL
db.orders.find(
{ orderDate : { $exists : false } }
);
NULL checks
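The query documents above can be read as small predicates over a document. The following is a toy in-memory matcher, not MongoDB's implementation: `matches`, `find` and the sample `orders` array (including the `orderDate` values) are hypothetical and introduced only for illustration. It covers equality, `$gt`, `$exists` and `$or`, and ignores dotted paths and arrays. One nuance it makes visible: `$exists: false` matches only documents where the field is absent, so a field stored with an explicit null still counts as existing.

```javascript
// Toy matcher for a small subset of MongoDB-style query documents.
// Illustration only; real query semantics are richer than this.
function matches(doc, query) {
  return Object.keys(query).every(function (key) {
    var cond = query[key];
    if (key === "$or") {
      // $or takes an array of sub-queries; at least one must match
      return cond.some(function (sub) { return matches(doc, sub); });
    }
    var exists = Object.prototype.hasOwnProperty.call(doc, key);
    if (cond !== null && typeof cond === "object") {
      // Operator document such as { $gt: 1 } or { $exists: true }
      return Object.keys(cond).every(function (op) {
        if (op === "$gt") return exists && doc[key] > cond[op];
        if (op === "$exists") return exists === cond[op];
        throw new Error("unsupported operator: " + op);
      });
    }
    // Plain value: equality
    return exists && doc[key] === cond;
  });
}

// Made-up sample data; only the field names follow the slides
var orders = [
  { id: 99, customerId: 1, orderDate: "2012-10-01" },
  { id: 100, customerId: 2 },
  { id: 101, customerId: 100, orderDate: null }
];

// Returns the ids of the matching orders
function find(query) {
  return orders
    .filter(function (o) { return matches(o, query); })
    .map(function (o) { return o.id; });
}

console.log(find({ customerId: 1 }));                 // ids: 99
console.log(find({ customerId: { $gt: 1 } }));        // ids: 100, 101
console.log(find({ orderDate: { $exists: false } })); // ids: 100
console.log(find({ $or: [ { customerId: 100 },
                          { orderDate: { $exists: true } } ] })); // ids: 99, 101
```

Order 101 shows the difference from SQL's `IS NULL`: its `orderDate` is null, yet `$exists: false` does not select it.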
More examples
• db.orders.find().sort({ "id": 1 }).skip(20).limit(10)
• db.orders.count({ "orderItems.price" : { $gt: 444 } })
• db.orders.find( { orderItems: { "productId":47, "price": 444.45, "productName": "iPhone 5" } } );
• db.orders.find()._addSpecial( "$comment" , "this is tagged query" )
Queries between collections
● Remember, MongoDB = no JOINs
● Approach 1: perform multiple queries (lazy loading)
● Approach 2: use the MapReduce framework
● Approach 3: use the Aggregation Framework
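The first approach amounts to a join performed by the application. A minimal sketch follows, using in-memory arrays in place of the `customers` and `orders` collections so that it runs without a server. `findCustomersByIds` and `ordersWithCustomer` are hypothetical helpers; in a real application the first would be a second query against the database, for instance one using the `$in` operator.

```javascript
// In-memory stand-ins for the two collections (sample data)
var customers = [ { id: 1, name: "Medvedev" }, { id: 2, name: "Kaktus" } ];
var orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 }
];

// Stand-in for a second query such as find({ id: { $in: ids } })
function findCustomersByIds(ids) {
  return customers.filter(function (c) { return ids.indexOf(c.id) !== -1; });
}

function ordersWithCustomer(orderList) {
  // Collect distinct customer ids so the second "query" is issued once
  var ids = [];
  orderList.forEach(function (o) {
    if (ids.indexOf(o.customerId) === -1) ids.push(o.customerId);
  });
  // Index the fetched customers by id for constant-time lookup
  var byId = {};
  findCustomersByIds(ids).forEach(function (c) { byId[c.id] = c; });
  // Attach the customer name to each order; null if the customer is missing
  return orderList.map(function (o) {
    var customer = byId[o.customerId];
    return { orderId: o.id, customerName: customer ? customer.name : null };
  });
}

var joined = ordersWithCustomer(orders);
console.log(joined);
```

Fetching all referenced customers in one batch avoids issuing one extra query per order, which is the usual cost of naive lazy loading.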
Map Reduce Framework
● Used to perform complex grouping over collection documents
● Can operate over multiple collections
● Uses the MapReduce pattern
● Uses JavaScript
● Supports sharded environments
● The result is similar to a materialized view
Map Reduce Concept
[Diagram: map is launched for every input element a1, a2, ..., an and produces b1, b2, ..., bn; reduce then merges the b values into a single result c]
f_map : A → B, f_reduce : B[] → C
Implement MAP function
Implement REDUCE function
Execute MAP func: mark each document with a specific color
Execute REDUCE func: merge each colored set into a single element
[Diagram: Input → MAP → REDUCE → Output (Collection X)]
How it works
Count the orders for each customer
db.customers_orders.remove({});
mapUsers = function() {
    // emit one counter per order, keyed by the customer it belongs to
    emit( this.customerId, {count: 1, customerId: this.customerId} );
};
reduce = function(key, values) {
    // sum the counters emitted for the same customer
    var result = {count: 0, customerId: key};
    values.forEach(function(value) {
        result.count += value.count;
    });
    return result;
};
db.orders.mapReduce(mapUsers, reduce, {"out": {"replace": "customers_orders"}});
Output: [ {count:123, customerId:1}, {count:33, customerId:2} ]
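To make the flow concrete, here is a small in-memory simulation of the same job in plain JavaScript. It is a sketch of the MapReduce pattern, not of MongoDB's engine: `runMapReduce` is a hypothetical helper, `emit` is passed to the map function as a parameter rather than being a global as in the mongo shell, and the three sample orders are made up.

```javascript
// Minimal map-reduce driver: run map over every document, group the
// emitted values by key, then reduce each group to a single value.
function runMapReduce(docs, mapFn, reduceFn) {
  var groups = {}; // key -> list of emitted values
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  return Object.keys(groups).map(function (key) {
    return reduceFn(key, groups[key]);
  });
}

// Sample input: three orders belonging to two customers
var orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 }
];

var result = runMapReduce(
  orders,
  // map: one counter per order, keyed by customer
  function (doc, emit) {
    emit(doc.customerId, { count: 1, customerId: doc.customerId });
  },
  // reduce: sum the counters; the output has the same shape as the input values
  function (key, values) {
    var out = { count: 0, customerId: values[0].customerId };
    values.forEach(function (v) { out.count += v.count; });
    return out;
  }
);

console.log(result); // customer 1 has count 2, customer 2 has count 1
```

The reduce function returns a value shaped like the values it receives. That matters in MongoDB itself, where reduce may be invoked more than once for the same key, with earlier reduce results fed back in as inputs.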
Aggregation and Aggregation Framework
● Simplifies the most common MapReduce operations, such as group by criteria
● Pipeline result size is limited to 16 MB
● Supports sharded environments (Aggregation Framework only)
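For comparison, the per-customer order count from the MapReduce example can be written as a one-stage aggregation pipeline using the `$group` stage and the `$sum` accumulator; in the shell it would be passed as `db.orders.aggregate(pipeline)`. So that the sketch runs without a server, a hypothetical `groupCount` helper mimics what this particular stage computes on an in-memory array. It is not a general pipeline interpreter, and the sample orders are made up.

```javascript
// The pipeline as it would be handed to db.orders.aggregate(...)
var pipeline = [
  { $group: { _id: "$customerId", count: { $sum: 1 } } }
];

// Hypothetical stand-in for this one $group stage: counts documents
// per distinct value of the field named in _id.
function groupCount(docs, stage) {
  var field = stage.$group._id.slice(1); // "$customerId" -> "customerId"
  var counts = new Map();
  docs.forEach(function (d) {
    counts.set(d[field], (counts.get(d[field]) || 0) + 1);
  });
  return Array.from(counts, function (entry) {
    return { _id: entry[0], count: entry[1] };
  });
}

var orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 }
];

var result = groupCount(orders, pipeline[0]);
console.log(result); // customer 1 has count 2, customer 2 has count 1
```

The whole job is a declarative document, with no JavaScript functions shipped to the server, which is what makes it simpler than the MapReduce version for common groupings.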
Indexes
● Anything might be indexed
● Indexes improve performance
● Implementation uses B-trees
MongoDB Distilled
Access via API
import com.mongodb.*;
import java.util.*;

Mongo m = new Mongo();
// or: new Mongo( "localhost" );
// or: new Mongo( "localhost" , 27017 );
// or, to connect to a replica set, supply a seed list of members:
// new Mongo(Arrays.asList(new ServerAddress("localhost", 27017),
//     new ServerAddress("localhost", 27018),
//     new ServerAddress("localhost", 27019)));
DB db = m.getDB( "mydb" );
DBCollection coll = db.getCollection("customers");
// billingAddress is stored as a list of embedded documents
List<DBObject> list = new ArrayList<DBObject>();
list.add(new BasicDBObject("city", "Odessa"));
BasicDBObject doc = new BasicDBObject();
doc.put("name", "Kaktus");
doc.put("billingAddress", list);
coll.insert(doc);
Use the official MongoDB Java driver (just include mongo.jar)
Closer to Domain model
● Morphia http://code.google.com/p/morphia/
● Spring Data for MongoDB http://www.springsource.org/spring-data/mongodb
Major features:
● Type-safe, POJO-centric model
● Annotation-based mapping
● Good performance
● DAO templates
● Simple criteria
Example with Morphia
@Entity("Customers")
class Customer {
    @Id ObjectId id; // auto-generated, if not set (see ObjectId)
    @Indexed String name; // value types are automatically persisted
    List<Address> billingAddress; // by default fields are @Embedded
    Key<Customer> bestFriend; // reference to external document
    @Reference List<Customer> partners = new ArrayList<Customer>(); // refs are stored and loaded automatically
    // ... getters and setters
    // Lifecycle methods -- Pre/PostLoad, Pre/PostPersist...
    @PostLoad void postLoad(DBObject dbObj) { ... }
}
Morphia morphia = new Morphia();
morphia.map(Customer.class);
Datastore ds = morphia.createDatastore(new Mongo(), "tempDB");
Key<Customer> newCustomer = ds.save(new Customer("Kaktus",...));
Customer customer = ds.find(Customer.class).field("name").equal("Medvedev").get();
To embed or not to embed
● Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents.
● Embedded documents are good when you want the entire document at once and its size is predictable. An embedded document is fetched together with its parent in a single read, so access is fast.
Schema migration
● Schemaless
● The main concern is how the application behaves once a new field has been added
● Incremental migration technique (version field)
Use cases:
– removing a field
– renaming fields
– refactoring an aggregate
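The incremental technique can be sketched as an upgrade step applied when a document is read: each document carries a version number, and older documents are brought forward one version at a time. The field names (`schemaVersion`, `name`, `fullName`, `legacyFlag`) and the two migrations are invented for illustration; they are assumptions, not something the slides prescribe.

```javascript
var CURRENT_VERSION = 3;

// migrations[v] upgrades a document in place from version v to v + 1
var migrations = {
  1: function (doc) {      // renaming a field: name -> fullName
    doc.fullName = doc.name;
    delete doc.name;
  },
  2: function (doc) {      // removing a field
    delete doc.legacyFlag;
  }
};

function upgrade(doc) {
  // Work on a copy so the stored document is left untouched
  var copy = JSON.parse(JSON.stringify(doc));
  // Documents written before versioning was introduced count as version 1
  var version = copy.schemaVersion || 1;
  while (version < CURRENT_VERSION) {
    migrations[version](copy);
    version += 1;
  }
  copy.schemaVersion = version;
  return copy;
}

var oldDoc = { name: "Fedor Buhankin", legacyFlag: true };
var upgraded = upgrade(oldDoc);
console.log(upgraded); // has fullName and schemaVersion 3; name and legacyFlag are gone
```

If the application saves the upgraded document back whenever it writes, the collection migrates gradually without a separate migration run.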
Data Consistency
● Transactional consistency
– Domain design should take aggregate atomicity into account
● Replication consistency
– Take the inconsistency window into account (sticky sessions)
● Eventual consistency
● Accept the CAP theorem
– It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability and partition tolerance.
Scaling
Scaling options
● Autosharding
● Master-slave replication
● Replica set clustering
● Sharding + replica set
Sharding
● MongoDB supports autosharding
● Just specify the shard key and pattern
● Sharding increases write throughput
● The main way to scale the system
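How a shard key directs traffic can be illustrated with a toy range-based router. This is not MongoDB's balancer or its chunk format: the `chunks` table, the shard names and `routeByShardKey` are all hypothetical, and the split points are fixed by hand, whereas autosharding splits and moves chunks on its own.

```javascript
// Each chunk covers shard-key values in [min, max) and lives on one shard
var chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" }
];

// Returns the name of the shard responsible for the document's key value
function routeByShardKey(doc, keyField) {
  var key = doc[keyField];
  for (var i = 0; i < chunks.length; i++) {
    if (key >= chunks[i].min && key < chunks[i].max) return chunks[i].shard;
  }
  throw new Error("no chunk covers shard key " + key);
}

console.log(routeByShardKey({ customerId: 1 }, "customerId"));     // shard0
console.log(routeByShardKey({ customerId: 1000 }, "customerId"));  // shard1
console.log(routeByShardKey({ customerId: 99999 }, "customerId")); // shard2
```

Writes whose keys fall in different ranges land on different shards, which is where the extra write throughput comes from. With ranges like these, a key that only ever grows sends every new insert to the last chunk.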
Master-Slave replication
● One master, many slaves
● Slaves might be hidden or can be used for reads
● Master-slave replication increases read throughput and provides reliability
Replica Set clusterization
● The replica set automatically elects a primary (master)
● The primary replicates the same state to all replicas
● Limitation: at most 12 nodes
● WriteConcern option
● Benefits:
– Failover and reliability
– Distributing read load
– Maintenance without downtime
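The WriteConcern option controls how many replica set members must acknowledge a write before it is reported as successful. Below is a minimal sketch of that bookkeeping. `requiredAcks` and `isAcknowledged` are hypothetical helpers, `w` is either a member count or the string "majority" (mirroring values MongoDB accepts), every member is assumed to vote, and nothing here talks to a real replica set.

```javascript
// Number of acknowledgements a write needs under the given w setting
function requiredAcks(w, memberCount) {
  if (w === "majority") return Math.floor(memberCount / 2) + 1;
  return w;
}

// True once enough members have confirmed the write
function isAcknowledged(w, memberCount, acksReceived) {
  return acksReceived >= requiredAcks(w, memberCount);
}

console.log(requiredAcks("majority", 3));      // 2
console.log(requiredAcks("majority", 5));      // 3
console.log(isAcknowledged(1, 3, 1));          // true: the primary alone suffices
console.log(isAcknowledged("majority", 5, 2)); // false: one more member is needed
```

A higher `w` trades write latency for durability: a write confirmed by a majority survives the loss of the primary, while a write confirmed only by the primary may not.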
Sharding + ReplicaSet
● Allows building a huge, scalable database with failover
MongoDB Criticism
● Reports of data loss in write-heavy configurations
● Atomic operations over multiple documents are not supported
When not to use
● Heavy cross-document atomic operations
● Queries against varying aggregate structure
Tips
● Do not use autoincrement ids
● Short field names are preferred
● By default DAO methods are async
● Think twice about collection design
● Use atomic modifications for a document
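The last tip is easiest to see with a counter. In the shell an atomic modification is expressed with update operators such as `$inc`, for example `db.orders.update({ id: 99 }, { $inc: { views: 1 } })`; the `views` field is invented for this illustration. The sketch below simulates two clients against an in-memory document to show why read-modify-write can lose an update while an increment applied at the store does not. `applyInc` is a hypothetical stand-in for the server-side operator.

```javascript
var stored = { id: 99, views: 0 };

// Read-modify-write: both clients read before either one writes back
var readByA = stored.views;
var readByB = stored.views;
stored.views = readByA + 1; // client A writes 1
stored.views = readByB + 1; // client B also writes 1, so A's update is lost
var afterReadModifyWrite = stored.views;

// Atomic modification: each client sends "add 1" and the store applies it
function applyInc(doc, field, amount) {
  doc[field] += amount;
}
stored.views = 0;
applyInc(stored, "views", 1); // client A
applyInc(stored, "views", 1); // client B
var afterAtomicInc = stored.views;

console.log(afterReadModifyWrite); // 1
console.log(afterAtomicInc);       // 2
```

Because a single document is updated atomically, sending the modification instead of the whole new document removes the window in which a concurrent change can be overwritten.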
Out of scope
● MapReduce options
● Indexes
● Capped collections
Further reading
http://www.mongodb.org
Kyle Banker, MongoDB in Action
Martin Fowler, NoSQL Distilled
Thank you!
