MongoDB Distilled

Boris Trofimov
Team Lead@Sigma Ukraine
@b0ris_1
btrofimoff@gmail.com

Agenda
●
Part 1. Why NoSQL
– SQL benefits and criticism
– NoSQL challenge
●
Part 2. MongoDB
– Overview
– Console and query examples
– Java integration
– Data consistency
– Scaling
– Tips

SQL
●
Simplicity
●
Uniform representation
●
Runtime schema modifications
SELECT DISTINCT p.LastName, p.FirstName
FROM Person.Person AS p
JOIN HumanResources.Employee AS e
ON e.BusinessEntityID = p.BusinessEntityID WHERE 5000.00 IN
(SELECT Bonus
FROM Sales.SalesPerson AS sp
WHERE e.BusinessEntityID = sp.BusinessEntityID);

Strong consistency
SQL features such as foreign keys, primary keys and unique fields
ACID (atomicity, consistency, isolation, durability) transactions
Business transactions ~ system transactions

Big gap between the domain model and the relational model

Performance Issues
JOINs minimization
Choosing the right transaction strategy
Query optimization
Consistency costs too much
Normalization impacts performance

Schema migration issues
Consistency issues
Reinventing the wheel
Relying on external tools such as DBDeploy
Scaling options
Consistency issues
Poor scaling options

SQL Opposition
●
Object Databases by OMG
●
ORM
●
?

No SQL Yes
●
Not transactional in the usual sense
●
Schemaless, no migration
●
Closer to domain
●
Focused on aggregates
●
Truly scalable

Aggregate oriented Databases
●
Document databases implement the idea of an aggregate-oriented database.
●
An aggregate is the storage atom
●
Aggregate-oriented databases are closer to the application domain.
●
Operations on a single aggregate are atomic
●
Aggregates can be replicated or sharded efficiently
●
Major question: to embed or not to embed

// in customers
{
"id":1,
"name":"Medvedev",
"billingAddress":[{"city":"Moscow"}]
}
// in orders
{
"id":99,
"customerId":1,
"orderItems":[
{
"productId":47,
"price": 444.45,
"productName": "iPhone 5"
}
],
"shippingAddress":[{"city":"Moscow"}],
"orderPayment":[
{
"ccinfo":"1000-1000-1000-1000",
"txnId":"abelif879rft",
"billingAddress": {"city": "Moscow"}
}
]
}
(Figure: relational model vs. document model)

MongoDB Basics
MongoDB is a document-oriented DBMS
MongoDB is a client-server DBMS
MongoDB = collections + indexes
JSON/JavaScript is the primary access language

Collections
Collections are created implicitly, on first insert.
Two documents from the same collection can be completely different.
A collection consists of a name, documents and indexes.

Document
{
"fullName" : "Fedor Buhankin",
"course" : 5,
"university" : "ONPU",
"faculty" : "IKS",
"_id" : { "$oid" : "5071c043cc93742e0d0e9cc7" },
"homeAddress" : "Ukraine, Odessa 23/34",
"averageAssessment" : 5,
"subjects" : [
"math",
"literature",
"drawing",
"psychology"
]
}
Identifier (_id)
Body in JSON (internally BSON)
●
Any part of the document can be indexed
●
Max document size is 16 MB
●
Major building blocks: scalar values, maps (embedded documents) and lists (arrays)

// in customers
{
"id":1,
"name":"Medvedev"
}
// in orders
{
"id":99,
"customerId":1,
"orderItems":[
{
"productId":47,
"price": 444.45
}
],
"shippingAddress":[{"city":"Moscow"}],
"orderPayment":[
{
"ccinfo":"1000-1000-1000-1000"
}
]
}
Simple Select
SELECT * FROM ORDERS;
db.orders.find()

Simple Condition
SELECT * FROM ORDERS WHERE customerId = 1;
db.orders.find( {"customerId":1} )
// in customers
{
"id":1,
"name":"Medvedev"
}
// in orders
{
"id":99,
"customerId":1,
"orderItems":[
{
"productId":47,
"price": 444.45
}
],
"orderPayment":[
{
"ccinfo":"1000-1000-1000-1000"
}
]
}

Simple Comparison
SELECT *
FROM orders
WHERE customerId > 1
db.orders.find({ "customerId" : { $gt: 1 } } );

AND Condition
SELECT *
FROM orders
WHERE customerId = 1 AND
orderDate is not NULL
db.orders.find( { customerId:1, orderDate :
{ $exists : true } } );

OR Condition
SELECT *
FROM orders
WHERE customerId = 100 OR
orderDate is not NULL
db.orders.find( { $or:[ {customerId:100},
{orderDate : { $exists : true }} ] } );

Select fields
SELECT orderId, orderDate
FROM orders
WHERE customerId = 1
db.orders.find({customerId:1},
{orderId:1, orderDate:1, _id:0})

Inner select
SELECT *
FROM
Orders
WHERE
Orders.id IN (
SELECT orderId FROM orderItem
WHERE productName LIKE '%iPhone%'
)
db.orders.find(
{"orderItems.productName":/.*iPhone.*/}
)
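In the inner select above, `"orderItems.productName"` reaches into an array: an order matches when at least one element of `orderItems` satisfies the condition, and that is what replaces the SQL subquery. The following plain-JavaScript sketch emulates that rule on in-memory data under simplifying assumptions: `matchesArrayField` is a hypothetical helper, it handles exactly one level of nesting, and the sample orders are made up. It is not MongoDB's query engine.

```javascript
// Hypothetical emulation of a filter like { "orderItems.productName": /iPhone/ }:
// an array field matches if ANY of its elements matches (one nesting level only).
function matchesArrayField(doc, arrayField, subField, predicate) {
  const items = doc[arrayField];
  if (!Array.isArray(items)) return false;
  return items.some((item) => item != null && predicate(item[subField]));
}

// Made-up sample orders.
const orders = [
  { id: 99, orderItems: [{ productId: 47, productName: "iPhone 5" }] },
  { id: 100, orderItems: [{ productId: 12, productName: "Nexus 4" },
                          { productId: 47, productName: "iPhone 5" }] },
  { id: 101, orderItems: [{ productId: 12, productName: "Nexus 4" }] }
];

// /iPhone/ behaves like /.*iPhone.*/: both match "iPhone" anywhere in the string.
const iPhone = (value) => typeof value === "string" && /iPhone/.test(value);

const result = orders
  .filter((doc) => matchesArrayField(doc, "orderItems", "productName", iPhone))
  .map((doc) => doc.id);
console.log(result); // [ 99, 100 ]
```

Order 100 matches although its first item is not an iPhone, because one matching element is enough.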

NULL checks
SELECT *
FROM orders
WHERE orderDate is NULL
db.orders.find(
{ orderDate : { $exists : false } }
);
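Because fields are optional, SQL `NULL` has two counterparts in MongoDB: a missing field and a field explicitly set to `null`. The sketch below emulates four filters over an in-memory array, following MongoDB's documented matching rules: `{ orderDate: null }` matches both cases, `{ $exists: false }` only missing fields, `{ $exists: true }` any present field (even `null`), and `{ $ne: null }` only present, non-null values. The `filters` and `ids` helpers are hypothetical; the takeaway is that `{ $ne: null }` is the closer match for SQL `IS NOT NULL`.

```javascript
// Hypothetical in-memory emulation of four MongoDB filters on one field.
// It only illustrates the documented matching rules; it is not MongoDB's matcher.
const orders = [
  { id: 1, orderDate: "2012-10-07" }, // field present with a value
  { id: 2, orderDate: null },         // field present but null
  { id: 3 }                           // field missing
];

const filters = {
  // { orderDate: null } matches null values AND missing fields
  isNull: (doc) => doc.orderDate === null || !("orderDate" in doc),
  // { orderDate: { $exists: false } } matches only missing fields
  notExists: (doc) => !("orderDate" in doc),
  // { orderDate: { $exists: true } } matches any present field, even null
  exists: (doc) => "orderDate" in doc,
  // { orderDate: { $ne: null } } matches present, non-null values
  notNull: (doc) => "orderDate" in doc && doc.orderDate !== null
};

// Return the ids of the documents a filter would select.
function ids(filter) {
  return orders.filter(filter).map((doc) => doc.id);
}

for (const [name, filter] of Object.entries(filters)) {
  console.log(name, ids(filter));
}
```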

More examples
• db.orders.find().sort({ id: 1 }).skip(20).limit(10)
• db.orders.count({ "orderItems.price" : { $gt: 444 } })
• db.orders.find( { orderItems: { "productId":47, "price": 444.45,
"productName": "iPhone 5" } } );
• db.orders.find()._addSpecial( "$comment" , "this is tagged query" )
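`skip(20).limit(10)` is the usual way to page through sorted results: with 10 documents per page it returns page 3. The server applies sort, then skip, then limit, whatever the order of the calls in the chain. The sketch below emulates that on an in-memory array; `pageArgs` and `findPage` are hypothetical helpers, not driver methods.

```javascript
// Hypothetical helper: translate a 1-based page number into skip/limit values.
function pageArgs(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// In-memory emulation of db.orders.find().sort({ id: 1 }).skip(s).limit(l):
// sort first, then skip, then limit.
function findPage(docs, page, pageSize) {
  const { skip, limit } = pageArgs(page, pageSize);
  return [...docs]
    .sort((a, b) => a.id - b.id)
    .slice(skip, skip + limit);
}

// 35 sample orders with ids 35, 34, ..., 1 (deliberately unsorted).
const orders = Array.from({ length: 35 }, (_, i) => ({ id: 35 - i }));

console.log(pageArgs(3, 10));                          // { skip: 20, limit: 10 }
console.log(findPage(orders, 3, 10).map((o) => o.id)); // ids 21 to 30
console.log(findPage(orders, 4, 10).map((o) => o.id)); // ids 31 to 35 (last, partial page)
```

Large skip values still make the server walk past the skipped documents, so deep pages get slower.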

Queries between collections
●
Remember, MongoDB = no JOINs
●
Approach 1: perform multiple queries (lazy loading)
●
Approach 2: use the MapReduce framework
●
Approach 3: use the Aggregation Framework
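Approach 1 means the join happens in application code: fetch the customer, then issue a second query for that customer's orders only when they are needed. A minimal sketch, assuming in-memory arrays stand in for `db.customers` and `db.orders`; the function names are hypothetical.

```javascript
// Hypothetical in-memory "collections"; in the shell these would be
// db.customers.findOne(...) and db.orders.find(...).
const customers = [
  { id: 1, name: "Medvedev" },
  { id: 2, name: "Kaktus" }
];
const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 2 },
  { id: 101, customerId: 1 }
];

// Query 1: load the customer document.
function findCustomer(customerId) {
  return customers.find((c) => c.id === customerId) || null;
}

// Query 2 (issued lazily, only when needed): load that customer's orders.
function findOrdersOf(customerId) {
  return orders.filter((o) => o.customerId === customerId);
}

// The "join" happens in application code, not in the database.
function customerWithOrders(customerId) {
  const customer = findCustomer(customerId);
  if (customer === null) return null;
  return { ...customer, orders: findOrdersOf(customerId) };
}

console.log(customerWithOrders(1));
```

Each extra query is a round trip, which is one reason the embed-or-reference decision matters.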

Map Reduce Framework
●
Performs complex grouping over the documents of a collection
●
Can work across multiple collections
●
Uses the MapReduce pattern
●
Map and reduce functions are written in JavaScript
●
Supports sharded environments
●
The result is similar to materialized views

Map Reduce Concept
(Figure: map is launched for every element a1, a2, …, an of the input and produces b1, b2, …, bn; reduce is launched over all of them and produces a single result c.)
$f_{\text{map}} : A \to B, \qquad f_{\text{reduce}} : B[\,] \to C$
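Read literally, the two signatures say that map turns each input element of type A into a value of type B independently (so the calls can run in parallel), and reduce folds the whole list of B values into one C. A database-free JavaScript sketch of that idea; `mapThenReduce` and the sample orders are hypothetical.

```javascript
// Generic map-then-reduce over an in-memory array (hypothetical helper).
// fMap : A -> B runs on every element independently;
// fReduce : B[] -> C combines the list of mapped values into one result.
function mapThenReduce(input, fMap, fReduce) {
  const mapped = input.map(fMap); // b1 ... bn
  return fReduce(mapped);         // c
}

// A = order document, B = order total, C = grand total (made-up integer prices).
const orders = [
  { id: 99, orderItems: [{ price: 444 }] },
  { id: 100, orderItems: [{ price: 100 }, { price: 56 }] }
];
const orderTotal = (order) => order.orderItems.reduce((sum, item) => sum + item.price, 0);
const grandTotal = (totals) => totals.reduce((sum, t) => sum + t, 0);

console.log(mapThenReduce(orders, orderTotal, grandTotal)); // 600
```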

How it works
Implement the MAP function
Implement the REDUCE function
Execute MAP: mark each input document with a specific color (key)
Execute REDUCE: merge each colored set into a single element
Input → MAP → REDUCE → Output (collection X)
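In MongoDB, "coloring" a document means calling `emit(key, value)` from the map function; all values emitted with the same key form one set, and reduce merges each set. The sketch below emulates these phases in plain JavaScript. It is a simplified model with a hypothetical `runMapReduce` function: `emit` is passed to map as an argument instead of being a global, and, as MongoDB does, a key with only one emitted value is not passed to reduce.

```javascript
// Hypothetical, simplified emulation of MongoDB-style map-reduce phases.
// Differences from MongoDB: emit is passed to map as an argument (MongoDB
// provides it as a global), and everything runs in memory in one process.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> values emitted with that key ("colored set")

  // MAP: call map once per document with `this` bound to the document.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc, emit);
  }

  // REDUCE: merge each colored set into a single element.
  // As in MongoDB, a key with a single emitted value is not passed to reduce.
  const output = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduce(key, values);
    output.push({ _id: key, value });
  }
  return output;
}

// Usage: "color" documents by a field and count each color.
const docs = [
  { id: 1, color: "red" },
  { id: 2, color: "blue" },
  { id: 3, color: "red" }
];
const result = runMapReduce(
  docs,
  function (emit) { emit(this.color, 1); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(result); // red -> 2, blue -> 1
```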

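Count the orders of each customer
db.customers_orders.remove({});
var mapUsers = function() {
  // key: customer id; value: partial result for this one order
  emit( this.customerId, {count: 1, customerId: this.customerId} );
};
var reduce = function(key, values) {
  // returns the same shape as the emitted values, so it can be re-applied
  var result = {count: 0, customerId: key};
  values.forEach(function(value) {
    result.count += value.count;
  });
  return result;
};
db.orders.mapReduce(mapUsers, reduce, {"out": {"replace": "customers_orders"}});
Output (stored in customers_orders as { _id, value } documents):
{ "_id" : 1, "value" : { "count" : 123, "customerId" : 1 } }
{ "_id" : 2, "value" : { "count" : 33, "customerId" : 2 } }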
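MongoDB may call reduce more than once for the same key, feeding earlier reduce results back in together with new values, so reduce must return the same shape as the values map emits and must give the same answer however the values are batched. The sketch below checks that property for the reduce function from the example above, outside the database, on made-up emitted values.

```javascript
// The reduce function from the example above, same logic.
const reduce = function (key, values) {
  const result = { count: 0, customerId: key };
  values.forEach(function (value) {
    result.count += value.count;
  });
  return result;
};

// Six values emitted for customer 1, as map would produce them (one per order).
const emitted = Array.from({ length: 6 }, () => ({ count: 1, customerId: 1 }));

// Reduce everything at once...
const once = reduce(1, emitted);
// ...or reduce two batches separately, then reduce the partial results again.
const reReduced = reduce(1, [reduce(1, emitted.slice(0, 4)), reduce(1, emitted.slice(4))]);

console.log(once, reReduced); // both { count: 6, customerId: 1 }
```

If reduce returned a bare number instead, the second pass would read `value.count` from a number and the count would be wrong.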

Aggregation and
Aggregation Framework
●
Simplifies the most common map-reduce operations, such as grouping by a criterion
●
The pipeline result is limited to 16 MB (it is returned as a single document)
●
Supports sharded environments (Aggregation Framework only)
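With the Aggregation Framework, the per-customer count from the map-reduce example becomes a single `$group` stage, for example `db.orders.aggregate([{ $group: { _id: "$customerId", count: { $sum: 1 } } }])` in the mongo shell. To show what that stage computes without a server, here is a hypothetical plain-JavaScript emulation that supports only a `"$field"` group key and a single `{ $sum: 1 }` counter.

```javascript
// Hypothetical emulation of a single $group stage of the form
// { _id: "$<field>", <name>: { $sum: 1 } }; nothing else is supported.
function groupCount(docs, groupStage) {
  const keyField = groupStage._id.slice(1);                         // "$customerId" -> "customerId"
  const outName = Object.keys(groupStage).find((k) => k !== "_id"); // e.g. "count"
  const counts = new Map();
  for (const doc of docs) {
    // Documents without the field are grouped under null, as $group does.
    const key = doc[keyField] === undefined ? null : doc[keyField];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, ([key, n]) => ({ _id: key, [outName]: n }));
}

const groupStage = { _id: "$customerId", count: { $sum: 1 } };
const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 2 },
  { id: 101, customerId: 1 }
];
console.log(groupCount(orders, groupStage));
```

Compared with the map-reduce version, there is no JavaScript to write on the server side: the grouping is declared, not programmed.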

Indexes
●
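Any field, including fields of embedded documents and array elements, can be indexed
●
Indexes improve query performance (at the cost of extra work on writes)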
●
Implementation uses B-trees
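An index lets a query such as `{ customerId: { $gt: 1 } }` walk an ordered structure instead of scanning every document. MongoDB uses B-trees; the sketch below swaps in a much simpler ordered structure, a sorted array of `[key, position]` pairs searched with binary search, only to show why ordered keys make range queries cheap. `buildIndex` and `rangeGreaterThan` are hypothetical.

```javascript
// Simplified stand-in for a B-tree index: a sorted array of [key, docPosition] pairs.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => a[0] - b[0]);
}

// Binary search (O(log n)) for the first entry whose key is strictly greater
// than `value`, then read the remaining entries in key order.
function rangeGreaterThan(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] <= value) lo = mid + 1;
    else hi = mid;
  }
  return index.slice(lo).map(([, pos]) => pos);
}

const orders = [
  { id: 99, customerId: 3 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 },
  { id: 102, customerId: 1 }
];
const byCustomer = buildIndex(orders, "customerId");
const hits = rangeGreaterThan(byCustomer, 1).map((pos) => orders[pos].id);
console.log(hits); // ids of orders with customerId > 1, in customerId order
```

The write-side cost shows up here too: every insert must keep the structure ordered.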

Access via API
import com.mongodb.*; // official driver: Mongo, DB, DBCollection, BasicDBObject, ...
import java.util.*;

Mongo m = new Mongo(); // localhost:27017 by default
// or: Mongo m = new Mongo( "localhost" );
// or: Mongo m = new Mongo( "localhost" , 27017 );
// or, to connect to a replica set, supply a seed list of members:
// Mongo m = new Mongo(Arrays.asList(new ServerAddress("localhost", 27017),
//         new ServerAddress("localhost", 27018),
//         new ServerAddress("localhost", 27019)));
DB db = m.getDB( "mydb" );
DBCollection coll = db.getCollection("customers");
// billingAddress is stored as an array of embedded documents
List<DBObject> list = new ArrayList<DBObject>();
list.add(new BasicDBObject("city", "Odessa"));
BasicDBObject doc = new BasicDBObject();
doc.put("name", "Kaktus");
doc.put("billingAddress", list);
coll.insert(doc);
Use the official MongoDB Java driver (just include mongo.jar)

Closer to Domain model
●
Morphia http://code.google.com/p/morphia/
●
Spring Data for MongoDB
http://www.springsource.org/spring-data/mongodb
Major features:
●
Type-safe, POJO-centric model
●
Annotation-based mapping
●
Good performance
●
DAO templates
●
Simple criteria queries

Example with Morphia
@Entity("Customers")
class Customer {
  @Id ObjectId id; // auto-generated if not set (see ObjectId)
  @Indexed String name; // value types are persisted automatically
  List<Address> billingAddress; // by default, fields are @Embedded
  Key<Customer> bestFriend; // reference to an external document
  @Reference List<Customer> partners = new ArrayList<Customer>(); // refs are stored and loaded automatically
  // ... getters and setters
  // Lifecycle methods: @PreLoad/@PostLoad, @PrePersist/@PostPersist ...
  @PostLoad void postLoad(DBObject dbObj) { ... }
}
Morphia morphia = new Morphia();
morphia.map(Customer.class); // register the mapped class
Datastore ds = morphia.createDatastore(new Mongo(), "tempDB");
Key<Customer> newCustomer = ds.save(new Customer("Kaktus",...));
Customer customer = ds.find(Customer.class).field("name").equal("Medvedev").get();

To embed or not to embed
●
Separate collections are good if you need
to select individual documents, need
more control over querying, or have huge
documents.
●
Embedded documents are good when
you want the entire document and the
document size is predictable. Embedded
documents give the best read performance (one query, no extra round trips).

Schema migration
●
Schemaless
●
The main concern is how the application behaves
when a new field is added
●
Incremental migration technique (version field)
Use cases:
– removing field
– renaming fields
– refactoring aggregate
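With the incremental technique, each document carries a version field and is upgraded step by step when the application reads it (and can be saved back in the new shape), so old and new documents coexist in one collection. A minimal sketch of the renaming and removing cases, assuming a hypothetical `schemaVersion` field and made-up migrations on the student document shown earlier.

```javascript
// Hypothetical migrations: migrations[n] upgrades a document from version n to n + 1.
const migrations = {
  // v1 -> v2: rename "homeAddress" to "address"
  1: (doc) => {
    const { homeAddress, ...rest } = doc;
    return { ...rest, address: homeAddress, schemaVersion: 2 };
  },
  // v2 -> v3: remove the obsolete "averageAssessment" field
  2: (doc) => {
    const { averageAssessment, ...rest } = doc;
    return { ...rest, schemaVersion: 3 };
  }
};
const CURRENT_VERSION = 3;

// Upgrade a document read from the database to the current version.
// Documents stored before versioning existed have no field and count as version 1.
function upgrade(doc) {
  let current = { schemaVersion: 1, ...doc };
  while (current.schemaVersion < CURRENT_VERSION) {
    current = migrations[current.schemaVersion](current);
  }
  return current;
}

const stored = { fullName: "Fedor Buhankin", homeAddress: "Ukraine, Odessa 23/34", averageAssessment: 5 };
console.log(upgrade(stored));
```

Keeping each step small means a document several versions behind simply runs through the chain.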

Data Consistency
●
Transactional consistency
– domain design should take into account aggregate atomicity
●
Replication consistency
– Account for the inconsistency window (e.g., sticky sessions)
●
Eventual consistency
●
Accept the CAP theorem:
– it is impossible for a distributed computer system to simultaneously provide all
three of the following guarantees: consistency, availability and partition
tolerance.

Scaling options
●
Autosharding
●
Master-Slave replication
●
Replica Set clusterization
●
Sharding + Replica Set

Sharding
●
MongoDB supports autosharding
●
Just specify the shard key pattern
●
Sharding scales write throughput
●
The main way to scale the system
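With range-based sharding, the values of the shard key are split into contiguous chunks, each owned by one shard, and the router (`mongos`) sends each write to the shard that owns the matching chunk; that is how writes spread across machines. A simplified sketch with a hypothetical, static chunk table (a real cluster splits and migrates chunks automatically):

```javascript
// Hypothetical chunk table for shard key "customerId": [min, max) ranges.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" }
];

// Route a document to the shard whose chunk range contains its shard-key value.
function routeToShard(doc, shardKey) {
  const value = doc[shardKey];
  if (value === undefined) throw new Error(`document is missing shard key "${shardKey}"`);
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  if (!chunk) throw new Error("no chunk covers this shard-key value");
  return chunk.shard;
}

console.log(routeToShard({ id: 99, customerId: 1 }, "customerId"));    // shard0
console.log(routeToShard({ id: 100, customerId: 4200 }, "customerId")); // shard1
console.log(routeToShard({ id: 101, customerId: 9000 }, "customerId")); // shard2
```

A key whose values always increase (such as a timestamp) would send every new write to the last chunk, which is why the choice of shard key matters.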

Master-Slave replication
●
One master, many slaves
●
Slaves can be hidden or used for reads
●
Master-slave replication scales reads and provides reliability

Replica Set clusterization
●
The replica set automatically elects a primary (master)
●
The primary replicates its data to all secondaries
●
Limitation: at most 12 members
●
WriteConcern option: how many members must acknowledge a write
●
Benefits:
– Failover and reliability
– Distributing read load
– Maintenance without downtime
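A primary needs votes from a strict majority of the voting members to be elected, and a write with write concern `w: "majority"` is acknowledged only after a majority of members has applied it. That is why replica sets usually have an odd number of members. A small arithmetic sketch with hypothetical helper names:

```javascript
// Strict majority of n voting members: floor(n / 2) + 1.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

// How many voting members can be lost while a primary can still be elected.
function faultTolerance(n) {
  return n - majority(n);
}

for (const n of [2, 3, 4, 5, 7]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

Going from 3 to 4 members raises the majority from 2 to 3 but still tolerates only one failure.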

Sharding + ReplicaSet
●
Allows building a huge, scalable, fault-tolerant database

MongoDB Criticism
●
Data-loss reports under heavy-write configurations
●
No atomic operations across multiple documents
When not to use
●
Heavy cross-document atomic operations
●
Queries against varying aggregate structure

Tips
●
Do not use autoincrement ids
●
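Short field names are preferred (field names are stored in every document)
●
By default, DAO write methods are fire-and-forget (asynchronous)
●
Think twice about collection design
●
Use atomic update modifiers (such as $set and $inc) to change a document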
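One reason to prefer the default ids over auto-increment counters: an `ObjectId` is generated by the client without a round trip to a central counter, and its first 4 bytes encode the creation time in seconds since the Unix epoch (big-endian). The sketch decodes that timestamp from the `_id` in the sample student document; `objectIdTimestamp` is a hypothetical helper (the mongo shell offers `ObjectId(...).getTimestamp()` for the same purpose).

```javascript
// Decode the creation time embedded in a 24-hex-digit ObjectId string.
// Bytes 0-3 (the first 8 hex digits) are a big-endian Unix timestamp in seconds.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) throw new Error("not a 24-hex-digit ObjectId");
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("5071c043cc93742e0d0e9cc7"); // _id from the sample document
console.log(created.toISOString()); // 2012-10-07T17:47:47.000Z
```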

Out of scope
●
MapReduce options
●
Indexes
●
Capped collections

Further reading
http://www.mongodb.org
Kyle Banker, MongoDB in Action
Pramod J. Sadalage, Martin Fowler, NoSQL Distilled
