Webinar: Transitioning from SQL to MongoDB
Transitioning from SQL to MongoDB
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole
Joe.Drumgoole@mongodb.com
V1.2
Before We Begin
• This webinar is being recorded
• Use The Chat Window for
• Technical assistance
• Q&A
• MongoDB Team will answer quick questions in real time
• “Common” questions will be reviewed at the end of the webinar
Who is your Presenter?
• Programmer
• Developer Manager
• Entrepreneur
• Geek
• Sometime pre-sales guy
MongoDB: The New Default Database
Document Data Model
Open Source
Fully Featured
High Performance
Scalable
{
  name: "John Smith",
  pfxs: ["Dr.", "Mr."],
  address: "10 3rd St.",
  phone: {
    home: 1234567890,
    mobile: 1234568138 }
}
It’s a JSON Database
{u'_id': ObjectId('58511bfbb26a8803b6b4d56c'),
u'batchID': 108,
u'member': {u'chapters': [{u'id': 1775736,
u'name': u'London MongoDB User Group',
u'urlname': u'London-MongoDB-User-Group'},
{u'id': 1780459,
u'name': u'Stockholm MongoDB User Group',
u'urlname': u'Stockholm-MongoDB-User-Group'},
{u'id': 3478392,
u'name': u'Dublin MongoDB User Group',
u'urlname': u'DublinMUG'},
u'urlname': u'Mannheim-MongoDB-User-Group'}],
u'city': u'Dublin',
u'country': u'Ireland',
u'events_attended': 13,
u'is_organizer': True,
u'join_time': datetime.datetime(2013, 10, 30, 17, 5, 31),
u'last_access_time': datetime.datetime(2016, 12, 13, 15, 45, 27),
u'location': {u'coordinates': [-6.25, 53.33000183105469],
u'type': u'Point'},
u'member_id': 99473492,
u'member_name': u'Joe Drumgoole',
u'photo_thumb_url': u'http://photos2.meetupstatic.com/photos/member/e/5/0/1/thumb_255178625.jpeg'},
u'timestamp': datetime.datetime(2016, 12, 14, 10, 16, 27, 607000)}
Typed
Hierarchical, with lists and maps
Geo-Spatial
Functions of a Database
• Durable data storage
• Structural representation of data
• CRUD operations
• Authentication and authorization
• Programmer Efficiency?
What Are Your Developers Doing All Day?
1964 - IMS
1977 - Oracle
1984 - dBASE
1991 - MySQL
2009 - MongoDB
The Challenge is Product Development
 | 1976 | 2016
Business Data Goals | Process payroll monthly | Process real-time billing to the minute for 1m customers
Release Schedule | Semi-annually | Monthly
Application/Code | COBOL, Fortran, Algol, PL/1, assembler, proprietary tools | Python, Java, Node.js, Ruby, PHP, Perl, Scala, Erlang and the rest
Tools | None | Apache, LAMP, MEAN, Eclipse, IntelliJ, SourceForge etc.
Database | I/VSAM, early RDBMS | RDBMS, NoSQL
Rectangles are 1976. Maps and Lists are 2016
{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones: [ {
      type : "work",
      number: "1-800-555-1212"
    },
    { type : "home",
      number: "1-800-555-1313",
      DNC: true
    },
    { type : "home",
      number: "1-800-555-1414",
      DNC: true
    }
  ]
}
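As a rough illustration of "maps and lists" on the application side, the sketch below builds the same customer shape from plain Java collections and walks the nested list. The class and method names (CustomerShape, phone, doNotCall) are made up for this example and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CustomerShape {
    // Builds one phone entry; dnc may be null when the flag is absent.
    static Map<String, Object> phone(String type, String number, Boolean dnc) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("type", type);
        p.put("number", number);
        if (dnc != null) {
            p.put("DNC", dnc);
        }
        return p;
    }

    // Builds the customer document from the slide: a map holding a list of maps.
    static Map<String, Object> customer() {
        List<Map<String, Object>> phones = new ArrayList<>();
        phones.add(phone("work", "1-800-555-1212", null));
        phones.add(phone("home", "1-800-555-1313", true));
        phones.add(phone("home", "1-800-555-1414", true));

        Map<String, Object> c = new LinkedHashMap<>();
        c.put("customer_id", 1);
        c.put("first_name", "Mark");
        c.put("last_name", "Smith");
        c.put("city", "San Francisco");
        c.put("phones", phones);
        return c;
    }

    // Collects the numbers flagged do-not-call by walking the nested list.
    @SuppressWarnings("unchecked")
    static List<String> doNotCall(Map<String, Object> customer) {
        List<String> out = new ArrayList<>();
        for (Map<String, Object> p : (List<Map<String, Object>>) customer.get("phones")) {
            if (Boolean.TRUE.equals(p.get("DNC"))) {
                out.add((String) p.get("number"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(doNotCall(customer()));
    }
}
```

No flattening into columns is needed at any point; the optional DNC flag is simply absent on the work phone.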
An Actual Code Example
Let’s compare and contrast RDBMS/SQL to MongoDB development using Java over the course of a few weeks.
Some ground rules:
1. Observe rules of Software Engineering 101: Assume separation of application, Data Access Layer, and database implementation
2. Data Access Layer must be able to
a. Expose simple, functional, data-only interfaces to the application
• No ORM, frameworks, compile-time bindings, special tools
b. Exploit high performance features of database
3. Focus on core data handling code and avoid distractions that require the same amount of work in both technologies
a. No exception or error handling
b. Leave out DB connection and other setup resources
4. Day counts are a proxy for progress, not actual time to complete indicated task
5. Don’t expect to cut and paste this code
The Task: Saving and Fetching Contact data
Start with this simple, flat shape in the Data Access Layer:
Map m = new HashMap();
m.put("name", "Joe D");
m.put("id", "K1");
And assume we save it in this way:
id = save(Map m)
And assume we fetch one by primary key in this way:
Map m = fetch(String id)
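To make the save/fetch contract concrete before any database is involved, here is a minimal in-memory stand-in. It is only a sketch: InMemoryContactStore is a hypothetical class invented for this example, and it assumes the map always carries its key under "id".

```java
import java.util.HashMap;
import java.util.Map;

class InMemoryContactStore {
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    // Saves a copy of the map under its "id" key and returns that id.
    String save(Map<String, Object> m) {
        String id = (String) m.get("id");
        rows.put(id, new HashMap<>(m));
        return id;
    }

    // Returns a copy of the stored map, or null when the id is unknown.
    Map<String, Object> fetch(String id) {
        Map<String, Object> found = rows.get(id);
        return found == null ? null : new HashMap<>(found);
    }

    public static void main(String[] args) {
        InMemoryContactStore store = new InMemoryContactStore();
        Map<String, Object> m = new HashMap<>();
        m.put("name", "Joe D");
        m.put("id", "K1");
        String id = store.save(m);
        System.out.println(store.fetch(id));
    }
}
```

The application only ever sees maps going in and maps coming out, which is the data-only interface the ground rules ask for.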
Brace yourself…..
SQL
DDL: create table contact ( … )
void init()
{
  contactInsertStmt = connection.prepareStatement(
    "insert into contact ( id, name ) values ( ?,? )");
  fetchStmt = connection.prepareStatement(
    "select id, name from contact where id = ?");
}
void save(Map m)
{
  contactInsertStmt.setString(1, (String) m.get("id"));
  contactInsertStmt.setString(2, (String) m.get("name"));
  contactInsertStmt.execute();
}
Map fetch(String id)
{
  Map m = null;
  fetchStmt.setString(1, id);
  // executeQuery() returns the ResultSet; execute() only returns a boolean
  ResultSet rs = fetchStmt.executeQuery();
  if (rs.next()) {
    m = new HashMap();
    m.put("id", rs.getString(1));
    m.put("name", rs.getString(2));
  }
  return m;
}
Day 1: Initial efforts for both technologies
MongoDB
DDL: none
Map fetch(String id)
{
  Map m = null;
  MongoCursor<Document> c = collection.find(eq("id", id)).iterator();
  if (c.hasNext()) {
    m = c.next();  // a Document is itself a Map
  }
  return m;
}
void save(Map m)
{
  collection.insertOne(new Document(m));
}
{"name" : "Joe D",
 "id" : "K1" }
Day 2: Add simple fields
m.put("name", "Joe D");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));  // shorthand only: this deprecated constructor counts years from 1900 and months from 0
• Capturing title and hireDate is part of adding a new business feature
• It was pretty easy to add two fields to the structure
• …but now we have to change our persistence code
SQL Day 2 (changes in bold)
DDL: alter table contact add title varchar(8);
alter table contact add hireDate date;
void init()
{
  contactInsertStmt = connection.prepareStatement(
    "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  fetchStmt = connection.prepareStatement(
    "select id, name, title, hiredate from contact where id = ?");
}
void save(Map m)
{
  contactInsertStmt.setString(1, (String) m.get("id"));
  contactInsertStmt.setString(2, (String) m.get("name"));
  contactInsertStmt.setString(3, (String) m.get("title"));
  // setDate() takes a java.sql.Date, so convert from java.util.Date
  contactInsertStmt.setDate(4, new java.sql.Date(((Date) m.get("hireDate")).getTime()));
  contactInsertStmt.execute();
}
Map fetch(String id)
{
  Map m = null;
  fetchStmt.setString(1, id);
  ResultSet rs = fetchStmt.executeQuery();
  if (rs.next()) {
    m = new HashMap();
    m.put("id", rs.getString(1));
    m.put("name", rs.getString(2));
    m.put("title", rs.getString(3));
    m.put("hireDate", rs.getDate(4));
  }
  return m;
}
Consequences:
1. Code release schedule linked to database upgrade (new code cannot run on old schema)
2. Issues with case sensitivity starting to creep in (many RDBMS are case insensitive for column names, but code is case sensitive)
3. Changes require careful mods in 4 places
4. Beginning of technical debt
MongoDB Day 2
void save(Map m)
{
  collection.insertOne(new Document(m));
}
Map fetch(String id)
{
  Map m = null;
  MongoCursor<Document> c = collection.find(eq("id", id)).iterator();
  if (c.hasNext()) {
    m = c.next();  // a Document is itself a Map
  }
  return m;
}
Advantages:
1. Zero time and money spent on overhead code
2. Code and database not physically linked
3. New material with more fields can be added into existing collections; backfill is optional
4. Names of fields in database precisely match key names in code layer and directly match on name, not indirectly via positional offset
5. No technical debt is created
✔ NO CHANGE
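"Backfill is optional" has a flip side in application code: readers of the collection should tolerate both the Day 1 and the Day 2 shape. A minimal sketch, assuming documents arrive as plain maps; MixedShapes and label are names invented here, not part of the talk.

```java
import java.util.HashMap;
import java.util.Map;

class MixedShapes {
    // Formats a contact whether or not the newer "title" field is present.
    static String label(Map<String, Object> contact) {
        String name = (String) contact.get("name");
        Object title = contact.get("title");
        return title == null ? name : title + " " + name;
    }

    public static void main(String[] args) {
        Map<String, Object> day1 = new HashMap<>();
        day1.put("id", "K1");
        day1.put("name", "Joe D");

        Map<String, Object> day2 = new HashMap<>(day1);
        day2.put("title", "Mr.");

        System.out.println(label(day1)); // older shape, no title
        System.out.println(label(day2)); // newer shape, with title
    }
}
```

The lookup is by key name, so adding a field never shifts the meaning of an existing one, unlike a positional column offset.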
Day 3: Add list of phone numbers
m.put("name", "Joe D");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));
n1.put("type", "work");
n1.put("number", "1-800-555-1212");
list.add(n1);
n2.put("type", "home");
n2.put("number", "1-866-444-3131");
list.add(n2);
m.put("phones", list);
• It was still pretty easy to add this data to the structure
• …but meanwhile, in the persistence code…
REALLY brace yourself…
SQL Day 3 changes: Option 1: Assume just 1 work and 1 home phone number
DDL: alter table contact add work_phone varchar(16);
alter table contact add home_phone varchar(16);
void init()
{
  contactInsertStmt = connection.prepareStatement(
    "insert into contact ( id, name, title, hiredate, work_phone, home_phone ) values ( ?,?,?,?,?,? )");
  fetchStmt = connection.prepareStatement(
    "select id, name, title, hiredate, work_phone, home_phone from contact where id = ?");
}
void save(Map m)
{
  contactInsertStmt.setString(1, (String) m.get("id"));
  contactInsertStmt.setString(2, (String) m.get("name"));
  contactInsertStmt.setString(3, (String) m.get("title"));
  contactInsertStmt.setDate(4, new java.sql.Date(((Date) m.get("hireDate")).getTime()));
  for (Map onePhone : (List<Map>) m.get("phones")) {
    String t = (String) onePhone.get("type");
    String n = (String) onePhone.get("number");
    if (t.equals("work")) {
      contactInsertStmt.setString(5, n);
    } else if (t.equals("home")) {
      contactInsertStmt.setString(6, n);
    }
  }
  contactInsertStmt.execute();
}
Map fetch(String id)
{
  Map m = null;
  fetchStmt.setString(1, id);
  ResultSet rs = fetchStmt.executeQuery();
  if (rs.next()) {
    m = new HashMap();
    m.put("id", rs.getString(1));
    m.put("name", rs.getString(2));
    m.put("title", rs.getString(3));
    m.put("hireDate", rs.getDate(4));
    List list = new ArrayList();
    Map onePhone;
    onePhone = new HashMap();
    onePhone.put("type", "work");
    onePhone.put("number", rs.getString(5));
    list.add(onePhone);
    onePhone = new HashMap();
    onePhone.put("type", "home");
    onePhone.put("number", rs.getString(6));
    list.add(onePhone);
    m.put("phones", list);
  }
  return m;
}
This is just plain bad….
SQL Day 3 changes: Option 2: Proper approach with multiple phone numbers
DDL: create table phones ( … )
void init()
{
  contactInsertStmt = connection.prepareStatement(
    "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  c2stmt = connection.prepareStatement(
    "insert into phones (id, type, number) values (?, ?, ?)");
  fetchStmt = connection.prepareStatement(
    "select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?");
}
void save(Map m)
{
  startTrans();
  contactInsertStmt.setString(1, (String) m.get("id"));
  contactInsertStmt.setString(2, (String) m.get("name"));
  contactInsertStmt.setString(3, (String) m.get("title"));
  contactInsertStmt.setDate(4, new java.sql.Date(((Date) m.get("hireDate")).getTime()));
  for (Map onePhone : (List<Map>) m.get("phones")) {
    c2stmt.setString(1, (String) m.get("id"));
    c2stmt.setString(2, (String) onePhone.get("type"));
    c2stmt.setString(3, (String) onePhone.get("number"));
    c2stmt.execute();
  }
  contactInsertStmt.execute();
  endTrans();
}
Map fetch(String id)
{
  Map m = null;
  fetchStmt.setString(1, id);
  ResultSet rs = fetchStmt.executeQuery();
  int i = 0;
  List list = new ArrayList();
  while (rs.next()) {
    if (i == 0) {
      // the contact columns repeat on every row, so read them once
      m = new HashMap();
      m.put("id", rs.getString(1));
      m.put("name", rs.getString(2));
      m.put("title", rs.getString(3));
      m.put("hireDate", rs.getDate(4));
      m.put("phones", list);
    }
    Map onePhone = new HashMap();
    onePhone.put("type", rs.getString(5));
    onePhone.put("number", rs.getString(6));
    list.add(onePhone);
    i++;
  }
  return m;
}
This took time and money
SQL Day 5: Zero or More Entries
void init()
{
  contactInsertStmt = connection.prepareStatement(
    "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  c2stmt = connection.prepareStatement(
    "insert into phones (id, type, number) values (?, ?, ?)");
  fetchStmt = connection.prepareStatement(
    "select A.id, A.name, A.title, A.hiredate, B.type, B.number from contact A left outer join phones B on (A.id = B.id) where A.id = ?");
}
Whoops! And it’s also wrong!
We did not design the query accounting for contacts that have no phone number. Thus, we have to change the join to an outer join.
But this ALSO means we have to change the unwind logic.
This took more time and money!
while (rs.next()) {
  if (i == 0) {
    // …
  }
  // with an outer join, the phone columns are null for a contact with no phones
  String s = rs.getString(5);
  if (s != null) {
    Map onePhone = new HashMap();
    onePhone.put("type", s);
    onePhone.put("number", rs.getString(6));
    list.add(onePhone);
  }
  i++;
}
…but at least we have a DAL… right?
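The null check is easier to see without a live database. The sketch below replays the same unwind over rows held in memory; the row layout { id, name, type, number } and the class name OuterJoinUnwind are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OuterJoinUnwind {
    // Each row is { id, name, type, number }; type and number are null
    // when the outer join found no phone for the contact.
    static Map<String, Object> unwind(List<String[]> rows) {
        Map<String, Object> m = null;
        List<Map<String, String>> phones = new ArrayList<>();
        for (String[] r : rows) {
            if (m == null) {
                // contact columns repeat on every row, so read them once
                m = new LinkedHashMap<>();
                m.put("id", r[0]);
                m.put("name", r[1]);
                m.put("phones", phones);
            }
            if (r[2] != null) {
                Map<String, String> onePhone = new LinkedHashMap<>();
                onePhone.put("type", r[2]);
                onePhone.put("number", r[3]);
                phones.add(onePhone);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        List<String[]> withPhones = Arrays.asList(
            new String[] {"K1", "Joe D", "work", "1-800-555-1212"},
            new String[] {"K1", "Joe D", "home", "1-866-444-3131"});
        List<String[]> noPhones = Arrays.asList(
            new String[][] {{"K1", "Joe D", null, null}});
        System.out.println(unwind(withPhones));
        System.out.println(unwind(noPhones));
    }
}
```

Without the null check, a contact with no phones would come back with one phantom phone whose type and number are both null.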
MongoDB Day 3
Advantages:
1. Zero time and money spent on overhead code
2. No need to fear fields that are “naturally occurring” lists containing data specific to the parent structure and thus do not benefit from normalization and referential integrity
3. Safe from “Zero or More” entities
void save(Map m)
{
  collection.insertOne(new Document(m));
}
Map fetch(String id)
{
  Map m = null;
  MongoCursor<Document> c = collection.find(eq("id", id)).iterator();
  if (c.hasNext()) {
    m = c.next();  // a Document is itself a Map
  }
  return m;
}
✔ NO CHANGE
By Day 14, our structure looks like this:
n4.put("geo", "US-EAST");
n4.put("startupApps", new String[] { "app1", "app2", "app3" } );
list2.add(n4);
n5.put("geo", "EMEA");
n5.put("startupApps", new String[] { "app6" } );
n5.put("useLocalNumberFormats", false);
list2.add(n5);
m.put("preferences", list2);
n6.put("optOut", true);
n6.put("assertDate", someDate);
seclist.add(n6);
m.put("attestations", seclist);
m.put("security", mapOfDataCreatedByExternalSource);
SQL Day 14
Error: Could not fit all the code into this space.
But very likely, among other things:
• n4.put("startupApps", new String[] { "app1", "app2", "app3" });
was implemented as a single semicolon-delimited string or we had to create another table and change the DAL
• m.put("security", anotherMapOfData);
was implemented by flattening it out and storing a subset of fields or as a blob
MongoDB Day 14 – and every other day
Advantages:
1. Zero time and money spent on overhead code
2. Persistence is so easy and flexible and backward compatible that the persistor does not upward-influence the shapes we want to persist, i.e. the tail does not wag the dog
void save(Map m)
{
  collection.insertOne(new Document(m));
}
Map fetch(String id)
{
  Map m = null;
  MongoCursor<Document> c = collection.find(eq("id", id)).iterator();
  if (c.hasNext()) {
    m = c.next();  // a Document is itself a Map
  }
  return m;
}
✔ NO CHANGE
But what if we must do a join?
Both RDBMS and MongoDB will have a PhoneTransactions table/collection
{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones: [ {
      type : "work",
      number: "1-800-555-1212"
    },
    { type : "home",
      number: "1-800-555-1313",
      DNC: true
    },
    { type : "home",
      number: "1-800-555-1414",
      DNC: true
    }
  ]
}
PhoneTransactions
{ number: "1-800-555-1212",
  target: "1-999-238-3423",
  duration: 20
}
{ number: "1-800-555-1212",
  target: "1-444-785-6611",
  duration: 243
}
{ number: "1-800-555-1414",
  target: "1-645-331-4345",
  duration: 132
}
{ number: "1-800-555-1414",
  target: "1-990-875-2134",
  duration: 71
}
SQL Join Attempt #1
select A.id, A.lname, B.type, B.number, C.target, C.duration
from contact A, phones B, phonestx C
where A.id = B.id and B.number = C.number
id | lname | type | number | target | duration
-----+--------------+------+----------------+----------------+----------
g9 | Moschetti | home | 1-900-555-1212 | 1-222-707-7070 | 23
g10 | Kalan | work | 1-999-444-9999 | 1-222-907-7071 | 7
g9 | Moschetti | work | 1-800-989-2231 | 1-987-707-7072 | 9
g9 | Moschetti | home | 1-900-555-1212 | 1-222-707-7071 | 7
g9 | Moschetti | home | 1-777-999-1212 | 1-222-807-7070 | 23
g9 | Moschetti | home | 1-777-999-1212 | 1-222-807-7071 | 7
g10 | Kalan | work | 1-999-444-9999 | 1-222-907-7070 | 23
g9 | Moschetti | home | 1-900-555-1212 | 1-222-707-7072 | 9
g10 | Kalan | work | 1-999-444-9999 | 1-222-907-7072 | 9
g9 | Moschetti | home | 1-777-999-1212 | 1-222-807-7072 | 9
How to turn this into a list of names, each with a list of numbers, each of those with a list of target numbers?
SQL Unwind Attempt #1
Map idmap = new HashMap();
ResultSet rs = fetchStmt.executeQuery();
while (rs.next()) {
  String id = rs.getString("id");
  String nmbr = rs.getString("number");
  List tnum;
  Map snum;
  // first level: contact id -> map of source numbers
  if ((snum = (Map) idmap.get(id)) == null) {
    snum = new HashMap();
    idmap.put(id, snum);
  }
  // second level: source number -> list of calls
  if ((tnum = (List) snum.get(nmbr)) == null) {
    tnum = new ArrayList();
    snum.put(nmbr, tnum);
  }
  Map info = new HashMap();
  info.put("target", rs.getString("target"));
  info.put("duration", rs.getInt("duration"));
  tnum.add(info);
}
// idmap["g9"]["1-900-555-1212"] = ({target:1-222-707-7070,duration:23…)
SQL Join Attempt #2
select A.id, A.lname, B.type, B.number, C.target, C.duration
from contact A, phones B, phonestx C
where A.id = B.id and B.number = C.number order by A.id, B.number
id | lname | type | number | target | duration
-----+--------------+------+----------------+----------------+----------
g10 | Kalan | work | 1-999-444-9999 | 1-222-907-7072 | 9
g10 | Kalan | work | 1-999-444-9999 | 1-222-907-7070 | 23
g10 | Kalan | work | 1-999-444-9999 | 1-222-907-7071 | 7
g9 | Moschetti | home | 1-777-999-1212 | 1-222-807-7072 | 9
g9 | Moschetti | home | 1-777-999-1212 | 1-222-807-7070 | 23
g9 | Moschetti | home | 1-777-999-1212 | 1-222-807-7071 | 7
g9 | Moschetti | work | 1-800-989-2231 | 1-987-707-7072 | 9
g9 | Moschetti | home | 1-900-555-1212 | 1-222-707-7071 | 7
g9 | Moschetti | home | 1-900-555-1212 | 1-222-707-7072 | 9
g9 | Moschetti | home | 1-900-555-1212 | 1-222-707-7070 | 23
“Early bail out” from cursor is now possible –
but logic to construct list of source and target numbers is similar
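A sketch of what early bail-out could look like once rows arrive ordered by id: read until the id changes, then stop. The row layout { id, number, target } and the names SortedUnwind and firstContact are assumptions for illustration; a real cursor would be a ResultSet rather than an Iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortedUnwind {
    // Groups targets by source number for the first contact only.
    // Rows must be sorted by id; each row is { id, number, target }.
    static Map<String, List<String>> firstContact(Iterator<String[]> rows) {
        Map<String, List<String>> byNumber = new LinkedHashMap<>();
        String currentId = null;
        while (rows.hasNext()) {
            String[] r = rows.next();
            if (currentId == null) {
                currentId = r[0];
            } else if (!currentId.equals(r[0])) {
                // id changed: the first contact is complete, so stop reading.
                // Note this row (the next contact's first) has been consumed.
                break;
            }
            byNumber.computeIfAbsent(r[1], k -> new ArrayList<>()).add(r[2]);
        }
        return byNumber;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"g10", "1-999-444-9999", "1-222-907-7072"},
            new String[] {"g10", "1-999-444-9999", "1-222-907-7070"},
            new String[] {"g9", "1-777-999-1212", "1-222-807-7072"});
        System.out.println(firstContact(rows.iterator()));
    }
}
```

The grouping step is the same map-of-lists construction as before; sorting only buys the ability to stop early.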
SQL is about Disassembly
String s = "select A, B, C, D, E, F from T1,T2,T3 where T1.col = T2.col and T2.col2 = T3.col2 and X = Y and X2 != Y2 and G > 10 and G < 100 and TO_DATE(' …";
ResultSet rs = execute(s);
while (rs.next()) {
  if (new column1 value from T1) {
    set up new Object;
  }
  if (new column2 value from T2) {
    set up new Object2
  }
  if (new column3 value from T3) {
    set up new Object3
  }
  populate maps, lists and scalars
}
Design a Big Query including business logic to grab all the data up front
Throw it at the engine
Disassemble Big Rectangle into usable objects with logic implicit in change in column values
MongoDB is about Assembly
DIY:
Cursor c = coll1.find({"X":"Y"});
while (c.hasNext()) {
  populate maps, lists and scalars;
  Cursor c2 = coll2.find(logic+key from c);
  while (c2.hasNext()) {
    populate maps, lists and scalars;
    Cursor c3 = coll3.find(logic+key from c2);
    while (c3.hasNext()) {
      populate maps, lists and scalars;
    }
  }
}
OR assemble usable objects incrementally with explicit calls to $lookup and $graphLookup
MongoDB “Join”
db.contacts.aggregate([
{$unwind: "$phones"}
,{$lookup: {
from: "phonestx",
localField: "phones.number",
foreignField: "number",
as: "TX"}}
]);
{
"customer_id" : 1,
"first_name" : "Mark",
"last_name" : "Smith",
"city" : "San Francisco",
"phones" : {
"type" : "home",
"number" : "1-800-555-1414",
"DNC" : true
},
"TX" : [
{
"number" : "1-800-555-1414",
"target" : "1-645-331-4345",
"duration" : 132
},
{
"number" : "1-800-555-1414",
"target" : "1-990-875-2134",
"duration" : 71
}
]
}
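To see why the result above has a single "phones" object and a "TX" array, it may help to replay the two stages by hand over in-memory maps. This is only an analogy written for this transcript (LookupSketch is a made-up name); it mimics the output shape, not how the server executes the pipeline.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class LookupSketch {
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> unwindAndLookup(Map<String, Object> contact,
                                                     List<Map<String, Object>> phonestx) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> phone : (List<Map<String, Object>>) contact.get("phones")) {
            // like $unwind: one output document per element of the phones array
            Map<String, Object> doc = new LinkedHashMap<>(contact);
            doc.put("phones", phone);
            // like $lookup: collect every transaction whose number matches
            List<Map<String, Object>> matches = new ArrayList<>();
            for (Map<String, Object> tx : phonestx) {
                if (Objects.equals(tx.get("number"), phone.get("number"))) {
                    matches.add(tx);
                }
            }
            doc.put("TX", matches); // empty list when nothing matches (left outer)
            out.add(doc);
        }
        return out;
    }

    static Map<String, Object> doc(String k1, Object v1, String k2, Object v2) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> phones = new ArrayList<>();
        phones.add(doc("type", "work", "number", "1-800-555-1212"));
        phones.add(doc("type", "home", "number", "1-800-555-1313"));
        Map<String, Object> contact = doc("customer_id", 1, "phones", phones);

        List<Map<String, Object>> tx = new ArrayList<>();
        tx.add(doc("number", "1-800-555-1212", "duration", 20));
        tx.add(doc("number", "1-800-555-1212", "duration", 243));

        for (Map<String, Object> d : unwindAndLookup(contact, tx)) {
            System.out.println(d);
        }
    }
}
```

A phone with no transactions still produces an output document, with an empty TX list, which is the left-outer behaviour.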
But what about “real” queries?
• MongoDB query language is a physical map-of-map based structure, not a String
• Operators (e.g. AND, OR, GT, EQ, etc.) and arguments are keys and values in a cascade of Maps
• No grammar to parse, no templates to fill in, no whitespace, no escaping quotes, no parentheses, no punctuation
• Same paradigm to manipulate data is used to manipulate query expressions
• …which is also, by the way, the same paradigm for working with MongoDB metadata and explain()
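The "cascade of Maps" idea can be shown with nothing but java.util. The sketch below builds an $or expression as nested maps and lists; QueryAsMaps and kv are names made up here, and handing the result to a driver is left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Map;

class QueryAsMaps {
    // One key, one value: the building block for both fields and operators.
    static Map<String, Object> kv(String key, Object value) {
        return Collections.singletonMap(key, value);
    }

    // Builds { $or: [ { "phones.type": "work" }, { "hiredate": { $gt: cutoff } } ] }
    static Map<String, Object> workOrHiredAfter(Date cutoff) {
        List<Map<String, Object>> clauses = Arrays.asList(
            kv("phones.type", "work"),
            kv("hiredate", kv("$gt", cutoff)));
        return kv("$or", clauses);
    }

    public static void main(String[] args) {
        System.out.println(workOrHiredAfter(new Date(0L)));
    }
}
```

Because the query is ordinary data, it can be inspected, compared, or extended with the same code that handles documents; there is no string to concatenate or escape.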
Mongo Shell
JD10Gen:mugalyser jdrumgoole$ mongo
MongoDB shell version: 3.2.7
connecting to: test
MongoDB Enterprise > use MUGS
switched to db MUGS
MongoDB Enterprise > show collections
attendees
audit
groups
members
past_events
upcoming_events
MongoDB Enterprise >
MongoDB Enterprise > db.members.find( { "batchID" : 108, "member.member_name" : "Joe Drumgoole" }).pretty()
{
"_id" : ObjectId("58511bfbb26a8803b6b4d56c"),
"member" : {
"city" : "Dublin",
"events_attended" : 13,
"last_access_time" : ISODate("2016-12-13T15:45:27Z"),
"country" : "Ireland",
"member_id" : 99473492,
"chapters" : [
{
"urlname" : "London-MongoDB-User-Group",
"name" : "London MongoDB User Group",
"id" : 1775736
…
MongoDB Query Examples
Find all contacts with at least one work phone
SQL CLI | select * from contact A, phones B where A.did = B.did and B.type = 'work';
MongoDB CLI | db.contact.find({"phones.type":"work"});
SQL in Java | String s = "select * from contact A, phones B where A.did = B.did and B.type = 'work'";
ResultSet rs = execute(s);
MongoDB via Java driver | FindIterable<Document> c = contact.find(eq("phones.type", "work"));
MongoDB Query Examples
Find all contacts with at least one work phone or hired after 2014-02-02
SQL | select A.did, A.lname, A.hiredate, B.type, B.number from contact A left outer join phones B on (B.did = A.did) where B.type = 'work' or A.hiredate > '2014-02-02'::date
Java | contacts.find(or(eq("phones.type", "work"), gt("hiredate", Date.from(Instant.parse("2014-02-02T00:00:00Z")))));
CLI | db.contacts.find({ $or : [ { "phones.type" : "work" }, { "hiredate" : { $gt : new Date("2014-02-02T00:00:00.000Z") } } ] })
…and before you ask…
Yes, MongoDB query expressions support
1. Sorting
2. Cursor size limit
3. Projection (asking for only parts of the rich shape to be returned)
4. Aggregation (“GROUP BY”) functions
Maybe even MORE powerful than SQL…?
> db.results.values.aggregate([
{$match: { runnum:23, timeSeriesPath: "CDSSpread.12M//1909468128" }
,{$project: { timeSeriesPath: "$timeSeriesPath", values: foml }}
,{$unwind: {path: "$values", idx: "v_idx"}}
,{$match: {values: {$gt: 60}, {$or: [ {idx: 0}, {idx: {$size: . . .}
,{$group: {_id: {a: "$timeSeriesPath", b: term: "$idx"},
n: {$sum:1}, max: {$max: "$values"}, min: {$min: "$values"}},
sdev: {$stdDevPop: "$values"}}
,{$lookup: { from: "deskLimits", localField: "instID", foreignField: "instID", as: "inst"}}
,{$match: {maxDeskLimit: {$gt: {$cond: [ {$gt: [2, $max]}, 2, $max]}}}}
,{$group: {_id: "$deskID", total: {$sum: "$max"}}}
]);
What is an Aggregation Pipeline
Match
Project
Join
Graph
Sort
View
Aggregation Pipeline
[Diagram, built up over several slides: a set of documents flows left to right through the $match, $project, $lookup and $group stages, each stage consuming the output of the one before it]
Aggregation Pipeline Stages
• $match: Filter documents
• $geoNear: Geospatial query
• $project: Reshape documents
• $lookup: Left-outer equi joins
• $unwind: Expand documents
• $group: Summarize documents
• $sample: Randomly selects a subset of documents
• $sort: Order documents
• $skip: Jump over a number of documents
• $limit: Limit number of documents
• $redact: Restrict documents
• $out: Sends results to a new collection
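For readers coming from Java, a stream chain is a loose analogy for a pipeline: each step consumes the output of the previous one. The sketch below filters and then sums phone transaction durations per number, roughly what a $match followed by a $group with $sum would do. It runs client-side on an in-memory list; PipelineAnalogy and Tx are names invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PipelineAnalogy {
    static final class Tx {
        final String number;
        final int duration;

        Tx(String number, int duration) {
            this.number = number;
            this.duration = duration;
        }
    }

    // Keeps calls longer than minDuration, then totals duration per number.
    static Map<String, Integer> totalLongCalls(List<Tx> txs, int minDuration) {
        return txs.stream()
            .filter((Tx t) -> t.duration > minDuration)          // like $match
            .collect(Collectors.groupingBy(
                (Tx t) -> t.number,                              // like $group on number
                TreeMap::new,
                Collectors.summingInt((Tx t) -> t.duration)));   // like $sum
    }

    public static void main(String[] args) {
        List<Tx> txs = Arrays.asList(
            new Tx("1-800-555-1212", 20),
            new Tx("1-800-555-1212", 243),
            new Tx("1-800-555-1414", 132),
            new Tx("1-800-555-1414", 71));
        System.out.println(totalLongCalls(txs, 50));
    }
}
```

The difference that matters in practice is where the work happens: an aggregation pipeline runs in the database, so only the summarized result crosses the network.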
The Fundamental Change with MongoDB
RDBMS designed in era when:
• CPU and disk were slow & expensive
• Memory was VERY expensive
• Network? What network?
• Languages had limited means to dynamically reflect on their types
• Languages had poor support for richly structured types
Thus, the database had to
• Act as combiner-coordinator of simpler types
• Define a rigid schema
• (Together with the code) optimize at compile-time, not run-time
In MongoDB, the data is the schema!
MongoDB and the Rich Map Ecosystem
Generic comparison of two records
Map expr = new HashMap();
expr.put("myKey", "K1");
DBObject a = collection.findOne(expr);
expr.put("myKey", "K2");
DBObject b = collection.findOne(expr);
List<MapDiff.Difference> d = MapDiff.diff((Map)a, (Map)b);
Getting default values for a thing on a certain date and then overlaying user preferences (like for a calculation run)
Map expr = new HashMap();
expr.put("myKey", "DEFAULT");
expr.put("createDate", new Date(2013, 11, 1));
DBObject a = collection.findOne(expr);
expr.clear();
expr.put("myKey", "user1");
DBObject b = otherCollectionPerhaps.findOne(expr);
MapStack s = new MapStack();
s.push((Map)a);
s.push((Map)b);
Map merged = s.project();
Runtime reflection of Maps and Lists enables generic, powerful utilities (MapDiff, MapStack) to be created once and used for all kinds of shapes, saving time and money
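MapDiff and MapStack are not shown in the slides, so the following is a guess at the kind of utility meant, not their actual implementation. It sketches a shallow overlay (later layers win) and a key-level diff over plain maps, under the made-up name MapUtilsSketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

class MapUtilsSketch {
    // Merges layers in order; a key in a later layer replaces an earlier one.
    static Map<String, Object> overlay(List<Map<String, Object>> layers) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> layer : layers) {
            merged.putAll(layer);
        }
        return merged;
    }

    // Returns the keys whose values differ, including keys present on one side only.
    static Set<String> differingKeys(Map<String, Object> a, Map<String, Object> b) {
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        Set<String> diff = new TreeSet<>();
        for (String k : keys) {
            if (!Objects.equals(a.get(k), b.get(k))) {
                diff.add(k);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("geo", "US-EAST");
        defaults.put("useLocalNumberFormats", true);

        Map<String, Object> user = new LinkedHashMap<>();
        user.put("geo", "EMEA");

        Map<String, Object> merged = overlay(Arrays.asList(defaults, user));
        System.out.println(merged);
        System.out.println(differingKeys(defaults, merged));
    }
}
```

Neither method knows anything about contacts or preferences, which is the point: the same code works for any shape that arrives as a map.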
Lastly: A CLI with teeth
Try a query and show the diagnostics
> db.contact.find({"SeqNum": {"$gt":10000}}).explain();
{
"cursor" : "BasicCursor",
"n" : 200000,
//...
"millis" : 223
}
Run it 3 times with smaller and smaller chunks and create a vector of timing result pairs (size,time)
> for(v=[],i=0;i<3;i++) {
… n = i*50000;
… expr = {"SeqNum": {"$gt": n}};
… v.push( [n, db.contact.find(expr).explain().millis] ) }
Let’s see that vector
> v
[ [ 0, 225 ], [ 50000, 222 ], [ 100000, 220 ] ]
Use any other javascript you want inside the shell
> load("jStat.js")
> jStat.stdev(v.map(function(p){return p[1];}))
2.0548046676563256
Party trick: save the explain() output back into a collection!
> for(i=0;i<3;i++) {
… expr = {"SeqNum": {"$gt":i*1000}};
… db.foo.insert(db.contact.find(expr).explain()); }
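As a check on the number in the shell session, the three timings 225, 222 and 220 have mean 222.33, and dividing the summed squared deviations (about 12.667) by n = 3 gives a variance of about 4.222, whose square root is about 2.0548. So the figure shown matches a population standard deviation. A small sketch of that arithmetic, with TimingStats as a made-up name:

```java
class TimingStats {
    // Population standard deviation: divide by n, not n - 1.
    static double populationStdDev(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        double mean = sum / xs.length;

        double squares = 0.0;
        for (double x : xs) {
            squares += (x - mean) * (x - mean);
        }
        return Math.sqrt(squares / xs.length);
    }

    public static void main(String[] args) {
        double[] millis = {225, 222, 220};
        System.out.println(populationStdDev(millis));
    }
}
```

With only three samples this says little about real query performance; it simply confirms how the shell result was computed.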
And There is More – Compass and Atlas
Compass
Atlas
What Does This Add Up To?
Relational: Expressive Query Language, Strong Consistency, Secondary Indexes
NoSQL: Flexibility, Scalability, Performance
Webinar: Transitioning from SQL to MongoDB

More Related Content

PPTX
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
KEY
MongoDB Java Development - MongoBoston 2010
PDF
Webinar: Building Your First App with MongoDB and Java
PPTX
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
PPS
MongoDB crud
PPTX
Webinar: Back to Basics: Thinking in Documents
PDF
Webinar: Working with Graph Data in MongoDB
PPTX
Indexing Strategies to Help You Scale
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
MongoDB Java Development - MongoBoston 2010
Webinar: Building Your First App with MongoDB and Java
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
MongoDB crud
Webinar: Back to Basics: Thinking in Documents
Webinar: Working with Graph Data in MongoDB
Indexing Strategies to Help You Scale

What's hot (20)

PPTX
Mongo db queries
PPTX
Database Trends for Modern Applications: Why the Database You Choose Matters
PDF
An introduction to MongoDB
PPTX
Beyond the Basics 2: Aggregation Framework
PPTX
MongoDB + Java - Everything you need to know
PPTX
Back to Basics: My First MongoDB Application
PDF
MongoDB Europe 2016 - Graph Operations with MongoDB
PPTX
Back to Basics Webinar 2: Your First MongoDB Application
PDF
MongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: Tutorial
PDF
Hadoop - MongoDB Webinar June 2014
PPTX
Back to Basics Webinar 5: Introduction to the Aggregation Framework
PPTX
Back to Basics Webinar 1: Introduction to NoSQL
PPTX
Indexing with MongoDB
PPTX
Back to Basics Webinar 3: Schema Design Thinking in Documents
PPTX
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...
PDF
When to Use MongoDB
PDF
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
PPTX
Reducing Development Time with MongoDB vs. SQL
PPTX
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
KEY
Introduction to Restkit
Mongo db queries
Database Trends for Modern Applications: Why the Database You Choose Matters
An introduction to MongoDB
Beyond the Basics 2: Aggregation Framework
MongoDB + Java - Everything you need to know
Back to Basics: My First MongoDB Application
MongoDB Europe 2016 - Graph Operations with MongoDB
Back to Basics Webinar 2: Your First MongoDB Application
MongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: Tutorial
Hadoop - MongoDB Webinar June 2014
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 1: Introduction to NoSQL
Indexing with MongoDB
Back to Basics Webinar 3: Schema Design Thinking in Documents
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...
When to Use MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
Reducing Development Time with MongoDB vs. SQL
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Introduction to Restkit
Ad

Viewers also liked (20)

PDF
Webinar: 10-Step Guide to Creating a Single View of your Business
PDF
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
PPTX
How Auto Trader enables the UK's largest digital automotive marketplace
PPTX
Back to Basics 2017: Introduction to Sharding
PDF
Creating a Modern Data Architecture for Digital Transformation
PDF
The Rise of Microservices
PDF
The importance of efficient data management for Digital Transformation
PPTX
Back to Basics Webinar 3: Introduction to Replica Sets
PDF
Introduction to MongoDB
PPTX
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...
PDF
Intro To MongoDB
PPT
Introduction to MongoDB
PPTX
The Aggregation Framework
PDF
MongoDB World 2016: Poster Sessions eBook
PPTX
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
PDF
Mongo DB
PPTX
Back to Basics Webinar 1: Introduction to NoSQL
PPTX
MongoDB Days UK: Building an Enterprise Data Fabric at Royal Bank of Scotland...
PDF
Using MongoDB as a high performance graph database
KEY
PHP, Lithium and MongoDB
Webinar: 10-Step Guide to Creating a Single View of your Business
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
How Auto Trader enables the UK's largest digital automotive marketplace
Back to Basics 2017: Introduction to Sharding
Creating a Modern Data Architecture for Digital Transformation
The Rise of Microservices
The importance of efficient data management for Digital Transformation
Back to Basics Webinar 3: Introduction to Replica Sets
Introduction to MongoDB
Seminario Web MongoDB-Paradigma: Cree aplicaciones más escalables utilizando ...
Intro To MongoDB
Introduction to MongoDB
The Aggregation Framework
MongoDB World 2016: Poster Sessions eBook
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
Mongo DB
Back to Basics Webinar 1: Introduction to NoSQL
MongoDB Days UK: Building an Enterprise Data Fabric at Royal Bank of Scotland...
Using MongoDB as a high performance graph database
PHP, Lithium and MongoDB
Ad

Similar to Webinar: Transitioning from SQL to MongoDB (20)

PPTX
Transitioning from SQL to MongoDB
PPTX
Benefits of Using MongoDB Over RDBMSs
PPTX
Webinar: Migrating from RDBMS to MongoDB
PDF
Json within a relational database
PPTX
Webinar: Strongly Typed Languages and Flexible Schemas
PDF
Open Source World June '21 -- JSON Within a Relational Database
PDF
Strongly Typed Languages and Flexible Schemas
PPTX
Discover the Power of the NoSQL + SQL with MySQL
PPTX
Discover The Power of NoSQL + MySQL with MySQL
PDF
MySQL Without the SQL -- Oh My!
PDF
Datacon LA - MySQL without the SQL - Oh my!
PDF
MySQL Without the SQL - Oh My! August 2nd presentation at Mid Atlantic Develo...
PPTX
mongodb introduction11111111111111111111
PPTX
Back to Basics 2017 - Introduction to NoSQL
PDF
MySQL Document Store -- SCaLE 17x Presentation
ODP
Polyglot persistence with Spring Data
PPTX
MySQL Without the SQL - Oh My! -> MySQL Document Store -- Confoo.CA 2019
PDF
Best Practices for Migrating RDBMS to MongoDB
PPT
MongoDB
PDF
Migrating from RDBMS to MongoDB
Transitioning from SQL to MongoDB
Benefits of Using MongoDB Over RDBMSs
Webinar: Migrating from RDBMS to MongoDB
Json within a relational database
Webinar: Strongly Typed Languages and Flexible Schemas
Open Source World June '21 -- JSON Within a Relational Database
Strongly Typed Languages and Flexible Schemas
Discover the Power of the NoSQL + SQL with MySQL
Discover The Power of NoSQL + MySQL with MySQL
MySQL Without the SQL -- Oh My!
Datacon LA - MySQL without the SQL - Oh my!
MySQL Without the SQL - Oh My! August 2nd presentation at Mid Atlantic Develo...
mongodb introduction11111111111111111111
Back to Basics 2017 - Introduction to NoSQL
MySQL Document Store -- SCaLE 17x Presentation
Polyglot persistence with Spring Data
MySQL Without the SQL - Oh My! -> MySQL Document Store -- Confoo.CA 2019
Best Practices for Migrating RDBMS to MongoDB
MongoDB
Migrating from RDBMS to MongoDB

Webinar: Transitioning from SQL to MongoDB

  • 2. Transitioning from SQL to MongoDB Joe Drumgoole Director of Developer Advocacy, EMEA @jdrumgoole Joe.Drumgoole@mongodb.com V1.2
  • 3. Before We Begin • This webinar is being recorded • Use The Chat Window for • Technical assistance • Q&A • MongoDB Team will answer quick questions in real time • “Common” questions will be reviewed at the end of the webinar
  • 4. Who is your Presenter? • Programmer • Developer Manager • Entrepreneur • Geek • Some time pre-sales guy
  • 5. MongoDB: The New Default Database Document Data Model Open- Source Fully Featured High Performance Scalable { name: “John Smith”, pfxs: [“Dr.”,”Mr.”], address: “10 3rd St.”, phone: { home: 1234567890, mobile: 1234568138 } }
  • 6. It’s a JSON Database {u'_id': ObjectId('58511bfbb26a8803b6b4d56c'), u'batchID': 108, u'member': {u'chapters': [{u'id': 1775736, u'name': u'London MongoDB User Group', u'urlname': u'London-MongoDB-User-Group'}, {u'id': 1780459, u'name': u'Stockholm MongoDB User Group', u'urlname': u'Stockholm-MongoDB-User-Group'}, {u'id': 3478392, u'name': u'Dublin MongoDB User Group', u'urlname': u'DublinMUG'}, u'urlname': u'Mannheim-MongoDB-User-Group'}], u'city': u'Dublin', u'country': u'Ireland', u'events_attended': 13, u'is_organizer': True, u'join_time': datetime.datetime(2013, 10, 30, 17, 5, 31), u'last_access_time': datetime.datetime(2016, 12, 13, 15, 45, 27), u'location': {u'coordinates': [-6.25, 53.33000183105469], u'type': u'Point'}, u'member_id': 99473492, u'member_name': u'Joe Drumgoole', u'photo_thumb_url': u'http://photos2.meetupstatic.com/photos/member/e/5/0/1/thumb_255178625.jpeg'}, u'timestamp': datetime.datetime(2016, 12, 14, 10, 16, 27, 607000)} Typed Hierarchical, with lists and maps Geo-Spatial
  • 7. Functions of a Database • Durable data storage • Structural representation of data • CRUD operations • Authentication and authorization • Programmer Efficiency?
  • 8. What Are Your Developers Doing All Day? 1964 - IMS 1977 - Oracle 1984 - dBASE 1991 - MySQL 2009 - MongoDB
  • 9. The Challenge is Product Development
                          | 1976 | 2016
      Business Data Goals | Process Payroll Monthly | Process real-time billing to the minute for 1m customers
      Release Schedule    | Semi-Annually | Monthly
      Application/Code    | COBOL, Fortran, Algol, PL/1, assembler, proprietary tools | Python, Java, Node.js, Ruby, PHP, Perl, Scala, Erlang and the rest
      Tools               | None | Apache, LAMP, Mean, Eclipse, Intellij, Sourceforge etc.
      Database            | I/VSAM, early RDBMS | RDBMS, NoSQL
  • 10. Rectangles are 1976. Maps and Lists are 2016 { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { type : “work”, number: “1-800-555-1212” }, { type : “home”, number: “1-800-555-1313”, DNC: true }, { type : “home”, number: “1-800-555-1414”, DNC: true } ] }
  • 11. An Actual Code Example Let’s compare and contrast RDBMS/SQL to MongoDB development using Java over the course of a few weeks. Some ground rules: 1. Observe rules of Software Engineering 101: Assume separation of application, Data Access Layer, and database implementation 2. Data Access Layer must be able to a. Expose simple, functional, data-only interfaces to the application • No ORM, frameworks, compile-time bindings, special tools b. Exploit high performance features of database 3. Focus on core data handling code and avoid distractions that require the same amount of work in both technologies a. No exception or error handling b. Leave out DB connection and other setup resources 4. Day counts are a proxy for progress, not actual time to complete indicated task 5. Don’t expect to cut and paste this code
  • 12. The Task: Saving and Fetching Contact data Start with this simple, flat shape in the Data Access Layer: Map m = new HashMap(); m.put(“name”, “Joe D”); m.put(“id”, “K1”); And assume we save it in this way: id = save(Map m) And assume we fetch one by primary key in this way: Map m = fetch(String id) Brace yourself…..
  • 13. Day 1: Initial efforts for both technologies SQL: DDL: create table contact ( … ) init() { contactInsertStmt = connection.prepareStatement (“insert into contact ( id, name ) values ( ?,? )”); fetchStmt = connection.prepareStatement (“select id, name from contact where id = ?”); } save(Map m) { contactInsertStmt.setString(1, m.get(“id”)); contactInsertStmt.setString(2, m.get(“name”)); contactInsertStmt.execute(); } Map fetch(String id) { Map m = null; fetchStmt.setString(1, id); rs = fetchStmt.execute(); if(rs.next()) { m = new HashMap(); m.put(“id”, rs.getString(1)); m.put(“name”, rs.getString(2)); } return m; } MongoDB: DDL: none save( Map m ) { collection.insert(Document( m )); } Map fetch(String id) { Map m = null; c = collection.find(eq( “id”, id )); if( c.hasNext()) { m = (Map) c.next(); } return m; } Stored document: {“name” : “Joe D”, “id” : “K1” }
  • 14. Day 2: Add simple fields m.put(“name”, “Joe D”); m.put(“id”, “K1”); m.put(“title”, “Mr.”); m.put(“hireDate”, new Date(2011, 11, 1)); • Capturing title and hireDate is part of adding a new business feature • It was pretty easy to add two fields to the structure • …but now we have to change our persistence code
  • 15. SQL Day 2 (changes in bold) DDL: alter table contact add title varchar(8); alter table contact add hireDate date; init() { contactInsertStmt = connection.prepareStatement (“insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )”); fetchStmt = connection.prepareStatement (“select id, name, title, hiredate from contact where id = ?”); } save(Map m) { contactInsertStmt.setString(1, m.get(“id”)); contactInsertStmt.setString(2, m.get(“name”)); contactInsertStmt.setString(3, m.get(“title”)); contactInsertStmt.setDate(4, m.get(“hireDate”)); contactInsertStmt.execute(); } Map fetch(String id) { Map m = null; fetchStmt.setString(1, id); rs = fetchStmt.execute(); if(rs.next()) { m = new HashMap(); m.put(“id”, rs.getString(1)); m.put(“name”, rs.getString(2)); m.put(“title”, rs.getString(3)); m.put(“hireDate”, rs.getDate(4)); } return m; } Consequences: 1. Code release schedule linked to database upgrade (new code cannot run on old schema) 2. Issues with case sensitivity starting to creep in (many RDBMS are case insensitive for column names, but code is case sensitive) 3. Changes require careful mods in 4 places 4. Beginning of technical debt
  • 16. MongoDB Day 2 ✔ NO CHANGE save( Map m ) { collection.insert(Document( m )); } Map fetch(String id) { Map m = null; c = collection.find(eq( “id”, id )); if( c.hasNext()) { m = (Map) c.next(); } return m; } Advantages: 1. Zero time and money spent on overhead code 2. Code and database not physically linked 3. New material with more fields can be added into existing collections; backfill is optional 4. Names of fields in database precisely match key names in code layer and directly match on name, not indirectly via positional offset 5. No technical debt is created
  • 17. Day 3: Add list of phone numbers m.put(“name”, “Joe D”); m.put(“id”, “K1”); m.put(“title”, “Mr.”); m.put(“hireDate”, new Date(2011, 11, 1)); n1.put(“type”, “work”); n1.put(“number”, “1-800-555-1212”)); list.add(n1); n2.put(“type”, “home”)); n2.put(“number”, “1-866-444-3131”)); list.add(n2); m.put(“phones”, list); • It was still pretty easy to add this data to the structure • .. but meanwhile, in the persistence code … REALLY brace yourself…
  • 18. SQL Day 3 changes: Option 1: Assume just 1 work and 1 home phone number DDL: alter table contact add work_phone varchar(16); alter table contact add home_phone varchar(16); init() { contactInsertStmt = connection.prepareStatement (“insert into contact ( id, name, title, hiredate, work_phone, home_phone ) values ( ?,?,?,?,?,? )”); fetchStmt = connection.prepareStatement (“select id, name, title, hiredate, work_phone, home_phone from contact where id = ?”); } save(Map m) { contactInsertStmt.setString(1, m.get(“id”)); contactInsertStmt.setString(2, m.get(“name”)); contactInsertStmt.setString(3, m.get(“title”)); contactInsertStmt.setDate(4, m.get(“hireDate”)); for(Map onePhone : m.get(“phones”)) { String t = onePhone.get(“type”); String n = onePhone.get(“number”); if(t.equals(“work”)) { contactInsertStmt.setString(5, n); } else if(t.equals(“home”)) { contactInsertStmt.setString(6, n); } } contactInsertStmt.execute(); } Map fetch(String id) { Map m = null; fetchStmt.setString(1, id); rs = fetchStmt.execute(); if(rs.next()) { m = new HashMap(); m.put(“id”, rs.getString(1)); m.put(“name”, rs.getString(2)); m.put(“title”, rs.getString(3)); m.put(“hireDate”, rs.getDate(4)); Map onePhone; onePhone = new HashMap(); onePhone.put(“type”, “work”); onePhone.put(“number”, rs.getString(5)); list.add(onePhone); onePhone = new HashMap(); onePhone.put(“type”, “home”); onePhone.put(“number”, rs.getString(6)); list.add(onePhone); m.put(“phones”, list); } This is just plain bad….
  • 19. SQL Day 3 changes: Option 2: Proper approach with multiple phone numbers DDL: create table phones ( … ) init() { contactInsertStmt = connection.prepareStatement (“insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )”); c2stmt = connection.prepareStatement(“insert into phones (id, type, number) values (?, ?, ?)”); fetchStmt = connection.prepareStatement (“select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?”); } save(Map m) { startTrans(); contactInsertStmt.setString(1, m.get(“id”)); contactInsertStmt.setString(2, m.get(“name”)); contactInsertStmt.setString(3, m.get(“title”)); contactInsertStmt.setDate(4, m.get(“hireDate”)); for(Map onePhone : m.get(“phones”)) { c2stmt.setString(1, m.get(“id”)); c2stmt.setString(2, onePhone.get(“type”)); c2stmt.setString(3, onePhone.get(“number”)); c2stmt.execute(); } contactInsertStmt.execute(); endTrans(); } Map fetch(String id) { Map m = null; fetchStmt.setString(1, id); rs = fetchStmt.execute(); int i = 0; List list = new ArrayList(); while (rs.next()) { if(i == 0) { m = new HashMap(); m.put(“id”, rs.getString(1)); m.put(“name”, rs.getString(2)); m.put(“title”, rs.getString(3)); m.put(“hireDate”, rs.getDate(4)); m.put(“phones”, list); } Map onePhone = new HashMap(); onePhone.put(“type”, rs.getString(5)); onePhone.put(“number”, rs.getString(6)); list.add(onePhone); i++; } return m; } This took time and money
  • 20. SQL Day 5: Zero or More Entries init() { contactInsertStmt = connection.prepareStatement (“insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )”); c2stmt = connection.prepareStatement(“insert into phones (id, type, number) values (?, ?, ?)”); fetchStmt = connection.prepareStatement (“select A.id, A.name, A.title, A.hiredate, B.type, B.number from contact A left outer join phones B on (A.id = B.id) where A.id = ?”); } Whoops! And it’s also wrong! We did not design the query accounting for contacts that have no phone number. Thus, we have to change the join to an outer join. But this ALSO means we have to change the unwind logic. This took more time and money! while (rs.next()) { if(i == 0) { // … } String s = rs.getString(5); if(s != null) { Map onePhone = new HashMap(); onePhone.put(“type”, s); onePhone.put(“number”, rs.getString(6)); list.add(onePhone); } } …but at least we have a DAL… right?
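The "zero or more" unwind from slide 20 can be sketched without a database. This is a minimal illustration, not the presenter's code: rows from the left outer join are modelled as plain `String[]` arrays instead of a JDBC `ResultSet`, and the class name `OuterJoinUnwind` and the contact "Ann B" are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: each String[] stands in for one row of the left outer join.
final class OuterJoinUnwind {

    // Row layout (assumed): {id, name, phoneType, phoneNumber}.
    // The two phone columns are null when the contact has no phones.
    static Map<String, Object> unwind(List<String[]> rows) {
        Map<String, Object> contact = null;
        List<Map<String, Object>> phones = new ArrayList<>();
        for (String[] row : rows) {
            if (contact == null) {
                // First row: copy the parent columns exactly once.
                contact = new LinkedHashMap<>();
                contact.put("id", row[0]);
                contact.put("name", row[1]);
                contact.put("phones", phones);
            }
            if (row[2] != null) {
                // Only rows that carry a real phone become list entries.
                Map<String, Object> phone = new LinkedHashMap<>();
                phone.put("type", row[2]);
                phone.put("number", row[3]);
                phones.add(phone);
            }
        }
        return contact; // null when the id matched no rows at all
    }

    public static void main(String[] args) {
        List<String[]> withPhones = Arrays.asList(
            new String[] {"K1", "Joe D", "work", "1-800-555-1212"},
            new String[] {"K1", "Joe D", "home", "1-866-444-3131"});
        List<String[]> noPhones = new ArrayList<>();
        noPhones.add(new String[] {"K2", "Ann B", null, null}); // hypothetical contact
        System.out.println(unwind(withPhones));
        System.out.println(unwind(noPhones));
    }
}
```

The null check on the phone column is the extra branch the outer join forces on the caller; a contact without phones still comes back, just with an empty list.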
  • 21. MongoDB Day 3 ✔ NO CHANGE save( Map m ) { collection.insert(Document( m )); } Map fetch(String id) { Map m = null; c = collection.find(eq( “id”, id )); if( c.hasNext()) { m = (Map) c.next(); } return m; } Advantages: 1. Zero time and money spent on overhead code 2. No need to fear fields that are “naturally occurring” lists containing data specific to the parent structure and thus do not benefit from normalization and referential integrity 3. Safe from “Zero or More” entities
  • 22. By Day 14, our structure looks like this: n4.put(“geo”, “US-EAST”); n4.put(“startupApps”, new String[] { “app1”, “app2”, “app3” } ); list2.add(n4); n4.put(“geo”, “EMEA”); n4.put(“startupApps”, new String[] { “app6” } ); n4.put(“useLocalNumberFormats”, false): list2.add(n4); m.put(“preferences”, list2) n6.put(“optOut”, true); n6.put(“assertDate”, someDate); seclist.add(n6); m.put(“attestations”, seclist) m.put(“security”, mapOfDataCreatedByExternalSource);
  • 23. SQL Day 14 Error: Could not fit all the code into this space. But very likely, among other things: • n4.put(“startupApps”,new String[]{“app1”,“app2”,“app3”}); was implemented as a single semi-colon delimited string or we had to create another table and change the DAL • m.put(“security”, anotherMapOfData); was implemented by flattening it out and storing a subset of fields or as a blob
  • 24. MongoDB Day 14 – and every other day ✔ NO CHANGE save( Map m ) { collection.insert(Document( m )); } Map fetch(String id) { Map m = null; c = collection.find(eq( “id”, id )); if( c.hasNext()) { m = (Map) c.next(); } return m; } Advantages: 1. Zero time and money spent on overhead code 2. Persistence is so easy, flexible, and backward compatible that the persistor does not upward-influence the shapes we want to persist, i.e. the tail does not wag the dog
  • 25. But what if we must do a join? Both RDBMS and MongoDB will have a PhoneTransactions table/collection { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { type : “work”, number: “1-800-555-1212” }, { type : “home”, number: “1-800-555-1313”, DNC: true }, { type : “home”, number: “1-800-555-1414”, DNC: true } ] } { number: “1-800-555-1212”, target: “1-999-238-3423”, duration: 20 } { number: “1-800-555-1212”, target: “1-444-785-6611”, duration: 243 } { number: “1-800-555-1414”, target: “1-645-331-4345”, duration: 132 } { number: “1-800-555-1414”, target: “1-990-875-2134”, duration: 71 } PhoneTransactions
  • 26. SQL Join Attempt #1 select A.id, A.lname, B.type, B.number, C.target, C.duration from contact A, phones B, phonestx C where A.id = B.id and B.number = C.number
       id  | lname     | type | number         | target         | duration
      -----+-----------+------+----------------+----------------+----------
       g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7070 | 23
       g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7071 | 7
       g9  | Moschetti | work | 1-800-989-2231 | 1-987-707-7072 | 9
       g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7071 | 7
       g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7070 | 23
       g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7071 | 7
       g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7070 | 23
       g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7072 | 9
       g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7072 | 9
       g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7072 | 9
      How to turn this into a list of names, each with a list of numbers, each of those with a list of target numbers?
  • 27. SQL Unwind Attempt #1 Map idmap = new HashMap(); ResultSet rs = fetchStmt.execute(); while (rs.next()) { String id = rs.getString("id"); String nmbr = rs.getString("number"); List tnum; Map snum; if((snum = (Map) idmap.get(id)) == null) { snum = new HashMap(); idmap.put(id, snum); } if((tnum = (List) snum.get(nmbr)) == null) { tnum = new ArrayList(); snum.put(nmbr, tnum); } Map info = new HashMap(); info.put("target", rs.getString("target")); info.put("duration", rs.getInt("duration")); tnum.add(info); } // idmap[“g9”][“1-900-555-1212”] = ({target:1-222-707-7070,duration:23…)
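The regrouping on slide 27 can be sketched with typed collections and `Map.computeIfAbsent`. This is one possible shape under stated assumptions: the `Row` class and the `JoinRegroup` name are hypothetical stand-ins for the JDBC `ResultSet`, and only the columns needed for grouping are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: regroups flat join rows into id -> number -> list of calls.
final class JoinRegroup {

    // One flattened join row (hypothetical stand-in for a ResultSet row).
    static final class Row {
        final String id;
        final String number;
        final String target;
        final int duration;

        Row(String id, String number, String target, int duration) {
            this.id = id;
            this.number = number;
            this.target = target;
            this.duration = duration;
        }
    }

    static Map<String, Map<String, List<Map<String, Object>>>> regroup(List<Row> rows) {
        Map<String, Map<String, List<Map<String, Object>>>> byId = new LinkedHashMap<>();
        for (Row r : rows) {
            Map<String, Object> info = new LinkedHashMap<>();
            info.put("target", r.target);
            info.put("duration", r.duration);
            // Create the inner map and list on first sight of each key.
            byId.computeIfAbsent(r.id, k -> new LinkedHashMap<>())
                .computeIfAbsent(r.number, k -> new ArrayList<>())
                .add(info);
        }
        return byId;
    }

    public static void main(String[] args) {
        // A few rows taken from the unordered result on slide 26.
        List<Row> rows = Arrays.asList(
            new Row("g9", "1-900-555-1212", "1-222-707-7070", 23),
            new Row("g10", "1-999-444-9999", "1-222-907-7071", 7),
            new Row("g9", "1-800-989-2231", "1-987-707-7072", 9),
            new Row("g9", "1-900-555-1212", "1-222-707-7071", 7));
        System.out.println(regroup(rows));
    }
}
```

Because the grouping is keyed by map lookups, it does not depend on row order; the `order by` on slide 28 only matters when bailing out of the cursor early.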
  • 28. SQL Join Attempt #2 select A.id, A.lname, B.type, B.number, C.target, C.duration from contact A, phones B, phonestx C where A.id = B.id and B.number = C.number order by A.id, B.number
       id  | lname     | type | number         | target         | duration
      -----+-----------+------+----------------+----------------+----------
       g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7072 | 9
       g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7070 | 23
       g10 | Kalan     | work | 1-999-444-9999 | 1-222-907-7071 | 7
       g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7072 | 9
       g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7070 | 23
       g9  | Moschetti | home | 1-777-999-1212 | 1-222-807-7071 | 7
       g9  | Moschetti | work | 1-800-989-2231 | 1-987-707-7072 | 9
       g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7071 | 7
       g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7072 | 9
       g9  | Moschetti | home | 1-900-555-1212 | 1-222-707-7070 | 23
      “Early bail out” from cursor is now possible – but logic to construct list of source and target numbers is similar
  • 29. SQL is about Disassembly String s = “select A, B, C, D, E, F from T1,T2,T3 where T1.col = T2.col and T2.col2 = T3.col2 and X = Y and X2 != Y2 and G > 10 and G < 100 and TO_DATE(‘ …”; ResultSet rs = execute(s); while(ResultSet.next()) { if(new column1 value from T1) { set up new Object; } if(new column2 value from T2) { set up new Object2 } if(new column3 value from T3) { set up new Object3 } populate maps, lists and scalars } Design a Big Query including business logic to grab all the data up front Throw it at the engine Disassemble Big Rectangle into usable objects with logic implicit in change in column values
  • 30. MongoDB is about Assembly DIY: Cursor c = coll1.find({“X”:“Y”}); while(c.hasNext()) { populate maps, lists and scalars; Cursor c2 = coll2.find(logic+key from c); while(c2.hasNext()) { populate maps, lists and scalars; Cursor c3 = coll3.find(logic+key from c2); while(c3.hasNext()) { populate maps, lists and scalars; } } } OR: assemble usable objects incrementally with explicit calls to $lookup and $graphLookup
  • 31. MongoDB "Join" db.contacts.aggregate([ {$unwind: "$phones"} ,{$lookup: { from: "phonestx", localField: "phones.number", foreignField: "number", as: "TX"}} ]); { "customer_id" : 1, "first_name" : "Mark", "last_name" : "Smith", "city" : "San Francisco", "phones" : { "type" : "home", "number" : "1-800-555-1414", "DNC" : true }, "TX" : [ { "number" : "1-800-555-1414", "target" : "1-645-331-4345", "duration" : 132 }, { "number" : "1-800-555-1414", "target" : "1-990-875-2134", "duration" : 71 } ] }
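To make the two stages on slide 31 concrete, here is a rough in-memory emulation of what `$unwind` and `$lookup` do to a document. It is an illustrative sketch, not the server's implementation and not driver code: the class `UnwindLookup` and its helpers `doc`, `unwind`, `getPath` and `lookup` are invented for this example, and edge cases of the real stages (empty arrays aside, array-valued join keys, options such as `preserveNullAndEmptyArrays`) are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// In-memory emulation only; nothing here talks to a MongoDB server.
final class UnwindLookup {

    // Builds an insertion-ordered document from alternating keys and values.
    static Map<String, Object> doc(Object... keysAndValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            d.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return d;
    }

    // Rough equivalent of {$unwind: "$field"}: one copy of the document per
    // array element, with the array replaced by that element.
    static List<Map<String, Object>> unwind(Map<String, Object> document, String field) {
        List<Map<String, Object>> out = new ArrayList<>();
        Object value = document.get(field);
        if (value == null) {
            return out; // missing or null field: the document is dropped
        }
        List<?> elements = value instanceof List
                ? (List<?>) value
                : Collections.singletonList(value);
        for (Object element : elements) {
            Map<String, Object> copy = new LinkedHashMap<>(document);
            copy.put(field, element);
            out.add(copy);
        }
        return out;
    }

    // Resolves a dotted path such as "phones.number" through nested maps.
    static Object getPath(Map<String, Object> document, String path) {
        Object current = document;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Rough equivalent of $lookup: attach every document in `from` whose
    // foreignField equals this document's localField, under the key `as`.
    static Map<String, Object> lookup(Map<String, Object> document,
                                      List<Map<String, Object>> from,
                                      String localField, String foreignField, String as) {
        Object local = getPath(document, localField);
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> other : from) {
            if (Objects.equals(local, other.get(foreignField))) {
                matches.add(other);
            }
        }
        Map<String, Object> out = new LinkedHashMap<>(document);
        out.put(as, matches);
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> phonestx = Arrays.asList(
            doc("number", "1-800-555-1212", "target", "1-999-238-3423", "duration", 20),
            doc("number", "1-800-555-1414", "target", "1-645-331-4345", "duration", 132),
            doc("number", "1-800-555-1414", "target", "1-990-875-2134", "duration", 71));
        Map<String, Object> contact = doc("customer_id", 1, "first_name", "Mark",
            "phones", Arrays.asList(
                doc("type", "work", "number", "1-800-555-1212"),
                doc("type", "home", "number", "1-800-555-1313"),
                doc("type", "home", "number", "1-800-555-1414")));
        for (Map<String, Object> d : unwind(contact, "phones")) {
            System.out.println(lookup(d, phonestx, "phones.number", "number", "TX"));
        }
    }
}
```

A phone with no transactions still produces a document, with an empty `TX` list, which is the left-outer behaviour the stage list later attributes to `$lookup`.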
  • 32. But what about “real” queries? • MongoDB query language is a physical map-of- map based structure, not a String • Operators (e.g. AND, OR, GT, EQ, etc.) and arguments are keys and values in a cascade of Maps • No grammar to parse, no templates to fill in, no whitespace, no escaping quotes, no parentheses, no punctuation • Same paradigm to manipulate data is used to manipulate query expressions • …which is also, by the way, the same paradigm for working with MongoDB metadata and explain()
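The "cascade of Maps" idea on slide 32 can be shown with nothing but `java.util` collections. The sketch below builds the `$or` query from the later query-example slide as nested maps and then wraps it in another operator, to illustrate that queries are ordinary data. The class `QueryAsMaps`, the helper `op`, and the extra `city` filter are assumptions made for this example; handing the resulting map to a driver is deliberately left out.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: a query expressed as maps of maps, with no string parsing.
final class QueryAsMaps {

    // A single key/value pair, e.g. {"$gt": value} or {"phones.type": "work"}.
    static Map<String, Object> op(String key, Object value) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    // { $or: [ {"phones.type": "work"}, {"hiredate": {$gt: cutoff}} ] }
    static Map<String, Object> workPhoneOrHiredAfter(Date cutoff) {
        return op("$or", Arrays.asList(
            op("phones.type", "work"),
            op("hiredate", op("$gt", cutoff))));
    }

    // Queries are data, so an existing query can be wrapped like any map.
    static Map<String, Object> and(Map<String, Object> left, Map<String, Object> right) {
        return op("$and", Arrays.asList(left, right));
    }

    public static void main(String[] args) {
        Date cutoff = Date.from(Instant.parse("2014-02-02T00:00:00Z"));
        Map<String, Object> query = workPhoneOrHiredAfter(cutoff);
        System.out.println(query);
        // Hypothetical extra condition, added without touching the original query.
        System.out.println(and(query, op("city", "San Francisco")));
        System.out.println(Collections.unmodifiableMap(query).keySet());
    }
}
```

No quoting or escaping is involved at any point: values such as the `Date` stay typed objects all the way down.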
  • 33. Mongo Shell JD10Gen:mugalyser jdrumgoole$ mongo MongoDB shell version: 3.2.7 connecting to: test MongoDB Enterprise > use MUGS switched to db MUGS MongoDB Enterprise > show collections attendees audit groups members past_events upcoming_events MongoDB Enterprise > MongoDB Enterprise > db.members.find( { "batchID" : 108, "member.member_name" : "Joe Drumgoole" }).pretty() { "_id" : ObjectId("58511bfbb26a8803b6b4d56c"), "member" : { "city" : "Dublin", "events_attended" : 13, "last_access_time" : ISODate("2016-12-13T15:45:27Z"), "country" : "Ireland", "member_id" : 99473492, "chapters" : [ { "urlname" : "London-MongoDB-User-Group", "name" : "London MongoDB User Group", "id" : 1775736 …
  • 34. MongoDB Query Examples Find all contacts with at least one work phone SQL CLI select * from contact A, phones B where A.did = B.did and B.type = 'work'; MongoDB CLI db.contact.find({"phones.type":"work"}); SQL in Java String s = “select * from contact A, phones B where A.did = B.did and B.type = 'work'”; ResultSet rs = execute(s); MongoDB via Java driver Cursor c = contact.find(eq( “phones.type”, “work” ));
  • 35. MongoDB Query Examples Find all contacts with at least one work phone or hired after 2014-02-02 SQL select A.did, A.lname, A.hiredate, B.type, B.number from contact A left outer join phones B on (B.did = A.did) where B.type = 'work' or A.hiredate > '2014-02-02'::date Java db.contacts.find( or( eq( “phones.type”, “work” ), gt( “hiredate”, Date( 2014, 2, 2 )))) CLI db.contacts.find( { $or : [ { “phones.type” : “work” }, { “hiredate” : { $gt : new Date("2014-02-02T00:00:00.000Z") } } ] })
  • 36. …and before you ask… Yes, MongoDB query expressions support 1. Sorting 2. Cursor size limit 3. Projection (asking for only parts of the rich shape to be returned) 4. Aggregation (“GROUP BY”) functions
  • 37. Maybe even MORE powerful than SQL…? > db.results.values.aggregate([ {$match: { runnum:23, timeSeriesPath: "CDSSpread.12M//1909468128” } ,{$project: { timeSeriesPath: "$timeSeriesPath", values: foml }} ,{$unwind: {path: "$values", idx: "v_idx"}} ,{$match: {values: {$gt: 60}, {$or: [ {idx: 0}, {idx: {$size: . . .} ,{$group: {_id: {a: "$timeSeriesPath", b: term: "$idx"}, n: {$sum:1}, max: {$max: "$values"}, min: {$min: "$values"}}, sdev: {$stdDevPop: "$values"}} ,{$lookup: { from: ”deskLimits", localField: ”instID", foreignField: ”instID", as: ”inst"}} ,{$match: {maxDeskLimit: {$gt: {$cond: [ {$gt: [2, $max]}, 2, $max]}}}} ,{$group: {_id: "$deskID", total: {$sum: “$max”}}} ]);
  • 38. What is an Aggregation Pipeline Match Project Join Graph Sort View
  • 42. Aggregation Pipeline (diagram: documents flowing through $match, $project, $lookup)
  • 43. Aggregation Pipeline (diagram: documents flowing through $match, $project, $lookup)
  • 44. Aggregation Pipeline (diagram: documents flowing through $match, $project, $lookup, $group)
  • 45. Aggregation Pipeline Stages • $match Filter documents • $geoNear Geospherical query • $project Reshape documents • $lookup Left-outer equi joins • $unwind Expand documents • $group Summarize documents • $sample Randomly selects a subset of documents • $sort Order documents • $skip Jump over a number of documents • $limit Limit number of documents • $redact Restrict documents • $out Sends results to a new collection
  • 46. The Fundamental Change with MongoDB RDBMS designed in era when: • CPU and disk were slow & expensive • Memory was VERY expensive • Network? What network? • Languages had limited means to dynamically reflect on their types • Languages had poor support for richly structured types Thus, the database had to • Act as combiner-coordinator of simpler types • Define a rigid schema • (Together with the code) optimize at compile-time, not run-time In MongoDB, the data is the schema!
  • 47. MongoDB and the Rich Map Ecosystem Generic comparison of two records Map expr = new HashMap(); expr.put("myKey", "K1"); DBObject a = collection.findOne(expr); expr.put("myKey", "K2"); DBObject b = collection.findOne(expr); List<MapDiff.Difference> d = MapDiff.diff((Map)a, (Map)b); Getting default values for a thing on a certain date and then overlaying user preferences (like for a calculation run) Map expr = new HashMap(); expr.put("myKey", "DEFAULT"); expr.put("createDate", new Date(2013, 11, 1)); DBObject a = collection.findOne(expr); expr.clear(); expr.put("myKey", "user1"); DBObject b = otherCollectionPerhaps.findOne(expr); MapStack s = new MapStack(); s.push((Map)a); s.push((Map)b); Map merged = s.project(); Runtime reflection of Maps and Lists enables generic powerful utilities (MapDiff, MapStack) to be created once and used for all kinds of shapes, saving time and money
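Slide 47 names `MapDiff` and `MapStack` without showing them, so the following is only a guess at the kind of generic utility it has in mind, written against plain `java.util` maps. The class `MapUtils` and its methods `overlay` and `differingKeys` are hypothetical and are not the presenter's library; the sample keys (`currency`, `limits`) are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: shape-agnostic helpers that work on any nested Map.
final class MapUtils {

    // Overlays `top` on `base`: values in `top` win, nested maps are merged
    // key by key, and neither input is modified.
    @SuppressWarnings("unchecked")
    static Map<String, Object> overlay(Map<String, Object> base, Map<String, Object> top) {
        Map<String, Object> merged = new LinkedHashMap<>(base);
        for (Map.Entry<String, Object> e : top.entrySet()) {
            Object below = merged.get(e.getKey());
            if (below instanceof Map && e.getValue() instanceof Map) {
                merged.put(e.getKey(), overlay(
                    (Map<String, Object>) below, (Map<String, Object>) e.getValue()));
            } else {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    // Top-level keys whose values differ between the two maps. A key that is
    // absent on one side and null on the other is treated as equal.
    static Set<String> differingKeys(Map<String, Object> a, Map<String, Object> b) {
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        keys.removeIf(k -> Objects.equals(a.get(k), b.get(k)));
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Object> limits = new LinkedHashMap<>();
        limits.put("max", 10);
        limits.put("min", 1);
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("currency", "USD");
        defaults.put("limits", limits);

        Map<String, Object> userLimits = new LinkedHashMap<>();
        userLimits.put("max", 50);
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("limits", userLimits);

        Map<String, Object> merged = overlay(defaults, user);
        System.out.println(merged);
        System.out.println(differingKeys(defaults, merged));
    }
}
```

Because both helpers inspect the maps at run time, the same code serves contacts, preferences, or any other shape, which is the reuse argument the slide is making.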
  • 48. Lastly: A CLI with teeth > db.contact.find({"SeqNum": {"$gt":10000}}).explain(); { "cursor" : "BasicCursor", "n" : 200000, //... "millis" : 223 } Try a query and show the diagnostics > for(v=[],i=0;i<3;i++) { … n = i*50000; … expr = {"SeqNum": {"$gt": n}}; … v.push( [n, db.contact.find(expr).explain().millis] ); } Run it 3 times with smaller and smaller chunks and create a vector of timing result pairs (size,time) > v [ [ 0, 225 ], [ 50000, 222 ], [ 100000, 220 ] ] Let’s see that vector > load("jStat.js") > jStat.stdev(v.map(function(p){return p[1];})) 2.0548046676563256 Use any other JavaScript you want inside the shell > for(i=0;i<3;i++) { … expr = {"SeqNum": {"$gt":i*1000}}; … db.foo.insert(db.contact.find(expr).explain()); } Party trick: save the explain() output back into a collection!
  • 49. And There is More – Compass and Atlas Compass Atlas
  • 50. What Does This Add Up To? Relational NoSQL Expressive Query Language Strong Consistency Secondary Indexes Flexibility Scalability Performance

Editor's Notes

  • #8: Relational really helped programmer efficiency by abstracting software away from hardware.
  • #9: Bad craziness with object-oriented databases in the 90s and XML databases in the early 00s. The fight to abstract away from the nuts and bolts. Only one of these databases is not being sold anymore. But I bet someone out there is still using dBASE. My first paying job was data entry into a dBASE database.
  • #10: …but for the past 40 years, innovation at the business & application layers has outpaced innovation at the database layer
  • #11: Denormalisation.
  • #13: You can solve any problem in computing with a Hashmap or a combination of Hashmaps. Remember all problems in computing can be solved by adding another layer of indirection.
  • #14: We don’t need a prepared statement for the query
  • #23: It was still pretty easy to add this data to the structure Want to guess what the SQL persistence code looks like? How about the MongoDB persistence code?
  • #28: This SCENARIO works ONLY for the whole thing or ONE DID. It does not work for other access pat
  • #30: CONSEQUENCE: Logic split across SQL query and disassembly. FURTHER complicated with prepared statements & parameterization because it splits up the Big Query
  • #36: db.contacts.find( or( eq( “phones.type”, “work” ), gt( “hiredate”, Date( 2014, 2, 2 ))))
  • #39: A Series of Document Transformations Executed in stages Original input is a collection Output as a cursor or a collection Rich Library of Functions Filter, compute, group, and summarize data Output of one stage sent to input of next Operations executed in sequential order
  • #40: Projection should create a new key rather than removing some
  • #41: Projection should create a new key rather than removing some
  • #42: Projection should create a new key rather than removing some
  • #43: Projection should create a new key rather than removing some
  • #44: Projection should create a new key rather than removing some
  • #45: Projection should create a new key rather than removing some
  • #51: MongoDB was built to address the way the world has changed while preserving the core database capabilities required to build functional apps MongoDB is the only database that harnesses the innovations of NoSQL and maintains the foundation of relational databases