Graph Databases and Neo4j
Tobias Ivarsson, Hacker @ Neo Technology
twitter: @thobe / #neo4j
email: tobias@neotechnology.com
web: http://www.neo4j.org/
web: http://www.thobe.org/
NOSQL - Why now?
    Four trends
Trend 1: Data size
Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that.

ExaBytes (10¹⁸) of data stored per year:
   2006: 161
   2007: 253
   2008: 397
   2009: 623
   2010: 988
Data source: IDC 2007
Trend 2: Connectedness
Over time data has evolved to be more and more interlinked and connected. Hypertext has links, blogs have pingback, tagging groups all related data.

[Figure: information connectivity (y-axis) over time (x-axis, 1990 to 2020). In order of increasing connectivity: Text documents, Hypertext, RSS, Blogs, Wikis, User-generated content, Tagging, Folksonomies, RDF, Ontologies, Giant Global Graph (GGG). Eras along the time axis: web 1.0, web 2.0, “web 3.0”.]
Trend 3: Semi-structure
๏ Individualization of content
   • In the salary lists of the 1970s, all elements had exactly one job
   • In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
๏ All encompassing “entire world views”
   • Store more data about each entity
๏ Trend accelerated by the decentralization of content generation
     that is the hallmark of the age of participation (“web 2.0”)
Trend 4: Architecture
๏ 1980s: Mainframe applications
   [Diagram: one Application on top of one DB]
๏ 1990s: Database as integration hub
   [Diagram: three Applications sharing one DB]
๏ 2000s: (moving towards) Decoupled services with their own backend
   [Diagram: three Applications, each with its own DB]
Why NOSQL Now?

๏ Trend 1: Size
๏ Trend 2: Connectedness
๏ Trend 3: Semi-structure
๏ Trend 4: Architecture
RDBMS performance
We are building applications today that have complexity requirements that a Relational Database cannot handle with sufficient performance.

[Figure: performance (y-axis) against data complexity (x-axis). Legend: Relational database; Requirement of application. Labels in the figure: Salary List, Majority of Webapps, Social network, Semantic Trading, custom.]
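Scaling to size vs. Scaling to complexity
[Figure: size (y-axis) against complexity (x-axis). From largest size and lowest complexity to smallest size and highest complexity: Key/Value stores, Bigtable clones, Document databases, Graph databases (billions of nodes and relationships). A marker reads “> 90% of use cases”.]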
Graph Databases focus on the structure of data
Graph databases focus on the structure of the data, scaling to the complexity of the data and of the application.
What is Neo4j?
๏ Neo4j is a Graph Database
   • Non-relational (“#nosql”), transactional (ACID), embedded
   • Data is stored as a Graph / Network
      ‣ Nodes and relationships with properties
      ‣ “Property Graph” or “edge-labeled multidigraph”
   • Schema free, bottom-up data model design
๏ Neo4j is Open Source / Free (as in speech) Software
   • AGPLv3
   • Commercial (“dual license”) license available
      ‣ First server is free (as in beer), next is inexpensive

Prices are available at http://neotechnology.com/
Contact us if you have questions and/or special license needs (e.g. if you want an evaluation license)
More about Neo4j
๏ Neo4j is stable
   • In 24/7 operation since 2003
๏ Neo4j is in active development
   • Neo Technology received VC funding October 2009
๏ Neo4j delivers high performance graph operations
   • traverses 1’000’000+ relationships / second on commodity hardware
The Neo4j Graph data model
• Nodes
• Relationships between Nodes
• Relationships have Labels
• Relationships are directed, but traversed at equal speed in both directions
• The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not)
• Nodes have key-value properties
• Relationships have key-value properties

[Figure: example graph]
   Node: name: “James”, age: 32, twitter: “@spam”
   Node: name: “Mary”, age: 35
   Node: brand: “Volvo”, model: “V70”
   Relationship labels: LOVES (two relationships), LIVES WITH, OWNS (item type: “car”), DRIVES
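
The bullets above can be made concrete with a small in-memory sketch. This is plain Python, not the Neo4j API: the class names, the `neighbours` helper and the relationship endpoints are assumptions for illustration (the slide shows the labels, but the extracted text does not say which nodes each relationship connects).

```python
class Node:
    def __init__(self, **properties):
        self.properties = dict(properties)  # key-value properties
        self.outgoing = []                  # relationships that start here
        self.incoming = []                  # relationships that end here


class Relationship:
    def __init__(self, start, label, end, properties=None):
        self.start, self.label, self.end = start, label, end
        self.properties = dict(properties or {})
        # Register on both endpoints, so following a relationship
        # "backwards" costs the same as following it "forwards".
        start.outgoing.append(self)
        end.incoming.append(self)


def neighbours(node, label, direction="both"):
    """Nodes connected to `node` by relationships carrying `label`."""
    found = []
    if direction in ("out", "both"):
        found += [r.end for r in node.outgoing if r.label == label]
    if direction in ("in", "both"):
        found += [r.start for r in node.incoming if r.label == label]
    return found


james = Node(name="James", age=32, twitter="@spam")
mary = Node(name="Mary", age=35)
car = Node(brand="Volvo", model="V70")

# Assumed endpoints: only the labels are recoverable from the slide.
Relationship(james, "LOVES", mary)
Relationship(mary, "LOVES", james)
Relationship(james, "LIVES WITH", mary)
Relationship(james, "OWNS", car, {"item type": "car"})
Relationship(mary, "DRIVES", car)

# LIVES WITH is stored once, directed, but the application reads it both ways.
lives_with_mary = [n.properties["name"] for n in neighbours(mary, "LIVES WITH")]
drivers = [n.properties["name"] for n in neighbours(car, "DRIVES", "in")]
```

Storing one directed LIVES WITH relationship and ignoring its direction when reading is how the application decides what the direction means.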
Graphs are all around us
         A      B       C     D                 ...
   1     17     3.14    3     17.79333333333
   2     42     10.11   14    30.33
   3     316    6.66    1     2104.56
   4     32     9.11    592   0.492432432432
   5                          2153.175765766
   ...

Even if this spreadsheet looks like it could be a fit for an RDBMS, it isn’t:
• RDBMSes have problems with extending indefinitely on both rows and columns
• Formulas and data dependencies would quickly lead to heavy join operations
Graphs are all around us
         A      B       C     D                 ...
   1     17     3.14    3     = A1 * B1 / C1
   2     42     10.11   14    = A2 * B2 / C2
   3     316    6.66    1     = A3 * B3 / C3
   4     32     9.11    592   = A4 * B4 / C4
   5                          = SUM(D1:D4)
   ...

With data dependencies the spreadsheet turns out to be a graph.
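
To illustrate the point, here is a minimal sketch that treats each cell as a node and each formula reference as a dependency edge. It is plain Python with hypothetical names (`cells`, `ratio`, `evaluate`). Cell D5 is taken to sum D1 through D4, which is the range that reproduces the total 2153.175765766 shown in the spreadsheet values.

```python
def ratio(a, b, c):
    """The row formula from the slide: A * B / C."""
    return a * b / c


# Each cell is either a constant or a pair (function, [cells it depends on]).
cells = {
    "A1": 17,  "B1": 3.14,  "C1": 3,   "D1": (ratio, ["A1", "B1", "C1"]),
    "A2": 42,  "B2": 10.11, "C2": 14,  "D2": (ratio, ["A2", "B2", "C2"]),
    "A3": 316, "B3": 6.66,  "C3": 1,   "D3": (ratio, ["A3", "B3", "C3"]),
    "A4": 32,  "B4": 9.11,  "C4": 592, "D4": (ratio, ["A4", "B4", "C4"]),
    "D5": (lambda *values: sum(values), ["D1", "D2", "D3", "D4"]),
}


def evaluate(name, cache=None):
    """Evaluate a cell by following its dependency edges depth-first."""
    cache = {} if cache is None else cache
    if name not in cache:
        cell = cells[name]
        if isinstance(cell, tuple):
            function, dependencies = cell
            cache[name] = function(*(evaluate(d, cache) for d in dependencies))
        else:
            cache[name] = cell
    return cache[name]


total = evaluate("D5")
```

Evaluating D5 is a traversal: it visits D1 to D4, and each of those visits its own A, B and C cells.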
Graphs are all around us
If we add external data sources the problem becomes even more interesting...
[Figure: the same spreadsheet values and formulas, shown without row and column headers]
Graphs are whiteboard friendly
An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly.

[Figure, shown in three steps: the model as drawn; the same model as an ER diagram with one-to-many (1 to *) cardinalities; the same model as a graph with nodes labeled thobe, Joe project blog, Wardrobe Strength, Hello Joe, Modularizing Jython, Neo4j performance analysis. Image credits: Tobias Ivarsson]
Query Languages
๏ Traversal APIs
   • Neo4j core traversers
   • Blueprint pipes
๏ SPARQL - “SQL for linked data” - query by graph pattern matching
   Find all persons that KNOW a friend that KNOWS someone named “Larry Ellison”:
   SELECT ?person WHERE {
       ?person neo4j:KNOWS ?friend .
       ?friend neo4j:KNOWS ?foe .
       ?foe neo4j:name "Larry Ellison" .
   }

๏ Gremlin - “perl for graphs” - query by traversal
   Give me the names of all the people I know that are older than 30:
   ./outE[@label='KNOWS']/inV[@age > 30]/@name
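
To show what “query by graph pattern matching” means, the following sketch evaluates the same three-part pattern as the SPARQL example over a tiny list of triples. It is plain Python rather than a SPARQL engine. The `match` helper and all data except the name “Larry Ellison” are made up for illustration.

```python
# (subject, predicate, object) triples. Hypothetical sample data.
triples = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "larry"),
    ("carol", "KNOWS", "dave"),
    ("larry", "name", "Larry Ellison"),
    ("dave", "name", "Dave"),
]


def match(subject=None, predicate=None, obj=None):
    """Yield the triples that fit the pattern; None plays the role of a variable."""
    for s, p, o in triples:
        if subject in (None, s) and predicate in (None, p) and obj in (None, o):
            yield s, p, o


# Work from the most selective part of the pattern outwards:
#   ?foe name "Larry Ellison" .  ?friend KNOWS ?foe .  ?person KNOWS ?friend .
persons = sorted({
    person
    for foe, _, _ in match(predicate="name", obj="Larry Ellison")
    for friend, _, _ in match(predicate="KNOWS", obj=foe)
    for person, _, _ in match(predicate="KNOWS", obj=friend)
})
```

Each nested loop binds one more variable of the pattern. A real engine chooses the join order itself; here it is fixed by hand.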
Data manipulation API
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

// RelTypes is assumed to be an application-defined enum
// implementing org.neo4j.graphdb.RelationshipType.
GraphDatabaseService graphDb = getGraphDbInstanceSomehow();
Transaction tx = graphDb.beginTx();
try {
   // Create Thomas 'Neo' Anderson
   Node mrAnderson = graphDb.createNode();
   mrAnderson.setProperty( "name", "Thomas Anderson" );
   mrAnderson.setProperty( "age", 29 );

   // Create Morpheus
   Node morpheus = graphDb.createNode();
   morpheus.setProperty( "name", "Morpheus" );
   morpheus.setProperty( "rank", "Captain" );
   morpheus.setProperty( "occupation", "Total badass" );

   // Create relationship representing they know each other
   mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
   // ... similarly for Trinity, Cypher, Agent Smith, Architect
   tx.success();   // mark the transaction as successful
} finally {
   tx.finish();    // commit if success() was called, otherwise roll back
}
Graph traversals
[Figure: example graph]
   Nodes:
      name: “Thomas Anderson”, age: 29
      name: “Morpheus”, rank: “Captain”, occupation: “Total badass”
      name: “Trinity”
      name: “Cypher”, last name: “Reagan”
      name: “Agent Smith”, version: “1.0b”, language: “C++”
      name: “The Architect”
   Relationship labels: KNOWS (five relationships), LOVES, CODED BY
   Relationship properties: disclosure: “public”; disclosure: “secret”; since: “meeting the oracle”; since: “a year before the movie”; cooperates on: “The Nebuchadnezzar”

import neo4j

class Friends(neo4j.Traversal): # Traversals ≈ queries in Neo4j
   types = [ neo4j.Outgoing.KNOWS ]
   order = neo4j.BREADTH_FIRST
   stop = neo4j.STOP_AT_END_OF_GRAPH
   returnable = neo4j.RETURN_ALL_BUT_START_NODE

for friend_node in Friends(mr_anderson):
   print("%s (@ depth=%s)" % ( friend_node["name"], friend_node.depth ))

Output:
   Morpheus (@ depth=1)
   Trinity (@ depth=1)
   Cypher (@ depth=2)
   Agent Smith (@ depth=3)
Finding a place to start
๏ Traversals need a Node to start from
    • QUESTION: How do I find the start Node?
    • ANSWER: You use an Index
๏ Indexes in Neo4j are different from Indexes in Relational Databases
    • RDBMSes use them for Joining
    • Neo4j uses them for simple lookup
IndexService index = getGraphDbIndexServiceSomehow();

Node mrAnderson = index.getSingleNode( "name",
                                        "Thomas Anderson" );

performTraversalFrom( mrAnderson );
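To make the lookup role of an index concrete, here is a minimal sketch in plain Python. It is not the Neo4j `IndexService`: the `ExactIndex` class, its method names, and the node ids are all hypothetical. The index only maps a property key and value to a node id, and the traversal then starts from that node.

```python
class ExactIndex:
    """Toy exact-match index: (key, value) -> list of node ids."""

    def __init__(self):
        self._entries = {}

    def index(self, node_id, key, value):
        self._entries.setdefault((key, value), []).append(node_id)

    def get_single_node(self, key, value):
        hits = self._entries.get((key, value), [])
        if len(hits) > 1:
            raise LookupError("more than one node for %s=%r" % (key, value))
        return hits[0] if hits else None


# Hypothetical node store: node id -> properties.
nodes = {
    1: {"name": "Thomas Anderson", "age": 29},
    2: {"name": "Morpheus", "rank": "Captain"},
}

index = ExactIndex()
for node_id, props in nodes.items():
    index.index(node_id, "name", props["name"])

# The index is used once, to find the starting point of a traversal.
start = index.get_single_node("name", "Thomas Anderson")
print(start, nodes[start])
```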
Indexes in Neo4j
๏ The Graph *is* the main index
   • Use relationship labels for navigation
   • Build index structures *in the graph*
     ‣Search trees, tag clouds, geospatial indexes, etc.
     ‣Linked/skip lists or other data structures in the graph
     ‣We have utility libraries for this
๏ External indexes used *for lookup*
   • Finding one or more points to start traversals from
   • Major difference from RDBMSes, which use indexes for everything
A domain object implemented in Neo4j
public interface Person {
   String getName();
   void setName( String firstName, String lastName );
}

public final class PersonImpl implements Person {
   private final Node underlyingNode;
   public PersonImpl( Node underlyingNode ) {
       this.underlyingNode = underlyingNode;
   }
   public String getName() {
       return String.format("%s %s",
          underlyingNode.getProperty("first name"),
          underlyingNode.getProperty("last name") );
   }
   public void setName(String firstName, String lastName) {
       underlyingNode.setProperty("first name", firstName);
       underlyingNode.setProperty("last name", lastName);
   }
}
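The same wrapper pattern can be sketched in Python. This is an illustration only: the `Person` class is hypothetical and a plain dictionary stands in for the underlying Node. The point it shows is that the domain object holds no state of its own, so every read and write goes straight to the node.

```python
class Person:
    """Domain object that delegates all state to the node it wraps."""

    def __init__(self, underlying_node):
        # Any mapping of property name -> value works as a stand-in node.
        self._node = underlying_node

    @property
    def name(self):
        return "%s %s" % (self._node["first name"], self._node["last name"])

    def set_name(self, first_name, last_name):
        self._node["first name"] = first_name
        self._node["last name"] = last_name


node = {"first name": "Thomas", "last name": "Anderson"}
person = Person(node)
person.set_name("Neo", "Anderson")
print(person.name)
```

Because the wrapper caches nothing, two wrappers around the same node always agree on its properties.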
Neo4j as Software Transactional Memory
๏ Implement objects as wrappers around Nodes and Relationships
   • Neo4j is fast enough to allow you to read all state from the
      Node/Relationship
๏ Mutating operations require transactions
   • The changes are isolated from all other threads until committed
   • Multiple mutations can be committed atomically
๏ Nested transactions are flattened
   • Makes it possible to have methods open their own transaction
๏ Fits nicely with the OO paradigm
   • More focus on data than on objects (compare with Object DBs)
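The idea of flattened nested transactions can be sketched in a few lines of Python. This is a single-threaded toy, not how Neo4j implements transactions: the `Store` and `Transaction` classes are hypothetical, and only the `success()` / `finish()` naming follows the Java example earlier in the deck. Inner transactions join the outer one, and nothing becomes visible until the outermost `finish()`.

```python
class Store:
    """Toy key-value store with flattened nested transactions."""

    def __init__(self):
        self.committed = {}    # state visible to everyone
        self._pending = None   # uncommitted changes of the open transaction
        self._depth = 0        # number of begin() calls currently open
        self._failed = False

    def begin(self):
        if self._depth == 0:   # the outermost begin starts the real transaction
            self._pending, self._failed = {}, False
        self._depth += 1
        return Transaction(self)

    def set(self, key, value):
        if self._depth == 0:
            raise RuntimeError("mutating operations require a transaction")
        self._pending[key] = value


class Transaction:
    def __init__(self, store):
        self._store = store
        self._ok = False

    def success(self):
        self._ok = True

    def finish(self):
        store = self._store
        store._depth -= 1
        if not self._ok:
            store._failed = True       # a failure at any level rolls back everything
        if store._depth == 0:          # only the outermost finish commits
            if not store._failed:
                store.committed.update(store._pending)
            store._pending = None


def set_name(store, first, last):
    tx = store.begin()   # opens its own transaction, joining an outer one if present
    try:
        store.set("first name", first)
        store.set("last name", last)
        tx.success()
    finally:
        tx.finish()


store = Store()
outer = store.begin()
try:
    set_name(store, "Thomas", "Anderson")
    store.set("age", 29)
    visible_before_commit = dict(store.committed)  # still empty here
    outer.success()
finally:
    outer.finish()
print(store.committed)
```

A method such as `set_name` can therefore be called on its own or from inside a larger transaction without changing its code.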
Why not use an O/R mapper?
๏ Model evolution in ORMs is a hard problem
   • virtually unsupported in most ORM systems
๏ SQL is “compatible” across many RDBMSs
   • data is still locked in
๏ Each ORM maps object models differently
   • Moving to another ORM == legacy schema support
      ‣except your legacy schema is a strange auto-generated one
๏ Object/Graph Mapping is always done the same way
   • allows you to keep your data through application changes
   • or share data between multiple implementations
What an ORM doesn’t do

๏ Deep traversals
๏ Graph algorithms
๏ Shortest path(s)
๏ Routing
๏ etc.
Path exists in social network
๏ Each person has on average 50 friends

The performance impact in Neo4j depends only on the degree of each node. In
an RDBMS it depends on the number of entries in the tables involved in the
join(s).

[Figure: social network graph with the persons Tobias, Emil, Johan and Peter]

  Database                  # persons   query time
  Relational database           1 000   2 000 ms
  Neo4j Graph Database          1 000   2 ms
  Neo4j Graph Database      1 000 000   2 ms
  Relational database       1 000 000   way too long...
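A "path exists" query is essentially a bounded breadth-first search. The following is a minimal sketch in plain Python, not Neo4j code: the `path_exists` function, the `max_depth` parameter and the friendship edges are all assumptions made for illustration. The search only touches the friends of nodes it actually visits, so the work is governed by node degree and search depth rather than by the total number of persons stored.

```python
from collections import deque


def path_exists(friends, start, goal, max_depth=4):
    """Breadth-first search that expands nodes up to max_depth hops from start."""
    if start == goal:
        return True
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(person, ()):
            if friend == goal:
                return True
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return False


# Hypothetical friendships; the slide's figure does not list its edges.
friends = {
    "Tobias": ["Emil", "Johan"],
    "Emil": ["Tobias", "Peter"],
    "Johan": ["Tobias"],
    "Peter": ["Emil"],
    "Stranger": [],
}

print(path_exists(friends, "Tobias", "Peter"))
print(path_exists(friends, "Tobias", "Stranger"))
```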
On-line real time routing with Neo4j
๏ 20 million Nodes - represent places
๏ 62 million Edges - represent direct roads between places
   • These edges have a length property, for the length of the road
๏ Average optimal route, 100 separate roads, found in 100ms
๏ Worst case route we could find:
   • Optimal route is 5500 separate roads
   • Total length ~770km
   • Found in less than 3 seconds
๏ Uses A* “best first” search

Note: There’s a difference between least number of hops and least cost.
Routing with Neo4j - using Neo4j Graph-Algos
# The cost evaluator - for choosing the best next node
class GeoCostEvaluator
    include EstimateEvaluator
    def getCost(node, goal)
        straight_path_distance(
           node.getProperty("lat"), node.getProperty("lon"),
           goal.getProperty("lat"), goal.getProperty("lon") )
    end
end

# Instantiate the A* search function
path_finder = AStar.new( Neo4j::instance,
   RelationshipExpander.forTypes(
       DynamicRelationshipType.withName("road"),
          Direction::BOTH ),
   DoubleEvaluator.new("length"), GeoCostEvaluator.new )

# Find the best path between New York City and San Francisco
best_path = path_finder.findSinglePath( NYC, SF )
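
For readers who want to see what the A* call does, here is a self-contained sketch in plain Python. It is not the Neo4j Graph-Algos implementation: the function names, the place names `A`, `B`, `D`, their coordinates and the road lengths are all hypothetical. The straight-line (haversine) distance plays the role of the estimate evaluator, and the road length plays the role of the `length` cost property.

```python
import heapq
from math import asin, cos, radians, sin, sqrt


def straight_line_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) pairs, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def a_star(coords, roads, start, goal):
    """Return (total_length, path) for the cheapest route, or None if unreachable."""
    # Queue entries: (cost so far + estimate to goal, cost so far, node, path).
    queue = [(straight_line_km(coords[start], coords[goal]), 0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        _, cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale entry, a cheaper way to this node was found later
        for neighbour, length in roads.get(node, []):
            new_cost = cost + length
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                estimate = straight_line_km(coords[neighbour], coords[goal])
                heapq.heappush(
                    queue, (new_cost + estimate, new_cost, neighbour, path + [neighbour])
                )
    return None


# Hypothetical places on the equator, one degree of longitude apart.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "D": (0.0, 2.0)}
# Road lengths in km; every road is at least as long as the straight line.
roads = {
    "A": [("B", 120), ("D", 400)],
    "B": [("A", 120), ("D", 120)],
    "D": [("A", 400), ("B", 120)],
}

print(a_star(coords, roads, "A", "D"))
```

In this toy data the direct road from A to D is a single hop but 400 km long, while the route through B takes two hops and 240 km, so the search returns the two-hop route: least cost, not least number of hops.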
Newest addition: Neo4j lets you REST
๏ Hello Neo4j REST server - Neo4j no longer needs to be embedded
๏ Opens up Neo4j to your favorite platform (even if that isn’t Java)
   • PHP, .NET, etc. - libraries already exist!
   • http://wiki.neo4j.org/content/Getting_Started_REST
๏ Uses JSON for state transfer + browsable HTML for introspection
๏ Atomic modification operations
๏ Brand new declarative traversal framework
   • Extensible using your favorite scripting language
      ‣JavaScript is included. Jython, JRuby, etc. supported
Other cool Graph Databases
๏ Sones GraphDB
   • Graph Query Language - a SQL-like query language for graphs
๏ Franz Inc. AllegroGraph
๏ HypergraphDB
๏ InfoGrid
๏ Twitter’s FlockDB
   • Optimized for the Twitter use case - one level relationships
๏ Interestingly we all have different approaches
One database to rule them all
Up until recently there was only one Database, the RDBMS.
The days of a single database that rules them all are over.

Image credits: The Lord of the Rings, New Line Cinema
Use best suited storage for each kind of data
The era of using RDBMSes for all problems is over. Instead we should use
the database most suited for the problem at hand.

Image credits: Unknown :’(
Polyglot persistence
... we could even use multiple databases in conjunction, and let each
database handle the things it does best.

[Figure: database icons; surviving label: Document {...}]
Polyglot persistence
SQL && NOSQL

All databases are welcome!
SQL and NOSQL - it is Not Only SQL!

[Figure: database icons; surviving label: Document {...}]
Finding out more
๏ http://neo4j.org/ - project website
      ‣http://api.neo4j.org/ and http://components.neo4j.org/
      ‣http://wiki.neo4j.org/ - HowTos, Tutorials, Examples, FAQ, etc.
      ‣http://planet.neo4j.org/ - aggregation of blogs about Neo4j
๏ http://neotechnology.com/ - commercial licensing
๏ http://twitter.com/neo4j/team - follow the Neo4j team
๏ http://nosql.mypopescu.com/ - good source for news on NOSQL,
     monitors Neo4j and other NOSQL solutions
๏ http://highscalability.com/ - has published a few articles about Neo4j
Buzzword summary                                                      http://neo4j.org/
Semi structured, SPARQL, AGPLv3, ACID transactions, Open Source,
Object mapping, Gremlin, Shortest path, In-Graph indexes, NOSQL,
A* routing, whiteboard friendly, RESTful, Traversal, Query language,
Embedded, Beer, Schema free, Software Transactional Memory,
Right tool for the right job, Scaling to complexity, Free Software,
Polyglot persistence
http://neotechnology.com


NOSQLEU - Graph Databases and Neo4j

  • 1. Graph Databases and Neo4j twitter: @thobe / #neo4j Tobias Ivarsson email: [email protected] web: http://www.neo4j.org/ Hacker @ Neo Technology web: http://www.thobe.org/
  • 2. NOSQL - Why now? Four trends 2
  • 3. Trend 1: Data size ExaBytes (10¹⁸) of data stored per year 988 1000 Each year more and more digital data is created. Over t wo 750 years we create more digital data than all 623 the data created in history before that. 500 397 253 250 161 0 2006 2007 2008 2009 2010 Data source: IDC 2007 3
  • 4. Trend 2: Connectedness Giant Global Graph (GGG) Over time data has evolved to Ontologies be more and more interlinked and connected. RDF Hypertext has links, Blogs have pingback, Tagging groups all related data Folksonomies Information connectivity Tagging Wikis User-generated content Blogs RSS Hypertext Text documents web 1.0 web 2.0 “web 3.0” 1990 2000 2010 2020 4
  • 5. Trend 3: Semi-structure ๏ Individualization of content • In the salary lists of the 1970s, all elements had exactly one job • In Or 15? lists of the 2000s, we need 5 job columns! Or 8? the salary ๏ All encompassing “entire world views” • Store more data about each entity ๏ Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”) 5
  • 6. Trend 4: Architecture 1980s: Mainframe applications Application DB 6
  • 7. Trend 4: Architecture 1990s: Database as integration hub Application Application Application DB 7
  • 8. Trend 4: Architecture 2000s: (moving towards) Decoupled services with their own backend Application Application Application DB DB DB 8
  • 9. Why NOSQL Now? ๏Trend 1: Size ๏Trend 2: Connectedness ๏Trend 3: Semi-structure ๏Trend 4: Architecture 9
  • 10. RDBMS performance Salary List Relational database Requirement of application Performance Majority of Webapps Social network We are building } applications today that Semantic Trading have complexity requirements that a Relational Database cannot handle with sufficient performance custom Data complexity 10
  • 11. Scaling to size vs. Scaling to complexity Size Key/Value stores Bigtable clones Document databases Graph databases Billions of nodes and relationships > 90% of use cases Complexity 11
  • 12. Graph Databases focuses on structure of data Graph databases focus on the structure of the data, scaling to the complexity of the data and of the application. 12
  • 13. What is Neo4j? ๏ Neo4j is a Graph Database • Non-relational (“#nosql”), transactional (ACID), embedded • Data is stored as a Graph / Network ‣Nodes and relationships with properties ‣“Property Graph” or “edge-labeled multidigraph” • Schema free, bottom-up data model design ๏ Neo4j is Open Source / Free (as in speech) Software Prices are available at http://neotechnology.com/ • AGPLv3 Contact us if you have questions and/or special license needs (e.g. if you • Commercial (“dual license”) license available want an evaluation license) ‣First server is free (as in beer), next is inexpensive 13
  • 14. More about Neo4j ๏ Neo4j is stable • In 24/7 operation since 2003 ๏ Neo4j is in active development • Neo Technology received VC funding October 2009 ๏ Neo4j delivers high performance graph operations • traverses 1’000’000+ relationships / second on commodity hardware 14
  • 15. The Neo4j Graph data model •Nodes •Relationships bet ween Nodes •Relationships have Labels •Relationships are directed, but traversed at equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 16. The Neo4j Graph data model •Nodes •Relationships bet ween Nodes •Relationships have Labels •Relationships are directed, but traversed at equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 17. The Neo4j Graph data model LIVES WITH LOVES OWNS DRIVES •Nodes •Relationships bet ween Nodes •Relationships have Labels •Relationships are directed, but traversed at equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 18. The Neo4j Graph data model LOVES LIVES WITH LOVES OWNS DRIVES •Nodes •Relationships bet ween Nodes •Relationships have Labels •Relationships are directed, but traversed at equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 19. The Neo4j Graph data model name: “Mary” LOVES name: “James” age: 35 age: 32 LIVES WITH twitter: “@spam” LOVES OWNS DRIVES •Nodes •Relationships bet ween Nodes •Relationships have Labels brand: “Volvo” •Relationships are directed, but traversed at model: “V70” equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 20. The Neo4j Graph data model name: “Mary” LOVES name: “James” age: 35 age: 32 LIVES WITH twitter: “@spam” LOVES OWNS item type: “car” DRIVES •Nodes •Relationships bet ween Nodes •Relationships have Labels brand: “Volvo” •Relationships are directed, but traversed at model: “V70” equal speed in both directions •The semantics of the direction is up to the application (LIVES WITH is reflexive, LOVES is not) •Nodes have key-value properties •Relationships have key-value properties 15
  • 21. Graphs are all around us A B C D ... 1 17 3.14 3 17.79333333333 2 42 10.11 14 30.33 3 316 6.66 1 2104.56 4 32 9.11 592 0.492432432432 5 Even if this spreadsheet looks like it could be a fit for a RDBMS 2153.175765766 it isn’t: •RDBMSes have problems with ... extending indefinitely on both rows and columns •Formulas and data dependencies would quickly lead to heavy join operations 16
  • 22. Graphs are all around us A B C D ... 1 17 3.14 3 = A1 * B1 / C1 2 42 10.11 14 = A2 * B2 / C2 3 316 6.66 1 = A3 * B3 / C3 4 32 9.11 592 = A4 * B4 / C4 5 = SUM(D2:D5) With data dependencies ... the spread sheet turns out to be a graph. 17
  • 23. Graphs are all around us A B C D ... 1 17 3.14 3 = A1 * B1 / C1 2 42 10.11 14 = A2 * B2 / C2 3 316 6.66 1 = A3 * B3 / C3 4 32 9.11 592 = A4 * B4 / C4 5 = SUM(D2:D5) With data dependencies ... the spread sheet turns out to be a graph. 17
  • 24. Graphs are all around us If we add external data sources the problem becomes even more interesting... 17 3.14 3 = A1 * B1 / C1 42 10.11 14 = A2 * B2 / C2 316 6.66 1 = A3 * B3 / C3 32 9.11 592 = A4 * B4 / C4 = SUM(D2:D5) 18
  • 25. Graphs are all around us If we add external data sources the problem becomes even more interesting... 17 3.14 3 = A1 * B1 / C1 42 10.11 14 = A2 * B2 / C2 316 6.66 1 = A3 * B3 / C3 32 9.11 592 = A4 * B4 / C4 = SUM(D2:D5) 18
  • 26. Graphs are whiteboard friendly An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly. Image credits: Tobias Ivarsson 19
  • 27. Graphs are whiteboard friendly An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly. * 1 * * 1 * 1 * 1 * Image credits: Tobias Ivarsson 19
  • 28. Graphs are whiteboard friendly An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly. thobe Joe project blog Wardrobe Strength Hello Joe Modularizing Jython Neo4j performance analysis Image credits: Tobias Ivarsson 19
  • 29. Query Languages ๏ Traversal APIs • Neo4j core traversers • Blueprint pipes ๏ SPARQL - “SQL for linked data” - query by graph pattern matching SELECT ?person WHERE { Find all persons that ?person neo4j:KNOWS ?friend . KNOWS a friend that ?friend neo4j:KNOWS ?foe . KNOWS someone named “Larry Ellison”. ?foe neo4j:name "Larry Ellison" . } ๏ Gremlin - “perl for graphs” - query by traversal ./outE[@label='KNOWS']/inV[@age > 30]/@name Give me the names of all the people I know that are older than 30. 20
  • 30. Data manipulation API GraphDatabaseService graphDb = getGraphDbInstanceSomehow(); // Create Thomas 'Neo' Anderson Node mrAnderson = graphDb.createNode(); mrAnderson.setProperty( "name", "Thomas Anderson" ); mrAnderson.setProperty( "age", 29 ); // Create Morpheus Node morpheus = graphDb.createNode(); morpheus.setProperty( "name", "Morpheus" ); morpheus.setProperty( "rank", "Captain" ); morpheus.setProperty( "occupation", "Total bad ass" ); // Create relationship representing they know each other mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS ); // ... similarly for Trinity, Cypher, Agent Smith, Architect 21
  • 31. Data manipulation API GraphDatabaseService graphDb = getGraphDbInstanceSomehow(); Transaction tx = graphDb.beginTx(); try { // Create Thomas 'Neo' Anderson Node mrAnderson = graphDb.createNode(); mrAnderson.setProperty( "name", "Thomas Anderson" ); mrAnderson.setProperty( "age", 29 ); // Create Morpheus Node morpheus = graphDb.createNode(); morpheus.setProperty( "name", "Morpheus" ); morpheus.setProperty( "rank", "Captain" ); morpheus.setProperty( "occupation", "Total bad ass" ); // Create relationship representing they know each other mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS ); // ... similarly for Trinity, Cypher, Agent Smith, Architect tx.success(); } finally { tx.finish(); } 21
  • 32. Graph traversals name: “The Architect” disclosure: “public” name: “Thomas Anderson” age: 29 name: “Cypher” last name: “Reagan” KNOWS name: “Morpheus” KNOWS KNOWS rank: “Captain” CODED BY LOVES occupation: “Total badass” KNOWS KNOWS name: “Trinity” disclosure: “secret” name: “Agent Smith” version: “1.0b” since: “meeting the oracle” since: “a year before the movie” language: “C++” cooperates on: “The Nebuchadnezzar” 22
  • 38. Graph traversals import neo4j class Friends(neo4j.Traversal): # Traversals ! queries in Neo4j types = [ neo4j.Outgoing.KNOWS ] order = neo4j.BREADTH_FIRST stop = neo4j.STOP_AT_END_OF_GRAPH returnable = neo4j.RETURN_ALL_BUT_START_NODE for friend_node in Friends(mr_anderson): print "%s (@ depth=%s)" % ( friend_node["name"], friend_node.depth ) Output: Morpheus (@ depth=1) Trinity (@ depth=1) Cypher (@ depth=2) Agent Smith (@ depth=3) 23
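To make the breadth-first order on the traversal slides concrete, here is a minimal sketch in plain Python that does not use the Neo4j bindings. The `KNOWS` dictionary and the `friends` function are hypothetical, and the exact set of relationships is an assumption chosen to be consistent with the output shown on the slide; the transcript does not preserve which nodes each relationship connects.

```python
from collections import deque

# Outgoing KNOWS relationships (assumed edge set, see the note above).
KNOWS = {
    "Thomas Anderson": ["Morpheus", "Trinity"],
    "Morpheus": ["Trinity", "Cypher"],
    "Trinity": [],
    "Cypher": ["Agent Smith"],
    "Agent Smith": [],
}


def friends(start):
    """Yield (name, depth) in breadth-first order, skipping the start node."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth > 0:
            yield name, depth
        for neighbour in KNOWS.get(name, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))


for name, depth in friends("Thomas Anderson"):
    print("%s (@ depth=%s)" % (name, depth))
```

The queue is what gives the breadth-first order: every node at depth 1 is reported before any node at depth 2. The visited set plays the role of stopping at the end of the graph, because nothing is expanded twice.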
  • 40. Finding a place to start ๏ Traversals need a Node to start from • QUESTION: How do I find the start Node? • ANSWER: You use an Index ๏ Indexes in Neo4j are different from Indexes in Relational Databases • RDBMSes use them for Joining • Neo4j uses them for simple lookup IndexService index = getGraphDbIndexServiceSomehow(); Node mrAnderson = index.getSingleNode( "name", "Thomas Anderson" ); performTraversalFrom( mrAnderson ); 24
  • 41. Indexes in Neo4j ๏ The Graph *is* the main index • Use relationship labels for navigation • Build index structures *in the graph* ‣ Search trees, tag clouds, geospatial indexes, etc. ‣ Linked/skip lists or other data structures in the graph ‣ We have utility libraries for this ๏ External indexes used *for lookup* • Finding one or more points to start traversals from • Major difference from RDBMSes, which use indexes for everything 25
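The division of labour described on these two slides (an index only finds the start node, the graph itself does the rest) can be sketched in plain Python. The `Node` and `SimpleIndex` classes below are hypothetical stand-ins written for this illustration, not the Neo4j `IndexService` API.

```python
class Node:
    """Hypothetical in-memory node: a property map plus outgoing relationships."""

    def __init__(self, **properties):
        self.properties = properties
        self.relationships = []  # list of (relationship_type, other_node)


class SimpleIndex:
    """Hypothetical exact-match lookup index: (key, value) -> nodes."""

    def __init__(self):
        self._entries = {}

    def add(self, node, key, value):
        self._entries.setdefault((key, value), []).append(node)

    def get_single_node(self, key, value):
        hits = self._entries.get((key, value), [])
        if len(hits) > 1:
            raise LookupError("more than one node for %s=%r" % (key, value))
        return hits[0] if hits else None


index = SimpleIndex()
mr_anderson = Node(name="Thomas Anderson", age=29)
morpheus = Node(name="Morpheus", rank="Captain")
mr_anderson.relationships.append(("KNOWS", morpheus))
for node in (mr_anderson, morpheus):
    index.add(node, "name", node.properties["name"])

# The index is consulted once, to find where to start ...
start = index.get_single_node("name", "Thomas Anderson")
# ... and from there navigation follows relationships, not index lookups.
known = [other.properties["name"]
         for rel_type, other in start.relationships if rel_type == "KNOWS"]
print(known)
```

Following a relationship here is a direct reference from one node to the next, which is the sense in which the graph is its own index.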
  • 42. A domain object implemented in Neo4j public interface Person { String getName(); void setName( String firstName, String lastName ); } public final class PersonImpl implements Person { private final Node underlyingNode; public PersonImpl( Node underlyingNode ) { this.underlyingNode = underlyingNode; } public String getName() { return String.format("%s %s", underlyingNode.getProperty("first name"), underlyingNode.getProperty("last name") ); } public void setName(String firstName, String lastName) { underlyingNode.setProperty("first name", firstName); underlyingNode.setProperty("last name", lastName); } } 26
  • 43. Neo4j as Software Transactional Memory ๏ Implement objects as wrappers around Nodes and Relationships • Neo4j is fast enough to allow you to read all state from the Node/Relationship ๏ Mutating operations require transactions • The changes are isolated from all other threads until committed • Multiple mutations can be committed atomically ๏ Nested transactions are flattened • Makes it possible to have methods open their own transaction ๏ Fits nicely with the OO paradigm • More focus on data than on objects (compared to Object DBs) 27
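The point about flattened nested transactions can be illustrated with a small sketch. It is a plain-Python model of the idea under stated assumptions, not Neo4j's transaction implementation: the `FlatTransactions` class is hypothetical, it is single-threaded, and it treats an exception in any nesting level as a reason to roll back the whole outer transaction.

```python
class FlatTransactions:
    """Hypothetical store where nested transactions join the outermost one."""

    def __init__(self):
        self.committed = {}   # state visible after commit
        self._pending = {}    # uncommitted changes of the current transaction
        self._depth = 0       # how many transaction blocks are open
        self._failed = False  # set when any nesting level raised

    def __enter__(self):
        self._depth += 1
        return self

    def set(self, key, value):
        if self._depth == 0:
            raise RuntimeError("mutating operations require a transaction")
        self._pending[key] = value

    def __exit__(self, exc_type, exc, tb):
        self._depth -= 1
        if exc_type is not None:
            self._failed = True
        if self._depth == 0:
            # Only the outermost block commits or rolls back.
            if not self._failed:
                self.committed.update(self._pending)
            self._pending.clear()
            self._failed = False
        return False  # never swallow exceptions


def set_last_name(store, last_name):
    # A method can open its own transaction without knowing about callers.
    with store:
        store.set("last name", last_name)


store = FlatTransactions()
with store:
    store.set("first name", "Thomas")
    set_last_name(store, "Anderson")
    inside = dict(store.committed)  # still empty: nothing committed yet
print(inside, store.committed)
```

Because the inner block only decrements a counter, both property changes become visible together when the outermost block ends.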
  • 44. Why not use an O/R mapper? ๏ Model evolution in ORMs is a hard problem • virtually unsupported in most ORM systems ๏ SQL is “compatible” across many RDBMSs • data is still locked in ๏ Each ORM maps object models differently • Moving to another ORM == legacy schema support ‣except your legacy schema is a strange auto-generated one ๏ Object/Graph Mapping is always done the same way • allows you to keep your data through application changes • or share data between multiple implementations 28
  • 45. What an ORM doesn’t do ๏ Deep traversals ๏ Graph algorithms ๏ Shortest path(s) ๏ Routing ๏ etc. 29
  • 46. Path exists in social network ๏ Each person has on average 50 friends The performance impact in Neo4j depends only on the degree of each node. In an RDBMS it depends on the number of entries in the tables involved in the join(s). [Figure: social graph with the nodes Tobias, Emil, Johan, Peter] Database | # persons | query time: Relational database | 1 000 | 2 000 ms; Neo4j Graph Database | 1 000 | 2 ms; Neo4j Graph Database | 1 000 000 | 2 ms; Relational database | 1 000 000 | way too long... 30
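A path-exists query of the kind measured above can be sketched as a depth-limited breadth-first search. This is plain Python for illustration, not the benchmark code behind the slide; the `path_exists` function, the `max_depth` parameter and the friendships between the four names from the figure are all assumptions.

```python
from collections import deque


def path_exists(friends, start, goal, max_depth=4):
    """Breadth-first search that only touches nodes within max_depth of start."""
    if start == goal:
        return True
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(person, ()):
            if friend == goal:
                return True
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, depth + 1))
    return False


# Assumed friendships, stored in both directions.
friends = {
    "Tobias": ["Emil"],
    "Emil": ["Tobias", "Johan"],
    "Johan": ["Emil", "Peter"],
    "Peter": ["Johan"],
    "Stranger": [],
}

print(path_exists(friends, "Tobias", "Peter"))
print(path_exists(friends, "Tobias", "Stranger"))
```

The search only ever looks at the friend lists of people it actually reaches, so its cost is driven by node degree and search depth, not by how many people are stored in total.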
  • 51. On-line real time routing with Neo4j ๏ 20 million Nodes - represent places ๏ 62 million Edges - represent direct roads between places • These edges have a length property, for the length of the road ๏ Average optimal route, 100 separate roads, found in 100ms ๏ Worst case route we could find: • Optimal route is 5500 separate roads • Total length ~770km • Found in less than 3 seconds ๏ Uses A* “best first” search (There’s a difference between least number of hops and least cost.) 31
  • 52. Routing with Neo4j - using Neo4j Graph-Algos # The cost evaluator - for choosing the best next node class GeoCostEvaluator include EstimateEvaluator def getCost(node, goal) straight_path_distance( node.getProperty("lat"), node.getProperty("lon"), goal.getProperty("lat"), goal.getProperty("lon") ) end end # Instantiate the A* search function path_finder = AStar.new( Neo4j::instance, RelationshipExpander.forTypes( DynamicRelationshipType.withName("road"), Direction::BOTH ), DoubleEvaluator.new("length"), GeoCostEvaluator.new ) # Find the best path between New York City and San Francisco best_path = path_finder.findSinglePath( NYC, SF ) 32
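For readers who want to see what the A* search does underneath, here is a minimal, self-contained sketch in plain Python. It is not the Neo4j Graph-Algos implementation used on the slide; the `a_star` function, the four-place road network and the coordinates are hypothetical. The straight-line estimate mirrors the role of the cost evaluator above, and the example is built so that the cheapest route is not the one with the fewest hops.

```python
import heapq
import math


def a_star(roads, coords, start, goal):
    """Return (total_length, path) for the cheapest route, or None.

    roads:  {place: [(neighbour, length), ...]}
    coords: {place: (x, y)}, used for the straight-line estimate.
    """
    def estimate(place):
        (x1, y1), (x2, y2) = coords[place], coords[goal]
        return math.hypot(x2 - x1, y2 - y1)

    best_cost = {start: 0.0}
    # Queue entries: (cost so far + estimate, cost so far, place, path).
    frontier = [(estimate(start), 0.0, start, [start])]
    while frontier:
        _, cost, place, path = heapq.heappop(frontier)
        if place == goal:
            return cost, path
        if cost > best_cost.get(place, math.inf):
            continue  # stale entry, a cheaper way here was found later
        for neighbour, length in roads.get(place, ()):
            new_cost = cost + length
            if new_cost < best_cost.get(neighbour, math.inf):
                best_cost[neighbour] = new_cost
                heapq.heappush(frontier, (new_cost + estimate(neighbour),
                                          new_cost, neighbour,
                                          path + [neighbour]))
    return None


# Hypothetical network: one long direct road versus three short ones.
coords = {"A": (0, 0), "B": (3, 0), "C": (6, 0), "D": (9, 0)}
roads = {
    "A": [("D", 10.0), ("B", 3.0)],
    "B": [("C", 3.0)],
    "C": [("D", 3.0)],
}

print(a_star(roads, coords, "A", "D"))
```

The direct road A to D is a single hop of length 10, while the route through B and C takes three hops with total length 9, so the search returns the three-hop route. The estimate never exceeds the real remaining road length in this example, which is the condition under which A* returns an optimal route.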
  • 53. Newest addition: Neo4j lets you REST ๏ Hello Neo4j REST server - Neo4j no longer needs to be embedded ๏ Opens up Neo4j to your favorite platform (even if that isn’t Java) • PHP, .NET, etc. - libraries already exist! • http://wiki.neo4j.org/content/Getting_Started_REST ๏ Uses JSON for state transfer + browsable HTML for introspection ๏ Atomic modification operations ๏ Brand new declarative traversal framework • Extensible using your favorite scripting language ‣ JavaScript is included. Jython, JRuby, etc. supported 33
  • 54. Other cool Graph Databases ๏ Sones GraphDB • Graph Query Language - a SQL-like query language for graphs ๏ Franz Inc. AllegroGraph ๏ HypergraphDB ๏ InfoGrid ๏ Twitter’s FlockDB • Optimized for the Twitter use case - one level relationships ๏ Interestingly we all have different approaches 34
  • 55. Up until recently there was only one Database, the RDBMS. The days of a single database that rules all are over. One database to rule them all Image credits: The Lord of the Rings, New Line Cinema 35
  • 56. Use best suited storage for each kind of data The era of using RDBMSes for all problems is over. Instead we should use the database most suited for the problem at hand. Image credits: Unknown :’( 36
  • 57. Polyglot persistence ... we could even use multiple databases in conjunction, and let each database handle the things it does best. Document {...} {...} {...} 37
  • 58. Polyglot persistence SQL && NOSQL All databases are welcome! SQL and NOSQL - it is Not Only SQL! Document {...} {...} {...} 38
  • 59. Finding out more ๏ http://neo4j.org/ - project website ‣ http://api.neo4j.org/ and http://components.neo4j.org/ ‣ http://wiki.neo4j.org/ - HowTos, Tutorials, Examples, FAQ, etc. ‣ http://planet.neo4j.org/ - aggregation of blogs about Neo4j ๏ http://neotechnology.com/ - commercial licensing ๏ http://twitter.com/neo4j/team - follow the Neo4j team ๏ http://nosql.mypopescu.com/ - monitors Neo4j and other NOSQL solutions, good source for news on NOSQL ๏ http://highscalability.com/ - has published a few articles about Neo4j 39
  • 60. Buzzword summary http://neo4j.org/ Semi structured SPARQL AGPLv3 ACID transactions Open Source Object mapping Gremlin Shortest path In-Graph indexes NOSQL A* routing whiteboard friendly RESTful Traversal Query language Embedded Beer Schema free Software Transactional Memory Right tool for the right job Scaling to complexity Free Software Polyglot persistence 40