A quick review of Python and Graph Databases
NIC CROUCH
@FPHHOTCHIPS
Who am I?
◦ Consultant at Deloitte Melbourne in Enterprise Information Management
◦ Recent graduate of Flinders University in Adelaide
◦ Casual/Enthusiast reviewer of Graph Databases
What is a graph?
“A set of objects connected by links” – Wikipedia
Objects: Vertices, nodes, points
Links: Edges, arcs, lines, relationships
Prior Work on Graphs in Python
Graph Database Patterns in Python – Elizabeth Ramirez, PyCon US 2015
Practical Graph/Network Analysis Made Simple – Eric Ma, PyCon US 2015
Graphs, Networks and Python: The Power of Interconnection – Lachlan Blackhall, PyCon AU 2014
An introduction to Python and graph databases with Neo4j – Holger Spill, PyCon NZ 2014
Mogwai: Graph Databases in your App – Cody Lee, PyTexas 2014
Today: Pythonic Graphs
An exploration of graph storage in Python:
◦ API must be Pythonic
◦ execute(“<Not Python>”) doesn’t count.
◦ As little configuration as possible
Caveats:
◦ No configuration means no tuning
◦ Can’t compare distributed performance on a single node
◦ Limited to rough comparisons of performance – not a lab environment!
The Simple
1) Set up a dictionary of nodes
2) Each node keeps a list of relationships (or two, if you want a directed graph)
3) Set up add and get convenience methods
Pros:
• Sometimes the simplest ways are the best
• Very quick
Cons:
• No consistency guarantees
• Will probably need ongoing maintenance
• Not persistent
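A minimal sketch of the three steps above, assuming string node IDs and keyword-argument properties. The class and method names (`DictGraph`, `add_node`, `add_edge`, `get_node`, `neighbours`) are hypothetical and not taken from any library. Each node keeps two lists, outgoing and incoming, which is the "two lists for a directed graph" variant.

```python
from collections import defaultdict


class DictGraph:
    """Hypothetical in-memory directed graph: a dict of nodes plus adjacency lists."""

    def __init__(self):
        self.nodes = {}                    # node id -> property dict
        self.outgoing = defaultdict(list)  # node id -> [(rel_type, target id)]
        self.incoming = defaultdict(list)  # node id -> [(rel_type, source id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, rel_type, target):
        # Both endpoints must already exist; nothing else is validated.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added first")
        self.outgoing[source].append((rel_type, target))
        self.incoming[target].append((rel_type, source))

    def get_node(self, node_id):
        return self.nodes[node_id]

    def neighbours(self, node_id, rel_type=None):
        """Targets of outgoing edges, optionally filtered by relationship type."""
        return [t for r, t in self.outgoing[node_id]
                if rel_type is None or r == rel_type]


g = DictGraph()
g.add_node("alice", age=30)
g.add_node("bob", age=25)
g.add_edge("alice", "KNOWS", "bob")
print(g.neighbours("alice"))  # ['bob']
print(g.incoming["bob"])      # [('KNOWS', 'alice')]
```

Nothing here is transactional or written to disk, which is where the "not persistent" and consistency drawbacks come from.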
The (slightly less) Simple
1) Set up a Shelf (Python's shelve module) of nodes
2) Each node keeps a list of relationships (or two, if you want a directed graph)
3) Set up add and get convenience methods
Pros:
• Still reasonably quick
Cons:
• No consistency guarantees
• Will probably need ongoing maintenance
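The same idea persisted with the standard-library `shelve` module, as a minimal sketch. `ShelfGraph` and its methods are hypothetical names; shelf keys must be strings. The main trap with a Shelf is that, without `writeback=True`, mutating a value you fetched from it is not saved, so each record is read, modified and written back explicitly.

```python
import os
import shelve
import tempfile


class ShelfGraph:
    """Hypothetical directed graph stored in a shelve.Shelf (keys must be str)."""

    def __init__(self, path):
        self.db = shelve.open(path)  # creates the file if it does not exist

    def add_node(self, node_id, **props):
        self.db[node_id] = {"props": props, "out": [], "in": []}

    def add_edge(self, source, rel_type, target):
        # Fetch copies of both records; reuse one object for self-loops so
        # the second write does not clobber the first.
        src = self.db[source]
        dst = src if target == source else self.db[target]
        src["out"].append((rel_type, target))
        dst["in"].append((rel_type, source))
        # Write both records back; the in-place edits alone are not persisted.
        self.db[source] = src
        self.db[target] = dst

    def get_node(self, node_id):
        return self.db[node_id]["props"]

    def neighbours(self, node_id):
        return [t for _, t in self.db[node_id]["out"]]

    def close(self):
        self.db.close()


path = os.path.join(tempfile.mkdtemp(), "graph")
g = ShelfGraph(path)
g.add_node("alice", age=30)
g.add_node("bob", age=25)
g.add_edge("alice", "KNOWS", "bob")
g.close()

g = ShelfGraph(path)          # reopen: the data survived the close
print(g.neighbours("alice"))  # ['bob']
g.close()
```

This is persistent but still not consistent: the two writes in `add_edge` are not atomic, and the shelve documentation notes that it does not support concurrent read/write access.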
Off-topic: NetworkX
All the advantages of using a dictionary with none of the custom code.
◦ Comes with graph generators
◦ BSD Licenced
◦ Loads of standard analysis algorithms
◦ 90% test coverage
◦ … no persistence (except Pickle).
The Popularity Test
DBMS        Score (Jul 2015)
Neo4j       31.34
OrientDB     4.46
Titan        3.89
ArangoDB     1.29
Giraph       1.03
The Incumbent: Neo4j
Released in 2007
Written in Java
GPLv3/AGPLv3 or a commercial license
Runs as a server that exposes a REST interface
Natively uses Cypher – an in-house developed graph query language
Best established and most popular graph database
Easy to install – unzip and run a script
High Availability, but a little difficult to scale
Neo4j from Python
Py2Neo:
◦ Built by Nigel Small from Neo4j
◦ Actively maintained
neo4j-rest-client:
◦ Javier de la Rosa from the University of Western Ontario
◦ Last maintained 9 months ago
neo4jdb-python:
◦ Jacob Hansson of Neo4j
◦ Last maintained 8 months ago
◦ Mostly just wrappers around Cypher
Bulbflow:
◦ Built by James Thornton of Pipem/Espeed
◦ Last maintained 8 months ago
◦ Connects to multiple backends
Py2Neo: Syntax
Set up a connection:
◦ graph = Graph("http://neo4j:password@localhost:7474/db/data/")
Create a node:
◦ graph.create(Node("node_label", name=node_name))
◦ Node labels are like classes
Find a node:
◦ graph.find_one("node_label", property_key="name", property_value=node_name)
Create a relationship:
◦ graph.create(Relationship(node1, relationship, node2))
Find a relationship:
◦ graph.match_one(node1, relationship, node2, bidirectional=False)
Py2Neo: Good and Bad
The good:
◦ Simple API
◦ Well documented
◦ Easy to connect and get started
◦ Cool (if preliminary) spatial support
Not so much:
◦ Skinny API
◦ No transaction support for Pythonic calls
◦ Performance struggles on large inputs
◦ No ORM (kinda)
neo4j-rest-client Syntax
Set up a connection:
◦ graph = GraphDatabase("http://localhost:7474/db/data/", username="username", password="password")
Create a node:
◦ node = graph.nodes.create(name=node_name)
◦ Node labels are like classes
Find a node:
◦ graph.nodes.filter(Q("name", iexact=node_name)).elements[0]
Create a relationship:
◦ relationship = node1.is_related_to(node2)
neo4j-rest-client: Good and Bad
Transaction support with a context manager*
Strong filtering syntax
Very strong labelling syntax – searchable tags for nodes
Lazy evaluation of queries
Still REST-based, so still difficult to make it perform well
*Seemingly. Somewhat difficult to make it work.
Py2Neo vs neo4j-rest-client: Performance
100 nodes with 20% connection:
                      Loading    Retrieving
Py2Neo                ~8 s       ~6 s
neo4j-rest-client     ~5 s       ~5 s
Postgres              4 s        4 s
1000 nodes with 20% connection:
                      Loading    Retrieving
Py2Neo                ~7 min     ~7 min
neo4j-rest-client     ~50 min    ~50 min
Postgres              6 min      6 min
Machine: AWS Memory Optimised xLarge node (30GB RAM) on Ubuntu Server, using IPython 3.0.0 on Python 2
Important note
Completely unoptimised! No indexes, no attempt to chunk, and only a couple of OS optimisations.
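The slides do not define "20% connection". A sketch of how test data of that shape might be generated, assuming it means each unordered pair of nodes is linked with probability 0.2, together with a simple wall-clock timer. `make_test_graph`, `timed` and `load_into_dict` are hypothetical helpers, and the plain-dict load is only a stand-in for a real client call.

```python
import random
import time
from itertools import combinations


def make_test_graph(n_nodes, p=0.2, seed=42):
    """Random graph: every unordered pair is joined with probability p (assumption)."""
    rng = random.Random(seed)  # fixed seed so repeated runs load identical data
    nodes = ["node%d" % i for i in range(n_nodes)]
    edges = [(a, b) for a, b in combinations(nodes, 2) if rng.random() < p]
    return nodes, edges


def timed(label, fn, *args):
    """Run fn(*args), print the elapsed wall-clock time, and return its result."""
    start = time.perf_counter()
    result = fn(*args)
    print("%s: %.3f s" % (label, time.perf_counter() - start))
    return result


def load_into_dict(nodes, edges):
    # In-memory baseline standing in for a database client's load call.
    adjacency = {n: [] for n in nodes}
    for a, b in edges:
        adjacency[a].append(b)
    return adjacency


nodes, edges = make_test_graph(100)
# 100 nodes give 4950 possible pairs, so about 0.2 * 4950 = 990 edges on average.
print(len(nodes), len(edges))
adjacency = timed("dict load", load_into_dict, nodes, edges)
```

Under this reading the edge count grows with the square of the node count (roughly 990 edges for 100 nodes versus roughly 99,900 for 1000), which is worth keeping in mind when comparing the 100-node and 1000-node timings.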
OrientDB
PyOrient:
◦ Official OrientDB Driver for Python
◦ Binary Driver
◦ Not Pythonic
Released in 2011
More of a general NoSQL store than Neo4j and Titan (documents as well as graphs)
Scalable across multiple servers
Supports SQL
Titan
First released in 2012
Written in Java
Licenced under the Apache Licence
Many storage backends, including Cassandra, HBase and BerkeleyDB
Hadoop integration
A large number of search backends
Built for scalability
Commercially supported by DataStax (which acquired Aurelius)
Titan and Python
Mogwai:
◦ Written by Cody Lee of WellAware
◦ Binary Driver for RexPro Server
◦ Very Pythonic!
Bulbflow:
◦ Built by James Thornton of Pipem/Espeed
◦ REST-based interface
◦ Last maintained 8 months ago
◦ Connects to multiple backends
RexPro and the TinkerPop Stack
Open-source graph framework (an Apache Incubator project)
◦ Built around Gremlin
◦ Written in Java
◦ Extensively documented
Mogwai Performance
                                  Loading    Retrieving
100 nodes with 20% connection     14 s       18 s
1000 nodes with 20% connection    ~9 min     ~25 min
So, what should I use?*
It depends.
Neo4j:
◦ Good, relatively quick bindings
◦ Well supported
◦ Could be expensive
◦ May not scale
Titan:
◦ Good bindings
◦ Support in doubt
◦ Should be cheaper
◦ Proven scalability
Orient:
◦ Poor bindings
◦ Well supported
◦ Open pricing structure
◦ Should scale well
*The full title of this slide is “What should I research further to ensure it meets my specific needs and then consider using?” In any case, the answer is still “It depends.”
What about Python Graph Databases?
Not just Python bindings, but pure(ish) Python.
GrapheekDB: https://bitbucket.org/nidusfr/grapheekdb
◦ Uses local memory, Kyoto Cabinet or Symas LMDB as backend
◦ Under active development
◦ Exposes client/server interface
◦ Code is Beta quality at best
◦ Documentation is very spotty
Ajgu: https://bitbucket.org/amirouche/ajgu-graphdb/
◦ Uses Berkeley Database backend
◦ Under active development
◦ “This program is alpha becarful”
◦ Python 3 only
Ajgu
Set up a connection:
◦ graph = GraphDatabase(Storage('./BSDDB/graph'))
Create a node:
◦ transaction = graph.transaction(sync=True)
◦ node = transaction.vertex.create(node)
Find a node:
◦ transaction.vertex.label(start)
Create a relationship:
◦ relationship=transaction.edge.create(node1,node2)
Take-aways
Graphs are a natural fit for plenty of data sets
The big three graph databases are Neo4j, Titan and Orient
All three have upsides and downsides, depending on the use case.
If you want to have a bit more fun, try Ajgu or GrapheekDB!
Thanks!
Questions?
nic@niccrouch.com
@fphhotchips
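Py2Neo: Performance and Transactional Support
Large imports should be done in one transaction to decrease overhead:
graph.create(*long_list_of_nodes_and_relationships)
This kills the client (it essentially hangs in string processing).
So, send the import in chunks instead:
    # Python 2 (as in the benchmarks); on Python 3 use itertools.zip_longest.
    from itertools import izip_longest

    _PAD = object()  # sentinel that pads the final, short chunk

    # Assumes graph, iterator (the nodes and relationships) and size already exist.
    for chunk in izip_longest(*[iter(iterator)] * size, fillvalue=_PAD):
        chunk = [item for item in chunk if item is not _PAD]  # drop the padding
        try:
            graph.create(*chunk)  # one request per chunk
        except Exception as ex:
            pass  # chunk dividing goes here
We lose ACID guarantees across the whole import at this point: each chunk is committed separately.
What if a chunk fails? We have to split it up again to find what failed.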
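One way to fill in "chunk dividing goes here" is to bisect a failed batch until the bad items are isolated. This is a minimal sketch, not the author's code: `create_with_bisection` and `fake_create` are hypothetical, and it assumes each create call is all-or-nothing (a failed batch writes nothing), otherwise retrying the halves could create duplicates.

```python
def create_with_bisection(create, items):
    """Try create(*items); on failure, split the batch in half and recurse.

    `create` is any callable taking the items as positional arguments, for
    example a bound graph.create (assumed to be atomic per call).
    Returns the individual items that still fail on their own.
    """
    if not items:
        return []
    try:
        create(*items)
        return []
    except Exception:
        if len(items) == 1:
            return list(items)  # isolated a single bad item
        mid = len(items) // 2
        return (create_with_bisection(create, items[:mid]) +
                create_with_bisection(create, items[mid:]))


created = []


def fake_create(*items):
    # Stand-in for the database call: rejects the whole batch if any item is
    # bad, so a failed call writes nothing (one transaction per call).
    if any(item < 0 for item in items):
        raise ValueError("bad item in batch")
    created.extend(items)


failed = create_with_bisection(fake_create, [1, 2, -3, 4, 5, -6, 7, 8])
print(failed)   # [-3, -6]
print(created)  # [1, 2, 4, 5, 7, 8]
```

Every failed batch triggers two more calls, so this stays cheap only when bad items are rare; a chunk full of failures degrades towards one request per item.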
