Graph Processing with
Apache TinkerPop (incubating)
Jason Plurad
Software Engineer, IBM | Committer, Apache TinkerPop
• Project Update
• Graph Landscape
• A Graph Problem
• Hands-On Graph
http://tinkerpop.apache.org
About Me
• Twitter @pluradj
• GitHub @pluradj
• Open channels
– TinkerPop mailing lists
– Titan mailing list
– Stack Overflow
(Apache) TinkerPop (incubating)
• 2009: Inception
• 2012: TinkerPop 2
• 2015: Apache Incubator
• 2016: Top Level Project?
– TLP VOTE passed!
– Waiting on board meeting to establish TLP
Podling Releases
• 3.0 – Major refactor, Java 8 lambda expressions, Gremlin Server, OLAP graph computers
• 3.1 – Hadoop 2 support, persisted RDDs
• 3.2 – OLAP job chaining, OLAP graph filters, performance improvements
Common graph data domains
• Social Network Analysis
• Configuration Management Database
• Master Data Management
• Recommendation Engines
• Knowledge Graphs
• Internet of Things
Property Graph and Gremlin
• Structure
– Vertex
– Edge
– Properties
• Gremlin
– Domain-specific language (DSL) for graphs
– Data flow: forward and backward
– Traversal steps
– Bindings for non-JVM languages
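To make the property graph structure concrete, here is a minimal sketch in plain JavaScript. It is not the TinkerPop API: the class name `PropertyGraph` and the methods `outNeighbors` and `inNeighbors` are hypothetical names chosen for illustration. It only shows that vertices and edges both carry a label and properties, and that an edge can be followed forward or backward.

```javascript
// Minimal in-memory property graph. Vertices and edges both carry a label
// and a bag of key/value properties. All names are illustrative only.
class PropertyGraph {
  constructor() {
    this.vertices = new Map(); // id -> { id, label, properties }
    this.edges = [];           // { label, outV, inV, properties }
  }

  addVertex(id, label, properties = {}) {
    const vertex = { id, label, properties };
    this.vertices.set(id, vertex);
    return vertex;
  }

  addEdge(label, outV, inV, properties = {}) {
    if (!this.vertices.has(outV) || !this.vertices.has(inV)) {
      throw new Error('both endpoints must exist before adding an edge');
    }
    const edge = { label, outV, inV, properties };
    this.edges.push(edge);
    return edge;
  }

  // Forward step: follow outgoing edges with the given label.
  outNeighbors(id, label) {
    return this.edges
      .filter((e) => e.outV === id && e.label === label)
      .map((e) => this.vertices.get(e.inV));
  }

  // Backward step: follow incoming edges with the given label.
  inNeighbors(id, label) {
    return this.edges
      .filter((e) => e.inV === id && e.label === label)
      .map((e) => this.vertices.get(e.outV));
  }
}

// Hypothetical data: one package depending on another.
const graph = new PropertyGraph();
graph.addVertex('app', 'package', { name: 'app' });
graph.addVertex('left-pad', 'package', { name: 'left-pad' });
graph.addEdge('dependency', 'app', 'left-pad', { range: '^1.0.0' });

const forward = graph.outNeighbors('app', 'dependency').map((v) => v.properties.name);
const backward = graph.inNeighbors('left-pad', 'dependency').map((v) => v.properties.name);
console.log(forward, backward);
```

Scanning the edge array on every step is fine for a toy example; a real graph database would index edges by vertex so that a step only touches the local neighborhood.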
Apache TinkerPop
Graph Computing Framework
Graph Landscape
• Graph database vs Graph processor
– OLTP vs OLAP
– Neighborhood vs whole graph
• Multi-model: not the only store in your app
IBM Graph (Beta)
• Managed Graph-as-a-Service (OLTP)
• Focus on your data, not install and operations
• #sleepMore
http://ibm.biz/IBMGraph
What is this?
module.exports = xxxxxxx;
function xxxxxxx (str, len, ch) {
  str = String(str);
  var i = -1;
  if (!ch && ch !== 0) ch = ' ';
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}
A Graph Problem:
Dependency Management
• On March 22, 2016 npm broke the Internet
• Left-pad was unpublished
– 11 lines of code
– WTFPL license
– Hundreds of breaking builds per minute
– http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
• Are we safe with Apache?
Questions for the graph
• Which dependencies are at risk?
• Which ones should be refactored to avoid?
• Risk factors
– Unsuitable license
– Single developer
– Too little code / Too much code
– Changes too frequently / Code is stagnant
– Nobody else is using it
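The question "which dependencies are at risk?" is a reachability question over dependency edges. As a rough sketch in plain JavaScript (not Gremlin), the following walks the edges backward from one removed package to find everything that depends on it, directly or transitively. The edge list and the function names `buildDependentsIndex` and `packagesAtRisk` are hypothetical and not taken from the talk.

```javascript
// Hypothetical dependency edges, each written as [dependent, dependency].
const dependencyEdges = [
  ['app', 'line-numbers'],
  ['line-numbers', 'left-pad'],
  ['cli-tool', 'left-pad'],
  ['app', 'lodash'],
];

// Build a reverse index: dependency -> packages that depend on it directly.
function buildDependentsIndex(edges) {
  const index = new Map();
  for (const [dependent, dependency] of edges) {
    if (!index.has(dependency)) index.set(dependency, []);
    index.get(dependency).push(dependent);
  }
  return index;
}

// Breadth-first walk over incoming dependency edges, starting at the
// removed package. The visited set keeps the walk finite on cycles.
function packagesAtRisk(edges, removedPackage) {
  const index = buildDependentsIndex(edges);
  const atRisk = new Set();
  const queue = [removedPackage];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const dependent of index.get(current) || []) {
      if (!atRisk.has(dependent)) {
        atRisk.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return [...atRisk].sort();
}

const affected = packagesAtRisk(dependencyEdges, 'left-pad');
console.log(affected);
```

The same walk could be filtered by any of the risk factors listed above, for example by starting from every package whose license property is considered unsuitable.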
Let’s go for a ride!
Titan (Aurelius)
• Pick a graph database for OLTP…
– Apache license but not in ASF
• Code has stagnated in the open
– DataStax Enterprise (DSE) Graph
– Wide open opportunities
• Genesis Graph is up next!
• Apache S2Graph (incubating)
• Apache Flink (Gelly)
• Apache Solr (GraphQuery)
Apache Spark or Apache Giraph
• Pick a graph processor for OLAP…
– Spark is the new hotness
– Giraph is better suited for gigantic graphs
• By using Apache TinkerPop and Gremlin, we can use either one seamlessly
Vagrant and VirtualBox
• Developers don’t always get keys to the cloud
• Virtual machines to the rescue
– Host: 16 GB RAM or more
– 3-4 VMs with 3 GB RAM
• Prove out your graph algorithms on a small data set before wasting time on a big data set
Apache Ambari
• Simple install for Apache Hadoop and related Apache big data packages
– HDFS, YARN, MapReduce, HBase, Spark, etc.
• Management and monitoring dashboard
• Enables integration of other software
Getting the data
• NPM registry runs on Apache CouchDB
• Replication in Apache CouchDB is awesome
– https://skimdb.npmjs.com/registry
Transform the data
• Apache CouchDB is a document store
• Dependencies are graph data
• Other things can be too
– Users
– Keywords
– License
• Graph model depends on the questions you want to ask of the graph
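As an illustration of the document-to-graph step, the sketch below turns one registry-style JSON document into vertices and edges. The document shape is a simplified assumption, not the exact registry format, and the function name `documentToGraphElements` is hypothetical. The edge labels `license`, `dependency`, and `devDependency` follow the schema slide.

```javascript
// Turn one simplified registry-style document into vertices and edges.
// The input shape and all helper names are assumptions for illustration.
function documentToGraphElements(doc) {
  const vertices = new Map(); // keyed by "label:id" so repeats collapse
  const edges = [];

  const addVertex = (label, id) => {
    vertices.set(`${label}:${id}`, { label, id });
  };

  addVertex('Package', doc.name);

  if (doc.license) {
    addVertex('License', doc.license);
    edges.push({ label: 'license', from: doc.name, to: doc.license });
  }

  // Runtime and development dependencies become differently labeled edges.
  const dependencyKinds = [
    ['dependencies', 'dependency'],
    ['devDependencies', 'devDependency'],
  ];
  for (const [field, edgeLabel] of dependencyKinds) {
    for (const [target, range] of Object.entries(doc[field] || {})) {
      addVertex('Package', target);
      edges.push({ label: edgeLabel, from: doc.name, to: target, range });
    }
  }

  return { vertices: [...vertices.values()], edges };
}

// Hypothetical sample document.
const sampleDoc = {
  name: 'line-numbers',
  license: 'MIT',
  dependencies: { 'left-pad': '0.0.3' },
  devDependencies: { mocha: '^2.0.0' },
};

const elements = documentToGraphElements(sampleDoc);
console.log(elements.vertices.length, elements.edges.map((e) => e.label));
```

Users and keywords could be handled the same way, with one vertex per distinct value and one edge from the package to it. Which of them deserve vertices depends on the questions the graph has to answer.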
NPM Graph Schema
[Figure: graph schema diagram]
• Vertex labels (count shown in the diagram): Document 250K, Package 1.5M, Keyword 81K, License 2K, Person 125K
• Edge labels: license, dependency, devDependency
Hands-On: Gremlin Console
https://asciinema.org/a/21qk1rn9yt6tt7sour9w9ynxn
The GraphComputer
Anatomy of a Vertex Program
• Vertex-centric graph logic
• Parallel execution (BSP)
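The vertex-centric, bulk synchronous parallel (BSP) idea can be sketched without any framework. The plain JavaScript below is not the TinkerPop VertexProgram API. It is a single-threaded imitation, using a simplified PageRank as the example algorithm, in which every vertex sends messages, a barrier is reached, and every vertex then updates itself from its own inbox only. The graph data and the function name `runPageRank` are hypothetical.

```javascript
// Hypothetical adjacency list: vertex -> vertices it points to.
// This sketch assumes every vertex has at least one outgoing edge.
const outEdges = {
  a: ['b', 'c'],
  b: ['c'],
  c: ['a'],
};

function runPageRank(graph, supersteps, damping = 0.85) {
  const ids = Object.keys(graph);
  const rank = {};
  for (const id of ids) rank[id] = 1 / ids.length;

  for (let step = 0; step < supersteps; step++) {
    // Phase 1: each vertex sends an equal share of its rank along
    // every outgoing edge.
    const inbox = {};
    for (const id of ids) inbox[id] = [];
    for (const id of ids) {
      const share = rank[id] / graph[id].length;
      for (const target of graph[id]) inbox[target].push(share);
    }

    // Barrier: all messages are delivered before any vertex updates.

    // Phase 2: each vertex computes its new rank from its own inbox only,
    // which is what would let a real engine run the vertices in parallel.
    for (const id of ids) {
      const received = inbox[id].reduce((sum, message) => sum + message, 0);
      rank[id] = (1 - damping) / ids.length + damping * received;
    }
  }
  return rank;
}

const ranks = runPageRank(outEdges, 20);
const total = Object.values(ranks).reduce((sum, r) => sum + r, 0);
console.log(ranks, total);
```

Because a vertex reads only its own state and its own messages, the work inside each phase can be split across machines. The barrier between phases is the only point of global coordination.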
Out of the box Vertex Programs
• Traversal
• BulkLoader
• BulkDumper
• PageRank
• PeerPressure
Hands-On: Graph Program
OLAP Traversal Sources
> graph = GraphFactory.open('conf/npmgraph-olap.properties')
> g = graph.traversal().withComputer(SparkGraphComputer)
> g = graph.traversal().withComputer(GiraphGraphComputer)
Graph Statistics via TraversalVertexProgram
> g.V().count() // vertex count
> g.E().count() // edge count
> g.V().label().groupCount() // vertex label distribution
> g.E().label().groupCount() // edge label distribution
> g.V().properties().key().groupCount() // vertex property distribution
Next stop? More data!
• Graphs are for connecting data!
• Consume data from GitHub
– User data
– Static code analysis
– Code usage analysis
• Consume data from Twitter
– Trending news
– Security alerts
Summary
• Apache TinkerPop is for graph computing
• OLTP vs OLAP is an important distinction
– Gremlin allows you to seamlessly bridge the two
• Graph thinking is different from relational thinking
– Is the future multi-model?
• Many opportunities to innovate in this space
Acknowledgements
• Marko Rodriguez
– Gremlin language, Gremlin OLAP
• Ketrina Yim
– Illustrator, creator of Gremlin and friends
• Stephen Mallette
– TinkerPop release manager, Gremlin applications
• Daniel Kuppitz
– Gremlin language guru
• David Robinson
– Big data, multi-model architect/developer
Questions?
Thank you!
