Graph Algebra
Graph operations in the language of linear algebra
Graph representation
[Figure: example graph with three nodes, labelled 1, 2 and 3]
Graph on top of:
1. tables (JanusGraph, as on-disk storage)
2. documents (ArangoDB)
Formal graph structure:
1. adjacency list (Neo4j, JanusGraph)
2. adjacency matrix (RedisGraph)
Adjacency matrix
0 1 1
0 0 1
0 0 0
A[i,j] = 1 if entity i is connected to entity j, 0 otherwise.
Binary matrix
• 1 bit per cell
• Addition of entries: binary OR
• Multiplication of entries: binary AND (a matrix product ORs together the ANDed pairs)
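As a rough sketch of these two rules, the snippet below multiplies binary matrices using AND for the product of two entries and OR for their sum. The helper name `bool_matmul` and the list-of-lists layout are illustrative assumptions, not RedisGraph or GraphBLAS code. The input is the three-node adjacency matrix from the "Adjacency matrix" slide.

```python
def bool_matmul(a, b):
    """Multiply two binary matrices over the OR-AND semiring."""
    inner = len(b)
    return [
        [int(any(a[i][t] and b[t][j] for t in range(inner)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Adjacency matrix of the example graph: edges 1->2, 1->3, 2->3.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

# A*A marks pairs of nodes joined by a path of exactly two edges.
A2 = bool_matmul(A, A)
print(A2)  # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
```

The only nonzero of `A2` corresponds to the two-hop path 1 -> 2 -> 3.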
Binary matrix
1 bit per matrix cell
1,000,000 x 1,000,000 cells
One trillion bits = 125 GB
Real world graphs
Most real-world graphs are sparse
Facebook’s friendship graph
2 billion users
338 friends per user on average
2,000,000,000 * 338 / 2,000,000,000^2 = 0.000000169
0.0000169% utilisation
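The arithmetic behind the last two slides can be checked in a few lines. This is only a back-of-the-envelope sketch; the variable names are made up for illustration and the user and friend counts are the ones quoted on the slide.

```python
# Dense storage of a 1,000,000 x 1,000,000 binary matrix at 1 bit per cell.
side = 1_000_000
dense_bytes = side * side // 8
print(dense_bytes / 1e9, "GB")

# Fraction of cells that are actually used in the friendship graph.
users = 2_000_000_000
avg_friends = 338
nonzeros = users * avg_friends
cells = users ** 2
density = nonzeros / cells
print(f"fraction of cells used: {density:.3e}")
print(f"as a percentage: {density * 100:.7f}%")
```

Note that the fraction and the percentage differ by a factor of 100, which is easy to mix up when quoting the utilisation figure.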
Sparse matrix
• Tracks nonzeros

• Assume zero for untracked entries
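A minimal sketch of the idea, assuming a simple "row to set of columns" layout: only the nonzeros are stored and every other cell reads as zero. The class name `SparseBoolMatrix` is hypothetical, and real libraries use compressed formats such as CSR or CSC rather than Python dictionaries.

```python
class SparseBoolMatrix:
    """Binary matrix that stores only the positions holding a 1."""

    def __init__(self, nrows, ncols):
        self.shape = (nrows, ncols)
        self.rows = {}  # row index -> set of column indices that hold a 1

    def set(self, i, j):
        self.rows.setdefault(i, set()).add(j)

    def get(self, i, j):
        # Untracked entries are assumed to be zero.
        return 1 if j in self.rows.get(i, ()) else 0

    def nnz(self):
        return sum(len(cols) for cols in self.rows.values())


# A million-by-million matrix costs memory only for its three nonzeros.
m = SparseBoolMatrix(1_000_000, 1_000_000)
for i, j in [(0, 1), (0, 2), (1, 2)]:
    m.set(i, j)

print(m.get(0, 1), m.get(500_000, 7), m.nnz())
```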
GraphBLAS
• Standard building blocks for graph algorithms in the language of linear algebra
• Sparse Matrix-Matrix multiply
• Sparse Vector-Matrix multiply
SuiteSparse:GraphBLAS
Graph algorithms via sparse linear algebra over semirings
Tim Davis, Texas A&M University
Find next BFS level: just one masked matrix-vector multiply
via traditional Breadth-First-Search:
    for each i in current level
        for each edge (i,j)
            if j is new
                add j to next level ...
via semiring:
    y<mask>=A*x
SuiteSparse:GraphBLAS
Why GraphBLAS?
• traversing nodes and edges one at a time: no scope for library optimization
• linear algebra: “bulk” work can be given to a library
• let the experts write the library kernels: fast, robust, portable performance
• composable linear algebra: associative, distributive, (AB)^T = B^T A^T, ...
Outline
Graph algorithms in the language of linear algebra
Consider C=A*B on a semiring
Semiring: add and multiply operators, and additive identity
Example: with OR-AND semiring: A and B are adjacency matrices of two graphs
C=A*B: contains edge (i, j) if nodes i and j share any neighbor in common
Shortest paths via MIN-PLUS semiring
Graph object is opaque; can exploit lazy evaluation
The GraphBLAS Spec: graphblas.org
SuiteSparse:GraphBLAS implementation and performance
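To make the MIN-PLUS item concrete, here is a small sketch in which "add" is `min` and "multiply" is `+`, so that repeated vector-matrix products relax shortest-path distances (a Bellman-Ford style iteration). The function names and the four-node weight matrix are invented for illustration; this is plain Python, not the GraphBLAS API.

```python
INF = float("inf")  # additive identity of the MIN-PLUS semiring

def min_plus_vxm(d, w):
    """One step over MIN-PLUS: out[j] = min over i of (d[i] + w[i][j])."""
    n = len(w)
    return [min(d[i] + w[i][j] for i in range(n)) for j in range(n)]

def shortest_paths(w, source):
    n = len(w)
    d = [INF] * n
    d[source] = 0
    for _ in range(n - 1):
        step = min_plus_vxm(d, w)
        # Keep the best distance seen so far for every node.
        d = [min(old, new) for old, new in zip(d, step)]
    return d

# w[i][j] is the weight of edge i -> j, INF where there is no edge.
W = [
    [INF, 4,   1,   INF],
    [INF, INF, INF, 1],
    [INF, 2,   INF, 5],
    [INF, INF, INF, INF],
]

dist = shortest_paths(W, 0)
print(dist)  # [0, 3, 1, 4]
```

Swapping the pair (min, +) for (OR, AND) turns the same loop into a reachability computation, which is the sense in which the semiring is a parameter of the algorithm.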
Why graph algorithms with linear algebra?
• powerful way of expressing graph algorithms with large, “bulk” operations on adjacency matrices. No need to chase nodes and edges.
• linear algebra with semirings: composable operations, like (AB)C = A(BC)
• lower software complexity: let the experts write the core graph kernels
• simple object for complex problems: a sparse matrix with any data type, including user-defined
• security: encrypt/decrypt via linear algebra and binary operators
• mathematically well-defined graph object, closed under operations
• performance: serial, parallel, GPU, ... let the library optimize large “bulk” graph/matrix operators
RedisConf18 - Lower Latency Graph Queries in Cypher with Redis Graph
Breadth-first search example
A(i, j) = 1 for edge (j, i)
A is binary; dot (.) is zero for clarity.
. . . 1 . . .
1 . . . . . .
. . . 1 . 1 1
1 . . . . . 1
. 1 . . . . 1
. . 1 . 1 . .
. 1 . . . . .
Breadth-first search: initializations
v = zeros (n,1) ; // result
q = false (n,1) ; // current level
q (source) = true ;
v: q:
. .
. .
. .
. 1
. .
. .
. .
GrB_assign (v, q, NULL, level, GrB_ALL, n, NULL)
v <q> = level ; // assign level
v: q:
. .
. .
. .
1 1
. .
. .
. .
GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
first part of q<!v>=A*q:
t = A*q ;
second part of q<!v>=A*q:
q = false (n,1) ;
q <!v> = t ;
v: t=A*q: q<!v>=t
. 1 1
. . .
. 1 1
1 . .
. . .
. . .
. . .
GrB_assign (v, q, NULL, level, GrB_ALL, n, NULL)
v <q> = level ; // assign level
v: q:
2 1
. .
2 1
1 .
. .
. .
. .
GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
first part of q<!v>=A*q:
t = A*q ;
second part of q<!v>=A*q:
q = false (n,1) ;
q <!v> = t ;
v: t=A*q: q<!v>=t
2 . .
. 1 1
2 . .
1 1 .
. . .
. 1 1
. . .
GrB_assign (v, q, NULL, level, GrB_ALL, n, NULL)
v <q> = level ; // assign level
v: q:
2 .
3 1
2 .
1 .
. .
3 1
. .
GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
first part of q<!v>=A*q:
t = A*q ;
second part of q<!v>=A*q:
q = false (n,1) ;
q <!v> = t ;
v: t=A*q: q<!v>=t
2 . .
3 . .
2 1 .
1 . .
. 1 1
3 . .
. 1 1
GrB_assign (v, q, NULL, level, GrB_ALL, n, NULL)
v <q> = level ; // assign level
v: q:
2 .
3 .
2 .
1 .
4 1
3 .
4 1
GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
first part of q<!v>=A*q:
t = A*q ;
second part of q<!v>=A*q:
q = false (n,1) ;
q <!v> = t ;
v: t=A*q: q<!v>=t
2 . .
3 . .
2 1 .
1 1 .
4 1 .
3 1 .
4 . .
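The walkthrough above can be replayed in plain Python. This is a sketch that mirrors the two steps of each iteration (`v<q> = level`, then `q<!v> = A*q` over the OR-AND semiring) with ordinary lists. It does not call GraphBLAS, and the function name `bfs_levels` is an assumption made for the example. The matrix is the 7-by-7 example from the slides, with node 4 (index 3) as the source.

```python
# A[i][j] = 1 for edge (j, i), copied from the example matrix.
A = [
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
]

def bfs_levels(A, source):
    n = len(A)
    v = [0] * n        # result: BFS level per node, 0 means not reached
    q = [False] * n    # current level (frontier)
    q[source] = True
    level = 1
    while any(q):
        # v<q> = level : assign the level to every node in the frontier
        for i in range(n):
            if q[i]:
                v[i] = level
        # t = A*q over the OR-AND semiring
        t = [any(A[i][j] and q[j] for j in range(n)) for i in range(n)]
        # q<!v> = t : keep only nodes that have no level yet
        q = [t[i] and v[i] == 0 for i in range(n)]
        level += 1
    return v

levels = bfs_levels(A, source=3)
print(levels)  # [2, 3, 2, 1, 4, 3, 4]
```

The printed vector matches the final `v` column of the walkthrough, and the loop stops when the masked product leaves the frontier empty.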
GraphBLAS operations: overview
operation                        MATLAB analog   GraphBLAS extras
matrix multiplication            C=A*B           960 built-in semirings
element-wise, set union          C=A+B           any operator
element-wise, set intersection   C=A.*B          any operator
reduction to vector or scalar    s=sum(A)        any operator
apply unary operator             C=-A            C=f(A)
transpose                        C=A'
submatrix extraction             C=A(I,J)
submatrix assignment             C(I,J)=A        zombies and pending tuples
C=A*B with 960 built-in semirings, and each matrix one of 11 types: GraphBLAS has 960 × 11^3 = 1,277,760 built-in versions of matrix multiply. MATLAB has 4. Arbitrary user-defined types, operators, monoids, and semirings can be created at run time.
GraphBLAS objects
GrB_Type 11 built-in types, “any” user-defined type
GrB_UnaryOp unary operator such as z = x
GrB_BinaryOp binary operator such as z = x + y
GrB_Monoid associative operator like z = x + y with identity 0
GrB_Semiring a multiply operator and additive monoid
GrB_Vector like an n-by-1 matrix
GrB_Matrix a sparse m-by-n matrix
GrB_Descriptor parameter settings
all objects opaque; allows for internal optimization
matrices in compressed-sparse column (CSC) form, with sorted indices
non-blocking mode; matrix can have pending operations
all operations can take an optional mask: like a bulk if statement, C<M> = ...
and an optional accumulator operator: C = C ⊙ ...
GraphBLAS operations
GrB_mxm         matrix-matrix multiply           C<M> = C ⊙ AB
GrB_vxm         vector-matrix multiply           w'<m'> = w' ⊙ u'A
GrB_mxv         matrix-vector multiply           w<m> = w ⊙ Au
GrB_eWiseMult   element-wise, set intersection   C<M> = C ⊙ (A ⊗ B)
                                                 w<m> = w ⊙ (u ⊗ v)
GrB_eWiseAdd    element-wise, set union          C<M> = C ⊙ (A ⊕ B)
                                                 w<m> = w ⊙ (u ⊕ v)
GrB_extract     extract submatrix                C<M> = C ⊙ A(i, j)
                                                 w<m> = w ⊙ u(i)
GrB_assign      assign submatrix                 C(i, j)<M> = C(i, j) ⊙ A
                                                 w(i)<m> = w(i) ⊙ u
GrB_apply       apply unary operator             C<M> = C ⊙ f(A)
                                                 w<m> = w ⊙ f(u)
GrB_reduce      reduce to vector                 w<m> = w ⊙ [⊕_j A(:, j)]
                reduce to scalar                 s = s ⊙ [⊕_ij A(i, j)]
GrB_transpose   transpose                        C<M> = C ⊙ A'
Operations: C(I,J)=A, submatrix/subgraph assignment
hardest function to implement
modifies C in place
costly to modify the matrix/graph, so operations are left pending
zombies: edges/entries still in graph/matrix but marked for deletion
pending tuples: unsorted list of edges/entries to be added to graph/matrix
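The following toy class illustrates the idea of leaving work pending. Deletions only mark an entry as a zombie, insertions are appended to an unsorted list of pending tuples, and both are applied in bulk when the matrix is next needed. It is an illustrative sketch only: the class and method names are hypothetical and this is not how SuiteSparse:GraphBLAS lays out its data internally.

```python
class LazySparseMatrix:
    """Toy sparse matrix that defers structural changes until wait()."""

    def __init__(self):
        self.entries = {}     # (i, j) -> value: the assembled matrix
        self.zombies = set()  # positions still stored but marked for deletion
        self.pending = []     # unsorted (i, j, value) tuples waiting to be added

    def set_element(self, i, j, x):
        if (i, j) in self.entries:
            self.entries[(i, j)] = x       # position exists: cheap in-place update
            self.zombies.discard((i, j))   # a rewritten entry is alive again
        else:
            self.pending.append((i, j, x)) # new position: defer the insertion

    def remove_element(self, i, j):
        if (i, j) in self.entries:
            self.zombies.add((i, j))       # mark only, no restructuring yet
        self.pending = [p for p in self.pending if (p[0], p[1]) != (i, j)]

    def wait(self):
        """Apply all deferred work in one bulk pass."""
        for key in self.zombies:
            del self.entries[key]
        self.zombies.clear()
        for i, j, x in self.pending:
            self.entries[(i, j)] = x       # later duplicates overwrite earlier ones
        self.pending.clear()

    def nvals(self):
        self.wait()
        return len(self.entries)


m = LazySparseMatrix()
m.set_element(0, 1, 1.0)
m.set_element(2, 3, 2.0)
pending_before = len(m.pending)   # both insertions are still pending
m.wait()
m.remove_element(0, 1)
zombie_count = len(m.zombies)     # the entry is marked, not yet removed
print(pending_before, zombie_count, m.nvals())
```

The design choice being illustrated is that many small updates are batched into one restructuring pass instead of paying that cost on every change.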
Building a graph: all at once
Creating a matrix from list of tuples: fast in GraphBLAS:
for (int k = 0 ; k < nz ; k++)
{
    I [k] = simple_rand_i ( ) % nrows ;
    J [k] = simple_rand_i ( ) % ncols ;
    X [k] = simple_rand_x ( ) ;
}
GrB_Matrix A ;
GrB_Matrix_new (&A, GrB_FP64, nrows, ncols) ;
GrB_Matrix_build (A, I, J, X, nz, GrB_SECOND_FP64) ;
Just as fast in MATLAB:
for k = 1:nz
    I (k) = randi (nrows) ;
    J (k) = randi (ncols) ;
    X (k) = rand ( ) ;
end
A = sparse (I,J,X, nrows,ncols) ;
Building a graph: incremental
One element at a time: fast in GraphBLAS:
GrB_Matrix A ;
GrB_Matrix_new (&A, GrB_FP64, nrows, ncols) ;
for (int k = 0 ; k < nz ; k++)
{
    GrB_Index i = simple_rand_i ( ) % nrows ;
    GrB_Index j = simple_rand_i ( ) % ncols ;
    double x = simple_rand_x ( ) ;
    // A (i,j) = x
    GrB_Matrix_setElement (A, x, i, j) ;
}
Impossibly slow in MATLAB:
A = sparse (nrows,ncols) ; % an empty sparse matrix
for k = 1:nz
    i = randi (nrows) ;
    j = randi (ncols) ;
    A (i,j) = rand ( ) ;
end
GraphBLAS performance: C(I,J)=A
Submatrix assignment
Example: C is the Freescale2 matrix, 3 million by 3 million with 14.3 million nonzeros
I = randperm (n,5500)
J = randperm (n,7000)
A = random sparse matrix with 38,500 nonzeros
C(I,J) = A
87 seconds in MATLAB
0.74 seconds in GraphBLAS, without exploiting blocking mode, via GrB_assign
Summary
GraphBLAS: graph algorithms in the language of linear algebra
“Sparse-anything” matrices, including user-defined types
matrix multiplication with any semiring
operations: C=A*B, C=A+B, reduction, transpose, accumulator/mask, submatrix extraction and assignment
performance: most operations just as fast as MATLAB, submatrix assignment 100x or faster.
Version 2.0.1 available at suitesparse.com, Debian, Ubuntu, Mac HomeBrew, ...
RedisGraph
Friend of friend
MATCH (src)-[:friend]->(f)-[:friend]->(fof)
WHERE src.age > 30
RETURN fof
[Figure: pattern src -friend-> f -friend-> fof]
Execution plan
MATCH (src)-[:friend]->(f)-[:friend]->(fof)
WHERE src.age > 30
RETURN fof
Index scan: src.age > 30
Expand: (src)-[:friend]->(f)
Expand: (f)-[:friend]->(fof)
Project: RETURN fof
Execution plan: step-by-step trace
Index scan: Entity ID 5
Expand: 5 connected to 2
Expand: 2 connected to 9
Project: 9
Expand: 2 connected to 1
Project: 1
Expand: 2 depleted
Expand: 5 depleted
Index scan: Entity ID 8
Execution plan
• Serial

• Random memory access 

• Discovers one entity at a time
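A sketch of this pull-based style, using Python generators as stand-ins for the plan operators. Each operator asks the one below it for a single entity at a time, which is why the work is serial. The edge and age data are hypothetical, chosen only so that the run resembles the walkthrough (entities 5 and 8 pass the filter, 5 is connected to 2, and 2 is connected to 9 and 1).

```python
# Hypothetical data, not taken from a real dataset.
FRIENDS = {5: [2], 2: [9, 1], 8: []}
AGE = {1: 28, 2: 25, 5: 41, 8: 35, 9: 22}

def index_scan(min_age):
    """Yield one matching entity at a time."""
    for node in sorted(AGE):
        if AGE[node] > min_age:
            yield node

def expand(sources):
    """For each incoming entity, follow its friend edges one by one."""
    for src in sources:
        for dst in FRIENDS.get(src, []):
            yield dst

def project(nodes):
    """Emit the final result column."""
    for node in nodes:
        yield node

# Index scan -> Expand -> Expand -> Project, evaluated lazily.
plan = project(expand(expand(index_scan(30))))
results = list(plan)
print(results)  # [9, 1]
```

Every result is produced by walking individual adjacency entries, in contrast to the bulk matrix formulation that follows.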
RedisGraph & GraphBLAS
OpenCypher to linear algebra expression
MATCH

(src)-[:friend]->(f)-[:friend]->(fof)
WHERE src.age > 30
RETURN fof
=
Age_Filter * Friendship * Friendship
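The translation can be sketched with ordinary Python lists. The Friendships matrix and the diagonal Age Filter below are the six-node example used in this deck. The helper names (`bool_matmul`, `diag`) are illustrative assumptions and the product is taken over the OR-AND semiring.

```python
def bool_matmul(a, b):
    """Binary matrix product: AND for multiply, OR for add."""
    inner = len(b)
    return [
        [int(any(a[i][t] and b[t][j] for t in range(inner)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def diag(flags):
    """Diagonal filter matrix: 1 at (i, i) when node i passes the predicate."""
    n = len(flags)
    return [[int(i == j and flags[i]) for j in range(n)] for i in range(n)]

friendship = [
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
]
# src.age > 30 holds for nodes 2, 3, 4 and 6 (1-based).
age_filter = diag([False, True, True, True, False, True])

# Age_Filter * Friendship * Friendship
fof = bool_matmul(bool_matmul(age_filter, friendship), friendship)
for row in fof:
    print(row)
```

Row i of `fof` lists the friends of friends of node i, and rows of nodes that fail the age filter are all zero.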
[Figure: example graph with six nodes, numbered 1 to 6]
Age Filter:
0 0 0 0 0 0
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 0 0 0 0 0
0 0 0 0 0 1
*
Friendships:
0 1 0 0 1 0
0 0 1 0 0 0
1 0 0 0 1 1
0 0 0 0 1 0
1 0 1 0 0 0
0 1 0 1 0 0
*
Friendships:
0 1 0 0 1 0
0 0 1 0 0 0
1 0 0 0 1 1
0 0 0 0 1 0
1 0 1 0 0 0
0 1 0 1 0 0
Matrix multiplication is associative
(A*B)*C = A*(B*C)
Friendships:
0 1 0 0 1 0
0 0 1 0 0 0
1 0 0 0 1 1
0 0 0 0 1 0
1 0 1 0 0 0
0 1 0 1 0 0
*
Friendships:
0 1 0 0 1 0
0 0 1 0 0 0
1 0 0 0 1 1
0 0 0 0 1 0
1 0 1 0 0 0
0 1 0 1 0 0
=
Friendships^2 (NNZ = 18):
1 0 1 0 0 0
1 0 0 0 1 1
1 1 1 1 1 0
1 0 1 0 0 0
1 1 0 0 1 1
0 0 1 0 1 0
Age Filter:
0 0 0 0 0 0
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 0 0 0 0 0
0 0 0 0 0 1
*
Friendships:
0 1 0 0 1 0
0 0 1 0 0 0
1 0 0 0 1 1
0 0 0 0 1 0
1 0 1 0 0 0
0 1 0 1 0 0
=
Filtered friendships (src > 30), NNZ = 7:
0 0 0 0 0 0
0 0 1 0 0 0
1 0 0 0 1 1
0 0 0 0 1 0
0 0 0 0 0 0
0 1 0 1 0 0
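Associativity lets the planner choose which product to evaluate first. As a sketch of why the choice matters, the snippet below compares the size of the intermediate result for the two orders on the six-node example. The helper names are illustrative assumptions, and NNZ is used here only as a rough proxy for the work left to the second multiplication.

```python
def bool_matmul(a, b):
    """Binary matrix product over the OR-AND semiring."""
    inner = len(b)
    return [
        [int(any(a[i][t] and b[t][j] for t in range(inner)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def nnz(m):
    return sum(sum(row) for row in m)

F = [
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
]
keep = [0, 1, 1, 1, 0, 1]  # diagonal of the Age Filter
AF = [[int(i == j and keep[i] == 1) for j in range(6)] for i in range(6)]

squared = bool_matmul(F, F)    # intermediate of AF * (F * F)
filtered = bool_matmul(AF, F)  # intermediate of (AF * F) * F

# Both orders give the same final answer ...
same = bool_matmul(AF, squared) == bool_matmul(filtered, F)
# ... but the intermediates differ in size.
print(same, nnz(squared), nnz(filtered))  # True 18 7
```

Applying the filter first keeps the intermediate matrix at 7 nonzeros instead of 18, the same counts shown on the slides.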
Filtered friendships (src > 30):
0 0 0 0 0 0
0 0 1 0 0 0
1 0 0 0 1 1
0 0 0 0 1 0
0 0 0 0 0 0
0 1 0 1 0 0
*
Friendships:
0 1 0 0 1 0
0 0 1 0 0 0
1 0 0 0 1 1
0 0 0 0 1 0
1 0 1 0 0 0
0 1 0 1 0 0
=
FOF:
0 0 0 0 0 0
1 0 0 0 1 1
1 1 1 1 1 0
1 0 1 0 0 0
0 0 0 0 0 0
0 0 1 0 1 0
[Figure: the FOF matrix shown beside the example graph with nodes 1 to 6]
Friend of friend
variable length
MATCH (src)-[:friend*2..4]->(fof)
WHERE src.age > 30
RETURN fof
[Figure: pattern from src to fof over two to four friend hops, with intermediate nodes F2, F3 and F4]
MATCH (src)-[:friend*2..4]->(fof)
WHERE src.age > 30
RETURN fof
=
Age_Filter * (Friendship^2 + Friendship^3 + Friendship^4)
=
M = AF * F;
R = 0;
for (i = 0; i < 3; i++)
    M = M * F;
    R = R + M;
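A runnable sketch of the variable-length expansion on the six-node example. To accumulate exactly the powers 2 through 4, the running product is first advanced to Age_Filter * Friendship and only then multiplied and summed three times. Function names and the `min_hops`/`max_hops` parameters are assumptions made for the example; sums and products use the OR-AND semiring.

```python
def bool_matmul(a, b):
    inner = len(b)
    return [
        [int(any(a[i][t] and b[t][j] for t in range(inner)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def bool_add(a, b):
    """Element-wise OR of two binary matrices."""
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def variable_length(af, f, min_hops, max_hops):
    n = len(f)
    m = af
    for _ in range(min_hops - 1):        # advance to AF * F^(min_hops - 1)
        m = bool_matmul(m, f)
    r = [[0] * n for _ in range(n)]
    for _ in range(min_hops, max_hops + 1):
        m = bool_matmul(m, f)            # AF * F^k for k = min_hops .. max_hops
        r = bool_add(r, m)
    return r

F = [
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
]
keep = [0, 1, 1, 1, 0, 1]
AF = [[int(i == j and keep[i] == 1) for j in range(6)] for i in range(6)]

R = variable_length(AF, F, 2, 4)
for row in R:
    print(row)
```

Only the rows of nodes that pass the age filter can be nonzero in `R`, because the filter is the leftmost factor of every term.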
Age filter:
0 0 0 0 0 0
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 0 0 0 0 0
0 0 0 0 0 1
*
Friendships^2 + Friendships^3:
1 1 1 0 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 0 1 1
1 1 1 1 1 1
1 0 1 0 1 1
=
Result (only rows that pass the age filter remain):
0 0 0 0 0 0
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 0 1 1
0 0 0 0 0 0
1 0 1 0 1 1
[Figure: the result matrix shown beside the example graph with nodes 1 to 6]
Additional algorithms
• Connected Components

• Shortest paths

• Minimum spanning tree
Graph distribution
Block multiplication
A*B=C
A = [A1 A2; A3 A4], B = [B1 B2; B3 B4], C = [C1 C2; C3 C4]
C1 = A1*B1 + A2*B3
C2 = A1*B2 + A2*B4
C3 = A3*B1 + A4*B3
C4 = A3*B2 + A4*B4
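The block formulas can be checked numerically. The sketch below splits two square matrices of even size into quadrants, computes the four output blocks independently, and compares the reassembled result with the ordinary product. Helper names are assumptions for illustration, and the example uses plain integer arithmetic rather than a semiring.

```python
def matmul(a, b):
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split(m):
    """Return the quadrants (top-left, top-right, bottom-left, bottom-right)."""
    h = len(m) // 2
    top, bottom = m[:h], m[h:]
    return ([r[:h] for r in top], [r[h:] for r in top],
            [r[:h] for r in bottom], [r[h:] for r in bottom])

def join(c1, c2, c3, c4):
    return [l + r for l, r in zip(c1, c2)] + [l + r for l, r in zip(c3, c4)]

def block_matmul(a, b):
    a1, a2, a3, a4 = split(a)
    b1, b2, b3, b4 = split(b)
    # Each output block depends only on one block row of A and one block
    # column of B, so the four lines below could run on different workers.
    c1 = matadd(matmul(a1, b1), matmul(a2, b3))
    c2 = matadd(matmul(a1, b2), matmul(a2, b4))
    c3 = matadd(matmul(a3, b1), matmul(a4, b3))
    c4 = matadd(matmul(a3, b2), matmul(a4, b4))
    return join(c1, c2, c3, c4)

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]

matches = block_matmul(A, B) == matmul(A, B)
print(matches)  # True
```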
Parallelize
• cuSPARSE - GPU
• OpenMP - CPU
Benchmarks
The paper "Benchmarking graph databases on the problem of community detection" reports a comprehensive comparative evaluation of three popular graph databases: Titan, OrientDB and Neo4j.
For the evaluation, the authors used real data derived from the SNAP dataset collection.
All experiments were run on an Intel Core i7 at 3.5 GHz with 16 GB of main memory and a 1.4 TB hard disk, the OS being Ubuntu Linux 12.04 (64-bit).
We ran the same benchmarks against RedisGraph, using inferior hardware.
Massive Insertion Workload (MIW)
Create the graph database and configure it for massive loading.
Populate it with a particular dataset.
Measure the time for the creation of the whole graph.
All the measurements are in seconds.
Dataset contains 1,134,890 nodes and 2,987,624 edges.
[Bar chart: RedisGraph, Titan, OrientDB, Neo4j; axis from 0 to 300 seconds; plotted values 24.69, 252.15, 104.27, 0.53]
Query Workload FindNeighbours (FN)
Finds the neighbours of all nodes.
All the measurements are in seconds.
Dataset contains 1,134,890 nodes and 2,987,624 edges.
[Bar chart: RedisGraph, Titan, OrientDB, Neo4j; axis from 0 to 30 seconds; plotted values 4.51, 9.34, 20.71, 0.05]
Query Workload FindAdjacentNodes (FA)
Finds the adjacent nodes of all edges.
All the measurements are in seconds.
Dataset contains 1,134,890 nodes and 2,987,624 edges.
[Bar chart: RedisGraph, Titan, OrientDB, Neo4j; axis from 0 to 50 seconds; plotted values 1.46, 6.15, 42.82, 0.05]
Query Workload FindShortestPath (FS)
Finds the shortest path between the first node and 100 randomly picked nodes.
All the measurements are in seconds.
Dataset contains 1,134,890 nodes and 2,987,624 edges.
[Bar chart: RedisGraph, Titan, OrientDB, Neo4j; axis from 0 to 30 seconds; plotted values 0.08, 23.47, 24.87, 0.001]
Thank You
@roilipman

davis@tamu.edu
RedisConf18 - Lower Latency Graph Queries in Cypher with Redis Graph

  • 1. Graph Algebra Graph operations in the language of linear algebra 1
  • 3. Graph representation Graph on top of:
 1. tables (JanusGraph as on disk storage) 2. documents (ArangoDB) Formal graph structure: 1. adjacency list (Neo4J, JanusGraph) 2. adjacency matrix (RedisGraph) 3
  • 4. Adjacency matrix 0 1 1 0 0 1 0 0 0 A[i,j] = 1 if entity i is connected to j 0 otherwise. 4
  • 5. Binary matrix • 1 bit per cell • Matrix addition binary OR
 • Matrix multiplication binary AND 5
  • 6. Binary matrix 1 bit per matrix cell 1,000,000 X 1,000,000 One trillion bits = 125GB ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. ………………………………………………………. 6
  • 7. Real world graphs Most real world graphs are sparse Facebook’s friendship graph 2 billion users 338 friends for user on average 2,000,000,000 * 338 / 2,000,000,000^2 0.000000169% utilisation 7
  • 8. Sparse matrix • Tracks nonzeros • Assume zero for untracked entries 8
  • 9. GraphBLAS • Standard building blocks for graph algorithms in the language of linear algebra • Sparse Matrix-Matrix multiply • Sparse Vector-Matrix multiply 9
  • 10. SuiteSparse:GraphBLAS Graph algorithms via sparse linear algebra over semirings via traditional Breadth-First-Search: for each i in current level for each edge (i,j) if j is new add j to next level ... Find next BFS level: just one masked matrix-vector multiply Tim Davis, Texas A&M University via semiring: y<mask>=A*x
  • 11. SuiteSparse:GraphBLAS • traversing nodes and edges one a time: no scope for library optimization • linear algebra: “bulk” work can be given to a library • let the experts write the library kernels: fast, robust, portable performance • composable linear algebra: associative, distributive, (AB)T=BTAT, ... Tim Davis, Texas A&M University Why GraphBLAS?
  • 12. Outline Graph algorithms in the language of linear algebra Consider C=A*B on a semiring Semiring: add and multiply operators, and additive identity Example: with OR-AND semiring: A and B are adjacency matrices of two graphs C=A*B: contains edge (i, j) if nodes i and j share any neighbor in common Shortest paths via MIN-PLUS semiring Graph object is opaque; can exploit lazy evaluation The GraphBLAS Spec: graphblas.org SuiteSparse:GraphBLAS implementation and performance
  • 13. Why graph algorithms with linear algebra? powerful way of expressing graph algorithms with large, “bulk” operations on adjaceny matrices. No need to chase nodes and edges. linear algebra with semirings: composable operations, like (AB)C = A(BC) lower software complexity: let the experts write the core graph kernels simple object for complex problems: a sparse matrix with any data type, including user-defined security: encrypt/decrypt via linear algebra and binary operators mathematically well-defined graph object, closed under operations performance: serial, parallel, GPU, ... let the library optimize large “bulk” graph/matrix operators
  • 15. Breadth-first search example A(i, j) = 1 for edge (j, i) A is binary; dot (.) is zero for clarity. . . . 1 . . . 1 . . . . . . . . . 1 . 1 1 1 . . . . . 1 . 1 . . . . 1 . . 1 . 1 . . . 1 . . . . .
  • 16. Breadth-first search: initializations v = zeros (n,1) ; // result q = false (n,1) ; // current level q (source) = true ; v: q: . . . . . . . 1 . . . . . .
  • 17. GrB assign (v, q, NULL, level, GrB ALL, n, NULL) v <q> = level ; // assign level v: q: . . . . . . 1 1 . . . . . .
  • 18. GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
    first part of q<!v>=A*q: t = A*q ;
  • 19. GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
    second part of q<!v>=A*q: q = false (n,1) ; q <!v> = t ;
    node:     1 2 3 4 5 6 7
    v:        . . . 1 . . .
    t=A*q:    1 . 1 . . . .
    q<!v>=t:  1 . 1 . . . .
  • 20. GrB_assign (v, q, NULL, level, GrB_ALL, n, NULL)
    v <q> = level ;        // assign level
    node:  1 2 3 4 5 6 7
    v:     2 . 2 1 . . .
    q:     1 . 1 . . . .
  • 21. GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
    first part of q<!v>=A*q: t = A*q ;
  • 22. GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
    second part of q<!v>=A*q: q = false (n,1) ; q <!v> = t ;
    node:     1 2 3 4 5 6 7
    v:        2 . 2 1 . . .
    t=A*q:    . 1 . 1 . 1 .
    q<!v>=t:  . 1 . . . 1 .
  • 23. GrB_assign (v, q, NULL, level, GrB_ALL, n, NULL)
    v <q> = level ;        // assign level
    node:  1 2 3 4 5 6 7
    v:     2 3 2 1 . 3 .
    q:     . 1 . . . 1 .
  • 24. GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
    first part of q<!v>=A*q: t = A*q ;
  • 25. GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
    second part of q<!v>=A*q: q = false (n,1) ; q <!v> = t ;
    node:     1 2 3 4 5 6 7
    v:        2 3 2 1 . 3 .
    t=A*q:    . . 1 . 1 . 1
    q<!v>=t:  . . . . 1 . 1
  • 26. GrB_assign (v, q, NULL, level, GrB_ALL, n, NULL)
    v <q> = level ;        // assign level
    node:  1 2 3 4 5 6 7
    v:     2 3 2 1 4 3 4
    q:     . . . . 1 . 1
  • 27. GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
    first part of q<!v>=A*q: t = A*q ;
  • 28. GrB_mxv (q, v, NULL, GxB_LOR_LAND_BOOL, A, q, desc)
    second part of q<!v>=A*q: q = false (n,1) ; q <!v> = t ;
    node:     1 2 3 4 5 6 7
    v:        2 3 2 1 4 3 4
    t=A*q:    . . 1 1 1 1 .
    q<!v>=t:  . . . . . . .
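The whole walkthrough on slides 15 to 28 can be replayed with a short Python sketch. It is a dense, plain-list stand-in for the two GraphBLAS calls (the masked assign `v<q> = level` and the masked multiply `q<!v> = A*q`), not the GraphBLAS API itself; the function name `bfs_levels` is made up for this example, and the matrix is the one from slide 15 read row by row.

```python
# Adjacency matrix from slide 15, read row by row; A[i][j] = 1 for edge (j, i).
A = [
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
]

def bfs_levels(A, source):
    n = len(A)
    v = [0] * n        # v[i] = BFS level of node i, 0 means "not reached yet"
    q = [False] * n    # current frontier
    q[source] = True
    level = 1
    while any(q):
        # v<q> = level : assign the level only where q is set
        for i in range(n):
            if q[i]:
                v[i] = level
        # t = A*q over the OR-AND semiring
        t = [any(A[i][j] and q[j] for j in range(n)) for i in range(n)]
        # q<!v> = t : keep only nodes that have no level yet
        q = [t[i] and v[i] == 0 for i in range(n)]
        level += 1
    return v

levels = bfs_levels(A, source=3)  # node 4 on the slides is index 3 here (0-based)
print(levels)  # prints: [2, 3, 2, 1, 4, 3, 4]
```

The final vector matches the v column reached on slide 26, and the loop stops for the same reason as on slide 28: the masked product leaves the frontier empty.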
  • 29. GraphBLAS operations: overview
    operation                        MATLAB analog   GraphBLAS extras
    matrix multiplication            C=A*B           960 built-in semirings
    element-wise, set union          C=A+B           any operator
    element-wise, set intersection   C=A.*B          any operator
    reduction to vector or scalar    s=sum(A)        any operator
    apply unary operator             C=-A            C=f(A)
    transpose                        C=A’
    submatrix extraction             C=A(I,J)
    submatrix assignment             C(I,J)=A        zombies and pending tuples
    C=A*B with 960 built-in semirings, and each matrix one of 11 types: GraphBLAS has 960 × 11^3 = 1,277,760 built-in versions of matrix multiply. MATLAB has 4.
    Arbitrary user-defined types, operators, monoids, and semirings can be created at run time.
  • 30. GraphBLAS objects
    GrB_Type        11 built-in types, “any” user-defined type
    GrB_UnaryOp     unary operator such as z = -x
    GrB_BinaryOp    binary operator such as z = x + y
    GrB_Monoid      associative operator like z = x + y with identity 0
    GrB_Semiring    a multiply operator and additive monoid
    GrB_Vector      like an n-by-1 matrix
    GrB_Matrix      a sparse m-by-n matrix
    GrB_Descriptor  parameter settings
    all objects opaque; allows for internal optimization
    matrices in compressed-sparse column (CSC) form, with sorted indices
    non-blocking mode; matrix can have pending operations
    all operations can take an optional mask: like a bulk if statement, C<M> = ...
    and an optional accumulator operator: C = C ⊙ ...
  • 31. GraphBLAS operations
    GrB_mxm        matrix-matrix multiply             C<M> = C ⊙ AB
    GrB_vxm        vector-matrix multiply             w'<m'> = w' ⊙ u'A
    GrB_mxv        matrix-vector multiply             w<m> = w ⊙ Au
    GrB_eWiseMult  element-wise, set intersection     C<M> = C ⊙ (A ⊗ B)          w<m> = w ⊙ (u ⊗ v)
    GrB_eWiseAdd   element-wise, set union            C<M> = C ⊙ (A ⊕ B)          w<m> = w ⊙ (u ⊕ v)
    GrB_extract    extract submatrix                  C<M> = C ⊙ A(i, j)          w<m> = w ⊙ u(i)
    GrB_assign     assign submatrix                   C(i, j)<M> = C(i, j) ⊙ A    w(i)<m> = w(i) ⊙ u
    GrB_apply      apply unary operator               C<M> = C ⊙ f(A)             w<m> = w ⊙ f(u)
    GrB_reduce     reduce to vector                   w<m> = w ⊙ [⊕_j A(:, j)]
                   reduce to scalar                   s = s ⊙ [⊕_ij A(i, j)]
    GrB_transpose  transpose                          C<M> = C ⊙ A'
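Every row of the table shares the same outer shape: compute a result T, optionally combine it with the old C through an accumulator, then write it back only where the mask allows. A minimal sketch of that shape follows, assuming sparse matrices stored as `{(i, j): value}` dicts and a mask given as a set of positions. The function name `masked_accum_assign` is hypothetical, and the sketch ignores the descriptor options (such as clearing C first) that the real library offers.

```python
def masked_accum_assign(C, T, mask=None, accum=None):
    """Return the result of C<mask> = C (accum) T for {(i, j): value} dicts.

    Simplified model: 'mask' is a set of (i, j) positions, no 'replace' option.
    """
    # Step 1: Z = accum(C, T) on the union of both patterns, or Z = T without accum.
    if accum is None:
        Z = dict(T)
    else:
        Z = dict(C)
        for key, t in T.items():
            Z[key] = accum(C[key], t) if key in C else t
    # Step 2: write Z into C only where the mask allows it.
    if mask is None:
        return Z
    out = {key: val for key, val in C.items() if key not in mask}  # untouched entries
    out.update({key: val for key, val in Z.items() if key in mask})
    return out

C = {(0, 0): 1, (1, 1): 5}
T = {(0, 0): 10, (0, 1): 20, (1, 1): 30}
M = {(0, 0), (0, 1)}
result = masked_accum_assign(C, T, mask=M, accum=lambda c, t: c + t)
print(result)  # prints: {(1, 1): 5, (0, 0): 11, (0, 1): 20}
```

Entry (1, 1) keeps its old value because it lies outside the mask, which is the "bulk if statement" behaviour described on slide 30.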
  • 32. Operations: C(I,J)=A, submatrix/subgraph assignment
    hardest function to implement
    modifies C in place
    costly to modify the matrix/graph, so operations are left pending
    zombies: edges/entries still in graph/matrix but marked for deletion
    pending tuples: unsorted list of edges/entries to be added to graph/matrix
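The zombie and pending-tuple idea can be illustrated with a toy class. This is a sketch of the concept only: the class `LazySparseMatrix`, its method names and its dict-based storage are assumptions made for the example and do not reflect how SuiteSparse:GraphBLAS lays out its data.

```python
class LazySparseMatrix:
    """Toy sparse matrix with deferred updates (illustrative only)."""

    def __init__(self):
        self.entries = {}      # assembled entries: (i, j) -> value
        self.zombies = set()   # assembled entries marked for deletion
        self.pending = []      # unsorted (i, j, value) tuples waiting to be added

    def set_element(self, i, j, x):
        if (i, j) in self.entries:
            self.entries[(i, j)] = x          # overwrite in place
            self.zombies.discard((i, j))      # a zombie comes back to life
        else:
            self.pending.append((i, j, x))    # cheap append, insertion is deferred

    def remove_element(self, i, j):
        if (i, j) in self.entries:
            self.zombies.add((i, j))          # mark only, no data movement yet
        # Cancel insertions for this position that were never assembled.
        self.pending = [p for p in self.pending if (p[0], p[1]) != (i, j)]

    def wait(self):
        """Finish all deferred work in one bulk pass."""
        for key in self.zombies:
            del self.entries[key]
        for i, j, x in self.pending:          # later tuples win on duplicates
            self.entries[(i, j)] = x
        self.zombies.clear()
        self.pending.clear()

    def nvals(self):
        self.wait()                           # reading forces assembly
        return len(self.entries)

m = LazySparseMatrix()
m.set_element(0, 1, 1.0)
m.set_element(2, 2, 3.0)
m.wait()
m.remove_element(0, 1)     # becomes a zombie
m.set_element(1, 0, 7.0)   # becomes a pending tuple
print(m.nvals())           # prints: 2
```

Deferring the work means many small updates cost one bulk pass instead of one restructuring each, which is why single-element updates can stay cheap.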
  • 33. Building a graph: all at once
    Creating a matrix from a list of tuples: fast in GraphBLAS:
        for (int k = 0 ; k < nz ; k++)
        {
            I [k] = simple_rand_i ( ) % nrows ;
            J [k] = simple_rand_i ( ) % ncols ;
            X [k] = simple_rand_x ( ) ;
        }
        GrB_Matrix A ;
        GrB_Matrix_new (&A, GrB_FP64, nrows, ncols) ;
        GrB_Matrix_build (A, I, J, X, nz, GrB_SECOND_FP64) ;
    Just as fast in MATLAB:
        for k = 1:nz
            I (k) = randi (nrows) ;
            J (k) = randi (ncols) ;
            X (k) = rand ( ) ;
        end
        A = sparse (I,J,X, nrows,ncols) ;
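The last argument of `GrB_Matrix_build` on this slide is a binary operator used to combine duplicate (i, j) tuples. A small Python sketch of that step is below; the helper `build_matrix` is hypothetical, and it assumes duplicates are combined in input order, so that a "second" operator keeps the last value seen. MATLAB's `sparse (I,J,X)`, by contrast, sums duplicates.

```python
def build_matrix(I, J, X, dup):
    """Assemble {(i, j): value} from parallel tuple lists, combining duplicates with dup."""
    A = {}
    for i, j, x in zip(I, J, X):
        A[(i, j)] = dup(A[(i, j)], x) if (i, j) in A else x
    return A

second = lambda old, new: new       # plays the role of GrB_SECOND_FP64
plus = lambda old, new: old + new   # alternative: sum duplicates

I = [0, 1, 0, 2]
J = [1, 2, 1, 0]
X = [1.5, 2.0, 4.0, 3.0]            # position (0, 1) appears twice

last_wins = build_matrix(I, J, X, second)
summed = build_matrix(I, J, X, plus)
print(last_wins)  # prints: {(0, 1): 4.0, (1, 2): 2.0, (2, 0): 3.0}
print(summed)     # prints: {(0, 1): 5.5, (1, 2): 2.0, (2, 0): 3.0}
```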
  • 34. Building a graph: incremental
    One element at a time: fast in GraphBLAS:
        GrB_Matrix A ;
        GrB_Matrix_new (&A, GrB_FP64, nrows, ncols) ;
        for (int k = 0 ; k < nz ; k++)
        {
            GrB_Index i = simple_rand_i ( ) % nrows ;
            GrB_Index j = simple_rand_i ( ) % ncols ;
            double x = simple_rand_x ( ) ;
            // A (i,j) = x
            GrB_Matrix_setElement (A, x, i, j) ;
        }
    Impossibly slow in MATLAB:
        A = sparse (nrows,ncols) ;   % an empty sparse matrix
        for k = 1:nz
            i = randi (nrows) ;
            j = randi (ncols) ;
            A (i,j) = rand ( ) ;
        end
  • 35. GraphBLAS performance: C(I,J)=A
    Submatrix assignment example: C is the Freescale2 matrix, 3 million by 3 million with 14.3 million nonzeros
        I = randperm (n,5500)
        J = randperm (n,7000)
        A = random sparse matrix with 38,500 nonzeros
        C(I,J) = A
    87 seconds in MATLAB
    0.74 seconds in GraphBLAS, without exploiting blocking mode, via GrB_assign
  • 36. Summary
    GraphBLAS: graph algorithms in the language of linear algebra
    “Sparse-anything” matrices, including user-defined types
    matrix multiplication with any semiring
    operations: C=A*B, C=A+B, reduction, transpose, accumulator/mask, submatrix extraction and assignment
    performance: most operations just as fast as MATLAB, submatrix assignment 100x or faster
    Version 2.0.1 available at suitesparse.com, Debian, Ubuntu, Mac HomeBrew, ...
  • 38. Friend of friend
    MATCH (src)-[:friend]->(f)-[:friend]->(fof) WHERE src.age > 30 RETURN fof
    [Diagram: src -friend-> f -friend-> fof]
  • 39. Execution plan
    MATCH (src)-[:friend]->(f)-[:friend]->(fof) WHERE src.age > 30 RETURN fof
    Index scan (src.age > 30) -> Expand (src)-[:friend]->(f) -> Expand (f)-[:friend]->(fof) -> Project (RETURN fof)
  • 40. Execution plan (Index scan -> Expand -> Expand -> Project): Index scan emits entity ID 5
  • 41. Execution plan: first Expand, 5 connected to 2
  • 42. Execution plan: second Expand, 2 connected to 9
  • 43. Execution plan: Project emits 9
  • 44. Execution plan: second Expand, 2 connected to 1
  • 45. Execution plan: Project emits 1
  • 46. Execution plan: second Expand, 2 depleted
  • 47. Execution plan: first Expand, 5 depleted
  • 48. Execution plan: Index scan emits entity ID 8
  • 49. Execution plan • Serial • Random memory access • Discovers one entity at a time
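The pull-based plan stepped through on slides 40 to 48 behaves like a chain of iterators. A rough Python sketch with generators is below; the ages and friend lists are invented to mirror the walkthrough (5 connects to 2, 2 connects to 9 and 1, 8 has no friends), and the operator functions are illustrative rather than taken from any database engine.

```python
# Hypothetical toy data mirroring the slide walkthrough.
ages = {5: 42, 2: 25, 9: 28, 1: 19, 8: 37}
friends = {5: [2], 2: [9, 1], 9: [], 1: [], 8: []}

def index_scan(min_age):
    """Emit one matching entity at a time (src.age > min_age)."""
    for node, age in ages.items():
        if age > min_age:
            yield node

def expand(sources):
    """Follow outgoing friend edges, one edge at a time."""
    for src in sources:
        for dst in friends[src]:   # each lookup is a separate memory access
            yield dst

def project(rows):
    """RETURN fof: pass each discovered entity to the caller."""
    for row in rows:
        yield row

plan = project(expand(expand(index_scan(30))))
result = list(plan)
print(result)  # prints: [9, 1]
```

Each operator hands over a single entity per step, which is the serial, one-entity-at-a-time behaviour the slide lists.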
  • 52. MATCH (src)-[:friend]->(f)-[:friend]->(fof) WHERE src.age > 30 RETURN fof
    = Age_Filter * Friendship * Friendship
  • 54. Age Filter * Friendships * Friendships
    Age Filter:
    0 0 0 0 0 0
    0 1 0 0 0 0
    0 0 1 0 0 0
    0 0 0 1 0 0
    0 0 0 0 0 0
    0 0 0 0 0 1
    Friendships (both right-hand factors):
    0 1 0 0 1 0
    0 0 1 0 0 0
    1 0 0 0 1 1
    0 0 0 0 1 0
    1 0 1 0 0 0
    0 1 0 1 0 0
  • 56. Friendships * Friendships = Friendships^2, NNZ = 18
    Friendships:
    0 1 0 0 1 0
    0 0 1 0 0 0
    1 0 0 0 1 1
    0 0 0 0 1 0
    1 0 1 0 0 0
    0 1 0 1 0 0
    Friendships^2:
    1 0 1 0 0 0
    1 0 0 0 1 1
    1 1 1 1 1 0
    1 0 1 0 0 0
    1 1 0 0 1 1
    0 0 1 0 1 0
  • 57. Age Filter * Friendships = Filtered friendships (src > 30), NNZ = 7
    Age Filter:
    0 0 0 0 0 0
    0 1 0 0 0 0
    0 0 1 0 0 0
    0 0 0 1 0 0
    0 0 0 0 0 0
    0 0 0 0 0 1
    Friendships:
    0 1 0 0 1 0
    0 0 1 0 0 0
    1 0 0 0 1 1
    0 0 0 0 1 0
    1 0 1 0 0 0
    0 1 0 1 0 0
    Filtered friendships (src > 30):
    0 0 0 0 0 0
    0 0 1 0 0 0
    1 0 0 0 1 1
    0 0 0 0 1 0
    0 0 0 0 0 0
    0 1 0 1 0 0
  • 58. Filtered friendships (src > 30) * Friendships = FOF
    Filtered friendships (src > 30):
    0 0 0 0 0 0
    0 0 1 0 0 0
    1 0 0 0 1 1
    0 0 0 0 1 0
    0 0 0 0 0 0
    0 1 0 1 0 0
    Friendships:
    0 1 0 0 1 0
    0 0 1 0 0 0
    1 0 0 0 1 1
    0 0 0 0 1 0
    1 0 1 0 0 0
    0 1 0 1 0 0
    FOF:
    0 0 0 0 0 0
    1 0 0 0 1 1
    1 1 1 1 1 0
    1 0 1 0 0 0
    0 0 0 0 0 0
    0 0 1 0 1 0
  • 59. FOF matrix shown next to the graph drawing (nodes 1 to 6)
    0 0 0 0 0 0
    1 0 0 0 1 1
    1 1 1 1 1 0
    1 0 1 0 0 0
    0 0 0 0 0 0
    0 0 1 0 1 0
  • 60. (Same FOF matrix and graph drawing as slide 59.)
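The friend-of-friend product on slides 54 to 58 can be reproduced with a few lines of Python. The sketch below uses dense 0/1 lists and a hand-written OR-AND product (the helper names `bool_matmul` and `nnz` are invented here); a real engine would use sparse storage and a library kernel. Rows and columns are 0-based in the code, so node 1 of the slides is index 0.

```python
def bool_matmul(A, B):
    """Matrix product over the OR-AND semiring, on 0/1 lists of lists."""
    return [[int(any(a and b for a, b in zip(row, col))) for col in zip(*B)]
            for row in A]

def nnz(M):
    """Number of nonzero entries."""
    return sum(map(sum, M))

age_filter = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
friendships = [
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
]

filtered = bool_matmul(age_filter, friendships)   # keep rows of sources with age > 30
fof = bool_matmul(filtered, friendships)          # one more hop
unfiltered = bool_matmul(friendships, friendships)

print(nnz(filtered), nnz(unfiltered))  # prints: 7 18
for row in fof:
    print(row)
```

Applying the filter first shrinks the left operand from the 18 nonzeros of Friendships^2 to 7, so the second multiply touches fewer entries.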
  • 61. Friend of friend, variable length
    MATCH (src)-[:friend*2..4]->(fof) WHERE src.age > 30 RETURN fof
    [Diagram: src -friend-> ... -> fof, where fof is reached after 2, 3 or 4 hops (F2, F3, F4)]
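  • 62. MATCH (src)-[:friend*2..4]->(fof) WHERE src.age > 30 RETURN fof
    = Age_Filter * (Friendship^2 + Friendship^3 + Friendship^4)
    =
    M = Age_Filter * Friendship ;   // paths of length 1 from filtered sources
    R = 0 ;
    for (i = 0 ; i < 3 ; i++)
    {
        M = M * Friendship ;        // extend every path by one hop
        R = R + M ;                 // accumulate lengths 2, 3, 4
    }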
  • 63. Age filter * (Friendships^2 + Friendships^3) = result
    Age filter:
    0 0 0 0 0 0
    0 1 0 0 0 0
    0 0 1 0 0 0
    0 0 0 1 0 0
    0 0 0 0 0 0
    0 0 0 0 0 1
    Friendships^2 + Friendships^3:
    1 1 1 0 1 1
    1 1 1 1 1 1
    1 1 1 1 1 1
    1 1 1 0 1 1
    1 1 1 1 1 1
    1 0 1 0 1 1
    Result (rows kept by the age filter):
    0 0 0 0 0 0
    1 1 1 1 1 1
    1 1 1 1 1 1
    1 1 1 0 1 1
    0 0 0 0 0 0
    1 0 1 0 1 1
  • 64. Result matrix from slide 63, shown next to the graph drawing (nodes 1 to 6)
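The variable-length query reduces to a short loop of multiply-and-accumulate steps. The Python sketch below generalises the slide's pseudocode to an arbitrary hop range; the function `variable_length` and its parameters are illustrative names, and the matrices are the Age filter and Friendships matrices from the earlier slides, stored as dense 0/1 lists with 0-based indices.

```python
def bool_matmul(A, B):
    """Matrix product over the OR-AND semiring, on 0/1 lists of lists."""
    return [[int(any(a and b for a, b in zip(row, col))) for col in zip(*B)]
            for row in A]

def bool_add(A, B):
    """Element-wise OR."""
    return [[int(a or b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def variable_length(age_filter, F, min_hops, max_hops):
    """R = age_filter * (F^min_hops + ... + F^max_hops), built one hop at a time."""
    n = len(F)
    M = age_filter                       # zero hops: just the filtered sources
    R = [[0] * n for _ in range(n)]
    for hops in range(1, max_hops + 1):
        M = bool_matmul(M, F)            # extend every path by one edge
        if hops >= min_hops:
            R = bool_add(R, M)           # accumulate once the minimum length is reached
    return R

age_filter = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
friendships = [
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
]

reach_2_to_4 = variable_length(age_filter, friendships, 2, 4)
for row in reach_2_to_4:
    print(row)
```

Because the filter is applied before the first hop, every intermediate M only carries rows for sources that passed `src.age > 30`.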
  • 65. Additional algorithms • Connected Components • Shortest paths • Minimum spanning tree
  • 66. Graph distribution: block multiplication A*B=C
    A = [A1 A2 ; A3 A4]    B = [B1 B2 ; B3 B4]    C = [C1 C2 ; C3 C4]
  • 67. Graph distribution: block multiplication A*B=C
    C1 = A1*B1 + A2*B3
    C2 = A1*B2 + A2*B4
    C3 = A3*B1 + A4*B3
    C4 = A3*B2 + A4*B4
  • 68. Parallelize • cuSPARSE - GPU • OpenMP - CPU
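The block formulas from the graph distribution slides can be checked with a small sketch. It uses plain Python lists and ordinary integer arithmetic; the helper names (`matmul`, `add`, `split`, `join`) and the two 4-by-4 matrices are made up for the example, and nothing here calls cuSPARSE or OpenMP.

```python
def matmul(A, B):
    """Ordinary matrix product on lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    """Element-wise sum."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def split(M):
    """Split an even-sized square matrix into blocks M1 M2 / M3 M4."""
    h = len(M) // 2
    top, bottom = M[:h], M[h:]
    return ([r[:h] for r in top], [r[h:] for r in top],
            [r[:h] for r in bottom], [r[h:] for r in bottom])

def join(C1, C2, C3, C4):
    """Reassemble four blocks into one matrix."""
    return [l + r for l, r in zip(C1, C2)] + [l + r for l, r in zip(C3, C4)]

A = [[1, 0, 2, 0], [0, 1, 0, 3], [4, 0, 1, 0], [0, 5, 0, 1]]
B = [[0, 1, 0, 0], [1, 0, 0, 2], [0, 0, 3, 0], [2, 0, 0, 1]]
A1, A2, A3, A4 = split(A)
B1, B2, B3, B4 = split(B)

# Each C block needs only two block products, so the four blocks are independent.
C = join(add(matmul(A1, B1), matmul(A2, B3)),
         add(matmul(A1, B2), matmul(A2, B4)),
         add(matmul(A3, B1), matmul(A4, B3)),
         add(matmul(A3, B2), matmul(A4, B4)))

print(C == matmul(A, B))  # prints: True
```

Since no C block depends on another, each one could be handed to a separate worker or machine, which is what makes the block form attractive for distribution.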
  • 69. Benchmarks
    The paper “Benchmarking graph databases on the problem of community detection” reports a comprehensive comparative evaluation of three popular graph databases: Titan, OrientDB and Neo4j. The evaluation used real data derived from the SNAP dataset collection. All experiments were run on an Intel Core i7 at 3.5 GHz with 16 GB of main memory and a 1.4 TB hard disk, running Ubuntu Linux 12.04 (64-bit).
    We performed the same benchmarks against RedisGraph, using inferior hardware.
  • 70. Benchmarks: Massive Insertion Workload (MIW)
    Create the graph database and configure it for massive loading. Populate it with a particular dataset. Measure the time for the creation of the whole graph. All measurements are in seconds. The dataset contains 1,134,890 nodes and 2,987,624 edges.
    [Bar chart, axis 0 to 300 seconds. Series: RedisGraph, Titan, OrientDB, Neo4j. Bar labels: 24.69, 252.15, 104.27, 0.53 (the pairing of labels with series is not preserved in the transcript).]
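  • 71. Benchmarks: Query Workload, FindNeighbours (FN)
    Finds the neighbours of all nodes. All measurements are in seconds. The dataset contains 1,134,890 nodes and 2,987,624 edges.
    [Bar chart, axis 0 to 30 seconds. Series: RedisGraph, Titan, OrientDB, Neo4j. Bar labels: 4.51, 9.34, 20.71, 0.05 (the pairing of labels with series is not preserved in the transcript).]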
  • 72. Benchmarks: Query Workload, FindAdjacentNodes (FA)
    Finds the adjacent nodes of all edges. All measurements are in seconds. The dataset contains 1,134,890 nodes and 2,987,624 edges.
    [Bar chart, axis 0 to 50 seconds. Series: RedisGraph, Titan, OrientDB, Neo4j. Bar labels: 1.46, 6.15, 42.82, 0.05 (the pairing of labels with series is not preserved in the transcript).]
  • 73. Benchmarks: Query Workload, FindShortestPath (FS)
    Finds the shortest path between the first node and 100 randomly picked nodes. All measurements are in seconds. The dataset contains 1,134,890 nodes and 2,987,624 edges.
    [Bar chart, axis 0 to 30 seconds. Series: RedisGraph, Titan, OrientDB, Neo4j. Bar labels: 0.08, 23.47, 24.87, 0.001 (the pairing of labels with series is not preserved in the transcript).]