Graph Processing Applications
praveensripati@gmail.com

www.thecloudavenue.com

    @praveensripati
Agenda

Introduction to Graphs

     Representing graphs

     Different types of graphs

     Algorithms in graphs

What constitutes a graph application

     Graph databases (examples and how they work)

     Graph computing engines (examples and how they work)

Questions & Answers
What are/aren't Graphs in this context?




[Figure: two example images, one labeled YES and one labeled NO]
How is a graph represented?
[Figure: a graph of six vertices, numbered 1 to 6, joined by edges; a legend marks one Vertex and one Edge]

A collection of vertices connected to each other using edges, with both vertices and edges
having properties. A vertex can be a person, place, account or any item which needs to be
tracked.
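
The definition above can be sketched in a few lines of Python. This is only an illustration: the class names `Vertex` and `Edge` are assumptions made for this sketch, the vertex properties are taken from the social graph slide, and the edge between Arun and Tom is a made-up example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vertex_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    properties: dict = field(default_factory=dict)


# Vertex properties follow the social graph slide; the edge is hypothetical.
arun = Vertex(1, {"name": "Arun", "age": 25, "sex": "M"})
tom = Vertex(2, {"name": "Tom"})
knows = Edge(arun.vertex_id, tom.vertex_id, {"relation": "Colleague"})

print(arun.properties["name"], "->", tom.properties["name"], knows.properties)
```

Both vertices and edges carry their own property dictionary, which is the part that distinguishes this model from a bare list of connections.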
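A social graph

Questions such a graph can answer:
 - Who are Arun's friends?
 - Whom should I recommend Sheetal to be friends with?

[Figure: six person vertices: 1 Arun, 2 Tom, 3 Bob, 4 Deepak, 5 Prajval, 6 Sheetal, connected by edges labeled Friend, Relative and Colleague; a legend marks one Vertex and one Edge]

Vertex properties (example): Name : Arun, Age : 25, Sex : M
Edge properties (example): Relation : Colleague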
Facebook Recruiting Competition

Want an interview @ Facebook?

The challenge is to recommend missing links in a social network. Participants
will be presented with an external anonymized, directed social graph (no, not
Facebook, keep guessing) from which some edges have been deleted, and asked to
make ranked predictions for each user in the test set of which other users
they would want to follow.

What is Kaggle?
Kaggle is an innovative solution for statistical/analytics outsourcing. We are
the leading platform for predictive modeling competitions. Companies,
governments and researchers present datasets and problems - the world's best
data scientists then compete to produce the best solutions. At the end of a
competition, the competition host pays prize money in exchange for the
intellectual property behind the winning model.

[Figure: the six-vertex example graph]

                               http://www.kaggle.com/c/FacebookRecruiting
A spatial graph

Questions such a graph can answer:
 - What is the shortest distance between Bangalore and Calcutta?
 - I would like to cover all the places; which is the shortest path?

[Figure: six city vertices: 1 Bangalore, 2 Mumbai, 3 Lucknow, 4 New Delhi, 5 Chennai, 6 Kolkata, connected by edges labeled with distances (250 km, 350 km, 450 km, 450 km, 600 km, 800 km, 850 km); a legend marks one Vertex and one Edge]

Vertex properties (example): Name : Bangalore, Population : 25,00,000, Area : 35,000 SqKm
Edge properties (example): Distance : 700 km
How to represent a Graph for computing?

[Figure: the six-vertex example graph, each vertex annotated with its adjacency list]

.... as an adjacency list for a sparse graph

1 -> 2,4,5
2 -> 3
3 -> 5
4 -> 3,6
5 ->
6 -> 5

.... as an adjacency matrix for a dense graph

       1     2    3     4     5    6
  1    0     1    0     1     1    0
  2    0     0    1     0     0    0
  3    0     0    0     0     1    0
  4    0     0    1     0     0    1
  5    0     0    0     0     0    0
  6    0     0    0     0     1    0

A graph with few edges is sparse; one with many edges is dense.

The web, with billions of pages, cannot practically be represented as an
adjacency matrix: the matrix needs one cell for every pair of pages, and
almost all of those cells would be 0.
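
To make the trade-off concrete, here is a small Python sketch that stores the example graph as an adjacency list and derives the matrix from it. The function name `to_matrix` is an assumption made for this illustration; the edges follow the adjacency list on this slide.

```python
# Directed edges of the six-vertex example graph on this slide.
adjacency_list = {
    1: [2, 4, 5],
    2: [3],
    3: [5],
    4: [3, 6],
    5: [],
    6: [5],
}


def to_matrix(adj):
    """Build a dense 0/1 matrix; row i, column j is 1 when the edge i -> j exists."""
    vertices = sorted(adj)
    index = {v: k for k, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for src, targets in adj.items():
        for dst in targets:
            matrix[index[src]][index[dst]] = 1
    return matrix


matrix = to_matrix(adjacency_list)
edge_count = sum(len(targets) for targets in adjacency_list.values())
for row in matrix:
    print(row)
print("stored entries: list =", edge_count, ", matrix =", len(matrix) ** 2)
```

The list stores one entry per edge, while the matrix always stores one cell per pair of vertices, which is why the list suits sparse graphs.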
Different Graphs

 Social graph (Facebook, LinkedIn etc)

 Spatial graph (Google Maps, MapQuest, FedEx etc)

 Web graph (PageRank, Recommendations etc)

 Computer network graph (Optimal network layout etc)

 Financial graph (Fraud detection, Currency Flow etc)

 Data representations (Lists etc)

 Chemistry (to represent genomes/molecules)

 And others
Some of the Graph Algorithms

    Shortest path (finding the shortest path from A to B)

    Minimum Spanning Tree (cheapest way to connect objects so that every
    object is reachable from every other; can be used in internet, cable
    wiring etc)

    Graph center (placing a warehouse or hospital in a city, so that all the
    locations can be reached easily)

    Bipartite Matching (matching in a dating site, job to employee and others)

    Planarity testing (checking whether a graph can be drawn without crossing
    edges, as in the case of circuit designs).

                      http://www.graph-magics.com/practic_use.php
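
As an illustration of the first algorithm in the list, the sketch below finds a fewest-hops path with breadth-first search. It assumes unweighted edges and reuses the six-vertex example graph from the adjacency list slide; the function name `shortest_path` is made up for this example. Weighted graphs, such as the spatial graph with distances in km, would need an algorithm like Dijkstra's instead.

```python
from collections import deque

# Example graph from the adjacency list slide (directed, unweighted edges).
GRAPH = {1: [2, 4, 5], 2: [3], 3: [5], 4: [3, 6], 5: [], 6: [5]}


def shortest_path(graph, start, goal):
    """Return a fewest-hops path from start to goal, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in graph[current]:
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


print(shortest_path(GRAPH, 1, 6))
print(shortest_path(GRAPH, 5, 1))
```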
Graph Applications


[Figure: applications are built on top of two kinds of systems: graph databases, and graph processing frameworks such as Hama and Giraph]
How to store a Graph?
Option 1 : In a flat file as

       1- 4,5,6
       4- 2,5,6

Where vertex 1 is connected to vertices 4, 5 and 6, and so on.
(Simple, but not efficient and easy to maintain.)



Option 2 : In a relational database using referencing
tables or join tables.



Option 3 : Using a specialized database designed
solely for graphs.
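
Option 1 can be sketched as follows. The parser assumes the `vertex- neighbour,neighbour` line layout shown on this slide; the function name `parse_flat_file` is hypothetical.

```python
def parse_flat_file(text):
    """Parse lines such as '1- 4,5,6' into {1: [4, 5, 6]}."""
    graph = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        vertex, _, neighbours = line.partition("-")
        graph[int(vertex)] = [int(n) for n in neighbours.split(",") if n.strip()]
    return graph


sample = """
1- 4,5,6
4- 2,5,6
"""
graph = parse_flat_file(sample)
print(graph)
```

The whole file has to be read and parsed before any question about the graph can be answered, which is one reason this option does not scale.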
Comparing Graph with Relational DB
Which one would you prefer for storing Graph data?

In a DB of 1,000,000 users, finding friends-of-friends for 1,000 users at
various depths.


     Depth     Execution Time (s) – MySQL     Execution Time (s) – Neo4j
     2         0.016                          0.010
     3         30.267                         0.168
     4         1,543.505                      1.359
     5         Not Finished in 1 Hour         2.132




              http://www.neotechnology.com/2012/06/how-much-faster-is-a-graph-database-really/
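
To show where the relational cost comes from, here is a minimal sketch of a depth-2 friends-of-friends query against a join table, using Python's built-in `sqlite3`. The table name `friend`, its columns and the sample rows are assumptions for illustration and are not the schema used in the benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (person TEXT, friend TEXT)")
conn.executemany(
    "INSERT INTO friend VALUES (?, ?)",
    [("John", "Sara"), ("John", "Joe"), ("Sara", "Maria"), ("Joe", "Steve")],
)

# Depth 2 needs one self-join; every extra level of depth adds another.
rows = conn.execute(
    """
    SELECT DISTINCT f2.friend
    FROM friend AS f1
    JOIN friend AS f2 ON f2.person = f1.friend
    WHERE f1.person = ?
    ORDER BY f2.friend
    """,
    ("John",),
).fetchall()
friends_of_friends = [name for (name,) in rows]
print(friends_of_friends)
```

Each additional level of depth adds one more self-join over the same table, which is consistent with the steep growth in the MySQL column above.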
So, what is a Graph DB?
A graph database is any storage system that
provides `index free adjacency`.

[Figure: the six-vertex example graph, each vertex annotated with its adjacency list]

Every element (node or edge) has a direct pointer to its adjacent element.

No index lookup : We can determine which vertex is adjacent to which other vertex
without looking up an index tree.
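
A rough in-memory analogy of index-free adjacency is sketched below. The `Node` class is an assumption made for this illustration and says nothing about how any particular graph database lays out its storage; the edges are taken from the example graph (1 -> 4, 4 -> 3, 4 -> 6).

```python
class Node:
    """A vertex that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.outgoing = []  # direct references, no index in between

    def connect(self, other):
        self.outgoing.append(other)


# Vertices 1, 3, 4 and 6 of the example graph.
n1, n3, n4, n6 = Node(1), Node(3), Node(4), Node(6)
n1.connect(n4)
n4.connect(n3)
n4.connect(n6)

# Two hops from vertex 1 by following references only.
two_hops = [far.name for near in n1.outgoing for far in near.outgoing]
print(two_hops)
```

The cost of each hop depends on the number of neighbours of the current vertex, not on the total size of the graph.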
So, what is a Graph DB? (.....)

Graph DB is the option when persisting graphs.
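So, what is a Graph DB? (.....)

Part of the NoSQL family

[Figure: NoSQL stores plotted by Data Size (vertical axis) against Data Complexity (horizontal axis), running from large, simple data to smaller, more complex data:]

 Key Value Stores like Amazon Dynamo.
 Columnar Databases like Cassandra, HBase.
 Document Databases like MongoDB, CouchDB.
 Graph Databases like Neo4j.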
Graph DB Bindings (~JDBC API)
//connect to the database
//begin transaction

Node firstNode;
Node secondNode;
Relationship relationship;

firstNode = graphDb.createNode();
firstNode.setProperty( "message", "Hello, " );
secondNode = graphDb.createNode();
secondNode.setProperty( "message", "World!" );

relationship = firstNode.createRelationshipTo( secondNode,
RelTypes.KNOWS );
relationship.setProperty( "message", "brave Neo4j " );

//end the transaction
//close the connection to the database


           http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded-hello-world.html
Graph Adhoc Query (~SQL)

START john=node:node_auto_index(name = 'John')
MATCH john-[:friend]->()-[:friend]->fof
RETURN john, fof



 john                    fof
 Node[4]{name:"John"}    Node[2]{name:"Maria"}
 Node[4]{name:"John"}    Node[3]{name:"Steve"}




                  http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
Different Graph Databases
 - FlockDB (from Twitter)
 - AllegroGraph
 - GraphBase
 - InfiniteGraph (from Objectivity)




     http://en.wikipedia.org/wiki/Graph_database
What is a Graph Computing Engine?

[Figure: a graph computing engine takes algorithms, an InputFormat and an input location, and writes its results through an OutputFormat to an output location]

Graph engines come with some built-in graph processing algorithms, but also
provide an easy-to-use API to build new algorithms and extend the framework.

                 http://incubator.apache.org/giraph/apidocs/index.html
                 http://incubator.apache.org/hama/docs/r0.3.0/api/index.html
Different Graph Computing Engines

Memory based graphs (graph size < local machine RAM), like
     - jung.sourceforge.net
     - igraph.sourceforge.net
     - networkx.lanl.gov

Disk based graphs (graph size < local hard disk size), like
       - Neo4j
       - InfiniteGraph – objectivity.com
       - sparsity-technologies.com/dex

Cluster based graphs (depends on the cluster specs), like
       - Apache Hama
       - Apache Giraph
       - GoldenORB

The cluster based engines are based on the BSP (Bulk Synchronous Parallel)
model, in the spirit of Google Pregel.
Bulk Synchronous Parallel

Some quick facts

• An alternative computing model to MapReduce (not all problems can be solved
  efficiently with MapReduce). Also, any MR algorithm can be simulated on BSP
  and vice versa.

• Developed by Leslie Valiant during the 1980s. It was resurrected by Google
  in the Pregel paper (extensively used for PageRank).

• Good for

  - Processing big data with complicated relationships, e.g. graphs and networks.
  - Iterative and recursive scientific computations
  - Continuous Event Processing (CEP)




         http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
                         http://arxiv.org/abs/1203.2081 – Comparing MR vs BSP
What is Bulk Synchronous Parallel?


[Figure: a BSP computation drawn as three consecutive stages: Super Step 1, Super Step 2, Super Step 3]




            http://en.wikipedia.org/wiki/Bulk_synchronous_parallel/
    http://blog.octo.com/en/introduction-to-large-scale-graph-processing/
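
A single-process sketch of the superstep idea follows. In each superstep every vertex computes on the messages it received, sends new messages, and then waits at a barrier before the next superstep starts. The example propagates the largest value through a small graph, a common illustration of the vertex-centric style. The function name `run_bsp`, the sample graph and its values are assumptions for this sketch, and a real engine would run the vertices in parallel across machines.

```python
def run_bsp(graph, values, max_supersteps=30):
    """Propagate the largest value through the graph, one superstep at a time."""
    values = dict(values)
    # First superstep: every vertex announces its value to its out-neighbours.
    inbox = {v: [] for v in graph}
    for v, targets in graph.items():
        for t in targets:
            inbox[t].append(values[v])
    supersteps = 1
    while any(inbox.values()) and supersteps < max_supersteps:
        outbox = {v: [] for v in graph}
        # Local computation: each vertex only sees its own value and inbox.
        for v, messages in inbox.items():
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                # Communication: queue messages for the next superstep.
                for t in graph[v]:
                    outbox[t].append(values[v])
        # Barrier: new messages become visible only after all vertices finish.
        inbox = outbox
        supersteps += 1
    return values, supersteps


# Hypothetical four-vertex chain with edges in both directions.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
start = {"a": 3, "b": 6, "c": 2, "d": 1}
final_values, used = run_bsp(graph, start)
print(final_values, "after", used, "supersteps")
```

The computation stops when a superstep produces no messages, which mirrors the way vertices in this model fall idle once they have nothing new to report.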
Hama vs Giraph
[Figure: Hama and Giraph are both derived from Google Pregel **. Hama runs on BSP, on top of HDFS. Giraph runs on BSP, on top of MapReduce, on top of HDFS.]

** http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
Hama vs Giraph (.....)

Hama                                      Giraph
Pure BSP engine.                          Uses BSP, but the BSP API is not exposed.
Matrix, graph, network and other          Just for graph processing.
processing.
Jobs are run as BSP jobs on HDFS.         Jobs are run as MapReduce jobs on Hadoop.

Both of them are derived from the `Pregel : A System for Large-Scale Graph
Processing` paper published by Google. Both have recently been promoted from
the Incubator to Apache Top Level Projects.
Both of them have a few graph algorithms implemented and also provide a very easy
API to implement new graph algorithms.




        ** http://googleresearch.blogspot.in/2009/06/large-scale-graph-computing-at-google.html
Page Rank in Hama

           The PageRank algorithm assigns a numerical weight to each
           element of a hyperlinked set of documents.

           bin/hama jar ../hama-0.4.0-examples.jar pagerank
           <input path> <output path> [damping factor]
           [epsilon error] [tasks]


           Input                        Output

           Site1\tSite2\tSite3          Site1 0.5
           Site2\tSite3                 Site2 1.3
           Site3                        Site3 1.2




 http://wiki.apache.org/hama/PageRank
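
For readers who want to see what the job computes, here is a plain-Python sketch of PageRank by power iteration. It is not Hama's implementation. It assumes each input line is a page followed by the pages it links to, separated by tabs, and it scales the ranks to sum to 1, so the numbers are not expected to match the sample output above. The function names and default values are assumptions for this example.

```python
def pagerank(links, damping=0.85, epsilon=1e-9, max_iterations=200):
    """Power-iteration PageRank; the returned ranks sum to 1."""
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(max_iterations):
        # Rank held by pages without out-links is shared among all pages.
        dangling = sum(ranks[p] for p in pages if not links[p])
        new_ranks = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_ranks[target] += damping * ranks[page] / len(targets)
        change = sum(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if change < epsilon:
            break
    return ranks


def parse_input(text):
    """Each line: a page followed by the pages it links to, tab-separated."""
    links = {}
    for line in text.strip().splitlines():
        page, *targets = line.split("\t")
        links[page] = targets
    return links


links = parse_input("Site1\tSite2\tSite3\nSite2\tSite3\nSite3")
ranks = pagerank(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

The `damping` and `epsilon` parameters play the same role as the optional damping factor and epsilon error arguments of the command line above.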
What's next?
Deep dive into

       - Both graph databases and frameworks, with a demo.
       - The Bulk Synchronous Parallel processing model.




Hadoop, Hive, Pig and others are too crowded. Graph frameworks and
databases are emerging and are an easy entry point for contributing to Apache.

I would suggest subscribing to and following the Apache mailing lists, getting
familiar with the projects, and contributing to them.
Q&A
Graph Processing Applications @ HUG

